Table 3 Summary of DISCERN, GQS and readability results of ChatGPT-4 responses after consensus scoring.

From: Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy

| Reliability, quality and readability | n = 19 |
|---|---|
| Reliability | |
| mDISCERN score (mean ± SD) | 30.26 ± 3.14 |
| Quality | |
| GQS score [median (min–max)] | 4 (2–4) |
| Readability indexes | |
| FRE [median (min–max)] | 32.20 (13.00–37.10) |
| FKGL (mean ± SD) | 13.94 ± 1.55 |
| GFI (mean ± SD) | 16.81 ± 1.69 |
| CLI (mean ± SD) | 14.89 ± 1.44 |
| SMOG (mean ± SD) | 12.31 ± 1.18 |
| Reading level | |
| Fairly easy to read, n (%) | 0 (0%) |
| Standard/average, n (%) | 0 (0%) |
| Fairly difficult to read, n (%) | 0 (0%) |
| Difficult to read, n (%) | 12 (63.1%) |
| Very difficult to read, n (%) | 7 (36.9%) |
| Reader age | |
| 8–9 years old (fourth and fifth graders), n (%) | 0 (0%) |
| 10–11 years old (fifth and sixth graders), n (%) | 0 (0%) |
| 11–13 years old (sixth and seventh graders), n (%) | 0 (0%) |
| 12–14 years old (seventh and eighth graders), n (%) | 0 (0%) |
| 13–15 years old (eighth and ninth graders), n (%) | 0 (0%) |
| 14–15 years old (ninth and tenth graders), n (%) | 1 (5.3%) |
| 15–17 years old (tenth and eleventh graders), n (%) | 1 (5.3%) |
| 17–18 years old (twelfth graders), n (%) | 1 (5.3%) |
| 18–19 years old (college entry level), n (%) | 2 (10.5%) |
| 21–22 years old (college level), n (%) | 9 (47.3%) |
| College graduate, n (%) | 5 (26.3%) |

mDISCERN: Modified DISCERN; GQS: Global Quality Score; FRE: Flesch Reading Ease score; FKGL: Flesch-Kincaid Grade Level; SMOG: Simple Measure of Gobbledygook; GFI: Gunning Fog Index; CLI: Coleman-Liau Index. Normally distributed data are presented as mean ± SD, and non-normally distributed data are presented as median (min–max). Categorical variables are expressed as n (%).
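
The three reporting conventions described in the table note can be illustrated with a minimal Python sketch. The helper names (`mean_sd`, `median_range`, `n_percent`) and the sample values are hypothetical and are not taken from the study. The use of the sample standard deviation (n − 1 denominator) is also an assumption, since the note does not say which SD was computed.

```python
import statistics


def mean_sd(values, digits=2):
    """Format roughly normal data as 'mean ± SD'.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return f"{mean:.{digits}f} ± {sd:.{digits}f}"


def median_range(values, digits=2):
    """Format non-normal data as 'median (min–max)'."""
    median = statistics.median(values)
    low, high = min(values), max(values)
    return f"{median:.{digits}f} ({low:.{digits}f}–{high:.{digits}f})"


def n_percent(count, total, digits=1):
    """Format a categorical count as 'n (%)' of the total."""
    return f"{count} ({100 * count / total:.{digits}f}%)"


# Hypothetical values, for illustration only (not study data).
print(mean_sd([2, 4, 4, 4, 5, 5, 7, 9]))   # 5.00 ± 2.14
print(median_range([1, 2, 3, 4, 10]))      # 3.00 (1.00–10.00)
print(n_percent(5, 19))                    # 5 (26.3%)
```

The sketch covers formatting only. Choosing between the mean and median forms depends on a normality check, which the note does not specify.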