Table 5 Evaluation of other responses.
| | | DB GPT | PubMed GPT | GPT-4o | E.P | Odds ratio (95% CI)ª |
|---|---|---|---|---|---|---|
Scientific consensus, number (%) | Yes | 16 (64.0) | 16 (64.0) | 25 (100) | 21 (84.0) | DB: 0.31 (0.06–1.73) PM: 0.31 (0.06–1.73) 4o: not estimable |
| | No | 9 (36.0) | 9 (36.0) | 0 (0) | 4 (16.0) | |
Extent of possible harm, number (%) | No/mild | 19 (76.0) | 21 (84.0) | 24 (100) | 21 (84.0) | DBᵇ: 0.60 (0.11–3.42) PMᵇ: 1.00 (0.16–6.43) 4oᵇ: not estimable |
| | Moderate | 5 (20.0) | 3 (12.0) | 0 (0) | 3 (12.0) | |
| | Severe/fatal | 1 (4.0) | 1 (4.0) | 0 (0) | 1 (4.0) | |
Evidence of incorrect comprehension, number (%) | Yes | 9 (36.0) | 10 (40.0) | 1 (4.0) | 6 (24.0) | DB: 0.48 (0.09–2.63) PM: 0.39 (0.07–2.10) 4o: 9.27 (0.57–151.30) |
| | No | 16 (64.0) | 15 (60.0) | 24 (96.0) | 19 (76.0) | |
Evidence of incorrect retrieval, number (%) | Yes | 7 (28.0) | 8 (32.0) | 0 (0) | 6 (24.0) | DB: 0.80 (0.17–3.91) PM: 0.66 (0.43–3.13) 4o: not estimable |
| | No | 18 (72.0) | 17 (68.0) | 25 (100) | 19 (76.0) | |
Evidence of incorrect reasoning, number (%) | Yes | 7 (28.0) | 7 (28.0) | 1 (4.0) | 3 (12.0) | DB: 0.30 (0.04–2.09) PM: 0.30 (0.04–2.09) 4o: 3.50 (0.19–64.35) |
| | No | 18 (72.0) | 18 (72.0) | 24 (96.0) | 22 (88.0) | |
Inappropriate content, number (%) | Yes | 5 (20.0) | 6 (24.0) | 1 (4.0) | 4 (16.0) | DB: 0.67 (0.08–5.86) PM: 0.45 (0.05–3.96) 4o: 6.36 (0.31–129.57) |
| | No | 20 (80.0) | 19 (76.0) | 24 (96.0) | 21 (84.0) | |
Incorrect content, number (%) | Yes | 7 (28.0) | 6 (24.0) | 1 (4.0) | 7 (28.0) | DB: 1.00 (0.18–5.56) PM: 1.30 (0.23–7.41) 4o: 11.97 (0.74–192.90) |
| | No | 18 (72.0) | 19 (76.0) | 24 (96.0) | 18 (72.0) | |
Specialized content, number (%) | Yes | 8 (32.0) | 12 (48.0) | 21 (84.0) | 12 (48.0) | DB: 0.40 (0.08–2.11) PM: 1.00 (0.21–4.75) 4o: 8.30 (1.33–51.71) |
| | No | 17 (68.0) | 13 (52.0) | 4 (16.0) | 13 (52.0) | |
Bedside manner (empathy to the user), mean (SD) | | 3.76 (1.23) | 3.68 (1.41) | 1.48 (0.65) | 3.12 (1.30) | DBᶜ: 0.39 (0.09–1.77) PMᶜ: 0.49 (0.11–2.11) 4oᶜ: 15.41 (2.00–118.94) |
Helpfulness of the answer, mean (SD) | | 3.12 (0.88) | 2.96 (1.02) | 1.20 (0.50) | 2.72 (0.84) | DBᵈ: 0.40 (0.09–1.78) PMᵈ: 0.72 (0.18–2.90) 4oᵈ: 30.54 (2.17–430.35) |
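
As a reading aid for the odds-ratio column, the following is a minimal sketch of how a crude (unadjusted) odds ratio with a Wald 95% CI can be computed from the Yes/No counts of two columns. It is an assumption for illustration, not the estimation method behind footnote ª, which is not shown in this table; the function name `crude_odds_ratio` is hypothetical. It also shows why a zero cell leads to "not estimable".

```python
import math


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio (a/b)/(c/d) with a Wald CI on the log scale.

    a, b: counts with / without the outcome in the first group
    c, d: counts with / without the outcome in the second group
    Returns None when any cell is zero, because the ratio or its
    standard error is then undefined without a continuity correction.
    """
    if min(a, b, c, d) == 0:
        return None
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual Wald approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Scientific consensus, Yes/No counts: DB GPT 16/9 versus E.P 21/4.
print(crude_odds_ratio(16, 9, 21, 4))

# GPT-4o 25/0 versus E.P 21/4: a zero cell, so nothing is estimated.
print(crude_odds_ratio(25, 0, 21, 4))
```

For the scientific consensus row, the crude ratio is 64/189, about 0.34, which differs from the tabulated 0.31. The tabulated values therefore presumably come from a different model than this unadjusted calculation.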