Table 1. ChatGPT accuracy across the four domains of the MRCOG Part One examination

From: Exploring the capabilities of ChatGPT in women’s health: obstetrics and gynaecology

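| Domain                       | Correct     | Incorrect   | Total |
|------------------------------|-------------|-------------|-------|
| Cell Function                | 203 (72.8%) | 76 (27.2%)  | 279   |
| Human Structure              | 135 (69.9%) | 58 (30.1%)  | 193   |
| Illness                      | 148 (80.0%) | 37 (20.0%)  | 185   |
| Measurement and Manipulation | 117 (65.7%) | 61 (34.3%)  | 178   |
| Total                        | 603 (72.2%) | 232 (27.8%) | 835   |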

Note: The overall accuracy was 72.2% (95% CI 69.2–75.3). Accuracy differed significantly across the four domains (chi-squared statistic = 9.85, p = 0.02). ChatGPT performed best in the “Illness” domain, with an accuracy of 80.0% (95% CI 73.3–85.7), and worst in the “Measurement and Manipulation” domain, with an accuracy of 65.7% (95% CI 58.8–72.7). Values in brackets are percentages of each domain's total.
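The chi-squared statistic and the overall confidence interval can be checked directly from the counts in the table. The sketch below is illustrative only and uses the Python standard library. It assumes a Pearson chi-squared test of independence on the 4 × 2 table (3 degrees of freedom) and a normal-approximation (Wald) interval. The table does not state which interval method the authors used, so the Wald choice is an assumption and may not reproduce every reported interval exactly. The names `counts`, `chi_squared`, `chi2_sf_df3`, and `wald_ci` are hypothetical helpers, not part of the original analysis.

```python
import math

# Counts copied from Table 1: domain -> (correct, incorrect)
counts = {
    "Cell Function": (203, 76),
    "Human Structure": (135, 58),
    "Illness": (148, 37),
    "Measurement and Manipulation": (117, 61),
}


def chi_squared(table):
    """Pearson chi-squared statistic for a rows x columns table of counts."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def wald_ci(correct, total, z=1.959964):
    """Normal-approximation 95% interval for a proportion (assumed method)."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


stat = chi_squared(counts)
p_value = chi2_sf_df3(stat)  # df = (4 - 1) * (2 - 1) = 3

total_correct = sum(c for c, _ in counts.values())
total_questions = sum(c + i for c, i in counts.values())
low, high = wald_ci(total_correct, total_questions)

print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}")
print(f"overall accuracy = {total_correct / total_questions:.1%}, "
      f"95% CI = {low:.1%} to {high:.1%}")
```

Under these assumptions the statistic and the overall interval agree with the values reported in the note to the table when rounded to the same precision.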