Abstract
Artificial intelligence (AI), particularly ChatGPT-4, offers promising applications in medical education, including the development of multiple-choice questions (MCQs). This study evaluated and compared the quality of 36 MCQs created by medical faculty with versions of the same questions reviewed by ChatGPT-4. A cross-sectional, quantitative approach was used. Ten external health education specialists and four study authors (internal evaluators) assessed the questions against 38 criteria. The data were analyzed with descriptive statistics, the Wilcoxon signed-rank test, and non-metric multidimensional scaling. External evaluators found no statistically significant difference in the number of criteria met between versions (p = 0.325), whereas the study authors, who underwent standardization meetings, identified a statistically significant increase in the number of criteria met by the ChatGPT-4-reviewed MCQs (p < 0.001). ChatGPT-4 proved proficient at modifying questions to improve structural clarity and adherence to basic item-writing principles, yielding clearer and more objective questions. However, it struggled to incorporate clinical reasoning and higher-order thinking when these were lacking, particularly given the non-optimized prompt used. Despite these limitations, the AI’s revisions were aligned with faculty quality standards, demonstrating its potential to complement, rather than replace, faculty efforts and underscoring the critical role of calibrated human expertise and effective prompt engineering.
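The paired comparison named in the abstract can be illustrated with a minimal sketch of a two-sided Wilcoxon signed-rank test. This is not the authors' analysis code: the function name `signed_rank_test` is hypothetical, the eight pairs of criteria counts are synthetic values invented for illustration (not study data), and the exact p-value is obtained by enumerating sign assignments, which is practical only for small samples. For larger samples such as the study's 36 questions, a normal approximation or a statistics library would be the usual choice.

```python
from itertools import product


def signed_rank_test(before, after):
    """Two-sided Wilcoxon signed-rank test with an exact p-value by enumeration.

    Returns (W+, W-, p). Zero differences are dropped and tied absolute
    differences receive the average of the ranks they span.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)

    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    stat = min(w_plus, w_minus)

    # Under the null hypothesis each rank carries either sign with equal
    # probability, so count the sign assignments at least as extreme as observed.
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=n):
        pos = sum(r for r, s in zip(ranks, signs) if s)
        if min(pos, total - pos) <= stat:
            hits += 1
    return w_plus, w_minus, hits / 2 ** n


# Synthetic example: criteria met (out of 38) by eight hypothetical questions,
# before and after review. These numbers are NOT taken from the study.
original = [20, 22, 24, 21, 23, 25, 19, 26]
reviewed = [19, 24, 27, 25, 28, 31, 26, 34]

w_plus, w_minus, p_value = signed_rank_test(original, reviewed)
print(w_plus, w_minus, p_value)
```

The test works on the ranks of the paired differences rather than on their raw values, which suits bounded count data such as the number of criteria met per question.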
Acknowledgements
The authors would like to thank Professor Bruno Caramelli, PhD, from the Unit of Interdisciplinary Medicine in Cardiology (InCor-FMUSP), for reviewing the English version of the article and for his suggestions.
Funding
This work was conducted without external funding. No grants, contracts, or other forms of financial support were received from government agencies, private foundations, or commercial entities for this research.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Iembo, T., Cristóvão, H.L.G., Gonçalves, P.C.Z. et al. ChatGPT as a tool for reviewing multiple-choice questions in the health sector. Sci Rep (2026). https://doi.org/10.1038/s41598-026-51988-9
DOI: https://doi.org/10.1038/s41598-026-51988-9


