Abstract
Systematic reviews provide the highest level of evidence but remain resource-intensive. We evaluated the performance of a large language model (LLM; ChatGPT, OpenAI) in a PRISMA-guided systematic review and meta-analysis of randomized controlled trials (RCTs) on vaginal vault prolapse surgery. Prompts were designed to minimize errors, and all outputs were verified. Each task was completed within minutes. For title/abstract screening, recall was 69.8% and precision was 85.7% (κ = 0.77). Full-text screening agreement was 94.1–100% (κ = 0.82–1), and data extraction accuracy was 87.5–99.7%. Across 18 RCTs (1668 women), sacrocolpopexy (SC) showed a non-significant trend toward higher anatomic success than sacrospinous fixation (SSF) (OR 1.42, 95% CI 0.71–2.84). Transvaginal mesh improved 3-year objective success compared with SSF (OR 1.84, 95% CI 1.13–2.99) but had higher reoperation rates than SC (5–16% vs 2–4%). We found no conclusive evidence that any single technique is superior: most comparisons were underpowered, with wide confidence intervals and substantial heterogeneity. All LLM-derived statistical results were identical to those from conventional analyses in R, supporting the computational reliability of the workflow. Validated LLM workflows can enable more efficient and scalable evidence synthesis.
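The screening and effect-size statistics reported above rest on standard 2×2-table formulas. The Python sketch below is illustrative only. The counts are hypothetical, not the study's data. The odds-ratio interval uses the Woolf (log) method for a single table. The study's reported ORs are meta-analytic estimates pooled across trials in R and may use a different model. The function names are introduced here for illustration.

```python
import math


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Recall, precision and Cohen's kappa for LLM include/exclude decisions
    scored against a human reference standard (2x2 confusion matrix)."""
    n = tp + fp + fn + tn
    recall = tp / (tp + fn)     # eligible records the LLM correctly kept
    precision = tp / (tp + fp)  # LLM inclusions that were truly eligible
    p_obs = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, from both raters' marginal totals
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return {"recall": recall, "precision": precision, "kappa": kappa}


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval for one 2x2 table.
    a, b = successes, failures in arm 1; c, d = successes, failures in arm 2."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts, for illustration only
print(screening_metrics(tp=40, fp=10, fn=10, tn=940))
print(odds_ratio_ci(a=60, b=40, c=45, d=55))
```

Recall is the share of truly eligible records the model retained. Precision is the share of its inclusions that were truly eligible. Kappa corrects raw agreement for the agreement expected by chance. An interval that spans 1, such as the 0.71–2.84 reported for SC versus SSF, is compatible with no difference between techniques.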
Data availability
All data analyzed in this study were derived from published randomized controlled trials included in the systematic review and meta-analysis. No new raw patient-level data were generated. The datasets supporting screening decisions, extracted variables, and ChatGPT-assisted workflow outputs are available from the corresponding author upon reasonable request.
Acknowledgements
The authors acknowledge the Yonsei University Medical Library for support with the literature search. ChatGPT (OpenAI, San Francisco, CA, USA) was used to assist with English language refinement, under the authors’ supervision.
Author information
Contributions
Y.P.: conceptualization, literature search, study selection, data extraction, quality assessment, preparation of figures and tables, and drafting of the manuscript. H.Z.: statistical analysis, data synthesis, preparation of figures and tables, and methodological consultation. S.W.B.: literature search, data extraction, validation, interpretation of findings, and project administration.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Park, Y., Zhang, HS. & Bai, S.W. Large language models in systematic review and meta-analysis of surgical treatments for vaginal vault prolapse. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02431-w
DOI: https://doi.org/10.1038/s41746-026-02431-w


