Scientific Reports
A novel approach to preeclampsia early prediction addressing predictive uncertainty due to missing data in clinical dataset
  • Article
  • Open access
  • Published: 12 February 2026

  • Jin Woo Kim1 na1,
  • Nari Kim2 na1,
  • Ju Yeon Kim1,
  • Hye Ji Han1,
  • Su Ji Yang1,
  • You Jung Han3,
  • Hee Jin Park3,
  • Hye Yeon Boo4,
  • Dong Wook Kwak5,
  • Hyun Jung Lee2,
  • Sang Hee Jung2,
  • Eun Hee Ahn2,
  • Ji Hyae Lim1 &
  • …
  • Hyun Mee Ryu1,2 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational models
  • Risk factors

Abstract

Preeclampsia (PE) is a disease that seriously threatens the health of pregnant women, and early intervention significantly reduces its incidence in high-risk mothers. To identify mothers at high risk of PE with at least a specified level of confidence, we aimed to develop a framework for early prediction of PE that provides a risk score together with an associated uncertainty score reflecting missing data in clinical datasets. We built a machine learning model using a multi-center retrospective clinical dataset of 31,235 singleton pregnancies. To quantify the uncertainty resulting from missing data, we assessed the contribution of each variable to prediction variability using SHapley Additive exPlanations (SHAP) values. The uncertainty score for each sample was calculated by summing the contributions of its missing variables. Predictive performance was evaluated on samples with uncertainty scores below specific thresholds, using both internal validation and external validation on an independent cohort. Internal validation revealed a strong inverse correlation between the uncertainty score threshold and AUROC (Spearman correlation coefficient: -0.999). At a threshold of 0.11 of the minimum possible level, the AUROC reached 0.978, compared to 0.845 when uncertainty was not considered. In external validation, the AUROC reached 0.994 at the same threshold, compared to 0.693 when uncertainty was not considered. Our framework demonstrated high predictive performance in low-uncertainty samples, underscoring its stability and effectiveness. This approach reduces the risk of overconfidence in high-uncertainty predictions and represents a more reliable method for PE prediction.
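The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the variable names, the contribution weights (standing in for normalized SHAP-based contributions), the toy samples and the choice to keep samples at or below the threshold are all assumptions made for illustration. The abstract does not specify how the contributions are aggregated or normalized.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical per-variable contributions (for example, mean |SHAP| values
# normalized to sum to 1). Names and numbers are invented for illustration.
CONTRIBUTION: Dict[str, float] = {
    "systolic_bp": 0.40,
    "bmi": 0.25,
    "maternal_age": 0.20,
    "parity": 0.15,
}

Sample = Dict[str, Optional[float]]


def uncertainty_score(sample: Sample) -> float:
    """Sum the contributions of variables that are missing (None or absent)."""
    return sum(w for name, w in CONTRIBUTION.items() if sample.get(name) is None)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_below_threshold(
    samples: Sequence[Sample],
    labels: Sequence[int],
    risk_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, int]:
    """Evaluate AUROC only on samples whose uncertainty score is at or below the threshold.

    Assumption: "<=" is used here; the source only says "below specific thresholds".
    """
    keep = [i for i, s in enumerate(samples) if uncertainty_score(s) <= threshold]
    return auroc([labels[i] for i in keep], [risk_scores[i] for i in keep]), len(keep)


# Toy data: six pregnancies with progressively more missing information.
SAMPLES: List[Sample] = [
    {"systolic_bp": 118, "bmi": 22.0, "maternal_age": 30, "parity": 0},
    {"systolic_bp": 142, "bmi": 31.5, "maternal_age": 38, "parity": 0},
    {"systolic_bp": 120, "bmi": 23.1, "maternal_age": 29, "parity": None},
    {"systolic_bp": 138, "bmi": None, "maternal_age": 36, "parity": 1},
    {"systolic_bp": None, "bmi": 27.4, "maternal_age": 33, "parity": 0},
    {"systolic_bp": None, "bmi": None, "maternal_age": 31, "parity": 2},
]
LABELS = [0, 1, 0, 1, 1, 0]               # 1 = developed PE (invented labels)
RISK = [0.1, 0.9, 0.2, 0.7, 0.3, 0.6]     # model risk scores (invented)

if __name__ == "__main__":
    for t in (0.25, 1.0):
        value, n = auroc_below_threshold(SAMPLES, LABELS, RISK, t)
        print(f"threshold={t:.2f} n={n} AUROC={value:.3f}")
```

In this toy setting, restricting evaluation to low-uncertainty samples raises the AUROC because the predictions made with key variables missing are the ones that are ranked incorrectly. This mirrors the pattern reported above but says nothing about its magnitude on real data.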

Data availability

The datasets generated and/or analysed during the current study are not publicly available due to data protection regulations but are available from the corresponding author on reasonable request.

Acknowledgements

This research was supported by grants through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant numbers: RS-2022-KH129953 and RS-2025-02213591) and by the National Institute of Health (NIH) research project (Project No. 2025-ER1103-01).

Author information

Author notes
  1. Jin Woo Kim and Nari Kim contributed equally to this work.

Authors and Affiliations

  1. Smart MEC Healthcare R&D Center, CHA Bundang Medical Center, Gyeonggi-do, Republic of Korea

    Jin Woo Kim, Ju Yeon Kim, Hye Ji Han, Su Ji Yang, Ji Hyae Lim & Hyun Mee Ryu

  2. Department of Obstetrics & Gynecology, CHA Bundang Medical Center, CHA University of Medicine, Gyeonggi-do, Republic of Korea

    Nari Kim, Hyun Jung Lee, Sang Hee Jung, Eun Hee Ahn & Hyun Mee Ryu

  3. Department of Obstetrics and Gynecology, CHA Gangnam Medical Center, CHA University School of Medicine, Seoul, Republic of Korea

    You Jung Han & Hee Jin Park

  4. Department of Obstetrics and Gynecology, CHA Ilsan Medical Center, CHA University School of Medicine, Gyeonggi-do, Republic of Korea

    Hye Yeon Boo

  5. Department of Obstetrics and Gynecology, Ajou University School of Medicine, Gyeonggi-do, Republic of Korea

    Dong Wook Kwak

Contributions

Study concept and design: J.W.K. and N.K.; Data acquisition: Y.J.H., H.J.P., H.Y.B., D.W.K., S.H.J. and E.H.A.; Data preprocessing: J.W.K., H.J.H., J.Y.K. and S.J.Y.; Machine learning and performance evaluation: J.W.K.; Analysis and interpretation: N.K., J.H.L. and H.M.R.; Drafting of the manuscript: J.W.K. and N.K.; Revision of the manuscript: N.K., J.H.L. and H.M.R.; Study supervision: J.H.L. and H.M.R.

Corresponding authors

Correspondence to Ji Hyae Lim or Hyun Mee Ryu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Kim, J.W., Kim, N., Kim, J.Y. et al. A novel approach to preeclampsia early prediction addressing predictive uncertainty due to missing data in clinical dataset. Sci Rep (2026). https://doi.org/10.1038/s41598-025-27801-4

  • Received: 21 January 2025

  • Accepted: 05 November 2025

  • Published: 12 February 2026

  • DOI: https://doi.org/10.1038/s41598-025-27801-4

Keywords

  • Preeclampsia
  • Early prediction
  • Missing data
  • Predictive uncertainty
  • Clinical data
  • Large-scale retrospective cohort