Multicenter validation and updating of the ELDER-ICU model for severity assessment in elderly critical illness

Duan, Minjie; Liu, Xiaoli; Yeung, Wesley; Hao, Sicheng; Raffa, Jesse; Hu, Pan; Liu, Chao; Chen, Lin; Li, Zitao; Gao, Zhenyue; Li, Tao; Cao, Desen; Zhou, Feihu; Zhang, Zhongheng; Zhang, Zhengbo; Celi, Leo Anthony

doi:10.1038/s41746-026-02472-1

Download PDF

Article
Open access
Published: 09 April 2026

Multicenter validation and updating of the ELDER-ICU model for severity assessment in elderly critical illness

Minjie Duan^1,2^na1,
Xiaoli Liu¹^na1,
Wesley Yeung^3,4^na1,
Sicheng Hao⁴^na1,
Jesse Raffa⁴,
Pan Hu⁵,
Chao Liu⁶,
Lin Chen⁷,
Zitao Li⁸,
Zhenyue Gao¹,
Tao Li¹,
Desen Cao⁹^na2,
Feihu Zhou⁶^na2,
Zhongheng Zhang^10,11,12,13^na2,
Zhengbo Zhang¹^na2 &
…
Leo Anthony Celi^4,14,15

npj Digital Medicine , Article number: (2026) Cite this article

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

Abstract

The ELDER-ICU model, a machine learning tool for predicting in-hospital mortality in critically ill older adults ( ≥ 65 years), was externally validated across 12 international centers in the US, Austria, South Korea, and China, where we assessed three model updating strategies: recalibration, incremental training, and retraining. While maintaining robust performance in US and Austrian cohorts (AUROC 0.804–0.864), significant drops occurred in Asian sites (South Korea: 0.753; China: 0.698). Incremental training enhanced performance in most centers, while retraining significantly improved AUROC by 0.066 and 0.076 in the two Asian sites (South Korea and China, respectively). Isotonic regression and Platt scaling improved calibration performance globally. This study demonstrates the varying robustness of the ELDER-ICU model and the differential effectiveness of model updating strategies across temporal shifts, populations, and clinical practice environments. Rigorous validation and proactive model adaptation are essential before clinical deployment in settings with heterogeneous populations and clinical practice.

Development of mortality prediction model in the elderly hospitalized AKI patients

Article Open access 26 July 2021

Mortality prediction of patients in intensive care units using machine learning algorithms based on electronic health records

Article Open access 03 May 2022

GRACE-ICU: A multimodal nomogram-based approach for illness severity assessment of older adults in the ICU

Article Open access 13 August 2025

Data availability

The MIMIC and INSPIRE datasets are publicly available under credentialed access on PhysioNet (https://physionet.org/content/mimiciv/, https://physionet.org/content/inspire/1.3/). The SICdb datasets are available for research use following submission of a data access request via the PhysioNet (https://physionet.org/content/sicdb/1.0.8/). The Chinese critical care database is available from the National Genomic Data Center database under accession number PRJCA006118. The eICU-CRD-II dataset is not yet publicly available; please contact the corresponding author for access.

Code availability

Code is available at https://github.com/dmj163/ELDER-ICU-Ext-Val.

References

Ho, L. et al. Performance of models for predicting 1-year to 3-year mortality in older adults: a systematic review of externally validated models. Lancet Healthy Longev. 5, e227–e235 (2024).
Google Scholar
Wang, S. M. et al. Association of multimorbidity patterns and order of physical frailty and cognitive impairment occurrence: a prospective cohort study. Age and Ageing 54, 10 (2025).
Google Scholar
Guidet, B. et al. The trajectory of very old critically ill patients. Intensive Care Med. 50, 181–194 (2024).
Google Scholar
Liu, X. et al. Illness severity assessment of older adults in critical illness using machine learning (ELDER-ICU): an international multicentre study with subgroup bias evaluation. Lancet Digit Health 5, e657–e667 (2023).
Google Scholar
Efthimiou, O. et al. Developing clinical prediction models: a step-by-step guide. Br. Med. J. 386, e078276 (2024).
Google Scholar
Subbaswamy, A. et al. A data-driven framework for identifying patient subgroups on which an AI/machine learning model may underperform. NPJ Digit. Med. 7, 334 (2024).
Google Scholar
Davis, S. E., Dorn, C., Park, D. J. & Matheny, M. E. Emerging algorithmic bias: fairness drift as the next dimension of model maintenance and sustainability. J. Am. Med Inf. Assoc. 32, 845–854 (2025).
Google Scholar
Windecker, D. et al. Generalizability of FDA-approved AI-enabled medical devices for clinical use. JAMA Netw. Open 8, e258052 (2025).
Google Scholar
Lee, C. S. & Lee, A. Y. Clinical applications of continual learning machine learning. Lancet Digit. Health 2, e279–e281 (2020).
Google Scholar
Finlayson, S. G. et al. The clinician and dataset shift in artificial intelligence. N. Engl. J. Med. 385, 283–286 (2021).
Google Scholar
Otokiti, A. U. et al. The need to prioritize model-updating processes in clinical artificial intelligence (AI) models: protocol for a scoping review. JMIR Res. Protoc. 12, e37685 (2023).
Google Scholar
Liou, L. et al. Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare system. NPJ Digit Med. 7, 149 (2024).
Google Scholar
de Hond, A. A. H. et al. Predicting readmission or death after discharge from the ICU: external validation and retraining of a machine learning model. Crit. Care Med. 51, 291–300 (2023).
Google Scholar
Barak-Corren, Y. et al. Prediction across healthcare settings: a case study in predicting emergency department disposition. NPJ Digit. Med. 4, 169 (2021).
Google Scholar
Futoma, J., Simons, M., Panch, T., Doshi-Velez, F. & Celi, L. A. The myth of generalisability in clinical research and machine learning in health care. Lancet Digit. Health 2, e489–e492 (2020).
Google Scholar
Reddy, S. et al. Evaluation framework to guide implementation of AI systems into healthcare settings. BMJ Health & Care Informatics 28, e100444 (2021).
Google Scholar
van Amsterdam, W. A. C., van Geloven, N., Krijthe, J. H., Ranganath, R. & Cinà, G. When accurate prediction models yield harmful self-fulfilling prophecies. Patterns 6, 101229 (2025).
Google Scholar
Leuchter, R. K., Turner, W. B. & Ouyang, D. Evaluating translational AI: a two-way moving target problem. NEJM AI 2, AIp2500705 (2025).
Google Scholar
Janse, R. J. et al. When impact trials are not feasible: alternatives to study the impact of prediction models on clinical practice. Nephrol. Dial. Transpl. 40, 27–33 (2024).
Google Scholar
Collins, G. S. et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. Br. Med. J. 385, e078378 (2024).
Google Scholar
Moons, K. G. M. et al. PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. Br. Med. J. 388, e082505 (2025).
Google Scholar
Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 10, 1 (2023).
Google Scholar
Pollard, T. J. et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci. Data 5, 180178 (2018).
Google Scholar
Rodemund, N., Wernly, B., Jung, C., Cozowicz, C. & Koköfer, A. Harnessing big data in critical care: exploring a new European Dataset. Sci. Data 11, 320 (2024).
Google Scholar
Lim, L. et al. INSPIRE, a publicly available research dataset for perioperative medicine. Sci. Data 11, 655 (2024).
Google Scholar
Jin, S. et al. Establishment of a Chinese critical care database from electronic healthcare records in a tertiary care medical center. Sci. Data 10, 49 (2023).
Google Scholar
van der Laan, L., Ulloa-Pérez, E., Carone, M. & Luedtke, A. Causal isotonic calibration for heterogeneous treatment effects. Proc. Mach. Learn Res. 202, 34831–34854 (2023).
Google Scholar
Jiang, X., Osl, M., Kim, J. & Ohno-Machado, L. Smooth isotonic regression: a new method to calibrate predictive models. AMIA Jt Summits Transl. Sci. Proc. 2011, 16–20 (2011).
Google Scholar

Download references

Acknowledgements

The study was supported by the Beijing Natural Science Foundation (7252298), National Natural Science Foundation of China (82502525 and 62571550), Beijing Municipal Science and Technology Project (Z241100007724003), Project of Drug Clinical Evaluate Research of Chinese Pharmaceutical Association (NO.CPA-Z06-ZC-2021-004), a collaborative scientific project co-established by the Science and Technology Department of the National Administration of Traditional Chinese Medicine and the Zhejiang Provincial Administration of Traditional Chinese Medicine (GZY-ZJ-KJ-24082), Project of Zhejiang University Longquan Innovation Center (ZJDXLQCXZCJBGS2024016), National Institute of Health (R01 EB017205), DS-I Africa U54 TW012043-01 and Bridge2AI OT2OD032701, and National Science Foundation (ITEST #2148451).

Author information

These authors contributed equally: Minjie Duan, Xiaoli Liu, Wesley Yeung, Sicheng Hao.
These authors jointly supervised this work: Desen Cao, Feihu Zhou, Zhongheng Zhang, Zhengbo Zhang.

Authors and Affiliations

Medical Innovation Research Department, Chinese PLA General Hospital, 100853, Beijing, China
Minjie Duan, Xiaoli Liu, Zhenyue Gao, Tao Li & Zhengbo Zhang
Chinese PLA Medical School, 100853, Beijing, China
Minjie Duan
Department of Cardiology, National University Heart Centre, 119074, Singapore, Singapore
Wesley Yeung
Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 02139, Cambridge, MA, USA
Wesley Yeung, Sicheng Hao, Jesse Raffa & Leo Anthony Celi
Department of Anesthesiology, The 920 Hospital of Joint Logistic Support Force of Chinese PLA, 650100, Kunming, Yunnan, China
Pan Hu
Department of Critical Care Medicine, The First Medical Center, Chinese PLA General Hospital, 100853, Beijing, China
Chao Liu & Feihu Zhou
Department of Neurosurgery Intensive Care Unit, Department of Neurosurgery, Affiliated Jinhua Hospital, Zhejiang University School of Medicine, 365, Renmin East Road, Jinhua, Zhejiang Province, China
Lin Chen
School of Biomedical Engineering, ShanghaiTech University, 201210, Shanghai, China
Zitao Li
Department of Biomedical Engineering, Chinese PLA General Hospital, 100853, Beijing, China
Desen Cao
Department of Emergency Medicine, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, 310016, Hangzhou, China
Zhongheng Zhang
Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, 310016, Hangzhou, China
Zhongheng Zhang
School of Medicine, Shaoxing University, 312000, Shaoxing, China
Zhongheng Zhang
Longquan Industrial Innovation Research Institute, 323000, Lishui, China
Zhongheng Zhang
Department of Medicine, Beth Israel Deaconess Medical Center, 02215, Boston, MA, USA
Leo Anthony Celi
Department of Biostatistics, Harvard T.H. Chan School of Public Health, 02115, Boston, MA, USA
Leo Anthony Celi

Authors

Minjie Duan
View author publications
Search author on:PubMed Google Scholar
Xiaoli Liu
View author publications
Search author on:PubMed Google Scholar
Wesley Yeung
View author publications
Search author on:PubMed Google Scholar
Sicheng Hao
View author publications
Search author on:PubMed Google Scholar
Jesse Raffa
View author publications
Search author on:PubMed Google Scholar
Pan Hu
View author publications
Search author on:PubMed Google Scholar
Chao Liu
View author publications
Search author on:PubMed Google Scholar
Lin Chen
View author publications
Search author on:PubMed Google Scholar
Zitao Li
View author publications
Search author on:PubMed Google Scholar
Zhenyue Gao
View author publications
Search author on:PubMed Google Scholar
Tao Li
View author publications
Search author on:PubMed Google Scholar
Desen Cao
View author publications
Search author on:PubMed Google Scholar
Feihu Zhou
View author publications
Search author on:PubMed Google Scholar
Zhongheng Zhang
View author publications
Search author on:PubMed Google Scholar
Zhengbo Zhang
View author publications
Search author on:PubMed Google Scholar
Leo Anthony Celi
View author publications
Search author on:PubMed Google Scholar

Contributions

M.D., X.L., W.Y., and S.H. collected data, validated and updated models, and drafted the paper. J.R. and T.L. edited the paper and checked the results. P.H., C.L., L.C., Z.L., and Z.G. provided the expertise for the validation study design. D.C., F.Z., Zho. Z., Zhe. Z., and L.A.C. designed the study and critically reviewed the core content of the paper. All authors contributed to the methodology, results analysis, and discussions of the modeling process. All authors had full access to the datasets used in this study and confirmed the fidelity of the results. All authors had final responsibility for the decision to submit for publication.

Corresponding authors

Correspondence to Desen Cao, Feihu Zhou, Zhongheng Zhang or Zhengbo Zhang.

Ethics declarations

Competing interests

Zhongheng Zhang is an Editorial Board Member of NPJ Digital Medicine; he was not involved in the peer-review process or in any editorial decisions related to this paper. The remaining authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information (download PDF )

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Duan, M., Liu, X., Yeung, W. et al. Multicenter validation and updating of the ELDER-ICU model for severity assessment in elderly critical illness. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02472-1

Download citation

Received: 19 August 2025
Accepted: 12 February 2026
Published: 09 April 2026
DOI: https://doi.org/10.1038/s41746-026-02472-1