Abstract
Frailty is a significant health concern in the aging global population, particularly among middle-aged and older adults with gastrointestinal disease (GID). Early detection of individuals at increased risk is critical for implementing timely preventive and therapeutic interventions. This study aimed to develop and validate an interpretable machine learning (ML) model to assess frailty risk in this population. To overcome the “black box” nature of conventional ML models, we integrated Shapley Additive exPlanations (SHAP), which helps identify key predictors of frailty and improve the interpretability of the model’s decision-making process. This study analyzed data from the 2013-2015 survey waves of the China Health and Retirement Longitudinal Study (CHARLS). To identify the most predictive variables for frailty, we employed a dual-method approach combining the Boruta algorithm and Least Absolute Shrinkage and Selection Operator (LASSO) regression. We applied ten different ML algorithms to develop prediction models: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Classifier (GBC), Light Gradient Boosting Machine (LightGBM), K-Nearest Neighbors (KNN), Decision Tree (DT), Multilayer Perceptron (MLP), Naive Bayes (NB), and Adaptive Boosting (AdaBoost). The area under the receiver operating characteristic (ROC) curve (AUC) served as the primary performance metric. Additional metrics, including sensitivity, specificity, precision, and the F1-score, were used to comprehensively evaluate model accuracy. Calibration curves were generated to assess the consistency between predicted probabilities and observed risk, and the Brier score was used as a quantitative measure of calibration accuracy. Decision curve analysis (DCA) was performed to evaluate the clinical net benefit of each model. 
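The evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' code: the function names and the eight-person toy dataset are assumptions made for illustration, and only the Python standard library is used. It computes AUC as the probability that a randomly chosen frail participant receives a higher predicted risk than a randomly chosen non-frail one, the Brier score as the mean squared difference between predicted probability and outcome, and the DCA net benefit at a single threshold probability.

```python
def auc(y_true, y_prob):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold: TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data (hypothetical, not from CHARLS): 1 = frail, 0 = not frail.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.2]

print("AUC:", auc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
print("Net benefit at 0.5:", net_benefit(y_true, y_prob, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.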
To understand the impact of individual predictors on the output, the SHAP method was used to provide transparent insights into each feature’s contribution to the estimated frailty risk. A total of 1,404 participants met the eligibility criteria for this study, of whom 444 (31.62%) were classified as frail. Using the Boruta algorithm and LASSO regression, we identified 10 key predictors of frailty. Among all the ML models tested, the LR model showed the best overall performance, achieving an AUC of 0.759 (95% CI: 0.711–0.806). SHAP analysis further revealed the top five predictors of frailty in this population: depression, grip strength, education level, the total number of chronic diseases, and self-rated health. This study introduces an interpretable ML model that effectively detects frailty risk among middle-aged and older adults with GID. The model demonstrates strong predictive accuracy and transparency, supporting its potential as a clinical decision-support tool pending further external validation and real-world deployment. Earlier identification of at-risk patients could improve patient care and promote better long-term health outcomes in this population.
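Because the best-performing model is a logistic regression, its SHAP attributions have a simple closed form. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes independent features and works on the log-odds scale, where the SHAP value of feature j is the coefficient times the feature's deviation from its background mean. The coefficients, intercept, background means, and participant values are invented for the example and are not results from the study.

```python
def log_odds(weights, intercept, x):
    """Linear predictor of a logistic regression (log-odds of frailty)."""
    return intercept + sum(w * xi for w, xi in zip(weights, x))


def linear_shap(weights, x, background_means):
    """SHAP values for a linear model with independent features:
    phi_j = w_j * (x_j - mean_j), expressed on the log-odds scale."""
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, background_means)]


# Hypothetical model and data (illustrative numbers only).
features = ["depression", "grip_strength", "self_rated_health"]
weights = [0.8, -0.5, 0.4]
intercept = -1.0
background_means = [0.3, 0.0, 0.5]   # average feature values in the reference sample
x = [1.0, -1.2, 1.0]                 # one participant

phi = linear_shap(weights, x, background_means)
base_value = log_odds(weights, intercept, background_means)
prediction = log_odds(weights, intercept, x)

# Rank features by absolute contribution for this participant.
ranking = sorted(zip(features, phi), key=lambda item: abs(item[1]), reverse=True)
for name, value in ranking:
    print(f"{name}: {value:+.2f}")
print("base value + sum of SHAP values:", base_value + sum(phi))
print("model log-odds for this participant:", prediction)
```

The last two printed values agree up to floating-point rounding, which is the additivity property that lets SHAP values be read as each feature's share of the gap between an individual's predicted log-odds and the average prediction.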
Acknowledgements
The authors thank Jie Liu, PhD (Chinese PLA General Hospital), Qilin Yang (The Second Affiliated Hospital of Guangzhou Medical University), and Haibo Li, PhD (Capital Institute of Pediatrics) for their valuable contributions to statistical analysis, manuscript review, and critical feedback. The authors also sincerely thank CHARLS, its participants, and staff for supporting this study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics
The CHARLS study protocol complies with the Declaration of Helsinki and received approval from the Institutional Review Board of Peking University (Approval No. IRB00001052-11015). All participants provided written informed consent prior to enrollment.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, Y., Chen, M. An interpretable machine learning model predicts frailty risk in middle-aged and older adults with gastrointestinal disease: a longitudinal study. Sci Rep (2026). https://doi.org/10.1038/s41598-026-50348-x
DOI: https://doi.org/10.1038/s41598-026-50348-x


