Abstract
Chronic kidney disease (CKD) is a common complication of type 2 diabetes mellitus (T2DM), yet tools for individualized prognosis remain limited, particularly in Asian populations. We developed deep learning-based prognostic models using a 17-year longitudinal electronic health record dataset covering 569,680 individuals across 165 public healthcare facilities in Hong Kong. By integrating clinical, biochemical, and prescription history data, the models achieved robust time-dependent predictions of CKD progression at 2-, 5-, and 10-year horizons, with areas under the receiver operating characteristic curve (AUC) of 87.1%, 85.3%, and 84.7%, respectively. Shapley Additive exPlanations (SHAP) identified key predictors, including serum creatinine, sex, age, and angiotensin prescription history. External validation in the UK Biobank and China Health and Retirement Longitudinal Study (CHARLS) cohorts confirmed generalizability, with AUCs ranging from 74.6% to 82.0%. These models provide a scalable and interpretable framework for early risk stratification and personalized intervention in T2DM-related CKD progression.
Data availability
The electronic health record (EHR) data used in this study were obtained from the Hospital Authority Data Collaboration Lab (HADCL) in Hong Kong. Owing to institutional policies and patient confidentiality regulations, the datasets are not publicly available. Access requires approval from the Hospital Authority and can be requested through a formal application to the HADCL.
Code availability
The code used for model development, training, and evaluation is not publicly available at this time but can be made available from the corresponding author upon reasonable request.
Acknowledgments
We thank the Hong Kong Hospital Authority Data Collaboration Lab for providing the electronic health record data and for coordinating the devices, data access, and other IT support required for this research. The study was supported by the Dean’s reserve fund at the Hong Kong Polytechnic University.
Author information
Contributions
L.Y., D.H., J.L., and D.H.K.S. conceived and supervised the study design. Y.Z. designed the methodology, developed the prediction models, performed data analysis, and prepared tables and figures. Y.Z., L.Y., S.L., J.L., C.W.L., and M.K.W. curated the data and coordinated data access from the Hospital Authority. T.L. provided external validation data, and H.R. and X.L. contributed to external validation efforts and modeling strategy. L.X. and F.W. assisted in the literature review. Y.Z. and L.Y. wrote the main paper text. D.H.K.S. and L.Y. provided project oversight and secured funding. All authors critically reviewed, edited, and approved the final version of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhao, Y., Lu, S., Lu, J. et al. Risk Prediction of Chronic Kidney Disease Progression in Type 2 Diabetes Mellitus Across Diverse Populations. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02439-2
DOI: https://doi.org/10.1038/s41746-026-02439-2


