Abstract
This study developed and evaluated a deep learning model for detecting chronic kidney disease (CKD) from retinal fundus images. The study included 42,963 clinical visits from 17,442 patients who underwent retinal fundus examination between October 19, 2006, and September 13, 2018, and had an estimated glomerular filtration rate (eGFR) measurement within 7 days of the imaging examination. We developed and compared three model configurations: a single fundus image (Model A), a single image combined with demographic features (Model B), and bilateral fundus images (Model C). We compared two base architectures, EfficientNet-B3 and EfficientNetV2-S, and evaluated two training strategies: a single model versus a 5-fold cross-validation (CV) ensemble. Model performance was assessed using the area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Among all evaluated models, the bilateral-image model (Model C) using the EfficientNet-B3 architecture with a 5-fold CV ensemble achieved the best overall performance on an independent test set, with an AUC of 0.868, a sensitivity of 0.792, and a specificity of 0.788. This ensemble was statistically superior to its single-model counterpart trained on the full training set (AUC 0.850, p < 0.001). Among single models, Model B yielded the highest AUC (0.857) and sensitivity (0.794), while Model C offered the highest specificity (0.799), revealing a clinical trade-off between the approaches. The newer EfficientNetV2-S architecture did not yield a performance benefit in this study. Overall, the approach showed strong performance in detecting advanced CKD in patients with diabetes mellitus from retinal fundus images.
Introduction
Hyperglycemia is a hallmark of diabetes and is directly responsible for endothelial dysfunction, which leads to increased vascular permeability and abnormal capillary proliferation. These pathophysiological changes contribute to the development of microvascular complications, notably diabetic retinopathy (DR), diabetic macular edema (DME), and diabetic kidney disease (DKD)1. Chronic hyperglycemia accelerates the formation of advanced glycation end products (AGEs) and triggers inflammatory pathways that exacerbate vascular damage2. Both retinal and renal microvessels, being susceptible to hyperglycemia-induced damage, suffer from endothelial dysfunction, leading to impaired oxygen and nutrient exchange, which are characteristic of both DR and DKD3. The systemic nature of these complications highlights the interconnectivity between the visual and renal systems in diabetic patients, warranting further investigation into their shared mechanisms.
Diabetic vasculopathy, characterized by microvascular changes such as endothelial dysfunction, basement membrane thickening, and capillary non-perfusion, is commonly observed in patients with diabetes mellitus (DM). These pathologic changes are not only seen in the retinal microvasculature but also in the glomeruli of the kidneys, indicating a shared pathophysiology between DR and DKD3. Studies have shown that the same mechanisms, including oxidative stress, inflammation, and hyperglycemia-induced endothelial damage, affect both retinal and glomerular vessels4. The presence of diabetic vasculopathy in both organs suggests that progression of DR may parallel that of DKD, making it plausible that these conditions are linked through shared vascular pathophysiological changes5.
Several epidemiological studies have investigated the association between DR and DKD, with varying results. Some studies have reported a strong correlation between the severity of DR and the progression of DKD5. For instance, the presence of moderate or severe DR has been linked to a higher risk of developing DKD, and vice versa. These studies suggest that DR severity may reflect the degree of kidney involvement, with more advanced stages of DR correlating with higher levels of albuminuria and reduced glomerular filtration rate (GFR) in diabetic patients5,6,7,8,9,10,11. Despite these findings, other studies have failed to demonstrate a consistent correlation, highlighting the complexity of the relationship between these two conditions5,12,13,14,15,16.
Urine albumin and eGFR are widely accepted as the primary clinical markers for diagnosing and monitoring DKD. Elevated urine albumin levels and decreased eGFR are strong indicators of kidney dysfunction and the progression of DKD. However, their role in predicting the progression of both DR and DKD has yielded conflicting results. While some studies suggest that albuminuria and eGFR correlate with the severity of DR, other studies show inconsistent associations5,12,13,14. This discrepancy may be due to the multifactorial nature of DR and DKD, where other variables, such as glycemic control and hypertension, might influence the progression of both conditions.
While the association between DR and DKD has been investigated, the ability to use changes in DR to predict renal function status, particularly eGFR, remains to be further investigated. Existing literature has focused on identifying clinical and biochemical markers, such as albuminuria and serum creatinine, to track DKD progression5,16,17. However, retinal changes, including microaneurysms, hemorrhages, and macular edema, have not been consistently applied as predictive markers for the deterioration of kidney function18. A more thorough exploration of the predictive value of retinal changes could provide an accessible, non-invasive tool for monitoring kidney function in diabetic patients, potentially offering an early warning system for those at risk of DKD progression.
In this study, we aim to explore the potential of DR as a predictive marker for renal function status, specifically eGFR, in patients with DKD. Traditional methods of assessing kidney function rely on serum markers such as creatinine and albuminuria, but these measures often fail to capture early changes in renal function. Artificial intelligence (AI) analysis of retinal images offers an innovative approach to identifying subtle changes in DR that may correlate with renal function deterioration. By leveraging AI to analyze retinal images, we hope to develop a predictive tool that can monitor the progression of advanced CKD and offer more timely interventions for those with progressing DKD.
Methods and materials
Data collection
This study is a retrospective investigation of patients who underwent fundus imaging examinations at China Medical University Hospital between October 19, 2006, and September 13, 2018. The complete data filtering process is illustrated in Supplemental Fig. 1. The initial database contained 124,550 clinical visits from 46,036 patients. We sequentially selected records meeting the following criteria: [1] availability of an eGFR measurement within 7 days before or after the fundus imaging examination; [2] availability of an HbA1c measurement within 1 day of the eGFR test. After further excluding patients younger than 18 years, the final sample included 42,963 clinical visits from 17,442 patients.
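The selection steps above can be sketched in Python as follows. This is a minimal illustration rather than the study's code: the `Visit` record and its field names are hypothetical, and both time windows are assumed to be inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Visit:
    # Hypothetical record layout; field names are illustrative only.
    patient_id: str
    age: int
    fundus_date: date
    egfr_date: Optional[date]
    hba1c_date: Optional[date]


def meets_criteria(v: Visit, egfr_window: int = 7, hba1c_window: int = 1) -> bool:
    """Apply the inclusion criteria in the order described (windows assumed inclusive)."""
    # [1] eGFR measured within +/- egfr_window days of the fundus examination
    if v.egfr_date is None or abs((v.egfr_date - v.fundus_date).days) > egfr_window:
        return False
    # [2] HbA1c measured within +/- hba1c_window days of the eGFR test
    if v.hba1c_date is None or abs((v.hba1c_date - v.egfr_date).days) > hba1c_window:
        return False
    # Exclude patients younger than 18 years
    return v.age >= 18


def select_visits(visits: List[Visit]) -> List[Visit]:
    return [v for v in visits if meets_criteria(v)]


visits = [
    Visit("P1", 60, date(2010, 1, 10), date(2010, 1, 15), date(2010, 1, 16)),  # kept
    Visit("P2", 55, date(2010, 1, 10), date(2010, 1, 20), date(2010, 1, 20)),  # eGFR 10 days away
    Visit("P3", 17, date(2010, 1, 10), date(2010, 1, 10), date(2010, 1, 10)),  # under 18
    Visit("P4", 70, date(2010, 1, 10), date(2010, 1, 8), date(2010, 1, 5)),    # HbA1c 3 days from eGFR
]
print([v.patient_id for v in select_visits(visits)])  # ['P1']
```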
Given the 12-year retrospective data collection period, the fundus images from these patient visits were captured using a variety of fundus cameras. Although the specific model for each device could not be retrieved from the original DICOM files, records indicate that the primary equipment manufacturers included INFINITT, VBTEC Inc., and GE Healthcare.
To mitigate the potential variations in field-of-view, resolution, and image quality arising from device heterogeneity, and to ensure consistency for model training, we adopted a standardized image preprocessing pipeline. This pipeline first involves cropping and padding the raw images to create a square format, after which all images are resized to a uniform dimension of 512 × 512 pixels. Finally, we applied the image processing method proposed by Graham19 to standardize brightness and enhance fine details, such as blood vessels.
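A minimal NumPy sketch of this kind of pipeline is shown below, assuming 8-bit RGB input. The dark-border threshold, the nearest-neighbour resize, and the Gaussian `sigma` are illustrative assumptions, not values reported here. Graham's method is written in its commonly used form, 4 · image − 4 · GaussianBlur(image) + 128, which subtracts the local average colour; production code would more likely rely on an imaging library's interpolating resize and Gaussian blur.

```python
import numpy as np


def crop_dark_border(img: np.ndarray, threshold: int = 10) -> np.ndarray:
    """Crop rows/columns whose brightest pixel is below `threshold` (assumed camera border)."""
    gray = img.max(axis=2) if img.ndim == 3 else img
    mask = gray > threshold
    if not mask.any():
        return img
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def pad_to_square(img: np.ndarray) -> np.ndarray:
    """Zero-pad to a square canvas, keeping the image centred."""
    h, w = img.shape[:2]
    size = max(h, w)
    top, left = (size - h) // 2, (size - w) // 2
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    out[top:top + h, left:left + w] = img
    return out


def resize_nearest(img: np.ndarray, size: int = 512) -> np.ndarray:
    """Nearest-neighbour resize to size x size (stand-in for an interpolating resize)."""
    h, w = img.shape[:2]
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return img[rows][:, cols]


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur over the two spatial axes, with edge padding."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    out = img.astype(np.float64)
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)
    return out


def graham_enhance(img: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """4*img - 4*blur(img) + 128, rounded and clipped to the 8-bit range."""
    img_f = img.astype(np.float64)
    enhanced = 4 * img_f - 4 * gaussian_blur(img_f, sigma) + 128
    return np.clip(np.rint(enhanced), 0, 255).astype(np.uint8)


def preprocess(img: np.ndarray, size: int = 512, sigma: float = 10.0) -> np.ndarray:
    """Crop, pad to square, resize, then apply Graham-style enhancement."""
    return graham_enhance(resize_nearest(pad_to_square(crop_dark_border(img)), size), sigma)


print(preprocess(np.full((300, 400, 3), 100, np.uint8)).shape)  # (512, 512, 3)
```

On a perfectly uniform image the enhancement maps every pixel to mid-grey (128), which is the intended effect of removing the local average colour.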
Model development
According to the KDIGO 2024 CKD guidelines20, CKD stages 4–5, the targeted group in this study, are defined by an eGFR below 30 mL/min/1.73 m². We therefore defined eGFR < 30 mL/min/1.73 m² as Class 1 (targeted CKD, the case group) and eGFR ≥ 30 mL/min/1.73 m² as Class 0 (non-targeted CKD, the control group). We used patients' fundus images as input to train the CKD classification models.
This study developed and compared several deep learning approaches based on two convolutional neural network (CNN) architectures. The primary architecture was EfficientNet-B321. To benchmark against a newer architecture, we also implemented and evaluated EfficientNetV2-Small (EfficientNetV2-S)22. Using these base architectures, we developed three model configurations for comparison, as shown in Fig. 1. [1] Model A uses only a single fundus image as input. [2] Model B combines a single fundus image with demographic features (age and sex). Because model performance is evaluated at the patient level, the final prediction for Models A and B is the average of the outputs for the patient's left and right eye images, which are fed into the model separately. [3] Model C uses fundus images from both eyes simultaneously: it first extracts features from the left and right eye images separately with the CNN backbone, then merges these features in fully connected layers to produce the prediction.
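The patient-level aggregation used for Models A and B can be sketched as below. It assumes that each per-eye output is a logit converted to a probability with a sigmoid before averaging, and that a patient with only one available eye keeps that eye's probability; the function names are hypothetical. Model C does not need this step because it produces a single prediction from both eyes.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def patient_level_scores(eye_outputs: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Average per-eye probabilities into one score per patient (hypothetical helper).

    eye_outputs: (patient_id, eye, logit) triples, where eye is "L" or "R".
    A patient with a single eye simply keeps that eye's probability (assumption).
    """
    probs: Dict[str, List[float]] = defaultdict(list)
    for patient_id, _eye, logit in eye_outputs:
        probs[patient_id].append(sigmoid(logit))
    return {pid: sum(p) / len(p) for pid, p in probs.items()}


scores = patient_level_scores([("P1", "L", math.log(3)), ("P1", "R", 0.0), ("P2", "L", -math.log(3))])
print(scores)  # P1 averages 0.75 and 0.5; P2 keeps 0.25
```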
A significant challenge in this study was the severe class imbalance in the dataset, with a case-to-control (positive-to-negative) ratio of approximately 1:13, which could bias the model towards the majority (negative) class. To address this issue, we conducted preliminary experiments exploring various techniques, including training on the full dataset, oversampling the minority class, undersampling the majority class, and using a weighted loss function, as well as combinations thereof.
We found that a hybrid approach combining undersampling with a weighted loss function was the most effective strategy to prevent the model from being overly biased towards the negative class. Therefore, when training our final models, we applied this strategy to the training set. Specifically, we retained all positive samples and randomly selected negative samples equivalent to twice the number of positive samples, resulting in a positive-to-negative ratio of 1:2 in the training data. Furthermore, we utilized the BCEWithLogitsLoss function and set the pos_weight parameter to 1.5 to give additional importance to the positive class during training.
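A minimal sketch of the two imbalance-handling steps is shown below. The `undersample` helper and its fixed random seed are illustrative assumptions; `bce_with_logits` reproduces, in plain Python, the per-sample formula that PyTorch's `BCEWithLogitsLoss` applies when `pos_weight` is set. With `pos_weight=1.5`, a positive sample contributes 1.5 times the loss of a negative sample with the same margin.

```python
import math
import random
from typing import List, Sequence, Tuple


def undersample(samples: Sequence[Tuple[object, int]], neg_per_pos: int = 2,
                seed: int = 0) -> List[Tuple[object, int]]:
    """Keep every positive; draw `neg_per_pos` negatives per positive without replacement."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    k = min(len(neg), neg_per_pos * len(pos))
    return pos + rng.sample(neg, k)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_with_logits(logit: float, target: int, pos_weight: float = 1.5) -> float:
    """-[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]."""
    # log(sigmoid(x)) = -softplus(-x) and log(1 - sigmoid(x)) = -softplus(x)
    return pos_weight * target * softplus(-logit) + (1 - target) * softplus(logit)


def mean_loss(logits: Sequence[float], targets: Sequence[int], pos_weight: float = 1.5) -> float:
    """Batch mean, matching the default reduction='mean'."""
    return sum(bce_with_logits(x, y, pos_weight) for x, y in zip(logits, targets)) / len(logits)


# Usage: a 1:13 toy set becomes 1:2 after undersampling.
data = [(i, 1) for i in range(10)] + [(i, 0) for i in range(130)]
balanced = undersample(data)
print(sum(y for _, y in balanced), len(balanced))  # 10 30
```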
All models were trained on a single Nvidia Tesla V100-SXM3 32GB GPU. We used a batch size of 16 and the NAdam optimizer with an initial learning rate of 0.001. To dynamically adjust the learning rate during training, we employed the ReduceLROnPlateau scheduler, which reduces the learning rate when the validation loss stops improving. The models were trained for a maximum of 300 epochs, and an early stopping mechanism with a patience of 30 epochs was implemented to prevent overfitting. This strategy halts the training if validation performance does not improve for 30 consecutive epochs and retains the model weights that achieved the best performance on the validation set.
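The early-stopping rule can be illustrated with a small, framework-free tracker. This sketch assumes that validation loss is the monitored quantity and uses hypothetical class and function names. In PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau` would be stepped with the same validation loss each epoch, and the model weights would be checkpointed wherever `best_epoch` is updated.

```python
from typing import List, Optional, Tuple


class EarlyStopping:
    """Track validation loss; signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience: int = 30, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch: Optional[int] = None
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # a real loop would checkpoint the weights here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def train_with_early_stopping(val_losses: List[float], max_epochs: int = 300,
                              patience: int = 30) -> Tuple[int, Optional[int]]:
    """Replay a sequence of validation losses; return (last epoch run, best epoch)."""
    stopper = EarlyStopping(patience=patience)
    last = -1
    for epoch in range(min(max_epochs, len(val_losses))):
        last = epoch
        # scheduler.step(val_losses[epoch]) would go here for ReduceLROnPlateau
        if stopper.step(epoch, val_losses[epoch]):
            break
    return last, stopper.best_epoch


# Loss improves for 10 epochs, then plateaus: training stops 30 epochs after the best one.
losses = [1.0 - 0.05 * i for i in range(10)] + [0.6] * 300
print(train_with_early_stopping(losses))  # (39, 9)
```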
Dataset split
We performed a patient-level data split, allocating 80% of patients (13,954/17,442, comprising 34,443 records) to the training set and the remaining 20% (3,488/17,442, comprising 8,520 records) to the test set.
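A patient-level split can be sketched by shuffling the unique patient IDs and assigning all records of each patient to the same side. The seed and function name are illustrative assumptions. For reference, `round(17442 * 0.2)` gives 3,488 test patients, matching the counts above.

```python
import random
from typing import List, Sequence, Tuple


def patient_level_split(patient_ids: Sequence[str], test_frac: float = 0.2,
                        seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) record indices with no patient in both sets."""
    unique = sorted(set(patient_ids))      # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    test_patients = set(unique[:round(len(unique) * test_frac)])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


# Usage: 100 patients with 3 records each.
ids = [f"P{i // 3}" for i in range(300)]
train_idx, test_idx = patient_level_split(ids)
print(len(train_idx), len(test_idx))  # 240 60
```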
Our model development process involved two main stages. First, we performed hyperparameter tuning using 5-fold cross-validation within the 80% training set. We employed the StratifiedGroupKFold strategy to ensure two critical conditions during this process: [1] all records from the same patient appeared exclusively in either the training fold or the validation fold; and [2] a consistent class distribution was maintained across all folds. The class imbalance handling techniques, as described in the Model Development section, were applied exclusively to the training portion of each fold.
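The two conditions above can be illustrated with a simplified, standard-library stand-in for scikit-learn's `StratifiedGroupKFold`. It does not reproduce that class's algorithm: here a patient is labelled positive if any of their records is positive (an assumption), and positive and negative patients are dealt into folds separately in round-robin order. Imbalance handling would then be applied only to `train_idx` within each fold.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def simple_stratified_group_kfold(
    patient_ids: Sequence[str], labels: Sequence[int], n_splits: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs in which every patient sits in exactly one fold.

    Simplified stand-in for scikit-learn's StratifiedGroupKFold, not its algorithm.
    """
    # A patient is treated as positive if any of their records is positive (assumption).
    patient_label: Dict[str, int] = defaultdict(int)
    for pid, y in zip(patient_ids, labels):
        patient_label[pid] = max(patient_label[pid], y)

    # Deal positive and negative patients into folds separately to balance classes.
    rng = random.Random(seed)
    fold_of: Dict[str, int] = {}
    for cls in (1, 0):
        group = sorted(p for p, y in patient_label.items() if y == cls)
        rng.shuffle(group)
        for i, pid in enumerate(group):
            fold_of[pid] = i % n_splits

    for k in range(n_splits):
        val_idx = [i for i, p in enumerate(patient_ids) if fold_of[p] == k]
        train_idx = [i for i, p in enumerate(patient_ids) if fold_of[p] != k]
        yield train_idx, val_idx


# Usage: 100 patients with 2 records each, 10 of them positive.
ids = [f"P{i // 2}" for i in range(200)]
labels = [1 if (i // 2) % 10 == 0 else 0 for i in range(200)]
for train_idx, val_idx in simple_stratified_group_kfold(ids, labels):
    print(len(val_idx), sum(labels[i] for i in val_idx))  # 40 records, 4 positive per fold
```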
After identifying the optimal hyperparameter combination from the cross-validation stage, we proceeded to train a single, final model. For this stage, the entire 80% training set was further partitioned into a training subset (90%) and a validation subset (10%). The model was trained on the training subset, to which the aforementioned imbalance handling strategies were also applied, while the validation subset was used to monitor performance and implement an early stopping strategy. This approach allowed us to save the model weights that achieved the best performance on the validation set, thereby preventing overfitting. The final model's classification performance was then exclusively evaluated on the test set.
Results
Table 1 provides a detailed overview of the demographic characteristics of the patients included in this study, comparing the case group (targeted chronic kidney disease; T-CKD) with the control group (non-targeted CKD; NT-CKD). A total of 42,963 records from 17,442 patients were analyzed, with 2,909 records in the T-CKD case group and 40,054 in the NT-CKD control group. The mean age was significantly higher in the T-CKD group than in the NT-CKD group (69.7 vs. 62.6 years, P < 0.001). Hypertension (HT) was also much more prevalent in the T-CKD group (93.2% vs. 67.2%, P < 0.001), reflecting comorbidities commonly associated with kidney disease. The mean eGFR was much lower in the T-CKD group (18.6 vs. 80.4 mL/min/1.73 m², P < 0.001), confirming the disease status of the cohort. HbA1c levels were similar in both groups (mean 7.5%), indicating comparable glycemic control, although other factors such as hypertension and coronary artery disease (CAD) may contribute to CKD progression in these patients.
The performance of all evaluated model configurations on the independent test set is summarized in Table 2. We first benchmarked three single-model configurations using the EfficientNet-B3 architecture. Because the left and right eye images of the same patient are not independent, treating them as separate samples may violate the independent and identically distributed (i.i.d.) assumption; we therefore performed a sensitivity analysis, with detailed results in the Supplementary Material (Supplemental Table 1). The analysis revealed that training on bilateral images as if they were independent samples either led to a statistically significant inflation of the AUC for Model A (p = 0.006) or fundamentally skewed the sensitivity-specificity balance of Model B. This finding supports the i.i.d.-compliant training approach used in our subsequent analyses.
Based on this rigorous approach, the single-image model (Model A) achieved an AUC of 0.814. Model B, which combined a single fundus image with demographic features, demonstrated significantly improved performance with an AUC of 0.857, the highest among the three single models (p < 0.05 compared to Model C). The bilateral-image model (Model C) yielded an AUC of 0.850, which was not statistically different from that of Model A. A deeper analysis of the metrics revealed that Model B had the highest sensitivity (0.794), whereas Model C possessed the highest specificity (0.799), indicating their respective advantages in identifying and ruling out target patients.
Next, we explored the impact of a 5-fold cross-validation ensemble (soft voting) strategy on the three model configurations. The ensemble strategy significantly improved the performance of all models compared with their single-model counterparts: the AUC of Model A increased from 0.849 to 0.860, that of Model B from 0.857 to 0.862, and that of Model C from 0.850 to 0.868. Among the ensemble models, Model C achieved the highest overall AUC, which was significantly higher than those of ensemble Models A and B (p < 0.05 for both), whereas the AUCs of ensemble Models A and B did not differ significantly (p = 0.125). Consistent with the single-model findings, ensemble Model B maintained the highest sensitivity (0.796), while ensemble Model C had the highest specificity (0.788). For Model C, the ensemble strategy substantially boosted sensitivity from 0.721 to 0.792 compared with its single-model version, at the cost of a slight decrease in specificity from 0.799 to 0.788, yielding a better balance between detection and exclusion.
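Soft voting averages the predicted probabilities of the five fold models for each test sample. The sketch below shows this together with a rank-based AUC, defined as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The 0.5 decision threshold and the function names are assumptions, and the quadratic-time AUC is for illustration only.

```python
from typing import List, Sequence


def soft_vote(fold_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average predicted probabilities across models (one list per fold model)."""
    if not fold_probs:
        raise ValueError("need at least one model")
    n = len(fold_probs[0])
    if any(len(p) != n for p in fold_probs):
        raise ValueError("all models must score the same samples")
    return [sum(p[i] for p in fold_probs) / len(fold_probs) for i in range(n)]


def classify(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binarize averaged probabilities at an assumed threshold."""
    return [int(p >= threshold) for p in probs]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mann-Whitney AUC: fraction of positive-negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Usage: five fold models scoring two samples.
avg = soft_vote([[0.2, 0.9], [0.4, 0.7], [0.6, 0.8], [0.3, 0.6], [0.5, 1.0]])
print(classify(avg))                                   # [0, 1]
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))        # 0.75
```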
To illustrate the best model's clinical performance in absolute numbers, Fig. 3 shows its confusion matrix on the test set. Of the 8,520 test records, the model correctly identified 454 T-CKD cases (true positives) and correctly ruled out 6,260 NT-CKD cases (true negatives), while producing 1,687 false positives and 119 false negatives.
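The threshold-dependent metrics can be recomputed directly from these four counts. The arithmetic reproduces the reported sensitivity (0.792), specificity (0.788), and NPV (0.981), and implies a PPV of about 0.212; the helper name is hypothetical.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-dependent metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # recall of T-CKD
        "specificity": tn / (tn + fp),   # recall of NT-CKD
        "ppv": tp / (tp + fp),           # precision among positive calls
        "npv": tn / (tn + fn),           # precision among negative calls
    }


m = confusion_metrics(tp=454, fp=1687, tn=6260, fn=119)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.792, 'specificity': 0.788, 'ppv': 0.212, 'npv': 0.981}
```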
Finally, we evaluated a newer architecture, EfficientNetV2-S, which did not yield a performance benefit in this study. With EfficientNetV2-S, the AUCs of Models B and C decreased to 0.845 and 0.842, respectively, both significantly lower than their EfficientNet-B3 counterparts (p < 0.05). Although Model C (V2) exhibited the highest sensitivity among all models (0.810), its specificity was notably lower.
In summary, while the single-model configurations showed varied performance, the 5-fold CV ensemble strategy was a key driver of performance, consistently elevating the results for all models. Among them, Model C, utilizing the EfficientNet-B3 architecture with this ensemble strategy, demonstrated the best overall performance in this study, achieving a statistically superior AUC of 0.868 with a well-balanced sensitivity and specificity.
Subgroup analysis
Supplemental Table 2 presents the subgroup analysis results of the best Model C (EfficientNet-B3, 5-fold CV ensemble) on the test dataset. In age stratification, we observed significant performance variations: the model demonstrated the lowest sensitivity (0.667) but highest specificity (0.967) in the younger age group (18–40 years); with increasing age, sensitivity showed a steady increase while specificity gradually decreased. In the elderly population (>65 years), sensitivity improved to 0.818, but specificity notably declined to 0.635. Gender comparison revealed better overall performance in the male subgroup (AUC = 0.882, sensitivity = 0.773, specificity = 0.839) compared to the female subgroup (AUC = 0.855, sensitivity = 0.811, specificity = 0.733). Notably, the female subgroup showed slightly higher sensitivity but markedly lower specificity. In the comparison of HbA1c levels, the model performed similarly in subgroups with HbA1c < 6.5% (AUC = 0.877, sensitivity = 0.795, specificity = 0.798) and HbA1c ≥ 6.5% (AUC = 0.865, sensitivity = 0.792, specificity = 0.785), suggesting that the degree of diabetes control had relatively limited impact on model prediction performance.
Regarding comorbidity analysis, the model performed excellently in the non-hypertensive subgroup (AUC = 0.915, sensitivity = 0.73, specificity = 0.896), significantly outperforming the hypertensive subgroup (AUC = 0.839, sensitivity = 0.797, specificity = 0.733). Similarly, overall performance in the non-coronary artery disease (CAD) subgroup (AUC = 0.866, sensitivity = 0.772, specificity = 0.797) was slightly superior to the CAD subgroup (AUC = 0.853, sensitivity = 0.872, specificity = 0.694), despite the latter having higher sensitivity. The most pronounced difference appeared in heart failure analysis, where the non-heart failure subgroup (AUC = 0.864, sensitivity = 0.774, specificity = 0.797) significantly outperformed the heart failure subgroup (AUC = 0.822, sensitivity = 0.858, specificity = 0.609).
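The stratified evaluation amounts to computing sensitivity and specificity separately within each subgroup, as in the sketch below. The age bands follow those in the text (18–40, 41–65, and >65 years), with boundaries assumed inclusive at 40 and 65; `group_fn` is a hypothetical hook that could be swapped for any other attribute, such as sex or hypertension status.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def age_band(age: float) -> str:
    """Map an adult age to the bands used in the subgroup analysis (boundaries assumed)."""
    if age <= 40:
        return "18-40"
    if age <= 65:
        return "41-65"
    return ">65"


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def subgroup_metrics(attrs: Sequence, labels: Sequence[int], preds: Sequence[int],
                     group_fn: Callable = age_band) -> Dict[str, Tuple[float, float]]:
    """Return {subgroup: (sensitivity, specificity)}."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0, 0])  # tp, fn, tn, fp
    for a, y, p in zip(attrs, labels, preds):
        c = counts[group_fn(a)]
        if y == 1:
            c[0 if p == 1 else 1] += 1
        else:
            c[2 if p == 0 else 3] += 1
    return {g: (_ratio(tp, tp + fn), _ratio(tn, tn + fp))
            for g, (tp, fn, tn, fp) in counts.items()}


res = subgroup_metrics([30, 30, 50, 50, 70, 70], [1, 0, 1, 0, 1, 0], [1, 0, 0, 0, 1, 1])
print(res)  # {'18-40': (1.0, 1.0), '41-65': (0.0, 1.0), '>65': (1.0, 0.0)}
```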
The subgroup analysis results highlight that, except for HbA1c levels, notable performance differences exist across subgroups of demographic characteristics and comorbidity status. These disparities indicate that future research should focus on optimizing model design to reduce performance gaps between different subgroups, particularly for elderly patients, female patients, and patient populations with cardiovascular comorbidities.
Saliency maps
Figure 2 presents saliency maps generated using the Grad-CAM++ technique to visualize the key regions the model utilizes for classification23. For a representative true positive case (Fig. 2B), the model's attention is primarily concentrated on the optic disc and its surrounding vasculature, which are highlighted as high-intensity areas. To further investigate the importance of these features, we performed an ablation study. When the optic disc was computationally removed (Fig. 2C), the model adapted by shifting its focus to the remaining vascular network and other retinal features. Conversely, when only the vascular structures were removed (Fig. 2D), the model's attention remained strongly focused on the optic disc. Finally, when both the optic disc and vessels were removed (Fig. 2E), the model was forced to seek information from the remaining retinal background, resulting in a more diffuse activation pattern.
These visualizations collectively demonstrate that the model learns to prioritize the optic disc and vascular structures for CKD prediction. This behavior is consistent with established pathophysiological links between the retinal microvascular abnormalities visible in the fundus and the renal microvascular damage characteristic of chronic kidney disease.
Feature ablation experiments
To analyze the contribution of specific retinal structures, we designed a feature ablation experiment that measured the change in model performance after removing each structure. Removing the optic disc decreased the AUC from 0.849 to 0.842 (P < 0.001) and the sensitivity from 0.752 to 0.689, indicating that the optic disc provides information the model uses to identify T-CKD. In contrast, removing the blood vessels significantly reduced the AUC to 0.82 (P < 0.001), with significant reductions in specificity and PPV, suggesting that vascular features play an important role in identifying NT-CKD. When both the optic disc and the vasculature were removed, performance decreased further (AUC = 0.813, P < 0.001), demonstrating the cumulative importance of these structures to model performance (Supplemental Table 3).
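The text does not describe how structures were removed, so the NumPy sketch below shows only one plausible approach, stated as an assumption: the optic disc is approximated by a circle whose centre and radius would come from some hypothetical upstream detector and is filled with the mean colour of the rest of the image, while vessels are removed by zeroing the pixels of a binary segmentation mask. Both function names are hypothetical.

```python
import numpy as np
from typing import Optional, Tuple


def mask_circle(img: np.ndarray, center: Tuple[int, int], radius: float,
                fill: Optional[np.ndarray] = None) -> np.ndarray:
    """Return a copy of `img` with a circular region (e.g., the optic disc) replaced."""
    h, w = img.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    cy, cx = center
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    out = img.copy()
    if fill is None:
        # Default fill: mean colour of the pixels outside the circle
        fill = img[~inside].mean(axis=0)
    out[inside] = fill
    return out


def mask_binary(img: np.ndarray, mask: np.ndarray, fill: int = 0) -> np.ndarray:
    """Return a copy of `img` with pixels under a binary mask (e.g., vessels) replaced."""
    out = img.copy()
    out[mask.astype(bool)] = fill
    return out


# Usage: a bright 5x5 "disc" on a uniform background disappears after masking.
img = np.full((20, 20, 3), 50, np.uint8)
img[8:13, 8:13] = 255
print(mask_circle(img, center=(10, 10), radius=3).max())  # 50
```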
Discussion
Our deep learning models demonstrated promising performance in predicting CKD using retinal fundus images, achieving an AUC of 0.868 for the best-performing model (Model C utilizing the EfficientNet-B3 architecture with a 5-fold CV ensemble strategy). This result is consistent with, and in some cases, exceeds the performance seen in previous studies24,25,26.
In this study, we discovered a significant performance trade-off among different modeling strategies for predicting advanced CKD. In a standardized comparison of single models, the model combining a unilateral fundus image with demographic features (Model B) achieved the highest AUC (0.857) and sensitivity (0.794), whereas the model that directly integrated bilateral images (Model C) yielded the highest specificity (0.799). This finding challenges the intuitive assumption that bilateral information necessarily leads to superior overall performance and instead reveals the distinct advantages of each approach: Model B shows greater potential as a high-sensitivity preliminary screening tool, while Model C is more valuable in scenarios requiring high specificity to reduce false positives. It is crucial to note, however, that these findings are based on data from a single center, and this observed trade-off requires further investigation with more diverse, external datasets to confirm its generalizability.
In addition, a previous study24 developed a deep learning model to detect CKD from retinal photographs, reporting an AUC of 0.90, similar to that of our best Model C. That study also emphasized the feasibility of using retinal imaging as a non-invasive tool for progression monitoring in advanced CKD, supporting our findings. Furthermore, Zhang et al.25 reported an AUC of 0.88 when using deep learning to predict both CKD and T2D from retinal images, aligning with our results and reinforcing the notion that DR can serve as a predictive marker for CKD. A recent study by Betzler et al.26 also used deep learning algorithms to detect DKD from retinal photographs in multiethnic populations, achieving strong performance with an AUC of 0.91 and further supporting the potential of retinal imaging for diagnosing DKD across diverse patient populations. Betzler et al.'s work emphasizes the robustness of deep learning models in identifying CKD in multiethnic cohorts, which complements our findings by highlighting the applicability of retinal imaging-based algorithms in global populations with diabetes. Their work, like ours, illustrates that retinal imaging could serve as an effective, non-invasive alternative to clinical biomarkers such as serum creatinine and albuminuria for progression monitoring in advanced CKD in diabetic patients.
One of the novel findings of our study lies in the subgroup analysis (Supplemental Table 2), which reveals that the model's performance is significantly higher in younger populations, particularly in the 18–40 and 41–65 age groups (AUC = 0.974 and AUC = 0.908, respectively). This finding is consistent with the idea that retinal changes associated with DKD are more readily detectable in younger patients before progression to end-stage kidney failure. Previous studies have reported that early-stage CKD is more easily detected in younger diabetic populations, likely because of fewer compounding comorbidities and less vascular aging in younger individuals24,26. This supports the notion that retinal imaging can be an effective early diagnostic tool in younger diabetic patients, consistent with established literature. However, we observed lower performance in the group aged over 65 years (AUC = 0.795), an important and novel observation. In this subgroup, the model's specificity decreased markedly (0.635) even though its sensitivity was higher, indicating that retinal changes may not always correlate as strongly with kidney function in older adults. This finding has not been consistently reported in previous studies, but some literature suggests that age-related changes in the renal vasculature and retina might contribute to less reliable associations between diabetic retinopathy and kidney dysfunction in elderly populations27,28,29,30. Aging and multimorbidity may complicate the detection of early-stage CKD in older adults, as kidney function may decline because of factors other than diabetic microvascular disease, such as arterial stiffness and chronic hypertension. Our findings support the idea that elderly populations may need more tailored approaches when using retinal imaging for CKD prediction.
Overall, the high performance in the younger population is particularly promising, as it supports the potential of using retinal imaging as a non-invasive tool for the early detection of CKD, especially in diabetic individuals who are at risk of developing kidney disease at an earlier age. These results underscore the importance of personalized screening strategies, as younger patients could benefit the most from retinal-based monitoring, while elderly individuals may require more targeted approaches to improve predictive accuracy.
One of the major advantages of our AI-driven retinal image analysis approach over traditional laboratory tests (e.g., serum creatinine and urine albumin) lies in its non-invasive nature. Blood sampling is invasive, can be time-consuming, and may cause patient discomfort, especially for those who require frequent monitoring; it also carries a risk of infection and generates medical waste. In contrast, fundus imaging is quick, non-invasive, and patient-friendly, and can be easily performed in clinical settings without discomfort or procedural risk. This reduces the burden on patients and healthcare providers, allowing more frequent screening and earlier detection of diabetic kidney disease. Additionally, because DR and DKD share a common microvascular pathophysiology and may develop in parallel, the associated retinal changes can potentially be detected early through retinal imaging, making this approach more time-efficient and accessible than regular blood tests. Our findings highlight the clinical utility of retinal imaging as an alternative that can be integrated into routine diabetes management practices.
Given the strong association between DR and DKD, it is crucial for diabetes healthcare providers to maintain a high index of suspicion for CKD in diabetic patients. Diabetic vasculopathy, a common pathophysiological factor shared by both conditions, results in microvascular damage that affects both the retina and kidneys. As our study shows, retinal changes could serve as an early warning system for predicting renal function deterioration. Progression monitoring in advanced CKD in diabetic patients can lead to timely interventions that may prevent or slow the progression to end-stage renal disease (ESRD). Therefore, diabetes societies and healthcare professionals should consider retinal imaging as part of the routine screening for CKD in diabetic patients, particularly those with advanced stages of diabetic retinopathy. The potential for AI analysis of retinal images to predict renal function status provides a non-invasive tool to identify patients at risk for DKD before traditional markers, such as serum creatinine or albuminuria, show significant changes. By incorporating this technology into standard clinical care, healthcare providers can take a more proactive approach to managing diabetic kidney disease.
Although our findings are robust, several limitations should be noted. First, the low PPV arises primarily from the relatively low prevalence of advanced CKD in our dataset. While the high NPV (0.981) supports use as a rule-out tool, the low PPV limits immediate clinical applicability as a stand-alone screening method. Potential strategies to mitigate this issue include combining fundus analysis with additional biomarkers or targeting screening to high-risk populations. Second, the lack of external validation is a limitation of this study; further testing on more diverse datasets is required to understand the model's generalizability across different ethnicities and fundus cameras.
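The dependence of PPV on prevalence follows from Bayes' rule. Using the best model's sensitivity (0.792) and specificity (0.788), and a test-set prevalence of 573/8,520 ≈ 6.7% (true positives plus false negatives in the confusion matrix), the sketch below gives a PPV of about 0.21. At a hypothetical prevalence of 20%, as might occur in targeted screening of a high-risk population, the same operating point would give a PPV of roughly 0.48; this higher prevalence is an illustrative assumption, not a value from the study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


test_prevalence = 573 / 8520          # T-CKD records in the test set
for prev in (test_prevalence, 0.20):  # 0.20 is a hypothetical high-risk setting
    print(f"prevalence={prev:.3f}  PPV={ppv(0.792, 0.788, prev):.3f}  "
          f"NPV={npv(0.792, 0.788, prev):.3f}")
```

Because sensitivity and specificity are held fixed, the gain in PPV comes entirely from the change in the case mix, which is why targeted screening is one of the mitigation strategies mentioned above.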
Conclusion
In conclusion, our study demonstrates that AI-driven analysis of retinal images can serve as a powerful tool for progression monitoring in advanced CKD in diabetic patients. The high performance of the models, especially those using bilateral fundus images, together with the non-invasive nature of retinal imaging, supports its potential as a valuable alternative to traditional blood examinations. Given the shared vascular pathophysiology between DR and DKD, integrating retinal imaging into clinical practice could support progression monitoring in advanced CKD and preventive management of CKD, particularly in high-risk diabetic populations.
Data availability
All available data are presented in the text of the paper.
Abbreviations
- DR: diabetic retinopathy
- DME: diabetic macular edema
- DKD: diabetic kidney disease
- AGEs: advanced glycation end products
- DM: diabetes mellitus
- eGFR: estimated glomerular filtration rate
- AI: artificial intelligence
- CNN: convolutional neural network
- T-CKD: targeted chronic kidney disease
- NT-CKD: non-targeted chronic kidney disease
- HT: hypertension
- CAD: coronary artery disease
References
Vithian, K. & Hurel, S. Microvascular complications: pathophysiology and management. Clin. Med. (Lond). 10, 505–509 (2010).
Khalid, M., Petroianu, G. & Adem, A. Advanced glycation end products and diabetes mellitus: mechanisms and perspectives. Biomolecules 12, 542 (2022).
Yang, J. & Liu, Z. Mechanistic pathogenesis of endothelial dysfunction in diabetic nephropathy and retinopathy. Front. Endocrinol. (Lausanne). 13, 816400 (2022).
Cheng, H. & Harris, R. C. Renal endothelial dysfunction in diabetic nephropathy. Cardiovasc. Hematol. Disord Drug Targets. 14, 22–33 (2014).
Fang, J., Luo, C., Zhang, D., He, Q. & Liu, L. Correlation between diabetic retinopathy and diabetic nephropathy: a two-sample Mendelian randomization study. Front. Endocrinol. (Lausanne). 14, 1265711 (2023).
Chen, X. et al. The link between diabetic retinal and renal microvasculopathy is associated with dyslipidemia and upregulated circulating level of cytokines. Front. Public. Health. 10, 1040319 (2023).
Dash, S., Chougule, A. & Mohanty, S. Correlation of albuminuria and diabetic retinopathy in type-II diabetes mellitus patients. Cureus 14, e21927 (2022).
Park, H. C. et al. Diabetic retinopathy is a prognostic factor for progression of chronic kidney disease in the patients with type 2 diabetes mellitus. PLoS One. 14, e0220506 (2019).
Hsing, S. C. et al. The severity of diabetic retinopathy is an independent factor for the progression of diabetic nephropathy. J. Clin. Med. 10, 3 (2021).
Yamanouchi, M. et al. Retinopathy progression and the risk of end-stage kidney disease: results from a longitudinal Japanese cohort of 232 patients with type 2 diabetes and biopsy-proven diabetic kidney disease. BMJ Open. Diabetes Res. Care. 7, e000726 (2019).
Gupta, M. et al. Diabetic retinopathy is a predictor of progression of diabetic kidney disease: a systematic review and meta-analysis. Int. J. Nephrol. 3922398 (2022).
Singh, S., Patel, P. S. & Archana, A. Heterogeneity in kidney histology and its clinical indicators in type 2 diabetes mellitus: a retrospective study. J. Clin. Med. 12, 1778 (2023).
Dos Reis, M. A. et al. Clinical features most frequently present in patients with concomitant diabetic kidney disease and diabetic retinopathy. Arch. Endocrinol. Metab. 68, e230377 (2024).
Bermejo, S. et al. The coexistence of diabetic retinopathy and diabetic nephropathy is associated with worse kidney outcomes. Clin. Kidney J. 16, 1656–1663 (2023).
Cao, X., Gong, X. & Ma, X. Diabetic nephropathy versus diabetic retinopathy in a Chinese population: a retrospective study. Med. Sci. Monit. 25, 6446–6453 (2019).
Hung, C. C. et al. Diabetic retinopathy and clinical parameters favoring the presence of diabetic nephropathy could predict renal outcome in patients with diabetic kidney disease. Sci. Rep. 7, 1236 (2017).
Yamanouchi, M. et al. Retinopathy progression and the risk of end-stage kidney disease: results from a longitudinal Japanese cohort of 232 patients with type 2 diabetes and biopsy-proven diabetic kidney disease. BMJ Open. Diabetes Res. Care. 7, e000726 (2019).
Rico-Fontalvo, J. et al. Novel biomarkers of diabetic kidney disease. Biomolecules 13 (4), 633 (2023).
Graham, B. Kaggle diabetic retinopathy detection competition report. (2015). https://kaggle-forum-message-attachments.storage.googleapis.com/88655/2795/competitionreport.pdf
Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO 2024 clinical practice guideline for the evaluation and management of chronic kidney disease. Kidney Int. 105 (4 Suppl), S117–S314. https://doi.org/10.1016/j.kint.2023.10.018 (2024).
Tan, M. & Le, Q. EfficientNet: rethinking model scaling for convolutional neural networks. In Proc. 36th International Conference on Machine Learning, PMLR 97, 6105–6114 (2019). Preprint at arXiv:1905.11946.
Tan, M. & Le, Q. EfficientNetV2: smaller models and faster training. In Proc. 38th International Conference on Machine Learning, PMLR 139, 10096–10106 (2021).
Chattopadhyay, A., Sarkar, A., Howlader, P. & Balasubramanian, V. N. Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). 839–847 (2018).
Sabanayagam, C. et al. A deep learning algorithm to detect chronic kidney disease from retinal photographs in community-based populations. Lancet Digit. Health. 2, e295–e302 (2020).
Zhang, K. et al. Deep-learning models for the detection and incidence prediction of chronic kidney disease and type 2 diabetes from retinal fundus images. Nat. Biomed. Eng. 5, 533–545 (2021).
Betzler, B. K. et al. Deep learning algorithms to detect diabetic kidney disease from retinal photographs in multiethnic populations with diabetes. J. Am. Med. Inf. Assoc. 30 (12), 1904–1914 (2023).
Leley, S. P., Ciulla, T. A. & Bhatwadekar, A. D. Diabetic retinopathy in the aging population: a perspective of pathogenesis and treatment. Clin. Interv Aging. 16, 1367–1378 (2021).
Deva, R. et al. Vision-threatening retinal abnormalities in chronic kidney disease stages 3 to 5. Clin. J. Am. Soc. Nephrol. 6, 1866–1871 (2011).
Småbrekke, S. et al. The retinal vasculature and risk of age-related GFR decline - the renal Iohexol clearance survey. Kidney Int. Rep. 10, 1384–1392 (2025).
Gao, L. Q. et al. Diabetic retinopathy and chronic kidney disease: associations and comorbidities in a large diabetic population - the Tongren health care study. Am. J. Nephrol. 55, 175–186 (2024).
Acknowledgements
We are grateful to China Medical University, Taiwan, for providing administrative, technical and funding support. The funders had no role in the study design, data collection and analysis, the decision to publish, or preparation of the manuscript. No additional external funding was received for this study.
Author information
Contributions
All authors contributed significantly to this work and agree with the content of the manuscript.
- Conceptualization: Chuan-Fan Hsu, Tung-Min Yu, Ya-Lun Wu, Wei-Chun Wang, Jun-Sing Wang, Shih-Sheng Chang
- Methodology: Chuan-Fan Hsu, Tung-Min Yu, Ya-Lun Wu, Wei-Chun Wang, Jun-Sing Wang, Shih-Sheng Chang
- Software: Chuan-Fan Hsu, Shih-Sheng Chang
- Validation: Chuan-Fan Hsu, Tung-Min Yu, Ya-Lun Wu, Wei-Chun Wang, Jun-Sing Wang, Shih-Sheng Chang
- Formal Analysis: Chuan-Fan Hsu, Shih-Sheng Chang
- Investigation: Chuan-Fan Hsu, Shih-Sheng Chang
- Resources: Chuan-Fan Hsu, Shih-Sheng Chang
- Data Curation: Chuan-Fan Hsu, Shih-Sheng Chang
- Writing (Original Draft Preparation): Chuan-Fan Hsu, Tung-Min Yu, Shih-Sheng Chang
- Writing (Review and Editing): Chuan-Fan Hsu, Tung-Min Yu, Ya-Lun Wu, Wei-Chun Wang, Jun-Sing Wang, Shih-Sheng Chang
- Visualization: Shih-Sheng Chang
- Supervision: Shih-Sheng Chang
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics statement
This study was approved by the Ethics Committee of China Medical University Hospital [Approval number: CMUH114-REC3-054] and was conducted in accordance with the ethical principles of the Declaration of Helsinki. The dataset consists of previously de-identified data used for research purposes, with all personally identifiable information encrypted. Owing to the retrospective nature of the study and the anonymization of all data prior to analysis, the Ethics Committee of China Medical University Hospital waived the requirement for informed consent. All methods were carried out in accordance with relevant guidelines and regulations.
Guarantor statement
Shih-Sheng Chang is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available with the online version of this article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Hsu, CF., Yu, TM., Wu, YL. et al. Prediction of advanced chronic kidney disease through retinal fundus images by deep learning. Sci Rep 15, 37318 (2025). https://doi.org/10.1038/s41598-025-21366-y
DOI: https://doi.org/10.1038/s41598-025-21366-y


