Introduction

Diabetic foot ulcers (DFUs) are one of the most serious complications of diabetes. They affect approximately 18.6 million people worldwide each year and are associated with increased rates of amputation and death1. Studies have shown a high mortality rate after major lower limb amputation due to diabetes and peripheral vascular disease; among patients with any amputation (major and minor combined), the 5-year mortality rate ranged from 53% to 100%2. Given the substantial disability, mortality, and treatment costs associated with DFUs, it is crucial to prevent their occurrence proactively. Existing studies have shown that diabetic neuropathy (DN) is closely related to DFUs3. DN is the most common complication of diabetes mellitus and is characterized by a glove-and-stocking distribution of distal sensory or motor function loss4. In most cases, sensory neuropathy is accompanied by reduced perception of temperature and pain; consequently, the risk of trauma is significantly higher5,6,7,8. Because pain is absent, serious ulcerations are often underestimated by both patients and physicians9,10. Therefore, DN is a significant contributor to the development of DFUs, and prioritizing DN prevention could substantially reduce amputation rates and patient mortality.

Numerous studies have demonstrated that various factors, including age, duration of diabetes, HbA1c, fasting blood glucose (FBG), high triglycerides (HTG), total cholesterol (TC), hypertension, urine protein concentration, diabetic retinopathy (DR), diabetic kidney disease (DKD), and cardiovascular disease (CVD), can be used to predict the risk of DN11,12,13,14,15. However, most of these studies have focused on risk factors for the occurrence of DN, and few have examined risk factors for death in patients with DN. Exploring these risk factors and establishing a model that predicts mortality risk could therefore support early intervention and treatment in patients with DN, improving their prognosis and reducing their mortality.

Existing research shows that artificial intelligence methods can yield effective diagnostic and predictive tools for various diseases16. In recent years, machine learning (ML) algorithms have shown excellent predictive performance17. However, because of the “black box” nature of ML algorithms, their lack of interpretability has been a major barrier to implementing ML models in healthcare18. Therefore, in this study, we built ML models to predict mortality risk in patients with DN, validated their performance, and used SHapley Additive exPlanations (SHAP) analysis to comprehensively evaluate and interpret the models19.

Methods

Database

The present study is a secondary analysis of existing data extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, which covers the period from 2008 to 2019. MIMIC-IV is the result of a collaboration between Beth Israel Deaconess Medical Center (BIDMC) and the Massachusetts Institute of Technology (MIT). Data collected at BIDMC as part of routine clinical care are deidentified, transformed, and made available to researchers who have completed training in human research and signed a data use agreement20. One of the authors (Huang) completed the Collaborative Institutional Training Initiative examination (certification number 61042076) and was thereby granted access to the database. Data acquisition for the MIMIC project was approved by the institutional review boards of the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center. Therefore, the requirements for patient informed consent and ethical approval were waived for the present study.

Study population

We first obtained raw data using Structured Query Language (SQL; version 14.0) with Navicat Premium software (version 15.0.12). SQL was used to extract patient data, including sociodemographic characteristics, vital signs, laboratory parameters, complications, and microbiological information21. Patients in the database who met the following criteria were included in the present study: [1] a diagnosis of DN, identified in MIMIC-IV by ICD-9 codes (24960, 24961, 25060, 25061, 25062, and 25063) and ICD-10 codes (E1040, E1042, E1043, E1140, E1141, E1142, E1143, and E1149); [2] first ICU admission during the first hospitalization; [3] hospital length of stay > 48 h; [4] age > 18 years; and [5] complete blood count data available. Data from these patients were used for model development. The data extraction procedure is illustrated in Fig. 1.
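The five inclusion criteria amount to a simple filter over ICU stays. The sketch below is a minimal illustration, not the authors' SQL query: it assumes a hypothetical flattened record (`IcuStay`, one row per ICU stay), whereas in MIMIC-IV these fields are spread across several tables that must be joined first. Only the ICD code lists come from the text; all field names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

# ICD codes from criterion [1] (stored without dots).
DN_ICD9 = {"24960", "24961", "25060", "25061", "25062", "25063"}
DN_ICD10 = {"E1040", "E1042", "E1043", "E1140", "E1141",
            "E1142", "E1143", "E1149"}


@dataclass
class IcuStay:
    """Hypothetical one-row-per-ICU-stay record (field names are illustrative)."""
    subject_id: int
    icd_codes: List[Tuple[int, str]]  # (ICD version, code) pairs for the admission
    first_hosp_first_icu: bool        # first ICU stay of the first hospitalization
    hosp_los_hours: float
    age: float
    cbc_complete: bool                # complete blood count available


def has_dn_code(stay: IcuStay) -> bool:
    return any((v == 9 and c in DN_ICD9) or (v == 10 and c in DN_ICD10)
               for v, c in stay.icd_codes)


def select_cohort(stays: List[IcuStay]) -> List[IcuStay]:
    """Keep stays meeting criteria [1]-[5] from the Methods."""
    return [s for s in stays
            if has_dn_code(s)               # [1] DN diagnosis code
            and s.first_hosp_first_icu      # [2] first ICU admission, first hospitalization
            and s.hosp_los_hours > 48       # [3] hospital stay > 48 h
            and s.age > 18                  # [4] age > 18 years
            and s.cbc_complete]             # [5] complete blood count not missing


stays = [
    IcuStay(1, [(10, "E1142")], True, 72.0, 65, True),  # meets all criteria
    IcuStay(2, [(9, "25060")], True, 30.0, 70, True),   # hospital stay too short
    IcuStay(3, [(10, "E119")], True, 96.0, 55, True),   # no DN-specific code
]
print([s.subject_id for s in select_cohort(stays)])  # [1]
```

Checking the ICD version together with the code keeps an ICD-9 string from being matched against the ICD-10 list and vice versa.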

Fig. 1

Flowchart of the patient selection process.

Data extraction

We extracted a total of 56 variables. Six of these are composite inflammatory indicators calculated from complete blood count values. The predictors included: (a) demographic information: age and gender; (b) comorbidities: myocardial infarction, congestive heart failure, chronic pulmonary disease, renal disease, hypertension, and obesity; (c) first-day laboratory results in the ICU: the mean level of glucose; the lowest and highest levels of anion gap, bicarbonate, creatinine, and blood urea nitrogen (BUN); hemoglobin, platelets, white blood cells (WBC), red blood cells (RBC), red blood cell distribution width (RDW), lymphocytes, neutrophils, and monocytes; the lowest and highest levels of calcium, chloride, sodium, and potassium; and the minimum and maximum values of partial thromboplastin time (PTT), prothrombin time (PT), and international normalized ratio (INR); (d) first-day vital signs in the ICU: the mean heart rate, mean blood pressure (Meanbp), mean respiratory rate, and mean temperature; (e) severity scores: simplified acute physiology score Ⅱ (SAPS-Ⅱ), acute physiology score Ⅲ (APS-Ⅲ), logistic organ dysfunction system (LODS), Oxford acute severity of illness score (OASIS), Glasgow coma scale (GCS), and Charlson comorbidity index (CCI); and (f) composite inflammatory indicators: neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), lymphocyte-to-monocyte ratio (LMR), platelet-to-neutrophil ratio (PNR), platelet-to-WBC ratio (PWR), and systemic immune-inflammation index (SII).
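The six composite indices are simple ratios of complete blood count values. The sketch below uses their commonly used definitions (for example, SII = platelets × neutrophils / lymphocytes); the text does not state the formulas or units, so it assumes absolute counts in the same unit (×10^9/L). If neutrophils and lymphocytes were recorded as percentages of WBC, they would first need converting to absolute counts for the ratios that mix them with platelets or WBC.

```python
from typing import Dict


def inflammatory_indices(wbc: float, neutrophils: float, lymphocytes: float,
                         monocytes: float, platelets: float) -> Dict[str, float]:
    """Composite inflammatory indices from one complete blood count.

    Inputs are assumed to be absolute counts in the same unit (x10^9/L).
    """
    if min(wbc, neutrophils, lymphocytes, monocytes) <= 0:
        raise ValueError("counts used as denominators must be positive")
    return {
        "NLR": neutrophils / lymphocytes,              # neutrophil-to-lymphocyte
        "PLR": platelets / lymphocytes,                # platelet-to-lymphocyte
        "LMR": lymphocytes / monocytes,                # lymphocyte-to-monocyte
        "PNR": platelets / neutrophils,                # platelet-to-neutrophil
        "PWR": platelets / wbc,                        # platelet-to-WBC
        "SII": platelets * neutrophils / lymphocytes,  # systemic immune-inflammation index
    }


idx = inflammatory_indices(wbc=10.0, neutrophils=8.0, lymphocytes=1.0,
                           monocytes=0.5, platelets=200.0)
print(idx)  # {'NLR': 8.0, 'PLR': 200.0, 'LMR': 2.0, 'PNR': 25.0, 'PWR': 20.0, 'SII': 1600.0}
```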

Statistical analysis

The data were analyzed using SPSS 27.0 software. Normality was assessed using the Shapiro–Wilk test. Normally distributed continuous variables are presented as mean (standard deviation, SD) and compared using independent-samples t-tests. Non-normally distributed continuous variables are expressed as median (interquartile range) and compared using the Kruskal–Wallis test. Categorical variables are described as percentages and compared using the chi-square test. Variables with more than 20% missing values were excluded from further analysis, whereas those with 20% or fewer missing values were imputed, using the mean for continuous variables and the mode for categorical variables. Missing data were handled with the Replace Missing Values function in SPSS 27.0 software. Patients were classified into “survival” and “non-survival” groups based on their survival status. The results of the variable comparisons are displayed in Table 1.
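The missing-data rule above (exclude variables with more than 20% missing values, then impute the mean for continuous and the mode for categorical variables) can be sketched as follows. This illustrates the rule only; it is not a reproduction of SPSS's Replace Missing Values function, and the column values below are made up.

```python
from statistics import mean, mode
from typing import Dict, List, Optional, Set


def handle_missing(data: Dict[str, List[Optional[float]]], categorical: Set[str],
                   max_missing: float = 0.20) -> Dict[str, List[float]]:
    """Drop variables with > max_missing missing values; impute the rest.

    Continuous variables are filled with their mean and categorical
    variables with their mode (most frequent value).
    """
    cleaned = {}
    for name, values in data.items():
        observed = [v for v in values if v is not None]
        n_missing = len(values) - len(observed)
        if n_missing > max_missing * len(values):
            continue  # more than 20% missing: excluded from further analysis
        fill = mode(observed) if name in categorical else mean(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


data = {
    "glucose_mean": [150.0, None, 170.0, 160.0, 140.0],  # 20% missing: mean-imputed
    "obesity": [1, 0, None, 0, 0],                       # 20% missing: mode-imputed
    "PTT_min": [30.0, None, None, 28.0, 35.0],           # 40% missing: dropped
}
print(handle_missing(data, categorical={"obesity"}))
# {'glucose_mean': [150.0, 155.0, 170.0, 160.0, 140.0], 'obesity': [1, 0, 0, 0, 0]}
```

Comparing missing counts rather than a floating-point fraction keeps a variable with exactly 20% missing on the "imputed" side of the cut-off.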

Model development and validation

In the present study, ML algorithms were used to construct a model for predicting the risk of death in patients with DN. The least absolute shrinkage and selection operator (LASSO) was used for feature selection to reduce model complexity and the risk of overfitting and to speed up model training. LASSO is a penalized regression method: it adds an L1 penalty on the coefficients to the regression loss, which shrinks the coefficients of less informative predictors to exactly zero and thereby retains the most useful predictive features of the original data set. Predictors with nonzero coefficients in the LASSO regression model were used to construct the subsequent DN death risk prediction model22,23,24. Because the survival and non-survival groups were imbalanced, we used stratified random sampling so that both subsets preserved the outcome proportions, allocating 80% of the overall dataset to the training cohort (n = 1050) and 20% to the internal validation cohort (n = 263). Random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), and logistic regression (LR) were then used to establish mortality risk prediction models. To minimize overfitting and identify the optimal hyperparameters, 10-fold cross-validation was also performed25. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score. After comparing the performance of the four ML models, the best-performing model was selected as the final model. Subsequently, we comprehensively evaluated and interpreted the final model using SHAP analysis. Data analysis was performed in R software (version 4.3.2).
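To make the L1 penalty concrete, the sketch below fits a LASSO by cyclic coordinate descent. Its key step is soft-thresholding: a coefficient whose correlation with the partial residual falls below λ is set exactly to zero, which is how LASSO drops variables. It assumes a squared-error loss with standardized features and a centered outcome; the text does not specify the loss or software settings of the authors' fit (a logistic LASSO would be a natural choice for a binary outcome), so treat this as a conceptual sketch only.

```python
from typing import List


def soft_threshold(z: float, gamma: float) -> float:
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_iter: int = 100) -> List[float]:
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes standardized columns of X and a centered y, so no intercept.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta  # keep resid = y - X @ beta
            beta[j] = new_bj
    return beta


# Toy data: two orthogonal, standardized features; y depends mostly on the first.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]
print([round(b, 3) for b in lasso_cd(X, y, lam=0.05)])  # [1.95, 0.05]
print([round(b, 3) for b in lasso_cd(X, y, lam=0.5)])   # [1.5, 0.0]
```

On this orthogonal toy design each coefficient is simply the soft-thresholded correlation, so raising λ from 0.05 to 0.5 removes the weaker feature entirely while only shrinking the stronger one.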

Table 1 Baseline characteristics of the survival and non-survival groups.

Results

Patients’ characteristics

In total, 1,313 patients diagnosed with DN were enrolled in our study. Patients were categorized into “survival” (n = 812) and “non-survival” (n = 501) groups according to their survival status. Table 1 shows the overall baseline characteristics, vital signs, and laboratory parameters of the two groups. In univariate analysis, the following variables differed significantly between the groups: age at admission; congestive heart failure; chronic pulmonary disease; renal disease; hypertension; obesity; glucose_mean; aniongap_min; aniongap_max; creatinine_min; creatinine_max; BUN_min; BUN_max; hemoglobin_mean; WBC_mean; RBC_mean; RDW_mean; lymphocytes_mean; neutrophils_mean; monocytes_mean; chloride_min; chloride_max; PTT_min; PTT_max; PT_min; PT_max; INR_min; INR_max; meanbp_mean; resprate_mean; SAPS-Ⅱ; APS-Ⅲ; LODS; OASIS; GCS; CCI; NLR; PLR; LMR; PNR; PWR; and SII (Table 1).

Features selected in models

This study used LASSO to select features from the training cohort data (Fig. 2). Predictors with nonzero coefficients in the LASSO regression model were used to construct the subsequent DN death risk prediction model. In Fig. 2.A, the “a” line marks the penalty coefficient that minimizes the cross-validated mean squared error (MSE) (lambda.min, λ = 0.01055914), at which 21 variables are retained, while the “b” line marks the largest penalty coefficient whose MSE lies within 1 standard error (SE) of that minimum (lambda.1se, λ = 0.02687268), at which 12 variables are retained. We then used boxplots to compare the models constructed with the two λ values. Judging by the medians of the boxplots, the lambda.min model is closer to the actual outcome and slightly more accurate than the lambda.1se model. However, reducing the number of feature variables helps mitigate overfitting, so the variables corresponding to lambda.1se were ultimately selected. The 12 variables are: age, obesity, aniongap_min, RDW_mean, neutrophils_mean, chloride_max, PTT_min, PT_min, APS-Ⅲ, CCI, LMR, and PWR.
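The choice between lambda.min and lambda.1se follows the one-standard-error rule. A minimal sketch, assuming the cross-validation curve is available as lists of λ values with their mean CV error and its standard error; the numbers in the example are invented and are not the study's curve.

```python
from typing import List, Tuple


def select_lambdas(lambdas: List[float], cv_mean: List[float],
                   cv_se: List[float]) -> Tuple[float, float]:
    """Return (lambda_min, lambda_1se) from a cross-validation curve.

    lambda_min minimizes the mean CV error. lambda_1se is the largest lambda
    (strongest penalty, hence fewest variables) whose mean CV error is within
    one standard error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se


# Invented CV curve (not the study's): error is flat near its minimum.
lambdas = [0.001, 0.005, 0.01, 0.02, 0.03, 0.05]
cv_mean = [0.180, 0.176, 0.175, 0.178, 0.1785, 0.190]
cv_se = [0.004] * 6
print(select_lambdas(lambdas, cv_mean, cv_se))  # (0.01, 0.03)
```

Because a larger λ shrinks more coefficients to zero, lambda.1se trades a slightly higher CV error for a sparser model, which is the rationale given above for keeping 12 rather than 21 variables.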

Fig. 2

Screening of death predictors in DN patients based on LASSO regression. A: The relationship between log(λ) and MSE after tenfold cross-validation. The “a” line marks the λ that minimizes MSE (λ = 0.01055914), at which 21 variables are retained. The “b” line marks the largest λ whose MSE is within 1 SE of the minimum (λ = 0.02687268), at which 12 variables are retained. B: The relationship between log(λ) and the LASSO regression coefficients.

Model comparison

In the model development and validation stage, we first determined the optimal hyperparameters of the RF model: mtry = 2, ntree = 9, max_depth = 10, min_samples_split = 30, min_samples_leaf = 10, max_features = 'sqrt', and random_state = 42. The optimal hyperparameters of the SVM model were gamma = 0.1 and cost = 1, and those of the XGBoost model were num_boost_round = 50, max_depth = 1, eta = 0.3, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1, and subsample = 1. The final models were trained using the optimized hyperparameters. The AUCs (95% CI) of the four ML models (RF, SVM, XGBoost, and LR) were 0.999 (0.998, 1.000), 0.762 (0.735, 0.788), 0.837 (0.813, 0.861), and 0.792 (0.765, 0.819) in the training cohort, and 0.780 (0.723, 0.837), 0.679 (0.625, 0.734), 0.732 (0.672, 0.793), and 0.773 (0.718, 0.829) in the internal validation cohort, respectively.
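Both the 80/20 split and the 10-fold cross-validation described in the Methods rely on stratification by outcome. The sketch below is a generic standard-library illustration, not the caret routine used in the study; the seed and the round-robin fold assignment are arbitrary choices. Rounding each class separately here gives 1,051/262 patients rather than the reported 1,050/263, so exact subset sizes depend on the implementation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def _indices_by_class(labels: Sequence[int], seed: int) -> List[List[int]]:
    """Group row indices by outcome label and shuffle each group reproducibly."""
    rng = random.Random(seed)
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    for idxs in groups.values():
        rng.shuffle(idxs)
    return list(groups.values())


def stratified_split(labels: Sequence[int], train_frac: float = 0.8,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split indices so each outcome class keeps about train_frac of its rows in training."""
    train, valid = [], []
    for idxs in _indices_by_class(labels, seed):
        cut = round(train_frac * len(idxs))
        train += idxs[:cut]
        valid += idxs[cut:]
    return sorted(train), sorted(valid)


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 42) -> List[int]:
    """Assign each row a fold 0..k-1, dealing each class round-robin across folds."""
    folds = [0] * len(labels)
    for idxs in _indices_by_class(labels, seed):
        for pos, idx in enumerate(idxs):
            folds[idx] = pos % k
    return folds


# Cohort sizes from the Results: 812 survivors (0) and 501 non-survivors (1).
labels = [0] * 812 + [1] * 501
train, valid = stratified_split(labels)
print(len(train), len(valid))  # 1051 262
train_folds = stratified_folds([labels[i] for i in train])
```

The folds are built only from the training indices, so the validation cohort stays untouched during hyperparameter tuning.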

Receiver operating characteristic (ROC) curves of the four models in the training and internal validation cohorts are compared in Fig. 3. Among the four ML models, the RF model had the highest AUC in the training cohort (0.999) (Fig. 3.A) and also the highest AUC in the internal validation cohort (0.780), marginally above LR (0.773) (Fig. 3.B). The accuracy, precision, recall, and F1 scores of the four models are shown in Fig. 4. In internal validation, the RF model outperformed the other three models, achieving an accuracy, precision, recall, and F1 score of 0.721, 0.735, 0.858, and 0.792, respectively (Fig. 4.B). Therefore, the RF model was selected as the final model. Decision curve analysis (DCA) (Fig. 5.A) and a clinical impact curve (CIC) (Fig. 5.B) were also used to further assess the RF model. In the validation cohort, clinical intervention guided by the RF model provided a greater net benefit at threshold probabilities of 0.1–0.7 (Fig. 5.A). In the CIC, at threshold probabilities above 0.6, the number of patients the model classified as high risk for death closely matched the number who actually died, supporting the clinical utility of the prediction model (Fig. 5.B).
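The evaluation metrics in Figs. 3–5 follow standard definitions. The sketch below computes them for a toy set of predictions: rank-based AUC (the probability that a randomly chosen non-survivor receives a higher score than a randomly chosen survivor), confusion-matrix metrics at a 0.5 cut-off, and the decision-curve net benefit at threshold probability pt, TP/n − FP/n × pt/(1 − pt). It assumes label 1 = non-survival; the text does not state the positive class, the classification cut-off, or the R functions used, and the toy data are invented. The last line checks that the reported RF precision and recall are consistent with the reported F1 score.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], scores: Sequence[float],
                      cutoff: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1, treating label 1 as the positive class."""
    pred = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pred), "precision": precision,
            "recall": recall, "f1": f1}


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(positive scored above negative); ties count one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit of intervening in patients whose score is >= pt."""
    n = len(y_true)
    tp = sum(s >= pt and t == 1 for s, t in zip(scores, y_true))
    fp = sum(s >= pt and t == 0 for s, t in zip(scores, y_true))
    return tp / n - fp / n * pt / (1 - pt)


y = [1, 1, 0, 0, 1, 0]                 # toy outcomes (1 = death)
p = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]     # toy predicted probabilities
print(confusion_metrics(y, p), auc_rank(y, p), net_benefit(y, p, pt=0.5))
# F1 implied by the reported RF precision (0.735) and recall (0.858):
print(round(2 * 0.735 * 0.858 / (0.735 + 0.858), 3))  # 0.792
```

Unlike the threshold metrics, AUC depends only on the ranking of the scores, which is why a model can have a high AUC yet modest accuracy at a particular cut-off.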

Model interpretation

SHAP analysis was used to reveal how each input feature contributes to the predictions of the RF model. The contribution of each feature was quantified by its Shapley value, which compares the model's prediction with and without that feature, averaged over all combinations of the other features; the results were visualized using the “shapviz” package (Fig. 6.A and B). The feature importance analysis showed that the top five contributing variables, in descending order, were RDW_mean, neutrophils_mean, CCI, chloride_max, and PT_min; thus, RDW_mean was the most important variable for predicting mortality risk in patients with DN (Fig. 6.A). Compared with the original model, the updated SHAP analysis showed notable changes in the feature importance ranking. RDW_mean and neutrophils_mean retained their top positions, consistent with their established roles in inflammation and oxidative stress, whereas chloride_max and PT_min emerged as new top-five predictors, replacing APS-Ⅲ and age. The persistence of RDW_mean as the most important feature reinforces its clinical relevance in critical illness, while the greater importance of chloride_max (which may reflect electrolyte imbalance) and PT_min (which may indicate coagulation dysfunction) suggests that these variables provide additional prognostic value. The consistently high ranking of CCI across both analyses supports its well-documented role in mortality prediction. In each feature row of Fig. 6.B, every dot represents one patient's SHAP value and is colored by feature value: yellow dots indicate high values and purple dots indicate low values. Elevated RDW and neutrophil levels, higher CCI scores, lower serum chloride levels, and increased PT were all associated with an increased predicted risk of mortality (Fig. 6.B). The SHAP force plots in Fig. 6.C–E were used to analyze three cases in the validation group (#156, #994, and #1144). The patient in case 156 survived, the patient in case 994 died in hospital, and the patient in case 1144 died out of hospital within one year. The Shapley value of each variable is depicted by an arrow: a yellow arrow signifies a positive impact (the feature increases the predicted value), whereas a purple arrow denotes a negative impact (the feature decreases the predicted value). The force plots also display the primary variables and their corresponding values. In the three cases in Fig. 6.C–E, the variables contribute differently, indicating that the variables driving the prediction vary from case to case.
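With only a few features, Shapley values can be computed exactly, which makes the definition behind the SHAP analysis easy to inspect. Each feature's value is its marginal contribution to the prediction, averaged over all subsets S of the other features with weights |S|!(p − |S| − 1)!/p!. The toy sketch below assumes a simplified value function in which "absent" features are set to a single baseline value; it is not the estimator used to produce Fig. 6, and the model and feature names are hypothetical. The values sum to the difference between the prediction and the baseline prediction, the additivity that force plots display.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Iterable

Model = Callable[[Dict[str, float]], float]


def shapley_values(model: Model, x: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(x)
    p = len(names)

    def value(coalition: Iterable[str]) -> float:
        members = set(coalition)
        return model({f: x[f] if f in members else baseline[f] for f in names})

    phi = {}
    for j in names:
        others = [f for f in names if f != j]
        total = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                # Marginal contribution of feature j when added to this subset.
                total += weight * (value(subset + (j,)) - value(subset))
        phi[j] = total
    return phi


def toy_model(v: Dict[str, float]) -> float:
    # Hypothetical risk score with an interaction term.
    return 2 * v["feat_a"] + 3 * v["feat_b"] + v["feat_a"] * v["feat_b"]


patient = {"feat_a": 1.0, "feat_b": 2.0}
reference = {"feat_a": 0.0, "feat_b": 0.0}
print(shapley_values(toy_model, patient, reference))  # {'feat_a': 3.0, 'feat_b': 7.0}
```

The interaction term contributes 2 to this prediction and is split equally between the two features, which is why each receives 1 more than its linear part (2 and 6).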

Fig. 3

Comparison of the AUCs of the different ML models in the training (A) and validation (B) cohorts. ML, machine learning; AUC, area under the receiver operating characteristic curve; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting; LR, logistic regression. Note. Figure 3 reflects updated AUC values for the four models.

Fig. 4

Comparison of the performance of the four models in the training (A) and validation (B) cohorts. RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting; LR, logistic regression. Note. Figure 4 presents the updated performance metrics, including accuracy, precision, recall, and F1 scores, across the four evaluated models.

Fig. 5

DCA (A) and CIC (B) of the RF model. DCA, decision curve analysis; CIC, clinical impact curve; RF, random forest. Note. Figure 5 presents the updated DCA and CIC for the RF model.

Fig. 6

The model’s interpretation. A: Importance ranking of the 12 variables according to the mean |SHAP value|. B: SHAP summary plot of the 12 risk factors in the optimal model. Points further to the right (higher SHAP values) indicate a higher predicted risk of death, and points further to the left indicate a lower predicted risk. Yellow indicates higher feature values. C, D, E: SHAP force plots of the individual predictions for cases 156, 994, and 1144, respectively. Note. Figure 6 presents the revised feature importance ranking and corresponding SHAP explanatory force plots.

Discussion

This study makes several contributions. First, we applied the RF algorithm, one of the state-of-the-art ML methods for developing prediction models25,26. The RF algorithm is well suited to analyzing large-scale datasets and performs well in generating high-accuracy classification predictions. These capabilities make it a valuable tool for supporting personalized medical decision-making, particularly in applications such as disease risk assessment27. To optimize model performance, hyperparameter tuning was conducted using the “caret” package. In this study, compared with conventional risk prediction models, including traditional LR, the RF model achieved higher values for multiple key metrics, namely AUC, accuracy, precision, recall, and F1 score, suggesting an advantage over classical approaches. Furthermore, DCA and CIC were conducted to further illustrate the performance of the RF model. Finally, although ML is a good predictive tool that usually achieves high accuracy, it has long been regarded as a “black box” with poor interpretability28. In our study, we conducted SHAP analysis to interpret the best-performing model and to identify the predictors most strongly associated with death in patients with DN.

First, RDW was ranked as the most important of the 12 predictors based on the mean |SHAP value|. Previous studies have indicated that the development and progression of DN are associated with oxidative stress and inflammation4,29,30,31. Additionally, elevated RDW has been linked to higher rates of amputation and mortality in DN patients32,33. Although the exact mechanisms underlying the relationship between high RDW and poor prognosis in DN are not fully understood, inflammation and oxidative stress are likely contributors. Inflammation can disrupt bone marrow function and iron metabolism, which inhibits red blood cell production, shortens erythrocyte half-life, and promotes anisocytosis; these changes ultimately increase RDW34. Similarly, oxidative stress directly damages erythrocytes, reducing their survival and further elevating RDW35,36. High oxidative stress may also promote the release of immature erythrocytes of varying sizes into the bloodstream, leading to anisocytosis37,38. RDW rises in response to both inflammatory stimuli and oxidative stress, both of which are associated with worse outcomes in DN. Therefore, increased RDW is closely related to an unfavorable prognosis in DN, underscoring its value as a predictive marker of patient outcomes.
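Why anisocytosis raises RDW follows from its definition. Assuming RDW is reported as the coefficient of variation of red-cell volume (RDW-CV, %), that is, the standard deviation of cell volume divided by the mean cell volume × 100 (the text does not state which RDW variant was recorded), a population of cells with more varied sizes yields a higher RDW even when the mean cell volume is unchanged. The cell volumes below are invented for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def rdw_cv(cell_volumes_fl: Sequence[float]) -> float:
    """RDW-CV (%) = SD of red-cell volume / mean cell volume * 100."""
    return pstdev(cell_volumes_fl) / mean(cell_volumes_fl) * 100


uniform_cells = [80.0, 90.0, 100.0, 90.0]  # modest size spread, mean 90 fL
varied_cells = [70.0, 90.0, 110.0, 90.0]   # anisocytosis, same mean 90 fL
print(round(rdw_cv(uniform_cells), 2), round(rdw_cv(varied_cells), 2))  # 7.86 15.71
```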

Subsequently, our results indicate that inflammatory markers such as neutrophils, LMR, and PWR are associated with mortality risk in DN, meriting further study. Previous retrospective studies have established a strong link between inflammation and diabetic neuropathies39,40. Diabetes and neurodegeneration involve inflammatory pathways such as NF-κB and MAPK41. Activation of these pathways promotes the release of pro-inflammatory cytokines like IL-1, IL-6, and TNF-α, which contribute to apoptosis and nerve damage. These substances further amplify inflammation through positive feedback loops with necrotic tissue, initiating a cascade response42. Therefore, the inflammatory response is closely associated with the prognosis of DN. Meanwhile, Tuna Demirdal et al.43, as well as YunXu et al.44, demonstrated that some combinations of inflammatory markers, such as PLR, NLR, and LMR, are associated with the risk of amputation in patients with diabetic foot ulcers. In summary, inflammatory markers play a significant role in predicting mortality risk in patients with DN.

Additionally, the CCI is a tool specifically designed to predict long-term mortality. The clinimetric sensitivity of the CCI has been demonstrated in a variety of medical conditions, with stepwise increases in the CCI associated with stepwise increases in mortality45. The APS-Ⅲ can predict vital status at hospital discharge using data recorded at ICU admission46 and has consistently shown good mortality prediction performance. In this study, CCI and APS-Ⅲ were also important characteristic variables for predicting the risk of death in patients with DN. Age is another essential factor: the older the patient, the higher the risk of death in DN.

Furthermore, low serum chloride levels are associated with a higher mortality risk in patients with DN. Serum chloride plays a vital role in maintaining acid-base balance and plasma electroneutrality47,48. A study by Kai Zhang et al.49 found that serum chloride concentrations were negatively correlated with in-hospital mortality among patients with diabetes. Additionally, a cross-sectional analysis by Bhagwant G. Pawar et al.50 showed that reduced chloride levels in patients with type 2 diabetes were strongly associated with poor glycemic control and longer disease duration. Mechanistically, low serum chloride may affect muscle cell volume and pH regulation by activating with-no-lysine (WNK) kinases51,52 and enhancing Na–K–2Cl and Na–Cl co-transporter activity53. Such dysregulation of cardiomyocyte pH can lead to arrhythmias and impaired myocardial contractility54. Therefore, decreased serum chloride may contribute to poor prognosis in patients with diabetic neuropathy.

Moreover, our study suggests that specific coagulation parameters are associated with mortality risk in DN patients. Abnormal coagulation function has been implicated in the pathological progression of DN, with a hypercoagulable state contributing to diabetic vascular complications55. For instance, elevated plasma D-dimer levels have been linked to impaired peripheral nerve conduction and a higher prevalence of diabetic peripheral neuropathy (DPN) in type 2 diabetes, indicating their potential as a prognostic marker in DN56. However, the relationship between PT or PTT and the risk of mortality in patients with DN has seldom been reported. This study adds to the evidence by showing that coagulation indicators such as PT and PTT may serve as predictors of death risk in patients with DN, laying a foundation for further exploration of the mechanisms linking coagulation function and death risk in these patients.

Finally, obesity is an important factor in our predictive model. Some studies have shown that obesity is an important risk factor for early diabetic neuropathy57,58. Research has also indicated that the absence of blood-brain (nerve) barrier protection renders the peripheral nervous system (PNS) susceptible to obesity-induced dysfunction, such as loss of peripheral sensory neurons and small intraepidermal nerve fibers, sensory polyneuropathy in a glove-and-stocking distribution, and decreased motor and sensory nerve function59. The mechanism may involve obesity’s effects on anabolic and catabolic processes mediated through the autonomic nervous system (ANS), a component of the PNS60. Obesity thus plays an important role in the development and progression of diabetic neuropathy. Our study further suggests that obesity is associated with mortality in patients with diabetic neuropathy. These results highlight the importance of weight management in clinical care to improve prognosis in this population.

This study has several important limitations that must be acknowledged. First, we observed a substantial performance gap (ΔAUC = 0.219) between the training set (AUC 0.999, 95% CI 0.998–1.000) and the validation set (AUC 0.780, 95% CI 0.723–0.837), indicating significant overfitting in the RF model. This discrepancy suggests that while the model achieves near-perfect discrimination in the training data, its generalizability to unseen data is suboptimal. The underlying reasons may include: (1) the limited dataset size (n = 1,313), which restricts the model’s ability to learn robust patterns; (2) potential feature selection bias, where certain predictors may be spuriously correlated with outcomes in the training cohort but not in broader populations; and (3) data heterogeneity, as the MIMIC-IV ICU dataset may not fully represent real-world variability in patient characteristics. Clinically, the observed 21.9% decline in AUC (from 0.999 to 0.780) raises concerns about the model’s reliability in new ICU cohorts. If deployed without further refinement, the model could produce overly optimistic risk predictions, potentially leading to misinterpretations in clinical decision-making. To address these limitations, we propose the following steps: (1) External validation: We will seek authorization to assess the model’s generalizability by utilizing the eICU Collaborative Research Database, which contains diverse ICU patient data collected from multiple hospitals across the United States. (2) Multi-center data collaboration: We plan to collect additional institutional data and establish partnerships with other hospitals to expand the external validation cohort. (3) Incorporation of alternative models (e.g., gradient boosting, neural networks) in follow-up studies. Second, missing data were present in this study, leading to the exclusion of variables with more than 20% missing values. 
However, we mitigated this limitation by imputing the remaining missing values (with the mean for continuous variables and the mode for categorical variables) using SPSS 27.0 software. Finally, it is important to note that this is a retrospective study based on a public database and thus carries inherent limitations.
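The overfitting gap quoted in the limitations can be checked directly from the reported AUCs. Because the training AUC is close to 1, the absolute drop (0.219) and the relative drop (about 21.9%) nearly coincide:

```latex
\Delta\mathrm{AUC} = 0.999 - 0.780 = 0.219,
\qquad
\frac{\Delta\mathrm{AUC}}{\mathrm{AUC}_{\text{train}}} = \frac{0.219}{0.999} \approx 0.219 \;(21.9\%).
```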

Conclusion

In conclusion, machine learning methods show promise as tools for predicting mortality risk in patients with DN. We used data from the MIMIC-IV database to develop a machine learning-based model for predicting mortality risk in DN patients. The results indicated that RDW_mean, neutrophils_mean, CCI, chloride_max, and PT_min were the most important predictors of mortality risk in DN. The interpretable model showed promising performance in internal validation and may assist clinical decision-making and help improve prognostic outcomes. However, the applicability of the proposed random forest model remains to be validated in external cohorts, and the results should be interpreted with caution until such validation is performed.