Introduction

Acute kidney injury (AKI) is a complex, heterogeneous syndrome characterized by a rapid decline in kidney function, and it is frequently observed in intensive care units (ICUs)1,2. AKI has become a significant global public health issue, with a reported incidence as high as 60–70%3,4. Meanwhile, the burden of heart failure (HF) continues to grow with population growth and aging, and HF now affects more than 40 million people worldwide5. AKI is increasingly common among hospitalized patients with cardiovascular disease and occurs in up to 47% of patients with HF6,7. In recent years, growing evidence has also pointed to a strong association between diabetes mellitus (DM) and AKI: DM is an independent risk factor for AKI, and approximately 10–20% of patients with DM develop this condition8,9. Mina et al.10 demonstrated that patients with DM, especially older patients and those with comorbidities such as HF, have a notably higher risk of developing AKI. DM not only increases the risk of HF but is also associated with a poor prognosis, especially in ICU patients11,12. Because DM and HF frequently coexist, their combination substantially increases the risk of AKI. The underlying mechanisms may involve altered renal hemodynamics, inflammatory responses, oxidative stress, and drug toxicity (Fig. 1)13,14.

Fig. 1

Diabetes mellitus combined with heart failure leading to acute kidney injury.

AKI not only increases medical costs and prolongs hospital stays but also raises the risks of death and adverse renal events. The reported mortality among hospitalized patients with AKI is 10.8%15,16. Early identification and diagnosis of AKI therefore provide valuable guidance for clinicians, enabling timely intervention that may reverse early AKI, improve patient prognosis, and reduce AKI-related mortality17.

Traditional AKI assessment tools, such as the Risk, Injury, Failure, Loss of function, End-stage renal disease (RIFLE) criteria and the Acute Kidney Injury Network (AKIN) criteria, are primarily used to diagnose AKI and grade its severity; they are less effective at predicting the risk of developing AKI18. In addition, traditional predictive models such as logistic regression (LR) and Cox proportional hazards models rely on linearity assumptions (on the log-odds or log-hazard scale) and may therefore fail to capture complex nonlinear relationships, which limits their predictive performance19. By contrast, machine learning (ML) can handle complex nonlinear relationships and shows great potential in disease prediction20. ML algorithms can learn complex patterns from large volumes of patient data, identify potential risk factors, and support the construction of prediction models that have performed well for cardiovascular and other diseases20,21,22.

To our knowledge, no risk prediction model for AKI in DM patients with HF currently exists. As of August 2024, a search of PubMed, Google Scholar, Embase, and Web of Science using the keywords “acute kidney injury”, “diabetes”, “heart failure”, “prediction”, and “machine learning” found no published AKI risk prediction model for this population. This study therefore aimed to develop and compare six ML models for predicting the risk of AKI during hospitalization in DM patients with HF and to identify the model with the best predictive performance. Such a model could help clinicians identify high-risk patients and intervene in time to improve patient prognosis.

Methods

Data sources

The data were obtained from the large critical care database MIMIC-IV (version 2.2). Use of the database was approved by the institutional review boards of the Beth Israel Deaconess Medical Center (BIDMC) and the Massachusetts Institute of Technology. The database contains detailed clinical data on patients admitted to the ICUs of BIDMC between 2008 and 2019, including demographic information, laboratory indicators, medication records, and vital signs. To protect privacy, patient identifiers were removed and dates were shifted by an offset; patient consent was therefore not required for this study23. After signing a data use agreement and completing the National Institutes of Health web-based training course and the associated human research protection exam, Guojing Li (certification No.: 12410114) was granted data access and use privileges.

Study population

Patients meeting the following criteria were included: (i) aged 19 to 89 years; (ii) first ICU admission with a hospital stay longer than 24 h; (iii) DM with HF, diagnosed according to International Classification of Diseases, 9th edition (ICD-9) codes 249, 250, and 428 and ICD-10 codes E08, E09, E10, E11, E13, and I50; and (iv) assessment for the development of AKI within 30 days of hospitalization according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria (Table 1)2. Patients were excluded for: (i) no ICU hospitalization record; (ii) follow-up time shorter than 24 h or longer than 30 days; (iii) a previous history of AKI; or (iv) renal replacement therapy (RRT) or continuous RRT (CRRT) on admission. A total of 1,457 patients were included. The patient screening flow chart is shown in Fig. 2.
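The diagnosis-based inclusion criterion (iii) can be sketched as a simple prefix match on ICD codes. This is a minimal illustration, not the study's SQL query: it assumes codes are stored without dots and paired with their ICD version, as in the MIMIC-IV `diagnoses_icd` table (columns `icd_code` and `icd_version`), and the function names are hypothetical.

```python
# Hypothetical sketch: flag DM and HF from the ICD prefixes listed in the text.
# Assumes dot-free codes paired with their ICD version (9 or 10).
DM_PREFIXES = {9: ("249", "250"), 10: ("E08", "E09", "E10", "E11", "E13")}
HF_PREFIXES = {9: ("428",), 10: ("I50",)}


def matches(codes, prefixes):
    """True if any (code, version) pair starts with a prefix for that version."""
    return any(
        code.strip().upper().startswith(prefixes.get(version, ()))
        for code, version in codes
    )


def has_dm_and_hf(codes):
    """codes: iterable of (icd_code, icd_version) pairs for one admission."""
    codes = list(codes)
    return matches(codes, DM_PREFIXES) and matches(codes, HF_PREFIXES)


print(has_dm_and_hf([("25000", 9), ("4280", 9)]))
print(has_dm_and_hf([("E119", 10), ("I10", 10)]))
```

Keying the prefixes by version matters because the same digits can mean different things in ICD-9 and ICD-10.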

Table 1 Definition and staging of acute kidney injury according to the Kidney Disease: Improving Global Outcomes (KDIGO) guideline.
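For readers implementing the outcome definition, the creatinine part of the KDIGO staging can be sketched as follows. This is a simplified, hypothetical helper: it ignores the urine-output criteria and assumes the caller has already determined the baseline, the peak, and the largest 48-h rise within the appropriate time windows.

```python
def kdigo_creatinine_stage(baseline_scr, peak_scr, rise_48h=0.0, started_rrt=False):
    """Return the KDIGO AKI stage (0 = no AKI) from serum creatinine (mg/dL) only.

    baseline_scr: reference creatinine (assumed to lie within the 7-day window).
    peak_scr: highest creatinine observed after the baseline.
    rise_48h: largest absolute rise observed within any 48-h window.
    started_rrt: whether renal replacement therapy was initiated.
    Urine-output criteria are deliberately omitted in this sketch.
    """
    ratio = peak_scr / baseline_scr
    # AKI is present if creatinine rises >= 1.5x baseline or by >= 0.3 mg/dL in 48 h.
    has_aki = ratio >= 1.5 or rise_48h >= 0.3 or started_rrt
    if not has_aki:
        return 0
    # Stage 3: >= 3.0x baseline, creatinine >= 4.0 mg/dL, or initiation of RRT.
    if ratio >= 3.0 or peak_scr >= 4.0 or started_rrt:
        return 3
    # Stage 2: 2.0-2.9x baseline.
    if ratio >= 2.0:
        return 2
    return 1


print(kdigo_creatinine_stage(1.0, 2.5))
```

Requiring AKI before applying the 4.0 mg/dL cut-off keeps a patient with stable, chronically high creatinine from being labeled stage 3.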
Fig. 2

Patient selection process flow chart.

Data extraction and missing value processing

Structured query language (SQL) was used to extract electronic medical record data from the MIMIC-IV database for the first ICU admission of DM patients with HF who met the inclusion criteria. The following characteristics were extracted: (i) demographic characteristics, including age, sex, race, marital status, admission type, length of hospital stay, and in-hospital death; (ii) vital signs, including temperature, heart rate (HR), respiratory rate (RR), blood pressure (BP), and oxygen saturation (SpO2); (iii) laboratory parameters, including serum creatinine, lactate, pH, anion gap, partial pressure of oxygen, partial pressure of carbon dioxide, bicarbonate, base excess, potassium, sodium, chloride, calcium, phosphate, white blood cell count, red blood cell count, platelet count, red blood cell distribution width, mean corpuscular volume, hematocrit, hemoglobin, glucose, blood urea nitrogen, prothrombin time (PT), international normalized ratio (INR), partial thromboplastin time (PTT), triglycerides, N-terminal pro-brain natriuretic peptide (NT-proBNP), aspartate aminotransferase, glycated hemoglobin A1c, and urine output; (iv) comorbidities, including anemia, hyperlipidemia, hypertension, respiratory failure, pneumonia, hypotension, sepsis (Sepsis-3), cerebrovascular disease, atrial fibrillation, atherosclerosis, myocardial infarction, obesity, electrolyte disorders, sleep disorders, pulmonary heart disease, urinary tract obstruction, and urinary tract infection; (v) personal history, including alcoholism and smoking; (vi) surgical history, including percutaneous coronary intervention, revascularization, coronary artery bypass grafting, intra-aortic balloon pump, and urologic surgery; (vii) disease severity scores, including the Glasgow Coma Scale (GCS) score, the Acute Physiology Score III (APSIII), the Oxford Acute Severity of Illness Score (OASIS), the Simplified Acute Physiology Score II (SAPSII), the systemic inflammatory response syndrome (SIRS) criteria, and the Sequential Organ Failure Assessment (SOFA) score; and (viii) drugs and interventions, including antiarrhythmics, diuretics, angiotensin-converting enzyme inhibitors (ACEIs)/angiotensin II receptor blockers (ARBs), aldosterone receptor antagonists, beta-blockers, statins, nonsteroidal anti-inflammatory drugs, insulin, positive inotropic agents, anticoagulants, albumin, calcium gluconate, antibiotics, calcium channel blockers, mechanical ventilation, RRT, and extracorporeal membrane oxygenation (ECMO). The estimated glomerular filtration rate (eGFR, mL/min/1.73 m²) was calculated as $\mathrm{eGFR} = 175 \times \mathrm{Scr}^{-1.154} \times \mathrm{age}^{-0.203} \times 0.742\ (\text{if female})$, where Scr is serum creatinine in mg/dL and age is in years24.
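The eGFR equation above translates directly into code. The sketch below assumes creatinine in mg/dL and age in years and, like the text, omits any race coefficient; the function name is hypothetical.

```python
def egfr_mdrd(scr_mg_dl, age_years, female):
    """eGFR (mL/min/1.73 m^2) = 175 x Scr^-1.154 x age^-0.203 (x 0.742 if female)."""
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    # Sex adjustment applied multiplicatively, as in the equation in the text.
    return egfr * 0.742 if female else egfr


print(round(egfr_mdrd(1.0, 60, female=False), 1))
print(round(egfr_mdrd(1.0, 60, female=True), 1))
```

Because the exponents are negative, eGFR falls as creatinine or age rises, which is why the original text's subtraction signs cannot be right.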

For laboratory indicators measured multiple times within 24 h of admission, the first measurement was used. To mitigate reverse-causation bias, data collected after the occurrence of the outcome event were excluded. To minimize bias from missing data, variables with less than 20% missing values were imputed by multiple imputation (MI) based on random forest (RF), using the mice package in R to generate five imputed datasets25,26. Variables with ≥ 20% missing values that were considered clinically relevant, including lactate, pH, INR, SOFA, and GCS scores, were retained, and their missing data were handled during model training; variables with ≥ 20% missing values that were considered less important for AKI prediction were excluded. Sepsis was diagnosed according to the Sepsis-3 criteria, and other comorbidities were identified from ICD-9 and ICD-10 codes27. The primary endpoint was the occurrence of AKI within 30 days of ICU admission in DM patients with HF.
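The missing-data rule described above (impute below 20% missingness; otherwise keep only clinically relevant variables) can be expressed as a small triage step. This is a hypothetical sketch of the decision logic only; the imputation itself would be done separately, for example with `mice(data, m = 5, method = "rf")` in R, and the example data are invented.

```python
def triage_missing(data, clinically_relevant, threshold=0.20):
    """Sort variables by their fraction of missing values (None).

    data: dict mapping variable name -> list of values for all patients.
    clinically_relevant: names to retain even when missingness >= threshold.
    Returns three lists: (to_impute, keep_without_imputation, drop).
    """
    to_impute, keep, drop = [], [], []
    for name, values in data.items():
        missing_frac = sum(v is None for v in values) / len(values)
        if missing_frac < threshold:
            to_impute.append(name)  # candidate for multiple imputation
        elif name in clinically_relevant:
            keep.append(name)  # retained; missingness handled at training
        else:
            drop.append(name)  # heavily missing and less relevant
    return to_impute, keep, drop


# Hypothetical example with 10 patients.
data = {
    "creatinine": [1.1] * 9 + [None],  # 10% missing
    "lactate": [2.0] * 5 + [None] * 5,  # 50% missing, clinically relevant
    "triglycerides": [150] * 2 + [None] * 8,  # 80% missing
}
print(triage_missing(data, clinically_relevant={"lactate"}))
```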

Statistical methods

A multistep approach was used to develop and validate ML models for predicting AKI risk in DM patients with HF. Normality of continuous data was first assessed with the Shapiro–Wilk test, and descriptive statistics and hypothesis tests were then chosen according to data type. Categorical variables were expressed as percentages (%) and compared using Pearson's chi-square test or Fisher's exact test. Continuous variables were described as mean ± standard deviation when normally distributed and as median [interquartile range, IQR] otherwise, and were compared using Student's t-test or the Wilcoxon rank-sum (Mann–Whitney U) test, respectively28.
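The choice between Pearson's chi-square test and Fisher's exact test for a 2 × 2 table is often made with the rule of thumb that all expected counts should be at least 5. The sketch below applies that rule, which is an assumption since the text does not state the criterion used, and implements both tests with the Python standard library; the function names are hypothetical.

```python
import math


def _table_prob(a, row1, col1, n):
    """Hypergeometric probability of a 2 x 2 table with top-left cell a (fixed margins)."""
    return math.comb(row1, a) * math.comb(n - row1, col1 - a) / math.comb(n, col1)


def compare_proportions(table):
    """Compare two groups on a binary variable.

    table = [[a, b], [c, d]] with rows = groups and columns = variable present/absent.
    Returns (method, chi2_statistic_or_None, two_sided_p).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    expected = [rows[i] * cols[j] / n for i in range(2) for j in range(2)]
    if min(expected) >= 5:
        # Pearson chi-square without continuity correction; for 1 df,
        # P(chi2 > x) = erfc(sqrt(x / 2)).
        chi2 = sum((o - e) ** 2 / e for o, e in zip((a, b, c, d), expected))
        return "chi-square", chi2, math.erfc(math.sqrt(chi2 / 2))
    # Fisher's exact test: sum the probabilities of all tables with the same margins
    # that are no more likely than the observed one (small tolerance for rounding).
    row1, col1 = rows[0], cols[0]
    p_obs = _table_prob(a, row1, col1, n)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [_table_prob(x, row1, col1, n) for x in range(lo, hi + 1)]
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return "fisher", None, min(p, 1.0)


print(compare_proportions([[20, 30], [30, 20]]))
print(compare_proportions([[3, 1], [1, 3]]))
```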

To screen for key risk factors for AKI, LASSO regression followed by stepwise LR was used for feature selection. In the LASSO model, the optimal penalty λ was chosen by cross-validation using the one-standard-error rule, which selects the largest λ whose cross-validated mean squared error lies within one standard error of the minimum, and variables with non-zero coefficients were retained. These variables were then entered into a stepwise LR model with bidirectional selection based on the Akaike information criterion (AIC); odds ratios (ORs) and 95% confidence intervals (CIs) were calculated, and variables with P < 0.05 were considered significant risk factors for AKI29.
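The one-standard-error rule can be stated in a few lines. Given the cross-validation curve (candidate penalties, mean error, and standard error of the mean), the sketch below returns both the error-minimizing λ and the largest λ within one standard error of that minimum; the function name and the inputs in the example are invented.

```python
def lambda_1se(lambdas, cv_mean, cv_se):
    """One-standard-error rule for choosing the LASSO penalty.

    lambdas, cv_mean, cv_se: parallel sequences of candidate penalties, their mean
    cross-validated error, and the standard error of that mean.
    Returns (lambda_min, lambda_1se).
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    cutoff = cv_mean[i_min] + cv_se[i_min]
    # Largest (most regularizing) penalty whose mean error is within 1 SE of the minimum.
    lam_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= cutoff)
    return lambdas[i_min], lam_1se


# Hypothetical cross-validation curve.
lams = [0.001, 0.005, 0.01, 0.02, 0.05]
errs = [0.40, 0.38, 0.37, 0.375, 0.42]
ses = [0.01] * 5
print(lambda_1se(lams, errs, ses))
```

Choosing the larger λ trades a statistically negligible increase in error for a sparser model.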

Based on the selected features, six ML algorithms were used to develop AKI prediction models: LR, light gradient boosting machine (LightGBM), RF, gradient boosting machine (GBM), neural network (NN), and K-nearest neighbors (KNN). To prevent overfitting and optimize model performance, hyperparameters were tuned using 5-fold cross-validation and 100 bootstrap resamples. Model performance was assessed comprehensively with receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA), using metrics including the area under the ROC curve (AUC), sensitivity, specificity, calibration, and net clinical benefit. A nomogram was constructed to visualize the predictions. The best-performing model was then interpreted to identify key risk factors: the contribution of each feature to AKI prediction was quantified using the Shapley additive explanations (SHAP) method, which visually demonstrates the importance of each feature, and the relationships between selected features and the predicted risk were examined with partial dependence plots. Finally, a clinical impact curve (CIC) was plotted for the optimal model to assess its clinical practicality and applicability.
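As an illustration of the 5-fold cross-validation step, the sketch below partitions patient indices into five disjoint folds and yields each training/test pair. It uses simple random folds; stratifying by AKI status, which is common with an imbalanced outcome, is not implemented here, and the seed and function name are arbitrary.

```python
import random


def kfold_indices(n, k=5, seed=2024):
    """Shuffle indices 0..n-1, split them into k nearly equal folds, and yield
    (train_indices, test_indices) with each fold used once as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


for train, test in kfold_indices(1020, k=5):
    print(len(train), len(test))
```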

All statistical analyses and modeling procedures were done using R software (version 4.0.5). A two-sided P value of less than 0.05 was deemed to be statistically significant.

Results

Baseline characteristics

Based on the KDIGO criteria, the 1,457 DM patients with HF were classified into an AKI group (n = 1212) and a non-AKI group (n = 245). Table 2 presents the baseline characteristics of both groups as median (interquartile range) or frequency (percentage). First, racial distribution differed significantly between the groups (P = 0.004). White patients accounted for 63.8% of the total population and 65.1% of the AKI group, compared with 57.1% of the non-AKI group, whereas the proportions of Asian and Black patients were lower in the AKI group, suggesting that race may influence the incidence of AKI. Notably, in-hospital mortality was markedly higher in the AKI group than in the non-AKI group (14.5% vs. 6.1%, P = 0.001), suggesting a link between AKI and adverse outcomes. Regarding comorbidities, the prevalence of anemia was higher in AKI patients than in non-AKI patients (66.3% vs. 55.5%, P = 0.002), as were the prevalences of atrial fibrillation (57.3% vs. 43.3%, P < 0.001), arrhythmias (70.9% vs. 57.1%, P < 0.001), and obesity (31.9% vs. 18.8%, P < 0.001). The prevalence of chronic obstructive pulmonary disease also differed significantly between the AKI and non-AKI groups (18.4% vs. 24.9%, P = 0.024). Urine output was markedly lower in the AKI group than in the non-AKI group (70.00 mL vs. 130.00 mL, P < 0.001), consistent with the role of urine output as a critical indicator of kidney function. Finally, SOFA scores, APACHE II scores, and other severity indicators were significantly higher in the AKI group than in the non-AKI group (P < 0.001), suggesting a close connection between illness severity and the occurrence of AKI.

Table 2 Baseline characteristics of patients in AKI group and non-AKI group.

To construct and validate the prediction models, the 1,457 patients were randomly divided into a training set (n = 1020) and a validation set (n = 437) at a 7:3 ratio (Table 3). All P-values comparing the two sets for demographics, laboratory indices, vital signs, comorbidities, medications, and interventions were greater than 0.05, indicating no significant differences and confirming that the two sets were comparable. Overall, AKI occurred within 30 days of hospitalization in 1,212 patients (83.2%), and 191 patients (13.1%) died during hospitalization. Of the total cohort, 868 patients (59.6%) were male, 929 (63.8%) were white, and 567 (38.9%) were admitted as emergency cases. The median age was 72.8 years [IQR: 64.9–80.4], the median follow-up time was 1.65 days [IQR: 0.71–4.79], and the median eGFR was 58.14 mL/min/1.73 m² [IQR: 38.78–84.79] (P = 0.098). Common comorbidities in this population included anemia (64.5%), atrial fibrillation (55.0%), hyperlipidemia (70.0%), hypertension (88.3%), and sepsis (69.4%).
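A 7:3 random split of this cohort can be illustrated with the sketch below (1,457 × 0.7 rounds to 1,020 training patients, leaving 437 for validation). The seed and function name are hypothetical, and the sketch does not stratify by outcome.

```python
import random


def train_validation_split(ids, train_frac=0.7, seed=123):
    """Randomly shuffle patient IDs and split them into training and validation sets."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_frac * len(ids))
    return ids[:n_train], ids[n_train:]


train, val = train_validation_split(range(1457))
print(len(train), len(val))
```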

Table 3 Baseline characteristics of patients in the training group and the validation group.

Feature selection

The cross-validation results and coefficient paths for LASSO variable screening are shown in Fig. 3. Figure 3a shows how the cross-validated model error changes with the penalty coefficient λ. Figure 3b shows the coefficient paths: the bottom x-axis represents log(λ), the top x-axis indicates the number of variables with non-zero coefficients, and each line traces the coefficient (y-axis) of one variable. As log(λ) increases, coefficients shrink toward 0, and variables whose coefficients reach 0 drop out of the model, which is how LASSO performs feature selection. To obtain a high-performing yet parsimonious model, 10-fold cross-validation was applied to the LASSO model, and the largest λ whose error was within one standard error of the minimum mean squared error was chosen (λ = 0.0198; right vertical dotted line in Fig. 3a), yielding 40 variables. These 40 variables with non-zero coefficients were associated with AKI within 30 days of hospitalization in DM patients with HF.
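The shrinkage behavior seen in the coefficient paths has a closed form in the special case of orthonormal predictors: minimizing ½‖y − Xβ‖² + λ‖β‖₁ soft-thresholds each least-squares coefficient at λ. Real clinical predictors are correlated, so glmnet-style solvers apply this update repeatedly within coordinate descent; the sketch below, with invented coefficients, only shows why coefficients reach exactly 0 as λ grows.

```python
def soft_threshold(z, lam):
    """LASSO solution for one coefficient under an orthonormal design: shrink the
    least-squares estimate z toward 0 by lam, and return 0 if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


ols = {"x1": 0.80, "x2": -0.35, "x3": 0.10}  # hypothetical least-squares coefficients
for lam in (0.0, 0.05, 0.2, 0.5, 1.0):
    path = {name: soft_threshold(b, lam) for name, b in ols.items()}
    n_selected = sum(v != 0 for v in path.values())
    print(f"lambda={lam:<5} selected={n_selected} {path}")
```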

Fig. 3

Coefficient curves and cross-validation plots for acute kidney injury risk-related variable screening based on LASSO regression models.

The 40 variables were then analyzed by stepwise LR, and variables with P ≥ 0.05 were excluded through bidirectional stepwise selection based on AIC. The final prediction models included 20 features identified as independent predictors of AKI in DM patients with HF: admission type, anemia, atrial fibrillation, hypotension, obesity, pneumonia, abnormal creatinine values, antiarrhythmics, diuretics, anticoagulants, ACEI/ARB, aspirin, statins, calcium gluconate, β-lactam antibiotics, RRT, ECMO, APSIII score, sepsis, and a history of cardiovascular surgery. Specific parameters are given in Table 4. Regarding admission type, compared with emergency admission (the reference category), surgical admission was associated with markedly lower odds of AKI (OR = 0.265, P = 0.042), whereas no notable differences were observed for elective, observation, or emergency department admissions. This indicates that admission type is related to AKI risk; emergency admission may reflect more critical conditions, which requires further study. Comorbidities such as anemia, obesity, atrial fibrillation, and sepsis (Sepsis-3) had ORs > 1, suggesting that they are risk factors for AKI. For instance, the odds of AKI in obese patients were 2.252 times those in non-obese patients (OR = 2.252, P < 0.001). Medications such as antiarrhythmics, diuretics, and statins had ORs < 1, indicating an association with a lower risk of AKI; for example, patients taking antiarrhythmics had markedly lower odds of developing AKI than those not taking them (OR = 0.424, P < 0.001). Each 1-point increase in the APSIII score was associated with a 2% increase in the odds of AKI (OR = 1.020, P = 0.003). These 20 predictors were used to build the subsequent ML models.
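Because the reported values are odds ratios, they describe odds rather than risk; with an outcome prevalence of about 83%, odds ratios can differ substantially from risk ratios. The sketch below shows how an OR and its Wald 95% CI follow from a logistic coefficient β and its standard error, and how a per-point OR scales to a k-point change (for example, 1.020¹⁰ ≈ 1.22 for a 10-point rise in APSIII). The coefficient and standard error in the example are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def or_for_k_units(or_per_unit, k):
    """OR for a k-unit increase of a continuous predictor (linear on the log-odds scale)."""
    return or_per_unit ** k


# Hypothetical coefficient and standard error, for illustration only.
print(odds_ratio_ci(beta=0.81, se=0.20))
print(round(or_for_k_units(1.020, 10), 3))  # OR per 10 APSIII points
```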

Table 4 Multivariate logistic regression analysis of risk factors for AKI in patients with diabetes and heart failure.

Construction of the AKI predictive risk nomogram and models

To improve the interpretability and clinical practicality of the models, a nomogram (Fig. 4) was built from the 20 independent predictors identified by LR analysis to predict the 30-day risk of AKI during hospitalization in DM patients with HF. The nomogram assigns points to each predictor value; the patient's total score is the sum of these points, and locating the total score on the risk axis gives the corresponding probability of AKI. Higher total points correspond to a higher risk of AKI. Total points on this nomogram ranged mainly from 115 to 400, corresponding to an AKI risk of 0 to 99.9%. By consolidating multiple predictors into a single score, the nomogram provides an easy-to-use risk assessment tool for clinical application.
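The points system of an LR-based nomogram can be sketched as follows, assuming the common convention that the predictor with the widest range of β·x spans 0–100 points and that the others are scaled proportionally. Converting total points back to the linear predictor then gives the probability through the logistic function. The coefficients, ranges, and patient below are hypothetical, not the values behind Fig. 4.

```python
import math


def build_nomogram(intercept, coefs, ranges):
    """coefs: {name: beta}; ranges: {name: (low, high)} for each predictor.
    Returns (points_fn, probability_from_total_points_fn)."""
    # Smallest and largest contribution beta * x over each predictor's range.
    lo = {k: min(coefs[k] * x for x in ranges[k]) for k in coefs}
    hi = {k: max(coefs[k] * x for x in ranges[k]) for k in coefs}
    scale = max(hi[k] - lo[k] for k in coefs)  # widest effect spans 0-100 points

    def points(name, x):
        return 100.0 * (coefs[name] * x - lo[name]) / scale

    def probability(total_points):
        # Undo the rescaling to recover the linear predictor, then apply the logistic.
        lp = intercept + sum(lo.values()) + total_points * scale / 100.0
        return 1.0 / (1.0 + math.exp(-lp))

    return points, probability


# Hypothetical coefficients (log ORs) and predictor ranges, for illustration only.
intercept = -1.0
coefs = {"obesity": math.log(2.252), "antiarrhythmics": math.log(0.424), "apsiii": math.log(1.020)}
ranges = {"obesity": (0, 1), "antiarrhythmics": (0, 1), "apsiii": (0, 150)}
points, probability = build_nomogram(intercept, coefs, ranges)

patient = {"obesity": 1, "antiarrhythmics": 0, "apsiii": 60}
total = sum(points(name, value) for name, value in patient.items())
print(f"total points = {total:.1f}, predicted AKI risk = {probability(total):.3f}")
```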

Fig. 4

Nomogram of predicted 30-day risk of acute kidney injury in patients with diabetes mellitus combined with heart failure. ACEI/ARB: angiotensin-converting enzyme inhibitor/angiotensin II receptor blocker, RRT: renal replacement therapy, ECMO: extracorporeal membrane oxygenation, APSIII: acute physiology score III.

Six ML models (GBM, KNN, LightGBM, LR, NN, and RF) were constructed to predict the risk of AKI during hospitalization in DM patients with HF. The hyperparameters of each model were optimized using five-fold cross-validation and 100 bootstrap resamples. The final optimal parameters were as follows: RF, ntree = 405, mtry = 5, maxnodes = 93, and nodesize = 4; NN, a three-layer structure with 21 input features, 28 nodes in the hidden layer, and 1 output node; LR, size = 2, decay = 2, and maxit = 8778; GBM, n.trees = 200, interaction.depth = 1, and shrinkage = 0.3; KNN, k = 13; and LightGBM, learning_rate = 0.0195, num_leaves = 37, n_estimators = 156, feature_fraction = 0.945, bagging_fraction = 0.745, and bagging_freq = 3.
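The bootstrap part of the tuning procedure can be illustrated by drawing resamples with replacement and keeping the out-of-bag patients for evaluation. How each resample was scored is not stated in the text, so using out-of-bag patients as the evaluation set is an assumption of this sketch, and the function name and seed are hypothetical. On average about 36.8% (≈ e⁻¹) of patients are out of bag in each resample.

```python
import random


def bootstrap_samples(n, n_boot=100, seed=7):
    """Yield (in_bag, out_of_bag) index lists for n_boot bootstrap resamples.

    in_bag: n indices drawn with replacement (the resample used for fitting).
    out_of_bag: indices never drawn, usable for evaluating that fit.
    """
    rng = random.Random(seed)
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]
        chosen = set(in_bag)
        out_of_bag = [i for i in range(n) if i not in chosen]
        yield in_bag, out_of_bag


samples = list(bootstrap_samples(1020, n_boot=100))
mean_oob = sum(len(oob) for _, oob in samples) / (100 * 1020)
print(f"mean out-of-bag fraction: {mean_oob:.3f}")
```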

Performance evaluation and validation of AKI risk prediction models

AUC was used to evaluate model discrimination, and calibration was assessed with calibration curves (Table 5; Fig. 5). All six ML models showed good discrimination in both the training and validation sets, with AUCs above 0.75. In the training set, the LightGBM and RF models performed best, with AUCs of 0.973 and 0.987, respectively. However, the AUC of the RF model fell to 0.799 in the validation set, suggesting some degree of overfitting. The LightGBM model achieved the highest AUC in the validation set (0.804), indicating stronger generalization ability. The calibration curves of all models lay close to the reference line, and the Hosmer–Lemeshow test yielded P-values greater than 0.05, indicating no significant difference between predicted and observed probabilities and thus good calibration.
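For reference, the AUC reported above equals the probability that a randomly chosen patient with AKI receives a higher predicted score than a randomly chosen patient without AKI, which is the normalized Mann–Whitney U statistic. The quadratic-time sketch below computes it directly from that definition; it is illustrative, the function name is hypothetical, and it is not the software used in the study.

```python
def auc_mann_whitney(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative), ties = 1/2.

    O(n_pos * n_neg) pairwise comparison; adequate for a few thousand patients.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


print(auc_mann_whitney([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```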

Table 5 Assessment of the six prediction models.
Fig. 5

ROC curve and calibration curve of the nomogram of the risk of acute kidney injury in patients with diabetes mellitus combined with heart failure in the training and validation sets. (a) ROC curve of the nomogram in the training set; (b) ROC curve of the nomogram in the validation set; (c) calibration curve of the nomogram in the training set; (d) calibration curve of the nomogram in the validation set. ROC: receiver operating characteristic curve, AUC: area under the ROC curve, CI: confidence interval, TPR: true positive rate, FPR: false positive rate, HLGOF: Hosmer-Lemeshow goodness-of-fit test.

DCA was used to evaluate the clinical utility of each model for predicting AKI risk in DM patients with HF (Fig. 6). All models yielded a higher net benefit than the treat-all and treat-none strategies across a wide range of threshold probabilities, supporting their clinical usefulness. The LightGBM model showed the highest net benefit in both the training and validation sets, indicating the best clinical performance.
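Decision curves plot the net benefit NB(pt) = TP/n − (FP/n) × pt/(1 − pt) at each threshold probability pt, against the treat-all and treat-none (NB = 0) references. The sketch below computes both quantities; the function names and example data are hypothetical. With the overall prevalence of 83.2%, the treat-all curve equals 0.832 − 0.168 × pt/(1 − pt) and stays positive until pt ≈ 0.83, so the comparison is most informative at higher thresholds.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold (< 1)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone; the treat-none reference is always 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.7, 0.1]
print(net_benefit(y, p, 0.5), net_benefit_treat_all(y, 0.5))
```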

Fig. 6

Clinical decision curve of the nomogram of the risk of acute kidney injury in patients with diabetes mellitus combined with heart failure in the training and validation sets. (a–f) Clinical decision curves of the nomogram in the training set; (g–l) clinical decision curves of the nomogram in the validation set.

SHAP values were used to quantify the importance of the top 15 features in the LightGBM model (Fig. 7). ACEI/ARB use had the greatest impact on AKI risk prediction, followed by cardiovascular surgery, diuretics, APSIII score, and aspirin. These features deserve particular attention in clinical risk assessment and treatment planning.

Fig. 7

Results of SHAP analysis. (a) Summary chart showing the impact of the top 15 features in the LightGBM model on acute kidney injury prediction, calculated from SHAP values; (b) bar chart ranking the features of the LightGBM model by their importance for acute kidney injury prediction. In the summary chart, color indicates the feature value (yellow for high, purple for low), and each point represents a sample. SHAP: Shapley additive explanations, ACEI/ARB: angiotensin-converting enzyme inhibitor/angiotensin II receptor blocker, APSIII: acute physiology score III.

The SHAP plot, based on the LightGBM model, illustrates the contribution of each feature to the model's predictions. Unlike the nomogram, the SHAP plot is not intended for directly calculating risk probabilities; rather, it identifies the features with the greatest influence on the model's predictions. A positive SHAP value indicates that a feature increases the predicted AKI risk, a negative value indicates that it decreases the predicted risk, and a value of zero indicates no contribution.
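To make the definition behind SHAP concrete, the sketch below computes exact Shapley values for a single prediction by enumerating all feature coalitions, replacing absent features with a baseline value. This brute-force version is practical only for a handful of features; SHAP implementations for tree ensembles such as LightGBM use faster algorithms (e.g. TreeSHAP) and background distributions rather than a single baseline. The function name and toy model are hypothetical. By construction, the values sum to f(x) − f(baseline).

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction; cost grows as 2^M for M features.

    Features outside a coalition are set to their baseline value.
    """
    m = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return model(z)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model with an interaction term.
def toy_model(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[1]


print(shapley_values(toy_model, [1, 1], [0, 0]))
```

The interaction term is split equally between the two features, which is why the values come out as 2.5 and 3.5 rather than 2 and 3.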

To examine the association between key features and AKI risk in more depth, partial dependence plots were drawn for four important features: ACEI/ARB use, diuretic dose, atrial fibrillation, and APSIII score (Fig. 8). ACEI/ARB use was associated with a lower predicted risk of AKI, and this protective effect may be dose-dependent. Diuretics showed a bidirectional effect, with low doses potentially reducing and high doses potentially increasing the risk. Atrial fibrillation increased the predicted risk, consistent with its identification as an independent risk factor for AKI, and the APSIII score was positively correlated with AKI risk. These results indicate that the LightGBM model can identify key predictors and reveal their complex relationships with AKI risk, providing a reference for clinical risk assessment and personalized treatment.
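A partial dependence curve is the model's average prediction when one feature is set to a fixed value for every patient while all other features keep their observed values. The sketch below implements that definition for any prediction function; the function name, toy scoring function, and feature names are hypothetical. Because it averages predictions, the curve describes the model's learned association rather than a causal effect.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` overwritten by each grid value.

    predict: function mapping a dict of feature values to a prediction.
    rows: list of dicts, one per patient (observed feature values).
    """
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical toy scoring function (not a probability): rises with APSIII and AF.
def toy_predict(r):
    return 0.1 * r["apsiii"] + r["af"]


rows = [{"apsiii": 10, "af": 0}, {"apsiii": 50, "af": 1}]
print(partial_dependence(toy_predict, rows, "apsiii", [0, 100]))
```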

Fig. 8

Partial dependence plots of the LightGBM model used to predict the risk of acute kidney injury within 30 days of hospitalization in patients with diabetes mellitus combined with heart failure. SHAP: Shapley additive explanations, ACEI/ARB: angiotensin-converting enzyme inhibitor/angiotensin II receptor blocker, APSIII: acute physiology score III.

A clinical impact curve (CIC) was constructed to evaluate the clinical applicability of the LightGBM model (Fig. 9). As expected, the number of patients classified as high risk decreased as the high-risk threshold increased. The curves for the training and validation sets followed largely consistent trends, suggesting good generalization without noticeable overfitting. Both curves reached a favorable balance at a threshold of approximately 0.7, at which false positives and unnecessary medical interventions were reduced while a sufficient number of truly high-risk patients were still identified, reflecting a good cost-benefit ratio.
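A clinical impact curve can be computed by counting, at each threshold, how many patients the model flags as high risk and how many of them actually develop AKI, commonly rescaled to a population of 1,000. The sketch below follows that definition; the function name and example data are hypothetical.

```python
def clinical_impact(y_true, probs, thresholds, population=1000):
    """Return (threshold, n_high_risk, n_high_risk_with_event) for each threshold,
    rescaled to a hypothetical population of `population` patients."""
    n = len(y_true)
    results = []
    for t in thresholds:
        flagged = [y for y, p in zip(y_true, probs) if p >= t]
        results.append((t, population * len(flagged) / n, population * sum(flagged) / n))
    return results


print(clinical_impact([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.2], [0.3, 0.5, 0.7]))
```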

Fig. 9

Clinical impact curve for the LightGBM model. (a) Training set; (b) validation set. The red curve shows the number of patients classified as high risk by the model (number high risk); the blue curve shows the number of those high-risk patients who experienced the event (number high risk with event). The clinical impact curves support the clinical value of the LightGBM model.

Discussion

The clinical data of 1,457 DM patients with HF from the MIMIC-IV database were retrospectively analyzed, and ML models were constructed to predict the risk of AKI during hospitalization. The LightGBM model performed best, with the highest validation AUC, good calibration, and the highest net clinical benefit, and it showed good discrimination and generalization ability. To our knowledge, this is the first study to construct models for predicting AKI risk in this population using common clinical indicators, and it offers a tool for the early identification of high-risk patients and timely intervention in clinical practice.

Analysis of study results and exploration of risk factors

This study included 1,457 DM patients with HF and investigated the risk of AKI and its predictors. AKI developed within 30 days of hospitalization in 83.2% of patients, indicating that AKI is very common in this population and warrants early identification and intervention.

Twenty independent predictors of AKI risk were identified by LASSO regression and stepwise logistic regression. Independent risk factors for AKI included emergency admission, abnormal creatinine values, anemia, atrial fibrillation, hypotension, obesity, pneumonia, APSIII score, sepsis, and a history of cardiovascular surgery. These predictors may promote AKI mainly through two mechanisms: impaired renal perfusion (e.g., hypotension, anemia, and cardiovascular surgery)30,31,32,33 and inflammatory responses or direct kidney injury (e.g., sepsis, pneumonia, and certain drugs)34,35. Atrial fibrillation can promote thrombus formation in the atria; dislodged thrombi may cause renal artery embolism, blocking renal blood flow and increasing the risk of AKI. Obesity impairs kidney function through mechanisms such as chronic inflammation, insulin resistance, and oxidative stress36,37. The APSIII score reflects the severity of illness, which is often accompanied by shock, sepsis, and other conditions, so higher scores indicate a higher risk of AKI. Emergency admission often signals critical illness with possible hemodynamic instability, infection, or shock, all of which may compromise renal perfusion. Conversely, the use of antiarrhythmic drugs, ACEI/ARB drugs, aspirin, statins, β-lactam antibiotics, calcium gluconate, RRT, and ECMO was associated with a reduced risk of AKI in DM patients with HF. Antiarrhythmic drugs may reduce the risk of AKI caused by cardiogenic shock by controlling arrhythmia, improving cardiac function, and stabilizing blood pressure. ACEI/ARB drugs can delay the progression of chronic kidney disease by lowering blood pressure and reducing proteinuria.
In addition, although aspirin and statins are primarily used to prevent cardiovascular events, their beneficial effects on vascular endothelial function may indirectly reduce the risk of thrombosis-related renal artery embolism. β-Lactam antibiotics can control bacterial infections and thereby reduce the associated risk of AKI. Calcium gluconate may improve kidney function by regulating calcium and phosphorus metabolism. RRT can reduce the burden on the kidneys. In critically ill patients, ECMO can support cardiopulmonary function, improve systemic circulation, and maintain blood pressure, reducing the risk of AKI due to shock and other factors. Notably, diuretics have a dual effect on AKI risk: rational use can reduce cardiac load and improve renal perfusion, whereas excessive use may cause dehydration and increase the risk of AKI.

Beyond these 20 predictors, novel biomarkers, genetic factors, and the gut microbiota may also influence AKI risk in DM patients with HF and may guide future refinement of AKI risk prediction models. For example, urinary biomarkers (such as NGAL, KIM-1, and IL-18) and serum biomarkers (such as cystatin C and β2-microglobulin) can detect kidney injury earlier and more sensitively, and incorporating them into prediction models is expected to improve predictive accuracy38,39,40,41. Polymorphisms in genes such as TNF-α, IL-6, and eNOS may affect individual susceptibility to AKI, so genetic testing and genetic risk assessment could support individualized AKI prevention42. Dysbiosis of the gut microbiota can impair intestinal barrier function, allowing bacteria and their metabolites to enter the bloodstream; the resulting systemic inflammatory response can damage the kidneys and increase the risk of AKI.

With the advancement of artificial intelligence, ML is increasingly used in medicine. Its advantage lies in its ability to analyze large volumes of existing data and extract clinically useful patterns. In AKI prediction, for example, ML can model complex relationships, improve prediction accuracy, enable personalized prediction, and support clinical decision-making43. In this study, ML was used to build models for predicting AKI risk in DM patients with HF, and the performance of six common algorithms (GBM, KNN, LightGBM, LR, NN, and RF) was compared. The LightGBM model performed best, achieving the highest AUC in both the training set (0.973) and the validation set (0.804), and it showed excellent calibration and clinical utility.
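Model comparison of this kind rests on the AUC, which equals the probability that a randomly chosen patient who develops AKI receives a higher predicted risk than a randomly chosen patient who does not. The sketch below computes the AUC directly from that definition (ties count as half, matching the Mann-Whitney formulation) and picks the model with the highest validation AUC. The prediction scores are invented toy values, not the study's data, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model(labels, predictions):
    """Return (model name, AUC) for the model with the highest validation AUC."""
    aucs = {name: roc_auc(labels, scores) for name, scores in predictions.items()}
    name = max(aucs, key=aucs.get)
    return name, aucs[name]


# Hypothetical validation outcomes (1 = AKI) and toy predicted risks from three models.
y_val = [0, 0, 1, 1, 0, 1]
preds = {
    "LR": [0.2, 0.6, 0.5, 0.7, 0.3, 0.4],
    "RF": [0.1, 0.4, 0.35, 0.8, 0.2, 0.6],
    "LightGBM": [0.1, 0.3, 0.7, 0.9, 0.2, 0.6],
}
winner = best_model(y_val, preds)
```

The pairwise loop is O(n_pos × n_neg), which is fine for illustration; for large cohorts a rank-based computation gives the same value faster.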

For convenience in clinical application, this study also constructed a nomogram based on the LR model to help clinicians rapidly assess patients' AKI risk and formulate individualized prevention and treatment plans.
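A nomogram is a graphical re-expression of an LR model's linear predictor: each predictor value is mapped to points, the points are summed, and the total is mapped back to a predicted probability. The sketch below assumes a common scaling convention in which the predictor with the widest effect range, |β| × (max − min), spans 0 to 100 points. The intercept, coefficients, predictor ranges, and patient values are hypothetical and are not the study's fitted model.

```python
import math


def nomogram(intercept, coefs, ranges):
    """Set up the points scale for a logistic regression nomogram.

    coefs: {name: beta}; ranges: {name: (min, max)} for each predictor.
    Returns (reference values, max_effect, linear predictor at 0 total points).
    """
    refs = {}
    for name, beta in coefs.items():
        lo, hi = ranges[name]
        # The end of the range with the lowest linear contribution scores 0 points.
        refs[name] = lo if beta >= 0 else hi
    max_effect = max(abs(b) * (ranges[n][1] - ranges[n][0]) for n, b in coefs.items())
    base_lp = intercept + sum(b * refs[n] for n, b in coefs.items())
    return refs, max_effect, base_lp


def patient_points(patient, coefs, refs, max_effect):
    """Points per predictor: 100 * beta * (x - ref) / max_effect (non-negative within range)."""
    return {n: 100 * b * (patient[n] - refs[n]) / max_effect for n, b in coefs.items()}


def points_to_probability(total_points, max_effect, base_lp):
    """Map total points back to the linear predictor, then apply the logistic function."""
    lp = base_lp + total_points * max_effect / 100
    return 1 / (1 + math.exp(-lp))


# Hypothetical model: intercept and coefficients are illustrative only.
intercept = -3.0
coefs = {"hypotension": 1.2, "age": 0.03, "statin": -0.5}
ranges = {"hypotension": (0, 1), "age": (40, 90), "statin": (0, 1)}
patient = {"hypotension": 1, "age": 70, "statin": 1}

refs, max_effect, base_lp = nomogram(intercept, coefs, ranges)
pts = patient_points(patient, coefs, refs, max_effect)
prob = points_to_probability(sum(pts.values()), max_effect, base_lp)
```

Here age has the widest effect range (0.03 × 50 = 1.5), so it defines the 100-point scale. The hypothetical patient scores 80 points for hypotension, 60 for age, and 0 for statin use, and the resulting probability equals the one obtained by plugging the same values directly into the logistic equation.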

Comparison with previous studies

Previous studies have applied ML models to AKI risk prediction, but they have focused primarily on the general population or on specific populations, such as patients with sepsis44 and patients undergoing cardiac surgery45,46. By contrast, few studies have addressed AKI risk prediction in DM patients with HF. To our knowledge, this study is the first to develop and validate AKI risk prediction models for DM patients with HF using the MIMIC-IV database. In addition, AKI risk predictors were screened for this population, providing a new reference for clinical practice.

Compared with previous studies, this study has several innovations and strengths. (i) It is highly targeted, focusing on the high-risk population of DM patients with HF, which facilitates early identification of and intervention for AKI. (ii) The data source is reliable: the large, publicly available, high-quality MIMIC-IV database. (iii) Model construction and validation were methodologically rigorous. ML algorithms were used to construct the models, and cross-validation and bootstrap resampling were used for parameter optimization to ensure model stability and generalizability. (iv) The results were visualized: a nomogram and SHAP analyses were provided for ease of clinical application.
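Cross-validation evaluates each candidate hyperparameter setting on data the model was not trained on, so that every patient serves as validation data exactly once. The sketch below shows stratified k-fold splitting, which keeps the AKI proportion similar across folds; stratification, the fold count, and the seed are assumptions for illustration, since the study's exact resampling settings are not restated here, and the bootstrap step is not shown. The function names are hypothetical.

```python
import random


def stratified_kfold(labels, k, seed=0):
    """Split sample indices into k folds, preserving the outcome ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's shuffled indices round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return [sorted(f) for f in folds]


def cv_splits(labels, k, seed=0):
    """Yield (train_indices, validation_indices) pairs; each sample is validated once."""
    folds = stratified_kfold(labels, k, seed)
    for v in range(k):
        train = sorted(i for f, fold in enumerate(folds) if f != v for i in fold)
        yield train, folds[v]


# Hypothetical cohort of 25 patients with a 20% AKI rate (1 = AKI).
labels = [0] * 20 + [1] * 5
splits = list(cv_splits(labels, k=5))
```

In a tuning loop, a model would be fitted on each `train` set for every candidate hyperparameter value, scored (for example by AUC) on the matching validation fold, and the value with the best average score retained.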

Theoretical significance and clinical application value of the study

This study demonstrated the feasibility and effectiveness of ML for predicting AKI risk in DM patients with HF. It identified multiple predictors associated with the occurrence of AKI, providing a foundation for a more comprehensive understanding of AKI pathogenesis and for the development of new prevention and treatment strategies. Compared with traditional AKI risk assessment methods, the LightGBM prediction model and nomogram developed in this study can provide more accurate and objective risk estimates. These can help healthcare professionals promptly identify high-risk patients and implement timely preventive measures (such as optimizing fluid management and avoiding nephrotoxic drugs), thereby reducing the incidence of AKI and promoting rational allocation of healthcare resources.

Study limitations

Although the present study yielded meaningful results, it has several limitations. (i) Owing to its retrospective design, the study may be subject to selection bias and information bias, and the effect of potential confounders cannot be completely excluded. In addition, the MIMIC-IV database primarily includes critically ill patients, so the findings may not generalize to other populations. (ii) The data came from a single center, BIDMC in the United States, and no data were collected from domestic hospitals for DM patients with HF. The generalizability of the model needs to be further validated in other populations and healthcare institutions. (iii) The model has not been externally validated. Its stability and reliability should be assessed with data from other databases to further evaluate its predictive performance and clinical value. (iv) The number of features was limited. Owing to limitations of the MIMIC-IV database, some AKI-related variables, such as cystatin C and renin, were not included in the model. (v) Some variables had high proportions of missing values. Although this study used multiple imputation (MI) to handle missing values, missingness may still have affected the results. Other potentially relevant variables, such as triglycerides, cholesterol, glycated hemoglobin, and NT-proBNP, were excluded because of excessive missingness.
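When MI produces m completed datasets that are analyzed separately, the results are commonly combined with Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance is T = W + (1 + 1/m)·B, where W is the average within-imputation variance and B is the between-imputation variance. Whether the study pooled results in this way is not stated; the sketch below only illustrates how the extra uncertainty from imputation enters the variance, using hypothetical numbers and a hypothetical function name.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates: per-dataset point estimates; variances: their squared standard errors.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputed datasets")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance W
    between = variance(estimates)  # between-imputation variance B (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log-odds estimates for one predictor from three imputed datasets.
pooled, total_var = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

Here W = 0.04 and B = 0.04, so the total variance is about 0.093, more than twice the average within-imputation variance. This shows why analyzing a single imputed dataset can understate uncertainty when imputed values vary substantially across imputations.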

Conclusion

In conclusion, the ML models built in this study, especially the LightGBM model, demonstrated good predictive performance for assessing in-hospital AKI risk in DM patients with HF. Multiple independent risk factors were identified, providing a new reference for clinical risk assessment. Larger, multicenter prospective studies are required to further validate and optimize the model and to develop more accurate, reliable, and practical AKI risk prediction tools that better support clinical decision-making.

Tables.