Machine learning model for early prediction of acute kidney injury in heatstroke patients based on the first 24 h hospitalization data

Ding, Xiaonan; Wang, Min; Wang, Lu; Li, Yun; Yan, Lei; Li, Lu; Niu, Yue; Du, Junxia; Duan, Yingjie; Chen, Fei; Song, Chenwen; Kang, Hongjun; Zhu, Hanyu

doi:10.1038/s41598-025-17590-1

Published: 26 September 2025

Scientific Reports volume 15, Article number: 33085 (2025)

Abstract

With the increasing frequency and intensity of heatwaves driven by climate change, heatstroke has emerged as a growing public health concern. As the most severe form of heat-related illness, heatstroke is frequently complicated by acute kidney injury (AKI), a major contributor to poor prognosis. Although AKI often develops in later stages, early detection is essential to reduce morbidity and mortality. This study aimed to develop and validate machine learning models to predict AKI using clinical data from the first 24 h of hospitalization, enabling timely intervention and improved outcomes. We retrospectively collected data from 290 heatstroke patients admitted to 55 hospitals in China between 2008 and 2024. Variables included demographics, clinical features, comorbidities, vital signs, laboratory results, treatments, and complications. Data from the first 24 h of hospitalization were analyzed using univariate analysis, ROC curves, and collinearity testing to identify key predictors. These variables were used to build logistic regression and five machine learning models (Naive Bayes, decision tree, kNN, SVM, and XGBoost), with 20-fold cross-validation applied to reduce overfitting. The cohort was predominantly male (90.69%) with a median age of 25 [21, 41] years, and AKI occurred in 57.93% of patients. Within the first 24 h of hospitalization, the AKI group showed significantly higher core temperatures and heart rates compared to the non-AKI group. They also exhibited elevated renal function markers, coagulation and inflammatory indicators, as well as more pronounced liver dysfunction and rhabdomyolysis. Logistic regression and five machine learning algorithms were applied to predict AKI occurrence using early clinical data. Among them, the kNN model achieved the best performance (AUC = 0.934 [0.909, 0.959]), with troponin T (TnT), D-dimer, myoglobin (Mb), and hematocrit (HCT) identified as key predictive features.
Based on clinical data from the first 24 h of hospitalization, the kNN model demonstrated the highest predictive performance for identifying heatstroke patients at risk of a rapid rise in serum creatinine or oliguria during hospitalization. TnT, D-dimer, Mb, and HCT were identified as key predictive variables.

Introduction

As global warming progresses, the incidence of heat-related illnesses is reported to be increasing yearly, with an estimated 500,000 additional deaths worldwide each year^1,2. Heat-related illnesses encompass a continuum that includes heat edema, heat syncope, heat cramps, heat exhaustion, and the most severe form, heatstroke³. Clinically, heatstroke is characterized by central nervous system dysfunction, multiorgan failure, and extreme hyperthermia (usually > 40.5 °C)⁴.

Heatstroke can be classified as either classic or exertional, depending on its cause. Both types involve an imbalance between the body’s heat production and heat dissipation, though their underlying mechanisms differ⁵. Classic heatstroke (CHS) results from passive exposure to environmental heat and inadequate heat-dissipation mechanisms. In contrast, exertional heatstroke (EHS) occurs due to exposure to a hot environment during physical exercise, resulting when excessive metabolic heat production overwhelms the body’s physiological heat-loss mechanisms⁶. EHS affects mainly athletes, military personnel, firefighters, and occupational workers. For CHS, older adults are particularly vulnerable, especially those with common age-associated chronic health conditions (e.g., cardiovascular disease, hypertension, obesity, type 2 diabetes, chronic kidney disease)⁷. Heatstroke is a life-threatening condition that can ultimately progress to multiple organ failure and is associated with a reported 28-day mortality rate of up to 58%⁸. Currently, the primary treatment modalities for heatstroke include hypothermia control, rehydration therapy, and hemodialysis. While significant research has been conducted on symptomatic therapies for heatstroke, most of these approaches remain at various preclinical stages³. Therefore, management of heatstroke primarily focuses on prevention and early intervention to prevent the progression of the disease.

Rhabdomyolysis and heat-induced inflammatory damage both significantly elevate the risk of acute kidney injury (AKI) in heatstroke patients³. Previous studies have demonstrated that heatstroke complicated by AKI is associated with higher hospitalization costs and worse clinical outcomes⁹. Despite this urgency, AKI typically manifests in the later stages of the disease, and there remains a lack of studies specifically addressing the early prediction of AKI in heatstroke patients. To bridge this gap, this study introduces machine learning models designed to predict AKI incidence in heatstroke patients using clinical data obtained during the first 24 h of hospitalization.

Methods

Patients and study design

Data were collected from 55 hospitals in China between 2008 and 2024. After applying the inclusion and exclusion criteria, a total of 290 patients with heatstroke were enrolled in the study (Fig. 1)¹⁰. The inclusion criteria for this study were as follows: (1) a history of exposure to high-temperature environments and/or participation in high-intensity manual labor; (2) an axillary temperature above 39 °C; (3) evidence of central nervous system dysfunction, including symptoms such as delirium, coma, impaired consciousness, or disorientation^5,6; and (4) a hospital stay of more than 24 h.

The exclusion criteria were as follows: (1) age under 18 years; (2) pre-existing comorbidities prior to heatstroke onset, including diabetes, cerebral infarction, pulmonary infection, chronic kidney disease, and dementia; and (3) more than 30% missing data in the patient's records.

Ethical considerations

The study was conducted by the PLA General Hospital and received approval from the ethics committees of all participating institutions. Each patient underwent comprehensive, condition-specific treatment, which included body cooling, fluid administration, and anti-inflammatory measures. For those diagnosed with rhabdomyolysis and AKI, organ support was provided as needed, in accordance with clinical guidelines. This support included appropriate hydration, urine alkalization, and, when necessary, continuous renal replacement therapy (CRRT), along with other interventions⁹.

Definitions

1. AKI: Defined according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria as any one of the following: (1) an increase in serum creatinine (Scr) by ≥ 26.5 μmol/L (≥ 0.3 mg/dL) within 48 h; (2) an increase in Scr to ≥ 1.5 times the baseline within 7 days; (3) urine output < 0.5 mL/kg/h for 6 h¹¹.
2. Rhabdomyolysis: Characterized by acute muscle weakness, myalgia, and muscle swelling combined with a creatine kinase (CK) cut-off value of > 1000 IU/L or CK > 5 × upper limit of normal for the standard definition of rhabdomyolysis. Additionally measured myoglobinuria and AKI indicate a severe type of rhabdomyolysis^12,13.
3. Sequential organ failure assessment (SOFA) score: A validated tool used to quantify the extent of organ dysfunction in critically ill patients. It evaluates six organ systems (respiratory, cardiovascular, hepatic, coagulation, renal, and neurological), each assigned a score ranging from 0 (normal function) to 4 (severe dysfunction). The total SOFA score, ranging from 0 to 24, reflects the overall severity of organ failure, with higher scores associated with increased mortality (Supplementary Table 1)¹⁴.
4. Disseminated intravascular coagulation (DIC): Diagnosed based on a combination of clinical manifestations and laboratory findings that reflect systemic activation of the coagulation cascade. The International Society on Thrombosis and Haemostasis (ISTH) has proposed a widely accepted scoring system to identify overt DIC. The scoring system incorporates four parameters: platelet count, prolongation of prothrombin time (PT), levels of fibrin-related markers (such as D-dimer or fibrin degradation products), and fibrinogen concentration (Fib). Each parameter is assigned a score, and a cumulative score of ≥ 5 is considered indicative of overt DIC (Supplementary Table 2)¹⁵.
5. Effective cooling: Defined as the reduction of core body temperature to below 38.5 °C within 30–60 min of initiating treatment. This threshold is widely accepted to prevent irreversible neurological injury and multi-organ dysfunction. Commonly employed cooling strategies include cold-water immersion, evaporative cooling, ice blanket therapy, intravascular temperature management, and extracorporeal methods such as cold hemodialysis or high-flow continuous hemodiafiltration^5,16.
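The KDIGO-based AKI definition can be expressed as a simple rule check. The following is a minimal sketch, not the authors' code: the function name, argument names, and the choice to pass pre-computed peak values are illustrative assumptions, and Scr is assumed to be in μmol/L.

```python
from typing import Optional


def meets_kdigo_aki(baseline_scr: float,
                    peak_scr_48h: Optional[float] = None,
                    peak_scr_7d: Optional[float] = None,
                    urine_ml_kg_h_6h: Optional[float] = None) -> bool:
    """Return True if any one of the three KDIGO criteria is met.

    All names are hypothetical. Scr values are in umol/L;
    urine_ml_kg_h_6h is the mean urine output over a 6 h window.
    Missing measurements (None) are simply skipped.
    """
    # Criterion 1: absolute Scr rise of >= 26.5 umol/L within 48 h.
    if peak_scr_48h is not None and peak_scr_48h - baseline_scr >= 26.5:
        return True
    # Criterion 2: Scr reaching >= 1.5 times baseline within 7 days.
    if peak_scr_7d is not None and peak_scr_7d >= 1.5 * baseline_scr:
        return True
    # Criterion 3: urine output < 0.5 mL/kg/h for 6 h.
    if urine_ml_kg_h_6h is not None and urine_ml_kg_h_6h < 0.5:
        return True
    return False
```

In practice the baseline Scr and the measurement windows would need to be defined from the clinical record; that step is outside this sketch.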

Statistical analysis

All statistical analyses were performed using RStudio (version 2024.12.1) running R version 4.4.2 (R Core Team, 2024, https://www.r-project.org/). Prior to any analysis, data preprocessing was carried out to handle missing values and ensure data quality. Variables with more than 30% missing values were excluded from further analysis. For variables with less than 30% missingness, multiple imputation was performed using the “mice” package (version 3.18.0) to reduce bias and maximize statistical power^17,18,19.

Continuous variables were first assessed for normality using the Shapiro–Wilk test. Normally distributed variables are reported as means with standard deviations (mean ± SD), while non-normally distributed variables are expressed as medians with interquartile ranges (median [Q1, Q3]). Categorical variables are presented as frequencies and percentages. For group comparisons, the two-independent-samples t-test was applied to normally distributed continuous variables, and the Mann–Whitney U test was used for non-normally distributed data. The chi-square (χ²) test was employed to assess associations between categorical variables. A two-tailed p value < 0.05 was considered statistically significant.

Univariate logistic regression analyses were then conducted for all candidate predictor variables to explore their associations with the outcome of interest. Variables with p values < 0.05 in the univariate analysis were considered for inclusion in the multivariable logistic regression model. Prior to multivariable modeling, multicollinearity was assessed using the variance inflation factor (VIF); variables with a VIF greater than 5 were considered to have potential multicollinearity and were reviewed accordingly.
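To illustrate the multicollinearity check, the sketch below computes VIFs for a set of predictor columns as the diagonal of the inverse of their correlation matrix, which is equivalent to 1 / (1 − R²) from regressing each predictor on the others. It is written in plain Python for illustration only; the study itself used R, and the function names here are assumptions rather than the authors' implementation.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


def flag_collinear(names, columns, threshold=5.0):
    """Names of predictors whose VIF exceeds the threshold (5 in the text)."""
    return [n for n, v in zip(names, vif(columns)) if v > threshold]
```

Uncorrelated predictors give a VIF of 1, and the value grows without bound as a predictor becomes a linear combination of the others.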

To further evaluate the predictive performance of significant variables, receiver operating characteristic (ROC) curve analyses were conducted for those identified as significant in univariate testing. Variables with an area under the curve (AUC) ≥ 0.7 were retained as candidates for model development²⁰. The Youden Index was used to determine the optimal cutoff points by maximizing the sum of sensitivity and specificity.
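As an illustration of the screening step, the sketch below computes the AUC of a single variable as the probability that a randomly chosen AKI patient has a higher value than a randomly chosen non-AKI patient, and finds the cutoff that maximizes the Youden index (sensitivity + specificity − 1). It assumes that higher values indicate AKI; for a variable such as HCT_min, where lower values indicate risk, the values would be negated first. Function names are illustrative, not taken from the study.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified positive when score >= cutoff.
    If several cutoffs tie, the smallest one is returned.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = t, j
    return best_cutoff, best_j


def passes_screen(scores, labels, min_auc=0.70):
    """Apply the AUC >= 0.70 retention rule described in the text."""
    return auc(scores, labels) >= min_auc
```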

In this study, certain variables were derived to reflect the most clinically relevant values observed within the first 24 h of hospital admission. Specifically, variable suffixes such as “_min” and “_max” denote the minimum and maximum values, respectively, of clinical parameters recorded during that period. The choice of using either the minimum or maximum value for a given parameter was informed by both clinical expertise and relevant literature, reflecting the characteristic physiological trajectories observed in patients with heat stroke following disease onset.
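The derivation of the "_min" and "_max" variables can be sketched as follows. The record layout (timestamp, parameter name, value) and the function name are assumptions made for illustration; the source does not describe how the raw measurements were stored.

```python
from datetime import datetime, timedelta


def summarize_first_24h(admission, measurements):
    """Collapse repeated measurements into <name>_min and <name>_max.

    measurements is an iterable of (timestamp, name, value) tuples
    (a hypothetical layout). Only values recorded within 24 h of
    admission are used.
    """
    cutoff = admission + timedelta(hours=24)
    summary = {}
    for timestamp, name, value in measurements:
        if not (admission <= timestamp <= cutoff):
            continue  # outside the first 24 h window
        low_key, high_key = f"{name}_min", f"{name}_max"
        summary[low_key] = min(summary.get(low_key, value), value)
        summary[high_key] = max(summary.get(high_key, value), value)
    return summary
```

A downstream step would then pick, per parameter, whichever of the two summaries is clinically relevant (for example HCT_min and Mb_max, as used later in the text).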

Machine learning model construction

Logistic regression, along with five machine learning algorithms, namely support vector machine (SVM)^21,22, XGBoost^23,24, k-nearest neighbor (kNN)^25,26, Naive Bayes^27,28, and decision tree (DT)^29,30, was implemented to develop early warning models for AKI. Model development and evaluation were conducted using R 4.4.2 (R Core Team, 2024, https://www.r-project.org/). These machine learning methods were selected due to their wide applicability, robustness, and capacity to address classification challenges in medical data. The multivariable logistic regression model was constructed using a stepwise backward elimination approach, with a threshold of p < 0.05 for retention in the final model. Model discrimination was evaluated using AUC, and calibration was assessed with the Hosmer–Lemeshow goodness-of-fit test. Logistic regression served as a benchmark against which the performance of other machine learning algorithms was evaluated.
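For readers unfamiliar with kNN, the sketch below shows the core of the algorithm: standardize the features, find the k closest training patients by Euclidean distance, and take a majority vote. The value of k, the use of z-score standardization, and all names are assumptions for illustration; the source does not report the hyperparameters or preprocessing used for its kNN model.

```python
import math
from collections import Counter


def fit_standardizer(rows):
    """Return a function that z-scores a feature row using training statistics."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # A constant column gets a divisor of 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(columns, means)]

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, sds)]

    return scale


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Standardization matters here because distance-based methods are otherwise dominated by the variable with the largest numeric range (for example Mb compared with HCT).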

Model evaluation

To prevent overfitting, we applied 20-fold cross-validation (CV). The dataset was randomly partitioned into 20 subsets; each subset was used once as the test set while the remaining 19 subsets formed the training set. The final evaluation metric was the average of the performance across all folds. Model performance was assessed using standard classification metrics, including accuracy, precision, sensitivity, specificity, F1-score, and AUC. These are widely accepted evaluation measures in machine learning and are used here following conventional definitions. An ROC curve was also plotted to visually compare the classification performance of different models.
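The cross-validation partitioning and the threshold-based metrics can be sketched as below. This is an illustrative plain-Python version, not the authors' R code; the random seed and function names are assumptions. With 290 patients and 20 folds, each test fold contains 14 or 15 patients.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity, specificity and F1 from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

Each model would be fitted on the training indices and scored on the test indices of every fold, and the per-fold metrics then averaged.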

Feature importance

For logistic regression, the fitted coefficients are reported as a formula, while for the machine learning algorithms the features are ranked by importance. Model interpretation was conducted using Shapley Additive Explanations (SHAP), which provides a unified approach for quantifying the contribution and influence of each feature on the final predictions. The SHAP values indicate how much each predictor contributes, either positively or negatively, to the target variable³¹.
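The idea behind SHAP can be illustrated with an exact Shapley value computation on a small model. The sketch below replaces "absent" features with a fixed baseline value and averages each feature's marginal contribution over all feature subsets. This is a simplification for illustration: practical SHAP implementations use background data and model-specific approximations, and the function name is an assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline row.

    predict takes a list of feature values and returns a number.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(present):
        # Features not in `present` are replaced by their baseline value.
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                total += weight * (value(present | {i}) - value(present))
        contributions.append(total)
    return contributions
```

The contributions always sum to the difference between the prediction at x and the prediction at the baseline, which is the additivity property that makes the values interpretable.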

Results

Demographic characteristics and baseline clinical data

A multicenter dataset comprising 511 patients diagnosed with heatstroke was established using clinical records collected from 55 hospitals across China between 2008 and 2024. Detailed data were collected for each patient, including demographic characteristics, medical history, clinical symptoms and signs, laboratory test results, imaging findings, diagnostic information, treatments and medication use, surgical and therapeutic interventions, follow-up, and clinical outcomes.

After applying predefined inclusion and exclusion criteria, a total of 290 patients with heatstroke were included in the final analysis. Among them, 263 were male, with a median age of 25 [21, 41] years and a mean body mass index (BMI) of 23.66 ± 2.61. Occupational distribution showed that 89 patients (30.69%) were workers, 64 (22.07%) were farmers, followed by 38 unemployed individuals (13.10%), 35 athletes (12.07%), 26 students and teachers (8.97%), 23 retired individuals (7.93%), and 15 police officers or firefighters (5.17%).

Among the 290 cases, 90 (31.03%) were diagnosed with classic heatstroke (CHS) and 200 (68.97%) with exertional heatstroke (EHS). Rhabdomyolysis was observed in 78 EHS patients (39%). Among those, 57 cases (73.08%) developed acute kidney injury (AKI). In contrast, the incidence of AKI among EHS patients without rhabdomyolysis was 37.59%. In total, 117 of the 200 EHS patients (58.5%) developed AKI, compared with 51 of the 90 CHS patients (56.67%). Overall, AKI occurred in 168 of the 290 heatstroke patients (57.93%) and 28 patients (9.66%) died during hospitalization.

There were no significant differences in sex, age, or BMI between patients with and without AKI. However, the incidence of rhabdomyolysis and the proportion of patients receiving CRRT were significantly higher in the AKI group. In addition, cooling measures appeared to be less effective among patients who developed AKI (Table 1).

Table 1 Demographic and clinical characteristics of patients with heat stroke.

Univariate analysis

This study incorporated data from the first 24 h after hospital admission, comprising a total of 53 commonly used clinical indicators, including admission temperature, heart rate, respiratory rate, mean arterial pressure (MAP), SOFA score, GCS score, arterial blood gas analysis, complete blood count, coagulation parameters, biochemical markers, and myocardial injury biomarkers. Univariate analysis was performed to assess whether there were statistically significant differences in the indicators between patients who developed AKI and those who did not. Based on the results of the univariate analysis, 40 indicators that showed significant differences between the two groups were selected for further analysis (Supplementary Table 3).

The AKI group exhibited significantly higher body temperature (41.23 ± 2.17 °C vs. 38.35 ± 1.66 °C, p < 0.001) and higher heart rate (89.00 [76.75, 100.00] bpm vs. 80.00 [68.00, 102.00] bpm, p = 0.045) compared to the non-AKI group. Additionally, Scr and blood urea nitrogen (BUN) levels were markedly elevated in the AKI group (p < 0.001). Coagulation and inflammatory markers, including platelet count (PLT), thrombin time (TT), prothrombin time (PT), activated partial thromboplastin time (APTT), fibrinogen (Fib), prothrombin activity (PTA), international normalized ratio (INR), D-dimer, procalcitonin (PCT), neutrophil count (Neu), white blood cell count (WBC), lymphocyte count (Lym), lactate dehydrogenase (LDH), and interleukin-6 (IL-6), were all significantly higher in the AKI group (p ≤ 0.001). Liver function markers, including aspartate aminotransferase (AST), alanine aminotransferase (ALT), total bilirubin (TBIL), direct bilirubin (DBIL), and albumin (ALB), also indicated greater severity in patients with AKI (p ≤ 0.001). Furthermore, biomarkers associated with rhabdomyolysis, such as creatine kinase (CK) (p = 0.002), CK-MB, and myoglobin (Mb), were significantly elevated (p < 0.001).

Receiver operating characteristic curve analysis

Furthermore, ROC curve analysis was conducted for the 40 variables that showed statistical significance in the univariate analysis. Among these, seven variables demonstrated acceptable discriminatory performance, with an AUC ≥ 0.70. Specifically, the results were as follows: HCT_min (AUC = 0.710, 95% CI 0.649–0.772), Scr_max (AUC = 0.798, 95% CI 0.744–0.851), BUN_max (AUC = 0.704, 95% CI 0.643–0.765), Mb_max (AUC = 0.705, 95% CI 0.640–0.770), TnT_max (AUC = 0.709, 95% CI 0.627–0.792), D-Dimer_max (AUC = 0.755, 95% CI 0.694–0.816), and IL-6_max (AUC = 0.704, 95% CI 0.617–0.791), as shown in Fig. 2A.

Multicollinearity assessment

Considering that serum creatinine is part of the diagnostic criteria for AKI and may therefore artificially enhance model performance, we excluded this variable from subsequent analyses. Multicollinearity among the independent variables was assessed using the VIF.

Several variables exhibited moderate to high multicollinearity, with VIF values exceeding commonly accepted thresholds (Supplementary Table 4). Notably, K⁺_max (VIF = 120.246), GCS score_min (VIF = 38.269), Cl⁻_min (VIF = 31.314), Lac_max (VIF = 21.404), and INR_max (VIF = 20.672) demonstrated substantial collinearity. Other variables, including pH_min, BUN_max, Na⁺_min, resp_rate, MAP_max, Hb_min, and AST_max, also showed VIF values greater than 10, indicating moderate multicollinearity. A total of 35 variables with VIF < 5 were considered acceptable and retained for subsequent multivariable modeling. Figure 2B presents a heatmap of the Pearson correlation coefficients among candidate predictor variables.

Model performance

Based on acceptable discrimination (AUC ≥ 0.70) and the absence of significant multicollinearity (VIF < 5), five predictors (HCT, D_dimer, IL_6, TnT, and Mb) were selected for inclusion in the construction of the AKI early warning model. Multiple imputation was first performed to handle missing data in these variables (Fig. 3A).

Among all models evaluated, the kNN and SVM algorithms demonstrated the highest discriminative performance, with AUCs of 0.934 (95% CI 0.909–0.959) and 0.924 (95% CI 0.886–0.962), respectively, indicating excellent predictive accuracy. The XGBoost model also performed well, achieving an AUC of 0.863 (95% CI 0.842–0.884), followed closely by Naive Bayes, with an AUC of 0.851 (95% CI 0.808–0.893). In contrast, logistic regression yielded a relatively lower AUC of 0.753 (95% CI 0.697–0.808), though it maintained acceptable discriminative ability (Fig. 3B). The corresponding metrics are presented in Supplementary Table 5. The final logistic regression model can be expressed as follows:

$$\begin{aligned} \log\left(\frac{P(\mathrm{AKI})}{1 - P(\mathrm{AKI})}\right) = {} & 0.1618 - 0.0602 \times \mathrm{HCT}_{min} + 0.0001 \times \mathrm{IL\_6}_{max} - 0.0017 \times \mathrm{TnT}_{max} \\ & + 0.0005 \times \mathrm{Mb}_{max} - 0.0001 \times \mathrm{D\_Dimer}_{max} \end{aligned}$$
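As a worked illustration, the equation above can be turned into a predicted probability by applying the inverse logit. The coefficients below are transcribed as printed; the dictionary keys and function names are assumptions, and input values must be in the same units as the study data, which are not stated in this section.

```python
import math

# Coefficients transcribed from the published logistic regression equation.
INTERCEPT = 0.1618
COEFFICIENTS = {
    "HCT_min": -0.0602,
    "IL_6_max": 0.0001,
    "TnT_max": -0.0017,
    "Mb_max": 0.0005,
    "D_Dimer_max": -0.0001,
}


def aki_log_odds(patient):
    """Linear predictor: intercept plus coefficient-weighted sum of inputs."""
    return INTERCEPT + sum(COEFFICIENTS[name] * patient[name]
                           for name in COEFFICIENTS)


def aki_probability(patient):
    """Inverse logit of the linear predictor, i.e. the predicted P(AKI)."""
    return 1.0 / (1.0 + math.exp(-aki_log_odds(patient)))
```

Because the HCT_min coefficient is negative and the Mb_max coefficient is positive, a lower HCT or a higher Mb raises the predicted probability, holding the other inputs fixed.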

Model predictive performance was comprehensively evaluated using multiple metrics, including AUC with 95% CI, accuracy, precision, sensitivity, specificity, and F1 score, as summarized in Table 2. Among the models tested, the kNN algorithm demonstrated the best overall performance, with an AUC of 0.934 [0.909–0.959], accuracy of 0.841 [0.800–0.879], specificity of 0.851 [0.798–0.903], precision of 0.803 [0.733–0.873], sensitivity of 0.828 [0.758–0.891], and an F1 score of 0.814 [0.757–0.870].

Table 2 Classification performance metrics of various machine learning algorithms.

Figure 3C–H illustrates feature importance for logistic regression and the five machine learning models (Naive Bayes, kNN, XGBoost with SHAP-based feature importance, decision tree, and SVM) in predicting AKI. Across all models, HCT, TnT, and Mb consistently emerged as the most important features, followed by D-dimer and IL-6.

We observed that the increase in HCT levels was evident within the first 24 h following the onset of heat stroke, with an HCT of 2.66 [2.31, 4.40] in the 290 patients. The HCT levels in patients who developed AKI after heat stroke were significantly lower than in those without AKI (patients with AKI: HCT = 2.52 [2.21, 4.15]; patients without AKI: HCT = 3.42 [2.40, 4.40]; p < 0.001). Based on the machine learning results, our findings suggest that a lower HCT level within the first 24 h of hospital admission serves as an important indicator for predicting subsequent acute increases in Scr or oliguria during hospitalization. Additionally, D-dimer emerged as a significant predictor of AKI development. Patients with AKI exhibited significantly higher levels of D-dimer and Mb compared to those without AKI (patients with AKI: D-dimer = 2.65 [0.83, 10.23], Mb = 557.95 [341.33, 2295.50]; patients without AKI: D-dimer = 1.08 [0.50, 3.78], Mb = 422.00 [233.40, 924.90]; p < 0.001).

Discussion

After applying the inclusion and exclusion criteria, a total of 290 heatstroke patients from 55 hospitals in China, admitted between 2008 and 2024, were enrolled in the study. AKI was defined according to the KDIGO criteria, that is, a sudden increase in Scr or oliguria during hospitalization, and occurred in 57.93% of the patients.

Currently available evidence suggests that clinical research focusing on risk assessment, prediction, and identification of risk factors for AKI in patients with heatstroke remains limited. Most existing studies have relatively small sample sizes, typically ranging from 58 to 187 patients, which may constrain the generalizability of their findings^32,33. Previous studies have reported that the combination of serum Mb and lactate dehydrogenase (LDH) can effectively predict AKI in heat stroke patients with concomitant rhabdomyolysis, achieving an AUC of up to 0.9116³⁴. Additionally, the lowest recorded platelet counts during hospitalization demonstrated a predictive value for AKI with an AUC of 0.73³⁵. Other identified independent risk factors for AKI in patients with EHS include elevated lymphocyte and neutrophil counts, D-dimer levels, and Mb ≥ 1000 ng/mL, and Mb has been found to be a more reliable predictor of AKI than CK^9,36. However, to the best of our knowledge, no studies to date have established an early warning model for AKI in heatstroke patients using machine learning algorithms based on early-phase clinical data.

Based on the first 24 h of hospitalization data, the AKI group exhibited significantly higher temperature and heart rate compared to the non-AKI group. Additionally, patients with AKI showed significantly elevated levels of kidney function indicators, coagulation and inflammatory markers, as well as more severe liver dysfunction and rhabdomyolysis. We performed logistic regression and five machine learning algorithms to predict AKI occurrence during hospitalization based on data from the first 24 h. Among the models tested, the kNN algorithm demonstrated the best overall performance, with an AUC of 0.934 [0.909–0.959], and key predictors included TnT, HCT, D-dimer, and Mb.

Although previous studies have suggested that AKI typically manifests in the middle to late stages of heatstroke progression, emerging evidence indicates that renal impairment may begin much earlier in the disease course. Notably, elevated Scr levels observed in the early phase are associated with a higher likelihood of a rapid rise in Scr and the subsequent development of oliguria, suggesting early subclinical kidney injury and a more aggressive renal trajectory during hospitalization³⁷.

Coagulation dysfunction is a common complication in patients with heat stroke and shares pathophysiological similarities with sepsis-associated coagulopathy³⁶. Previous studies have demonstrated that D-dimer levels are predictive of AKI in various clinical settings, including patients with intra-abdominal infections, those admitted to intensive care units (ICUs), individuals with ST-segment elevation myocardial infarction (STEMI), and patients with sepsis^38,39,40. D-dimer is a fibrin degradation product that indicates ongoing fibrinolysis following the activation of coagulation. Its marked elevation in heatstroke may signal systemic endothelial injury, excessive thrombin generation, and a hypercoagulable state, all of which can result in microvascular thrombosis⁴¹. These microthrombi may occlude renal capillaries and small arterioles, impairing renal perfusion and oxygen delivery, thereby promoting ischemic tubular injury and contributing to the development or worsening of AKI^9,42.

Furthermore, heat stroke is associated with a systemic inflammatory response similar to sepsis, leading to cytokine-mediated endothelial activation and dysfunction⁴³. This promotes the expression of tissue factor, amplifies the coagulation cascade, and inhibits natural anticoagulant pathways (e.g., antithrombin III, protein C)⁴⁴. The resulting imbalance between coagulation and fibrinolysis may aggravate microvascular thrombosis and inflammatory injury within renal tissue⁴⁵. D-dimer may therefore serve not only as a biomarker of disease severity and AKI risk, but also as a potential therapeutic target. Modulating the coagulation pathway—such as through the use of anticoagulants or targeted therapies to prevent microthrombosis—could potentially mitigate renal injury.

Our findings identified Mb as a critical predictor of AKI in patients with heat stroke, aligning with prior studies. The underlying mechanism is primarily attributed to rhabdomyolysis-induced myoglobinemia, which leads to the accumulation of Mb in renal tubules. Myoglobin, especially under conditions of hypovolemia and acidic urine, can cause direct oxidative damage to tubular epithelial cells and promote tubular obstruction through cast formation, ultimately contributing to AKI onset⁴⁶.

In addition, HCT was among the top-ranking variables in our predictive model. Previous research has shown that reduced HCT during cardiopulmonary bypass is significantly associated with higher AKI risk in cardiac surgery patients^47,48. The proposed mechanism involves hemodilution-related reductions in oxygen-carrying capacity, resulting in renal hypoxia and impaired oxygen delivery to the tubular cells⁴⁹. In the context of heat stroke, where volume depletion, hyperthermia, and systemic inflammation already compromise renal perfusion, a low HCT may further exacerbate renal ischemia, thereby increasing susceptibility to AKI⁴⁷. These observations suggest that both Mb and HCT are not only valuable predictive markers but also reflect distinct yet converging pathophysiological pathways leading to heatstroke-related renal injury.

In patients with heat stroke complicated by AKI, CRRT is widely acknowledged as a consensus-driven therapeutic strategy⁵⁰. In our study, 55.36% of patients diagnosed with AKI received CRRT, reflecting adherence to this clinical recommendation. Interestingly, a notable proportion of patients without AKI (18.45%) were also treated with CRRT. In clinical practice, beyond the presence of AKI, the initiation of CRRT should be considered in heatstroke patients presenting with any of the following conditions: persistent core body temperature above 40 °C unresponsive to standard cooling interventions; rhabdomyolysis; other signs of organ dysfunction; or severe electrolyte imbalances or metabolic acidosis^51,52. Hemodialysis (HD), continuous hemodiafiltration (CHDF), and continuous plasma diafiltration (CPDF) have been reported as effective adjunctive therapies in the management of heatstroke, not only for their capacity to reduce core body temperature, but also for their ability to support impaired organ function^53,54. When conventional cooling methods, such as gastric lavage with cold water or intravenous infusion of cold saline, fail to achieve adequate temperature control, the use of cold dialysate in HD or high-flow cold CHDF can facilitate rapid core temperature reduction⁵⁴. Cold hemodialysis (cold HD) is generally appropriate for patients requiring rapid cooling, while high-flow cold CHDF may be more suitable for critically ill patients with multi-organ involvement, as it provides continuous temperature regulation alongside organ support^51,55. In addition, extracorporeal blood purification helps to remove metabolic byproducts and inflammatory mediators, thereby alleviating organ burden and promoting functional recovery⁵⁶.

However, our study has several limitations. This study integrated electronic medical records collected from multiple hospitals. To ensure data consistency and standardization across centers, we performed unified unit conversions for all relevant clinical variables. However, inter-institutional variations in laboratory testing methods and equipment may still introduce inherent heterogeneity in the dataset. Given the nature of heatstroke as a condition that predominantly occurs under extreme environmental conditions, its incidence remains relatively low, which poses challenges for patient recruitment. Although the sample size in our study exceeds that of most existing retrospective clinical studies on heatstroke, it remains relatively limited and may affect the robustness of model performance. Previous research has demonstrated that traditional machine learning algorithms can still achieve satisfactory performance even when applied to relatively limited datasets. Nevertheless, results obtained from ensemble-based algorithms such as XGBoost should be interpreted with caution. As a retrospective analysis, the findings of this study warrant further validation through well-designed, large-scale prospective studies to rigorously assess the predictive performance of our models, particularly regarding the early identification of AKI in patients with heatstroke.

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Feulner, G. Global challenges: Climate change. Glob Chall. (Hoboken, NJ) 1(1), 5–6 (2017).
Burkart, K. G. et al. Estimating the cause-specific relative risks of non-optimal temperature on daily mortality: A two-part modelling approach applied to the Global Burden of Disease Study. Lancet (London, England) 398(10301), 685–697 (2021).
Zhang, Z. et al. Heat stroke: Pathogenesis, diagnosis, and current treatment. Ageing Res. Rev. 100, 102409 (2024).
Mayer, T. & Casa, D. J. Heatstroke. N. Engl. J. Med. 381(12), 1186–1187 (2019).
Liu, S. Y., Song, J. C., Mao, H. D., Zhao, J. B. & Song, Q. Expert consensus on the diagnosis and treatment of heat stroke in China. Mil. Med. Res. 7(1), 1 (2020).
Epstein, Y. & Yanovich, R. Heatstroke. N. Engl. J. Med. 380(25), 2449–2459 (2019).
Meade, R. D. et al. Physiological factors characterizing heat-vulnerable older adults: A narrative review. Environ. Int. 144, 105909 (2020).
Hifumi, T., Kondo, Y., Shimizu, K. & Miyake, Y. Heat stroke. J. Intensive Care 6, 30 (2018).
Wu, M. et al. Clinical characteristics and risk factors associated with acute kidney injury inpatient with exertional heatstroke: An over 10-year intensive care survey. Front. Med. 8, 678434 (2021).
Yang, M. M. et al. Establishment and effectiveness evaluation of a scoring system for exertional heat stroke by retrospective analysis. Mil. Med. Res. 7(1), 40 (2020).
Lameire, N. H. et al. Harmonizing acute and chronic kidney disease definition and classification: Report of a kidney disease: Improving global outcomes (KDIGO) consensus conference. Kidney Int. 100(3), 516–526 (2021).
Cervellin, G., Comelli, I. & Lippi, G. Rhabdomyolysis: Historical background, clinical, diagnostic and therapeutic features. Clin. Chem. Lab. Med. 48(6), 749–756 (2010).
Stahl, K., Rastelli, E. & Schoser, B. A systematic review on the definition of rhabdomyolysis. J. Neurol. 267(4), 877–882 (2020).
Vincent, J. L. et al. The SOFA (sepsis-related organ failure assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med. 22(7), 707–710 (1996).
Levi, M., Toh, C. H., Thachil, J. & Watson, H. G. Guidelines for the diagnosis and management of disseminated intravascular coagulation. British Committee for Standards in Haematology. Br. J. Haematol. 145(1), 24–33 (2009).
Bouchama, A. et al. Classic and exertional heatstroke. Nat. Rev. Dis. Primers 8(1), 9 (2022).
Blazek, K., van Zwieten, A., Saglimbene, V. & Teixeira-Pinto, A. A practical guide to multiple imputation of missing data in nephrology. Kidney Int. 99(1), 68–74 (2021).
Dong, Y. & Peng, C. Y. Principled missing data methods for researchers. Springerplus 2(1), 222 (2013).
Vink, G. Roderick J. Little and Donald B. Rubin: Statistical analysis with missing data. Psychometrika 87(4), 1575–1578 (2022).
Mandrekar, J. N. Receiver operating characteristic curve in diagnostic test assessment. J. Thorac. Oncol. Off. Publ. Int. Assoc. Study Lung Cancer 5(9), 1315–1316 (2010).
Lee, Y. W., Choi, J. W. & Shin, E. H. Machine learning model for predicting malaria using clinical information. Comput. Biol. Med. 129, 104151 (2021).
Gupta, A. & Kahali, B. Machine learning-based cognitive impairment classification with optimal combination of neuropsychological tests. Alzheimer’s Dementia (New York, N Y) 6(1), e12049 (2020).
Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794 (San Francisco: Association for Computing Machinery, 2016).
Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
Cover, T. & Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967).
Altman, N. S. An Introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992).
Rennie, J. D. M., Shih, L., Teevan, J. & Karger, D. R. Tackling the poor assumptions of naive Bayes text classifiers. In Proceedings of the Twentieth International Conference on Machine Learning, 616–623 (Washington: AAAI Press, 2003).
McCallum, A. & Nigam, K. A comparison of event models for naive Bayes text classification. In AAAI Conference on Artificial Intelligence: 1998 (1998).
Quinlan, J. R. Induction of decision trees. Mach. Learn. 1, 81–106 (1986).
Breiman, L. Random forests. Mach. Learn. 45(1), 5–32 (2001).
Nohara, Y., Matsumoto, K., Soejima, H. & Nakashima, N. Explanation of machine learning models using Shapley additive explanation and application for real data in hospital. Comput. Methods Programs Biomed. 214, 106584 (2022).
Goto, H. et al. Early biomarkers for kidney injury in heat-related illness patients: A prospective observational study at Japanese Self-Defense Force Fuji Hospital. Nephrol. Dial. Transplant. Off. Publ. Eur. Dial. Transplant. Assoc. Eur. Renal Assoc. 38(3), 644–654 (2023).
Yeargin, S. W. et al. Exertional heat illness risk factors and physiological responses of youth football players. J. Sport Health Sci. 10(1), 91–98 (2021).
Yin, X. & Wang, W. Predictive value of serum myoglobin and lactate dehydrogenase in rhabdomyolysis-induced acute kidney injury from severe heatstroke. Am. J. Transl. Res. 16(4), 1477–1483 (2024).
Fan, H. et al. Thrombocytopenia as a predictor of severe acute kidney injury in patients with heat stroke. Ren. Fail. 37(5), 877–881 (2015).
Wang, C. et al. Association of D-dimer and acute kidney injury associated with rhabdomyolysis in patients with exertional heatstroke: An over 10-year intensive care survey. Ren. Fail. 43(1), 1561–1568 (2021).
Zappitelli, M. et al. A small post-operative rise in serum creatinine predicts acute kidney injury in children undergoing cardiac surgery. Kidney Int. 76(8), 885–892 (2009).
Xu, Z. et al. Coagulative biomarkers on admission to the ICU predict acute kidney injury and mortality in patients with septic shock caused by intra-abdominal infection. Infect. Drug Resist. 12, 2755–2764 (2019).
Spring, J. L., Winkler, A. & Levy, J. H. The influence of various patient characteristics on D-dimer concentration in critically ill patients and its role as a prognostic indicator in the intensive care unit setting. Clin. Lab. Med. 34(3), 675–686 (2014).
Su, Q. Y. et al. Development and external validation of a nomogram for the early prediction of acute kidney injury in septic patients: A multicenter retrospective clinical study. Ren. Fail. 46(1), 2310081 (2024).
Iba, T. et al. The unique characteristics of COVID-19 coagulopathy. Crit. Care (Lond. Engl.) 24(1), 360 (2020).
Williams, B., Zou, L., Pittet, J. F. & Chao, W. Sepsis-induced coagulopathy: A comprehensive narrative review of pathophysiology, clinical presentation, diagnosis, and management strategies. Anesth. Analg. 138(4), 696–711 (2024).
Levi, M. & van der Poll, T. Inflammation and coagulation. Crit. Care Med. 38(2 Suppl), S26–S34 (2010).
Shapiro, Y. & Seidman, D. S. Field and clinical observations of exertional heat stroke patients. Med. Sci. Sports Exerc. 22(1), 6–14 (1990).
Iba, T., Connors, J. M., Levi, M. & Levy, J. H. Heatstroke-induced coagulopathy: Biomarkers, mechanistic insights, and patient management. EClinicalMedicine 44, 101276 (2022).
Liapis, H., Boils, C., Hennigar, R. & Silva, F. Myoglobin casts in renal biopsies: Immunohistochemistry and morphologic spectrum. Hum. Pathol. 54, 25–30 (2016).
Ellis, M. C. et al. Nadir hematocrit on bypass and rates of acute kidney injury: Does sex matter?. Ann. Thorac. Surg. 100(5), 1549–1554 (2015) (discussion 1554–1555).
Ranucci, M. et al. Acute kidney injury and hemodilution during cardiopulmonary bypass: A changing scenario. Ann. Thorac. Surg. 100(1), 95–100 (2015).
Wu, M. Y. et al. Current mechanistic concepts in ischemia and reperfusion injury. Cell. Physiol. Biochem. Int. J. Exp. Cell. Physiol. Biochem. Pharmacol. 46(4), 1650–1667 (2018).
Kanda, J. et al. Association between active cooling and lower mortality among patients with heat stroke and heat exhaustion. PLoS ONE 16(11), e0259441 (2021).
Raj, V. M. et al. Therapeutic plasma exchange in the treatment of exertional heat stroke and multiorgan failure. Pediatr. Nephrol. (Berlin, Germany) 28(6), 971–974 (2013).
Casa, D. J., Armstrong, L. E., Kenny, G. P., O’Connor, F. G. & Huggins, R. A. Exertional heat stroke: New concepts regarding cause and care. Curr. Sports Med. Rep. 11(3), 115–123 (2012).
Inoue, N. et al. Successful treatment of exertional heat stroke using continuous plasma diafiltration. J. Clin. Apheresis 31(5), 490–492 (2016).
Wakino, S. et al. Heat stroke with multiple organ failure treated with cold hemodialysis and cold continuous hemodiafiltration: A case report. Ther. Apher. Dial. Off. Peer Rev. J. Int. Soc. Apher. Jpn. Soc. Apher. Jpn. Soc. Dial. Ther. 9(5), 423–428 (2005).
Fan, H., Zhu, J. H. & Ye, J. H. Cold hemodiafiltration for exertional heat stroke resulting in multiple organ failure. Ther. Apher. Dial. Off. Peer Rev. J. Int. Soc. Apher. Jpn. Soc. Apher. Jpn. Soc. Dial. Ther. 19(5), 522 (2015).
Goto, H., Kinoshita, M. & Oshima, N. Heatstroke-induced acute kidney injury and the innate immune system. Front. Med. 10, 1250457 (2023).

Funding

This work was supported by the National Natural Science Foundation of China (No. 62271506) and the Innovation Team and Talents Cultivation Program of the National Administration of Traditional Chinese Medicine (No. ZYYCXTD-D-202402).

Author information

Xiaonan Ding and Min Wang contributed equally to this work.

Authors and Affiliations

Department of Nephrology, State Key Laboratory of Kidney Diseases, National Clinical Research Center for Kidney Diseases, Beijing Key Laboratory of Medical Devices and Integrated Traditional Chinese and Western Drug Development for Severe Kidney Diseases, Beijing Key Laboratory of Digital Intelligent TCM for the Prevention and Treatment of Pan-vascular Diseases, Key Disciplines of National Administration of Traditional Chinese Medicine (zyyzdxk-2023310), Innovation Team and Talents Cultivation Program of National Administration of Traditional Chinese Medicine (No. ZYYCXTD-D-202402), First Medical Center of Chinese PLA General Hospital, Beijing, 100853, China
Xiaonan Ding, Yue Niu, Junxia Du, Yingjie Duan, Fei Chen, Chenwen Song & Hanyu Zhu
Medical School of Chinese PLA, Beijing, 100853, China
Xiaonan Ding
Department of Critical Care Medicine, The First Medical Center, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China
Min Wang, Yun Li & Hongjun Kang
Department of Critical Care Medicine, The Fifth Medical Center, Chinese PLA General Hospital, Beijing, 100039, China
Lu Wang
Henan Provincial Key Laboratory of Kidney Disease and Immunology, Department of Nephrology, Henan Provincial Clinical Research Center for Kidney Disease, Henan Provincial People’s Hospital, People’s Hospital of Zhengzhou University, Zhengzhou, 450003, China
Lei Yan & Lu Li

Contributions

Conceptualization: Xiaonan Ding, Hanyu Zhu, Hongjun Kang; Resources: Hanyu Zhu, Hongjun Kang, Min Wang, Lei Yan, Lu Li; Methodology: Xiaonan Ding, Yue Niu; Data curation: Xiaonan Ding, Min Wang; Analysis: Xiaonan Ding; Investigation: Xiaonan Ding, Chenwen Song, Fei Chen, Lu Wang, Yun Li, Yingjie Duan; Original draft preparation: Xiaonan Ding, Junxia Du; Review and editing: Xiaonan Ding, Hanyu Zhu; Supervision: Hanyu Zhu, Hongjun Kang; Project administration: Hanyu Zhu; Funding acquisition: Hanyu Zhu.

Corresponding authors

Correspondence to Hongjun Kang or Hanyu Zhu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

The authors confirm that all experiments involving human subjects or human tissues were conducted in accordance with the ethical guidelines and regulations of National Military Medical Ethics (NMAP)/ Good Clinical Practice (GCP) and the Declaration of Helsinki. The study was approved by the Ethics Committee of the Chinese PLA General Hospital (Approval number: S2024-138-01). Informed consent was obtained from all participants prior to their inclusion in the study.

Consent for publication

The authors obtained informed consent for publication from all participants involved in the study. All participants provided written consent for their personal information and images to be published in this research article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ding, X., Wang, M., Wang, L. et al. Machine learning model for early prediction of acute kidney injury in heatstroke patients based on the first 24 h hospitalization data. Sci Rep 15, 33085 (2025). https://doi.org/10.1038/s41598-025-17590-1

Received: 21 December 2024
Accepted: 25 August 2025
Published: 26 September 2025
DOI: https://doi.org/10.1038/s41598-025-17590-1