Machine learning based clinical decision tool to predict acute kidney injury and survival in therapeutic hypothermia treated neonates

Keles, Elif; Ali, Syed Yaseen; Wintermark, Pia; Annaert, Pieter; Groenendaal, Floris; Şahin, Suzan; Öncel, Mehmet Yekta; Armangil, Didem; Koc, Esin; Battin, Malcolm R.; Gunn, Alistair J.; Frymoyer, Adam; Chock, Valerie; Mekahli, Djalila; van den Anker, John; Smits, Anne; Allegaert, Karel; Bagci, Ulas

doi:10.1038/s41598-025-01141-9


Article
Open access
Published: 19 May 2025


Scientific Reports volume 15, Article number: 17278 (2025)


Abstract

Therapeutic hypothermia (TH) significantly reduces mortality and morbidities in neonates with Neonatal Encephalopathy (NE). NE may result in neonatal death and multisystem organ impairment, including acute kidney injury (AKI). Our study aimed to utilize machine learning (ML) methods to predict the outcome of TH-treated NE neonates developing AKI and death during TH. In this retrospective multinational study, 1149 TH-treated NE neonates and 801 controls were included. AKI was classified using KDIGO neonatal criteria based on serum creatinine measurements. The ML model incorporated gestational age, birth weight, postnatal age, and serum creatinine values. The algorithm used all these covariates to predict one of five outcomes: survival with/without AKI, mortality with/without AKI, and hospitalized non-NE controls. The XGBoost model achieved an AUC of 95% and an accuracy of 75.08% in predicting AKI and survival, surpassing other ML classifiers that demonstrated accuracy levels ranging from 54% to 65%. To our knowledge this is the first ML model trained on multicenter, multinational data specifically aimed at predicting neonates’ AKI, death, and survival within the first three days. Our ML scoring systems’ code and user interface are freely available (https://github.com/NUBagciLab/Therapeutic-Hypothermia-Outcome-Classification, https://thprediction.streamlit.app/). This tool has potential to support neonatologists to personalize therapies, and to optimize pharmacotherapy for renally cleared drugs.


Introduction

Birth asphyxia is characterized by a lack of oxygen with reduced brain blood flow around the time of birth, which can lead to Neonatal Encephalopathy (NE)¹. NE is a clinical syndrome of neurologic dysfunction that encompasses a broad spectrum of symptoms and severity, from mild irritability and feeding difficulties to coma and seizures. The global prevalence of NE ranges from 1 to 3.5 per 1000 live births in high-income countries (HICs) to 26 per 1,000 in low- and middle-income countries (LMICs)^2,3. In HICs, therapeutic hypothermia (TH) is an effective intervention that significantly reduces mortality and morbidities in neonates with moderate to severe NE⁴. However, NE is a multiorgan condition affecting more than just the central nervous system^5,6,7.

The kidneys are highly vulnerable to oxygen deprivation, with acute kidney injury (AKI) occurring commonly (30–60%) as part of the “perinatal asphyxia syndrome,” now classified under the new Kidney Disease: Improving Global Outcomes (KDIGO) definition^8,9,10,11,12. Growing evidence suggests that AKI is a significant risk factor for adverse long-term neurocognitive outcomes and increased mortality, longer hospitalization, and increased duration of mechanical ventilation^8,11,13. Prompt diagnosis of AKI in TH-treated NE neonates is important both for renal management and to help identify neonates who are most likely to have poor outcomes^8,14. The neonatal modified KDIGO definition is currently the standard used in research and clinical practice. This definition is based on an increase in serum creatinine (sCr) or a reduction in urine output (UOP) (Table 1)¹². However, the KDIGO criteria are difficult to apply in neonates because at birth sCr reflects maternal sCr and is already elevated. The typical physiological changes after birth involve a fall in sCr over the first few weeks of life. The precise measurement of UOP can also be difficult, and UOP is frequently low on the first day of life. Moreover, oliguria is not always present in neonates with AKI^15,16,17. The definition of neonatal AKI is expected to evolve in the future. In our prior studies, we established the baseline serum creatinine values and changes in GFR and serum creatinine concentrations during TH for TH-treated NE neonates^18,19. However, there is still an unmet clinical need to predict the clinical outcome of TH-treated NE neonates, including survival and AKI status.

Table 1 Modified neonatal kidney disease: improving global outcomes (KDIGO) criteria.


TH has been utilized to mitigate ischemic injury in kidney transplantation and after cardiopulmonary resuscitation, but its impact on renal outcomes in neonates with NE is not well understood when NE has affected the kidney. A meta-analysis of six TH trials assessing the impact of TH on renal impairment was inconclusive: it did not find a statistically significant difference in the rate of renal impairment in cooled versus non-cooled neonates⁴. However, these studies were completed before the adoption of the KDIGO AKI classification for neonates, and the definitions of renal impairment varied among the studies¹⁵. A single-center randomized controlled trial involving 120 term neonates with NE suggests that TH may reduce the risk of AKI (32% vs. 60%, p < 0.05)²⁰.

Further research has assessed the effects of NE and TH on the glomerular filtration rate (GFR) and drug clearance related to GFR. Renal clearance was reduced by 25–40% and up to 60% as measured by mannitol clearance^21,22. Except for one dataset on gentamicin clearance, most data have concentrated on the time when TH was being used (the first three days of life), with less evidence on later times. Thorough literature review revealed that increases in serum creatinine in this specific subpopulation were open to interpretation for the remainder of the first week of postnatal life¹⁶. It is also unlikely that additional studies comparing neonates with moderate to severe NE with and without TH would be conducted because TH has become the standard of care for NE treatment⁸. Supportive care should continue during TH, and further understanding drug clearance in neonates with NE is crucial because most medications, including antibiotics, inotropes, and antiepileptic drugs, are renally excreted^16,18,23.

The overall goal of our study was to predict the outcome of TH-treated NE neonates regarding survival and AKI based on their gestational age (GA), birth weight, postnatal age (PNA), and serum creatinine measurements within the initial 10 days of life. In this study we therefore propose a new machine learning (ML) method to predict TH-treated NE neonates’ clinical outcome, including survival and associated AKI status during TH after postnatal day 1. Figure 1 shows our overall framework, utilizing an input with four parameters, and outcome prediction as the output.

Results

All ML classifiers performed with an overall accuracy score (across all labels) of 54%–65% except for XGBoost, which predicted an outcome with 73% accuracy, which is why this classifier was selected to be further optimized (Table 2 and supplements).

Table 2 Machine learning classifiers and their explanations.


The single classifier approach demonstrated superior precision and recall compared to the hierarchical classification approach, particularly for surviving neonates (Classes 1, 2 and 5) (Table 3).

Table 3 Performance metrics of hierarchical classification approach.


In the hierarchical classification approach, precision scores ranged from 0.508 to 0.828, with lower precision for TH-treated NE neonates who died without AKI. Recall varied from 0.619 to 0.798, with TH-treated NE neonates who survived without AKI having the highest recall. The F1-scores reflect a generally balanced performance (Table 3).

The single classifier strategy demonstrated a stronger precision relative to the hierarchical method, achieving a precision of 0.835 for TH-treated NE neonates who survived and had AKI. Recall varied between 0.505 and 0.803, with lower recall for TH-treated NE neonates who died without AKI. The F1-scores show that TH-treated NE neonates who survived and had AKI (0.809) exhibited the highest classification performance (Table 3). This single classifier provided the highest predictive capability, reiterated by precision, recall and F1 score (Table 3).

The first classifier in the hierarchical model struggled to predict TH-treated NE neonates who died with or without AKI, with recall, precision and F1 scores for the death cases ranging from 48% to 56%, while in the single classifier model they ranged from 51% to 58% (Table 3). The only notable decrease observed with the single classifier model compared to the hierarchical model was in the ‘TH-treated NE neonates who died without AKI’ label, which had about a 12% decline in recall score. Despite this, the overall performance showed significant improvements. The classification performance improved across multiple labels, with some showing minor gains and others significant enhancements, leading to an overall increase in accuracy, precision, recall, and F1 scores (Table 3).

The classification of neonates who died without AKI had the lowest performance in both approaches, highlighting challenges to correctly identify this group. By contrast, TH-treated NE neonates who survived and had AKI had the best classification results across both approaches. Hospitalized (Control-non-NE) neonates were classified with relatively high scores in both models (Table 3; Figs. 2 and 3, Supplement).

Beyond accuracy, precision, recall and F1 scores, our model demonstrated excellent classification with a mean AUC of 0.95 ± 0.01 across 10-fold cross-validation. This indicates that the model has a high probability of correctly distinguishing between different outcomes in TH treated NE neonates (Fig. 4). The model becomes slightly more accurate as folds of cross-validation increase, indicating a continuous model improvement with an increase in training data, while simultaneously confirming the model’s generalizability. From 5, 10, to 20 folds of cross-validation, we observed 74.2%, 75.1%, and 75.1% accuracy, respectively. All the curves are close together and have high AUCs, which means the model performed consistently (Fig. 4). The calibration curve for the single classifier model showed good performance, except for class 4 (infants who died with AKI).

Our newly developed model demonstrates promising potential for predicting outcomes in TH-treated NE neonates. As evidenced by the confusion matrix, the model exhibits high accuracy in predicting survival with and without AKI and death with AKI. The matrix also highlights areas for improvement, particularly in predicting outcomes for TH-treated NE neonates who died without AKI; this weakness originated from class imbalance (Fig. 3).

To further assess the model’s performance, the Matthews correlation coefficient (MCC) was calculated based on the confusion matrix. For our five-class classification, the overall MCC is 0.656, indicating a moderately strong positive correlation between the predicted and actual class labels (Table 4). Classes 2 and 4 (TH-treated NE neonates who survived with AKI and TH-treated NE neonates who died with AKI) have high MCC values (0.797 and 0.786). This suggests that the model is remarkably effective at identifying neonates with AKI compared to all other outcomes. Class 1 (TH-treated NE neonates who survived without AKI) had a lower MCC (0.502), and Class 3 (TH-treated NE neonates who died without AKI) had a moderate MCC (0.571). Class 5, the control group (hospitalized neonates, non-NE), showed an MCC of 0.614, which is moderate. This result suggests the model is performing effectively, accurately classifying many instances across all classes.

Table 4 Matthews correlation coefficient (MCC) values for single classifier model and all 5 classes.


We then examined the creatinine trends to understand the explainability of our models. On the first day, all TH-treated neonates had very similar sCr values. However, in the following days, these values diverged, reflecting different outcomes. Interestingly, the serum creatinine trends for TH-treated NE neonates who survived without AKI closely mirrored those of hospitalized control neonates until day 3 (Figs. 5 and 6).

Next, we analyzed the serum creatinine trends for TH-treated NE neonates with AKI, irrespective of survival status, along with TH-treated neonates without AKI and hospitalized control neonates (Fig. 5). The graph shows two distinct trend lines for serum creatinine: higher creatinine, stable trend and lower creatinine, decreasing trend. In the AKI groups, serum creatinine concentrations started higher and remained relatively stable over the 10-day period. This suggests persistent kidney injury in this group. In neonates without AKI, serum creatinine concentrations started lower and gradually decreased over time, indicating improving kidney function (Figs. 5 and 6). The divergence of these two trends was apparent early on, suggesting that serum creatinine concentrations are a valuable early indicator of AKI in NE neonates (Figs. 5 and 6).

We closely examined serum creatinine trends in the non-AKI groups and showed that serum creatinine trends overlapped during their NICU stay (Fig. 7). We also closely examined neonates who died and those who survived. The trend lines show that surviving neonates had lower serum creatinine concentrations than those who died (Fig. 8).

Consistent with our previous research¹⁹, we report serum creatinine centile lines for all our data, including TH-treated NE neonates and hospitalized non-NE neonates (Fig. 9). To open up the black-box nature of the model’s decision-making, we undertook feature importance analysis for both hierarchical and single classifier models for each class. Serum creatinine and the interaction of postnatal age with serum creatinine play important roles in the hierarchical algorithmic decision-making (Fig. 10). Feature importance in the single classifier model highlighted serum creatinine and the gestational age–creatinine interaction in decision-making (Fig. 11). Our analysis revealed no linear relationship between the input variables. Instead, the models capture complex, nonlinear interactions that are critical for accurate predictions. These interactions are essential for the decision-making mechanism of the models, as they reflect the complex nature of physiological processes in neonates.
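The interaction terms named above can be pictured as extra input columns derived from the four base covariates. The following is a minimal sketch only: the product form of the interaction and the field names (`ga_weeks`, `pna_days`, `scr_mg_dl`, `pna_x_scr`, `ga_x_scr`) are assumptions for illustration and are not taken from the published code.

```python
def add_interactions(record):
    """Return a copy of one observation with two assumed interaction terms.

    `record` is a dict with hypothetical keys: ga_weeks, birth_weight_g,
    pna_days, scr_mg_dl. The interactions are modelled as plain products,
    which is an assumption; the source does not state their exact form.
    """
    out = dict(record)
    # Postnatal age x serum creatinine (highlighted for the hierarchical model).
    out["pna_x_scr"] = record["pna_days"] * record["scr_mg_dl"]
    # Gestational age x serum creatinine (highlighted for the single classifier).
    out["ga_x_scr"] = record["ga_weeks"] * record["scr_mg_dl"]
    return out


example = add_interactions(
    {"ga_weeks": 39, "birth_weight_g": 3400, "pna_days": 2, "scr_mg_dl": 0.8}
)
```

Tree-based models can learn such interactions implicitly, so explicit product features are a design choice rather than a requirement.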

Discussion

AKI has been associated with worse outcomes in previous studies, including increased mortality, prolonged length of stay, increased need for mechanical ventilation, and more adverse neurocognitive outcomes^8,16,24. It is still difficult to rapidly assess renal function in this high-risk group in order to detect AKI and start suitable treatment promptly. Neonatologists focus on detecting AKI based on the sCr concentrations, urine output, urine biomarkers, and noninvasive near-infrared spectroscopy (NIRS) monitoring⁸. A study of 53 neonates using blood biomarkers from the beginning of TH achieved an AUC of 0.61 at 2 h of life for serum NGAL (neutrophil gelatinase-associated lipocalin), suggesting this has acceptable accuracy to identify developing AKI²⁵. Another study of 110 NE neonates combined serum and urine biomarkers to predict AKI from 24 h of life. Among those markers, urinary NGAL achieved an AUC of 0.86 to predict AKI²⁶. Urine biomarkers were collected within 12, 24, 48, and 72 h of life in 64 TH-treated NE neonates²⁷. In this study, urine KIM-1 (Kidney Injury Molecule-1) had an AUC of 0.79 at 48 h of life. Renal oxygen saturations were higher in the AKI group than in non-AKI NE neonates, and renal saturation > 75% achieved an AUC of 0.76 within 48 h of life²⁸. However, these methods are not practical or clinically adapted for early detection of AKI and other outcomes⁸. Our algorithm predicts AKI and death in TH-treated NE neonates with an AUC of 95% and an accuracy of 75.08% for AKI and survival, outperforming urine and blood biomarkers in predictive accuracy^26,27.

The present study highlights that diagnosing AKI in neonates requires a comprehensive approach that cannot solely rely on serum creatinine concentrations. Meticulous clinical monitoring can significantly enhance detection accuracy. Integrating diagnostic tools with clinical decision-making enables healthcare professionals to more effectively identify AKI, facilitating timely interventions that can improve patient outcomes. Given the limited availability of urine and blood biomarkers in practice in the NICU, our approach would not only ensure accurate diagnosis but also help guide evidence-based treatment decisions, ultimately enhancing patient care and improving outcomes.

At present, there are only two alternative neonatal and pediatric calculators to estimate the risk of AKI, both using logistic regression methods: the Baby NINJA²⁹ and STARZ³⁰ calculators. Those two calculators use electronic health records (EHR). Baby NINJA is a warning system to decrease nephrotoxic events based on the EHR²⁹. The STARZ-Neonatal AKI risk stratification uses 10 predictors to predict the onset of AKI. STARZ was designed using 744 neonates’ data, including postnatal age at NICU, serum creatinine, sepsis, use of PPV (positive pressure ventilation), inotropes, urine output, furosemide use, the status of cardiac disease, gestational age and nephrotoxic drugs^30,31. Neither was developed to predict AKI in TH-treated NE neonates. Advancements in neonatal care have decreased the mortality of TH-treated NE neonates. However, the daily management of the multiorgan failure of these neonates, especially during the first days of life, remains important to help limit other morbidities. We expect that our tool will help physicians reduce the morbidities of TH-treated NE neonates. Our user interface will be available for the public and for research use to help predict clinical outcomes. Based on “gestational age, birth weight, postnatal day (within the first three days) and serum creatinine value (within the first three days)”, the system will give the potential clinical outcomes with probabilities.

Although the present algorithm achieved an AUC of 95%, the reader should consider some limitations. Most recent developments in artificial intelligence rely on deep learning, which has become a game changer for many research fields, from computer science to healthcare³². The dataset defines which model could be suitable for the algorithm³². Deep learning typically requires much larger datasets, even if our dataset is the largest multicenter dataset described to date in this context^32,33. Our output is categorical, and the number of variables for each patient is limited. For evaluating classifier performance in unbalanced datasets, traditional metrics like accuracy can be misleading. Instead, metrics such as precision, recall, F1-score, and confusion matrices provide a clearer picture of model effectiveness by considering both false positives and false negatives. Ensuring robust model evaluation is essential for developing reliable classification systems in clinical settings^32,34. Although the overall accuracy of 73% may appear modest, it must be interpreted in the context of complex, multi-class clinical outcomes. The inclusion of sensitivity, specificity, precision, and related metrics provides a nuanced evaluation and helps establish clinically acceptable thresholds. The Matthews correlation coefficient (MCC) offers a more comprehensive evaluation by accounting for all aspects of classification errors, providing a more robust and clinically meaningful interpretation of model performance³⁵.

Our model performed well in terms of Matthews correlation coefficient. No previous study has used any biomarker or predictive tool to obtain these outcomes with these specific metrics. For survivors without AKI (Class 1), the model achieves a high F1-score (0.783), reflecting strong precision and recall. However, the Matthews correlation coefficient (MCC) is notably lower (0.502), suggesting that while the model identifies many true positives, it may also misclassify a substantial number of non-Class 1 cases. This discrepancy indicates that F1 may overestimate performance in this scenario, as it does not account for true negatives (Tables 3 and 4).

In contrast, for survivors with AKI (Class 2), the alignment between F1 (0.809) and MCC (0.797) suggests a well-balanced classification with minimal misclassification errors. This consistency indicates that the model effectively identifies this outcome without introducing systematic bias toward a particular class (Tables 3 and 4). For neonates who died without AKI (Class 3), both F1 (0.537) and MCC (0.571) indicate moderate classification performance. While precision and recall remain suboptimal, the slightly higher MCC suggests a more favorable overall error distribution when considering the complete confusion matrix (Tables 3 and 4).

For neonates who died with AKI (Class 4), F1 (0.708) is moderate; however, MCC (0.786) is substantially higher. This indicates that although direct detection performance is reasonable, the overall classification—accounting for true negatives and error distribution—is stronger. In a clinical context, where distinguishing high-risk cases is critical, this robustness is particularly relevant (Tables 3 and 4).

For hospitalized control neonates (Class 5), F1 (0.729) exceeds MCC (0.614), suggesting that while precision and recall are favorable, the model’s overall classification performance diminishes when true negatives are considered. This again highlights the tendency of F1 to overestimate performance by underrepresenting misclassified negative cases (Tables 3 and 4).

These observations underscore the importance of using MCC alongside the F1-score for medical classification tasks. The discrepancies, particularly in Class 1 and Class 5, suggest that models may overestimate performance when evaluated solely on F1. Conversely, the alignment of MCC and F1 in Class 2 supports confidence that the model is providing balanced performance. In high-risk cases such as Class 4, the high MCC suggests that the model effectively differentiates these outcomes from others, even when precision and recall are only moderate.
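To make the F1 versus MCC comparison concrete, the sketch below computes both metrics for one class from a multi-class confusion matrix. It assumes a one-vs-rest reduction for the per-class values, which the text does not confirm, and the 3×3 matrix is invented for illustration; it is not the study's confusion matrix.

```python
import math


def one_vs_rest_counts(confusion, k):
    """Collapse a square confusion matrix (rows = true, cols = predicted)
    into TP, FP, FN, TN for class index k."""
    n = len(confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[r][k] for r in range(n)) - tp
    tn = sum(sum(row) for row in confusion) - tp - fn - fp
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 uses TP, FP and FN only; true negatives never enter the formula."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Binary Matthews correlation coefficient, which also uses TN."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative (made-up) 3-class confusion matrix.
toy = [
    [50, 10, 0],
    [5, 20, 5],
    [0, 5, 5],
]
tp, fp, fn, tn = one_vs_rest_counts(toy, 0)
toy_f1 = f1_score(tp, fp, fn)
toy_mcc = mcc(tp, fp, fn, tn)
```

In this toy case the class has a higher F1 than MCC, the same direction of discrepancy described for Classes 1 and 5.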

Our current approach accounts for temporal trends through strategic data partitioning. However, it does not fully leverage the sequential nature of time-series data. Variables such as serum creatinine and postnatal age exhibit dynamic fluctuations that may be more effectively captured using sequential deep learning architectures, including recurrent neural networks, long short-term memory networks, or temporal convolutional networks. While dataset size precluded the implementation of these methods in the present study, future research should explore their potential to enhance temporal modeling and improve predictive performance in clinical applications.

A decline in performance between the hierarchical and single classifiers was observed in TH-treated NE neonates who died without AKI. This may be attributed to overlapping clinical features among subgroups and inherent model limitations in detecting subtle biomarker variations. Additional inputs, such as further clinical variables, may be necessary to enhance subgroup distinction and improve model performance. These lower numbers may indicate that the features used to characterize neonates without AKI (whether they survived or died) are less distinctive or overlap more with those of other classes. In addition, hospitalized non-NE controls share some similarities with certain NE groups in baseline clinical characteristics, making the classification task more challenging.

We encountered a data imbalance problem during algorithm development. We started by developing a hierarchical model of four XGBoost³⁶ classifiers that make binary classification decisions in cascading order. To increase model performance, further oversampling methods and interaction factors were introduced. However, the hierarchical model’s performance was neither comparable nor superior to that of the single classifier. Given the multiple attempts and combinations of methodologies already tried, this model may not be improvable further. Data imbalance is a common issue in clinical machine learning that can negatively impact classification performance^37,38,39,40. Addressing this challenge requires tailored approaches, including data-level, algorithmic, and ensemble methods^37,38,39,40. Data-level techniques such as under-sampling and over-sampling, including SMOTE, help balance datasets, while hybrid methods combine these strategies for improved representation. In clinical applications, where datasets often contain more healthy individuals than affected cases, these strategies improve detection in disease diagnosis. Integrating sampling techniques with algorithmic adjustments yields optimal results, though deep learning approaches require substantial computational resources. Future research should refine these strategies to enhance model generalizability in medical classification tasks.

There are also different methods to handle data imbalance problems, but their selection should be optimized according to the dataset and clinical scenario. In our predictive algorithm to address clinical questions, we used known predictors and aimed to fit them into a statistically viable algorithm. To our knowledge, this has not been done before. Since we are already limited to certain predictors, we tried to utilize them to explore non-linear relationships between the variables, and to predict the patient outcome based on these non-linear relationships. Different classifiers, and even the same classifier with different kernels (as in the SVM examples), will behave differently. Based on the behavior we observed, we optimized all selected classifiers.

To address class imbalance, we applied a random oversampling technique; however, we acknowledge that this manual oversampling may not generalize across different clinical datasets. Moreover, our feature set—limited to gestational age, birth weight, postnatal age, and serum creatinine—may not capture the full complexity of neonatal outcomes. Additional factors, such as markers of hypoxic-ischemic encephalopathy (HIE) severity, inflammatory biomarkers, and resuscitation details, could further enhance model performance. Future studies should aim to incorporate these variables and explore more sophisticated oversampling or augmentation techniques.
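Random oversampling of the kind mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function name `random_oversample`, the choice to grow every class to the size of the largest one, and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (assumed balancing target)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement from the existing rows of this class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


rows = [[1], [2], [3], [4], [5]]
labels = ["a", "a", "a", "b", "c"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

When combined with cross-validation, oversampling is usually applied to the training portion of each fold only, so that duplicated rows do not leak into the held-out data.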

Our input selection was guided by the availability and established clinical relevance of gestational age, birth weight, postnatal age, and serum creatinine. We acknowledge that additional factors—such as the severity of hypoxic-ischemic encephalopathy (HIE), the presence of PPHN, sepsis, neonatal resuscitation details, and other biomarkers—may significantly impact outcomes. These variables could not be used in the present analysis due to missing data. Future studies should integrate these predictors to enhance model performance and clinical applicability.

The serum creatinine centile trends were very similar, making it challenging to distinguish among these three groups based solely on creatinine values. This suggests that serum creatinine alone may not be the most reliable predictor of outcomes in these specific groups of neonates (Figs. 3 and 5).

Artificial intelligence implementations in pediatrics and neonatology have drawn attention to decision-making and clinical support systems³². Here, we performed an ML analysis on a multicenter international retrospective cohort of TH-treated NE neonates. Because data scarcity is a natural difficulty in pediatrics and neonatology³², we made an effort to train and test our algorithm on the largest dataset described to date in this context. We followed statistical evaluation paradigms (cross-validation) to avoid any bias in evaluations. For reproducibility and generalizability, we share our code on GitHub and make our user interface publicly available.

Further research is needed to evaluate the prospective use of our model. It is highly likely that its performance could be enhanced by incorporating additional variables and larger datasets.

Although sCr has its limitations as a biomarker of kidney function and injury, serial measures of sCr over the first week of life can help establish a pattern of renal function. We used the neonatal KDIGO definition to diagnose AKI, as it is the most widely used standard definition for AKI. Our dataset consists of neonates from 1999 to 2021. The use of TH has likely improved over time and includes both selective head cooling and total body cooling. The dataset does not include details on demographics and pharmacotherapy during the NICU stay.

As a collaborative team of neonatologists and AI scientists, we emphasize the critical role of clinical decision support systems and the importance of ‘human-in-the-loop’ approaches in medical applications. Tools like our prediction model offer clinicians valuable, objective data on AKI and survival status, while preserving clinician autonomy in the final decision-making process. As clinical decision support systems become more integral to medical practice, our tool addresses a crucial gap, particularly during the critical first 72 h of life for neonates undergoing TH.

We developed our model based on the first 10 days of life. Our supervised ML model was able to predict clinical outcomes within the first 3 days with an AUC of 95%. In addition to careful monitoring of clinical parameters, this clinical decision tool might tailor future physiologically based therapeutic approaches or support precision medicine decisions. By providing insights into potential organ injury, our model encourages timely consideration of AKI and survival outcomes, potentially enabling earlier renal-protective interventions like methylxanthines⁴¹ and improving survival rates for affected neonates.

Methods

This study reanalyzed previously reported pooled datasets on sCr^18,19 (Tables 5 and 6). The initial study protocol was approved by the Ethics Committee Research of UZ/KU Leuven (S63365). Informed written consent was waived. We confirm that all research was performed in accordance with relevant guidelines/regulations.

Table 5 Description of the cohorts of TH-treated NE neonates included in the pooled study¹⁹.


Table 6 Description of the control cohort of hospitalized neonates included in the pooled study⁴⁸.


Datasets in TH-treated NE neonates and non-TH-treated, non-NE control neonates

Data from 8 TH-treated NE cohorts were combined¹⁹ (Table 5). The first ten days of PNA were analyzed to assess recovery of kidney function over time. Day 1 was defined as the date of delivery (from birth until 24 h). AKI detection was based on the KDIGO definition (any AKI): sCr↑ ≥0.3 mg/dL within 48 h or sCr↑ ≥1.5-fold versus the lowest prior sCr within 10 days, irrespective of urine output. Data on NE severity, fluid management, perinatal pharmacotherapy, comorbidity, or urine output were not available. sCr observations in controls were extracted from an already published sCr population model as a time-dependent covariate in neonates^14,19,28,42,43,44,45,46,47,48. The covariates collected in both datasets were restricted to birth weight, GA, survival (neonatal death, day 1–28, yes/no), and sCr values (day 1–10) to facilitate pooling. The pooled dataset comprised 1149 TH-treated NE neonates with 5526 sCr observations within the first 10 days. Information on included cohorts of TH-treated NE neonates is provided in Table 5.
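The "any AKI" rule described above can be written as a small check over a neonate's serial sCr values. This is a hedged sketch rather than the study's code: the function name `has_any_aki` is hypothetical, measurements are assumed to be keyed by postnatal day in mg/dL, and "within 48 h" is approximated as a comparison with values measured at most two postnatal days earlier.

```python
from typing import Mapping


def has_any_aki(scr_by_day: Mapping[int, float]) -> bool:
    """Return True if serial sCr values (mg/dL, keyed by postnatal day 1-10)
    meet either sCr criterion, irrespective of urine output."""
    days = sorted(d for d in scr_by_day if 1 <= d <= 10)
    for i, day in enumerate(days):
        prior_days = days[:i]
        if not prior_days:
            continue  # the first value has nothing to be compared with
        current = scr_by_day[day]

        # Criterion 1: rise of >= 0.3 mg/dL within 48 h
        # (assumption: "48 h" means at most 2 postnatal days earlier).
        recent = [scr_by_day[d] for d in prior_days if day - d <= 2]
        if any(current - value >= 0.3 for value in recent):
            return True

        # Criterion 2: >= 1.5-fold the lowest prior sCr within the window.
        lowest_prior = min(scr_by_day[d] for d in prior_days)
        if current >= 1.5 * lowest_prior:
            return True
    return False
```

For example, `has_any_aki({1: 0.8, 2: 0.7, 3: 0.6})` is `False` because sCr only falls, whereas `has_any_aki({1: 0.6, 2: 1.0})` is `True` because the value rises by 0.4 mg/dL within one day.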

We analyzed serum creatinine (sCr) data from 801 control neonates with a GA of ≥ 36 weeks and postnatal age (PNA) of 1–10 days, collected from the neonatal intensive care unit of the University Hospitals Leuven (2007–2011) (Table 6). Neonates treated for infections, respiratory adaptation, and congenital malformations were included, excluding those receiving therapeutic hypothermia. The dataset yielded 2,881 sCr measurements within the first 10 days and relevant covariates (birth weight, GA).

The descriptive statistics of covariates for TH-treated NE neonates and control neonates are presented in Tables 7 and 8. The sCr values obtained by the Jaffe assay were converted to values equivalent to those obtained by an isotope dilution mass spectrometry (IDMS) traceable enzymatic assay using Eq. (1): sCr_IDMS = 1.003 × sCr_Jaffe + 0.057 (unit conversion: 1 mg/dL = 88.4 µmol/L).
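As an illustration, Eq. (1) and the unit conversion can be applied as follows. The text does not state which unit Eq. (1) operates in; this sketch assumes mg/dL, and the function names are hypothetical.

```python
UMOL_PER_L_PER_MG_PER_DL = 88.4  # 1 mg/dL = 88.4 µmol/L (from the text)


def umol_to_mgdl(scr_umol_per_l):
    """Convert serum creatinine from µmol/L to mg/dL."""
    return scr_umol_per_l / UMOL_PER_L_PER_MG_PER_DL


def mgdl_to_umol(scr_mg_per_dl):
    """Convert serum creatinine from mg/dL to µmol/L."""
    return scr_mg_per_dl * UMOL_PER_L_PER_MG_PER_DL


def jaffe_to_idms(scr_jaffe_mg_per_dl):
    """Apply Eq. (1). Assumption: input and output are both in mg/dL."""
    return 1.003 * scr_jaffe_mg_per_dl + 0.057
```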

Table 7 Number of neonates in each group of the dataset.


Table 8 Patient characteristics of NE and control neonates.


Machine learning (ML)

The general basis of the ML model was to use the four types of neonatal data available in our multicenter international pooled dataset (GA, birth weight, PNA, and creatinine values) as input and to determine one of the five classes of outcomes that were described in our dataset, which are:

(1) TH-treated NE neonates who survived without AKI.
(2) TH-treated NE neonates who survived with AKI.
(3) TH-treated NE neonates who died without AKI.
(4) TH-treated NE neonates who died and had AKI.
(5) Hospitalized neonates who did not need TH.

Because the predicted outcome is categorical, a classification model was appropriate. We evaluated several classifiers to identify the best-performing model⁴⁹ (Table 1 and Supplements): Logistic Regression, Random Forest, Support Vector Classifier (SVC), Extreme Gradient Boosting (XGBoost), Gradient Boosting, Adaptive Boosting (AdaBoost), K-Nearest Neighbors (KNN), Decision Tree Classifier, Extra Trees Classifier, and a neural network⁴⁹.

In this study, we first proposed a hierarchical model of four XGBoost classifiers to simplify the decision-making process: each classifier was responsible for one binary classification, and decisions were made in cascading order, from the broadest decision to the eventual selection of a label, in a fashion similar to a decision tree. This model was further optimized through hyperparameter tuning with GridSearch, interaction features derived from the input data, and oversampling of the classes in which TH-treated neonates died, since there was a large imbalance between survivors and deaths. Oversampling was handled using the RandomOverSampler from the imbalanced-learn package (imblearn)⁵⁰. Although oversampling improved classification, predicting neonatal deaths from day 1 remained the model’s overall weakness.

We took considerable steps to prevent data leakage and to ensure that our model’s reported performance is accurate and applicable to new data.

Patient-level splitting

During the cross-validation procedure, we separated data from each patient. All data points (serum creatinine values) from a single patient were assigned to either the training or test set, never both. This ensures that the model does not “learn” from the same patient’s data in both sets by accident.

We used a method called stratified K-fold (StratifiedKFold) cross-validation, which ensures that the balance of groups is kept in each split.
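The combination of patient-level splitting and stratification can be sketched in plain Python. This is an illustrative stand-in, not the study's implementation (which used StratifiedKFold); the function names and the `patient_id` field are hypothetical. Folds are assigned per patient within each outcome class, and every measurement then follows its patient.

```python
import random
from collections import defaultdict


def patient_level_stratified_folds(patient_labels, n_splits=5, seed=0):
    """Map each patient to a fold, keeping class proportions similar per fold.

    patient_labels: dict patient_id -> outcome class (one label per patient).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pid, label in patient_labels.items():
        by_class[label].append(pid)

    fold_of = {}
    for label in sorted(by_class):
        pids = sorted(by_class[label])
        rng.shuffle(pids)
        # Deal patients of this class round-robin across the folds.
        for i, pid in enumerate(pids):
            fold_of[pid] = i % n_splits
    return fold_of


def split_rows(rows, fold_of, test_fold):
    """Split measurement rows so that no patient appears in both sets."""
    train = [r for r in rows if fold_of[r["patient_id"]] != test_fold]
    test = [r for r in rows if fold_of[r["patient_id"]] == test_fold]
    return train, test
```

In scikit-learn, `StratifiedGroupKFold` offers comparable behaviour when patient identifiers are passed as `groups`; whether the authors used it is not stated.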

Despite this tuning, the hierarchical model reached an overall accuracy of no more than 73.5%. Some drop-off in accuracy is expected with cascading decision processes, even after tuning, because errors made at one stage propagate to the stages that follow. Furthermore, the first and broadest classifier, which distinguished between surviving and deceased neonates, had the lowest accuracy because of class imbalance: more neonates survived than died (Table 7). This ultimately capped the model’s performance at the level of its weakest classifier.
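A small calculation illustrates why a cascade cannot outperform its weakest stage. It assumes, purely for illustration, that errors at successive stages are independent; the stage accuracies below are made-up numbers, not results from this study.

```python
import math


def cascade_accuracy(stage_accuracies):
    """Probability that every stage along one decision path is correct,
    assuming independent errors between stages (an illustrative assumption)."""
    return math.prod(stage_accuracies)


# Hypothetical example: a weak first stage followed by a strong second stage.
path = [0.80, 0.95]
overall = cascade_accuracy(path)
# The product can never exceed the smallest factor.
assert overall <= min(path)
```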

To avoid this cascading error, we reverted to a single XGBoost classifier predicting all five labels. Interaction features and GridSearch were again applied, with slightly different parameters to account for differences in the model. Oversampling was again performed using RandomOverSampler, this time configured manually: the minority labels were selected, a desired sample count was specified for each, and the minority classes were randomly sampled until that count was reached; the resulting samples were then concatenated to the training dataset. This approach was chosen because techniques such as the Synthetic Minority Oversampling Technique (SMOTE) yielded mediocre results. Patient-level splitting and stratified cross-validation (StratifiedKFold) were once again used for testing. We employed a double cross-validation scheme. The data were divided into five stratified datasets, each with its own test set that was entirely excluded from the training process, and within each training set the model underwent k-fold cross-validation. After each trained model was evaluated on its corresponding test set, the results were averaged to obtain the reported accuracy. This approach offers greater protection against bias than traditional cross-validation methods. The model achieved an overall accuracy score of 75.1%. In evaluating classification performance, key metrics such as accuracy, precision, recall, F1-score, and AUC provide essential insights, each with distinct strengths and limitations (Table 9).
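The manual oversampling step described in this paragraph can be sketched in plain Python. This is a hedged illustration rather than the study's code: the function name, the row layout, and the target counts are hypothetical. It draws minority-class rows with replacement until each selected class reaches its desired count and appends the copies to the training data.

```python
import random


def oversample_minority(rows, label_key, target_counts, seed=0):
    """Randomly duplicate rows of selected classes up to a target count.

    rows: list of dicts (training rows only).
    target_counts: dict label -> desired number of rows for that label.
    """
    rng = random.Random(seed)
    extra = []
    for label, target in target_counts.items():
        pool = [r for r in rows if r[label_key] == label]
        if not pool:
            continue  # nothing to sample from for this label
        for _ in range(max(0, target - len(pool))):
            extra.append(dict(rng.choice(pool)))  # copy so originals stay untouched
    # Concatenate the sampled copies to the original training rows.
    return rows + extra
```

Oversampling of this kind is normally applied to the training portion only, after the patient-level split, so that duplicated rows cannot leak into the test set.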

Table 9 Evaluation metrics in machine learning.


Accuracy quantifies the proportion of correctly classified instances across all predictions but can be misleading in imbalanced datasets, where a model may achieve high accuracy by predicting only the majority class while failing to detect minority cases. In its definition, true positives (TP) and true negatives (TN) represent correctly classified instances, while false positives (FP) and false negatives (FN) denote misclassified cases (Table 9).

While accuracy is intuitive, it has notable limitations, particularly in imbalanced datasets where one class significantly outweighs another. A model could achieve high accuracy simply by predicting the majority class while failing to detect the minority class altogether. In such scenarios, additional metrics like precision, recall, F1-score, and ROC-AUC are essential to provide a more comprehensive evaluation of model performance (Table 9).

‘Overall accuracy score’ is defined as the proportion of correctly predicted cases relative to the total number of cases. This metric provides an initial measure of model performance across all outcome classes, although its limitations in the context of imbalanced data are acknowledged. Accuracy indicates the overall proportion of correct predictions; AUC measures the model’s discriminative ability; MCC offers a balanced performance measure even with imbalanced classes; and precision, recall, and F1-scores provide insight into the true positive rates and error margins.

The F1 score is the harmonic mean of precision and recall. Detailed definitions are given in Table 9.

AUC (Area Under the Curve) evaluates the classifier’s ability to distinguish between classes, with higher values indicating better discrimination. It remains useful even in imbalanced datasets, as it considers performance across varying decision thresholds.

A combination of these metrics is essential for a robust evaluation, particularly in medical classification tasks where both false positives and false negatives carry significant consequences.

MCC (Matthews Correlation Coefficient), in a one-vs‐all setting, incorporates all four elements of the confusion matrix (TP, TN, FP, FN). It is therefore more balanced, especially when there is an uneven distribution of negatives versus positives. MCC values range from − 1 to + 1, where + 1 represents perfect prediction, 0 reflects random prediction, and − 1 signifies complete disagreement between predictions and actual outcomes.
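The metrics discussed above can be computed from one-vs-all confusion counts as in the sketch below. The formulas are the standard definitions; the function names and the convention of returning 0.0 when a denominator is zero are choices made for this illustration.

```python
import math


def one_vs_all_counts(y_true, y_pred, positive):
    """Confusion counts treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1, and MCC from TP, TN, FP, FN."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}
```

For example, a classifier that always predicts the majority class on 5 negatives and 3 positives reaches an accuracy of 0.625 but an MCC of 0, which is the behaviour the text warns about.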

We also looked at the calibration curves of our single classifier models (Fig. 9). To assess our model’s performance on the imbalanced dataset, we calculated the Matthews correlation coefficient (MCC)³⁵. Recent research and applications highlight the robustness of MCC in multi-class problems, particularly in domains where class distributions are highly skewed. Studies suggest that MCC provides more reliable insights than metrics that may disproportionately favor the majority class.

This methodology benefited our model: incorporating postnatal age (PNA) as a variable and utilizing multiple measurements per patient enabled the model to analyze the temporal variations of creatinine levels in relation to other clinical factors. We also included the creatinine percentiles within the first ten days.

Comprehensive evaluation of classification performance

Our models were trained using four input variables: gestational age (GA), birth weight (BW), post-natal age (PNA), and creatinine values (Cr). The feature analysis was conducted to enhance the explainability of our models and to understand the interaction among these variables. We employed feature importance techniques and interaction bars to illustrate how these variables contribute to the model’s predictions.

User interface for practical application

We developed an intuitive user interface integrated with the final version of the trained ML model (XGBoost); the code was also made publicly available (https://github.com/NUBagciLab/Therapeutic-Hypothermia-Outcome-Classification). The interface takes as input all the variables the model was trained on (gestational age, birth weight, postnatal age, and creatinine level), with the option to choose between mg/dL and µmol/L for creatinine levels. When the “Predict” button is pressed, the model receives the input and predicts the outcome of TH (https://thprediction.streamlit.app/). It prints one of five messages, each representing one of the five possible outcomes, followed by a probabilistic score with a confidence interval. The confidence interval provides a plausible range for the estimate and thereby expresses its uncertainty. It was implemented using 100 bootstrap samples of the dataset, from which the mean probability of the predicted outcome is calculated. The standard deviation across the samples is then computed to measure the variability in predictions, and the 95% confidence interval is calculated as the mean ± 1.96 standard deviations, based on the normal approximation under which 95% of values lie within 1.96 standard deviations of the mean.
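The interval calculation described above can be sketched as follows. This is an assumption-laden illustration, not the deployed code: `toy_probability` is a hypothetical stand-in for refitting the model on a bootstrap sample and predicting for one patient, the sample standard deviation is used, and clipping the bounds to [0, 1] is an addition of this sketch.

```python
import random
import statistics


def bootstrap_resample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def interval_from_bootstrap(probabilities, z=1.96):
    """Mean and mean ± z·SD of bootstrap probabilities, clipped to [0, 1]."""
    mean = statistics.mean(probabilities)
    sd = statistics.stdev(probabilities)
    return mean, max(0.0, mean - z * sd), min(1.0, mean + z * sd)


def toy_probability(sample):
    """Hypothetical stand-in for 'refit the model and predict a probability'."""
    return sum(sample) / len(sample)


def bootstrap_interval(outcomes, n_boot=100, seed=0):
    """Repeat the resample-and-predict step n_boot times and summarise."""
    rng = random.Random(seed)
    probs = [toy_probability(bootstrap_resample(outcomes, rng))
             for _ in range(n_boot)]
    return interval_from_bootstrap(probs)
```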

Odds ratios were also calculated to measure the association between exposure and an outcome, specifically the relative odds of death given the presence of AKI. This was calculated by first finding the odds of death in the case of infants with AKI and then the odds of death without AKI. The odds ratio is the ratio of these two odds, comparing the likelihood of death in patients with AKI against those without AKI (Figures in Supplements).
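The odds-ratio calculation described above amounts to the following. The counts in the usage example are invented for illustration and are not data from this study; the function name is hypothetical.

```python
def odds_ratio(deaths_aki, survivors_aki, deaths_no_aki, survivors_no_aki):
    """Odds of death with AKI divided by odds of death without AKI."""
    if 0 in (survivors_aki, deaths_no_aki, survivors_no_aki):
        raise ValueError("odds ratio is undefined when a required cell is zero")
    odds_with_aki = deaths_aki / survivors_aki
    odds_without_aki = deaths_no_aki / survivors_no_aki
    return odds_with_aki / odds_without_aki


# Invented counts: 20 deaths and 80 survivors with AKI,
# 10 deaths and 190 survivors without AKI.
example = odds_ratio(20, 80, 10, 190)
```

An odds ratio above 1 indicates higher odds of death among neonates with AKI than among those without.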

Data availability

The user interface and code are freely and publicly available (https://github.com/NUBagciLab/Therapeutic-Hypothermia-Outcome-Classification, https://thprediction.streamlit.app/). The dataset is available from the corresponding author upon reasonable request.

References

1. LaRosa, D. A., Ellery, S. J., Walker, D. W. & Dickinson, H. Understanding the full spectrum of organ injury following intrapartum asphyxia. Front. Pediatr. 5, 16 (2017).
2. Gale, C. et al. Neonatal brain injuries in England: population-based incidence derived from routinely recorded clinical data held in the National neonatal research database. Arch. Dis. Child. Fetal Neonatal Ed. 103, F301–F306 (2018).
3. Chakkarapani, A. A. et al. Therapies for neonatal encephalopathy: targeting the latent, secondary and tertiary phases of evolving brain injury. Semin Fetal Neonatal Med. 26, 101256 (2021).
4. Jacobs, S. E. et al. Cooling for newborns with hypoxic ischaemic encephalopathy. Cochrane Database Syst. Rev. 2013, CD003311 (2013).
5. van Wincoop, M., de Bijl-Marcus, K., Lilien, M., van den Hoogen, A. & Groenendaal, F. Effect of therapeutic hypothermia on renal and myocardial function in asphyxiated (near) term neonates: A systematic review and meta-analysis. PLoS ONE. 16, e0247403 (2021).
6. Polglase, G. R., Ong, T. & Hillman, N. H. Cardiovascular alterations and multiorgan dysfunction after birth asphyxia. Clin. Perinatol. 43, 469–483 (2016).
7. Iribarren, I., Hilario, E., Alvarez, A. & Alonso-Alconada, D. Neonatal multiple organ failure after perinatal asphyxia. An. Pediatr. (Engl. Ed.). 97, 280 (2022).
8. Segar, J. L. et al. Fluid management, electrolytes imbalance and renal management in neonates with neonatal encephalopathy treated with hypothermia. Semin Fetal Neonatal Med. 26, 101261 (2021).
9. Kirkley, M. J. et al. Acute kidney injury in neonatal encephalopathy: an evaluation of the AWAKEN database. Pediatr. Nephrol. 34, 169–176 (2019).
10. Robertsson Grossmann, K., Barany, P., Blennow, M. & Chromek, M. Acute kidney injury in infants with hypothermia-treated hypoxic-ischaemic encephalopathy: an observational population-based study. Acta Paediatr. 111, 86–92 (2022).
11. Bozkurt, O. & Yucesoy, E. Acute kidney injury in neonates with perinatal asphyxia receiving therapeutic hypothermia. Am. J. Perinatol. 38, 922–929 (2021).
12. Kidney Disease: Improving Global Outcomes (KDIGO) Acute Kidney Injury Work Group. KDIGO clinical practice guideline for acute kidney injury. Kidney Int. Suppl. 2, 1 (2012).
13. Tanigasalam, V., Bhat, V., Adhisivam, B. & Sridhar, M. G. Does therapeutic hypothermia reduce acute kidney injury among term neonates with perinatal asphyxia? A randomized controlled trial. J. Matern Fetal Neonatal Med. 29, 2545–2548 (2016).
14. Chock, V. Y., Cho, S. H. & Frymoyer, A. Aminophylline for renal protection in neonatal hypoxic-ischemic encephalopathy in the era of therapeutic hypothermia. Pediatr. Res. 89, 974–980 (2021).
15. Zappitelli, M. et al. Developing a neonatal acute kidney injury research definition: a report from the NIDDK neonatal AKI workshop. Pediatr. Res. 82, 569–573 (2017).
16. Borloo, N., Smits, A., Thewissen, L., Annaert, P. & Allegaert, K. Creatinine trends and patterns in neonates undergoing whole body hypothermia: A systematic review. Children (Basel). 8, 1 (2021).
17. Quaedackers, J. S. et al. Polyuria and impaired renal blood flow after asphyxia in preterm fetal sheep. Am. J. Physiology-Regulatory Integr. Comp. Physiol. 286, R576–R583 (2004).
18. Krzyzanski, W. et al. A population model of time-dependent changes in serum creatinine in (near)term neonates with hypoxic-ischemic encephalopathy during and after therapeutic hypothermia. AAPS J. 26, 4 (2023).
19. Keles, E. et al. Serum creatinine patterns in neonates treated with therapeutic hypothermia for neonatal encephalopathy. Neonatology 119, 686–694 (2022).
20. Tanigasalam, V., Bhat, V., Adhisivam, B. & Sridhar, M. G. Does therapeutic hypothermia reduce acute kidney injury among term neonates with perinatal asphyxia? A randomized controlled trial. J. Maternal-Fetal Neonatal Med. 29, 2545–2548 (2016).
21. Lutz, I. C., Allegaert, K., de Hoon, J. N. & Marynissen, H. Pharmacokinetics during therapeutic hypothermia for neonatal hypoxic ischaemic encephalopathy: a literature review. BMJ Paediatr. Open. 4, e000685 (2020).
22. Deferm, N. et al. Glomerular filtration rate in asphyxiated neonates under therapeutic whole-body hypothermia, quantified by mannitol clearance. Clin. Pharmacokinet. 60, 897–906 (2021).
23. Leys, K. et al. Pharmacokinetics during therapeutic hypothermia in neonates: from pathophysiology to translational knowledge and physiologically-based Pharmacokinetic (PBPK) modeling. Expert Opin. Drug Metab. Toxicol. 19, 461–477 (2023).
24. Selewski, D. T., Jordan, B. K., Askenazi, D. J., Dechert, R. E. & Sarkar, S. Acute kidney injury in asphyxiated newborns treated with therapeutic hypothermia. J. Pediatr. 162, 725–729 (2013).
25. Oso, B. I., Oseni, S. B. A., Aladekomo, T. A., Adedeji, T. A. & Olowu, W. A. Predictive ability of 2-h serum neutrophil gelatinase-associated Lipocalin for perinatal asphyxia-induced acute kidney injury. Pediatr. Nephrol. 39, 283–289 (2024).
26. Zhang, Y., Zhang, B., Wang, D., Shi, W. & Zheng, A. Evaluation of novel biomarkers for early diagnosis of acute kidney injury in asphyxiated full-term newborns: a case-control study. Med. Princ Pract. 29, 285–291 (2020).
27. Rumpel, J. et al. Urine biomarkers for the assessment of acute kidney injury in neonates with hypoxic ischemic encephalopathy receiving therapeutic hypothermia. J. Pediatr. 241, 133–140 (2022). e133.
28. Chock, V. Y., Frymoyer, A., Yeh, C. G. & Van Meurs, K. P. Renal saturation and acute kidney injury in neonates with hypoxic ischemic encephalopathy undergoing therapeutic hypothermia. J. Pediatr. 200, 232–239e231 (2018).
29. Stoops, C. et al. Baby NINJA (nephrotoxic injury negated by just-in-time action): reduction of nephrotoxic medication-associated acute kidney injury in the neonatal intensive care unit. J. Pediatr. 215, 223–228e226 (2019).
30. Sethi, S. K. et al. Validation of the STARZ neonatal acute kidney injury risk stratification score. Pediatr. Nephrol. 37, 1923–1932 (2022).
31. Dhooria, G. S. et al. Validation of the STARZ neonatal acute kidney injury risk stratification score in an independent prospective cohort. J. Neonatal-Perinatal Med. 15, 777–785 (2022).
32. Keles, E. & Bagci, U. The past, current, and future of neonatal intensive care units with artificial intelligence: a systematic review. NPJ Digit. Med. 6, 220 (2023).
33. Bagci, U., Irmakci, I., Demir, U. & Keles, E. Building blocks of AI. In AI in Clinical Medicine: A Practical Guide for Healthcare Professionals 56–65 (2023).
34. Hicks, S. A. et al. On evaluation metrics for medical applications of artificial intelligence. Sci. Rep. 12, 5979 (2022).
35. Huang, S. & Foody, G. M. Challenges in the real world use of classification accuracy metrics: from recall and precision to the Matthews correlation coefficient. PLoS ONE. 18, 1 (2023).
36. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).
37. Tanha, J., Abdi, Y., Samadi, N., Razzaghi, N. & Asadpour, M. Boosting methods for multi-class imbalanced data classification: an experimental review. J. Big Data. 7, 1 (2020).
38. Rendón, E., Alejo, R., Castorena, C., Isidro-Ortega, F. J. & Granda-Gutiérrez, E. E. Data sampling methods to deal with the big data multi-class imbalance problem. Appl. Sci. 10, 1 (2020).
39. Kaur, H., Pannu, H. S. & Malhi, A. K. A systematic review on imbalanced data challenges in machine learning. ACM Comput. Surveys. 52, 1–36 (2019).
40. Halimu, C., Kasem, A. & Newaz, S. H. S. Empirical comparison of area under ROC curve (AUC) and Mathew correlation coefficient (MCC) for evaluating machine learning algorithms on imbalanced datasets for binary classification. In Proceedings of the 3rd International Conference on Machine Learning and Soft Computing 1–6 (2019).
41. Bhatt, G. C., Gogia, P., Bitzan, M. & Das, R. R. Theophylline and aminophylline for prevention of acute kidney injury in neonates and children: a systematic review. Arch. Dis. Child. 104, 670–679 (2019).
42. Smits, A., Annaert, P., Van Cruchten, S. & Allegaert, K. A physiology-based Pharmacokinetic framework to support drug development and dose precision during therapeutic hypothermia in neonates. Front. Pharmacol. 11, 587 (2020).
43. Cristea, S. et al. Amikacin pharmacokinetics to optimize dosing in neonates with perinatal asphyxia treated with hypothermia. Antimicrob. Agents Chemother. 61, e01282-17 (2017). https://doi.org/10.1128/AAC.01282-17
44. Groenendaal, F. et al. Introduction of hypothermia for neonates with perinatal asphyxia in the Netherlands and Flanders. Neonatology 104, 15–21 (2013).
45. Gluckman, P. D. et al. Selective head cooling with mild systemic hypothermia after neonatal encephalopathy: multicentre randomised trial. Lancet 365, 663–670 (2005).
46. Oncel, M. Y. et al. Urinary markers of acute kidney injury in newborns with perinatal asphyxia. Ren. Fail. 38, 882–888 (2016).
47. Frymoyer, A. et al. Theophylline dosing and pharmacokinetics for renal protection in neonates with hypoxic-ischemic encephalopathy undergoing therapeutic hypothermia. Pediatr. Res. 88, 871–877 (2020).
48. Krzyzanski, W., Smits, A., Van Den Anker, J. & Allegaert, K. Population model of serum creatinine as time-dependent covariate in neonates. AAPS J. 23, 86 (2021).
49. Dreiseitl, S. & Ohno-Machado, L. Logistic regression and artificial neural network classification models: a methodology review. J. Biomed. Inform. 35, 352–359 (2002).
50. He, H. & Ma, Y. Imbalanced learning: foundations, algorithms, and applications (2013).
51. Cox, D. R. The regression analysis of binary sequences. J. Royal Stat. Soc. Ser. B Stat. Methodol. 20, 215–232 (1958).
52. Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques To Build Intelligent Systems (O’Reilly Media, 2019).
53. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
54. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
55. Quinlan, J. R. Induction of decision trees. Mach. Learn. 1, 81–106 (1986).
56. Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).
57. Cover, T. & Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory. 13, 21–27 (1967).
58. Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach. Learn. 63, 3–42 (2006).
59. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
60. Akisu, M., Kumral, A. & Canpolat, F. E. Turkish neonatal society guideline on neonatal encephalopathy. Turk. Pediatr. Ars. 53, S32–S44 (2018).
61. La Haye-Caty, N. et al. Impact of restricting fluid and sodium intake in term asphyxiated newborns treated with hypothermia. J. Maternal-Fetal Neonatal Med. 33, 3521–3528 (2020).
62. Shankaran, S. et al. Whole-body hypothermia for neonates with hypoxic-ischemic encephalopathy. N Engl. J. Med. 353, 1574–1584 (2005).


Acknowledgements

Anne Smits’ research activities receive support from a Senior Clinical Investigatorship awarded by the Research Foundation, Flanders (FWO) (18E2H24N).

Funding

This work is partially supported by the NIH (National Institutes of Health) grants R01-CA246704, R01-CA240639, U01-DK127384-02S1, and U01-CA268808 (PI: Bagci).

Author information

Syed Yaseen Ali
Present address: Department of Radiology, Feinberg School of Medicine, Northwestern University, 737 N. Michigan Avenue, Suite 1600, Chicago, IL, 60611, USA
Pia Wintermark
Present address: Division of Newborn Medicine, Department of Pediatrics, Montreal Children’s Hospital, Research Institute of the McGill University Health Centre, McGill University, Montreal, QC, Canada
Pieter Annaert
Present address: Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
Floris Groenendaal
Present address: Department of Neonatology, Wilhelmina Children’s Hospital, University Medical Center Utrecht, and Utrecht University, Utrecht, The Netherlands
Suzan Şahin
Present address: Department of Neonatology, Faculty of Medicine, Izmir Demokrasi University, Izmir, Turkey
Mehmet Yekta Öncel
Present address: Department of Neonatology, Faculty of Medicine, İzmir Katip Çelebi University, İzmir, Turkey
Didem Armangil
Present address: Neonatal Intensive Care Unit, Koru Hospital, Ankara, Turkey
Esin Koc
Present address: Department of Neonatology, Faculty of Medicine, Gazi University, Ankara, Turkey
Malcolm R. Battin
Present address: Newborn Service, Auckland City Hospital, Health New Zealand, Auckland, New Zealand
Alistair J. Gunn
Present address: Department of Physiology, University of Auckland, Auckland, New Zealand
Adam Frymoyer & Valerie Chock
Present address: Neonatal and Developmental Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
Djalila Mekahli
Present address: Department of Development and Regeneration, KU Leuven, Leuven, Belgium
John van den Anker
Present address: Division of Clinical Pharmacology, Children’s National Hospital, Washington, DC, USA
Anne Smits
Present address: Neonatal Intensive Care Unit, University Hospitals Leuven, Leuven, Belgium
These authors jointly supervised this work: Karel Allegaert and Ulas Bagci.

Authors and Affiliations

Department of Radiology, Feinberg School of Medicine, Northwestern University, 737 N. Michigan Avenue, Suite 1600, Chicago, IL, 60611, USA
Elif Keles & Ulas Bagci
Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
Karel Allegaert
Utrecht Brain Center, University Medical Center Utrecht, Utrecht, The Netherlands
Floris Groenendaal
Department of Development and Regeneration, KU Leuven, Leuven, Belgium
Anne Smits & Karel Allegaert
Department of Pediatric Nephrology, University Hospitals, Leuven, Belgium
Djalila Mekahli
Department of Hospital Pharmacy, Erasmus MC University Medical Center, Rotterdam, The Netherlands
Karel Allegaert
Department of Biomedical Engineering, Northwestern University, Chicago, USA
Ulas Bagci
Department of Electrical and Computer Engineering, Northwestern University, Chicago, USA
Ulas Bagci

Authors

Elif Keles
Syed Yaseen Ali
Pia Wintermark
Pieter Annaert
Floris Groenendaal
Suzan Şahin
Mehmet Yekta Öncel
Didem Armangil
Esin Koc
Malcolm R. Battin
Alistair J. Gunn
Adam Frymoyer
Valerie Chock
Djalila Mekahli
John van den Anker
Anne Smits
Karel Allegaert
Ulas Bagci

Contributions

Elif Keles, Ulas Bagci, and Karel Allegaert conceptualized and designed the study. Karel Allegaert was responsible for data collection; Elif Keles and Syed Yaseen Ali were responsible for data curation and organization. Elif Keles, Pia Wintermark, Floris Groenendaal, Anne Smits, Pieter Annaert, Suzan Şahin, Mehmet Yekta Öncel, Valerie Y. Chock, Didem Armangil, Esin Koc, Malcolm R. Battin, Alistair J. Gunn, and Adam Frymoyer provided data from the different cohorts, and Pieter Annaert and John van den Anker provided input on the final version of the study design. Syed Yaseen Ali, Elif Keles, and Ulas Bagci conducted the initial analysis for machine learning model development. Elif Keles, Syed Yaseen Ali, and Ulas Bagci drafted the initial manuscript. All authors contributed to the interpretations of the data and analyses, provided input on the manuscript, approved the final manuscript as submitted, and agreed to be accountable for all aspects of the work. Funding is provided by Bagci Lab.

Corresponding author

Correspondence to Elif Keles.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Keles, E., Ali, S.Y., Wintermark, P. et al. Machine learning based clinical decision tool to predict acute kidney injury and survival in therapeutic hypothermia treated neonates. Sci Rep 15, 17278 (2025). https://doi.org/10.1038/s41598-025-01141-9


Received: 16 November 2024
Accepted: 05 May 2025
Published: 19 May 2025
DOI: https://doi.org/10.1038/s41598-025-01141-9
