Introduction

Intraventricular hemorrhage (IVH), a leading cause of poor neurodevelopmental outcomes, is prevalent in preterm infants with gestational ages (GAs) of < 32 weeks and in very low birth weight (VLBW) infants (< 1,500 g). Despite advances in neonatal intensive care and the increasing survival rates of preterm infants, the incidence rate of IVH has remained high, ranging from 15 to 40%, depending on the facility1,2,3. Both severe IVH and mild-to-moderate IVH have been reported to increase the risk of cerebral palsy and cognitive impairment. Thus, efforts have been made to identify the risk factors for developing IVH and methods for early detection3,4,5,6,7,8.

The majority of IVH occurs within the first postnatal week, usually within the first 3 days after birth3,9. Immaturity and vulnerability to ischemia due to impaired autoregulation mainly account for the development of IVH in preterm infants. White matter injury is a frequent sequela. To the best of our knowledge, although prior evidence showed that early hemodynamic shifts could affect IVH occurrence, no reliable method is available for continuously estimating IVH risks. Thus, methods to evaluate the immediate effects of interventions, which would enable therapeutic options to prevent IVH, are warranted3,10,11,12.

Artificial intelligence is now being widely used in medical research, offering a more individualized approach with the capacity to handle massive amounts of data compared to traditional risk stratification. The methodological revolution has also prompted the development of predictive modeling in neonatal intensive care13,14,15. The real-time analysis of the probability of developing IVH is expected to assist clinicians in timely interventions and follow-ups. The present study aimed to develop a reliable machine-learning model that could accurately detect IVH using time-series data, which could be continuously retrieved.

Materials and methods

Study setting and data collection

We retrospectively analyzed time-series data for the first 2 weeks after birth in preterm infants born at less than 32 weeks of GA admitted to the neonatal intensive care unit (NICU) of a single center in a tertiary university-affiliated hospital between January 2013 and June 2022. The study was approved by the Institutional Review Board of the Seoul National University Bundang Hospital (approval no. B-2207-771-102) in accordance with the Declaration of Helsinki. The requirement for informed consent was waived for this de-identified retrospective analysis.

The following variables were extracted from our institution’s data warehouse: baseline demographics (birth weight, gender, GA, Apgar score at 1 min after birth [AS1], and Apgar score at 5 min after birth [AS5]), prenatal history (maternal age, in vitro fertilization, the presence of premature rupture of membrane, pregnancy-induced hypertension, gestational diabetes mellitus, histological chorioamnionitis, and the administration of antenatal steroids), resuscitation in the delivery room, surfactant use, medication use (inotropes, antibiotics, sedatives, neuromuscular blockers, and systemic steroids), respiratory support with ventilator parameters (ventilator modes, mean airway pressure, and fraction of inspired oxygen), laboratory findings (pH, hemoglobin, potassium, chloride, and bicarbonate levels), and vital signs (systolic blood pressure [SBP], diastolic blood pressure [DBP], mean blood pressure [MBP], heart rate [HR], respiratory rate [RR], body temperature [BT], and percutaneous saturation of oxygen [SpO2]). Time-series data recorded after IVH diagnosis, including data from the re-admission of the same patient, were excluded. Thirty patients whose observation period was shorter than 24 h, and therefore too brief to be informative, were also excluded.

Serial cranial ultrasounds were performed in accordance with the study institution’s protocols. Initial screening was performed within 24 h after birth. Routine follow-up was repeated weekly until 32 weeks postmenstrual age and every 2 weeks thereafter. An additional follow-up ultrasound was conducted if clinically necessary, particularly in cases of unexpected events compromising the perfusion status. In the study cohort, IVH was first diagnosed at a median of 2.75 days after birth. Experienced radiologists specializing in pediatric radiology reported the presence of IVH based on Papile’s classification16.

Data preprocessing

Only time-series data were used for model development and validation, as categorical variables showed substantial differences in distribution between the groups with and without IVH. Laboratory findings and clinical information with large sampling intervals, such as the mode of respiratory support, were excluded from the input variables used to build the machine-learning model. Linear interpolation was used to resample the time-series data onto a common 1-hour grid. The administration of medication was assumed to be maintained for 24 h once initiated. The observation period was set to be from birth to the first diagnosis of IVH in each patient. To equalize the input duration and minimize missing values, the measurements of each patient were zero-padded to 317 h, which was the maximum observation period among the patients. Because the feature distributions were not confirmed to follow a normal (Gaussian) distribution, min-max normalization, which scales each feature using its minimum and maximum values, was applied.
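
The preprocessing steps described above can be illustrated with a minimal sketch. It is not the study's code: the function names, the example heart-rate readings, the handling of hours outside the observed range, and the choice to scale before padding are all assumptions made for illustration.

```python
from bisect import bisect_right

MAX_HOURS = 317  # maximum observation period reported in the text


def interpolate_hourly(times_h, values, n_hours):
    """Linearly interpolate irregular samples onto hours 0..n_hours-1.

    Hours outside the observed range take the nearest observed value
    (an assumption; edge handling is not described in the text).
    """
    grid = []
    for t in range(n_hours):
        if t <= times_h[0]:
            grid.append(values[0])
        elif t >= times_h[-1]:
            grid.append(values[-1])
        else:
            j = bisect_right(times_h, t)  # index of first sample after t
            t0, t1 = times_h[j - 1], times_h[j]
            v0, v1 = values[j - 1], values[j]
            grid.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return grid


def min_max_scale(series, lo, hi):
    """Scale values to [0, 1] using the feature minimum and maximum."""
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def zero_pad(series, length=MAX_HOURS):
    """Append zeros (or truncate) so every series has the same length."""
    return (series + [0.0] * length)[:length]


# Hypothetical heart-rate readings taken 0.5, 2 and 4 hours after birth.
times_h = [0.5, 2.0, 4.0]
heart_rate = [150.0, 156.0, 148.0]

hourly = interpolate_hourly(times_h, heart_rate, n_hours=5)
scaled = min_max_scale(hourly, lo=min(hourly), hi=max(hourly))
padded = zero_pad(scaled)
```

In this sketch the minimum and maximum come from a single series for brevity; in practice they would be taken per feature over the training data only, so that no information from the test set enters the scaling.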

The patient group without IVH was approximately 10-fold greater in size than the group with IVH. We employed the synthetic minority over-sampling technique (SMOTE), an oversampling technique that balances distribution by supplementing the number of minority group samples to avoid overfitting problems from the unbalanced data17.
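
The core idea of SMOTE can be sketched in a few lines: each synthetic sample lies on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The block below is a simplified, standard-library illustration of that idea, not the implementation used in the study; the function name `smote`, the neighbour count `k`, and the toy data are assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than peers
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy example: 4 minority samples against 40 majority samples (about 1:10).
minority = [[0.20, 0.80], [0.30, 0.70], [0.25, 0.90], [0.40, 0.60]]
n_majority = 40
new_samples = smote(minority, n_new=n_majority - len(minority), k=3)
balanced_minority = minority + new_samples
```

Oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.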

Model development and validation

Automated machine learning (AutoML) has recently been highlighted since it automates the process from feature extraction and model selection to model training with hyperparameter optimization18. We built a pipeline of model development and validation using the AutoML method and TabularPredictor from the AutoGluon package in Python 3.10.12. A hold-out strategy with a data fraction of 0.2 was applied using the AutoML method, and nine models were fitted. Temperature scaling was performed to calibrate the models. The following 14 features were selected for model development: SBP, DBP, MBP, HR, RR, BT, SpO2, mean airway pressure, FiO2, and medication use (inotropes, antibiotics, sedatives, neuromuscular blockers, and systemic steroids). The entire dataset was divided into a training set for model training and a test set for validation at a 7:3 ratio (Fig. 1).
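
Temperature scaling, mentioned above as the calibration step, divides a model's logits by a single scalar T chosen to minimize the negative log-likelihood on held-out data. The sketch below illustrates the principle for a binary classifier with a coarse grid search. It does not reproduce AutoGluon's internal procedure; the function names, the grid, and the example logits are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nll(logits, labels, temperature):
    """Mean negative log-likelihood after dividing the logits by T."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z / temperature)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


def fit_temperature(logits, labels):
    """Pick the T on a coarse grid (0.05 to 10) with the lowest held-out NLL."""
    grid = [0.05 * i for i in range(1, 201)]
    return min(grid, key=lambda t: nll(logits, labels, t))


# Hypothetical held-out logits from an overconfident model:
# two of the eight predictions are confidently wrong.
val_logits = [6.0, 5.0, -6.0, -5.0, 4.0, -4.0, 5.5, -5.5]
val_labels = [1, 1, 0, 0, 0, 1, 1, 0]

T = fit_temperature(val_logits, val_labels)
calibrated = [sigmoid(z / T) for z in val_logits]
```

Because dividing by a positive T does not change the ordering of the scores, this form of calibration leaves ranking metrics such as the AUROC unchanged and only adjusts how confident the predicted probabilities are.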

Fig. 1

The architecture of model development and validation of machine learning. A total of 20 time-series data features derived from clinical information were used for model training. The entire dataset was split into a training set for model training and a test set for validation at a 7:3 ratio. Given the relatively small proportion of neonates with intraventricular hemorrhage (IVH), the synthetic minority oversampling technique was performed to improve classification performance. An automated machine-learning method was used to build a pipeline from feature selection to model training and hyperparameter tuning. IVH, intraventricular hemorrhage; SMOTE, synthetic minority oversampling technique; AutoML, automated machine learning.

The models were developed using the following algorithms: K-nearest neighbors, random forest, Extra Trees, weighted ensembles, and neural networks. Python version 3.10.12 (Python Software Foundation, Beaverton, OR, USA; https://www.python.org) and open libraries, including Pandas, Keras, Pytorch, Numpy, and Scikit-learn, were utilized for data preprocessing and machine learning.

Statistical analyses

The categorical variables are expressed as frequencies with proportions and compared using the Chi-squared test, whereas continuous variables are presented as medians with interquartile ranges (IQRs) and compared using the Mann-Whitney U test. Statistical significance was set at a P-value of < 0.05. Precision scores were mainly used to compare model performance. The area under the receiver operating characteristic (ROC) curve (AUROC) was also calculated. An ROC curve and a precision-recall curve with a confusion matrix of the best-performing model were generated. R software version 4.3.1 and Python version 3.10.4 were used to analyze the baseline characteristics.
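
For reference, the performance measures named above can be derived from a confusion matrix and from the ranking of predicted scores. The following sketch uses hypothetical labels and scores (not study data) and function names chosen for illustration. It computes the AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


def auroc(y_true, scores):
    """Pairwise (Mann-Whitney) formulation of the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test set: 3 positive and 7 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold is an assumption

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
auc = auroc(y_true, scores)
```

With a class ratio near 1:10, precision and recall tend to be more informative than accuracy, which is consistent with the emphasis on precision and the F1 score in this study.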

Results

Baseline characteristics

A cohort of 778 preterm infants born at less than 32 weeks of GA was included for retrospective analysis. Table 1 describes the demographics and characteristics of the patients. The cohort had an IVH incidence rate of 10.2%. No significant differences in baseline characteristics were seen in the groups with and without IVH, except for GA, birth weight, resuscitation in the delivery room, the use of surfactants, AS5, and the duration of NICU stay. Resuscitation in the delivery room, including positive pressure ventilation and more advanced management with endotracheal intubation or chest compression, was significantly associated with the development of IVH. GA (P < 0.001), birth weight (P < 0.001), respiratory distress syndrome requiring surfactants (P < 0.001), chorioamnionitis (P = 0.017), low AS5 under 7 (P < 0.001), invasive mechanical ventilation within 24 h after admission (P < 0.001), the use of inotropes (P = 0.049) and sedatives (P = 0.042) within 24 h after admission, and hemoglobin levels at the initial admission (P = 0.003) were also found to be associated with IVH development, as reported in previous studies1,6,7,8,9,12. The NICU stay duration was significantly longer in the group with IVH than in the group without IVH (105 [64−133] days vs. 58 [42−84] days, P < 0.001).

Table 1 Demographics and characteristics of the patients.

We examined the trends in vital signs within a week (Fig. 2). An earlier discrepancy in hemodynamic variables, especially within 72 h after birth, was observed between the groups with IVH and without IVH, consistent with a prior study identifying infants at high risk of IVH8.

Fig. 2

Comparison of time-series vital signs data between infants with and without intraventricular hemorrhage (IVH). Early hemodynamic change within 72 h showed an apparent difference between the two groups. The vital signs of infants with IVH are represented by orange lines with 95% confidence intervals (CI) in light orange, whereas those of infants without IVH are plotted in blue lines with 95% CIs in light blue. IVH, intraventricular hemorrhage.

Outcomes

Infants who had been observed for less than 24 h (n = 30, 1.23%) were excluded from the model development. Table 2 summarizes the performance of the models selected using AutoML. The Extra Trees Classifier model with the “entropy” criterion showed the best classification performance, with an average F1 score of 0.93 and an AUROC of 0.999. The confusion matrix, ROC curve, and precision-recall curve of the model are shown in Fig. 3.

Table 2 Performance of the developed models using an automated machine-learning method.

Fig. 3

Classification performance of the Extra Trees Classifier model. (a) A confusion matrix, (b) receiver operating characteristic curve, and (c) precision-recall curve were produced for the best-performing model. AUROC, area under the receiver operating curve; AUPRC, area under the precision-recall curve.

Discussion

In our study cohort, respiratory distress syndrome requiring surfactants, resuscitation in the delivery room, chorioamnionitis, low AS5, the use of inotropes and sedatives within 24 h after admission, invasive mechanical ventilation within 24 h after admission, and hemoglobin levels at the initial admission were found to be significantly associated with IVH, consistent with previous studies1,5,7,8,9,19. The independent risks of IVH after adjustment were not analyzed thoroughly because a risk analysis was not the primary aim of the study. Notably, an early hemodynamic shift after birth was observed to be related to IVH development, which was in good agreement with earlier findings10,11. Thus, time-series data with 14 features were selected for model development. After excluding models with an AUROC of exactly 1.0, the best-performing model showed an F1 score of 0.93 and an AUROC of 0.999. The sensitivity and specificity of the model were 100% and 98%, respectively.

Even though preclinical and clinical studies have investigated interventions to modulate the course of dysmaturation with white matter injury preceded by IVH, how to prevent IVH is still a conundrum in neonatal critical care9. A few pharmacological therapies, including the administration of perinatal steroids, perinatal tocolytics, and postnatal low-dose indomethacin, were found to have protective benefits1,20,21,22. No evidence of a protective effect from the administration of prenatal steroids was found in the present study. This finding may be attributable to the frequent use of antenatal steroids in both groups because of well-established indications and guidelines for their use. Previous studies suggested that timely intervention and careful management of the cardiopulmonary complications of prematurity in the early phase were critical1,10,11. These findings, together with the earlier hemodynamic shifts observed in the IVH group, reinforce the importance of rapid stabilization through identification and intervention (Fig. 2). A case-control study conducted by Lee et al.1 suggested that metabolic acidosis and the use of inotropes were related to an increased risk of IVH. In recent work using retrospective regression analysis to predict IVH in infants with respiratory distress syndrome, acidosis and hypoxia were reported to increase the risk of IVH5. However, heterogeneous etiologies, a lack of specific indicators at the time of occurrence, and a lack of tools for continuously monitoring risk may all contribute to limitations in conducting clinical trials3. The utilization of neuroimaging, including cranial ultrasound, also has limitations in detecting mild-to-moderate IVH at the time of its occurrence4,23,24. Huvanandana et al.7 performed a time-series analysis in 10-minute windows to predict IVH with earlier hemodynamic changes after birth. Using logistic regression, the highest AUROC was 0.921 when fitted by pulse intervals and DBP. Although the results should be cautiously interpreted because of the relatively small number of patients in the IVH group (n = 8) and the limited size of the entire cohort (n = 27), the study findings demonstrated the potential of time-series data for predicting IVH.

In recent years, artificial intelligence has provided a more customized and integrated approach based on clinical information readily available to constitute the input variables21. The clinical application of artificial intelligence has expanded to include the development of predictive models to guide clinicians in stratifying risks in real time and proactively adjusting their management. In an endeavor to establish evidence-based strategies for prevention, the capability to handle massive amounts of time-series data might provide meticulous guidance based on hemodynamic changes13,14,15. Turova et al.6 proposed a machine-learning model using the random forest method, with AUROCs ranging from 0.86 to 0.93. An observational study conducted by Zernikow et al.25 utilizing an artificial neural network also showed good performance, with an AUROC of 0.935. In these studies, the input variables comprised blood cell counts, blood gas analysis, inflammatory markers, cerebral blood flow, and perinatal information. IVH was more frequent in the cohort of Turova et al.6 (51.5%), in which 118 of 229 infants remained for investigation after matching and cross-validation was performed with 10% of the cohort, whereas Zernikow et al.25 included a larger cohort for the full dataset (n = 890). However, the latter study aimed to predict severe IVH. Seventy-nine infants were labeled as positive out of 890 infants, of which 50% were separated as test sets for the developed models.

Among the 20 infants with IVH examined in this study, infants with relatively milder IVH of grade 2 based on Papile’s criteria were also included16. We padded all samples to the same length to overcome bias from discrepancies in measurement sizes in the groups with and without IVH, which was mentioned as a limitation in the model of Turova et al.6 We enhanced the classification performance of positively labeled data using SMOTE, in accordance with a previous study6. Although specificity can be reduced, sensitivity and precision, which are more important on the clinical front, can be improved through this technique. Owing to advances in artificial intelligence techniques, the AutoML method used in the present study has successfully systematized the process of model development, minimizing unintended bias.

There were several limitations to this study. First, our input data, including time-series parameters, were not retrieved at the time of occurrence, restricting its applicability for immediate risk assessment. Nevertheless, robust classification was performed with transformed data obtained retrospectively for 2 weeks and padded to 317 h. Second, because a large number of features were used to build the model on a relatively small sample size and zero-padding was done to fill 317 h to minimize the impact of missing values, bias cannot be excluded with regard to the performance of the model and it should be interpreted with caution. Third, external validation has not yet been performed. The small number of affected infants in the test data from a single center was insufficient to confirm the performance of the model. Future research to predict IVH on a continuous basis with external validation based on the developed model is planned. Furthermore, the consideration of confounding factors other than those we proposed using a multimodal approach with echocardiographic and radiographic findings may improve performance in future studies. Last, the interpretation of the chosen features and model-tuning structure was restricted due to the nature of the AutoML method.

Because more newborns survive and neurodevelopmental impairment has become a more serious issue, more clinical attention should be paid to preventing IVH. We developed a machine-learning model that performed well in detecting infants at high risk of IVH. Further research based on the model is required to predict IVH in real time.