Abstract
Early prediction of intraventricular hemorrhage (IVH) in very low-birthweight infants (VLBWIs) remains challenging because of multifactorial risk factors. IVH often occurs within a few hours after birth, yet its onset cannot be reliably predicted using clinical symptoms or vital signs such as blood pressure or heart rate. Accurate early prediction of IVH is critical for timely intervention but remains challenging due to the limited feasibility of current prediction methods in resource-constrained settings. Traditional prediction methods often require advanced equipment and specialized expertise. Therefore, developing a predictive model based on a limited set of available factors is crucial for accurate and reliable early IVH prediction in VLBWIs. We propose a deep neural network-based model with an attention mechanism (DNN-A) that utilizes a limited set of readily available clinical factors. The model was trained and tested on data from 387 infants, incorporating eight variables, including maternal age, delivery mode, endotracheal intubation, birth weight, gestational age at delivery, APGAR scores at 1 and 5 min, and sex. Our DNN-A model achieved 90% and 87% accuracies in the training and testing sets, respectively, with an area under the curve of 87, and outperformed various state-of-the-art machine-learning-based models for IVH prediction. These results underscore the effectiveness of DNN-A for predicting the risk of IVH in VLBWIs in low-resource settings.
Similar content being viewed by others
Introduction
The survival rate of very low birth weight infants (VLBWIs), those weighing less than 1500 g at birth, has improved with advancements in neonatal care1. However, intraventricular hemorrhage (IVH) is strongly associated with adverse neurodevelopmental outcomes2. Despite advanced interventions, such as antenatal corticosteroids, magnesium sulfate for neuroprotection, and improved neonatal resuscitation in the delivery room, IVH remains a significant challenge in the neonatal intensive care unit (NICU). Brain ultrasonography is the most readily available diagnostic tool for cerebral lesions in the NICU. To date, no definitive treatment exists to mitigate the severity of IVH or substantially improve long-term outcomes. IVH is most common among preterm infants3, and the incidence of IVH in VLBWIs varies from 20 to 40%4,5,6,7,8,9. Most IVH occurs within the first 72 h after birth10, and within the first 24 h after birth with over 50%11. Furthermore, approximately 35% of VLBWIs with IVH develop post-hemorrhagic hydrocephalus, and 15% require surgical intervention9,12. Therefore, identifying the risk factors is crucial to prevent and predict the occurrence of IVH. Most studies have used logistic regression models in population-based cohorts6,13. Heuchan et al. developed a risk-adjustment model for predicting variations in IVH occurrence in different institutions in Australia and New Zealand6. Singh et al. developed a predictive model for assessing the risk for severe IVH in VLBWIs using seven clinical variables at birth13. Recently, machine learning models have shown promising results in predicting neonatal mortality14,15,16. Artificial intelligence-based (AI) models have been effective in identifying conditions such as patent ductus arteriosus and spontaneous intestinal perforation at admission, which are major complications in VLBWIs17,18. Early prediction of IVH is crucial for timely intervention and improved outcomes. However, resource limitations often hinder early and accurate IVH prediction immediately after birth. Traditional prediction methods often require advanced equipment and specialized expertise, which are not universally available. Therefore, there is a pressing need to develop a robust and resource-limited AI-based model capable of leveraging readily available clinical data. Such a model would broaden access to critical early warning systems by enabling hospitals with limited resources to implement timely and effective interventions. By focusing on scalability and ease of integration, this study aims to develop an AI model that is practical and effective for deployment in a wide range of healthcare settings, including those with significant constraints on equipment, training, and funding.
This study aims to develop and validate a novel deep neural network-based model with an attention mechanism (DNN-A) for the early prediction of IVH in VLBWIs. The model architecture includes an input layer with eight neurons representing infant characteristics, two hidden layers (eight and six neurons), and an output layer for binary classification. An attention mechanism, situated between the hidden layers, highlights critical input features. The network employs ReLU activation functions in the hidden layers, a sigmoid activation function in the output layer, and the Adam optimizer for parameter optimization. This architecture ensures efficient processing of clinical data, providing timely and accurate IVH risk assessments, even in resource-limited hospital settings. Our model achieved 90% accuracy during training and 87% during testing, with an area under the curve (AUC) of 87, indicating its strong predictive performance for IVH. Moreover, during validation, the proposed model outperformed several other machine-learning-based models, including the support vector machine (SVM), decision tree, logistic regression, XGBoost, and naïve Bayes models, using multiple well-known evaluation metrics.
Materials and methods
Study population
The clinical records of 877 VLBWIs treated at Keimyung University Dongsan Hospital, Daegu, South Korea, were retrospectively reviewed. All data were anonymized. IVH cases were identified using data on brain ultrasound performed within the first 7 days after birth. Given that unstable hemodynamic changes at birth are major risk factors for IVH, infants with suspected neonatal asphyxia (APGAR score at 5 min < 5 ), early onset sepsis, hypotension with medical treatment, and severe respiratory distress with persistent pulmonary hypertension of newborns were excluded. The dataset was divided into two groups: the affected group (infants with IVH) and the control group (non-IVH). To ensure comparability between the two groups, they were matched based on sex, gestational age at delivery, maternal age, and delivery mode. Because the number of controls was much higher than the number of infants with IVH, all infants with IVH were included in the study. Since IVH predilection was higher in male infants than in female infants19, sex-match was a priority between IVH and non-IVH infants. In addition to the delivery mode and maternal age, a close gestational age was also considered to select as many samples as possible. After applying these criteria, 189 infants with IVH and 198 controls were enrolled for this study.
Ethics statement
Data collection was approved by the Institutional Review Board (IRB) of Keimyung University Dongsan Hospital, and the requirement for informed consent was waived in this study (IRB Keimyung University Dongsan Hospital No. 2021-060-073). All experiments were performed in accordance with relevant guidelines and regulations of the Keimyung University Dongsan Hospital.
Features pre-processing
The model leverages eight readily available clinical input factors: maternal age (ranging from 18 to 42 years), delivery mode, endotracheal intubation in the delivery room, birth weight (ranging from 490 g to 2,260 g), gestational age at delivery (ranging from 20 to 31 weeks), APGAR scores at 1 and 5 min (ranging from 1 to 9), and sex. The sex variable originally recorded as “F” for females and as “M” for males was converted into dummy numeric variables. To prevent very large or small values from negatively affecting model performance, all variables in the dataset were normalized to a range between 0 and 120. Eight variables were used to characterize the fixed/global information of preterm infants. Normalization was calculated using the following equation:
where X is the input and Xnew is the normalized value.
Average of each feature for all study populations in control (green bars) versus affected infants (red bars). All statistical comparisons were performed using an unpaired student t-test to detect significant differences between averaged groups. (*P < 0.05); (**P < 0.01); (***P < 0.001); (****P < 0.0001).
Statistical analysis
All statistical analyses were performed using GraphPad Prism 8 software (GraphPad Software, Inc., La Jolla, CA, USA : https://www.graphpad.com) to detect significant differences in critical parameters between the two groups. For each parameter, the mean values of the affected and control groups were compared using an unpaired Student t-test. This is appropriate when the two groups being compared are independent of each other (affected vs. control groups to evaluate whether the affected group differs significantly from the control group, making a two-sample comparison appropriate. Data are presented as mean ± standard deviation (SD) to reflect the central tendency and variability within each group. The comparison was performed directly using the raw numerical values of the parameters, ensuring accuracy and transparency in the statistical evaluation. All available data samples for affected infants were analyzed. Notably, measurement samples were higher for controls than for affected infants. The clinical records from both groups were used to develop and assess machine learning models without distinguishing between specific individuals. The model did not consider the age at which the clinical records were completed. Unpaired Student’s t-test was used to determine the p-value, and categorical factors (such as sex and delivery mode) were compared based on their relative values in percentages. Statistical significance was at p < 0.05. No significant differences in the relative number of male infants were observed between the groups. Table 1 summarizes the statistical comparisons of various parameters between two groups, including the sample size for each group and the p-value from the unpaired Student’s t-test, which assessed whether there were significant differences between the two groups for each parameter. The control and affected groups exhibited significant differences in terms of endotracheal intubation, birth weight, gestational age at delivery, and APGAR scores at both 1 and 5 min, suggesting that these factors are associated with the occurrence of IVH. However, maternal age, delivery mode, and sex did not differ significantly between the two groups (Fig. 1).
Proposed deep neural network with attention
The proposed approach used a deep neural network architecture with an attention mechanism to classify VLBWIs into two groups according to the presence or absence of IVH. The proposed model consists of four layers: an input layer with eight neurons representing infant characteristics or input features, and two hidden layers with eight and six neurons for identifying and extracting pertinent information from the input data. The attention mechanism is located between two hidden layers to highlight important features of data. Finally, an output layer with one neuron to predict IVH (affected) or non-IVH (control) cases (see Fig. 2). Each neuron in the network uses the rectified linear unit (ReLU) for the activation function, and the output layer consists of one neuron for binary classification and uses the sigmoid activation function. The Adam optimizer, with a learning rate of 0.0001, was used to minimize the training-related losses21. Although a lower learning rate may cause delayed learning or convergence to inferior solutions, it will result in steadier convergence. The model used a cross-entropy loss function, a statistical tool frequently used in classification, which quantifies the discrepancy between the actual and predicted label22. The output layer of the neural network, which comprises one neuron reflecting the probability of each class (affected or control), is intended for binary classification. To minimize cross-entropy loss during training, the model modifies its parameters (weights and biases) using the Adam optimizer. Cross-entropy is defined by measuring the difference between the actual and predicted outputs of the model and is expressed as follows:
where y is the probability predicted by the network; \(\bar {y}\)is the ground truth; i is the number of data points; and N is the total number of samples.
Proposed attention mechanism
In machine and deep learning models, the attention mechanism enables models to focus on specific segments of the input data throughout the prediction process23. The basic idea behind attention is to assign distinct weights to various input data segments, enabling the model to “pay attention” to pertinent data while generating predictions. The additive attention (general attention) method uses a learned compatibility function to generate attention scores by introducing trainable parameters. Self-attention, also known as multi-head attention, enables the model to use multiple sets of attention weights simultaneously to different sections of the input sequence24.
The proposed deep neural network integrates a general attention mechanism (Fig. 2) that allows the model to focus on specific features or patterns in the input data that are most relevant for the classification task. The MLP processes intermediate representations by incorporating an attention layer, which generates a weighted representation of the input features by applying the attention mechanism to the output of the preceding layer. The attention mechanism introduces trainable parameters to compute attention scores, which indicate the importance or relevance of the different features in the input data when making predictions. Attention scores were applied to the input features to produce a context vector that emphasizes the informative components of the input for the current classification task24. The general attention mechanism comprises three main components: queries (Q), keys (K), and values (V). The general attention mechanism used in neural networks is expressed as follows:
where Q, k, and V denote the query, key, and value matrices, respectively, and dk denotes the dimension of the key vectors. The dot products of the query and key vectors were calculated using matrix multiplication QKT. Scaling the result \(\sqrt {{d_k}}\)helps mitigate issues related to disappearing gradients. The T in KT represents the transpose of the matrix K. To determine the attention weights, the scores were normalized using the softmax algorithm, and the resulting attention information is represented as a weighted sum of the values, V, calculated using these attention weights.
Model validation
The classification performance of the proposed model was assessed using various well-established metrics, including recall, accuracy, precision, F1-measure, specificity, Matthew correlation coefficient, and markedness, which were calculated using Eqs. (4), (5), (6), (7), (8), (9), and (10) respectively25. The classification accuracy was calculated as follows:
Precision represents a fraction of relevant predictions among all predicted values, which can be calculated as follows:
Recall is the ratio of correctly predicted occurrences among all instances in the dataset and can be calculated as follows:
An imbalanced dataset may harm the actual network accuracy owing to accuracy detection in the majority of classes. Accordingly, the F1 measure was used to assess the detection performance of the proposed model and was calculated as follows:
The ability of a model to accurately detect real negatives among all actual negatives was measured using its specificity., which was calculated using the following equation:
Another metric for evaluating the validity of binary classifications is the Matthews correlation coefficient (MCC), which is a balanced measure that can be applied even in cases where the classes have different sizes because it accounts for both true and false positives as well as negatives. MCC measures the correlation between the observed and predicted binary classifications using the following equation:
In binary classification problems, markedness is a statistical metric that indicates how well a model can accurately predict positive and negative occurrences while accounting for false-positive and false-negative rates.
where negative predictive value measures the proportion of true negative predictions among all negative predictions.
Probability density function (PDF) evaluation
By examining the probability density function (PDF) for each variable, variations or similarities in the distributions between the affected and control groups can be determined. Variations in PDF distribution, form, and position could indicate variations in developmental characteristics, risk factors, or health outcomes between the two groups.
Results
Proposed model training
Training the suggested network models across preterm infant data was the primary task in this step. The Graphics Processing Unit (GPU) accelerates the training process. A GPU-capable Python learning package was used to implement the proposed model. For every training epoch, the training dataset was divided into smaller groups known as mini-batches. The process was repeated three times, and the mean values were calculated. All the ideal parameters were determined experimentally. The system configuration for training the proposed network model was as follows: CPU: AMD Ryzen 9 7950 × 16-Core Processor 4.50 GHz, GPU: RTX 4090, Programming language: Python, Memory: 64 GB, and Operating system: Windows 10 (64-bit).
Manual grid search for optimal hyperparameters
Hyperparameters are external adjustments that substantially influence the behavior and performance of a neural network but are not learned during training. Searching for combinations of hyperparameters is crucial to achieving the best results for a specific task. In this study, we used the grid search method to adjust key hyperparameters, specifically batch size and the number of epochs26. This process involves systematically evaluating a predefined set of hyperparameter combinations to identify the configuration that yields the best performance. Specifically, we experimented with different batch sizes and training epoch counts to observe their impact on the model’s accuracy and generalization capabilities. Consequently, a batch size of 10 with 100 epochs yielded the highest accuracy that was used to train and test the proposed model. Table 2.
To determine the optimal percentage of training and test data, we trained the network using different percentages of test data. Performing experiments using various percentages of training and test data is essential for assessing the model’s ability to generalize to new data by changing the ratio of test to training data. We utilized the random splitting of data into training and testing subsets. By specifying different test sizes into 10%, 20%, 30%, and 40% we effectively varied the proportion of data allocated for testing, with the remaining data used for training. This provides insights into the robustness and reliability of the model by enabling us to evaluate its performance under various data availability scenarios. Owing to various limitations, large-scale data collection may not be possible, especially in premature infants with and without IVH in real-world situations. Testing various data percentages can help evaluate the model’s performance with limited data and replicate scenarios of data scarcity. Models trained on smaller datasets often develop more resilient and generalizable patterns, potentially enhancing performance when applied to new datasets. This approach provides valuable insights into how well the model generalizes across various dataset sizes.
The training accuracy of the proposed DNN-A and the corresponding loss curves for different percentages of test data are illustrated in Fig. 3a,b, respectively. Although training accuracy remained consistent for various percentages of the test data, the AUC values varied with each training percentage, and training loss increased for configurations with 40% test data. To ensure a fair comparison between the different machine-learning-based models and the proposed model, we performed a similar experiment with different training and test percentages for all other models and compared the results using different evaluation metrics (Table 3).
Model evaluation
To further validate the performance of the proposed model, we utilized different machine-learning approaches, including support vector machine (SVM)27, logistic regression28, naïve Bayes models29, decision trees30, and XGBoost. The performance of the classification model was evaluated using a confusion matrix25, which is a widely used method for measuring the classification accuracy of machine learning methods. The confusion matrix is explained in Table 4. The term “true positive” (TP) was used when the actual value and network forecast were identical. When both the actual class and network’s anticipated value are equal to zero, the term “true negative” (TN) was used. When the expected class is 1 and the actual class is 0, it is referred to as a false positive (FP). When the expected class is 0 and the actual class is 1, it is referred to as a false negative (FN).
The confusion matrix of the classification accuracy using the proposed DNN-A is shown in Fig. 4a. The classification accuracy using the confusion matrix was compared with other machine learning-based models, including the decision tree algorithm, XGBoost, support vector machine, logistic regression, and naïve Bayes models, each tested with 30% of the dataset as test data, illustrated in Fig. 4b–f, respectively. The proposed model exhibited the highest accuracy rate than the other models.
Comparison of area under the curve of receiver operating characteristics (ROC) for various machine learning models against the proposed model across different test data percentages. Subfigures depict ROC curves for (a) 10% of test data, (b) 20% of test data, (c) 30% of test data, and (d) 40% percent of test data.
Table 3 provides an in-depth analysis of the various test data percentages for various machine-learning models, highlighting the performance metrics of each compared to the proposed DNN-A model. A comprehensive analysis of performance indicators such as accuracy, precision, recall, F1 score, specificity, false-positive rate, false-negative rate, MCC, and markedness were evaluated to assess model effectiveness. The results underscore the superior performance of the proposed DNN-A model. Notably, the proposed model outperformed the SVM model with an accuracy of 87.1% using 30% of the test data. This demonstrates that even with a smaller amount of test data, the suggested model is resilient in handling the classification problem. Furthermore, the proposed model demonstrated stability and dependability throughout a range of data sizes as the percentage of test data increased to 40%, maintaining its superiority with an accuracy of 81.9%. In contrast, the SVM model exhibited slightly lower accuracy with smaller test sets, achieving 80.0% for both 10% and 20% test data. Notably, the logistic regression and naïve Bayes models exhibited similar accuracy within this range, indicating sensitivity to the amount of available data. However, for 30% of the test data, the XGBoost model exhibited the highest accuracy, comparable with that of the logistic regression model. This indicates that XGBoost effectively identified intricate patterns within the dataset, thereby improving classification accuracy. The decision tree exhibited the highest accuracy with 40% of the test data, suggesting its strength in handling larger datasets, possibly because of their capacity to identify complex relationships between decisions in the data. The proposed model and other machine learning models were compared using the receiver operating characteristic (ROC) curves (Fig. 5). The ROC curve of the proposed model was compared to those of other machine learning models, which demonstrated consistently better performance across various test data percentages in Fig. 5a–d. The higher AUC values of the proposed model reflect its superior balance between sensitivity and specificity, enhancing its ability to effectively distinguish between true and false positives.
Variations in PDF distribution form and position could indicate variations in developmental characteristics, risk factors or health outcomes between the two groups. These findings are crucial for understanding the causes of preterm births and formulating clinical care plans. The distributions of the APGAR scores at 1 and 5 min, respectively, for both the affected and control groups are shown in Fig. 6a,b, with higher APGAR scores indicating improved neonatal health and vitality. The PDF of the maternal age and gestational age is illustrated in Fig. 6c,d, indicating that older maternal age and lower gestational age at delivery are associated with an increased risk of IVH.
Model evaluation with extra limited resources
To enhance efficiency and streamline the analytical process, we conducted additional studies with more limited resources, focusing on four primary measurements recorded immediately after birth. These assessments, such as gestational age at delivery and APGAR scores, are recognized as critical indicators of newborn health and are strongly associated with adverse outcomes like intraventricular hemorrhage (IVH). By reducing the number of features in the analysis, we aimed to decrease the computational and data collection demands while achieving faster prediction capabilities. This approach optimizes resource utilization, enabling more targeted and efficient testing. For practical clinical applications, developing a model based on a minimal set of essential birth measurements offers significant benefits. Such a model facilitates rapid risk assessments and early intervention planning, which are crucial for timely and focused treatment in very low birth weight infants (VLBWIs). By concentrating on a streamlined feature set, this study aims to support the implementation of effective, resource-conscious, and clinically actionable strategies in neonatal care.
Accordingly, we conducted four experiments using various combinations of different primary measurement sets (feature sets 1–4), as listed in Table 5. The trial results indicated that Feature set 1, which included sex, APGAR score at 1 min, gestational age at delivery, and birth weight, achieved the highest accuracy, highlighting the importance of these variables for predicting IVH. The predictive ability of different feature sets in precisely predicting IVH are shown by the confusion matrix in Fig. 7 for (a) Feature Set 1, (b) Feature Set 2, (c) Feature Set 3, and (d) Feature Set 4, and the ROC curve in Fig. 8. The confusion matrix and ROC curve both further confirmed that the models that used Feature Set 1 consistently exhibited higher accuracy rates in predicting IVH than those of other feature sets.
A comparison of the proposed model with several cutting-edge machine learning models for IVH prediction in VLBWIs is shown in Fig. 9, demonstrating that the proposed model consistently outperformed other machine learning models in terms of accuracy, precision, recall, F1 score, AUC value, specificity, MCC, and markedness. From a practical standpoint, this suggests that the proposed model performs better than the current models for identifying IVH in VLBWIs, reducing false positives and false negatives while accurately classifying a higher percentage of true IVH cases. These findings highlight the effectiveness of the proposed model and its potential for clinical use for predicting IVH in VLBWIs.
Discussion
We developed a novel deep neural network model with an attention mechanism to predict IVH in VLBWIs in low-resource settings that can be easily applied at the time of delivery. Multilayer perceptron (MLP) is the most fundamental type of network31. The model prediction was based on eight primary variables: gestational age at delivery, birth weight, maternal age, sex, mode of delivery, APGAR score at 1 and 5 min, and endotracheal intubation during delivery room resuscitation. The selection of variables was based on previous studies examining the risk factors of IVH in cohort studies of VLBWIs. Furthermore, these variables can be easily collected through a brief history and neonatal evaluation in the delivery room, facilitating the model’s applicability in low-resource settings.
IVH in VLBWIs was diagnosed using brain ultrasonography and classified according to Papile’s classification32. A recent cohort study in Australia33 reported a higher incidence of cerebral palsy in school-aged children born at less than 28 weeks gestation with low-grade IVH, which is typically considered benign. Moreover, low-grade IVH was associated with higher rates of cerebral palsy in school-age children33,34. Therefore, IVH grade I, including germinal matrix hemorrhage on brain ultrasonography, was defined as the IVH group in our study. IVH begins with injury to the germinal matrix, which is susceptible to injury due to the abundance of immature friable capillaries and decreased capacity of the premature brain to autoregulate cerebral blood flow11. Lee et al.35 indicated that gestational age and birth weight at delivery are the strongest predictors of IVH in VLBW infants. Notably, approximately 50% of IVH cases in VLBW infants occur within the first 6–8 h of life36,37.
Intrauterine growth restriction (IUGR) is associated with fetal hypoxemia and an increased risk of adverse postnatal outcomes. However, the impact of IUGR on the incidence of IVH in preterm infants remains unclear38. For this reason, we considered gestational age at delivery and birth weight, which are strongly correlated, as separate factors in the present study. Lee et al. reported that postnatal acidosis within the first hour after birth is associated with a higher risk of severe IVH in VLBWIs35. Hypoxemia or hypoperfusion in the delivery room can lead to metabolic acidosis39. Although serum pH or base deficit is a more accurate indicator of metabolic acidosis in the delivery room, we analyzed the APGAR score at 5 min, which has been commonly used in practice. Antenatal steroid administration promotes fetal drug maturation and reduces the occurrence of IVH in VLBWIs40, highlighting the role of glucocorticoids in antenatal steroid administration, which promotes fetal vasoconstriction and alleviates fetal vasodilation. Because more than 95% of enrolled pregnant women in the present study were administered antenatal steroids, additional analysis including antenatal steroids in the present study was not necessary. The ability of the proposed model to make reliable predictions without the need to monitor hemodynamic changes after birth is a major advantage. Compared with other machine learning models, DNN-A exhibited superior accuracy of 90% for training and 87% for tests, with the highest AUC of 0.87 because it used limited resources. IVH prediction at birth is crucial to provide objective counseling to the infant’s parents and to determine appropriate treatment decisions during the critical postnatal period. This study included models representing diverse learning paradigms, such as SVM for margin-based classification, Decision Trees for rule-based learning, Logistic Regression for linear probabilistic modeling, XGBoost for gradient-boosted decision trees, and Naïve Bayes for probabilistic inference. The inclusion of feature subset analysis further underscores the importance of feature engineering in predictive modeling. Feature Set 1, which consisted of key demographic and neonatal health indicators (sex, APGAR score at 1 min, gestational age at delivery, and birth weight), emerged as the most predictive combination. These variables likely hold strong associations with the underlying physiological and developmental factors that contribute to the risk of IVH, highlighting their critical role in improving model accuracy and interpretability. This result not only underscores the significance of domain-specific feature selection but also provides valuable insights for clinical decision-making and early risk assessment of IVH.
The present study had several limitations. First, brain ultrasonography was not performed simultaneously after birth. To overcome this limitation, we classified the occurrence of IVH based on the results of brain ultrasound performed within the first 3 days after birth. However, the present study did not include postnatal hemodynamic changes, which are major risk factors for IVH. Unlike in other countries, ibuprofen was the most frequently used medication for the treatment of patent ductus arteriosus (PDA) in South Korea, as indomethacin is not available. Moreover, the prophylactic use of indomethacin is exceedingly rare in most neonatal intensive care units (NICUs) across the country. At our institution’s NICU, prophylactic treatment for PDA was not practiced. For all very low birth weight infants (VLBWIs) included in this study, brain ultrasounds were performed by a single pediatric radiologist within the first 7 days after birth. The severity of intraventricular hemorrhage (IVH) was assessed according to Papile’s classification32. Our study aimed to predict the occurrence of IVH at birth. Another novel innovation of the current model is that it makes a reliable prediction without the need for continuous vital sign monitoring. Additionally, the small sample size might have limited the accuracy and reliability of the machine-learning models, warranting further studies with a large training set. Different percentages of the training and test datasets were compared to evaluate the proposed model under different conditions. Furthermore, this model was developed using data from a single institution. Although the study population encompassed patients from a range of geographic areas (rural, suburban, and metropolitan), the cohort was restricted to the hospital’s service area, where all patients were treated by the same group of healthcare professionals. Future research should involve a more geographically diverse sample, ideally incorporating an international cohort.
Conclusion
A novel deep neural network model with an attention mechanism was developed to predict IVH in VLBWIs, particularly in limited resource settings. Using eight readily available measurements and limited to only four measurements, this model has the potential to be applied in real-time at the time of delivery, making it a vital tool for early intervention and care decisions. Our methodology is directly relevant in resource-constrained settings which are based on prior cohort studies, and are easily accessible through maternal information and the newborn’s condition in the delivery room.
Although our approach demonstrates good accuracy and repeatability, IVH classification may be affected by the non-standard interpretation of brain ultrasonography in terms of post-birth temporality. The experimental results demonstrated that the infant’s sex, 1-minute APGAR score, gestational age at delivery, and birth weight are the most important variables for predicting IVH. Nevertheless, because our goal was to predict IVH at the time of delivery, this topic should be further investigated in future studies. Despite these drawbacks, our approach is a noteworthy development in IVH prediction and provides potential insights into neonatal care in environments with limited resources.
Data availability
The data that support the findings of this study are available from the first author (ezat.ahmadzade@gmail. com) upon reasonable request.
References
Chung, S. H. & Bae, C. W. Improvement in the survival rates of very low birth weight infants after the establishment of the Korean neonatal network: comparison between the 2000s and 2010s. J. Korean Med. Sci. 32, 1228–1234 (2017).
Bolisetty, S. et al. Intraventricular hemorrhage and neurodevelopmental outcomes in extreme preterm infants. Pediatrics 133, 55–62 (2014).
Leijser, L. M. et al. Brain imaging findings in very preterm infants throughout the neonatal period: part II. Relation with perinatal clinical data. Early Hum. Dev. 85, 111–115 (2009).
Jain, N. J., Kruse, L. K., Demissie, K. & Khandelwal, M. Impact of mode of delivery on neonatal complications: trends between 1997 and 2005. J. Mat. -Fetal Neonatal Med. 22, 491–500 (2009).
Ahn, S. Y., Shim, S. Y. & Sung, I. K. Intraventricular hemorrhage and post hemorrhagic hydrocephalus among very-low-birth-weight infants in Korea. J. Korean Med. Sci. 30, S52–S58 (2015).
Heuchan, A. M., Evans, N., Henderson, D. J. & Simpson, J. M. Perinatal risk factors for major intraventricular hemorrhage in the Australian and new Zealand neonatal network. Arch. Dis. Child. Fetal Neonatal Ed. 86 (2), F86–F90 (2002).
Philip, A. G. A. W. LR. Intraventricular hemorrhage in preterm infants: declining incidence in the 1980s. Pediatrics 5, 797–801 (1989).
Stoll, B. J. et al. Neonatal outcomes of extremely preterm infants from the NICHD neonatal research network. Pediatrics 126, (2010).
Murphy, B. P. et al. Posthaemorrhagic ventricular dilatation in the premature infant: natural history and predictors of outcome. Archives Disease Childhood-Fetal Neonatal Ed. 87, 1 (2002).
Luque, M. J. et al. A risk prediction model for severe intraventricular hemorrhage in very low birth weight infants and the effect of prophylactic indomethacin. J. Perinatol. 34, 43–48 (2014).
Ment, L. R. & Schneider, K. Intraventricular hemorrhage of the preterm infant. Semin Neurol. 13, 40–47 (1993).
Robinson, S. Neonatal posthemorrhagic hydrocephalus from prematurity: pathophysiology and current treatment concepts: A review. J. Neurosurg. Pediatr. 9, 242–258 (2012).
Singh, R. et al. A predictive model for SIVH risk in preterm infants and targeted indomethacin therapy for prevention. Sci. Rep. 3, (2013).
Matsushita, F. Y., Krebs, V. L. J. & de Carvalho, W. B. Identifying clinical phenotypes in extremely low birth weight infants—an unsupervised machine learning approach. Eur. J. Pediatr. 181, 1085–1097 (2022).
Podda, M. et al. A machine learning approach to estimating preterm infants survival: development of the preterm infants survival assessment (PISA) predictor. Sci. Rep. 8, (2018).
Na, J. Y. et al. Learning-based longitudinal prediction models for mortality risk in very-low-birth-weight infants: A nationwide cohort study. Neonatol 120, 652–660 (2023).
Na, J. Y. et al. Artificial intelligence model comparison for risk factor analysis of patent ductus arteriosus in nationwide very low birth weight infants cohort. Sci. Rep. 11, (2021).
Son, J. et al. Development of artificial neural networks for early prediction of intestinal perforation in preterm infants. Sci. Rep. 12, (2022).
Felderhoff-Müser, U. et al. Machine learning models for identifying preterm infants at risk of cerebral hemorrhage. PLOS One 15, (2020).
Raju, V. N. G., Lakshmi, K. P., Jain, V. M., Kalidindi, A. & Padma, V. Study the influence of normalization/transformation process on the accuracy of supervised classification. In Proceedings of the 3rd International Conference on Smart Systems and Inventive Technology, ICSSIT 2020 729–735Institute of Electrical and Electronics Engineers Inc., (2020).
Kingma, D. & Ba, P. (ed A, J.) A method for stochastic optimization. ArXiv 2014 106 (2020). ArXiv preprint ArXiv:1412.6980.
Yeung, M., Sala, E., Schönlieb, C. B. & Rundo, L. Unified focal loss: generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation. Comput. Med. Imaging Graph 95, (2022).
Vaswani, A. et al. Attention is all you need. Adva.Neur.Inform.Proc. Syst. (2017).
Niu, Z., Zhong, G. & Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 452, 48–62 (2021).
Tharwat, A. Classification assessment methods. Appl. Comput. Inf. 17, 168–192 (2018).
Prabu, S., Thiyaneswaran, B., Sujatha, M., Nalini, C. & Rajkumar, S. Grid search for predicting coronary heart disease by tuning hyper-parameters. Comput. Syst. Sci. Eng. 43, 737–749 (2022).
Dileep, A. D. & Sekhar, C. C. Identification of block ciphers using support vector machines. In The 2006 IEEE International Joint Conference on Neural Network Proceedings, 2696–2701, (2006).
Chaudhuri, K. & Monteleoni, C. Privacy-preserving logistic regression. Adv. Neural Inf. Process. Syst. Proceedings of the 2008 Conference 21, 289–296 (2009).
Vaidya, J., KantarcIoǧlu, M. & Clifton, C. Privacy-preserving Naïve Bayes classification. VLDB J. 17, 879–898 (2008).
De Cock, M. et al. Efficient and private scoring of decision trees, support vector machines and logistic regression models based on pre-computation. IEEE Trans. Depend. Sec Comput. 16, 217–230 (2019).
Desai, M. & Shah, M. An anatomization on breast cancer detection and diagnosis employing multi-layer perceptron neural network (MLP) and convolutional neural network (CNN). Clin. eHealth. 4, 1–11 (2021).
Lu-Ann Papile, M. D. et al. Incidence and evolution of subependymal and intraventricular hemorrhage: A study of infants with birth weights less than 1,500 Gm. Pediatrics 92, 529–534 (1978).
Hollebrandse, N. L. et al. School-age outcomes following intraventricular hemorrhage in infants born extremely preterm. Arch. Dis. Child. Fetal Neonatal Ed. 106, 4–8 (2021).
Tortora, D. et al. The effects of mild germinal matrix-intraventricular hemorrhage on the developmental white matter microstructure of preterm neonates: A DTI study. Eur. Radiol. 28, 1157–1166 (2018).
Lee, J., Hong, M., Yum, S. K. & Lee, J. H. Perinatal prediction model for severe intraventricular hemorrhage and the effect of early postnatal acidosis. Child’s Nerv. Syst. 34, 2215–2222 (2018).
Mirza, H. et al. Indomethacin prophylaxis to prevent intraventricular hemorrhage: association between incidence and timing of drug administration. J. Pediatr. 163, (2013).
Al-Abdi, S. & Al-Aamri, M. A systematic review and meta-analysis of the timing of early intraventricular hemorrhage in preterm neonates: clinical and research implications. J. Clin. Neonatol. 3, 76 (2014).
Fang, S. Management of preterm infants with intrauterine growth restriction. Early Hum. Dev. 81, 889–900 (2005).
Ambalavanan, N. et al. Predicting outcomes of neonates diagnosed with hypoxemic-ischemic encephalopathy. Pediatrics 118, 2084–2093 (2006).
Poryo, M. et al. Ante-, peri- and postnatal factors associated with intraventricular hemorrhage in very premature infants. Early Hum. Dev. 116, 1–8 (2018).
Funding
This research was supported by a grant of Korea Health Technolocy R&D Project through the Korea Health Industry Development Institute (KHIDH), funded by Ministry of Health & Wellfare, Republic of Korea (grant number: RS-2021-KH114109).
Author information
Authors and Affiliations
Contributions
E.A. developed a deep neural network model with an attention mechanism, conducted a performance evaluation, and wrote the original manuscript. J.K. interpreted the experimental results and reviewed the manuscript. J.L. and N.K. collected the dataset and provided information about measurements. H.W.K. supervised the experimental results. J.P. and J.H. coordinated and supervised data collection and critically reviewed and revised the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ahmadzadeh, E., Kim, J., Lee, J. et al. Early prediction of intraventricular hemorrhage in very low birth weight infants using deep neural networks with attention in low-resource settings. Sci Rep 15, 10102 (2025). https://doi.org/10.1038/s41598-025-90901-8
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41598-025-90901-8











