Introduction

Hematoma expansion (HE) is an independent predictor of poor prognosis and early neurological deterioration in patients with intracerebral hemorrhage (ICH), approximately one-fifth of whom experience HE within 24 h of acute ICH1,2. Conventional HE (cHE) is defined as an ICH volume increase (≥ 6 mL or ≥ 33%) within 24–72 h after the initial CT scan3. This definition overlooks another independent factor, intraventricular hemorrhage (IVH) expansion, which is associated with poor prognosis in ICH patients4,5,6. Some studies have proposed that revised hematoma expansion (rHE), which incorporates IVH growth, may predict poor neurological outcomes better than cHE7,8. Hence, effectively identifying patients at high risk of rHE is crucial for providing timely targeted medical interventions, such as intensive systolic blood pressure (BP) reduction or emergency surgical intervention9,10.

Currently, various predictive scores that incorporate clinical variables, laboratory markers, noncontrast CT (NCCT) markers, and CT-angiography (CTA) spot signs are used to identify the risk of ICH growth2. A clinical prediction score (BRAIN) developed from simple variables achieved good discrimination (C-statistic, 0.73) in estimating the probability of ICH growth3. A large meta-analysis revealed that models incorporating additional CTA spot signs had greater overall discriminative ability11, but CTA availability is not standardized in acute settings12. Although NCCT-based imaging markers may serve as reliable substitutes for CTA spot signs13, they depend on reader experience, are prone to inter- and intrareader variability, and are too time-consuming for routine clinical use. Machine learning (ML)-aided diagnosis models have received significant attention in medical imaging14,15, and their application in predicting cHE has demonstrated promising preliminary results16,17. Compared with traditional ML, deep learning allows end-to-end prediction and classification by automatically learning semantic and spatial features from raw input image pixels without relying on human-designed, engineered features18. Zhong et al. applied an end-to-end convolutional neural network (CNN) model to NCCT images from 266 patients to predict cHE, achieving a high C-statistic of 0.80 and outperforming NCCT markers alone19. Recently, Li et al. used a large dataset to develop a two-stage framework for predicting cHE, achieving an AUC of 0.806 in a prospective set20. Moreover, Tran et al., using a relatively small dataset (n = 793), also developed a model to predict cHE, achieving a similar AUC of 0.80 with NCCT images21. These studies highlight the ability of deep learning to identify the risk of cHE.
However, they did not assess the dynamic evolution of IVH during follow-up, a process that may lead to obstructive hydrocephalus requiring external ventricular drainage22,23.

Hence, we applied deep learning models to identify ICH patients at high risk for rHE by incorporating IVH expansion into the cHE definition, evaluating its feasibility for predicting high-risk rHE. This study aimed to develop a deep learning-based prediction tool using baseline NCCT images to efficiently identify high-risk rHE in ICH patients and compare its discriminative ability with traditional baseline ML models, which rely on clinical-radiologic and handcrafted radiomics features.

Materials and methods

Ethics declarations and consent to participate

This study was conducted in accordance with the Declaration of Helsinki principles. Ethical approval was obtained from the institutional review boards of both participating centers (Approval No. K2023-138). The requirement for written informed consent was waived due to the retrospective nature of the study. Additionally, the study was registered with the Research Registry (Registration No. researchregistry10383; https://www.researchregistry.com/).

Patient cohort and clinical variables

A total of 775 spontaneous ICH patients who had undergone baseline and follow-up NCCT were consecutively enrolled from the two centers. The exclusion criteria were as follows: (1) age < 18 years; (2) baseline CT obtained more than 24 h after the onset of ICH symptoms or after the patient was last seen well; (3) follow-up CT obtained > 72 h after baseline CT; (4) primary IVH; (5) previous surgical treatment with external ventricular drain placement or craniotomy; and (6) obvious artifacts observed on the CT images. Finally, 556 patients from center 1 were divided into a training set (n = 389) and an internal-testing set (n = 167) via 7:3 stratified random sampling, and all 219 patients from center 2 formed an independent external-testing set. Figure 1 illustrates the process of population selection. Clinical data, including sex, age, history of alcohol consumption, smoking status, diabetes mellitus, hypertension, onset to baseline CT time interval, admission systolic/diastolic pressure, international normalized ratio (INR), and antithrombotic and anticoagulation treatment, were obtained from the patients’ medical records.
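As an illustration of the 7:3 stratified random sampling described above, the following minimal sketch splits each outcome class separately so that the rHE proportion is preserved in both subsets. The function name, the seed, and the rounding rule are assumptions made for illustration and do not represent the software actually used in this study; the class counts for center 1 (169 rHE, 387 non-rHE) follow from the cohort sizes reported in the Results.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/test, sampling each class separately.

    Hypothetical helper: rounding each class size to the nearest integer
    is an assumption, not a documented detail of the study.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Center 1: 556 patients, 169 with rHE (label 1) and 387 without (label 0).
labels = [1] * 169 + [0] * 387
train_idx, test_idx = stratified_split(labels)
n_train_rhe = sum(labels[i] for i in train_idx)
n_test_rhe = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), n_train_rhe, n_test_rhe)
```

Under these assumptions the split sizes agree with the reported training and internal-testing cohorts, although the specific patients assigned to each subset would of course differ from those in the study.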

Fig. 1

Flowchart of patient recruitment. ICH, intracerebral hemorrhage; Center 1, The First Affiliated Hospital of Chongqing Medical University; Center 2, The First Affiliated Hospital of Nanjing Medical University; NCCT, noncontrast computed tomography; IVH, intraventricular hemorrhage; non-rHE, non-revised hematoma expansion; rHE, revised hematoma expansion.

Image acquisition and analysis

Admission and follow-up NCCT scans were performed at each participating center according to the local acquisition protocols; details of the CT scan protocols are provided in Supplementary Table S1. To balance the variations between different CT devices and protocols, we resampled all NCCT images to a uniform voxel spacing of 1.0 × 1.0 × 5.0 mm³ for subsequent analysis. The overall design of this study is presented in Fig. 2. Image analysis was independently performed by two radiologists (A with 4 years and B with 6 years of neuroimaging experience) on the baseline NCCT images without knowledge of the follow-up CT images, patient outcome data, or demographics. Axial NCCT images were analyzed to determine the presence of subarachnoid hemorrhage and IVH, the location of the ICH, and NCCT markers. Nine NCCT markers were evaluated, and illustrative examples are shown in Fig. 3. The consensus definitions from the 2022 AHA/ASA guidelines were followed throughout the analysis10. One month later, radiologist A performed a second independent assessment using the same method (see Supplementary Material for more information). Inter- and intra-observer agreement of the subjective CT image features was evaluated using Cohen’s kappa coefficient.
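For readers unfamiliar with Cohen’s kappa, the following minimal sketch computes it for two sets of categorical ratings as (observed agreement − chance agreement) / (1 − chance agreement). The function name and the ten example ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical presence (1) / absence (0) ratings of one NCCT marker.
reader_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reader_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(reader_a, reader_b)
print(round(kappa, 3))
```

In this toy example the readers agree on 8 of 10 cases and chance agreement is 0.5, giving a kappa of 0.6.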

Fig. 2

Overall pipeline of this study. Phase 1: Evaluation of clinical-radiologic features in intracerebral hemorrhage (ICH) patients, including clinical data acquisition, noncontrast computed tomography (NCCT) markers analysis, and BRAIN score assessment. Phase 2: Generation of initial rough segmentation masks on NCCT images with semiautomatic segmentation software followed by manual adjustments to refine the segmentation regions and obtain precise 3D volumes of interest for the hematoma. Phase 3: Extraction and selection of radiomics features with high stability (ICC > 0.80), followed by further refinement using the least absolute shrinkage and selection operator (LASSO). Phase 4: Establishment of baseline predictive models with the support vector machine (SVM) algorithm and logistic regression. Phase 5: Development of 2D and 3D convolutional neural network models using NCCT images and gradient-weighted class activation mapping (Grad-CAM) for model visualization. Phase 6: Comparison of the predictive performances of the different models. ICC, intraclass correlation coefficient.

Fig. 3

NCCT marker definitions, illustrative examples, and intra- and inter-observer kappa values. Definitions and illustrative examples of the NCCT markers are shown together with the intra- and inter-observer kappa values for radiologists A and B regarding the subjective markers. HU, Hounsfield units; ICH, intracerebral hemorrhage.

The semiautomatic segmentation software ITK-SNAP 3.8.0 was employed to obtain the initial and follow-up volumes of ICH and IVH. Radiologist A utilized semiautomatic segmentation methods, including thresholding, classification, clustering, and edge attraction, to generate rough segmentation masks on the NCCT images. Manual adjustments were subsequently made to refine the segmentation regions and obtain exact 3D hematoma volumes of interest (VOIs). Radiologist B independently segmented the images using the same method. To ensure the consistency and repeatability of the segmentations, 60 NCCT images were randomly selected to calculate the intra- and inter-observer intraclass correlation coefficients (ICCs); ICC values above 0.80 indicated good agreement. rHE was defined as any of the following: an absolute increase in ICH volume ≥ 6 mL or a relative increase ≥ 33%, an increase in IVH volume ≥ 1 mL, or de novo IVH7,8.
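The rHE definition above can be expressed as a simple rule on the four segmented volumes. The sketch below is illustrative only: the function names are hypothetical, and treating de novo IVH as a baseline IVH volume of zero with any IVH on follow-up is an assumption about how that criterion would be operationalized.

```python
def is_che(ich_baseline_ml, ich_followup_ml):
    """Conventional HE: ICH growth of >= 6 mL or >= 33% from baseline."""
    growth = ich_followup_ml - ich_baseline_ml
    absolute = growth >= 6.0
    relative = ich_baseline_ml > 0 and growth / ich_baseline_ml >= 0.33
    return absolute or relative


def is_rhe(ich_baseline_ml, ich_followup_ml, ivh_baseline_ml, ivh_followup_ml):
    """Revised HE: cHE, or IVH growth >= 1 mL, or de novo IVH."""
    ivh_growth = (ivh_followup_ml - ivh_baseline_ml) >= 1.0
    # Assumption: "de novo" means no IVH at baseline and any IVH at follow-up.
    de_novo_ivh = ivh_baseline_ml == 0 and ivh_followup_ml > 0
    return is_che(ich_baseline_ml, ich_followup_ml) or ivh_growth or de_novo_ivh


# Hypothetical volumes in mL: the ICH is stable but the IVH grows by 1.2 mL.
print(is_che(30.0, 32.0))            # False
print(is_rhe(30.0, 32.0, 2.0, 3.2))  # True
```

The example illustrates the kind of case that cHE overlooks: the parenchymal hematoma is nearly stable, yet the patient meets the rHE definition through IVH growth alone.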

Baseline model construction based on clinical-radiologic and radiomics features

Univariate logistic regression analysis was used to assess the associations between clinical-radiologic parameters and rHE. Variables with p < 0.05 were included in multivariate logistic regression analysis to identify independent predictors of rHE. The radiomics workflow included 3D VOI segmentation, radiomics feature extraction, selection, and model establishment. PyRadiomics 3.0.1 (https://pyradiomics.readthedocs.io/en/latest) was used to extract the radiomics features. Gray values in all NCCT images were discretized with a fixed bin width of 25 Hounsfield units (HU). A total of 851 features (744 wavelet decomposition features, 75 texture features, 14 shape-based features, and 18 first-order statistics) were extracted from each 3D VOI and standardized using z-scores. Features with high inter- and intragroup stability (ICC > 0.80) were retained for subsequent analysis. The optimal radiomics features in the training set were then selected with the least absolute shrinkage and selection operator (LASSO) algorithm.
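Two of the preprocessing steps above, fixed-bin-width discretization and z-score standardization, can be sketched in a few lines. This is a simplified illustration rather than the PyRadiomics implementation: the function names and example values are hypothetical, the binning assumes bin edges at multiples of the bin width, and the use of the population standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev


def discretize_fixed_bin_width(values_hu, bin_width=25):
    """Map gray values to 1-based bin indices with edges at multiples of bin_width."""
    lowest_bin = math.floor(min(values_hu) / bin_width)
    return [math.floor(v / bin_width) - lowest_bin + 1 for v in values_hu]


def z_score(values):
    """Standardize one feature to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - mu) / sigma for v in values]


# Hypothetical hematoma voxel intensities (HU) and one hypothetical feature column.
bins = discretize_fixed_bin_width([38, 52, 61, 74, 80])
scaled = z_score([2, 4, 4, 4, 5, 5, 7, 9])
print(bins)
print(scaled)
```

When standardization is applied in a train/test setting, the mean and standard deviation would normally be estimated on the training set and reused unchanged for the testing sets.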

We developed four baseline ML prediction models: (1) a logistic regression model based on the BRAIN score (B for baseline intracerebral volume, R for recurrent intracerebral hemorrhage, A for anticoagulation treatment, I for intraventricular hemorrhage, N for number of hours from onset to CT), calculated using simple clinical variables (see Supplementary Table S2 for scoring criteria); (2) a logistic regression model using independent clinical-radiologic predictors of rHE, named the clinical-radiologic model; (3) a combined-logistic model; and (4) a combined-SVM model, both incorporating radiomics features and independent clinical-radiologic predictors.

Deep learning model development and interpretation of model visualization

Four CNN architectures (ResNet-101, ResNet-152, DenseNet-121, and DenseNet-201) were each developed in 2D and 3D versions, yielding eight models. Details of these models are provided in the Supplementary Material. According to the semiautomatically generated 3D segmentation masks, the largest 2D rectangular region and smallest 3D bounding box of the hematoma on baseline NCCT images were cropped and input into the 2D-CNN and 3D-CNN models, respectively. The original CT images were adjusted to a window width of 80 HU and a window level of 40 HU during cropping. For the 2D-CNN models, the input was resized to 224 × 224 pixels, with a batch size of 128. The model weights were initialized by pretraining on the ImageNet dataset. For the 3D-CNN models, the input size was 112 × 112 × 112 voxels, with a batch size of 16. To mitigate overfitting, batch normalization layers were added after convolutional layers, a dropout layer with a rate of 0.5 was incorporated, and online data augmentation (including random horizontal flipping, cropping, and translation) was applied during training. The Adam optimizer was used with a learning rate of 0.01 for 100 epochs. Models were implemented using Python 3.7.6 and PyTorch 1.8.1.
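The CT windowing step (window width 80 HU, window level 40 HU) amounts to clipping intensities to the range 0–80 HU. A minimal sketch is given below; the function name is hypothetical, and rescaling the clipped values to [0, 1] is an assumption, since the intensity scaling used before the network input is not specified here.

```python
def apply_window(hu_values, level=40.0, width=80.0):
    """Clip HU values to the display window and rescale them to [0, 1]."""
    lower = level - width / 2.0  # 0 HU for the brain window used here
    upper = level + width / 2.0  # 80 HU
    return [(min(max(v, lower), upper) - lower) / (upper - lower) for v in hu_values]


# Hypothetical voxels: air, water, mid-window tissue, upper bound, bone.
windowed = apply_window([-1000, 0, 40, 80, 300])
print(windowed)
```

Clipping in this way discards contrast outside the brain window (for example, bone and air) so that the available dynamic range is devoted to parenchyma and blood.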

To visualize the decision-making process of the optimal deep learning model, gradient-weighted class activation mapping (Grad-CAM) was employed to generate 2D attention maps by extracting feature maps from the final CNN layer24, which emphasized the importance of hematoma regions in predicting the risk of rHE. These attention maps were subsequently resized and overlaid onto the original NCCT images to identify crucial regions in the target image for classification.
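The core Grad-CAM computation can be sketched independently of any deep learning framework: each feature-map channel is weighted by the spatial average of the class-score gradient for that channel, the weighted channels are summed, and negative values are removed with a ReLU. In the sketch below, the function name and the tiny 2 × 2 arrays are hypothetical; in practice the feature maps and gradients would be obtained from a forward and backward pass through the trained network, and normalizing the map to [0, 1] before overlay is an assumption.

```python
def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap.

    feature_maps, gradients: lists of K channels, each an H x W nested list,
    where gradients holds d(class score)/d(feature map) at each position.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    # Channel weights: global average of the gradient over spatial positions.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w_k, fmap in zip(weights, feature_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w_k * fmap[i][j]
    # ReLU keeps only regions that increase the class score.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]  # scale to [0, 1]
    return cam


# Hypothetical two-channel example.
feature_maps = [[[1, 0], [0, 2]], [[0, 3], [1, 0]]]
gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
heatmap = grad_cam(feature_maps, gradients)
print(heatmap)
```

Here the first channel receives a weight of 1 and the second a weight of −1, so only locations activated by the first channel remain in the heatmap after the ReLU.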

Statistical analysis

Continuous variables are expressed as the mean ± standard deviation or median (interquartile range [IQR]), and categorical variables are expressed as frequencies and percentages (%). Student’s t-test or the Mann-Whitney U test was applied to assess differences in continuous variables between groups, and the chi-square test or Fisher’s exact test was used to compare categorical variables between groups. The radiologists’ interobserver agreement for NCCT image features was assessed with Cohen’s kappa coefficient. The AUC, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score were computed to evaluate model performance. Sensitivity and specificity were calculated at a predicted-probability threshold of 0.5, and differences were evaluated using McNemar’s chi-square test. The DeLong test was used to assess the differences among AUCs. A two-sided p-value < 0.05 was considered to indicate statistical significance. The analyses were conducted with MedCalc version 20.009, SPSS version 26.0, and R version 3.6.0.
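The threshold-based metrics listed above all derive from the four cells of a confusion matrix, as the following minimal sketch shows. The function name is hypothetical. The example counts are back-calculated from the external-testing results reported for the 2D-ResNet-101 model (219 patients, 80 with rHE, 29 of whom were missed); the true-negative and false-positive counts are inferred from the reported specificity and should be read as an assumption rather than as published values.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive threshold-based metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and
    at least one positive and one negative prediction).
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Inferred external-testing counts: 80 rHE (51 detected, 29 missed) and
# 139 non-rHE (117 correctly classified, 22 false positives; inferred).
external = classification_metrics(tp=51, fp=22, tn=117, fn=29)
for name, value in external.items():
    print(f"{name}: {value:.4f}")
```

Under these inferred counts, all six values agree with the reported external-testing metrics to within rounding, which provides a useful internal consistency check on the reported figures.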

Results

Clinical-radiologic characteristics and radiomics signature analysis

Table 1 summarizes the baseline clinical-radiologic characteristics of the 775 patients with spontaneous ICH. Among them, 118 (mean age, 63.03 years ± 12.20 [SD]; 87 men) in the training cohort, 51 (mean age, 61.00 years ± 15.02 [SD]; 41 men) in the internal-testing cohort, and 80 (mean age, 63.89 years ± 14.58 [SD]; 52 men) in the external-testing cohort developed rHE according to follow-up CT. Intra-observer kappa values for radiologists A and B ranged from 0.937 to 0.973, while inter-observer kappa values ranged from 0.835 to 0.960 (Fig. 3). Univariate and multivariate regression analyses of clinical-radiologic features in the training cohort are shown in Table 2. Multivariate analysis identified onset to baseline CT time interval (OR 0.914; 95% CI, 0.853–0.979), baseline ICH volume (OR 1.040; 95% CI, 1.016–1.064), IVH presence (OR 3.557; 95% CI, 1.663–7.610), and hypodensity (OR 2.328; 95% CI, 1.270–4.267) as independent predictors of rHE. Of the 851 radiomics features extracted from each 3D VOI, 488 with ICC > 0.80 were retained, and nine optimal features were identified using LASSO-based screening in the training set (Supplementary Table S3 and Figure S2).

Table 1 Baseline clinical-radiologic features of the datasets.
Table 2 Univariate and multivariate analysis of clinical-radiologic predictors for revised hematoma expansion in the training set.

Predictive performance of the baseline prediction models

The baseline BRAIN score and the clinical-radiologic model showed comparable predictive performance. In both the internal- and external-testing sets, their AUCs and sensitivities were relatively low (Table 3). In contrast, the combined-logistic and combined-SVM models, which incorporated the four independent clinical-radiologic predictors and the nine radiomics features, demonstrated better performance in identifying high-risk rHE in ICH patients. In the internal-testing set, the AUCs were 0.682 (95% CI: 0.605–0.752) and 0.692 (95% CI: 0.616–0.761), with sensitivities of 0.530 and 0.412 and specificities of 0.759 and 0.827, respectively. In the external-testing set, the AUCs were 0.694 (95% CI: 0.629–0.755) and 0.703 (95% CI: 0.638–0.763), with sensitivities of 0.488 and 0.338 and specificities of 0.712 and 0.863, respectively (Table 3; Fig. 4E and F). To some extent, these findings indicate that the radiomics features may have provided additional information for predicting the risk of rHE.

Table 3 Performance comparison of different prediction models.
Fig. 4

Receiver operating characteristic (ROC) curves of different prediction models in the testing sets. ROC curves of the 2D-CNN models in the internal-testing (A) and external-testing sets (B); ROC curves of the 3D-CNN models in the internal-testing (C) and external-testing sets (D); performance comparison between the 2D-ResNet-101 model and the four baseline machine learning models in the internal-testing (E) and external-testing (F) sets. SVM, support vector machine.

Performance comparison and visualization analysis of the deep learning models

In both test sets, the predictive performance of the 2D-CNN models was significantly greater than that of the 3D-CNN models. All four 3D-CNN models exhibited substantial generalization errors in the two-center datasets (Table 3; Fig. 4A-D). In the external-testing set, the 2D-ResNet-101 had the best overall performance among the eight deep learning models, with an AUC of 0.777 (95% CI: 0.716–0.830), accuracy of 0.767, sensitivity of 0.637, specificity of 0.842, PPV of 0.699, NPV of 0.801, and F1-score of 0.667, while in the internal-testing set, these values were 0.782 (95% CI: 0.712–0.842), 0.766, 0.667, 0.810, 0.607, 0.847, and 0.636, respectively. Grad-CAM generated heatmaps that visually emphasize the key regions used to classify rHE and non-rHE in ICH. The red regions indicate the areas most influential to the 2D-ResNet-101 model’s classification, which were primarily located in the hematoma and its periphery (Fig. 5).

Fig. 5

Visualization of attention maps for the 2D-ResNet-101 model. The gradient-weighted class activation mapping (Grad-CAM) of the 2D-ResNet-101 deep learning model in intracerebral hemorrhage patients with revised hematoma expansion (rHE) (A) and non-rHE (B). To visually verify the decision-making process of the 2D-ResNet-101, Grad-CAM was applied to generate attention maps. These maps were then overlaid onto the input images, producing final overlay attention maps that highlight the hematoma regions with the greatest impact on the model predictions. The red regions indicate the areas most influential to the model’s classification, which were primarily located in the hematoma and its periphery.

The 2D-ResNet-101 model outperformed both the baseline clinical-radiologic model and the BRAIN score. In the internal-testing set, the AUCs increased by 0.094 (p = 0.048) and 0.127 (p = 0.013), and the sensitivity by 0.216 (p = 0.013) and 0.373 (p < 0.001), respectively. In the external-testing set, the AUC values increased by 0.087 (p = 0.022) and 0.119 (p = 0.003), and the sensitivity by 0.174 (p = 0.009) and 0.287 (p < 0.001), respectively. There were no significant differences in specificity in the test sets. The confusion matrices (Fig. 6) show that the clinical-radiologic model and the BRAIN score misclassified 28/51 and 36/51 rHE cases as non-rHE in the internal-testing set, and 43/80 and 52/80 cases in the external-testing set, respectively. In contrast, the 2D-ResNet-101 model reduced rHE misclassifications to 17/51 and 29/80 cases in the internal- and external-testing sets, respectively, significantly lowering the risk of missed diagnosis (p < 0.05, McNemar’s test). Additionally, the 2D-ResNet-101 model was superior to the two baseline combined models, showing improvements in AUC and sensitivity across both test sets (Fig. 7).

Fig. 6

Confusion matrices of 2D ResNet-101 and baseline models in the internal-testing (A) and external-testing sets (B).

Fig. 7

Comparison of prediction performance between the 2D-ResNet-101 model and baseline machine learning models. Comparison of AUC, sensitivity, and specificity between the 2D-ResNet-101 model and four baseline prediction models in the internal- (A) and external-testing sets (B). McNemar’s chi-squared test was used to compare differences in sensitivity and specificity, while the DeLong test was used to assess the differences between AUCs. *Indicates a significant difference (p < 0.05).

Discussion

In this study, we developed 2D/3D CNN models based on NCCT images to predict high-risk rHE in ICH patients and compared their performance with that of four baseline ML models. The main findings showed that the developed 2D-ResNet-101 model had the best predictive performance, demonstrating significant improvement over the BRAIN score and clinical-radiologic model in both the internal- and external-testing sets. Furthermore, it exhibited higher sensitivity and accuracy than the two combined models in the testing sets. These findings suggest that the deep learning model may provide more comprehensive information about hematoma heterogeneity than routine clinical predictive indicators and radiomics features alone, thus predicting rHE more effectively. This model could allow the identification of patients who may benefit from anti-expansion therapies in the acute ICH setting.

Spontaneous ICH is the deadliest acute stroke type, with high morbidity and mortality25,26. Notably, in real-world clinical scenarios, parenchymal hematomas often extend into the ventricular space27, and the extent of this extension correlates exponentially with patient outcomes28. In the present research, we added IVH expansion to the definition of cHE and explored potential clinical-radiologic factors affecting rHE. Multivariate regression analysis identified significant differences in the onset to baseline CT time interval, ICH volume, and presence of IVH between the groups, with patients who developed rHE showing shorter baseline scan intervals, larger ICH volumes, and a higher likelihood of IVH (Table 2). These findings highlight the urgent need for rapid assessment and intervention to limit ICH growth and improve outcomes, especially for infratentorial hemorrhage. Such hemorrhage may disrupt neural pathways related to the Guillain-Mollaret triangle, a network critical for movement coordination and control, dysfunction of which can lead to a variety of neurological disorders, such as post-stroke palatal tremor29. According to the 2022 AHA/ASA guidelines, NCCT markers are valuable potential imaging predictors for identifying patients at risk of rHE10. Our analysis showed that hypodensities were the only independent risk factor among the nine NCCT markers, likely indicating areas of incomplete blood clotting prone to instability and further bleeding30,31. Hypodensities also overlap with other NCCT signs32, and their high prevalence may support their role as a predictor. We also developed the BRAIN score and a clinical-radiologic model based on routinely available clinical variables, but these demonstrated limited predictive performance in the testing sets. The sensitivity of these models ranged from 0.350 to 0.488, suggesting a substantial risk of missing rHE diagnoses, which could lead to delayed treatment and potentially serious consequences.
These findings highlight the limitations of clinical-radiologic features in predicting rHE, likely due to their qualitative or semiquantitative nature, which can introduce subjectivity and inconsistency in predictions33. This was further evidenced by variability in inter- and intra-observer agreement regarding NCCT markers in this study.

Recent studies have shown promising results using machine learning (ML) methods, including radiomics and deep learning, to predict intracerebral hemorrhage (ICH) growth. Feng and Pszczolkowski et al. applied deep learning radiomics or radiomics features derived from NCCT images to predict cHE, achieving AUCs ranging from 0.693 to 0.82016,34. Xia et al. combined radiomics features with clinical-semantic factors to enhance rHE prediction, achieving an AUC of 0.830 compared to 0.690 for clinical-semantic models alone, though this study had a small sample size35. In our study, with a larger two-center sample, the addition of radiomics features to the clinical-radiologic model improved rHE prediction performance in the external-testing set, consistent with previous findings16,35. However, both combined models exhibited reduced generalizability, likely due to the limited robustness of handcrafted radiomics features, which suffer from low reproducibility across different CT devices and protocols36,37. Furthermore, radiomics features may fail to capture the semantic characteristics of NCCT markers16. In contrast, deep learning automatically learns complex, discriminative features directly from images through neural network layers, eliminating the need for manual extraction of hard-coded features18. Most studies have focused on using deep learning models to predict cHE20,21,38,39,40. In these studies, the follow-up hematoma volume may include both parenchymal hemorrhage and IVH. However, IVH expansion may occur independently of parenchymal hematoma expansion, a factor often overlooked in large dataset studies, such as those by Li20 and Teng40, which limits confidence in deep learning’s ability to predict rHE risk. Our results demonstrate that 2D CNN models based on baseline NCCT images outperform traditional ML models, suggesting that 2D deep learning may significantly enhance predictive accuracy for rHE.

In this study, we developed eight deep learning models to predict rHE, with the 2D-CNN models outperforming the 3D-CNN models in the testing sets. The differences in performance among the different 3D-CNN or 2D-CNN models may be attributed to the differing internal architectures of each network41. Previous studies have demonstrated that 3D images, which contain richer 3D spatial information compared to 2D images42 typically achieve superior performance in disease prediction tasks43. However, in our study, the 3D-CNN models exhibited limited predictive capability, possibly due to their higher complexity and larger number of parameters, which may not be well-suited for small sample sizes of 3D data43,44. Additionally, the lack of pretrained model weights and the low resolution of 3D NCCT images along the z-axis (5 mm slice thickness) could have further hindered their performance45,46. Although the 2D-CNN models achieved relatively high performance, their lack of spatial information may hinder accurate modeling of peri-hematomal structures. An approach that balances the advantages of both 2D and 3D modeling may optimize the trade-off between computational efficiency and model generalizability for limited datasets47.

Among the 2D-CNN models in our study, the 2D-ResNet-101, a deep network with 101 layers utilizing residual connections, demonstrated superior predictive performance and improved generalization48. While deeper networks can learn more complex representations, increasing depth does not always lead to better model performance due to challenges in gradient descent49,50,51. This was further supported by our finding that, in most CNN models, greater depth reduced performance on the external-testing set (Table 3). In our study, ResNet outperformed DenseNet, possibly owing to its simpler residual structure and lower memory complexity, which may confer greater robustness under relatively small-sample conditions52,53. Previous studies have shown the effectiveness of deep residual networks in ICH disease classification21,54. Grad-CAM visualizations demonstrated that the 2D-ResNet-101 model primarily focused on the hematoma and its periphery for decision-making, consistent with observations reported by Zhao et al. and Tran et al.21,55. Notably, rHE tends to demonstrate more irregular morphology and internal density heterogeneity compared to non-rHE (Fig. 5A). This peripheral-focused attention pattern may correspond to NCCT markers of active multifocal bleeding, such as irregular shape56 (Fig. 5A, Case 2). These findings may support Fisher’s ‘avalanche model’ of HE, which proposes that initial bleeding disrupts adjacent vessels, leading to surrounding secondary hemorrhage57. Furthermore, the 2D-ResNet-101 model achieved significantly higher sensitivity than the baseline models without a significant decrease in specificity, indicating that a higher proportion of ICH patients at high risk for rHE can be identified early, thereby helping to ensure that these patients receive timely anti-expansion treatments or surgical intervention as needed.

This study has several limitations. First, due to its retrospective design, some important clinical parameters such as Glasgow Coma Scale scores were unavailable. Therefore, a prospective study is necessary to validate the deep learning model’s performance and further explore the relationship between rHE and clinical variables. Second, the relatively small sample size limits the generalizability of the findings. A multi-center trial with larger datasets is essential to assess the model’s applicability in real-world clinical settings. Third, while the current standard for rHE relies on semiautomatic delineation software with manual adjustment, detecting small volume changes, particularly in IVH expansion (≥ 1 mL), can be challenging due to technological limitations. Implementing fully automated, high-precision IVH delineation may enhance accuracy and reduce human error. Finally, the developed deep learning models primarily focused on image-based predictions without incorporating clinical-radiologic variables. However, medical decisions are multifactorial and not solely based on imaging findings. Future research should aim to integrate these variables to further improve model performance.

Conclusion

In conclusion, the developed 2D deep learning prediction model outperformed traditional baseline ML models in predicting rHE in ICH patients. This model can serve as a potential risk stratification tool by identifying high-risk patients, thereby supporting timely, targeted medical interventions and streamlining decision-making in emergency conditions.