Prediction of functional outcomes in aneurysmal subarachnoid hemorrhage using pre-/postoperative noncontrast CT within 3 days of admission

Yin, Pengzhan; Wang, Jiaqi; Zhang, Chao; Tang, Yongxiang; Hu, Xiankuo; Shu, Hongmin; Wang, Jun; Liu, Bin; Yu, Yongqiang; Zhou, Yunfeng; Li, Xiaohu

doi:10.1038/s41746-025-01953-z


Article
Open access
Published: 24 August 2025


npj Digital Medicine volume 8, Article number: 542 (2025)


Abstract

Aneurysmal subarachnoid hemorrhage (aSAH) is a life-threatening condition, and accurate prediction of functional outcomes is critical for optimizing patient management within the initial 3 days of presentation. However, existing clinical scoring systems and imaging assessments do not fully capture clinical variability in predicting outcomes. We developed a deep learning model integrating pre- and postoperative noncontrast CT (NCCT) imaging with clinical data to predict 3-month modified Rankin Scale (mRS) scores in aSAH patients. Using data from 1850 patients across four hospitals, we constructed and validated five models: preoperative, postoperative, stacking imaging, clinical, and fusion models. The fusion model significantly outperformed the others (all p<0.001), achieving a mean absolute error of 0.79 and an area under the curve of 0.92 in the external test. These findings demonstrate that this integrated deep learning model enables accurate prediction of 3-month outcomes and may serve as a prognostic support tool early in aSAH care.


Introduction

Aneurysmal subarachnoid hemorrhage (aSAH) is a life-threatening subtype of hemorrhagic stroke, with approximately one-third of survivors experiencing substantial disability¹. Functional outcomes are primarily influenced by early brain injury and in-hospital complications, including delayed cerebral ischemia (DCI) and chronic hydrocephalus (CH)^2,3. Accurate prediction of functional outcomes within the initial 3 days of presentation is therefore crucial in clinical settings: it supports efficient allocation of medical resources, enables timely therapeutic interventions, and aids decisions regarding ICU monitoring, timing of rehabilitation, and prognosis discussions with patients and families.

Owing to their immediate availability, previous studies have predominantly employed clinical scoring systems—such as the World Federation of Neurological Surgeons (WFNS) scale⁴, the Hunt-Hess score⁵, and the modified Fisher scale (mFS)⁶—to estimate 3-month outcomes using the modified Rankin Scale (mRS)⁷. However, these tools are limited by their inherent subjectivity and frequently fail to capture the complex interactions between individual patient profiles and radiological findings⁸. Although some investigations have explored machine learning models incorporating clinical data and preoperative CT perfusion parameters, these approaches—despite showing promise^9,10—are hindered by limited accessibility, labor-intensive post-processing, and inter-institutional variability, thereby constraining their applicability in routine practice.

Recent advances in deep learning (DL) have markedly improved the analysis of radiological images, enhancing diagnostic accuracy across multiple domains. DL models are capable of autonomously extracting features from medical images, thereby minimizing human bias in feature selection and uncovering latent patterns that may not be perceptible through conventional visual interpretation, such as subtle indicators of edema, hemorrhage distribution, or ventricular changes¹¹. Noncontrast CT (NCCT), the first-line imaging modality in both the acute and follow-up phases of aSAH management, enables reliable identification of hemorrhage severity, acute hydrocephalus, cerebral edema, and infarction preoperatively, and facilitates postoperative assessment of rebleeding, infarction, CH, and other complications. As such, NCCT is well-suited for outcomes prediction through DL-based approaches¹². Furthermore, the utility of NCCT has been validated in other neuroimaging applications, such as stroke or traumatic brain injury prediction, reinforcing its relevance in prognostic modeling^13,14.

Since clinical scores and periprocedural NCCT are essential for every aSAH patient, obtaining these data is relatively convenient. Preoperative imaging reflects the initial injury burden and is pivotal for treatment risk assessment and planning, while postoperative imaging provides insight into treatment effects and emerging complications. DL algorithms can conveniently and accurately extract this information from NCCT images. However, to date, no study has developed a multimodal fusion model integrating pre- and postoperative NCCT images with clinical data for predicting mRS score in aSAH patients. Existing literature predominantly focuses on clinical variables and either preoperative or postoperative imaging features^10,15,16,17,18, with limited efforts to integrate both. Therefore, this study seeks to address this gap by leveraging multicenter datasets to integrate clinical information with both preoperative and postoperative NCCT scans, thereby developing DL models aimed at predicting functional outcomes in patients with aSAH.

Results

Patients

A total of 3302 patients were initially enrolled. Following exclusions based on medical record review (n = 1236) and picture archiving and communication system evaluation (n = 216), 1850 patients remained eligible for analysis (median age, 59 years; IQR: 51–68 years; 650 men). The cohort was distributed across the following centers: First Affiliated Hospital of Wannan Medical College (WN, n = 1178), First Affiliated Hospital of Anhui Medical University (AH, n = 244), Fuyang People’s Hospital (FY, n = 225), and Tongling People’s Hospital (TL, n = 203) (Fig. 1). The three external test cohorts (AH, FY, and TL) were also aggregated to form the Test-combined dataset (n = 672). There were no statistically significant differences in baseline characteristics between included and excluded patients at any center (all p > 0.05). A summary of patient characteristics is provided in Table 1.

Table 1 Patient characteristics, complications, and outcomes across hospital cohorts


Model performance

The workflow for model construction is illustrated in Fig. 2. Model performance metrics for the preoperative, postoperative, stacking imaging, clinical, and fusion models are presented in Table 2 and Fig. 3, and the confusion matrices of all models across the test sets are presented in Supplementary Fig. 1. Details of model construction are described in the Methods section. The fusion model demonstrated superior performance across all validation sites (AH, FY, TL, and Test-combined), consistently achieving area under the curve (AUC) values greater than 0.90. Specificity of the fusion model was notably high at all centers, exceeding 90%, with the highest observed at TL (93.5%). Sensitivity of the fusion model varied across sites, with the highest value recorded at TL (84.1%). The mean absolute error (MAE) of the fusion model remained consistently low across all centers (range, 0.73–0.88). Comparative performance of the models on the Test-combined set is illustrated in Fig. 4. The main fusion model remained superior to the two alternative fusion models, with significantly lower MAE and higher AUC (all p < 0.05). The importance analysis of the two alternative fusion models, together with comparisons of predictive performance among the main fusion model and the two alternative fusion models, is detailed in Supplementary Note 1.

**Fig. 2: Workflow of model construction.**

**Fig. 3: Trend of model performance in 4 test sets.**

**Fig. 4: Model performance comparison in Test-combined set.**

Table 2 Model performance across test sets


For the regression task, the objective was to predict the continuous mRS score ranging from 0 to 6. The fusion model achieved the lowest MAE on the Test-combined set, significantly outperforming all other models (p < 0.001). The stacking imaging model also demonstrated significantly lower MAE values compared with both the preoperative and postoperative models (p < 0.001).

For the classification task, the objective was to predict dichotomized functional outcomes as either good (mRS ≤2) or poor (mRS >2). The fusion model attained the highest AUC on the Test-combined set, significantly outperforming all other models (p < 0.001). The stacking imaging model also achieved a significantly higher AUC value than the preoperative and postoperative models (p < 0.01). Performance with mRS >3 defining poor functional outcome is reported in Supplementary Table 1 and was comparable to that of the model based on mRS >2.

In the Test-combined set, the fusion model significantly outperformed the WFNS score, Hunt-Hess score, and mFS in terms of both regression and classification tasks. Details are provided in Supplementary Note 2.

Subgroup analysis of DCI and CH

In the subgroup analysis, we evaluated the performance of all models across different patient populations. Details are shown in Supplementary Fig. 2.

Among DCI subgroups, both the preoperative and postoperative models demonstrated significantly superior regression performance in patients with DCI compared with those without (MAE: 1.206 vs. 1.549, p < 0.001; 1.008 vs. 1.167, p = 0.021, respectively). In contrast, the clinical and fusion models exhibited significantly poorer regression performance in the DCI group (MAE: 1.298 vs. 0.876, p < 0.001; 0.958 vs. 0.714, p < 0.001, respectively). The stacking imaging model showed no significant difference in regression performance between groups (MAE: 1.001 vs. 0.886, p = 0.134). Despite these differences, the fusion model consistently achieved the lowest MAE across both DCI and non-DCI groups. With respect to classification task performance, all five models maintained comparable AUC values across DCI and non-DCI subgroups, with no statistically significant differences observed in functional outcome prediction (all p > 0.05).

Within CH subgroups, the preoperative model yielded significantly better regression performance in patients with CH relative to those without (MAE: 1.262 vs. 1.502, p = 0.004), whereas the postoperative model demonstrated comparable regression performance across groups (MAE: 1.026 vs. 1.147, p = 0.108). Conversely, the stacking imaging, clinical, and fusion models exhibited significantly higher MAE values in the CH subgroup (MAE: 1.074 vs. 0.875, p = 0.018; 1.535 vs. 0.845, p < 0.001; 1.025 vs. 0.716, p < 0.001, respectively). Nevertheless, the fusion model continued to yield the lowest MAE in both CH and non-CH cohorts. Regarding classification task performance, all models demonstrated stable AUC values across CH and non-CH subgroups (all p > 0.05), with the fusion model exhibiting the highest overall discriminative capacity.

Model interpretation

To enhance interpretability, Fig. 5 presents the predictive performance of the preoperative and postoperative models alongside Grad-CAM activation maps for four representative cases. The highlighted regions predominantly correspond to areas of subarachnoid hemorrhage, intraventricular blood, and low-density regions, findings that are clinically relevant to outcomes prediction in aSAH patients.

**Fig. 5: Examples of five models predicting different functional outcomes and their corresponding activation maps of pre- and postoperative NCCT images.**

For the stacking model, SHapley Additive Explanations (SHAP) weight analysis in Fig. 6a indicated a substantially higher contribution from the postoperative model (0.879) compared with the preoperative model (0.121). Figure 6b demonstrates a complementary relationship between the preoperative and postoperative models.

**Fig. 6: Importance analysis of stacking, clinical and fusion models.**

In the clinical model (Fig. 6c), least absolute shrinkage and selection operator (LASSO) coefficients showed that the Hunt-Hess score was the most influential variable (0.678), followed by the subarachnoid hemorrhage early brain edema score (0.379), the mFS score (0.356), and age (0.250), all of which demonstrated moderate to low importance. SHAP weight analysis is shown in Supplementary Fig. 3a.

Permutation importance for the fusion model is shown in Fig. 6d. The fusion model included 2 imaging features (predictions from the preoperative and postoperative models, which together formed the stacking imaging model) and 4 clinical features: Hunt-Hess score, mFS score, subarachnoid hemorrhage early brain edema score, and age. The stacking imaging model was the dominant predictor in the fusion model, with the highest importance score (0.967). Among clinical variables, the Hunt-Hess score remained the most impactful variable (0.126). SHAP weight analysis is shown in Supplementary Fig. 3b.

Discussion

In this study, we developed and validated DL models incorporating preoperative and postoperative NCCT imaging data with clinical information to predict the 3-month functional outcomes (mRS score) in patients with aSAH. By utilizing imaging acquired at both time points, this approach captured the temporal progression of patient status, including the extent of initial brain injury (preoperative) and subsequent treatment-related changes or complications (postoperative). Furthermore, the integration of imaging data with clinical variables in a fusion model yielded superior predictive performance compared with traditional clinical models, both for the specific mRS score and for the classification of poor functional outcome. Notably, within the fusion model, postoperative NCCT data contributed more substantially than preoperative data. This may be due to post-treatment findings such as early infarction, hydrocephalus, or residual hematoma, which often emerge after surgery and are strongly associated with poor outcomes. While the use of postoperative imaging may introduce the possibility of temporal bias, as it is temporally closer to the outcomes assessment, we believe this effect is minimal given the narrow time window (within 3 days) and the clinical relevance of the postoperative changes. Meanwhile, the Hunt-Hess score emerged as the most influential clinical variable. Importantly, the classification task performance of the fusion model remained robust and was not significantly affected by the presence of DCI or CH, underscoring its efficacy in predicting poor functional outcome.

For predicting poor functional outcome, the fusion model significantly outperformed the preoperative, postoperative, stacking imaging, and clinical models on the Test-combined set (p < 0.001). When predicting continuous mRS score, the fusion model demonstrated superior performance relative to all other models in terms of MAE (p < 0.001). These findings highlight the advantage of combining multimodal data from distinct imaging phases and clinical variables, thereby capturing complementary aspects of both initial brain injury and post-treatment alterations. This approach aligns with previous research underscoring the value of multimodal data integration for outcomes prediction^19,20. While earlier studies have predominantly utilized readily available admission clinical data for functional outcomes prediction^21,22, recent investigations have explored CT perfusion imaging in this context^9,23,24. In addition, Huang et al. incorporated radiomic features from CTA-based aneurysm imaging with clinical parameters to predict 3-month outcomes, with promising results²⁵. Although research involving MRI remains limited, Sener et al. reported that diffusion tensor imaging parameters assessed around day 12 after injury correlated with 6-month mortality in patients with severe aSAH, suggesting a potentially valuable research direction²⁶. In contrast, the present study leveraged NCCT imaging, which is more accessible and standardized across institutions compared with these advanced modalities, thereby enhancing the generalizability and clinical applicability of the model.

Analysis of feature importance elucidated the contribution of each data source to model performance. The postoperative imaging model demonstrated greater importance relative to the preoperative model, likely because of its capacity to capture treatment-related changes and complications that are highly predictive of outcomes. Nevertheless, the preoperative model provided essential baseline information, particularly regarding initial injury severity, which remains prognostically relevant despite the potential for subsequent recovery. In the Test-combined set, augmenting the postoperative model with preoperative data significantly reduced MAE (from 1.12 to 0.92, p < 0.001). Further integration of clinical variables into the fusion model enhanced performance relative to the stacking imaging model. Among clinical features, the Hunt-Hess score was the most influential factor. Early neurological status is a direct indicator of injury severity, and lower levels of consciousness are strongly associated with poorer prognosis, consistent with previous studies emphasizing the predictive utility of standardized neurological assessments^27,28,29. Although some clinical scores are derived from NCCT image assessments and may partially overlap with the stacking imaging model, they retained high importance, suggesting they capture diagnostically salient features that may not be fully learned by the DL model. This is likely because these scores are guided by expert judgment, focusing on critical imaging findings, and incorporate clinical experience and standardized diagnostic frameworks, thereby enhancing interpretability. Grad-CAM visualizations supported this notion, revealing that the pre- and postoperative models primarily attended to regions of brain edema and subarachnoid hemorrhage—findings that correspond with clinical scoring criteria. The combination of expert-derived assessments and stacking imaging data may thus contribute to greater prediction stability and reliability. 
In addition, Shan et al. proposed a radiomics model based on manual segmentation of the cerebral hemorrhage area, which also provided a certain level of interpretability³⁰. However, our DL model operates on NCCT data without manual segmentation, offering broader applicability, while also providing interpretability from a different perspective through Grad-CAM.

Prior studies have underscored the prognostic complexity introduced by complications such as DCI and CH in aSAH patients^31,32,33. In the present study, subgroup analyses demonstrated consistent AUC values for all models among patients with and without complications, including DCI and CH. However, for MAE, the differences between the two subgroups varied greatly across different models. These findings suggested that while such complications may introduce variability in the prediction of continuous outcomes, the models retained strong discriminative capacity for predicting functional outcomes. This discrepancy may be attributable to the limited clinical variables incorporated into the fusion model, which may be insufficient to accurately predict mRS score in patients with complications. The inclusion of additional imaging biomarkers or complication-specific molecular features could potentially improve MAE in predicting continuous mRS score.

In this study, a modified ResNet-50 was selected as the backbone for the image-based DL framework due to its proven effectiveness and stability in medical imaging tasks, enabling it to meet clinical expectations with lower resource consumption^14,34,35. Compared to Transformer-based architectures, which typically require larger datasets and higher-dimensional inputs to achieve optimal performance, a ResNet-based framework offers a more practical and robust solution for this medical dataset^36,37,38,39. For multimodal integration, we employed support vector regression, which demonstrated superior predictive performance over other methods such as random forest and gradient boosting. This advantage may be attributed to the continuous and low-dimensional nature of our integrated features, which align well with our framework. Although SHAP is more naturally aligned with tree-based models, we placed greater emphasis on predictive performance. As such, support vector regression was selected as an optimal trade-off between accuracy and interpretability.

This study offers several notable strengths. First, the use of a multicenter dataset enhanced the generalizability and robustness of the models across different populations and imaging protocols. Unlike many DL studies that limit distribution shifts and thereby risk overestimating performance and reducing clinical relevance, our models were evaluated under realistic test conditions incorporating heterogeneous data. The consistent performance under these conditions suggests the models identify meaningful imaging biomarkers rather than dataset-specific features. Second, the application of interpretability techniques such as Grad-CAM, SHAP, LASSO coefficients, and permutation importance strengthened the clinical utility of the models by elucidating key predictive contributors. The postoperative model exhibited greater importance for outcomes prediction, aligning with clinical understanding that post-treatment imaging changes are pivotal for prognosis. Importantly, the models were designed for flexible application at various stages of clinical care. The preoperative model can be used when only admission NCCT is available, while the fusion model yields enhanced predictive performance when additional postoperative imaging or clinical data are accessible, which is particularly beneficial in settings lacking comprehensive imaging resources. Third, rather than limiting analyses to binary classifications, our approach directly predicted continuous mRS score, thereby offering a more nuanced assessment of functional outcomes.

Despite these strengths, the study is subject to several limitations. First, although the models were deliberately validated across multiple independent institutions with diverse imaging protocols and patient populations, additional validation in larger and more geographically heterogeneous cohorts is needed. Domain adaptation techniques could also be leveraged to mitigate distribution shifts and improve model robustness in varied clinical environments. Second, the retrospective design may introduce biases in follow-up assessments and treatment consistency. Moreover, CH was only tracked for ≤14 days, and the 3-month follow-up may not be sufficient to assess long-term prognostic outcomes. While our fusion model outperformed conventional clinical scores, we acknowledge the potential tradeoff between its increased complexity and clinical applicability, especially in early decision-making settings. Future work should consider incorporating longitudinal data, such as serial imaging or neurological assessments, to better capture patients’ evolving clinical trajectories and to mitigate potential temporal confounding. Additionally, more complex architectures such as transformers may benefit from the integration of molecular features and multi-modal fusion strategies, especially when applied to larger-scale datasets that can support their data demands.

In conclusion, this study demonstrates the feasibility and predictive utility of integrating preoperative and postoperative NCCT imaging with clinical variables for functional outcomes prediction in aSAH patients. The DL-generated fusion model represents a promising tool for individualized prognostication during the initial 3 days of presentation, with the potential to optimize patient management and resource allocation in aSAH care.

Methods

Ethics statement

The study was approved by the local ethics committee and institutional review board of each hospital (First Affiliated Hospital of Anhui Medical University: PJ2024-12-59; First Affiliated Hospital of Wannan Medical College: 2024-185; Fuyang People’s Hospital: [2025]-6; Tongling People’s Hospital: 2025ky004D), and complied with the Declaration of Helsinki. The ethics committee waived the requirement for informed consent as this retrospective study analyzed only existing, fully anonymized clinical data with no additional patient interventions. All data were deidentified prior to analysis to protect participant privacy. Informed consent for the use of anonymized clinical data was obtained from patients or their legal representatives upon hospital admission. All methods were carried out following institutional guidelines and regulations.

Patients

This multicenter retrospective study analyzed data from four hospitals, each with distinct patient demographics and imaging protocols. All datasets were independent of one another: WN (n = 2148), AH (n = 405), FY (n = 392), and TL (n = 357). The inclusion criteria comprised adult patients (age ≥18 years) who were initially suspected of having subarachnoid hemorrhage between June 2012 and September 2024 at the participating institutions. Eligible patients presented at symptom onset, underwent preoperative NCCT imaging within 24 hours, followed by digital subtraction angiography. Only those confirmed with aSAH who subsequently received treatment were included in the study. Clinical data were extracted from electronic medical records, and NCCT images were retrieved from the picture archiving and communication system. Patients were excluded if postoperative NCCT images (acquired between Days 1 and 3) were unavailable or showed motion artifacts, or if clinical data or the 3-month mRS score were missing. To assess potential selection bias, baseline characteristics were compared between included and excluded patients across all centers, including age, sex, GCS score, WFNS score, Hunt-Hess score, mFS score, and subarachnoid early brain edema score. A flowchart of patient inclusion and exclusion is presented in Fig. 1.
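The enrollment figures in this paragraph can be cross-checked against the included cohort sizes reported in the Results. The short Python sketch below only re-adds numbers quoted in the text; the variable names are illustrative and not part of the study's code.

```python
# Screened patients per center (this paragraph) and included patients per
# center (Results, "Patients"). All numbers are copied from the text.
screened = {"WN": 2148, "AH": 405, "FY": 392, "TL": 357}
included = {"WN": 1178, "AH": 244, "FY": 225, "TL": 203}

total_screened = sum(screened.values())
total_included = sum(included.values())

# Exclusions reported in the Results: medical record review + PACS evaluation.
total_excluded = 1236 + 216

# Test-combined pools the three external centers (everything except WN).
test_combined = sum(n for center, n in included.items() if center != "WN")

print(total_screened, total_excluded, total_included, test_combined)
```

The totals are consistent: screened minus excluded equals the analyzed cohort, and the three external cohorts sum to the Test-combined size.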

Clinical outcome and in-hospital complications

The mRS, ranging from 0 (no symptoms) to 6 (death), was obtained from medical records at the 3-month follow-up⁴⁰. DCI was defined as the presence of a new infarct on hospitalization NCCT or MRI not attributable to treatment, a new hypodense region, or unexplained neurologic deterioration accompanied by a decline in Glasgow Coma Scale (GCS) score⁴¹; DCI status was extracted from medical records. Hydrocephalus was evaluated by radiologists at each center (P.Y., 5 years post-training; X.H., 8 years post-training; C.Z., 8 years post-training; Y.T., 18 years post-training) using hospitalization NCCT. It was categorized as acute (0–3 days post-aSAH), subacute (4–13 days post-aSAH), or CH (≥14 days post-aSAH)⁴². Diagnostic criteria included marked enlargement of the temporal horns or a frontal horn-to-biparietal diameter ratio >30%. Patients discharged within 14 days without radiological evidence of hydrocephalus on preoperative or hospitalization NCCT were classified as not having CH.
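The timing categories and the ratio criterion for hydrocephalus can be written as two small rules. This is a minimal sketch under stated assumptions: the function and argument names are hypothetical, days are counted as whole days post-aSAH, and "marked enlargement of the temporal horns" is passed in as a reader's yes/no judgment because the text gives no numeric definition for it.

```python
def hydrocephalus_phase(days_post_asah: int) -> str:
    """Timing category from the text: acute 0-3, subacute 4-13, chronic >= 14 days."""
    if days_post_asah < 0:
        raise ValueError("days_post_asah must be non-negative")
    if days_post_asah <= 3:
        return "acute"
    if days_post_asah <= 13:
        return "subacute"
    return "chronic"


def meets_hydrocephalus_criteria(temporal_horns_enlarged: bool,
                                 frontal_horn_mm: float,
                                 biparietal_mm: float) -> bool:
    """Either criterion suffices: enlarged temporal horns, or ratio strictly > 30%."""
    ratio = frontal_horn_mm / biparietal_mm
    return temporal_horns_enlarged or ratio > 0.30


print(hydrocephalus_phase(14))
print(meets_hydrocephalus_criteria(False, 35.0, 100.0))
```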

Clinical data and NCCT imaging

Clinical data obtained within 24 hours of presentation included age, sex, history of hypertension, GCS score, WFNS score (1–5), Hunt-Hess score (1–5), mFS score (1–4), subarachnoid hemorrhage early brain edema score (0–4), presence of acute hydrocephalus, presence of localized subarachnoid hematoma, aneurysm location and size based on digital subtraction angiography, and treatment modality (coiling or clipping). Details are summarized in Table 1.

Preoperative NCCT images acquired within 24 hours of presentation and postoperative NCCT images obtained between Days 1 and 3 were collected. For patients with multiple post-treatment scans, only the earliest available image was analyzed. NCCT imaging was performed using various scanners; specific acquisition parameters are provided in Supplementary Note 3. All images were resampled to a standardized volume of 320 × 320 × 64 voxels and normalized to canonical orientation, without skull stripping. An image display window level of 40 and a window width of 80 were applied. Subsequently, Z-score normalization was performed.
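The intensity steps described here (windowing, then Z-score normalization) can be illustrated on a flat list of Hounsfield-unit values. This is a sketch rather than the authors' pipeline: it assumes the window is applied as a hard clip to level ± width/2 (0 to 80 HU), that the Z-score is computed per volume with the population standard deviation, and it omits resampling and reorientation. The function names are hypothetical.

```python
from statistics import mean, pstdev


def apply_window(hu_values, level=40.0, width=80.0):
    """Clip intensities to [level - width/2, level + width/2] (0 to 80 HU here)."""
    lo, hi = level - width / 2.0, level + width / 2.0
    return [min(max(v, lo), hi) for v in hu_values]


def z_score(values):
    """Standardize to zero mean and unit variance (assumed per-volume)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant volume carries no contrast; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy voxels: air, water, brain-range tissue, the upper window edge, bone.
voxels = [-1000, 0, 40, 80, 1000]
windowed = apply_window(voxels)
normalized = z_score(windowed)
print(windowed)
```

With a level of 40 and a width of 80, everything below 0 HU and above 80 HU is saturated, which concentrates contrast on brain parenchyma and acute blood.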

Model construction

The workflow for model construction is illustrated in Fig. 2. The WN dataset (n = 1178) was utilized for model development, while the remaining three datasets (AH, FY, TL) served as independent testing sets to assess the model’s robustness in the presence of real-world heterogeneity. These were also aggregated to form the Test-combined dataset (n = 672). Center selection and the rationale for the Test-combined dataset are described in Supplementary Note 4. Five models were developed: preoperative, postoperative, stacking imaging, clinical, and fusion models.

For the preoperative and postoperative models, a modified ResNet-50 architecture incorporating three-dimensional convolutional neural networks was implemented to directly generate continuous outputs for the regression task (see Supplementary Fig. 4)¹⁴. A stratified 5-fold cross-validation procedure was applied within the WN dataset for manual hyperparameter optimization. During training, CT volumes were resampled to 320 × 320 × 64 voxels and normalized to a canonical orientation. Z-score normalization was applied across all volumes. Random data augmentation techniques—including random flipping (random horizontal or vertical flips with 50% probability), affine transformations (scale: 0.9–1.1, rotation: ±15°, translation: ±10 voxels), gamma correction (γ ∈ [−0.3, 0.3]), contrast adjustments (range: 0.75–1.25), and additive Gaussian noise (μ = 0, σ ∈ [0, 0.1])—were applied to enhance generalizability. Model training was performed using the AdamW optimizer with an initial learning rate of 8 × 10⁻⁴, weight decay of 1 × 10⁻⁵, and a batch size of 10. Training was conducted over a maximum of 80 epochs, with an 8-epoch warm-up phase. If validation performance plateaued for 12 epochs, the learning rate was reduced by a factor of 0.3. Early stopping was employed with a patience value of 10 epochs. A weighted mean squared error loss function was used⁴³, with class weights assigned based on sample distribution, assigning a weight of 2 to a positive outcome (mRS > 2.5). Output values were constrained within the range of 0 to 6 to align with the mRS. The final model was retrained on the entire WN dataset using the optimal hyperparameters derived from cross-validation.
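The weighted loss and the output constraint described above can be sketched in plain Python. Several details are assumptions, since the text does not specify them: samples with mRS ≤ 2.5 receive a weight of 1, the weighted squared errors are averaged over the number of samples (not over the sum of weights), and the output constraint is modeled as a hard clip. The function names are hypothetical.

```python
def weighted_mse(preds, targets, poor_weight=2.0, threshold=2.5):
    """Mean squared error with a larger weight on poor-outcome targets (mRS > threshold)."""
    if len(preds) != len(targets) or not targets:
        raise ValueError("preds and targets must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(preds, targets):
        weight = poor_weight if t > threshold else 1.0  # assumed weight of 1 otherwise
        total += weight * (p - t) ** 2
    return total / len(targets)  # assumed normalization by sample count


def clamp_mrs(pred, lo=0.0, hi=6.0):
    """Constrain a raw regression output to the valid mRS range [0, 6]."""
    return min(max(pred, lo), hi)


# One good-outcome and one poor-outcome patient, each mispredicted by 1 point.
print(weighted_mse([1.0, 4.0], [2.0, 5.0]))
print(clamp_mrs(7.2), clamp_mrs(-0.4))
```

In the example, the same 1-point error costs twice as much for the poor-outcome patient, which is the intended effect of the class weighting.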

For the construction of the stacking imaging model, support vector regression was employed to integrate predictions from the preoperative and postoperative models. Each base model was trained independently using stratified 5-fold cross-validation with a radial basis function kernel. Final predictions were derived by aggregating the validation outputs from each fold and restricting them to the continuous interval between 0 and 6.

For the clinical model, multicollinearity of clinical variables was addressed through stepwise variance inflation factor selection. Features with variance inflation factor values greater than 5 were sequentially removed until all remaining features had values below the threshold. The remaining features were standardized and subsequently processed using LASSO regression (α = 0.01, determined through cross-validation)⁴⁴. Features with non-zero coefficients were retained and further refined using recursive feature elimination to identify the optimal subset for modeling. The final model generated a continuous prediction of mRS score within the range of 0 to 6. Further details of the feature selection procedure are described in Supplementary Note 5.

For the fusion model, predictions from the stacking imaging model were integrated with the selected clinical features to develop a support vector regression model. This model also produced a continuous mRS prediction ranging from 0 to 6. We also evaluated two alternative fusion models, fusion-Alt1 and fusion-Alt2, which combined the selected clinical features with predictions from the preoperative and postoperative models, respectively. For the stacking imaging, clinical, and fusion models, support vector regression was compared with random forest and gradient boosting regression for multimodal integration. Details are provided in Supplementary Note 6.

Model evaluation for regression and classification tasks

For the regression task, the objective was to predict the continuous mRS score ranging from 0 to 6. Model performance was assessed using mean absolute error (MAE).
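As a brief illustration of the regression metric, MAE is the average absolute difference between predicted and observed mRS scores. The helper name and the toy values below are hypothetical.

```python
def mean_absolute_error(predictions, targets):
    """Average absolute difference between predicted and observed scores."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


predicted = [0.6, 2.9, 4.2, 1.0]  # continuous model outputs (illustrative)
observed = [0, 3, 6, 1]           # observed 3-month mRS scores (illustrative)
mae = mean_absolute_error(predicted, observed)
```

Here the absolute errors are 0.6, 0.1, 1.8, and 0, giving an MAE of 0.625 mRS points.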

For the classification task, which was derived from the regression outputs, the objective was to predict functional outcomes by dichotomizing the mRS score as good (mRS ≤ 2) or poor (mRS > 2) using a threshold of 2.5. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC). Sensitivity and specificity were calculated from the continuous predictions using the same threshold of 2.5. To assess the robustness of our approach, additional analyses were performed using mRS > 3 to define poor functional outcome.
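The derivation of classification metrics from continuous predictions can be sketched as follows. The function names and toy values are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation, which is an assumption about implementation rather than a detail given in the text. Both outcome classes must be present for the ratios to be defined.

```python
def dichotomize(scores, threshold=2.5):
    """True means poor outcome (score above the threshold)."""
    return [score > threshold for score in scores]


def sensitivity_specificity(predicted_scores, observed_mrs, threshold=2.5):
    """Sensitivity and specificity with poor outcome as the positive class."""
    predicted_poor = dichotomize(predicted_scores, threshold)
    observed_poor = dichotomize(observed_mrs, threshold)
    pairs = list(zip(predicted_poor, observed_poor))
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fp = sum(1 for p, o in pairs if p and not o)
    return tp / (tp + fn), tn / (tn + fp)


def auc(predicted_scores, observed_poor):
    """Probability that a poor-outcome case is scored above a good-outcome case.

    Ties count as one half (Mann-Whitney formulation).
    """
    pos = [s for s, o in zip(predicted_scores, observed_poor) if o]
    neg = [s for s, o in zip(predicted_scores, observed_poor) if not o]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


predicted = [0.4, 1.9, 2.7, 3.8, 5.1, 2.2]  # continuous outputs (illustrative)
observed = [0, 1, 2, 4, 6, 3]               # observed mRS (illustrative)
sens, spec = sensitivity_specificity(predicted, observed)
area = auc(predicted, dichotomize(observed))
```

Because the AUC uses the continuous predictions directly, it does not depend on the 2.5 threshold, whereas sensitivity and specificity do.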

To assess the predictive performance of the fusion model relative to conventional clinical tools, comparisons were made with widely used clinical scoring systems, including the WFNS score, Hunt-Hess score, and mFS score, in the Test-combined set. Sensitivity and specificity were calculated using the optimal cut-off points determined by the Youden Index.
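The Youden Index selects the cut-off that maximizes J = sensitivity + specificity − 1. A minimal sketch for an ordinal clinical score is given below; the function name, the convention that scores at or above the cut-off predict a poor outcome, and the toy grades are all assumptions for illustration.

```python
def youden_cutoff(scores, poor_outcome):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is predicted as poor when its score is >= cutoff. Ties in J are
    resolved in favor of the lowest cutoff.
    """
    n_pos = sum(poor_outcome)
    n_neg = len(poor_outcome) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, poor_outcome) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, poor_outcome) if not y and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


grades = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]  # hypothetical admission grades
poor = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]    # 1 = poor outcome (illustrative)
cutoff, sens, spec = youden_cutoff(grades, poor)
```

In this toy example the index peaks at a cut-off of 4, where sensitivity is 2/3 and specificity is 1.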

Subgroup analysis of in-hospital complications

Patients were stratified into DCI versus non-DCI groups and CH versus non-CH groups based on the presence of complications. Model performance was evaluated by comparing MAE and AUC within each complication subgroup (DCI vs. non-DCI and CH vs. non-CH) in the Test-combined set. These metrics were derived from the predictions previously generated by the five models on the Test-combined set.

Model interpretation

Multiple interpretability techniques were employed to elucidate model behavior. Grad-CAM⁴⁵ was utilized for both preoperative and postoperative models to visualize salient regions in the NCCT images. The stacking imaging model was interpreted using SHAP⁴⁶ to quantify the contribution of individual model predictions. The clinical model was interpreted via LASSO coefficients, whereas permutation importance was applied to the fusion model to assess the influence of clinical features and stacking model outputs.
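Permutation importance measures how much a performance metric degrades when one input column is shuffled, which breaks its relationship with the outcome. The sketch below is a generic plain-Python illustration, assuming the increase in MAE as the score and a hypothetical `predict` callable; it is not the authors' code. scikit-learn offers an equivalent utility in `sklearn.inspection.permutation_importance`.

```python
import random


def mean_absolute_error(predictions, targets):
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MAE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mean_absolute_error([predict(row) for row in rows], targets)
    importances = []
    for column in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[column] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only the selected column replaced.
            permuted = [row[:column] + (value,) + row[column + 1:]
                        for row, value in zip(rows, shuffled)]
            error = mean_absolute_error([predict(row) for row in permuted], targets)
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model that relies only on the first input (e.g. an imaging prediction)
# and ignores the second (a hypothetical clinical feature).
def toy_predict(row):
    return row[0]


rows = [(0.5, 61.0), (1.2, 47.0), (2.8, 70.0), (3.9, 55.0), (5.1, 66.0), (5.8, 39.0)]
targets = [0.5, 1.2, 2.8, 3.9, 5.1, 5.8]
importances = permutation_importance(toy_predict, rows, targets)
```

The ignored column receives an importance of exactly zero, while shuffling the column the toy model depends on increases the error.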

Statistical analysis

Model development was based on data from WN, and evaluation was performed using an external cohort comprising AH, FY, and TL. Performance metrics included MAE, AUC, sensitivity, and specificity, with MAE and AUC designated as the primary metrics for the regression and classification tasks, respectively. Sensitivity and specificity were computed using a threshold of 2.5. Statistical comparisons between models were conducted using paired t-tests and the DeLong test, as appropriate. Baseline characteristics were compared using the Mann-Whitney U test or the chi-square test, as appropriate. A two-sided p-value of <0.05 was considered statistically significant. All statistical analyses were performed using Python version 3.12.
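A paired t-test compares two models on the same patients through the per-patient differences. The sketch below computes the test statistic and degrees of freedom in plain Python, assuming that the paired quantity is the per-patient absolute error (the text does not specify this); the error values are illustrative only. A two-sided p-value would then be read from a t distribution, for example with `scipy.stats.ttest_rel`. The DeLong test for AUC comparison is not shown.

```python
import math


def paired_t_statistic(values_a, values_b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need two paired samples of equal length (at least 2)")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the paired differences.
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean_diff / standard_error, n - 1


# Hypothetical per-patient absolute errors of two models on the same patients.
errors_clinical = [0.6, 0.7, 0.8, 1.0, 0.9]
errors_fusion = [0.2, 0.5, 0.1, 0.9, 0.4]
t_value, dof = paired_t_statistic(errors_clinical, errors_fusion)
```

A positive statistic in this toy example indicates larger errors for the first model than for the second on the same patients.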

Data availability

The imaging data used in this study are protected and not publicly available due to personal information protection, patient privacy regulations, and institutional data-sharing policies. However, the data can be made available to qualified researchers for academic purposes upon reasonable request. Interested parties may contact the corresponding author, who will review each request for scientific merit and compliance with data governance requirements. Approved requests will be processed within 14 working days.

Code availability

The source code for model development, trained model weights, and full reproduction scripts have been made publicly available at https://github.com/Seaburg97/aSAH.

References

Robba, C. et al. Contemporary management of aneurysmal subarachnoid haemorrhage. Intensive Care Med. 50, 646–664 (2024).
Veldeman, M. et al. Delayed cerebral infarction after aneurysmal subarachnoid hemorrhage. Neurology 103, e209607 (2024).
Fang, Y. et al. Cerebrospinal fluid markers of neuroinflammation and coagulation in severe cerebral edema and chronic hydrocephalus after subarachnoid hemorrhage. J. Neuroinflamm. 21, 237 (2024).
Connolly, E. S. Jr. et al. Guidelines for the management of aneurysmal subarachnoid hemorrhage. Stroke 43, 1711–1737 (2012).
Kasner, S. E. et al. Clinical interpretation and use of stroke scales. Lancet Neurol. 5, 603–612 (2006).
Sagues, E. et al. Outcomes measures in subarachnoid hemorrhage research. Transl. Stroke Res. 10, 1284–1293 (2024).
van Donkelaar, C. E. et al. Prediction of outcome after aneurysmal subarachnoid hemorrhage. Stroke 50, 837–844 (2019).
Degen, L. A. et al. Interobserver variability of grading scales for aneurysmal subarachnoid hemorrhage. Stroke 42, 1546–1549 (2011).
Yin, P. et al. Machine learning using presentation CT perfusion imaging for predicting clinical outcomes in patients with aneurysmal subarachnoid hemorrhage. AJR Am. J. Roentgenol. 221, 817–835 (2023).
Rubbert, C. et al. Prediction of outcome after aneurysmal subarachnoid haemorrhage using data from patient admission. Eur. Radiol. 28, 4949–4958 (2018).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Neifert, S. N. et al. Aneurysmal subarachnoid hemorrhage: the last decade. Transl. Stroke Res. 12, 428–446 (2021).
Pease, M. et al. Outcome prediction in patients with severe traumatic brain injury using deep learning from head CT scans. Radiology 304, 385–394 (2022).
Liu, Y. et al. Prediction of ischemic stroke functional outcomes from acute-phase noncontrast CT and clinical information. Radiology 313, e240137 (2024).
Maldaner, N. et al. Development of a complication and treatment-aware prediction model for favorable functional outcome in aneurysmal subarachnoid hemorrhage based on machine learning. Neurosurgery 88, E150–E157 (2021).
Jaja, B. N. R. et al. Development and validation of outcome prediction models for aneurysmal subarachnoid haemorrhage: the SAHIT multinational cohort study. BMJ 360, j5745 (2018).
García-García, S. et al. Mortality prediction of patients with subarachnoid hemorrhage using a deep learning model based on an initial brain CT scan. Brain Sci. 14, 10 (2023).
Moriya, M. et al. Interpretable machine learning model for outcome prediction in patients with aneurysmatic subarachnoid hemorrhage. Crit. Care 29, 36 (2025).
Zhou, Z. et al. Pre and post-operative online prediction of outcome in patients undergoing endovascular coiling after aneurysmal subarachnoid hemorrhage: visual and dynamic nomograms. Brain Sci. 13, 1185 (2023).
Huenges Wajer, I. M. et al. CT perfusion on admission and cognitive functioning 3 months after aneurysmal subarachnoid haemorrhage. J. Neurol. 262, 623–628 (2015).
Gaastra, B. et al. CRP in outcome prediction after subarachnoid hemorrhage and the role of machine learning. Stroke 52, 3276–3285 (2021).
Feghali, J. et al. External validation of a neural network model in aneurysmal subarachnoid hemorrhage: a comparison with conventional logistic regression models. Neurosurgery 90, 552–561 (2022).
Russin, J. J. et al. Permeability imaging as a predictor of delayed cerebral ischemia after aneurysmal subarachnoid hemorrhage. J. Cereb. Blood Flow. Metab. 38, 973–979 (2018).
Zhang, C. et al. Early and delayed blood-brain barrier permeability predicts delayed cerebral ischemia and outcomes following aneurysmal subarachnoid hemorrhage. Eur. Radiol. 34, 5287–5296 (2024).
Huang, T. et al. Can the radiomics features of intracranial aneurysms predict the prognosis of aneurysmal subarachnoid hemorrhage?. Front. Neurosci. 18, 1446784 (2024).
Sener, S. et al. Diffusion tensor imaging: A possible biomarker in severe traumatic brain injury and aneurysmal subarachnoid hemorrhage?. Neurosurgery 79, 786–793 (2016).
Thilak, S. et al. Diagnosis and management of subarachnoid haemorrhage. Nat. Commun. 15, 1850 (2024).
Hofmann, B. B. et al. Revisiting the WFNS score: native computed tomography imaging improves identification of patients with “false poor grade” aneurysmal subarachnoid hemorrhage. Neurosurgery 94, 515–523 (2024).
Goldberg, J. et al. Survival and outcome after poor-grade aneurysmal subarachnoid hemorrhage in elderly patients. Stroke 49, 2883–2889 (2018).
Shan, D. et al. Non-contrasted CT radiomics for SAH prognosis prediction. Bioengineering 10, 967 (2023).
Labib, H. et al. Sodium and its impact on outcome after aneurysmal subarachnoid hemorrhage in patients with and without delayed cerebral ischemia. Crit. Care Med. 52, 752–763 (2024).
Macdonald, R. L. Delayed neurological deterioration after subarachnoid haemorrhage. Nat. Rev. Neurol. 10, 44–58 (2014).
Adams, H. et al. Risk of shunting after aneurysmal subarachnoid hemorrhage: a collaborative study and initiation of a consortium. Stroke 47, 2488–2496 (2016).
Gibson, E. et al. Artificial intelligence with statistical confidence scores for detection of acute or subacute hemorrhage on noncontrast CT head scans. Radiol. Artif. Intell. 4, e210115 (2022).
Liu, Y. et al. Functional outcome prediction in acute ischemic stroke using a fused imaging and clinical deep learning model. Stroke 54, 2316–2327 (2023).
Takahashi, S. et al. Comparison of vision transformers and convolutional neural networks in medical image analysis: a systematic review. J. Med Syst. 48, 84 (2024).
Yadav, S. S. & Jadhav, S. M. Deep convolutional neural network based medical image classification for disease diagnosis. J. Big Data 6, 113 (2019).
Jeevan, P. P. & Sethi, A. Which backbone to use: a resource-efficient domain specific comparison for computer vision. arXiv 2406.05612v1 (2024).
Jain, A. et al. A comparative study of CNN, ResNet, and vision transformers for multi-classification of chest diseases. arXiv 2406.00237v1 (2024).
Saver, J. L. et al. Standardized nomenclature for modified Rankin Scale global disability outcomes: consensus recommendations from stroke therapy academic industry roundtable XI. Stroke 52, 3054–3062 (2021).
Vergouwen, M. D. et al. Definition of delayed cerebral ischemia after aneurysmal subarachnoid hemorrhage as an outcome event in clinical trials and observational studies: proposal of a multidisciplinary research group. Stroke 41, 2391–2395 (2010).
Kuo, L. T. & Huang, A. P. The pathogenesis of hydrocephalus following aneurysmal subarachnoid hemorrhage. Int. J. Mol. Sci. 22, 5050 (2021).
Zhou, P. et al. Towards understanding convergence and generalization of AdamW. IEEE Trans. Pattern Anal. Mach. Intell. 46, 6486–6493 (2024).
Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58, 267–288 (1996).
Selvaraju, R. R. et al. Visual explanations from deep networks via gradient-based localization: Grad-CAM. Int. J. Comput. Vis. 128, 618–626 (2017).
Nohara, Y. et al. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput. Methods Prog. Biomed. 214, 106584 (2022).

Acknowledgements

This work was partly supported by the Anhui Provincial Scientific Research Program (2308085Y48; 2023AH040251; 202304295107020028) and National Science Foundation for Distinguished Young Scholars of the Higher Education Institutions of Anhui Province (2022AH020071). We would like to thank Dr. Jianying Li for reviewing the manuscript and providing valuable suggestions. We also thank Dr. Phoebe Chi from Liwen Bianji (Edanz) for editing an early draft of the manuscript.

Author information

These authors contributed equally: Pengzhan Yin, Jiaqi Wang.

Authors and Affiliations

Department of Radiology, the First Affiliated Hospital of Anhui Medical University; Research Center of Clinical Medical Imaging; Anhui Province Clinical Image Quality Control Center, Hefei, Anhui, China
Pengzhan Yin, Hongmin Shu, Jun Wang, Bin Liu, Yongqiang Yu & Xiaohu Li
School of Information, Wannan Medical College, Wuhu, Anhui, China
Jiaqi Wang
Department of Radiology, First Affiliated Hospital of Wannan Medical College, Wuhu, Anhui, China
Chao Zhang & Yunfeng Zhou
Department of Radiology, Tongling People’s Hospital, Tongling, Anhui, China
Yongxiang Tang
Department of Radiology, Fuyang People’s Hospital, Fuyang, Anhui, China
Xiankuo Hu

Contributions

Guarantors of integrity of entire study, Y.Z., X.L., P.Y., J.W.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, Y.Z., X.L., P.Y., J.W.; clinical studies, C.Z., Y.T., X.H., H.S., J.W., B.L.; experimental studies, P.Y., J.W., C.Z., Y.T., X.H., H.S.; statistical analysis, Y.Z., X.L., P.Y., J.W.; and manuscript editing, Y.Z., X.L., P.Y., J.W., J.L.

Corresponding authors

Correspondence to Yunfeng Zhou or Xiaohu Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yin, P., Wang, J., Zhang, C. et al. Prediction of functional outcomes in aneurysmal subarachnoid hemorrhage using pre-/postoperative noncontrast CT within 3 days of admission. npj Digit. Med. 8, 542 (2025). https://doi.org/10.1038/s41746-025-01953-z

Received: 19 May 2025
Accepted: 13 August 2025
Published: 24 August 2025
Version of record: 24 August 2025
DOI: https://doi.org/10.1038/s41746-025-01953-z
