High sensitivity in spontaneous intracranial hemorrhage detection from emergency head CT scans using ensemble-learning approach

Takala, Juuso; Peura, Heikki; Pirinen, Riku; Väätäinen, Katri; Terjajev, Sergei; Lin, Ziyuan; Raj, Rahul; Korja, Miikka

doi:10.1038/s41598-025-15835-7

Article
Open access
Published: 15 August 2025

Juuso Takala¹, Heikki Peura¹, Riku Pirinen¹, Katri Väätäinen², Sergei Terjajev², Ziyuan Lin³, Rahul Raj¹ & Miikka Korja¹

Scientific Reports volume 15, Article number: 29919 (2025)

Abstract

Spontaneous intracranial hemorrhages have a high disease burden. As the volume of medical imaging increases, new technological solutions for assisting in image interpretation are warranted. We developed a deep learning (DL) solution for spontaneous intracranial hemorrhage detection from head CT scans. The DL solution included four base convolutional neural networks (CNNs), which were trained using 300 head CT scans. A metamodel was trained on top of the four base CNNs, and simple post-processing steps were applied to improve the solution’s accuracy. The solution’s performance was evaluated using a retrospective dataset of consecutive emergency head CTs imaged in ten different emergency rooms. The validation dataset included 7797 head CT scans, of which 118 presented with spontaneous intracranial hemorrhage. The trained metamodel together with a simple rule-based post-processing step showed 89.8% sensitivity and 89.5% specificity for hemorrhage detection at the case-level. The solution detected all 78 spontaneous hemorrhage cases that were presumably or certainly imaged within 12 h from symptom onset and identified five hemorrhages missed in the initial on-call reports. Although the success of DL algorithms depends on multiple factors, including training data versatility and quality of annotations, using the proposed ensemble-learning approach and rule-based post-processing may help clinicians to develop highly accurate DL solutions for clinical imaging diagnostics.

Introduction

Spontaneous intracerebral (ICH) and subarachnoid hemorrhage (SAH) account for up to one-third of all strokes¹. Both ICH and SAH are associated with high morbidity and mortality rates^2,3, and intraventricular hemorrhage (IVH) also commonly occurs with either condition. Due to the high disease burden of ICH and SAH, prompt and accurate diagnosis is critical, along with quickly initiated therapeutic actions^4,5.

Since non-contrast head CT scans (NCCTs) have high sensitivity for detecting acute intracranial blood within 12 h from symptom onset, the diagnosis of acute intracranial hemorrhages is based on emergent NCCTs^6,7. The number of medical imaging studies is increasing globally^8,9,10. At the same time, there are increasing concerns regarding the fatigue of radiologists and its effect on diagnostic accuracy^11,12. Therefore, new technological solutions to assist clinicians and radiologists with rapid and accurate interpretation of imaging studies could alleviate this issue. According to a recent review article¹³, multiple algorithms have been developed for this task. However, the performance metrics of these solutions are often poorly reported, and suboptimal for clinical use.

Recently, we trained a model that identified SAH with high sensitivity, although the false positive rate was also high¹⁴. Overcoming this false positive problem through additional training data alone can require an exceptionally high number of images. Based on these premises, we aimed to develop a combination of multiple deep learning (DL) algorithms that would detect spontaneous intracranial hemorrhages, namely ICH, SAH and IVH, with a high sensitivity and a low false positive rate despite a limited amount of training data. Given the inherent diagnostic limitation of modern head CT scanners in identifying subacute blood (accuracy is highest in early imaging)^6,15, we aimed to train the solution to give optimal results when applied to cases with acute hemorrhages. To evaluate the solution’s performance, we gathered a comprehensive dataset of emergency NCCT scans from different emergency rooms, imaged using various scanner devices.

Materials and methods

The institutional review board of Helsinki University Hospital (HUH) approved the study and granted a waiver for acquiring informed consents (HUS/365/2017; HUS/163/2019; HUS/190/2021). According to the Finnish legislation, no ethics committee approval is needed for retrospective studies using registry or archive data. We conducted study reporting following the Standards for Reporting of Diagnostic Accuracy Study (STARD) and Checklist for Artificial Intelligence in Medical Imaging (CLAIM) guidelines^16,17.

Deep learning solution

Our solution included four base U-Nets¹⁸ and a metamodel that was trained on top of the base U-Nets. We trained one ICH U-Net, one IVH U-Net and two SAH U-Nets to detect specific types of spontaneous intracranial hemorrhages. The training and performance metrics of one of the SAH U-Nets have been described previously¹⁴. In short, this SAH model was developed with 120 NCCTs, of which 98 included SAH and 22 were negative for SAH. We trained the new ICH, SAH and IVH U-Nets with 180 NCCTs collected from the HUH picture archiving and communication systems. For training the three new U-Nets, we segmented only the bleeding type of interest present in the NCCTs. The new SAH U-Net was developed in order to improve the detection of focal SAHs.

The training dataset of the three new U-Nets consisted of 63 (ICH), 50 (IVH) and 67 (SAH) head NCCT multiplanar reconstruction (MPR) reformats (with 512 × 512 dimensions, 3 mm slice thickness). All patients were imaged and treated at HUH prior to October 2021. Segmentations were done using Philips IntelliSpace Discovery (Philips Healthcare, 3000 Minuteman Rd, Andover, MA) and 3D Slicer¹⁹. In the segmentations, Hounsfield unit (HU) based thresholds were used. Finally, the segmentation masks were saved in a binary format, with each hemorrhage type assigned a unique label.
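The HU-threshold idea can be illustrated with a minimal sketch. The function name, the label scheme and the 50 to 100 HU window below are assumptions made for illustration only; the source does not state the thresholds used, and in practice an annotator would restrict the threshold to a region of interest rather than the whole slice.

```python
import numpy as np

def threshold_mask(slice_hu, lower_hu, upper_hu, label):
    """Return an integer mask holding `label` where lower_hu <= HU <= upper_hu, else 0."""
    inside = (slice_hu >= lower_hu) & (slice_hu <= upper_hu)
    return np.where(inside, label, 0).astype(np.uint8)

# Hypothetical label scheme: one unique integer per hemorrhage type.
LABELS = {"ICH": 1, "IVH": 2, "SAH": 3}

# Toy 2 x 3 "slice" in Hounsfield units (not real patient data).
slice_hu = np.array([[10, 35, 60],
                     [75, 120, 55]], dtype=np.int16)

# The 50-100 HU window is an assumed example, not the study's setting.
mask = threshold_mask(slice_hu, lower_hu=50, upper_hu=100, label=LABELS["ICH"])
```

Storing one integer label per hemorrhage type keeps the three annotations separable even when they are saved in the same mask file.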

After training the base U-Net models, we trained the metamodel with 55 NCCTs randomly sampled from the 180 previously mentioned NCCTs. For these 55 NCCTs, we segmented all three bleeding types. All NCCTs used for model training were imaged prior to October 2021, and therefore did not overlap with our validation dataset.

During the inference phase, the NCCT slice is first fed to the four base models, which produce output segmentations of ICH, IVH and SAH (one segmentation for ICH, one segmentation for IVH and two segmentations for SAH). The four resulting segmentations together with the original NCCT slice are then directed to the metamodel, which produces outputs as the final segmentations for ICH, IVH and SAH. The DL solution performs semantic segmentation (i.e. predicting pixel-wise probability for the presence of intracranial hemorrhage). Unlike conventional stacked generalization, our approach provides the metamodel with both the base model predictions and the original NCCT slice as input.
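The data flow described above can be sketched as follows. This is a schematic under stated assumptions, not the authors' implementation: the base models and the metamodel are stand-in callables, and the channel order (original slice first, then the four base predictions) is hypothetical.

```python
import numpy as np

def build_metamodel_input(ct_slice, base_probs):
    """Stack the CT slice and the base-model probability maps into a (C, H, W) array."""
    channels = [ct_slice] + list(base_probs)
    return np.stack(channels, axis=0).astype(np.float32)

def run_ensemble(ct_slice, base_models, metamodel):
    """Run every base model on the slice, then feed slice + predictions to the metamodel."""
    base_probs = [model(ct_slice) for model in base_models]
    return metamodel(build_metamodel_input(ct_slice, base_probs))

# Stand-ins for the trained networks (assumptions for illustration only).
rng = np.random.default_rng(0)
ct_slice = rng.random((512, 512)).astype(np.float32)
base_models = [lambda x, k=k: np.clip(x * k, 0.0, 1.0) for k in (0.5, 0.8, 1.0, 1.2)]

def metamodel(stacked):
    # Placeholder: average the four base maps and emit three output channels
    # (one each for ICH, IVH and SAH).
    return np.repeat(stacked[1:].mean(axis=0, keepdims=True), 3, axis=0)

final = run_ensemble(ct_slice, base_models, metamodel)  # shape (3, 512, 512)
```

Passing the original slice alongside the base predictions is what distinguishes this arrangement from conventional stacking, where the metamodel would see only the base outputs.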

After the base U-Net and metamodel development, the solution was implemented with a post-processing pipeline to improve its detection accuracy. First, segmentation clusters smaller than 10 pixels were removed. This was followed by a soft-voting step, in which the segmentations from the base models were summed, averaged and compared against the segmentation mask of the metamodel. If the summed segmentation did not overlap with the metamodel segmentation, the segmentation was directed to a test-time augmentation (TTA) phase. Otherwise, the TTA phase was not applied. In the final step of the post-processing, the positive segmentation clusters were combined with the segmentations of the base U-Nets. If the combined cluster size exceeded 125 pixels, the segmentation was classified as a positive prediction.
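A rough sketch of these rules is given below, using `scipy.ndimage.label` for connected components. It is one possible reading of the description, not the published pipeline (which is detailed in the supplement and the repository): the 0.5 voting threshold, the default 4-connectivity and the way clusters are combined are all assumptions.

```python
import numpy as np
from scipy import ndimage

def remove_small_clusters(mask, min_pixels=10):
    """Drop connected components with fewer than `min_pixels` pixels."""
    labeled, n = ndimage.label(mask)
    if n == 0:
        return np.zeros_like(mask, dtype=bool)
    sizes = np.bincount(labeled.ravel())
    keep = sizes >= min_pixels
    keep[0] = False  # label 0 is background
    return keep[labeled]

def soft_vote(base_probs, threshold=0.5):
    """Average the base probability maps; the 0.5 cut-off is an assumption."""
    return np.mean(base_probs, axis=0) >= threshold

def needs_tta(meta_mask, vote_mask):
    """True when the voted mask does not overlap the metamodel mask at all."""
    return not np.any(meta_mask & vote_mask)

def is_positive(meta_mask, base_masks, min_pixels=125):
    """Merge metamodel and base masks; positive if a merged cluster that
    contains metamodel pixels is larger than `min_pixels`."""
    combined = meta_mask | np.any(base_masks, axis=0)
    labeled, n = ndimage.label(combined)
    for idx in range(1, n + 1):
        cluster = labeled == idx
        if np.any(cluster & meta_mask) and cluster.sum() > min_pixels:
            return True
    return False

# Toy masks: a 100-pixel metamodel cluster plus a 1-pixel speck.
meta = np.zeros((64, 64), dtype=bool)
meta[10:20, 10:20] = True
meta[40, 40] = True
cleaned = remove_small_clusters(meta)   # the speck is removed

# A 50-pixel base-model cluster directly adjacent to the metamodel cluster.
base = np.zeros((64, 64), dtype=bool)
base[10:20, 20:25] = True
positive = is_positive(cleaned, [base])  # merged cluster has 150 pixels
```

In this toy case the metamodel cluster alone (100 pixels) would stay below the 125-pixel limit, and only the merge with the adjacent base-model cluster pushes it over.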

During the inference phase, we measured the total processing times for the NCCTs, which included reading the DICOM files, preprocessing the imaging data, predicting the presence of a hemorrhage, post-processing step, and saving the predictions in a NIfTI file format.

A detailed description of the DL solution training and the post-processing pipeline is provided in the article supplementary material (Supplementary Materials and Methods, Supplement 1). The code for developing the solution and performing the inference tasks is available on GitHub (https://github.com/Juusotak/Intracranial_hemorrhage_metalearning).

Validation dataset

To simulate a real-world emergency imaging setting, we collected a retrospective dataset from 10 different hospitals in the HUH catchment area, which covers over 1,700,000 inhabitants. The validation dataset consisted of all consecutive emergency head NCCTs imaged between October 1st and December 31st 2021. In more detail, if an NCCT scan of adult patients (18 years or older) was performed either with emergent or immediate priority, it was included in the dataset. We also collected corresponding on-call radiology reports, patient reports of the emergency clinic visit, and ambulance reports, when available. In addition to these, we gathered information about patient demographics (age and sex).

The time of symptom onset and the etiology of hemorrhages were evaluated based on patient and ambulance reports. The on-call radiology reports were considered the ground truth for the presence of intracranial hemorrhage in NCCTs. If the clinical reports stated that the hemorrhage occurred without a history of head trauma, the hemorrhage was classified as a spontaneous intracranial hemorrhage. If the reports mentioned a previous head trauma, or if the etiology could not be confirmed based on available information, the hemorrhage was classified as either traumatic or of unclear etiology, and the case was excluded from the validation dataset. Primary spontaneous intracranial hemorrhages included non-traumatic aneurysmal and non-aneurysmal SAHs, non-traumatic deep and lobar ICHs, and non-traumatic IVHs. Secondary spontaneous intracranial hemorrhages included SAHs, ICHs or IVHs related to ischemic strokes and tumors, and were excluded from the analysis. The dataset also included acute subdural hemorrhages (ASDHs) and epidural hemorrhages (EDHs) if the medical history did not include preceding head trauma.

The head NCCT scans in which the on-call radiologist had reported hemorrhage were independently annotated at the slice-level for the presence of hemorrhage by two study authors (JT, a medical doctor with 3 years of experience in neuroimaging and deep learning research, and KV, a radiologist in training with 2.5 years of experience) using 3D Slicer¹⁹. The senior study author (MK, a consultant cerebrovascular neurosurgeon with 20 years of experience) resolved any conflicts between the annotators either by removing annotations from the slice or by accepting the slice annotations.

The final validation dataset included 118 NCCTs in which on-call radiologists reported spontaneous intracranial hemorrhages, and 7679 NCCTs that were reported negative (Fig. 1). Of the 7797 NCCTs, 4078 represented women (median age 74 years old, IQR 29 years) and 3719 men (median age 67 years old, IQR 27 years) (Table 1). The median age of all imaged patients was 71 years (IQR 29 years) (Table 1). The 7797 NCCTs were imaged using 12 different CT scanners from four vendors (Table 1).

Table 1 Validation dataset patient demographics and CT scanner devices used in imaging. IQR = interquartile range.

Of the 118 NCCTs in which on-call radiologists had reported hemorrhage, 59 were imaged within 12 h, 19 presumably within 12 h and 17 presumably/certainly within 12 to 24 h from the symptom onset (Table 2). Of the 59 early imaged (within 12 h) spontaneous intracranial hemorrhages, 32 contained only one hemorrhage type, 22 two hemorrhage types and 6 three hemorrhage types. Overall, 49 NCCTs included only one hemorrhage type, 38 included two hemorrhage types, and 10 images included three hemorrhage types (Table 2). The dataset also included one ASDH. The patient presented without a history of head trauma.

Table 2 Hemorrhage types in validation dataset. Counts of different bleed types and combinations of bleed types in dataset. ICH = intracerebral hemorrhage, IVH = intraventricular hemorrhage, SAH = subarachnoid hemorrhage, ASDH = acute subdural hemorrhage.

Statistical analyses

The case- and slice-level metrics were calculated using the NumPy (version 1.24.0) and SciPy (version 1.9.3) Python packages. The reported metrics included sensitivity, specificity, false positive rate, negative predictive value, positive predictive value, accuracy, and 95% confidence intervals (CI) for these metrics. We used the same Python packages for calculating the validation dataset demographics and interquartile ranges (IQR).
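As a hedged illustration of how such metrics can be derived from a confusion matrix, the sketch below uses a normal-approximation (Wald) interval. The source does not name the interval method; this choice is an assumption, made because it reproduces the reported case-level sensitivity interval. The count of 106 detected cases out of 118 is inferred from the reported 89.8% sensitivity and is not stated directly in the text.

```python
import math

def wald_ci(successes, total, z=1.959964):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)

def diagnostic_metrics(tp, tn, fp, fn):
    """Standard diagnostic accuracy metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# 106 of 118 is inferred from the reported 89.8% case-level sensitivity.
low, high = wald_ci(106, 118)
print(f"sensitivity 95% CI: {low:.1%} to {high:.1%}")
```

For proportions close to 0% or 100%, the Wald interval is known to behave poorly, so an exact or Wilson interval may be preferable in those cases.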

Results

Technical performance

Figure 2 depicts the range of times taken to analyze all slices in one axial 3 mm head NCCT scan (median number of slices 53). The median time for analyzing all slices in a single scan was 6.7 s (range from 4.2 to 33.1 s). An example of the DL solution output is presented in Fig. 3.

Performance of metamodel and base U-Nets

Table 3 presents the case-level performance metrics for the four base U-Nets and the metamodel. The sensitivity and specificity of the metamodel were 92.4% (95% CI, 87.6–97.2%) and 53.8% (95% CI, 52.7–54.9%), respectively. The case-level false positive rate of the metamodel was 46.2% (95% CI, 45.1–47.3%). Both SAH1 and SAH2 base U-Nets had 100% sensitivity for hemorrhage detection at the case-level.

Table 3 Case-level performance of the base U-Nets and metamodel for all scans in validation dataset. The SAH1 base U-Net is the model trained during this study. SAH2 is the model whose development and validation were previously described by Thanellas et al.¹⁴. The best achieved metrics are bolded. TP = true positive, TN = true negative, FP = false positive, FN = false negative, FPR = false positive rate, NPV = negative predictive value, PPV = positive predictive value, CI = confidence interval.

Supplementary Table 1 (Supplement 1) presents slice-level performance metrics for the base U-Nets and metamodel. The sensitivity and specificity of the metamodel were 71.7% (95% CI, 69.7–73.7%) and 98.2% (95% CI, 98.1–98.2%), respectively. The slice-level false positive rate of the metamodel was 1.8% (95% CI, 1.8–1.9%). At the slice-level, the ICH, IVH, SAH1 and SAH2 base U-Nets had false positive rates of 4.3% (95% CI, 4.3–4.4%), 4.3% (95% CI, 4.3–4.4%), 9.9% (95% CI, 9.8–10.0%) and 10.1% (95% CI, 10.0–10.2%), respectively.

Metamodel performance with full post-processing

Table 4 presents the case-level performance metrics of the metamodel with full post-processing. The case-level sensitivity and specificity of the metamodel were 89.8% (95% CI, 84.4–95.3%) and 89.5% (95% CI, 88.8–90.2%), respectively. The case-level false positive rate for the metamodel was 10.5% (95% CI, 9.8–11.2%).

Table 4 Case-level performance metrics of the metamodel with full post-processing. The performance metrics for scans with hemorrhage are also stratified according to delay from symptom onset to imaging. TP = true positive, TN = true negative, FP = false positive, FN = false negative, FPR = false positive rate, NPV = negative predictive value, PPV = positive predictive value, N/A = not applicable, CI = confidence interval.

With full post-processing, the metamodel showed 69.3% (95% CI, 67.3–71.3%) slice-level sensitivity and 99.6% (95% CI, 99.6–99.6%) slice-level specificity. The slice-level false positive rate of the metamodel was 0.4% (95% CI, 0.4–0.4%). In other words, in the 7679 NCCTs reported negative for hemorrhage by the on-call radiologists, the solution falsely predicted 1594 of the 408 426 slices as positive. The slice-level performance of the metamodel with full post-processing is presented in Supplementary Table 2 (Supplement 1). Most of the false positive pixel clusters pointed to blood in normal and highly vascularized anatomical structures, e.g. the superior sagittal sinus, cerebellar tentorium, straight sinus, and falx cerebri (Supplementary Figs. 1 and 2, Supplement 1). Detailed summaries of case- and slice-level performance metrics for each component of the DL solution are provided in Supplementary Figs. 3 and 4 (Supplement 1).

Metamodel performance by time of symptom onset

The metamodel with full post-processing detected hemorrhages in 59 out of 59 patients (sensitivity 100.0%) who were imaged within 12 h from symptom onset (Table 4). For the 19 patients who were presumably, but not certainly, imaged within 12 h from symptom onset, the solution’s sensitivity was also 100% (Table 4). An additional 18 patients were imaged 12 to 24 h from symptom onset. The sensitivity of the metamodel with full post-processing in this group was 77.8% (95% CI, 56.3–96.6%) (Table 4). The four missed cases included two small ICHs and two small focal SAHs (Supplementary Figs. 5–8, Supplement 1). For patients imaged between 24 h and 7 days, the sensitivity of the metamodel with full post-processing was 71.4% (95% CI, 47.8–95.1%). The missed hemorrhage cases included one ICH and three focal SAHs (Supplementary Figs. 9–12, Supplement 1). The sensitivity for hemorrhages that were imaged more than 7 days after symptom onset or had an unclear symptom onset time was 55.6% (95% CI, 23.1–88.0%). The missed hemorrhage cases included three ICHs and one focal SAH (Supplementary Figs. 13–16, Supplement 1). The sensitivity of the metamodel with full post-processing for scans at different time points from symptom onset is presented in Fig. 4.

Cases missed in on-call reports but identified by solution

The metamodel with full post-processing identified five spontaneous intracranial hemorrhages that were not reported in the initial on-call reports. Of these five cases, one was ICH, two were SAHs, and one was IVH. All identified hemorrhages are presented in Supplementary Figs. 17–21 (Supplement 1).

Discussion

The developed DL solution detected spontaneous intracranial hemorrhages with 100% sensitivity on head CT scans imaged within 12 h from symptom onset, while maintaining a low processing time per scan. With full post-processing, the solution falsely detected an intracranial hemorrhage in approximately 1 out of every 10 true negative head NCCTs. Most of these false positive findings were present only in a few slices and in the same anatomical locations. Since the DL solution produces an easily visualized segmentation map, these small false positive pixel clusters could be relatively easily assessed as false positive findings by on-call radiologists or clinicians. Structures containing high vascularity or low-pressure (slow-flowing) blood, such as the choroid plexuses and intracranial sinuses, make achieving a zero false positive rate a challenging task if high sensitivity is a priority.

One of the DL solution’s base models has been validated previously in both internal and external settings¹⁴. Although the model had high sensitivity for detecting SAH, there was a relatively high number of false positive findings. Due to the high false positive rate, the model’s standalone clinical usability was considered suboptimal. Therefore, we developed this new solution that combines the outputs of high-sensitivity but low-specificity base U-Net models using a metamodel and post-processing steps. The lowest case-level false positive rate among the base U-Nets was 61.3%, which was reduced to 10.5% by applying the trained metamodel and rule-based post-processing steps. This translates to a relative reduction of over 80% in the case-level false positive rate.
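The "over 80%" figure follows from the two reported false positive rates, assuming the comparison is meant as a relative reduction from the best base U-Net (61.3%) to the full solution (10.5%):

```latex
\[
\frac{61.3\% - 10.5\%}{61.3\%} = \frac{50.8}{61.3} \approx 0.829
\]
```

That is, roughly 83% of the false positive cases produced by the best base U-Net are removed by the metamodel and post-processing steps.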

The training material used in this study included only 300 head NCCTs. The training data was segmented using an HU threshold-based method. In comparison to the consensus-based approaches often used in image annotation, our approach requires fewer images and fewer human resources. The development of deep learning models commonly relies on large amounts of training data, leading to a large amount of work and costs related to data preprocessing and model training. Thus, the study also suggests that the proposed segmentation method, ensemble-learning approach and rule-based post-processing pipeline can be used for training and optimizing clinically promising deep learning systems with limited amounts of training data. Additionally, the proposed approach enabled us to combine existing deep learning models into a unified framework, which in turn can reduce the total development time and the need for extensive retraining with task-specific data.

Our study also has limitations. First, the final number of true hemorrhage cases in the validation dataset can be considered low. However, the dataset represents a true consecutive patient cohort imaged during a 3-month period in 10 different hospitals with a catchment area of around 1.7 million inhabitants. Second, the validation dataset was collected from 10 different hospitals, but the validation setting was still internal. Therefore, the solution’s results cannot be generalized outside the study country. Third, we did not have the possibility to assess our solution’s usefulness in the clinical workflow, as the solution is not an officially approved medical device. Fourth, using on-call radiologist reports as the ground truth might be considered a shortcoming, as it is recognized that some degree of error is inherent in the interpretation of medical imaging studies^20,21. However, since our aim was to assess whether the model could improve real-world diagnostics, using the reports as ground truth was necessary. By using this approach, we were able to evaluate whether the system could identify hemorrhage cases that had been missed in a real-life situation. Fifth, since the sensitivity diminishes with the delay from symptom onset to imaging, the model is best suited for acute cases (imaged within 12 h). The training data was segmented using thresholds based on HU values, which represented acute blood in ICHs, IVHs, and SAHs. This segmentation method is likely to affect the DL solution’s performance when used for detecting subacute and chronic hemorrhages. This may be considered a shortcoming, even though the DL solution was designed for emergency use and for acute cases only.

The sensitivity of acute intracranial hemorrhage detection of our solution is essentially similar to the reported performance metrics of commercially available and clinically used solutions^22,23,24,25. The performance of the proposed DL solution is also comparable to other algorithms presented in the literature^26,27. However, reliable comparisons between different solutions are difficult to conduct due to the lack of standardized comparison protocols and datasets.

Although emergency head NCCTs interpreted by radiologists have a high sensitivity for acute blood^6,7, misidentification of acute intracranial hemorrhage still occurs in clinical settings. Our solution successfully detected five cases of acute spontaneous intracranial hemorrhage that were not reported in on-call reports, highlighting the potential impact the solution could have in reducing misdiagnoses of intracranial hemorrhages in clinical practice. Due to the relatively low computational costs of the 2-dimensional U-Net architecture, the solution could also be run in a local setup without requiring costly high-end graphics processing units or central processing units.

In conclusion, the developed DL solution with the post-processing pipeline had high sensitivity for detecting spontaneous intracranial hemorrhages in the acute period. This kind of solution could help to rule out spontaneous intracranial hemorrhages in a clinical setting. Compared to similar applications, to our knowledge, the presented solution is the first primarily intended to exclude acute spontaneous hemorrhages. This way, the solution does not burden or interfere with diagnostics in most cases. Even though the solution is not an officially approved medical device, it could be evaluated for quality assessments and research purposes.

Data availability

Finnish healthcare data for secondary use can be obtained through FINDATA (Social and Health Data Permit Authority according to the Secondary Data Act). The healthcare data used in this study cannot be shared openly. The code for training the deep learning models and running the inference tasks is available at: https://github.com/Juusotak/Intracranial_hemorrhage_metalearning.

References

1. Krishnamurthi, R. et al. (ed, V.) Global, regional and Country-Specific burden of ischaemic stroke, intracerebral haemorrhage and subarachnoid haemorrhage: A systematic analysis of the global burden of disease study 2017. Neuroepidemiology 54 171–179 (2020).
2. van Asch, J. Incidence, case fatality, and functional outcome of intracerebral haemorrhage over time, according to age, sex, and ethnic origin: a systematic review and meta-analysis. Lancet Neurol. 167 https://doi.org/10.1016/S1474 (2010).
3. Roquer, J. et al. Short- and long-term outcome of patients with aneurysmal subarachnoid hemorrhage. Neurology 95, e1819–e1829 (2020).
4. Hostettler, I. C., Seiffge, D. J. & Werring, D. J. Intracerebral hemorrhage: an update on diagnosis and treatment. Expert Rev. Neurother. 19, 679–694 (2019).
5. Steiner, T. et al. European stroke organization guidelines for the management of intracranial aneurysms and subarachnoid haemorrhage. Cerebrovasc. Dis. 35, 93–112 (2013).
6. Perry, J. J. et al. Sensitivity of computed tomography performed within six hours of onset of headache for diagnosis of subarachnoid haemorrhage: prospective cohort study. BMJ 343 (2011).
7. Vincent, A. et al. Sensitivity of modern multislice CT for subarachnoid haemorrhage at incremental timepoints after headache onset: a 10-year analysis. Emerg. Med. J. Emermed. https://doi.org/10.1136/emermed-2020-211068 (2021).
8. Huang, C. C. et al. Utilization of CT and MRI scanning in Taiwan, 2000–2017. Insights Imaging 14, (2023).
9. Christiansen, N. M. et al. Utilisation and time to performance of diagnostic imaging in patients admitted to Danish emergency departments: a nationwide register-based study from 2007 to 2017. BMJ Open 13, (2023).
10. Smith-Bindman, R. et al. Trends in use of medical imaging in US health care systems and in Ontario, Canada, 2000–2016. JAMA 322, 843–856 (2019).
11. Hanna, T. N. et al. The effects of fatigue from overnight shifts on radiology search patterns and diagnostic performance. J. Am. Coll. Radiol. 15, 1709–1716 (2018).
12. Stec, N., Arje, D., Moody, A. R., Krupinski, E. A. & Tyrrell, P. N. A systematic review of fatigue in radiology: is it a problem? Am. J. Roentgenol. 210, 799–806 (2018).
13. Mäenpää, S. M. & Korja, M. Diagnostic test accuracy of externally validated convolutional neural network (CNN) artificial intelligence (AI) models for emergency head CT scans – A systematic review. Int. J. Med. Inform. https://doi.org/10.1016/j.ijmedinf.2024.105523 (2024).
14. Thanellas, A. et al. Development and external validation of a deep learning algorithm to identify and localize subarachnoid hemorrhage on CT scans. Neurology https://doi.org/10.1212/WNL.0000000000201710 (2023).
15. Dubosh, N. M., Bellolio, M. F., Rabinstein, A. A. & Edlow, J. A. Sensitivity of early brain computed tomography to exclude aneurysmal subarachnoid hemorrhage: A systematic review and Meta-Analysis. Stroke 47, 750–755 (2016).
16. Mongan, J., Moy, L. & Kahn, C. E. Checklist for artificial intelligence in medical imaging (CLAIM): A guide for authors and reviewers. Radiol. Artif. Intell. 2, e200029 (2020).
17. Bossuyt, P. M. et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ 351 (2015).
18. Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (eds Navab, N. et al.) 234–241 (Springer International Publishing, 2015).
19. Fedorov, A. et al. 3D Slicer as an image computing platform for the quantitative imaging network. Magn. Reson. Imaging. 30, 1323–1341 (2012).
20. Strub, W. M., Leach, J. L., Tomsick, T. & Vagal, A. Overnight preliminary head CT interpretations provided by residents: locations of misidentified intracranial hemorrhage. Am. J. Neuroradiol. 28, 1679–1682 (2007).
21. Erly, W. K., Berger, W. G., Krupinski, E., Seeger, J. F. & Guisto, J. A. Radiology resident evaluation of head CT scan orders in the emergency department. AJNR Am. J. Neuroradiol. 23, 103 (2002).
22. Ginat, D. T. Analysis of head CT scans flagged by deep learning software for acute intracranial hemorrhage. Neuroradiology 62, 335–340 (2020).
23. Matsoukas, S. et al. Pilot deployment of Viz-Intracranial hemorrhage for intracranial hemorrhage detection: Real-World performance in a stroke code cohort. Stroke 53, E418–E419 (2022).
24. Heit, J. J. et al. Automated cerebral hemorrhage detection using RAPID. AJNR Am. J. Neuroradiol. 42, 273–278 (2021).
25. Voter, A. F., Meram, E., Garrett, J. W. & Yu, J. P. J. Diagnostic accuracy and failure mode analysis of a deep learning algorithm for the detection of intracranial hemorrhage. J. Am. Coll. Radiol. 18, 1143 (2021).
26. Maghami, M. et al. Diagnostic test accuracy of machine learning algorithms for the detection intracranial hemorrhage: a systematic review and meta-analysis study. Biomed. Eng. Online. 22, 1–23 (2023).
27. Hu, P. et al. Deep learning-assisted detection and segmentation of intracranial hemorrhage in Noncontrast computed tomography scans of acute stroke patients: a systematic review and meta-analysis. Int. J. Surg. 110, 3839–3847 (2024).

Acknowledgements

This work is part of the AI Head Analysis project of the CleverHealth Network ecosystem (https://www.cleverhealth.fi/en/home), and we thank the ecosystem partners for supporting the project. JT received a research grant from Maire Taposen foundation. The study project was supported by a grant from the State Research Funds (Helsinki University Hospital).

Author information

Authors and Affiliations

Department of Neurosurgery, University of Helsinki and Helsinki University Hospital, P.O. Box 266, Helsinki, 00029, Finland
Juuso Takala, Heikki Peura, Riku Pirinen, Rahul Raj & Miikka Korja
Diagnostic Center, Helsinki University Hospital, P.O. Box 266, Helsinki, 00029, Finland
Katri Väätäinen & Sergei Terjajev
Planmeca, Asentajankatu 6, Helsinki, 00880, Finland
Ziyuan Lin

Contributions

J.T., H.P. and Z.L. developed the deep learning solution (including base model and meta-model training, and development of the post-processing pipeline). H.P., S.T., R.R. and M.K. contributed to the acquisition and curation of the training data. J.T., R.P., K.V. and M.K. contributed to the acquisition and curation of the validation dataset. J.T. and M.K. analyzed the results and wrote the main manuscript text. J.T. prepared all figures and tables. M.K. and R.R. contributed to the administration of the research project. All authors reviewed and edited the manuscript as needed and approved its final version.

Corresponding author

Correspondence to Juuso Takala.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Takala, J., Peura, H., Pirinen, R. et al. High sensitivity in spontaneous intracranial hemorrhage detection from emergency head CT scans using ensemble-learning approach. Sci Rep 15, 29919 (2025). https://doi.org/10.1038/s41598-025-15835-7

Received: 18 January 2025
Accepted: 11 August 2025
Published: 15 August 2025
DOI: https://doi.org/10.1038/s41598-025-15835-7