Introduction

Stroke is a global health threat. The 2019 Global Burden of Diseases Study (GBD)1 reported that stroke was the second-leading cause of death and the third-leading cause of death and disability combined worldwide. In 2019, there were 12.2 million new strokes and 6.55 million stroke-related deaths, and the global stroke burden increased substantially from 1990 to 2019. Acute ischaemic stroke (AIS) is defined as sudden neurological dysfunction caused by focal cerebral ischaemia lasting more than 24 h, or with evidence of acute infarction on brain imaging regardless of symptom duration2. AIS is the most common type of stroke, accounting for approximately 80% of cases, and is caused by interruption of cerebral blood flow due to arterial occlusion3,4. Numerous clinical trials have demonstrated significant clinical benefits of endovascular therapy in AIS patients within 6–24 h of stroke onset5,6,7,8,9. Early identification of AIS is therefore critical for guiding treatment and improving patient outcomes, as ‘time is brain.’

Noncontrast computed tomography (NCCT) is the first-line imaging modality for evaluating patients with suspected acute stroke. Unfortunately, although NCCT performs well in diagnosing acute intracranial haemorrhage, it appears to be insufficiently sensitive to ischaemic stroke, especially AIS3,10,11,12. On NCCT, AIS often appears normal or shows only nonspecific low-density changes, which complicates radiological diagnosis. Magnetic resonance imaging (MRI), particularly diffusion-weighted imaging (DWI), offers major advantages for the early detection of AIS. Nevertheless, MRI is less readily available, more expensive, harder for patients to tolerate, and more time-consuming, which poses a great challenge for most emergency centres. For this high-mortality disease that requires time-sensitive treatment, a method that can precisely detect early AIS lesion changes on NCCT is needed.

In general, quantitatively assessing the brain regions involved in AIS on NCCT is difficult because the variations in density and texture are too subtle to be discerned visually. Radiomics converts medical images into quantitative data and is a promising approach for developing image-driven biomarkers to aid clinical decisions13. Machine learning (ML), a branch of artificial intelligence (AI), has been widely applied in neuroscience, including to brain tumours, epilepsy, neurodegenerative diseases, and demyelinating diseases14,15,16,17,18. Lisowska et al.19 recently explored the use of context-aware convolutional neural networks for stroke detection. Hu et al.20 evaluated deep learning-based CT perfusion imaging for guiding thrombolytic therapy in acute cerebral infarction with unknown onset time and reported significantly better diagnostic performance and image quality in the AI group than in the control group.

Radiomics allows in-depth characterization of distinct lesion phenotypes and can yield novel predictive indicators. We aimed to develop and validate an NCCT-based radiomics imaging biomarker, combined with ML models, for detecting early microscopic changes in AIS patients.

Materials and methods

Study design and patient enrolment

In this multicentre study, 1122 eligible patients from eight cohorts in China were retrospectively enrolled. Patient details and the inclusion and exclusion criteria are provided in Appendix S1. For consistency, the NCCT images of all eligible patients were re-reviewed by two radiologists with seven and twenty years of experience, respectively. Baseline characteristics were collected from medical records. The study was approved by the ethics committee of Shandong Provincial Hospital Affiliated to Shandong First Medical University, which waived the requirement for informed patient consent. The study was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The study workflow is shown in Fig. 1. Patients in cohorts 1 to 5 (n = 592) were combined into the main dataset, which was used for radiomics feature selection, signature building, model fitting, training and parameter tuning; the remaining cohorts (6 to 8) were used for independent validation. In the main dataset, 397 patients (67.1%) were male and 195 (32.9%) were female. The mean age was 62.9 years (range, 29 to 96 years). A total of 391 (66.0%) patients had hypertension, 155 (26.2%) had dyslipidaemia, 187 (31.6%) had diabetes, 160 (27.0%) had coronary artery disease (CAD), 247 (41.7%) smoked, and 227 (38.3%) consumed alcohol. The detailed demographic characteristics of all cohorts are shown in Table 1.

Fig. 1

The workflow of this study, including (1) collecting magnetic resonance (MR) and non-contrast computed tomography (NCCT) images of acute ischaemic stroke patients; (2) image preprocessing, comprising head alignment, registration, and delineation; (3) radiomics feature extraction and selection; (4) machine learning (ML) classifier construction; and (5) evaluation of model performance.

Table 1 Baseline characteristics of AIS patients in all cohorts.

Delineation and feature extraction

Details of preprocessing are provided in Appendix S2. For all cohorts, we collected NCCT and DWI images for each participant. Two junior radiologists delineated the volume of interest (VOI) of the AIS lesions on the DWI images using ITK-SNAP software (version 3.8.0). The VOI covered the hyperintense regions that defined the lesions, and a senior radiologist with thirty years of experience reviewed each VOI. As a control, we also delineated a VOI in an area with no abnormal signal on the contralateral side of the DWI image.

Quantitative radiomics features were extracted from the co-registered NCCT images by mapping the VOIs onto them, using the PyRadiomics package (version 3.0.1). The image filters included original, Laplacian of Gaussian (LoG) with sigma values of 1.0, 2.0, 3.0, 4.0, and 5.0, wavelet, square, square root, logarithm, exponential and gradient. Four classes of grey-level matrices were calculated in three dimensions: the grey-level co-occurrence matrix (GLCM), grey-level run-length matrix (GLRLM), grey-level size zone matrix (GLSZM) and grey-level dependence matrix (GLDM). Each VOI yielded 1634 radiomics features.
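To make the extraction settings concrete, the sketch below expresses the filters and matrix classes listed above as a PyRadiomics parameter dictionary. It is an illustration, not the authors' configuration file: settings the paper does not report (bin width, interpolator, wavelet type) are left at PyRadiomics defaults, the 0.5 mm isotropic spacing is taken from the Discussion, and the helper `count_filtered_images` is hypothetical. PyRadiomics is imported lazily so the dictionary can be inspected without the package installed.

```python
# Sketch of a PyRadiomics (3.x) parameter dictionary mirroring the text.
# Assumption: unreported settings stay at PyRadiomics defaults.
PARAMS = {
    "setting": {
        "resampledPixelSpacing": [0.5, 0.5, 0.5],  # mm, isotropic resampling
    },
    "imageType": {
        "Original": {},
        "LoG": {"sigma": [1.0, 2.0, 3.0, 4.0, 5.0]},  # sigma in mm
        "Wavelet": {},  # default single-level decomposition
        "Square": {},
        "SquareRoot": {},
        "Logarithm": {},
        "Exponential": {},
        "Gradient": {},
    },
    # None enables every feature in a class. Shape features are not listed
    # in the text, so they are omitted here.
    "featureClass": {
        "firstorder": None,
        "glcm": None,
        "glrlm": None,
        "glszm": None,
        "gldm": None,
    },
}


def count_filtered_images(params, ndim=3):
    """Hypothetical helper: number of derived images implied by imageType."""
    total = 0
    for name, cfg in params["imageType"].items():
        if name == "LoG":
            total += len(cfg["sigma"])  # one image per sigma
        elif name == "Wavelet":
            total += 2 ** ndim  # LLL ... HHH for one 3D decomposition level
        else:
            total += 1
    return total


def build_extractor(params=PARAMS):
    """Create a PyRadiomics extractor from the dictionary (lazy import)."""
    from radiomics import featureextractor

    return featureextractor.RadiomicsFeatureExtractor(params)
```

With PyRadiomics installed, `build_extractor().execute(image_path, mask_path)` would return an ordered mapping of feature names to values for one registered NCCT volume and its VOI mask (the paths are placeholders). Under these settings each VOI is characterized on 19 images (1 original, 5 LoG, 8 wavelet, 5 other filters).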

Feature selection and classifier building

We randomly divided the main dataset into a training set (n = 415) and an internal validation set (n = 177) at a ratio of 7:3. For feature selection, we first identified and eliminated predictors with zero variance using ANOVA. Predictors with multicollinearity were then excluded via QR decomposition. Next, the values of the remaining features were normalized using the z-score method. The least absolute shrinkage and selection operator (LASSO) regression model yields sparse solutions, resists overfitting, and is commonly used for variable selection in biomedicine21,22. In our study, the LASSO model was used to select the features most predictive of microscopic AIS in the training dataset. The final features were selected by LASSO regression with ten-fold cross-validation, choosing the penalty coefficient lambda (λ) by the one-standard-error rule, that is, the largest λ whose cross-validated error lies within one standard error of the minimum.
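The selection steps above can be sketched in a few small functions. This is a pure-Python illustration under stated assumptions, not the authors' R code: the zero-variance check uses a simple range test as a stand-in for the ANOVA step; the QR step is approximated by a greedy Gram-Schmidt pass that drops columns which are (near-)exact linear combinations of columns already kept (R's `caret::findLinearCombos` is one QR-based implementation, but the paper does not name its tool); z-score parameters are estimated on the training set only; and `one_se_lambda` implements the one-standard-error rule given cross-validated errors, following the convention used by `glmnet::cv.glmnet` for `lambda.1se`. The LASSO fit itself is omitted.

```python
import math


def _columns(X):
    """Transpose a row-major matrix (list of lists) into a list of columns."""
    return [list(col) for col in zip(*X)]


def _select(X, names, keep):
    return [[row[j] for j in keep] for row in X], [names[j] for j in keep]


def drop_zero_variance(X, names):
    """Drop columns whose values are all identical (stand-in for the ANOVA step)."""
    keep = [j for j, col in enumerate(_columns(X)) if max(col) > min(col)]
    return _select(X, names, keep)


def drop_linear_combinations(X, names, tol=1e-8):
    """Greedy Gram-Schmidt (QR-style) pass: drop a column that is (nearly)
    a linear combination of the columns already kept."""
    basis, keep = [], []
    for j, col in enumerate(_columns(X)):
        resid = col[:]
        for q in basis:  # subtract the projection onto each orthonormal vector
            proj = sum(r * b for r, b in zip(resid, q))
            resid = [r - proj * b for r, b in zip(resid, q)]
        norm = math.sqrt(sum(r * r for r in resid))
        scale = math.sqrt(sum(c * c for c in col)) or 1.0
        if norm > tol * scale:
            basis.append([r / norm for r in resid])
            keep.append(j)
    return _select(X, names, keep)


def zscore_fit(X):
    """Column means and sample SDs, estimated on the training set only."""
    n = len(X)
    cols = _columns(X)
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return means, sds


def zscore_apply(X, means, sds):
    """Standardize any set (training, validation, external) with training statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def one_se_lambda(lambdas, cv_mean, cv_se):
    """Largest lambda whose CV error is within one SE of the minimum CV error."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
```

Preferring the largest λ within one standard error trades a negligible increase in cross-validated error for a sparser, typically more stable feature set, which is consistent with the aim of a compact radiomics signature.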

Five ML classifiers were used for prediction in this study: random forest (RF), support vector machine with a linear kernel (Linear-SVM), support vector machine with a radial basis function kernel (RBF-SVM), logistic regression (LR) and multilayer perceptron (MLP). The classifiers were fitted on the training set using the selected features, and their hyperparameters were tuned on the internal validation set.
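The fit-on-training, tune-on-validation scheme can be illustrated with one of the five model families. The sketch below is an assumption-laden stand-in, not the authors' pipeline: it uses a hand-written L2-penalized logistic regression trained by batch gradient descent, a hypothetical penalty grid, and validation accuracy as the selection score (the paper does not report its tuning grid or criterion). Each candidate is fitted once on the training data, scored on the internal validation data, and the best-scoring setting is kept.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l2, lr=0.1, epochs=500):
    """Batch gradient descent for L2-penalized logistic regression (bias unpenalized)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
    return w, b


def accuracy(model, X, y):
    """Fraction of cases classified correctly at a probability threshold of 0.5."""
    w, b = model
    preds = [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) >= 0 else 0 for xi in X]
    return sum(int(p == t) for p, t in zip(preds, y)) / len(y)


def tune_on_validation(X_tr, y_tr, X_val, y_val, grid):
    """Fit once per candidate penalty on the training set; keep the best validation score."""
    best = None
    for l2 in grid:
        model = fit_logistic(X_tr, y_tr, l2)
        score = accuracy(model, X_val, y_val)
        if best is None or score > best[0]:  # ties keep the earlier candidate
            best = (score, l2, model)
    return best
```

Because the internal validation set is used for tuning, it no longer gives an unbiased performance estimate; in this design that role falls to the untouched independent cohorts 6 to 8.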

After training, the performance of the classifiers was evaluated on the three independent validation cohorts. The construction of the MLP is described in Appendix S3. The LASSO plots and a visualization of the MLP training process are shown in Appendix Figure S1.

Statistical analysis

Numerical variables were compared using the t test or the Mann–Whitney U test. The DeLong method was used to estimate the area under the receiver operating characteristic curve (AUROC) and its confidence interval. The AUROC, the area under the precision-recall curve (AUPRC), calibration curves, decision curve analysis (DCA), sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV) and F1-score were used to assess whether the radiomics features could classify patients into non-AIS and microscopic AIS groups based on NCCT images. The interpretable algorithm SHapley Additive exPlanations (SHAP) was used to quantify the contribution of each radiomics feature to the ML model predictions.
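For readers unfamiliar with the DeLong method, the sketch below computes a single AUROC, its DeLong standard error, and a normal-approximation confidence interval in plain Python. It is illustrative only: R users would typically call `pROC::ci.auc(..., method = "delong")`, but the paper does not say which implementation was used, and clipping the interval to [0, 1] is a choice made here. The AUROC is the Mann-Whitney statistic over all positive-negative pairs; its variance is built from per-case placement values.

```python
from statistics import NormalDist


def delong_auc_ci(scores, labels, level=0.95):
    """AUROC with a DeLong standard error and a normal-approximation CI.

    scores: predicted probabilities; labels: 1 = microscopic AIS, 0 = non-AIS.
    """
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    m, n = len(pos), len(neg)

    def psi(x, y):
        # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 for ties.
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    # Placement values (structural components) for each positive and negative case.
    v10 = [sum(psi(x, y) for y in neg) / n for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / m for y in neg]
    auc = sum(v10) / m

    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = (s10 / m + s01 / n) ** 0.5

    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)
```

The pairwise loop is O(m·n), which is adequate for cohorts of a few hundred patients; faster rank-based formulations give the same result.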

A two-tailed p value < 0.05 was considered statistically significant. Statistical analyses were performed with R (version 4.2.1), and the MLP model was built with the deep learning framework PyTorch (version 1.11.0) on Python (version 3.8.0).

Results

Optimal radiomics feature

Of the 1634 features, 1538 remained after ANOVA. After the multicollinearity test, 830 features with lower linear correlations were retained. Ultimately, 44 optimal features were selected by the LASSO regression model to develop the radiomics signature (Table 2). Of these, 8 were first-order features and 36 were higher-order features, comprising 11 GLCM, 12 GLSZM, 3 GLDM, and 10 GLRLM features. Five features were derived from the original images, 20 from the LoG-filtered images, and 19 from the wavelet-filtered images.

Table 2 The optimal radiomics features selected by the LASSO model.

Predictive performance validated by independent external cohorts

We separately validated the diagnostic performance of each classifier on the three independent cohorts. In cohort 6, the RBF-SVM classifier achieved an AUROC of 0.846 (95% CI 0.808 to 0.884) and an AUPRC of 0.854 (95% CI 0.798 to 0.896). The linear SVM model achieved an AUROC of 0.838 (95% CI 0.798 to 0.878) and an AUPRC of 0.856 (95% CI 0.801 to 0.898). The LR model achieved an AUROC of 0.831 (95% CI 0.790 to 0.872) and an AUPRC of 0.840 (95% CI 0.783 to 0.884). The RF model achieved an AUROC of 0.832 (95% CI 0.791 to 0.872) and an AUPRC of 0.846 (95% CI 0.789 to 0.889). The MLP model achieved an AUROC of 0.838 (95% CI 0.799 to 0.878) (Fig. 2a) and an AUPRC of 0.827 (95% CI 0.769 to 0.873) (Fig. 2d).

Fig. 2

The evaluation of area under the receiver operating characteristic curve (AUROC) for different ML models on cohorts 6–8 (a-c). The evaluation of area under the precision-recall curve (AUPRC) for different ML models on cohorts 6–8 (d-f).

In cohort 7, the RBF-SVM model achieved an AUROC of 0.805 (95% CI 0.722 to 0.889) and an AUPRC of 0.804 (95% CI 0.675 to 0.890). The linear SVM model achieved an AUROC of 0.775 (95% CI 0.687 to 0.864) and an AUPRC of 0.767 (95% CI 0.636 to 0.862). The LR model achieved an AUROC of 0.775 (95% CI 0.687 to 0.864) and an AUPRC of 0.766 (95% CI 0.634 to 0.861). The RF model achieved an AUROC of 0.818 (95% CI 0.737 to 0.898) and an AUPRC of 0.802 (95% CI 0.674 to 0.889). The MLP model achieved an AUROC of 0.806 (95% CI 0.722 to 0.890) (Fig. 2b) and an AUPRC of 0.819 (95% CI 0.692 to 0.901) (Fig. 2e).

In cohort 8, the RBF-SVM model achieved an AUROC of 0.754 (95% CI 0.714 to 0.795) and an AUPRC of 0.733 (95% CI 0.677 to 0.782). The linear SVM model achieved an AUROC of 0.763 (95% CI 0.723 to 0.803) and an AUPRC of 0.732 (95% CI 0.676 to 0.781). The LR model achieved an AUROC of 0.754 (95% CI 0.713 to 0.795) and an AUPRC of 0.730 (95% CI 0.674 to 0.779). The RF model achieved an AUROC of 0.774 (95% CI 0.735 to 0.813) and an AUPRC of 0.770 (95% CI 0.716 to 0.816). The MLP model achieved an AUROC of 0.731 (95% CI 0.689 to 0.773) (Fig. 2c) and an AUPRC of 0.693 (95% CI 0.636 to 0.745) (Fig. 2f).
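The Discussion summarizes these results as mean AUROCs across the three independent cohorts. As a quick arithmetic check, the snippet below averages the per-cohort AUROCs reported in the three paragraphs above. It assumes an unweighted mean (cohort sizes ignored); the reported means are consistent with that assumption. The helper names are illustrative.

```python
# AUROCs for cohorts 6, 7 and 8, transcribed from the Results text.
AUROC_BY_COHORT = {
    "RF": [0.832, 0.818, 0.774],
    "RBF-SVM": [0.846, 0.805, 0.754],
    "MLP": [0.838, 0.806, 0.731],
    "Linear-SVM": [0.838, 0.775, 0.763],
    "LR": [0.831, 0.775, 0.754],
}


def mean_auroc(table):
    """Unweighted mean over cohorts (cohort sizes ignored), rounded to 3 decimals."""
    return {model: round(sum(vals) / len(vals), 3) for model, vals in table.items()}


def ranking(means):
    """Model names sorted from highest to lowest mean AUROC (ties keep input order)."""
    return sorted(means, key=means.get, reverse=True)


means = mean_auroc(AUROC_BY_COHORT)
print(means)
print(ranking(means))
```

The averages come out as 0.808 (RF), 0.802 (RBF-SVM), 0.792 (MLP), 0.792 (Linear-SVM) and 0.787 (LR), so the MLP and Linear-SVM means are tied at three decimals.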

The performance of the models on the training and internal validation sets is shown in Table S1, and the performance of the classifiers on the three independent validation cohorts is shown in Table 3. The AUROC, AUPRC, calibration curves, and DCA of the ML models on the training and internal validation sets are compared in Appendix Figure S2. The calibration curves showed good agreement between predicted and observed outcomes for the different models in independent cohorts 6–8 (Fig. 3a-c), and DCA indicated favourable clinical utility of the models (Fig. 3d-f).
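Both the threshold-based metrics in Table 3 and decision curves derive from confusion counts at a probability threshold. The sketch below shows the standard formulas: sensitivity, specificity, accuracy, PPV, NPV and F1 at one threshold, and the DCA net benefit, TP/n − FP/n × pt/(1 − pt), at threshold probability pt, together with the "treat all" reference curve ("treat none" is 0 everywhere). It is an illustration, not the software the authors used, and it assumes no empty denominators.

```python
def confusion(probs, labels, t):
    """Counts (tp, fp, tn, fn) when cases with probability >= t are called positive."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
    tn = sum(1 for p, y in zip(probs, labels) if p < t and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
    return tp, fp, tn, fn


def threshold_metrics(probs, labels, t=0.5):
    """Sensitivity, specificity, accuracy, PPV, NPV and F1 (assumes no empty denominators)."""
    tp, fp, tn, fn = confusion(probs, labels, t)
    sens, ppv = tp / (tp + fn), tp / (tp + fp)
    return {
        "sensitivity": sens,
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "f1": 2 * ppv * sens / (ppv + sens),
    }


def net_benefit(probs, labels, pt):
    """Decision-curve net benefit of treating cases with probability >= pt."""
    tp, fp, _, _ = confusion(probs, labels, pt)
    n = len(labels)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(labels, pt):
    """Reference curve: every patient is treated as positive."""
    prev = sum(labels) / len(labels)
    return prev - (1 - prev) * pt / (1 - pt)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of pt values; a model is considered clinically useful over the range where its curve lies above both the "treat all" and "treat none" references.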

Table 3 The performance of the different classifiers on three independent validation cohorts.
Fig. 3

Calibration curves evaluating the agreement between predicted and observed outcomes for the different ML models in cohorts 6–8 (a-c). Decision curve analysis (DCA) for the different ML models in cohorts 6–8 (d-f).

Top features ranked by coefficients

For the radiomics signature, we analyzed the four features with the largest coefficients in each independent cohort: the LoG (σ = 2 mm) GLCM inverse difference moment normalized (Idmn) feature (coefficient: 0.205), the LoG (σ = 1 mm) GLCM Idmn feature (coefficient: 0.193), the LoG (σ = 1 mm) GLCM maximum probability feature (coefficient: 0.151), and the wavelet (LHL) GLSZM zone entropy feature (coefficient: 0.137). For all four features, values were generally higher in the microscopic AIS group than in the non-AIS group. For the LoG (σ = 2 mm) GLCM Idmn feature, p < 0.001 in cohorts 6 and 7 and p = 0.002 in cohort 8. For the LoG (σ = 1 mm) GLCM Idmn feature, p < 0.001 in cohorts 6 and 8 and p = 0.013 in cohort 7. For the LoG (σ = 1 mm) GLCM maximum probability feature, p < 0.001 in cohorts 6 and 8 and p = 0.027 in cohort 7. For the wavelet (LHL) GLSZM zone entropy feature, p < 0.001 in cohort 6 and p = 0.033 in cohort 8, whereas the difference in cohort 7 was not significant (p = 0.102) (Fig. 4). A boxplot analysis of the top four features on the training and validation sets is shown in Appendix Figure S3. We also visualized the top two features as heatmaps. In the heatmap of the No. 1 feature, the high-signal lesion appears red or even dark red, indicating high heterogeneity. The heatmap of the No. 2 feature showed mixed heterogeneity in lesions with mixed signals (Fig. 5).
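To clarify what the two GLCM features measure, the sketch below builds a symmetric, normalized grey-level co-occurrence matrix for a small 2D image and computes Idmn, Σ p(i,j) / (1 + (i − j)² / Ng²), and maximum probability, max p(i,j), following the PyRadiomics definitions. It simplifies PyRadiomics on purpose: it works in 2D, sums the matrices over four pixel offsets instead of computing features per direction and averaging, and assumes the image is already discretized to grey levels 1..Ng. It is not the feature pipeline used in the study.

```python
def glcm(image, levels, offsets=((0, 1), (1, 0), (1, 1), (1, -1))):
    """Symmetric, normalized 2D GLCM summed over the given pixel offsets.

    image: list of rows holding integer grey levels in 1..levels.
    """
    P = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for dr, dc in offsets:
        for r in range(rows):
            for c in range(cols):
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    i, j = image[r][c] - 1, image[r2][c2] - 1
                    P[i][j] += 1
                    P[j][i] += 1  # count each pair in both directions (symmetry)
    total = sum(map(sum, P))
    return [[v / total for v in row] for row in P]


def idmn(P):
    """Inverse difference moment normalized: close to 1 for homogeneous texture."""
    ng = len(P)
    return sum(P[i][j] / (1 + (i - j) ** 2 / ng ** 2) for i in range(ng) for j in range(ng))


def maximum_probability(P):
    """Probability of the most frequent grey-level pair."""
    return max(max(row) for row in P)
```

For a uniform 2 × 2 patch both features equal 1.0, whereas a 2 × 2 checkerboard of two grey levels gives Idmn ≈ 0.867 and a maximum probability of 1/3, illustrating that Idmn decreases as neighbouring grey levels diverge.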

Fig. 4

The boxplot analysis of the top four radiomics features in cohorts 6 (a-d), 7 (e-h), and 8 (i-l).

Fig. 5

Male, 70 years old. The patient was admitted with slurred speech and inability to move the right limb. CT showed nonspecific density changes, which made diagnosis difficult. Subsequent MRI showed a hyperintense lesion in the left hemisphere on DWI. The LoG (σ = 2 mm) GLCM Idmn feature map showed high lesion heterogeneity, and the LoG (σ = 1 mm) GLCM Idmn feature map showed mixed lesion heterogeneity (a). Male, 70 years old. The patient was admitted with dizziness, nausea, and unsteady gait. CT showed nonspecific density changes; subsequent MRI showed hyperintensity on DWI. The LoG (σ = 2 mm) GLCM Idmn feature map showed high lesion heterogeneity, and the LoG (σ = 1 mm) GLCM Idmn feature map showed mixed lesion heterogeneity (b).

Subgroup analysis and feature contribution calculated by SHAP

We conducted subgroup analyses based on age, gender, and comorbidities (hypertension, dyslipidemia, diabetes, smoking, alcohol consumption, and CAD) to examine differences in predictive performance. Validation using three external independent cohorts (cohorts 6–8) confirmed the model’s robust performance (Figure S4).

For the gender subgroup, the ML model achieved a mean AUROC of 0.829 (range: 0.784–0.831) for males and 0.793 (range: 0.731–0.884) for females. In the age subgroup, participants were divided into two groups using a threshold of 50 years. The ML model achieved a mean AUROC of 0.808 (range: 0.768–0.842) for those over 50 years and 0.860 (range: 0.813–0.878) for those under 50 years. For the comorbidity subgroups, the ML model achieved mean AUROCs of 0.813 (range: 0.772–0.852) for those with hypertension and 0.832 (range: 0.783–0.858) for those without, 0.810 (range: 0.785–0.850) for those with dyslipidemia and 0.818 (range: 0.775–0.847) for those without, 0.833 (range: 0.777–0.862) for those with diabetes and 0.809 (range: 0.777–0.842) for those without, 0.843 (range: 0.810–0.906) for smokers and 0.778 (range: 0.699–0.876) for non-smokers, 0.834 (range: 0.804–0.859) for drinkers and 0.801 (range: 0.761–0.856) for non-drinkers, and 0.781 (range: 0.660–0.860) for those with CAD and 0.817 (range: 0.769–0.848) for those without.

The SHAP analysis identified the top three most important features for the model: the original first-order 90Percentile (importance: 0.59), the first-order InterquartileRange from the Laplacian of Gaussian (LoG, σ = 2 mm) transformation (importance: 0.46), and the wavelet-transformed (LLH) first-order InterquartileRange (importance: 0.34) (Figure S5a). The beeswarm plot revealed that higher values of 90Percentile were associated with a lower risk of AIS, whereas higher values of InterquartileRange were linked to a greater likelihood of AIS (Figure S5b).
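SHAP attributions are Shapley values from cooperative game theory. The sketch below computes them exactly for a small model by enumerating all feature subsets, with "absent" features replaced by a single baseline vector, so the attributions sum to f(x) − f(baseline). This is a conceptual illustration with a hypothetical toy model; the paper does not state which SHAP explainer was used, and practical explainers approximate this computation because exact enumeration grows exponentially with the number of features (44 here).

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, replacing absent features with baseline values."""
    d = len(x)

    def value(subset):
        # Features in the subset take their observed values; the rest use the baseline.
        z = [x[k] if k in subset else baseline[k] for k in range(d)]
        return f(z)

    phi = []
    for i in range(d):
        others = [k for k in range(d) if k != i]
        total = 0.0
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution
        phi.append(total)
    return phi
```

For a linear model f(z) = Σ wₖzₖ + c, each attribution reduces to wᵢ(xᵢ − baselineᵢ); for the interaction f(z) = z₀z₁ with x = (1, 1) and a zero baseline, the credit is split equally (0.5 each).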

Discussion

The prognosis of patients with AIS depends on the time to diagnosis and intervention, underscoring the importance of early and accurate diagnosis. Considering the scan time and the patient cooperation required during scanning, MRI is a less suitable imaging modality than CT in this setting. Furthermore, in some emergency centres and low-income countries, MRI is inconvenient and expensive. In China in particular, CT is more commonly used than MRI for screening patients with acute cerebrovascular disease. However, CT is insensitive to AIS lesions with an onset time of less than 24 h, with an overall sensitivity of 57–71%23,24. Diseases that can mimic AIS include viral encephalitis, tumours, and white matter lesions. Viral encephalitis is often characterized by a history of infection and alterations in immune cells. Brain tumours typically exhibit mass effect, substantial oedema, and marked enhancement on contrast-enhanced CT. Cerebral white matter lesions commonly appear as bilateral diffuse hypodense areas in the hemispheric white matter and are frequently associated with chronic conditions such as long-standing hypertension and hyperlipidaemia. Here, we developed and validated an NCCT-based radiomics approach and evaluated its diagnostic value for AIS. The results indicated that the radiomics signature model could distinguish patients with microscopic AIS from those without AIS.

The most common mechanism of AIS is embolism. Abnormal cerebral blood flow caused by embolism can lead to persistent brain tissue damage. The progression from reversible injury to irreversible necrosis depends on the magnitude and duration of the reduction in blood flow. Irreversible damage may develop within minutes in the ischaemic core and then spread to the peripheral region. An estimated 1.9 million neurons are lost during each minute of ischaemia25. Hence, rapid and accurate diagnosis is crucial for initiating urgent treatment to restore blood flow and save neurons.

Current guidelines recommend brain imaging within 30 min of patient presentation to facilitate rapid decision-making about thrombolytic therapy26,27,28. NCCT is regarded as the most important diagnostic method for differentiating ischaemic stroke from intracerebral haemorrhage, which appears as a high-density region on CT images and is a contraindication to intravenous thrombolytic therapy. Early imaging signs of acute stroke reflect cellular hypoperfusion and the resulting cytotoxic oedema. Signs of early infarction on CT include a hyperdense artery, decreased grey matter density, cerebral tissue swelling, and sulcal effacement29,30. However, these typical CT signs are uncommon, being present in less than 50% of AIS patients24. The resolution and sensitivity of NCCT are too low to detect early-phase ischaemic changes, especially infarcts in the posterior cranial fossa and deep brain tissue.

In this study, textural features contributed most to the performance of the ML model for distinguishing microscopic AIS from non-AIS, accounting for ~82% (36/44) of the selected features. Resampling to a smaller voxel size allows finer textural detail to be captured from the images; we therefore resampled the NCCT images to 0.5 × 0.5 × 0.5 mm³. Radiomics features can reflect subtle benign or malignant changes in medical images and reveal patterns that are invisible to radiologists. For example, Idmn reflects local homogeneity in an image; Mabrouk et al. reported that Idmn values in malignant skin cancer were higher than those in nevi. In our study, the values of the top four features were generally higher in the microscopic AIS group than in the non-AIS group. For the LoG (σ = 2 mm) GLCM Idmn feature, in cohort 6, the median was 0.986 (IQR, 0.982–0.995) in the microscopic AIS group vs. 0.982 (IQR, 0.981–0.984) in the non-AIS group. In cohort 7, the median was 0.985 (IQR, 0.982–0.990) in the microscopic AIS group vs. 0.982 (IQR, 0.981–0.984) in the non-AIS group. In cohort 8, the median was 0.984 (IQR, 0.980–0.990) in the microscopic AIS group vs. 0.982 (IQR, 0.979–0.986) in the non-AIS group. For the LoG (σ = 1 mm) GLCM Idmn feature, in cohort 6, the median was 0.965 (IQR, 0.961–0.983) in the microscopic AIS group vs. 0.961 (IQR, 0.959–0.963) in the non-AIS group. In cohort 7, the median was 0.964 (IQR, 0.961–0.968) in the microscopic AIS group vs. 0.962 (IQR, 0.959–0.965) in the non-AIS group. In cohort 8, the median was 0.966 (IQR, 0.962–0.971) in the microscopic AIS group vs. 0.963 (IQR, 0.960–0.967) in the non-AIS group. For the LoG (σ = 1 mm) GLCM maximum probability feature, in cohort 6, the median was 0.460 (IQR, 0.420–0.502) in the microscopic AIS group vs. 0.425 (IQR, 0.412–0.448) in the non-AIS group. In cohort 7, the median was 0.470 (IQR, 0.434–0.498) in the microscopic AIS group vs. 0.446 (IQR, 0.427–0.470) in the non-AIS group. In cohort 8, the median was 0.483 (IQR, 0.442–0.543) in the microscopic AIS group vs. 0.458 (IQR, 0.430–0.509) in the non-AIS group. The zone entropy feature measures the uncertainty in the distribution of zone sizes and grey levels, with higher values indicating more heterogeneous texture patterns. For the wavelet (LHL) GLSZM zone entropy feature, in cohort 6, the median was 3.000 (IQR, 2.585–3.462) in the microscopic AIS group vs. 2.750 (IQR, 2.412–3.027) in the non-AIS group. In cohort 7, the median was 2.922 (IQR, 2.322–3.325) in the microscopic AIS group vs. 2.725 (IQR, 2.322–3.000) in the non-AIS group. In cohort 8, the median was 2.750 (IQR, 2.322–3.170) in the microscopic AIS group vs. 2.585 (IQR, 2.322–3.000) in the non-AIS group.
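The zone entropy definition can be made concrete with a small example. The sketch below finds connected zones of equal grey level in a 2D image (8-connectivity, the 2D analogue of the 26-connectivity PyRadiomics uses in 3D), tabulates zones by (grey level, zone size), and computes zone entropy, −Σ p log₂ p, over the non-empty entries. PyRadiomics instead adds a tiny epsilon inside the logarithm, which gives practically the same value. This is a simplified illustration on a discretized 2D image, not the study's 3D wavelet-filtered pipeline.

```python
from collections import Counter
from math import log2


def zones(image):
    """Connected zones of equal grey level (8-connectivity) as (level, size) pairs."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    out = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            level, size, stack = image[r][c], 0, [(r, c)]
            seen[r][c] = True
            while stack:  # depth-first flood fill of one zone
                y, x = stack.pop()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and image[ny][nx] == level):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            out.append((level, size))
    return out


def zone_entropy(image):
    """Entropy (bits) of the normalized grey-level size zone matrix."""
    counts = Counter(zones(image))
    nz = sum(counts.values())
    return -sum((n / nz) * log2(n / nz) for n in counts.values())
```

When zones are spread evenly over k distinct (grey level, size) bins, zone entropy equals log₂ k: for example, log₂ 5 ≈ 2.322, log₂ 6 ≈ 2.585 and log₂ 8 = 3.000, values that also appear among the medians and quartiles reported above.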

Different ML models have different strengths, and no single model performs best on every task. We therefore evaluated five types of classifiers on this task. All models performed well in distinguishing microscopic AIS from non-AIS. The nonlinear RF and RBF-SVM models performed slightly better than the linear models, with mean AUROCs across the three independent cohorts of 0.808 (95% CI 0.754 to 0.861) and 0.802 (95% CI 0.748 to 0.856), respectively, while the nonlinear MLP model achieved a mean AUROC of 0.792 (95% CI 0.737 to 0.847). A possible explanation is that real-world problems are usually nonlinear, so linear models may not accurately separate data with nonlinear distributions. The linear models performed slightly worse than the RF and RBF-SVM models, with mean AUROCs of 0.792 (95% CI 0.736 to 0.848) for the Linear-SVM model and 0.787 (95% CI 0.730 to 0.844) for the LR model.

This study has several limitations. First, although it includes data from eight independent centres encompassing 1,122 individuals, its retrospective design may introduce bias in population selection, feature extraction, and model training. Further work is needed to incorporate more comprehensive feature parameters and to optimize model selection; importantly, validation in prospective cohorts is essential to confirm these findings. Second, this study did not differentiate between occlusion sites, as the study population included patients with both anterior and posterior circulation occlusions. This may limit understanding of the imaging mechanisms and of potential variations in imaging biomarkers across pathophysiological types. To address this, future studies will analyze patients with anterior and posterior circulation occlusions separately to clarify how these distinct pathophysiological mechanisms influence model performance.

In conclusion, our study demonstrates that combining radiomics with machine learning models can be an efficient, noninvasive, economical, and reliable technique for evaluating early microscopic AIS on NCCT. Although we used three independent cohorts to validate the results, a larger prospective cohort is still needed. Some challenges, such as the sophistication of the registration methods and the determination of the target volume, should be addressed in future prospective studies. In future applications, an automatic segmentation tool could delineate VOIs on CT scans according to structural or functional brain regions, extract radiomics features, and feed them into our model to obtain predictions. Alternatively, a sliding convolutional kernel (e.g., 3 × 3 × 3 or 5 × 5 × 5 voxels) could be moved across the brain to extract radiomics features sequentially and input them into the model as a predictive reference. The resulting radiomics heatmap would visualize heterogeneous zones intuitively, which could support diagnosis on CT without MRI. Our findings suggest the potential value of noninvasive biomarkers to aid clinical decision-making.
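The sliding-kernel idea proposed above can be sketched as a generic feature map. The example below is a 2D simplification under explicit assumptions: it slides a k × k window over valid positions only, applies any per-window feature function, and uses local variance as a hypothetical stand-in for a radiomics feature; the proposal itself envisions 3D kernels (e.g., 3 × 3 × 3 or 5 × 5 × 5) feeding the trained model. Any function that maps a window's values to a number could be passed in place of `local_variance`.

```python
def sliding_feature_map(image, k, feature):
    """Apply feature() to every k x k window (valid positions only); returns a 2D map."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        row = []
        for c in range(cols - k + 1):
            window = [image[r + i][c + j] for i in range(k) for j in range(k)]
            row.append(feature(window))
        out.append(row)
    return out


def local_variance(values):
    """Population variance of the window values (stand-in for a radiomics feature)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Two flat regions with a sharp boundary between columns 1 and 2.
image = [[0, 0, 10, 10] for _ in range(4)]
heatmap = sliding_feature_map(image, 2, local_variance)
print(heatmap)
```

The resulting map is 0 inside each flat region and peaks at the boundary, which is the kind of localized heterogeneity signal a radiomics heatmap would display. A 3D version would add a third loop over slices.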