Introduction

Currently, imaging is the most direct method for assessing thrombi, with computed tomography (CT) being a critical component of acute ischemic stroke (AIS) diagnostic and treatment guidelines1. CT imaging provides essential information such as thrombus location, size, density, permeability, and clot burden score (CBS), which can significantly influence treatment response and recanalization success rates2,3,4,5. However, among the thrombus characteristics currently used for clinical decision-making, only thrombus location has been widely adopted in routine practice1.

In recent years, an increasing number of studies have investigated the impact of thrombus imaging characteristics on the diagnosis, treatment, and prognosis of AIS. For instance, thrombus permeability and size can affect the success rate of intravenous thrombolysis (IVT) with alteplase6; thrombus density is associated with histological composition and etiology and may influence the likelihood of successful recanalization7; and thrombus length and volume could determine the choice of endovascular thrombectomy (EVT) devices and procedural strategies8. Despite growing recognition of the importance of these features, their integration into clinical practice remains challenging because of multicollinearity and the complex interrelationships among features. Furthermore, measuring and analyzing every parameter, particularly when the parameters are highly correlated, may lack practical utility and could complicate clinical decision-making.

Unsupervised clustering analysis is a machine learning approach capable of classifying unlabeled data and identifying latent phenotypes9. By grouping highly correlated thrombus features, clustering analysis can help mitigate statistical issues caused by multicollinearity. Previous studies have suggested that specific thrombus phenotypes might influence the selection or predict the success of treatment modalities, such as IVT with alteplase or tenecteplase, as well as specific EVT techniques and devices10,11. However, systematic clustering analyses incorporating both imaging and clinical characteristics to explore distinct phenotypes of AIS patients and their prognostic implications are still lacking.

This study, based on a multicenter dataset, comprehensively analyzed the CT angiography (CTA) and clinical features of AIS patients undergoing EVT. The objectives were to elucidate the distribution and interrelationships of thrombus imaging features and to identify potential patient subtypes through unsupervised clustering and principal component analysis, with the ultimate goal of contributing to the development of personalized treatment strategies.

Methods

Patient selection

This retrospective, multicenter study included data from patients with AIS treated at three medical centers between September 2018 and September 2024. The inclusion criteria were: (1) AIS patients with anterior circulation occlusion; (2) age ≥ 18 years; (3) symptom onset within 24 h and undergoing EVT; (4) pre-treatment non-contrast computed tomography (NCCT) and CTA performed; (5) preoperative imaging confirming occlusion of the internal carotid artery (ICA) or the middle cerebral artery (MCA) (M1/M2). The exclusion criteria were: (1) intracranial hemorrhage on admission NCCT or a history of intracranial hemorrhage; (2) large chronic infarcts exceeding 1/3 of the vascular territory or a pre-stroke modified Rankin Scale (mRS) score ≥ 2; (3) poor image quality (Fig. 1).

Fig. 1 Flow diagram of the inclusion of patients in the study.

We adhered to the 2019 Guidelines for the Early Management of Patients with Acute Ischemic Stroke1. Patients eligible for IVT received IVT prior to EVT, while other patients underwent EVT directly. This study followed the ethical requirements of the revised 2013 Declaration of Helsinki (www.wma.net/en/30publications/10policies/b3/index.html) and was approved by the Institutional Review Board [No. 2023-Y-189, 2021-122]. Given the retrospective nature of the study, informed consent was waived by the ethics review committees.

Data collection

Data collection included demographic information (e.g., sex, age), stroke risk factors (e.g., hypertension, diabetes), Trial of Org 10172 in Acute Stroke Treatment (TOAST) classification, time metrics (symptom onset to CT, symptom onset to EVT), and National Institutes of Health Stroke Scale (NIHSS) scores (at admission and 24 h post-procedure). Post-EVT recanalization status was assessed using the Thrombolysis in Cerebral Infarction (TICI) scale12, with successful recanalization defined as TICI grade 2b–3.

Clinical outcomes at 90 days post-stroke were evaluated using the mRS, with scores of 0–2 defined as favorable outcomes and scores of 3–6 as poor outcomes13. For deceased patients, outcome data were confirmed via telephone interviews with family members, while for surviving patients, mRS scores were obtained from outpatient follow-up or hospital readmission records.

Measurement of thrombus characteristics

We utilized the SimpleITK module in Python (compatible with Python 3.x) to perform image registration for NCCT and CTA scans. Thrombus occlusion was identified based on a combination of contrast-filling defects on baseline CTA and hyperdense signs on NCCT. Specific evaluation metrics included occlusion location, distance from the ICA terminus to the thrombus (DT), thrombus length, thrombus density, and CBS14,15,16. Assessment Methods:

  1. Placement of Markers: Along the occluded artery, multiple markers were placed: one at the ICA terminus, two marking the proximal and distal boundaries of the thrombus, and three at the proximal, middle, and distal segments for CT value measurements. Markers were synchronized and placed on the registered NCCT and CTA images using ITK-SNAP software (http://www.itksnap.org). Three-dimensional visualization in axial, sagittal, and coronal planes was employed to identify optimal measurement views.

  2. DT: DT was defined as the total path length from the ICA terminus to the proximal thrombus marker. A DT value of zero was assigned if the thrombus originated at the ICA terminus.

  3. Thrombus Length: Thrombus length was determined as the total path length between the proximal and distal thrombus markers.

  4. Thrombus Density and Perviousness: Thrombus density was measured using a 1-mm diameter circular region of interest (ROI) at the proximal, middle, and distal thrombus segments on both NCCT and CTA images. For each modality, the CT attenuation values (in Hounsfield Units, HU) from the three locations were averaged to obtain the mean thrombus density. Thrombus perviousness was defined as the difference between the mean CTA and mean NCCT attenuation, calculated as: Perviousness = Mean CTA HU − Mean NCCT HU. Hyperdense vessel sign was defined as a segment of the vessel visibly hyperdense compared to normal vessel segments on NCCT.

  5. CBS: CBS was calculated based on anterior circulation occlusions, with a maximum score of 10. Points were deducted according to occluded vascular segments.
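The geometric and attenuation definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the marker coordinates, and the HU samples are hypothetical, and it assumes that markers are expressed in millimetres in physical space and that the path is approximated by straight segments between consecutive markers.

```python
import math
from statistics import mean


def path_length(markers):
    """Sum of straight-line segment lengths (mm) along an ordered list of 3-D markers."""
    return sum(math.dist(a, b) for a, b in zip(markers, markers[1:]))


def perviousness(cta_hu, ncct_hu):
    """Mean CTA attenuation minus mean NCCT attenuation (HU) over the ROI samples."""
    return mean(cta_hu) - mean(ncct_hu)


# Hypothetical marker positions (mm), ordered along the occluded vessel.
ica_terminus = (0.0, 0.0, 0.0)
proximal = (3.0, 4.0, 0.0)
middle = (3.0, 4.0, 6.0)    # assumed intermediate point so the path follows the vessel
distal = (3.0, 12.0, 6.0)

dt = path_length([ica_terminus, proximal])         # distance from ICA terminus to thrombus
length = path_length([proximal, middle, distal])   # thrombus length along the path

# Hypothetical ROI readings at the proximal, middle, and distal thrombus segments.
perv = perviousness(cta_hu=[62.0, 58.0, 60.0], ncct_hu=[50.0, 52.0, 48.0])

print(f"DT = {dt:.1f} mm, length = {length:.1f} mm, perviousness = {perv:.1f} HU")
```

A thrombus that starts at the ICA terminus yields a path of zero length, which matches the DT = 0 rule in item 2.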

All imaging data were independently assessed by two experienced radiologists with 10 and 5 years of expertise in neuroimaging diagnosis, respectively. Both radiologists were blinded to detailed clinical information for each case. To ensure measurement consistency, all evaluators underwent standardized training covering vascular segment definitions and thrombus feature measurement methods. The evaluation results for challenging cases (such as thrombus at vascular bifurcations, long thrombi, or cases without distal collateral vessel filling) with disagreements were assessed by a third reader with 15 years of neuroimaging diagnostic experience, who performed a blinded evaluation. Discrepancies were resolved by majority vote. Images with the following conditions were excluded: uncorrectable registration errors, imaging artifacts (e.g., beam hardening or metal artifacts), excessive noise, insufficient contrast, or incomplete CTA scan fields. Additionally, cases with bilateral thrombi, thrombus lengths < 2 mm, thrombi located near the skull and affected by bone artifacts, or calcified emboli were excluded.

Statistical analysis

Statistical analyses were performed using SPSS 23.0 (IBM, Armonk, NY) and R (version 4.3.3, The R Foundation for Statistical Computing). Continuous variables were expressed as medians with interquartile ranges (IQR) and were compared using the Mann–Whitney U test. Categorical variables were presented as frequencies and percentages, and comparisons were conducted using Fisher’s exact test or the chi-square test. Missing data were handled using multiple imputation via the regression-switching method.

To evaluate inter-observer agreement, intra-class correlation coefficients (ICC) with 95% confidence intervals (CI) were calculated for quantitative indices. For categorical indices, Cohen’s kappa statistic was used, with k < 0.40 indicating poor agreement, 0.40 ≤ k ≤ 0.70 moderate agreement, and k > 0.70 good agreement17.
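For readers unfamiliar with the kappa statistic, the following sketch shows how it could be computed for a binary feature such as the hyperdense vessel sign. The ratings are invented for illustration, the function names are hypothetical, and assigning the boundary values 0.40 and 0.70 to the "moderate" band is an assumption of this sketch.

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must rate the same non-empty set of cases")
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement: product of the two readers' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Bands from the text; boundary values are placed in 'moderate' (assumption)."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.70:
        return "moderate"
    return "good"


# Hypothetical ratings: 1 = hyperdense vessel sign present, 0 = absent.
reader_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
reader_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

kappa = cohens_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)})")
```

In this toy case the readers agree on 9 of 10 cases, chance agreement is 0.50, and kappa is therefore (0.90 − 0.50) / (1 − 0.50) = 0.80.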

Correlation analysis among thrombus imaging characteristics was performed using Spearman correlation coefficients, and the results were visualized with the ggplot2 package in R. Correlation strength was categorized as follows: 0.10 ≤ |r| < 0.40, weak; 0.40 ≤ |r| < 0.70, moderate; |r| ≥ 0.70, strong18.
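The rank-based correlation and the strength bands can be illustrated as follows. This is a standard-library sketch with invented data, not the R code used in the study; the label "negligible" for |r| < 0.10 is an assumption, since the text does not name that band.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(r):
    """Bands from the text; 'negligible' below 0.10 is an assumed label."""
    a = abs(r)
    if a >= 0.70:
        return "strong"
    if a >= 0.40:
        return "moderate"
    if a >= 0.10:
        return "weak"
    return "negligible"


# Hypothetical thrombus lengths (mm) and clot burden scores for six patients.
lengths = [8, 12, 15, 22, 30, 41]
cbs = [8, 7, 7, 5, 3, 2]

rho = spearman(lengths, cbs)
print(f"rho = {rho:.2f} ({strength(rho)})")
```

Applying `strength` to the coefficients reported in this study classifies r = 0.71 and r = −0.70 as strong and r = 0.68 as moderate.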

Unsupervised clustering analysis and principal component analysis (PCA) were applied to classify thrombus features. Prior to clustering, data were standardized using the scale() function in R to ensure comparability across variables with different scales. K-means clustering was performed using the kmeans() function, with the number of clusters determined by the elbow method. The total within-cluster sum of squares (WCSS) was calculated for k ranging from 2 to 10, and the optimal number of clusters was identified as the "elbow point," where WCSS began to plateau. Cluster distributions were visualized using the ggplot2 package in R. To confirm robustness, the analysis was repeated with different initialization seeds, and consistency in cluster assignments was evaluated. For PCA, the prcomp() function was used, and data features were displayed in a two-dimensional principal component space. PCA loadings were calculated to reveal which variables contributed most to overall variance, with variable visualizations generated using the factoextra package to depict the weights and directions of features in the principal components.
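The standardize, cluster, and elbow steps can be sketched in Python as below. This is not the study's R code: it uses plain Lloyd iterations rather than the Hartigan-Wong algorithm that R's kmeans() applies by default, the data are synthetic two-dimensional points, and all function names are hypothetical. Keeping the lowest WCSS over several seeds stands in for repeating the analysis with different initializations.

```python
import random
from statistics import mean, stdev


def standardize(rows):
    """Column-wise z-scores using the sample SD (the default behaviour of R's scale())."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # a constant column has SD 0 and would fail here
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, seed, max_iter=100):
    """Lloyd's algorithm; returns (labels, total within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]  # start from k random data points
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(col) for col in zip(*members)]
    wcss = sum(sq_dist(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, wcss


# Synthetic data: three well-separated groups of 20 points each.
gen = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
data = [[gen.gauss(cx, 1.0), gen.gauss(cy, 1.0)]
        for cx, cy in blob_centers for _ in range(20)]

z = standardize(data)
seeds = range(25)
# Elbow curve: best WCSS over all seeds for each k from 2 to 10.
wcss_by_k = {k: min(kmeans(z, k, s)[1] for s in seeds) for k in range(2, 11)}

for k, w in wcss_by_k.items():
    print(f"k = {k:2d}  WCSS = {w:7.2f}")
```

On data with three true groups, WCSS drops sharply from k = 2 to k = 3 and flattens afterwards, which is the visual pattern the elbow method looks for.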

A two-tailed P-value of less than 0.05 was considered statistically significant.

Results

Characteristics of study patients

This study included 520 AIS patients who underwent EVT from three medical institutions. Of these, 321 were male, accounting for 61.7%. The median age of the patients was 71 years (interquartile range 63–77 years). The specific demographic characteristics, baseline data, and clinical outcomes are detailed in Table 1.

Table 1 Demographic and clinical characteristics of the patients.

Thrombus imaging characteristics and subgroup analysis

For the quantitative measurements (DT, thrombus length, thrombus density on NCCT, thrombus density on CTA, and thrombus perviousness), the ICC between the two readers ranged from 0.74 to 0.99. For the qualitative CT features, inter-observer agreement was high (hyperdense vessel sign, k = 0.82; CBS, k = 0.80) (Supplemental Material Table S1).

The thrombus imaging characteristics are presented in Table 2. Regarding occlusion location, the MCA M1 segment had the highest proportion (57.9%), followed by the ICA (28.7%) and the MCA M2 segment (13.5%). Thrombus characteristics differed significantly across occlusion locations: ICA thrombi were significantly longer and had lower CBS than M1 and M2 thrombi. Additionally, M1 segment thrombi had significantly lower CBS than M2 segment thrombi (P < 0.05). No significant differences in thrombus perviousness were observed across occlusion locations.

Table 2 Thrombus imaging characteristics at different occlusion sites.

Statistical correlations were observed between some thrombus characteristics: CBS was significantly correlated with occlusion location (r = 0.71, P < 0.05), DT (r = 0.68, P < 0.05), and thrombus length (r = −0.70, P < 0.05). These correlations are illustrated in Fig. 2.

Fig. 2 Thrombus characteristic correlation matrix. CBS, clot burden score; DT, distance from ICA terminus to the thrombus.

Clustering analysis

Using k-means clustering analysis, the optimal number of clusters was determined to be three based on the Elbow method (Supplemental Material Figure S1). This method identified the Elbow point at k = 3, where within-cluster sum of squares (WCSS) began to plateau, indicating a balance between cluster compactness and separation. Data standardization ensured comparability, and cluster assignments were confirmed to be robust through repeated analyses with different initialization seeds. Based on 27 clinical variables and imaging characteristics, 520 patients were divided into three clusters (Cluster 1: 285 cases, Cluster 2: 46 cases, Cluster 3: 189 cases). The distribution of these clusters is visualized in Fig. 3.

Fig. 3 Unsupervised clustering analysis of clinical data and thrombus imaging features.

Cluster 1 was characterized by thrombi predominantly located in the MCA M1 segment, shorter thrombus lengths, and higher CBS. This cluster exhibited better clinical outcomes, with 78.5% of patients (224/285) achieving a favorable prognosis. Cluster 2 mainly consisted of male patients with a history of smoking and drinking, with no significant differences in other characteristics. Cluster 3 was characterized by older age, higher NIHSS scores, and a higher incidence of hemorrhagic transformation. Thrombi in this cluster were predominantly located in the ICA, had the longest thrombus lengths, a higher proportion of high-density signs on NCCT, and the lowest CBS. In this cluster, 43.3% of patients (82/189) experienced poor clinical outcomes.

Significant differences were found among the three clusters in multiple clinical variables (age, sex, hypertension, atrial fibrillation, history of stroke or TIA, and TOAST classification) and thrombus imaging characteristics (responsible vessel, DT, thrombus length, NCCT density, high-density signs, and CBS) (P < 0.05). The distribution and proportions of thrombus imaging characteristics in each cluster are shown in Fig. 4. In terms of treatment outcomes, recanalization rates (86.3% [246/285] vs. 76.1% [35/46] vs. 85.2% [161/189], P = 0.012) and hemorrhagic transformation rates (38.2% [109/285] vs. 21.7% [10/46] vs. 55.0% [104/189], P = 0.001) also showed statistically significant differences.

Fig. 4 Thrombus imaging features across clusters. (A, B) Bar charts of responsible vessel and high-density vascular signs across clusters. In Fig. 4A, Responsible_vessel values are coded as follows: 0 = internal carotid artery (ICA), 1 = middle cerebral artery M1 segment (MCA-M1), 2 = middle cerebral artery M2 segment (MCA-M2). In Fig. 4B, High_density_sign is coded as 0 = absent and 1 = present. (C–F) Violin plots of distance to thrombus (DT), thrombus length, thrombus CT density, and clot burden score (CBS) across clusters.

Principal component analysis (PCA)

PCA revealed the contribution of each variable to the principal component space. The first two principal components (PC1 and PC2) explained 13.3% and 9.5% of the total variance, respectively. The PCA plot (Fig. 5) shows the distribution of samples along these components, with each point representing a sample and its color indicating cluster membership. The 95% confidence ellipses represent the distribution range of each cluster. Cluster 1 samples were more compact, indicating higher within-cluster homogeneity, whereas Cluster 2 and Cluster 3 samples were more dispersed, indicating greater heterogeneity and relatively lower clustering quality.
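As a reading aid, the explained-variance figures can be related to the eigenvalues of the PCA. The expression below is the standard definition; treating the input as p standardized variables, and taking p = 27 from the clustering input, are assumptions of this sketch, because the text does not state the dimension of the PCA input.

```latex
\[
\mathrm{EVR}_i = \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j},
\qquad
\sum_{j=1}^{p} \lambda_j = p \quad \text{(standardized variables)}
\]
\[
\mathrm{EVR}_1 + \mathrm{EVR}_2 = 13.3\% + 9.5\% = 22.8\%
\]
\[
\text{If } p = 27:\quad
\lambda_1 \approx 0.133 \times 27 \approx 3.59,
\qquad
\lambda_2 \approx 0.095 \times 27 \approx 2.57
\]
```

Under these assumptions, the two-dimensional plot displays 22.8% of the total variance, and the remaining 77.2% lies along components that are not shown.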

Fig. 5 Distribution of samples in the principal component analysis space.

The variables contributing to the principal components are shown in Fig. 6. PC1 was primarily dominated by atrial fibrillation and thrombus perviousness, with loadings of 0.45 and a positive correlation between them (orange arrows). PC2 was mainly driven by CBS, responsible vessel, DT, and thrombus length, with loadings of −0.45, −0.44, −0.41, and 0.41, respectively. This indicates that CBS contributed most to PC2 and was positively correlated with responsible vessel and DT, and negatively correlated with thrombus length.

Fig. 6 Contribution of variables in the principal component space.

Discussion

In analyzing thrombus imaging features in AIS patients undergoing EVT, we identified a degree of continuity and collinearity among the evaluated features, which may compromise the accuracy of downstream predictive models. By integrating clinical data and thrombus imaging characteristics, this study employed unsupervised clustering to classify patients into three subtypes, with the validity of the clustering results confirmed through PCA. PCA further demonstrated significant differences between thrombus features and clinical variables in the principal component space, with atrial fibrillation, thrombus perviousness, and CBS playing pivotal roles in clustering. These differences likely reflect disease heterogeneity, underscoring the need for subtype-specific therapeutic strategies.

Previous studies have predominantly focused on the associations between thrombus imaging features and treatment or outcomes, highlighting the independent prognostic value of CBS, thrombus length, and perviousness, while partially exploring interrelationships among thrombus features4,7,21,22,23. However, they often overlooked potential collinearity, which can lead to inconsistent identification of key predictors, biased regression coefficients, and reduced statistical significance. Our findings revealed varying degrees of correlation among thrombus features. CBS, as a critical measure of thrombus burden, emerged as a central element within the network of thrombus features. Specifically, CBS was strongly correlated with the responsible vessel (r = 0.71) and thrombus length (r = −0.70) and moderately correlated with DT (r = 0.68). These results suggest a complex and diverse distribution of thrombus features across different occlusion sites, emphasizing the necessity of a comprehensive evaluation of thrombus characteristics to identify core variables with independent predictive value, thereby optimizing prognostic models.

Using unsupervised clustering, patients were categorized into three subtypes, with the validity of clustering further supported by PCA. The analysis revealed significant differences in clinical and imaging features among subtypes, with potential clinical implications. For instance, Cluster 1 was characterized by shorter thrombus length and higher CBS, with thrombi predominantly located in the MCA M1 segment, and a favorable prognosis (78.5% of patients achieved good outcomes). Cluster 2 mainly comprised male smokers and drinkers, with no significant differences in clinical variables. Cluster 3, on the other hand, exhibited the longest thrombus length, lowest CBS, older age, higher NIHSS scores, and poorer prognosis (43.3% of patients with unfavorable outcomes). These subtype-specific differences underscore the heterogeneity in thrombus and patient characteristics and suggest that distinct pathological mechanisms may underlie each subtype. From a clinical perspective, these clusters provide a promising framework for refining patient stratification and individualizing stroke management. For example, patients in Cluster 1 may respond well to standard EVT protocols without the need for intensified intervention. In contrast, the pronounced features of Cluster 3—such as long thrombi and low CBS—indicate the need for more aggressive preoperative imaging, close perioperative monitoring, or even future trials evaluating adjunctive pharmacologic therapies or enhanced EVT techniques. While Cluster 2 did not show clear associations with outcomes, its demographic and behavioral profile may warrant more research into prevention strategies and modifiable risk factor control. This stratification model may also inform the design of future clinical trials tailored to thrombus phenotype and risk profiles, ultimately guiding subtype-specific treatment decisions.

PCA further revealed how specific variables contributed to clustering structure. Atrial fibrillation and thrombus perviousness had the highest loadings in PC1, indicating their dominant role in separating clusters in the component space24,25. In PC2, CBS score, responsible vessel, DT, and thrombus length were the most influential variables, with CBS contributing most and demonstrating a positive correlation with DT and responsible vessel, and a negative correlation with thrombus length. These findings suggest that both clinical (e.g., atrial fibrillation) and imaging-based features (e.g., perviousness, CBS) jointly influence subtype distribution, further reinforcing the importance of integrating multimodal data in stroke phenotyping. The combined use of clustering and PCA provides a multidimensional framework that may enhance precision in outcome prediction and treatment planning.

To further elucidate the potential biological underpinnings of these imaging markers, it is important to consider how specific imaging features correlate with thrombus composition and stroke etiology. Clots with high red blood cell (RBC) content are generally associated with higher attenuation on NCCT and enhanced visibility of the hyperdense artery sign (HAS), and are often linked to large artery atherosclerosis (LAA) or non-cardioembolic sources24,25,26. These RBC-rich clots tend to have lower static friction and are more susceptible to mechanical thrombectomy, requiring fewer passes and resulting in faster recanalization. In contrast, fibrin- or platelet-rich clots are more resistant to both intravenous thrombolysis and endovascular treatment, and are more commonly associated with cardioembolic strokes. They also show lower perviousness on imaging due to tighter structure and reduced permeability to contrast agents. Moreover, CBS and thrombus length have been reported to negatively correlate with RBC content, which further supports the notion that imaging markers may serve as surrogates for histological composition. Thrombus perviousness, defined as the difference in attenuation between CTA and NCCT, has been linked to increased RBC content and better recanalization outcomes, potentially due to increased porosity that facilitates clot fragmentation27. These imaging-histology associations provide a biological rationale for the key imaging features identified in our PCA and clustering analyses and support their relevance in stratifying AIS patients by thrombus phenotype and likely treatment response.

This study has several limitations. First, imaging data were obtained from different scanning systems across multiple centers, which may introduce heterogeneity in measurements. However, this variability also enhances the generalizability of our findings. Second, thrombus length was measured from single-phase CTA images; future studies could incorporate multiphase CTA to improve spatial delineation and temporal accuracy. Third, although the clustering model was developed using a multicenter dataset, it has not yet been externally validated. We plan to further confirm its generalizability and clinical utility in future studies using independent prospective cohorts. Finally, the current analysis focused primarily on thrombus-specific imaging markers; broader inclusion of perfusion parameters, ASPECTS scores, or collateral grading may further enrich subtype definitions in future studies.

In conclusion, by integrating clinical and thrombus imaging features and employing clustering and PCA, this study provides a multidimensional risk assessment framework. These findings advance the stratification of AIS patients, support the development of thrombus imaging-based prognostic models, and inform the design of personalized therapeutic strategies.