Background

Stroke is a leading global cause of mortality and long-term disability and disproportionately affects developing nations, which account for 75.2% of stroke-related fatalities and 81.0% of associated disability-adjusted life years1. In China, stroke has emerged as a critical public health challenge, with ischemic stroke (IS) constituting approximately 87% of all stroke cases2. The escalating burden of IS is driven by demographic shifts, including population aging and urbanization, compounded by the persistent prevalence of cardiovascular risk factors such as hypertension, hyperlipidemia, and diabetes mellitus3. Current diagnostic paradigms rely heavily on neuroimaging modalities, particularly computed tomography (CT)4 and magnetic resonance imaging (MRI)5. Therapeutic interventions remain limited to intravenous thrombolysis and endovascular thrombectomy, which have narrow therapeutic windows and carry substantial risks of hemorrhagic complications, including intracranial hemorrhage and gastrointestinal bleeding2,6. These limitations underscore the urgent need for novel diagnostic biomarkers and targeted therapeutic strategies.

Emerging evidence underscores the pleiotropic effects of traditional cardiovascular risk factors, including hypertension, hyperlipidemia, and hyperglycemia, which are associated with diverse pathologies ranging from malignancies7,8,9 to ischemic cardiovascular and cerebrovascular diseases10,11. Concurrently, the role of immune dysregulation in IS pathogenesis has garnered increasing attention. Krishnan et al. demonstrated that inflammatory cell infiltration disrupts the central nervous system’s immune microenvironment, exacerbating cerebral ischemia12. Thapa et al. further implicated chronic antigen presentation in driving adaptive immune responses that contribute to post-stroke morbidity13. These findings align with the growing therapeutic promise of immunomodulation, as evidenced by its success in oncology14 and cardiovascular disease management15. Notably, immune-regulatory strategies have shown potential to mitigate IS progression, restore neurological function, and improve clinical outcomes16,17, positioning immunotherapy as a viable adjunct to conventional therapies.

Traditional epigenetic modifications encompass reversible alterations of proteins (histones) and DNA, enabling the regulation of gene expression without altering the underlying genetic sequence18. In recent years, RNA modifications have emerged as the third layer of epigenetic regulation, playing a pivotal role in modulating RNA metabolism and processing19. Among the RNA modifications identified, including N1-methyladenosine (m1A), N6-methyladenosine (m6A), and 5-methylcytosine (m5C), m6A stands out as the most abundant and dynamically regulated modification in eukaryotic cells20. This reversible process is orchestrated by a diverse set of m6A regulators, including methyltransferases (writers), demethylases (erasers), and binding proteins (readers)21. Recent studies have underscored the critical role of m6A in immune response regulation. For example, Han et al. demonstrated that the reader protein YTHDF1 promotes the translation of lysosomal cathepsins in dendritic cells, which limits cross-presentation of tumor neoantigens and cross-priming of CD8+ T cells and thereby facilitates tumor immune evasion22. Similarly, Wang et al. revealed that the reader protein HNRNPA2B1 initiates innate immune responses by recognizing viral DNA and facilitating m6A modification during viral infections23. Additionally, Li et al. reported that the deletion of the writer protein METTL3 in T cells disrupts their homeostatic differentiation, highlighting the importance of m6A in immune cell function24. Despite these advances, the precise role and mechanisms of m6A modification in the immune microenvironment of IS remain poorly understood. Therefore, comprehensive investigations into the impact of m6A on IS immune microenvironment dynamics and the identification of key immune-related genes are urgently needed. Such efforts will provide novel insights into the pathogenesis of IS from an epitranscriptomic perspective.

Results

Data preprocessing

The entire analytical workflow is illustrated in Fig. 1. Following data normalization and removal of missing values, standardized gene expression profiles were obtained for the GSE16561 and GSE58294 datasets. The integrated expression profile, comprising 16,138 unique gene symbols across 155 samples, was generated after merging the datasets and correcting for inter-batch variations (Supplementary Table 1).

Fig. 1

Flow chart of the analysis. m6A: RNA N6-methyladenosine; IS: ischemic stroke; ssGSEA: single-sample gene-set enrichment analysis; GSVA: gene set variation analysis; WGCNA: weighted gene co-expression network analysis; LASSO: least absolute shrinkage and selection operator; SVM-RFE: support vector machine-recursive feature elimination; ABCA1: ATP binding cassette subfamily A member 1; PRRG4: proline rich and Gla domain 4; CPD: carboxypeptidase D; C19orf24: chromosome 19 open reading frame 24; WDR46: WD repeat domain 46.

Landscape of m6A regulators between healthy and IS samples

Twelve m6A regulators were analyzed: four writers (methyltransferase 3, N6-adenosine-methyltransferase complex catalytic subunit [METTL3]; RNA-binding motif protein 15B [RBM15B]; WT1 associated protein [WTAP]; and Cbl proto-oncogene like 1 [CBLL1]), six readers (heterogeneous nuclear ribonucleoprotein C [HNRNPC]; heterogeneous nuclear ribonucleoprotein A2/B1 [HNRNPA2B1]; YTH N6-methyladenosine RNA binding protein F1 [YTHDF1], F2 [YTHDF2], and F3 [YTHDF3]; and fragile X messenger ribonucleoprotein 1 [FMR1]), and two erasers (alkB homolog 5, RNA demethylase [ALKBH5] and FTO alpha-ketoglutarate dependent dioxygenase [FTO]). Differential expression analysis revealed significant upregulation of WTAP and YTHDF3 in IS samples compared with controls, whereas METTL3, RBM15B, CBLL1, YTHDF1, ALKBH5, and HNRNPA2B1 were downregulated (Fig. 2A, B).

Fig. 2

The expression landscape of m6A RNA methylation regulators in IS and the construction of a random forest model to identify key m6A regulators. The box plot (A) and heatmap plot (B) reveal significant differences in the expression of eight m6A regulators between healthy and IS samples. (C), The influence of the number of decision trees on the error rate is demonstrated, with the x-axis representing the number of decision trees and the y-axis indicating the error rate. The error rate remains relatively stable at approximately 450 decision trees. (D), Results of the Gini coefficient method in the random forest classifier. The x-axis indicates the genetic variable, and the y-axis represents the importance index. *p < 0.05; **p < 0.01; ***p < 0.001.

Random forest screening for key m6A regulators

A cyclic random forest model evaluated variable numbers (1–20) and tree counts (1–500), with 450 trees selected as optimal due to stable error rates (Fig. 2C). Eight regulators (WTAP, YTHDF3, METTL3, RBM15B, CBLL1, YTHDF1, ALKBH5, HNRNPA2B1) with importance values > 2 were prioritized (Fig. 2D).

Nomogram model for IS diagnosis

The constructed predictive nomogram, which integrated expression profiles of eight key m6A regulators, effectively stratified IS risk (Fig. 3A). Validation analyses consistently demonstrated superior discriminative capacity in differentiating IS patients from controls: calibration curves revealed optimal consistency between nomogram-predicted risks and actual observations (Fig. 3B); decision curve analysis (DCA) coupled with clinical impact curves highlighted substantial clinical applicability (Fig. 3C, D). Particularly noteworthy was the receiver operating characteristic (ROC) analysis showing exceptional diagnostic accuracy, with an AUC reaching 0.912 (Fig. 3E). These multimodal validations collectively confirmed the model’s precision and clinical implementation potential for IS risk assessment.

Fig. 3

Establishment and validation of a predictive nomogram for IS based on eight m6A regulators. (A) The nomogram of the model. (B) Calibration plot of the model's performance; the diagonal dotted line represents a perfect prediction by an ideal model, and a closer fit to this line indicates better prediction. (C) Decision curve analysis (DCA) evaluating the clinical benefit of the model. (D) Clinical impact curve analysis supporting the clinical applicability of the nomogram. (E) Receiver operating characteristic (ROC) analysis of the model's ability to distinguish IS patients from healthy subjects.

Distinct m6A methylation patterns in IS

Unsupervised clustering based on the eight regulators identified two m6A modification subtypes: Cluster A (38 samples) and Cluster B (70 samples) (Fig. 4A-C, Supplementary Table 2). Principal component analysis (PCA) confirmed clear transcriptomic separation between subtypes (Fig. 4D).

Fig. 4

Unsupervised clustering of the eight key m6A regulators revealed the existence of two distinct m6A modification pattern subtypes in IS samples. (A) Consensus clustering cumulative distribution function (CDF) for k = 2–9. (B) Relative change in the area under the CDF curve for k = 2–9. (C) Heatmap of the matrix of cooccurrence proportions for IS samples. (D) Principal component analysis (PCA) for the transcriptome profiles of two m6A subtypes, showing a remarkable difference in the transcriptome between different modification patterns.

Immune microenvironment characteristics

Comparative immunophenotyping revealed distinct cellular infiltration patterns between IS patients and controls (Fig. 5A). IS cohorts demonstrated significantly elevated infiltration levels of type-17 helper T (Th17) cells, regulatory T cells (Tregs), plasmacytoid dendritic cells (pDCs), neutrophils, natural killer (NK) cells, mast cells, macrophages, immature dendritic cells (iDCs), γδ T cells, eosinophils, and activated dendritic cells (aDCs). Conversely, reduced proportions were observed in follicular helper T (Tfh) cells, monocytes, immature B cells, CD56dim NK cells, CD56bright NK cells, activated CD8+ T cells, and activated B cells. Cluster B demonstrated enhanced infiltration of pDCs, NK cells, γδ T cells, eosinophils, and activated CD4+ T cells, whereas Cluster A exhibited predominant accumulation of monocytes, iDCs, and CD56dim/bright NK cells (Fig. 5B; Supplementary Table 3).

Fig. 5

Evaluation of the variations in immune microenvironment-infiltrating immunocyte abundance between controls and IS patients (A) and across two distinct m6A modification patterns (B).

HALLMARK pathway analysis

Gene set variation analysis (GSVA) revealed significant pathway dysregulation in IS patients compared to controls, including upregulated KRAS signaling, coagulation, and angiogenesis, as well as downregulated UV response, reactive oxygen species pathway, glycolysis, xenobiotic metabolism, inflammatory response, epithelial-mesenchymal transition, mTORC1 signaling, PI3K/Akt/mTOR signaling, complement activation, apical junction signaling, protein secretion, androgen response, late-stage estrogen response, adipogenesis, G2-M checkpoint, IL-6/JAK/STAT3 signaling, cholesterol homeostasis, hypoxia response, and TNF-α signaling via NF-κB (Fig. 6A).

Fig. 6

Investigation of the differences in HALLMARKS pathway enrichment scores between controls and IS patients (A) and within two distinct m6A modification patterns (B).

Furthermore, the m6A cluster-B group exhibited higher enrichment of several pathways compared to the m6A cluster-A group. These pathways included pancreas beta cells, upregulated KRAS signaling, downregulated UV response, E2F targets, mTORC1 signaling, protein secretion, androgen response, G2-M checkpoint, IL-6/JAK/STAT3 signaling, TGF-β signaling, Wnt/β-catenin signaling, and TNF-α signaling via NF-κB (Fig. 6B). Additional results from the GSVA enrichment analysis are provided in Supplementary Table 4.

Immune-associated gene modules

Weighted gene co-expression network analysis (WGCNA) was used to identify gene modules significantly associated with infiltrating immune cells. A soft-thresholding power of 12 was selected based on a correlation coefficient threshold of 0.9 for the scale-free topology fit (Fig. 7A), and a topological overlap matrix (TOM) was then constructed from the correlation and adjacency matrices of the gene expression profiles. The resulting gene cluster tree is depicted in Fig. 7B. Hierarchical average linkage clustering, combined with TOM, was employed to identify gene modules within the gene network, and the corresponding heatmap is shown in Fig. 7C. The dynamic tree-cutting algorithm identified eight distinct gene modules, as illustrated in Fig. 7D. Among these, the black module exhibited significant correlations with IS (r = 0.45, p = 7e-9), m6A cluster-B (r = 0.68, p = 2e-22), NK cells (r = 0.52, p = 5e-12), neutrophils (r = 0.47, p = 1e-09), and pDCs (r = 0.57, p = 7e-15).
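
The soft-thresholding and TOM steps can be illustrated with a minimal Python sketch. It is not the WGCNA R implementation used in this study: it assumes an unsigned network (adjacency equal to the absolute Pearson correlation raised to the power 12) and the standard unsigned TOM formula, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, beta=12):
    """Unsigned adjacency a_ij = |cor(i, j)|^beta with a zero diagonal.

    expr is a list of per-gene expression vectors (one value per sample).
    """
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity; the diagonal is zero
    tom = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(g) if u != i and u != j)
            t = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = t
    return tom
```

Raising correlations to a high power suppresses weak co-expression: a correlation of 0.45 contributes an adjacency below 0.0001, so only strongly co-expressed gene pairs shape the modules.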

Fig. 7

Weighted gene co-expression network analysis. (A), Analysis of network topology under various soft-thresholding powers. (B), Visualization of gene clustering dendrograms. (C), Presentation of correlations among the indicated modules. (D), Associations between modules and m6A cluster B and several infiltrating immune cells.

Key gene identification via machine learning

The black module exhibited a significant correlation with gene significance (r = 0.67, p = 2.6e-43; Fig. 8A), and a total of 322 genes within this module were identified (Supplementary Table 5). Using the least absolute shrinkage and selection operator (LASSO) regression algorithm (Fig. 8B) and the support vector machine-recursive feature elimination (SVM-RFE) algorithm (Fig. 8C), 24 and 13 key genes were identified, respectively. Notably, seven overlapping genes—ATP binding cassette subfamily A member 1 [ABCA1], carboxypeptidase D [CPD], proline rich and Gla domain 4 [PRRG4], WD repeat domain 46 [WDR46], ANTXR cell adhesion molecule 2 [ANTXR2], chromosome 19 open reading frame 24 [C19orf24], and plexin domain containing 2 [PLXDC2]—were identified by both machine learning methods (Fig. 8D). A comprehensive list of gene symbols for the key genes identified by these algorithms is provided in Supplementary Table 6.

Fig. 8

Associations between gene significance and module membership, and identification of key genes of IS by machine learning. (A) Scatterplot of the correlation between gene significance and module membership in the black module. (B) Key genes identified by LASSO regression. (C) Key genes identified by the SVM-RFE algorithm. (D) Venn diagram of the seven overlapping key genes identified by the two machine learning methods.

Internal verification of key gene expression

The expression levels of ABCA1, PRRG4, PLXDC2, ANTXR2, and CPD were upregulated in IS patients compared to controls, whereas C19orf24 and WDR46 were downregulated (Fig. 9A). Additionally, the m6A cluster-B group exhibited higher expression of CPD, PRRG4, and ABCA1 but lower levels of C19orf24 and WDR46 compared to the m6A cluster-A group (Fig. 9B).

Fig. 9

The expression differences of 7 key genes between controls and IS patients (A) and among two distinct m6A modification patterns (B). *p < 0.05; **p < 0.01; ***p < 0.001.

External validation across six independent cohorts using meta-analysis

The meta-analysis, designed to assess generalizability beyond the discovery cohorts, identified WDR46 as a protective factor against IS susceptibility (pooled odds ratio [OR] = 0.74, 95% confidence interval [CI]: 0.57–0.97; Fig. 10A). Conversely, CPD (OR = 1.46, 95% CI: 1.01–2.10) and ABCA1 (OR = 1.57, 95% CI: 1.12–2.22) were associated with increased risk across six independent cohorts (Fig. 10B, C; p < 0.05 for all). These effect sizes, derived through multi-cohort synthesis with covariate adjustment, support the cross-platform consistency of these biomarkers. The associations for C19orf24 and PRRG4 were not significant (Supplementary Fig. 1A, B).
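
How per-cohort estimates are combined into a pooled OR under a random-effects (RE) model can be sketched as follows. This is only an illustration: it assumes the DerSimonian-Laird estimator of between-study variance, whereas the estimator actually used for the RE model in Fig. 10 is not stated here, and the function name and inputs (per-cohort log odds ratios and their standard errors) are hypothetical.

```python
import math


def random_effects_pool(log_ors, ses, z=1.959964):
    """Pool per-study log odds ratios with a DerSimonian-Laird random-effects model.

    log_ors: per-study log odds ratios; ses: their standard errors.
    Returns the pooled OR, its 95% CI, and the between-study variance tau2.
    """
    w = [1.0 / s ** 2 for s in ses]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]          # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - z * se),
        "ci_high": math.exp(pooled + z * se),
        "tau2": tau2,
    }
```

A pooled 95% CI that excludes 1 (as reported for WDR46, CPD, and ABCA1) corresponds to p < 0.05 under this normal approximation.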

Fig. 10

Meta-analysis of three key genes including WDR46, CPD and ABCA1. Representative graphs showing the effect sizes of WDR46 (A), CPD (B), and ABCA1 (C) on the risk of IS across six distinct datasets. The RE model represents the result of the meta-analysis.

Correlation analysis

As shown in Fig. 11A, CPD and ABCA1 exhibited significant positive correlations with pDCs, neutrophils, NK T cells, macrophages, myeloid-derived suppressor cells (MDSCs), γδ T cells, and eosinophils, whereas WDR46 correlated negatively with these immune cell types. The opposite pattern was observed for CD56bright NK cells, CD56dim NK cells, Tfh cells, activated CD8+ T cells, and activated B cells: CPD and ABCA1 correlated negatively with these populations, whereas WDR46 correlated positively.

Fig. 11

Heatmap depicting the correlations between WDR46, CPD and ABCA1 genes and infiltrating immune cells (A), and 50 HALLMARKS pathways (B). *p < 0.05; **p < 0.01; ***p < 0.001.

In Fig. 11B, CPD and ABCA1 demonstrated significant positive associations with pathways such as androgen response, complement activation, G2/M checkpoint, IL-6/JAK/STAT3 signaling, inflammatory response, upregulated KRAS signaling, mTORC1 signaling, protein secretion, TNF-α signaling via NF-κB, and downregulated UV response. Conversely, WDR46 exhibited negative correlations with these pathways. Additionally, CPD and ABCA1 showed significant negative associations with DNA repair, fatty acid metabolism, MYC targets V1 and V2, oxidative phosphorylation, p53 pathway, peroxisome function, unfolded protein response, and upregulated UV response, whereas WDR46 displayed significant positive associations with these pathways.

As illustrated in Supplementary Fig. 2, the interplay between the three hub genes (CPD, ABCA1, and WDR46) and eight key m6A regulatory factors was elucidated. Specifically, WDR46 showed significant negative correlations with WTAP and YTHDF3, while CPD and ABCA1 exhibited significant negative correlations with METTL3 and significant positive correlations with YTHDF1, WTAP, and YTHDF3.

Validation and diagnostic evaluation of key gene expression in clinical specimens

Differential gene expression profiles emerged between IS patients and healthy controls (Fig. 12). Specifically, WDR46 expression was significantly downregulated in patients (p < 0.01), whereas ABCA1 and CPD exhibited marked upregulation compared to healthy subjects (p < 0.01 for both). In contrast, no significant differences were observed in C19orf24 and PRRG4 expression between groups (Fig. 12A). Diagnostic performance analysis revealed robust predictive capacity for the differentially expressed genes: WDR46 achieved an AUC of 0.82 (95% CI 0.74–0.89), with ABCA1 and CPD showing superior discriminative power at AUC 0.88 (95% CI 0.82–0.94) and 0.90 (95% CI 0.84–0.96), respectively (Figs. 12B-D).

Fig. 12

Validation and diagnostic evaluation of key gene expression in clinical specimens. (A), Quantitative analysis of key gene expression profiles via RT-qPCR. (B-D), ROC curves illustrating the diagnostic performance of WDR46, ABCA1, and CPD in discriminating IS cases from controls. *p < 0.01.

Discussion

Stroke, a cerebrovascular disorder characterized by disrupted cerebral blood circulation, manifests as either ischemic or hemorrhagic subtypes. Ischemic stroke (IS), accounting for 87% of stroke cases, primarily arises from atherosclerosis and cardioembolism. While atherosclerotic plaque formation and thrombogenesis in cerebral arteries often remain clinically silent, abrupt vascular occlusion induces neuronal dysfunction through ischemia-hypoxia mechanisms, culminating in structural damage and neurological deficits25. Emerging evidence underscores the pivotal role of immune dysregulation in IS pathophysiology. Acute-phase infiltration of innate immune cells into the brain parenchyma exacerbates ischemic injury via inflammatory cascades26. Activated neutrophils, for instance, secrete matrix metalloproteinases (MMPs), inducible nitric oxide synthase (iNOS), and reactive oxygen species (ROS), which collectively disrupt the blood-brain barrier and promote cytotoxic cell death, thereby impeding neurorepair27. Concurrently, circulating monocytes transmigrate into the brain, differentiating into tissue-resident macrophages with dual functional roles: pro-inflammatory M1 macrophages exacerbate neuronal damage through cytotoxic mediator release, while anti-inflammatory M2 macrophages facilitate debris clearance and neuroprotection via trophic factor secretion28,29. Notably, recent studies highlight the regulatory significance of m6A RNA modification in shaping the IS immune microenvironment30, though its precise mechanistic underpinnings remain elusive. To address this knowledge gap, we systematically investigated how m6A-driven epitranscriptomic remodeling governs immune cell infiltration and immune-related gene expression in IS.

Our study yielded several pivotal discoveries that collectively advance the understanding of m6A-mediated immune dysregulation in IS. First, comparative analysis revealed significant dysregulation of eight m6A regulators in IS patients versus controls: METTL3, RBM15B, CBLL1, YTHDF1, HNRNPA2B1, and ALKBH5 were downregulated, whereas WTAP and YTHDF3 were upregulated. To translate these molecular signatures into clinical utility, we constructed a predictive nomogram integrating these regulators, which demonstrated robust diagnostic accuracy (AUC = 0.912). This model not only supported the clinical relevance of m6A dysregulation in IS but also served as a framework to prioritize regulators for mechanistic exploration. Second, unsupervised clustering stratified IS patients into two distinct m6A modification subtypes (Clusters A and B). Cluster B exhibited heightened infiltration of pDCs, NK cells, γδ T cells, and eosinophils, coupled with reduced monocyte and CD56dim/bright NK cell populations compared to Cluster A. This immunophenotypic divergence underscores m6A modification as a critical modulator of immune microenvironment heterogeneity in IS. Such molecular subtyping aligns with precision medicine paradigms in oncology31,32, suggesting potential utility in guiding immunotherapeutic strategies for IS patients with divergent immune profiles.

Recent advances in bioinformatics have enabled systematic identification of immune-related genes implicated in IS pathogenesis. Li et al. demonstrated that SLAMF1, IL7R, and NCF4 serve as potential therapeutic targets to enhance post-IS functional and histological recovery, with neutrophils emerging as critical cellular mediators for pharmacological intervention33. Ren et al. further identified a panel of inflammation-associated genes (TNFSF10, ID1, PAQR8, OSR2, PDK4, PEX11B, TNIP1, FFAR2, JUN) exhibiting diagnostic utility for IS, while elucidating dynamic shifts in lymphocyte, monocyte, and neutrophil populations during disease progression34. Complementary work by Wei et al. linked IL7R, ITK, SOD1, CD3D, LEF1, FBL, MAF, DNMT1, and SLAMF1 to molecular subtypes and immune dysregulation in IS35. Concurrently, Li et al. identified a robust correlation between elevated CLEC4D expression and neutrophil infiltration dynamics, suggesting its candidacy as a therapeutic target for immunoneuroprotective strategies in post-reperfusion injury models of IS36. Collectively, these studies have both advanced our understanding of immune cell infiltration patterns and uncovered novel molecular markers in IS. However, the regulatory role of m6A RNA modification in shaping immune microenvironment dynamics remains poorly characterized, and mechanistic connections between m6A regulators and immune-related genes are yet to be comprehensively explored. To address these gaps, we employed an integrative approach combining WGCNA with machine learning algorithms, nominating C19orf24, PLXDC2, CPD, PRRG4, ANTXR2, ABCA1, and WDR46 as pivotal immune-related genes in IS pathogenesis. Differential expression analysis revealed marked disparities in C19orf24, CPD, PRRG4, ABCA1, and WDR46 levels between IS patients and controls, as well as across m6A modification subtypes. 

Meta-analysis integrating six independent cohorts demonstrated robust associations between the candidate genes and IS susceptibility. Specifically, WDR46 (OR = 0.74, 95% CI: 0.57–0.97) exhibited a protective effect against IS development, whereas CPD (OR = 1.46, 95% CI: 1.01–2.10) and ABCA1 (OR = 1.57, 95% CI: 1.12–2.22) were significantly associated with elevated disease risk. These findings were further corroborated by RT-qPCR validation in clinical specimens, where ABCA1 and CPD expression levels were markedly upregulated, and WDR46 was significantly downregulated in IS patients compared to controls. Diagnostic performance assessment via ROC curve analysis revealed strong discriminative power for these biomarkers, with AUC values of 0.88 (95% CI: 0.82–0.94) for ABCA1, 0.90 (95% CI: 0.84–0.96) for CPD, and 0.82 (95% CI: 0.74–0.89) for WDR46, underscoring their potential utility in IS diagnosis. These findings collectively indicate that ABCA1, CPD, and WDR46 hold significant potential as novel molecular biomarkers for the diagnosis and therapeutic management of IS.

Comprehensive characterization of the immune microenvironment in IS revealed significant alterations in distinct immune cell subsets. IS cohorts exhibited significantly elevated proportions of multiple subsets, including Th17 cells, Tregs, pDCs, neutrophils, NK cells, mast cells, macrophages, iDCs, γδ T cells, eosinophils, and aDCs, alongside reduced infiltration of Tfh cells, monocytes, immature B cells, CD56dim NK cells, CD56bright NK cells, activated CD8+ T cells, and activated B cells. Intriguingly, CPD and ABCA1 exhibited positive correlations with pro-inflammatory immune subsets (including pDCs, neutrophils, NK T cells, macrophages, MDSCs, γδ T cells, and eosinophils), whereas WDR46 inversely correlated with these populations. Conversely, WDR46 showed positive associations with CD56bright/dim NK cells, Tfh cells, activated CD8+ T cells, and activated B cells, suggesting divergent immunomodulatory roles for these genes in shaping IS pathophysiology. Regulatory network analysis further uncovered intricate interplay between these genes and the m6A machinery. WDR46 exhibited significant negative correlations with the m6A writer WTAP and the reader YTHDF3, whereas CPD and ABCA1 inversely correlated with the writer METTL3 but positively associated with YTHDF1, WTAP, and YTHDF3. Pathway correlation analysis aligned these findings with functional outcomes: CPD and ABCA1 were positively linked to pro-inflammatory cascades, including the IL-6/JAK/STAT3 and TNF-α/NF-κB pathways, whereas WDR46 was associated with reparative processes, including oxidative phosphorylation and DNA repair. These findings collectively position CPD, ABCA1, and WDR46 as pivotal nodes bridging m6A epitranscriptomic remodeling to immune microenvironment dysregulation in IS. Future studies interrogating their mechanistic roles, such as m6A-dependent post-transcriptional regulation of these genes or their influence on immune cell differentiation, are warranted to advance therapeutic targeting.

Although this study provides novel insights into m6A-mediated immune dysregulation in IS, several methodological limitations warrant acknowledgment. First, inherent technical limitations in cross-platform microarray data harmonization led to the exclusion of 14 m6A regulators due to undetectable expression signals, potentially omitting critical epitranscriptomic players. Future investigations employing platform-agnostic sequencing technologies are required to comprehensively map m6A dynamics in IS. Second, while bulk transcriptomic deconvolution enabled immune cell quantification, this approach lacks the resolution to discern spatially and temporally heterogeneous immune subpopulations. Single-cell RNA sequencing coupled with spatial transcriptomics could refine our understanding of niche-specific immune interactions in the peri-infarct microenvironment. Third, the public datasets lacked granular clinical metadata (including stroke severity scales, infarct volume, and comorbid conditions such as diabetes and hypertension), which precluded assessment of the clinical relevance of m6A modification patterns in relation to patient prognosis or therapeutic response. Finally, although meta-analysis and RT-qPCR validation associated ABCA1, CPD, and WDR46 with IS susceptibility, mechanistic studies combining in vitro and in vivo approaches are still needed to elucidate how m6A regulators are linked to immune-related gene dysregulation and the pathogenesis of IS.

Conclusions

This work implicates m6A modification as a key regulator of immune microenvironment dynamics in IS. The identification of ABCA1, CPD, and WDR46 as m6A-associated immune hubs advances our understanding of IS pathophysiology and highlights their potential as diagnostic biomarkers and therapeutic targets. Future studies should prioritize translational validation of these findings through preclinical models and clinical cohorts to accelerate the development of precision immunotherapies for IS.

Materials and methods

Data acquisition and preprocessing

Gene expression datasets GSE16561 (24 controls, 39 ischemic stroke samples; GPL6883 platform) and GSE58294 (23 controls, 69 IS samples; GPL570 platform) were acquired from the Gene Expression Omnibus (GEO) repository. Processing comprised sequential steps: (1) Platform annotation files were used to convert probe identifiers to gene symbols, generating standardized expression matrices with gene symbols as row identifiers and sample names as column headers; (2) Comprehensive outlier assessment via principal component analysis confirmed retention of all 47 control and 108 IS samples; (3) Genes exhibiting substantial missing values were excluded to enhance data integrity; (4) Expression values from multiple probes corresponding to identical genes were averaged to resolve redundancy. Subsequent normalization implemented platform-specific approaches: quantile normalization via the limma package37 for Illumina arrays (GPL6883) and Robust Multi-array Average (RMA) processing for Affymetrix arrays (GPL570), with exclusion of multi-gene mapped probes. Technical batch effects were addressed through ComBat adjustment (sva package), yielding a consolidated expression matrix (Supplementary Fig. 3). Additional validation cohorts (GSE22255, GSE66724, GSE180470, GSE199819) were similarly processed for meta-analysis integration.
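
The probe-to-gene collapsing in step (4) and the quantile normalization step can be illustrated with a short Python sketch. It is not the R pipeline used in this study: the function names are hypothetical, the " /// " separator for multi-gene probes is an assumption about the annotation file format, ties are broken by input order, and batch correction with ComBat is not reproduced.

```python
from collections import defaultdict


def collapse_probes(probe_values, probe_to_gene):
    """Average probes that map to the same gene symbol.

    probe_values: {probe_id: [value per sample]}
    probe_to_gene: {probe_id: gene symbol}; unmapped probes and probes
    annotated with several genes (assumed to contain "///") are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe, "")
        if not gene or "///" in gene:
            continue
        grouped[gene].append(values)
    return {gene: [sum(col) / len(col) for col in zip(*rows)]
            for gene, rows in grouped.items()}


def quantile_normalize(matrix):
    """Give every sample (column) the same distribution of values.

    matrix: list of genes, each a list of per-sample values.
    Each value is replaced by the mean, across samples, of the values
    sharing its within-sample rank.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[r] for sc in sorted_cols) / n_samples
                  for r in range(n_genes)]
    out = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            out[g][s] = rank_means[rank]
    return out
```

After normalization every sample contains the same set of values, so remaining between-sample differences reflect gene ranks, not array-wide intensity shifts.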

Identification of key m6A regulators

The expression status of 12 m6A regulators, selected from an initial panel of 26 candidates (writers, erasers, readers) curated from established literature21,38, was compared between healthy individuals and IS patients using the Wilcoxon rank-sum test. Fourteen regulators were excluded due to undetectable expression signals in the merged microarray datasets (GSE16561 and GSE58294), a limitation attributed to cross-platform technical biases such as probe sequence mismatches and annotation discrepancies. The retained regulators (METTL3, WTAP, RBM15B, CBLL1, YTHDF1, YTHDF2, YTHDF3, HNRNPA2B1, HNRNPC, FMR1, ALKBH5, FTO) demonstrated consistent detectability across cohorts. Subsequently, a random forest model (randomForest package) with 450 decision trees and node size = 4 was trained to prioritize key regulators. Variable importance was quantified via the Gini coefficient method, and factors with importance scores > 2 were considered biologically significant and used for nomogram construction39.
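
For readers unfamiliar with the Wilcoxon rank-sum test, the comparison of one regulator between two groups can be sketched in Python as follows. This is a simplified illustration and not the R routine used here: it assumes the tie-corrected normal approximation without continuity correction, so p-values for small samples will differ from exact ones, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return (U statistic of x, two-sided p-value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))
```

In this study the two groups would be the 47 control and 108 IS samples, with the test repeated for each of the 12 regulators.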

Nomogram development and validation

The nomogram, a well-established method for visualizing logistic regression prediction models, was developed using the “rms” package in R. This model incorporated the expression profiles of eight key m6A RNA methylation regulators as independent variables to predict the binary outcome (IS vs. non-IS). The resulting nomogram provides a multivariate tool for quantifying the cumulative contribution of these regulators to IS risk stratification. The model’s clinical validity was rigorously assessed through three complementary evaluation frameworks: calibration curves for predictive consistency assessment, DCA for clinical utility evaluation, and ROC curve analysis. Diagnostic efficacy was objectively measured by calculating the AUC, which provides a standardized metric of discriminative accuracy in risk prediction.
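The quantity plotted in decision curve analysis is the net benefit at a given risk threshold. The sketch below computes it from predicted probabilities using the standard formula, net benefit = TP/n − (FP/n) × pt/(1 − pt). The outcome labels, the predicted risks, and the function names are hypothetical and do not come from the study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of classifying as IS everyone with predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: classify every individual as IS."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Invented outcomes (1 = IS, 0 = control) and model-predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = treat_all_net_benefit(y_true, threshold=0.5)
```

A model is considered clinically useful at a threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none reference); repeating the calculation over a grid of thresholds yields the decision curve.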

m6A modification subtyping

Unsupervised clustering analysis was performed on the expression profiles of eight key m6A regulators to delineate discrete m6A modification patterns in IS samples. The robustness of cluster classification was evaluated using the consensus clustering algorithm40,41, with the optimal cluster number determined through iterative stability assessment. To ensure reproducibility, the analysis was iterated 1,000 times via the “ConsensusClusterPlus” package in R42. Furthermore, PCA was applied to validate the transcriptional divergence between identified m6A modification subtypes, confirming distinct molecular trajectories driven by these regulators.
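The core quantity in consensus clustering is the consensus matrix: for each pair of samples, the fraction of resampling iterations in which both were drawn and assigned to the same cluster. The sketch below computes this matrix from a list of resampled clusterings. The four toy runs and the function name are hypothetical; the study used 1,000 iterations in ConsensusClusterPlus, and the clustering step itself is not reproduced here.

```python
from itertools import combinations


def consensus_matrix(n_samples, runs):
    """Build a consensus matrix from resampled clusterings.

    runs: list of dicts mapping sample index -> cluster label, containing
    only the samples drawn in that resampling iteration.
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(n_samples):
            if i == j:
                matrix[i][j] = 1.0
            elif sampled[i][j]:
                matrix[i][j] = together[i][j] / sampled[i][j]
    return matrix


# Four toy resampling runs over four samples (labels are arbitrary per run).
runs = [
    {0: "A", 1: "A", 2: "B", 3: "B"},
    {0: "A", 1: "A", 2: "B"},
    {1: "X", 2: "Y", 3: "Y"},
    {0: "A", 1: "B", 3: "B"},
]
cm = consensus_matrix(4, runs)
```

Entries close to 0 or 1 indicate stable assignments; the cluster number is typically chosen where the matrix is closest to this clean block structure.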

Immune microenvironment profiling

To characterize immune microenvironment heterogeneity, single-sample gene-set enrichment analysis (ssGSEA) was conducted to quantify infiltration levels of 28 immune cell subtypes across healthy controls and IS cases, as well as among stratified m6A modification clusters. This computational approach calculates enrichment scores through rank-based gene set projection, reflecting cell-type-specific abundance within individual samples43. Immune gene signatures were obtained from established immunogenomic resources40, with subsequent comparative analyses of immune infiltration scores performed using the Wilcoxon rank-sum test to identify differential cellular distributions between molecularly defined m6A subgroups.

Complementary pathway interrogation was implemented through GSVA to systematically evaluate activity variations across 50 canonical pathways from the HALLMARK collection. The analytical workflow, executed in the R statistical environment using the dedicated “GSVA” package, incorporated molecular signatures retrieved from the publicly accessible MSigDB repository (http://software.broadinstitute.org/gsea/msigdb/index.jsp). Comprehensive pathway analysis encompassed both case-control comparisons and inter-cluster evaluations of m6A modification patterns. To investigate potential interactions between molecular networks and biological processes, Spearman’s rank-order correlation coefficients were computed to delineate associations among core regulatory genes, immune infiltration profiles, and pathway activation states.
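Spearman’s rank-order correlation, used above to relate regulator expression to immune infiltration and pathway scores, is the Pearson correlation of the rank-transformed values. A minimal sketch follows; the expression values and enrichment scores are invented, and the function names are illustrative only.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Invented regulator expression and immune-cell enrichment scores.
gene_expr = [1.2, 2.5, 3.1, 4.8, 5.0]
immune_score = [0.10, 0.35, 0.30, 0.80, 0.95]
rho = spearman(gene_expr, immune_score)
```

Because only ranks enter the calculation, any strictly increasing relationship gives a coefficient of 1 regardless of its shape.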

Gene co-expression network construction

As a robust framework for constructing gene co-expression networks, WGCNA enables integrative analysis of transcriptomic datasets derived from heterogeneous sources within a single species44. The systems biology paradigm was implemented through WGCNA to establish a biologically interpretable interaction network45. Following variance-driven selection criteria (top quartile of expression variability), an adjacency matrix was constructed using optimal soft-thresholding parameters (power = 12) to achieve scale-free topology (R² > 0.85), as detailed in our prior methodological framework46. Network architecture followed established scale-free distribution principles.
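The soft-thresholding step can be sketched as raising the absolute pairwise correlation to the chosen power, a_ij = |cor(x_i, x_j)|^β with β = 12. The sketch assumes an unsigned network, which the text does not specify, and uses invented expression vectors and hypothetical gene labels.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def soft_threshold_adjacency(expr, power=12):
    """Unsigned adjacency a_ij = |cor(i, j)| ** power (assumed network type).

    expr: dict of gene -> expression values across samples.
    """
    genes = list(expr)
    adjacency = {g: {} for g in genes}
    for gi in genes:
        for gj in genes:
            r = 1.0 if gi == gj else pearson(expr[gi], expr[gj])
            adjacency[gi][gj] = abs(r) ** power
    return adjacency


# Invented expression vectors for three hypothetical genes.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with g1
    "g3": [1.0, 3.0, 2.0, 4.0],  # correlation of 0.8 with g1
}
adjacency = soft_threshold_adjacency(expr, power=12)
```

A correlation of 0.8 shrinks to roughly 0.07 at power 12, which shows how the exponent suppresses moderate correlations while leaving strong ones nearly intact.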

Predictive biomarker discovery via machine learning

Two machine learning methods were used for feature selection: (1) LASSO regression47 with cyclic coordinate descent optimization (“glmnet” package) applied to co-expression module-derived gene subsets, and (2) SVM-RFE48 using radial basis function (RBF)-kernel classifiers to compute feature importance metrics. The LASSO implementation employed L1-norm regularization to enhance feature selection stability in high-dimensional space49, whereas SVM-RFE was executed through the RFE class within scikit-learn’s feature selection module50. This recursive elimination approach, applicable to diverse model architectures, iteratively identifies the optimal feature subset through backward selection, thereby maximizing classification performance. Biomarkers selected by both feature-selection approaches underwent discriminative validation via support vector classification (“e1071” package). Non-parametric statistical testing (Wilcoxon rank-sum) confirmed differential expression patterns of these hub genes across clinical classifications (controls vs. IS) and m6A epigenetic subgroups.
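The way L1 regularization removes features can be seen in a minimal cyclic coordinate descent sketch. For brevity it minimizes a squared-error loss, (1/2n)·Σ(y − Xβ)² + λ·Σ|β|, without an intercept; this is a simplifying assumption, since a binary outcome would normally be fitted with a penalized logistic model, and the text does not state which model family was used. The design matrix, response, and penalty value are invented.

```python
def soft_threshold(value, lam):
    """Soft-thresholding operator: shrink toward zero by lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for squared-error LASSO (no intercept).

    X: list of rows (samples x features), assumed centered.
    y: list of responses, assumed centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0:
                continue
            # Correlation of feature j with the partial residual
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta:
                residual = [r - v * delta for r, v in zip(residual, col)]
                beta[j] = new_beta
    return beta


# Invented orthogonal design: y = 2 * feature_1 + 0.1 * feature_2.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
```

With this penalty the strong feature is shrunk but retained, while the weak feature is set exactly to zero, which is the behaviour that makes LASSO usable as a feature selector.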

Independent validation of candidate biomarkers via multi-cohort meta-analysis

To evaluate the generalizability and clinical robustness of ABCA1, CPD, PRRG4, C19orf24 and WDR46 across diverse populations and platforms, we analyzed six datasets as discrete entities: the discovery cohorts GSE16561 and GSE58294 underwent standalone (non-integrated) re-analysis, and four external validation cohorts (GSE22255, GSE66724, GSE180470, and GSE199819) were newly introduced without prior use in any analysis. This design explicitly benchmarks whether expression trends identified in the discovery cohorts generalize across independent populations with demographic and technical heterogeneity. A multivariate linear regression framework adjusted for demographic confounders (age and sex) was then implemented to assess immune-related transcriptional signatures in IS pathogenesis. To address etiological heterogeneity across the six cohorts, which included IS, large-artery atherosclerosis, and cardioembolic stroke subtypes, stroke etiology was incorporated as a stratification covariate within the linear regression framework, thereby controlling for subtype-specific confounding effects. Following cohort-specific differential expression analyses of prioritized IS-associated transcripts, summary effect estimates (ORs) for IS risk were synthesized through fixed-effects modeling using the “metafor” package in R, providing integrated measures of biomarker-disease associations across all datasets.
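Fixed-effects pooling combines cohort-level estimates by inverse-variance weighting. The sketch below assumes each cohort contributes a log odds ratio and its standard error; the numerical inputs and the function name are hypothetical and are not results from the six cohorts.

```python
import math


def fixed_effect_pool(log_ors, std_errors, z_crit=1.959964):
    """Inverse-variance fixed-effects pooling of log odds ratios.

    Returns the pooled OR and its 95% confidence limits.
    """
    weights = [1 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1 / total)
    lower = pooled - z_crit * pooled_se
    upper = pooled + z_crit * pooled_se
    return math.exp(pooled), math.exp(lower), math.exp(upper)


# Invented cohort-level estimates for one biomarker.
log_ors = [math.log(2.0), math.log(2.0)]
std_errors = [0.2, 0.2]
pooled_or, ci_low, ci_high = fixed_effect_pool(log_ors, std_errors)

# Unequal precision: the cohort with the smaller standard error dominates.
weighted_or, _, _ = fixed_effect_pool([0.0, 1.0], [1.0, 0.5])
```

In the second call the weights are 1 and 4, so the pooled log OR is 0.8, much closer to the more precise cohort.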

Study population

This investigation enrolled a cohort of 126 participants (66 controls and 60 IS cases) consecutively admitted to Hunan Provincial People’s Hospital. Stroke diagnosis was established through comprehensive neurological assessments combined with neuroimaging confirmation via cerebral magnetic resonance imaging (MRI), adhering to the International Classification of Diseases (9th Revision). Exclusion criteria for the stroke cohort encompassed hematological disorders, type 1 diabetes mellitus, autoimmune conditions, thyroid dysfunction, malignancies, and hepatorenal pathologies. The experimental protocol was approved by the Institutional Review Board of Hunan Provincial People’s Hospital and conformed to the ethical principles outlined in the 2008 amendment of the Declaration of Helsinki (accessible at http://www.wma.net/en/30publications/10policies/b3/). Prior to enrollment, written informed consent was obtained from all study participants following full disclosure of research objectives and procedures.

RT-qPCR quantification with ROC curve validation

After an overnight fast, 5 mL of venous blood was drawn from each participant by standardized venipuncture. Peripheral blood mononuclear cells (PBMCs) were isolated by density gradient separation, and RNA was extracted using TRIzol reagent (Invitrogen, USA) according to the manufacturer’s protocol. First-strand cDNA synthesis was performed with a PrimeScript RT Reagent Kit (Takara Bio, Japan) at 37 °C for 15 min followed by 85 °C for 5 s. RT-qPCR was performed using a Taq PCR Master Mix Kit (Takara) on an ABI Prism 7500 sequence-detection system (Applied Biosystems, USA). Gene-specific primer pairs targeting the five candidate biomarkers were commercially synthesized and validated by Sangon Biotech (Shanghai, China). Gene expression differences between IS patients and controls were assessed using two-tailed independent Student’s t-tests. ROC curves were generated from the expression levels of the key genes, with AUC values and corresponding 95% CIs calculated through nonparametric analysis in MedCalc software (version 19.7.4; MedCalc Software Ltd, Ostend, Belgium). A p value < 0.05 was considered statistically significant.
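The nonparametric AUC equals the probability that a randomly chosen IS sample has a higher marker value than a randomly chosen control, with ties counted as one half. A minimal sketch follows; the expression values are invented, the function name is illustrative, and the confidence interval calculation performed in MedCalc is not reproduced.

```python
def auc(case_values, control_values):
    """Nonparametric AUC: P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Invented relative expression levels of one candidate gene.
cases = [2.1, 3.4, 2.8, 1.9]
controls = [1.0, 2.0, 1.5, 2.8]
auc_value = auc(cases, controls)
```

The sketch assumes that higher expression indicates IS; for a down-regulated marker the value falls below 0.5 and the direction of the comparison should be reversed.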