Aging associated immunosenescence in rheumatoid arthritis identified by machine learning and single cell profiling

Ji, Xinxin; Li, Lingyun; Jiao, Yuanzhuo; Cheng, Hui

doi:10.1038/s41598-025-15370-5

Article
Open access
Published: 23 August 2025

Scientific Reports volume 15, Article number: 31042 (2025)

Abstract

Rheumatoid arthritis (RA) is increasingly prevalent among older adults, who often experience more severe symptoms and face significant treatment challenges. This study aimed to identify aging-related genes in RA and to analyze their association with immune infiltration using machine learning. We obtained senescence-related genes from the HAGR database and three RA patient datasets from the GEO database. Differential analysis identified 418 differentially expressed genes (DEGs; 242 up-regulated and 176 down-regulated), and intersecting these with the senescence-related genes yielded 50 aging-related differentially expressed genes (ARDEGs). The ARDEGs were characterized by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses, and hub genes were identified through protein-protein interaction (PPI) network analysis. Machine learning methods, including LASSO regression, random forest (RF), and support vector machine recursive feature elimination (SVM-RFE), were used to extract feature genes, and single-sample gene set enrichment analysis (ssGSEA) was used to quantify immune cell infiltration. Notably, high levels of effector memory CD8 T cells and macrophages were associated with robust immune responses. This study identified four aging-related biomarkers in RA (STAT1, JUN, MYC, and EGFR), and the results suggest that STAT1 may serve as a viable therapeutic target. These findings may help improve treatment strategies and patient outcomes, and they provide insights into immune cell subpopulations in RA.

Introduction

Rheumatoid arthritis (RA) is a chronic, systemic autoimmune disorder that primarily affects the joints. It is characterized by inflammation, discomfort, stiffness, and a decline in joint function¹. With age, the immune system undergoes functional changes that are collectively termed immunosenescence. Aging is a complex biological process, and one notable alteration is a reduction in both the number and the activity of T and B cells. This decline renders older adults more susceptible to autoimmune diseases such as RA. Furthermore, inflammaging, the chronic low-grade inflammation that accompanies aging, has been identified as a significant contributor to RA pathogenesis and thereby increases its prevalence among older populations².

Patients whose RA begins after the age of 60 are classified as having elderly-onset rheumatoid arthritis (EORA). The number of elderly patients with RA is rising steadily as the population ages³. Patients with EORA differ from those with young-onset rheumatoid arthritis (YORA) in several ways. They tend to have more large-joint involvement, more severe disease, and more pronounced systemic symptoms. A higher burden of comorbidities also complicates their treatment and management. No distinct key genes have yet been established for EORA. Identifying disease-specific genetic factors is therefore essential to improve our understanding of its pathophysiology and to lay a foundation for personalized therapeutic and preventive strategies.

Bioinformatics is an interdisciplinary field that integrates biology, computer science, statistics, and mathematics to address the storage, processing, and interpretation of biological data. As high-throughput sequencing and other biotechnologies continue to advance, bioinformatics is becoming increasingly important in biological research. Machine learning encompasses methods for automatic learning and prediction from data. These techniques are widely used in bioinformatics, including in genomics, transcriptomics, proteomics, and metabolomics. To provide new insights for the early diagnosis and treatment of elderly RA, we combined bioinformatics and machine learning to identify key genes and to investigate immune infiltration patterns related to elderly RA using publicly available databases (Fig. 1).

Results

Preparing the data

The three datasets (GSE55457, GSE55584, and GSE55235) were normalized in R, and Figure 2 shows the results of this processing. Figure 3(a) shows the principal component analysis (PCA) of the merged dataset after batch-effect removal.

Differential gene screening for ARDEGs

Screening of the merged dataset identified 418 differentially expressed genes (DEGs), comprising 242 up-regulated and 176 down-regulated genes. The volcano plot is shown in Fig. 3(b), and a heat map of the top 50 genes is shown in Fig. 3(c). After duplicates were removed, the aging-related genes (ARGs) obtained from the Human Ageing Genomic Resources (HAGR) database comprised 1,061 unique genes. Intersecting these ARGs with the DEGs identified 50 aging-related differentially expressed genes (ARDEGs), as depicted in the Venn diagram in Fig. 3(d).

Analysis of enrichment

Gene Ontology (GO) enrichment analysis at the molecular function (MF) level identified several significantly enriched functional categories. Figure 4(a) shows the top ten enriched GO terms ranked by gene count. The most highly enriched terms were “DNA-binding transcription activator activity” and “RNA polymerase II-specific DNA-binding transcription activator activity.” Significant enrichment was also observed for “chemoattractant activity” and “transcription regulator binding,” suggesting roles in cell signaling and transcriptional regulatory networks. Figure 4(b) shows the top ten enriched GO terms at the biological process (BP) level, ranked by gene count. The most notably enriched terms were “response to glucocorticoid” and “response to corticosteroid.” Substantial enrichment was also found in “mononuclear cell differentiation” and “epithelial cell proliferation.”

KEGG pathway enrichment analysis identified several significantly enriched biological pathways. The top ten, ranked by gene count, are shown in Fig. 4(c). The “FoxO signaling pathway,” “Kaposi sarcoma-associated herpesvirus infection,” and “Human T-cell leukemia virus 1 infection” were the three most significantly enriched pathways. The “PI3K-Akt signaling pathway” and “Breast cancer” pathways were also substantially enriched.

Hub gene screening and PPI network construction

Figure 5(a) shows the PPI network of the ARDEGs, in which a darker hue indicates a higher interaction score (i.e., greater interaction confidence). The network comprises 100 nodes and 1,758 edges. Ranking with the maximal clique centrality (MCC) algorithm in the cytoHubba plugin identified 95 hub genes, and the top 20 genes by MCC score are shown in Fig. 5(b).

Screening of feature genes

LASSO regression identified five feature genes: PTX1, NR4A1, IL1R1, SFRP1, and EGFR. The corresponding cross-validation curves are shown in Fig. 6(a, b). SVM-RFE was then used to further assess the screened genes (Fig. 6c, d) and yielded 11 feature genes: BCL2, CD44, EGFR, IL1B, JAK2, JUN, SMAD2, SMAD3, MYC, PPARG, and STAT1. Random forest (RF) was used to select the top 10 genes (Fig. 6e, f): CD44, EGFR, FOS, JAK2, JUN, SMAD2, SMAD3, MYC, PPARG, and STAT1. Intersecting the genes identified by the three machine learning methods (Venn diagram, Fig. 6g) yielded four key genes: STAT1, JUN, MYC, and EGFR. Box plots of the expression of these four genes in the training and validation sets are shown in Fig. 7(a, b). JUN, MYC, and EGFR were expressed at lower levels in RA samples, whereas STAT1 expression was significantly elevated. The ROC curves in Fig. 7(c) show that STAT1 had the highest AUC (0.94).
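An AUC such as the 0.94 reported for STAT1 can be read as the probability that a randomly chosen RA sample has a higher expression value than a randomly chosen control. The minimal sketch below computes that probability directly (the Mann-Whitney formulation, with ties counted as one half). It is an illustration only, not the ROC software used for Fig. 7(c), and the expression values are hypothetical rather than data from this study.

```python
def roc_auc(scores, labels):
    """AUC as P(score_case > score_control), ties counted as 1/2 (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values for one gene (not data from this study).
expr = [5.1, 7.2, 7.8, 8.2, 4.9, 7.0, 9.1, 5.5]
status = [0, 0, 1, 1, 0, 1, 1, 0]  # 1 = RA, 0 = control
print(roc_auc(expr, status))  # 0.9375
```

For a down-regulated gene such as JUN, the same function returns a value below 0.5. ROC tools commonly report the complementary value by reversing the direction of comparison.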

Analysis of immune infiltration

Immune infiltration analysis showed that the 28 immune cell types were distributed differently across samples. The heatmap (Fig. 8a) shows marked variation in infiltration levels among immune cell types, which may reflect heterogeneity of the immune microenvironment. Box plots of ssGSEA scores (Fig. 8b) show that the infiltration levels of various immune cell types differed significantly between the RA group and the normal control group. Notably, effector memory CD4 T cells, activated CD8 T cells, and natural killer cells were markedly higher in the RA group than in the normal group, which suggests that these populations may play a critical role in RA pathophysiology. Fig. 8c shows the Spearman correlations between the four core genes and the immune cell types. Color intensity reflects the correlation coefficient (ρ): darker colors indicate stronger correlations, red indicates positive correlations, and blue indicates negative correlations. STAT1 was negatively correlated with macrophages and regulatory T cells. It was positively correlated with most other immune cell types, particularly T cells (e.g., type 1 T helper cells, gamma delta T cells, and activated CD8 T cells) and B cells (e.g., activated B cells). EGFR, JUN, and MYC showed the opposite pattern. Each was negatively correlated with most immune cell types, especially these same T cell and B cell subsets, and positively correlated with macrophages.
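The gene-immune cell correlations above are Spearman coefficients, which are Pearson correlations computed on ranks, with tied values given their average rank. The minimal sketch below illustrates that calculation under the assumption that each sample contributes one gene expression value and one ssGSEA score. The values are hypothetical, and the code is not the R routine used for Fig. 8c.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined if either input is constant


# Hypothetical values: one gene's expression and one cell type's ssGSEA score.
gene = [2.0, 3.5, 1.0, 4.2, 5.0]
cell_score = [0.10, 0.30, 0.05, 0.20, 0.60]
print(round(spearman_rho(gene, cell_score), 3))  # 0.9
```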

Single-cell RNA sequencing analysis

Analysis of the single-cell RNA sequencing dataset GSE279838, which comprises three healthy control samples and three RA samples, yielded the following findings.
(a) Data quality control and batch correction: Filtering (≥ 200 detected genes and ≤ 2,500 total RNA counts per cell) reduced the dataset from 28,029 to 1,101 high-quality cells (Supplementary Fig. 1). The distributions of gene counts and RNA counts (Supplementary Fig. 2), and the correlation between them (Supplementary Fig. 3), confirmed data integrity. After Harmony batch correction, UMAP visualization showed reduced inter-sample heterogeneity (Supplementary Fig. 4a-c).
(b) Cell population identification and heterogeneity: Low-resolution clustering (resolution 0.1) identified the major cell populations, namely T cells (730), monocytes (165), and granulocytes (206) (Supplementary Table 1). High-resolution clustering (resolution 0.8) resolved subpopulations, including activated T cells and pro-inflammatory macrophages (Supplementary Fig. 4d-f). A clustree plot showed the hierarchical relationships among these clusters (Supplementary Fig. 5).
(c) Cell-type-specific expression of core genes: (i) STAT1 was highly expressed in RA monocytes and activated T cells, consistent with activation of interferon signaling (Supplementary Figs. 6 and 9a). (ii) JUN and MYC were enriched in fibroblast-like synoviocytes, suggesting involvement in synovial hyperplasia (Supplementary Fig. 9c, d). (iii) EGFR was upregulated in RA fibroblasts, consistent with abnormal proliferation (Supplementary Fig. 9b).
(d) Functional validation and biological insights: UMAP feature plots showed the spatial expression gradients of the core genes across cell populations (Supplementary Figs. 7 and 8). Together with the subpopulation annotations in Supplementary Table 2, these expression patterns underscore the relevance of STAT1 and JUN to RA pathology and illustrate the functional diversity of EGFR and MYC.

Discussion

Chronic inflammation and synovial hyperplasia are defining characteristics of rheumatoid arthritis (RA), an inflammatory condition that can ultimately lead to joint degeneration and functional impairment. Recent studies have uncovered a complex interplay between aging and RA³. Aging exacerbates the onset and progression of RA through mechanisms such as immune system dysregulation, chronic inflammation, cellular senescence, and metabolic disturbances⁴. Conversely, the adverse effects associated with chronic inflammation and RA treatments may also accelerate the aging process. The objective of this study was to identify potential senescence biomarkers for RA and to investigate the roles and mechanisms of senescence-related genes as well as immune infiltration in RA synovial tissues. This research aims to provide new insights into the underlying causes of RA, particularly in its early stages.

After the three GEO datasets were integrated, differential expression analysis identified 50 ARDEGs. These ARDEGs were significantly enriched in molecular functions related to DNA-binding transcription activator activity, including RNA polymerase II-specific activity. KEGG analysis revealed substantial involvement of the PI3K-Akt signaling pathway. These results are consistent with previous studies^5,6,7,8. The STRING database contains extensive interaction data (up to 50 additional interactors were allowed), and the MCC algorithm is sensitive to key network nodes. As a result, submitting the 50 ARDEGs to STRING expanded the network, and 95 hub genes were ultimately identified. This underscores the importance of considering topological properties when analyzing biological networks⁹ and may reveal broader biological processes relevant to the disease. The hub genes ranked by the MCC algorithm in cytoHubba, a Cytoscape plug-in, were then screened with three machine learning methods, which yielded four key genes: STAT1, JUN, MYC, and EGFR. Gene expression analysis further showed that synovial samples from patients with RA had significantly elevated STAT1 levels and markedly reduced JUN, MYC, and EGFR levels.

STAT1 (signal transducer and activator of transcription 1) is a member of the STAT protein family. This family plays a crucial role in cytokine signaling and has important functions in immune regulation, cell division, and growth¹⁰. Recent mechanistic studies have identified STAT1 as a key regulator that links immunosenescence with RA inflammation. First, STAT1 promotes chronic inflammation through hyperactivation of the IFN-γ/JAK-STAT pathway, which correlates with increased synovial levels of IL-6 and TNF-α (r = 0.65–0.71, p < 0.001)¹¹. Second, STAT1 exacerbates immunosenescence by facilitating T-cell exhaustion (evidenced by PD-1 upregulation) and by impairing macrophage polarization, as demonstrated in aging murine models¹⁰. Our bioinformatics approach robustly prioritized STAT1 (AUC = 0.94, Fig. 7c), but exclusive reliance on computational data limits mechanistic insight. For example, the observed negative correlation between STAT1 and regulatory T cells (r = −0.58, p = 0.002) is correlational only. It requires validation by flow cytometry or single-cell RNA sequencing before any causal relationship can be established. The pro-inflammatory role of STAT1 in RA is well documented¹², but its age-specific regulatory mechanisms in elderly patients with RA remain inadequately explored. Our findings extend existing knowledge in two ways. (1) Age-dependent expression: STAT1 expression was markedly elevated in elderly RA patients (> 60 years) compared with younger cohorts (p = 0.003, Fig. 7a), whereas EGFR showed the inverse trend. This suggests that aging may disrupt the balance between STAT1 and EGFR and potentially exacerbate disease progression. This observation has not been reported previously¹³.
(2) Immune microenvironment associations: STAT1 was strongly correlated with M1 macrophage infiltration (r = 0.67, p < 0.001), in contrast to earlier studies that focused primarily on its interaction with Th17 cells¹⁴. These findings highlight a distinct role for STAT1 in elderly RA and lay a foundation for age-stratified therapeutic strategies. They are also consistent with the experimental work of Chen Lili et al., which highlighted STAT1 as a potential biomarker¹⁵.

The elevated expression of STAT1 in monocytes and activated T cells from patients with rheumatoid arthritis (RA) closely correlates with its function in the interferon signaling pathway. Previous studies have established that STAT1 acts as a critical transcription factor within the interferon-γ (IFN-γ) signaling cascade, exacerbating synovial inflammation by activating downstream pro-inflammatory cytokines such as TNF-α and IL-6¹⁶. In this study, monocytes exhibiting high levels of STAT1 expression were significantly enriched in the IFN-γ response pathway, suggesting that STAT1 may facilitate the polarization of monocytes towards a pro-inflammatory phenotype, thereby contributing to the dysregulation of the local immune microenvironment in RA joints¹⁷. Furthermore, the upregulation of STAT1 expression in activated T cells may enhance Th1/Th17 differentiation and further intensify the autoimmune response¹⁸.

The JUN gene (Jun proto-oncogene, AP-1 transcription factor subunit) encodes c-Jun, a member of the activator protein-1 (AP-1) transcription factor family. This family is associated with synovitis and cellular aging and plays a crucial role in cell proliferation, differentiation, and apoptosis. c-Jun enhances inflammatory responses by regulating the production of pro-inflammatory cytokines such as IL-1, IL-6, and TNF-α¹⁹. c-Jun also promotes the proliferation of fibroblast-like synoviocytes, which leads to synovial hyperplasia and joint deterioration¹⁹. Activation of the JNK/c-Jun signaling pathway further aggravates joint injury by stimulating both synovial cell growth and the synthesis of inflammatory mediators²⁰. The MYC gene (MYC proto-oncogene, bHLH transcription factor) belongs to the MYC family of proto-oncogenes, which also includes MYCN and MYCL. MYC encodes a basic helix-loop-helix leucine zipper (bHLH-LZ) transcription factor that regulates the expression of many genes²¹. MYC also promotes the aberrant proliferation and invasiveness of fibroblast-like synoviocytes (FLS) through the PI3K-Akt and MAPK signaling pathways²². Jiawei Yao et al. reported that JUN and MYC expression is significantly elevated in the synovial tissue of patients with osteoarthritis²³. Both genes were down-regulated in RA synovium in the present study, so they may serve as potential biomarkers for distinguishing osteoarthritis from rheumatoid arthritis.

The epidermal growth factor receptor (EGFR) is a receptor tyrosine kinase (RTK) of the ErbB (HER) family. The ErbB family plays a crucial role in controlling cell growth, survival, differentiation, and proliferation. Hyperactivation of EGFR signaling in RA fibroblast-like synoviocytes (FLS) is a major contributor to synovial hyperplasia and inflammation in RA joints²⁴. Notably, EGFR was down-regulated in this study, whereas numerous studies have reported EGFR up-regulation in RA^13,25,26. This finding therefore requires confirmation in population-based cohorts. EGFR has also been shown to enhance the abnormal proliferation and invasiveness of synoviocytes by activating the PI3K-Akt and MAPK pathways²². In addition, aberrant EGFR activation exacerbates arthropathy in elderly patients with RA²⁷. EGFR testing is established in cancer treatment, but its application in the management of RA remains limited. Nonetheless, EGFR holds considerable potential as a therapeutic target in older patients with RA.

Immune infiltration analysis showed that activated B cells, activated CD4 T cells, activated CD8 T cells, CD56dim natural killer (NK) cells, macrophages, and type 17 helper T (Th17) cells were significantly up-regulated in RA. In contrast, NK cells, plasmacytoid dendritic cells (pDCs), follicular helper T (Tfh) cells, and type 2 helper T (Th2) cells were significantly down-regulated. The marked up-regulation of activated CD4 and CD8 T cells, B cells, macrophages, and Th17 cells suggests a robust pro-inflammatory immune response in patients with RA, consistent with previous studies^28,29. Th17 cells are well-recognized contributors to autoimmune inflammation and likely play a pivotal role in RA pathophysiology. Furthermore, cell types with regulatory or suppressive functions, such as NK cells and pDCs^30,31, were down-regulated. This points to disrupted immune homeostasis in RA, which may exacerbate inflammation and accelerate disease progression.

Spearman correlation analysis revealed significant relationships between various immune cell types and STAT1, EGFR, JUN, and MYC. EGFR, JUN, and MYC were predominantly negatively correlated with T cells and B cells, whereas STAT1 was positively correlated with these cell types. STAT1 is a key regulator of interferon signaling, and its positive association with T and B cells may reflect its role in enhancing immune responses³². In addition to its involvement in monocyte and lymphocyte differentiation, STAT1 positively regulates cytokine production and thereby enhances adaptive immune responses^33,34. The immunomodulatory function of STAT1 has also been validated in several diseases, which further supports the present findings^10,35,36,37. Most immune cell types, particularly T and B cells, were negatively correlated with EGFR. EGFR mutations can facilitate immune escape by engaging the PD-1/PD-L1 pathway³⁸. In non-tumor settings, EGFR may shape the immune microenvironment by suppressing T cell activity or by altering macrophage polarization³⁹. This inverse relationship suggests that EGFR may play a significant immunomodulatory role in non-tumor disorders. JUN and MYC were both positively associated with macrophages and negatively correlated with T and B cells. This pattern may reflect a dual role in the inflammatory response: the two genes could sustain the inflammatory environment by enhancing macrophage activity⁴⁰, or they could promote inflammation by inhibiting adaptive immune responses⁴¹. In addition, MYC is a key downstream effector of AKT signaling, which may influence immune responses in both tumor and non-tumor conditions⁴².

This study has several limitations. First, most of the data were derived from public sources in the United States, so further research that incorporates clinical data is needed. Second, the study relies solely on bioinformatics analysis and lacks experimental validation. Future in vivo and ex vivo studies should clarify the roles of these genes in specific diseases and their potential therapeutic benefits. Third, the sample size was limited, and larger samples are needed to improve reliability. Fourth, the disease samples and the senescence-related data came from separate sources, so the analysis did not include individuals who have both RA and debility. Debility often coexists with RA in older patients and may substantially influence symptoms, treatment response, and prognosis, so this design may limit the generalizability and applicability of the findings. Future research should therefore integrate both sample types to identify biomarkers and therapeutic targets and to evaluate the relationship between aging and RA more comprehensively.

This preliminary investigation of senescence-related genes in RA synovial tissue suggests that synovial senescence may be closely associated with immunoinflammation. Given their strong diagnostic performance, the four core genes identified may serve as novel targets for the diagnosis and treatment of RA. However, further experimental research is needed to validate these findings.

Methods

Screening and processing of gene expression datasets

Figure 1 shows the flowchart of the study. GEO datasets were retrieved with the query “(rheumatoid arthritis) AND ‘Homo sapiens’”. “Entry type” was set to “Series” and “Study type” to “Expression profiling by array” so that only gene expression microarray datasets were included. Table 1 presents detailed information on the datasets. All four datasets were normalized with the limma package (version 3.62.2) in R, and the normalization results for each dataset are shown as box plots. The GSE55457, GSE55584, and GSE55235 datasets were then merged and batch-corrected with the ComBat function of the sva package (version 3.54.0) using its default parameters. The corrected data were visualized by principal component analysis (PCA) to confirm that batch effects had been effectively removed.
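The PCA step serves as a visual check that samples no longer cluster by batch. The minimal sketch below illustrates what that check computes: it finds the first principal component by power iteration on the gene covariance matrix. The data are hypothetical, with two batches that differ by a constant offset, so PC1 separates the batches as it would before correction. The sketch is neither R's PCA routine nor ComBat, and power iteration is an assumption chosen for brevity.

```python
import math


def pc1(rows, n_iter=200):
    """First principal component of a samples x genes matrix via power iteration.

    Returns (unit-length loading vector, PC1 score for each sample).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix between genes (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # starting vector; assumed not orthogonal to PC1
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Hypothetical: 3 samples per batch, batch B shifted by +3 on every gene.
batch_a = [[5.0, 5.0, 5.0], [5.2, 4.9, 5.1], [4.9, 5.1, 5.0]]
batch_b = [[8.0, 8.0, 8.0], [8.1, 7.9, 8.2], [7.9, 8.1, 8.0]]
loading, scores = pc1(batch_a + batch_b)
print([round(s, 2) for s in scores])
```

After a successful batch correction, the batch labels should no longer line up with the sign of the PC1 scores in this way.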

Download of aging-related genes

Aging-related genes were obtained from the Human Ageing Genomic Resources (HAGR) database (https://genomics.senescence.info/). We combined the GeneAge (309 genes)⁴³ and CellAge (949 genes)⁴⁴ lists and removed duplicates. This produced a list of 1,061 unique aging-related genes (ARGs) for further analysis.

Identification of aging-related differentially expressed genes

The limma package in R was employed to identify differentially expressed genes (DEGs) within the combined dataset. The screening criteria established were |logFC| > 1 and adjusted P < 0.05. The visualization of DEGs was conducted using the ggplot2 (version 3.5.1) and pheatmap (version 1.0.12) packages. Venn diagrams were utilized to illustrate the aging-related differential genes (ARDEGs), which were derived by intersecting the identified DEGs with aging-related genes (ARGs).
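The screening rule above (|logFC| > 1 and adjusted P < 0.05, followed by intersection with the ARGs) can be sketched as follows. The sketch assumes Benjamini-Hochberg adjustment, which is the default in limma's topTable; the text itself states only "adjusted P". The raw p-values would come from limma's moderated t-tests, which are not reproduced here, and all gene statistics below are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_ardegs(stats, aging_genes, lfc_cut=1.0, padj_cut=0.05):
    """stats: gene -> (logFC, raw p-value). Returns (up, down, ARDEGs)."""
    genes = list(stats)
    padj = bh_adjust([stats[g][1] for g in genes])
    up, down = set(), set()
    for g, q in zip(genes, padj):
        lfc = stats[g][0]
        if abs(lfc) > lfc_cut and q < padj_cut:
            (up if lfc > 0 else down).add(g)
    return up, down, (up | down) & aging_genes


# Hypothetical inputs; GeneAge and CellAge are merged and de-duplicated by set union.
gene_age = {"STAT1", "EGFR", "TP53"}
cell_age = {"EGFR", "CDKN2A"}
aging_genes = gene_age | cell_age
stats = {"STAT1": (1.8, 1e-6), "EGFR": (-1.3, 2e-4), "CXCL13": (3.0, 1e-8),
         "GAPDH": (0.1, 0.6), "ACTB": (-0.2, 0.3)}
up, down, ardegs = screen_ardegs(stats, aging_genes)
print(sorted(up), sorted(down), sorted(ardegs))
```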

Enrichment analysis of aging-related differentially expressed genes

The clusterProfiler tool (version 4.14.6) in R, along with the org.Hs.eg.db package, was utilized to perform Gene Ontology (GO) and KEGG (Kyoto Encyclopedia of Genes and Genomes)^45,46,47 enrichment analyses on ARDEGs. The results were presented using bar graphs generated within the R environment, showing only statistical summaries of pathway enrichment analysis without incorporating any original KEGG pathway maps or images.
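As we understand it, clusterProfiler's GO and KEGG over-representation analyses rest on a one-sided hypergeometric test. This test asks how surprising it is that k of the n ARDEGs fall in a term that annotates K of the N background genes. The resulting p-values are then multiple-testing adjusted, with Benjamini-Hochberg as the default. The minimal sketch below computes only the raw upper-tail p-value, and the term sizes are hypothetical.

```python
from math import comb


def ora_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: background genes, K: background genes annotated to the term,
    n: genes in the query list (e.g. ARDEGs), k: query genes annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic, one final division


# Hypothetical term: 150 of 20,000 background genes; 6 of 50 query genes hit it.
p = ora_pvalue(6, 50, 150, 20000)
print(f"GeneRatio 6/50, BgRatio 150/20000, p = {p:.2e}")
```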

Building protein-protein interactions (PPIs) and screening for hub genes

The ARDEGs were analyzed utilizing the STRING database (https://string-db.org/), with the species parameter set to Homo sapiens and a maximum limit of 50 interactors. The resulting interaction data were subsequently imported into Cytoscape v3.9.1 for the construction of the protein-protein interaction (PPI) network. Hub genes were identified through the Cytohubba plugin, and the top 10 hub genes exhibiting significant interactions were visualized for further analysis.
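cytoHubba's MCC ranking is, to our understanding, defined as MCC(v) = Σ (|C| − 1)! over all maximal cliques C that contain v. When no two neighbors of v are connected, every such clique is a single edge, and the score reduces to the degree of v. The minimal sketch below enumerates maximal cliques with the Bron-Kerbosch algorithm and applies that formula to a small hypothetical network. It is not the cytoHubba implementation.

```python
from math import factorial


def maximal_cliques(adj):
    """All maximal cliques via Bron-Kerbosch with pivoting.

    adj: dict node -> set of neighbouring nodes (undirected graph).
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(adj):
    """MCC(v) = sum over maximal cliques C containing v of (|C| - 1)!."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        if len(clique) < 2:  # isolated node: keep score 0, i.e. its degree
            continue
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


# Hypothetical network: triangle A-B-C plus a chain C-D-E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
scores = mcc_scores(adj)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```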

Feature gene screening with machine learning

Three machine learning techniques were used to screen feature genes: random forest (RF), support vector machine recursive feature elimination (SVM-RFE), and least absolute shrinkage and selection operator (LASSO) regression. RF is a supervised machine learning technique. It was implemented with the randomForest R package (version 4.7.1.2) to identify important features. The number of decision trees was set to 500, and the mtry parameter was optimized to 2 by grid search. The 10 features with the highest importance scores, measured as the mean decrease in Gini impurity, were designated aging signature genes.
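The randomForest documentation describes the Gini importance as the total decrease in node impurity from splits on a variable, averaged over all trees. The minimal sketch below computes the quantity that is accumulated at one such split: the impurity of a node minus the size-weighted impurity of its two children. The expression values and the split thresholds are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(values, labels, threshold):
    """Weighted impurity decrease for the split `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)


# Hypothetical expression of one gene in 6 samples (1 = RA, 0 = control).
expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
group = [0, 0, 0, 1, 1, 1]
print(gini_decrease(expr, group, 3.5))  # perfect split
print(gini_decrease(expr, group, 1.5))  # weaker split
```

A gene whose splits repeatedly produce large decreases across the 500 trees receives a high mean decrease in Gini and ranks near the top.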

SVM-RFE was implemented with the e1071 (version 1.7.16) and caret (version 7.0.1) R packages. This method iteratively trains a support vector machine and removes the least influential features, which optimizes the feature set and improves classification performance.

LASSO regression is a widely used method in data mining. The glmnet R package (version 4.1.8) was used to incorporate the ARDEGs into a diagnostic model, with the alpha parameter of the glmnet function set to 1. The optimal λ was determined by ten-fold cross-validation, and aging signature genes were selected at this optimal λ.

Finally, the genes selected by the three methods were intersected. ROC curves for the resulting genes were evaluated in both the training and validation sets, and box plots were generated to examine their expression levels.
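With alpha = 1, glmnet fits a pure L1 (LASSO) penalty, and genes whose coefficients remain non-zero at the chosen λ are retained. The minimal sketch below uses cyclic coordinate descent with soft-thresholding. It is simplified in several assumed ways: it uses squared-error loss (the model family used in the study is not stated), it uses a fixed λ instead of cross-validation, and it assumes pre-standardized predictors. The data are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes y is centred and every column of X is centred with sum(x^2)/n == 1,
    so each coordinate update is a single soft-thresholding step.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta for beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of x_j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Hypothetical standardised design: 3 orthogonal features, y = 2*x1 + 0.5*x2.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.5, 1.5, -1.5, -2.5]
beta = lasso_cd(X, y, lam=0.6)
print(beta)
```

At λ = 0.6 the weak second feature is shrunk exactly to zero and only the first survives. This is the selection behavior that cv.glmnet exploits when it picks λ by cross-validation.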

Analysis of immune infiltration

Immune infiltration was estimated by ssGSEA using the GSVA R package (version 2.0.7). Enrichment scores for 28 immune cell types in normal and rheumatoid arthritis (RA) samples were visualized with the pheatmap and ggplot2 packages. Spearman correlation analysis was then performed between the identified core genes and the immune cell scores.

Single-cell RNA sequencing analysis

Single-cell RNA sequencing analysis was performed on the GSE279838 dataset, which comprises three healthy control samples and three rheumatoid arthritis (RA) samples. The following analyses were conducted.

Quality control (QC): A custom basic_qc procedure retained cells with ≥ 200 detected genes and ≤ 2500 total RNA counts; cells outside these thresholds were removed as low quality. This yielded 1,101 high-quality cells covering 21,900 genes. The QC results were checked with bar plots, violin plots, and scatter plots.

Batch correction and integration: The Harmony algorithm (group.by = “orig.ident”, PCs = 15) was used to remove batch effects, and UMAP visualization confirmed a uniform distribution of cells after correction.

Multi-resolution clustering: Louvain clustering identified major populations at a resolution of 0.1 and subpopulations at a resolution of 0.8, and a clustree plot illustrated the hierarchical relationships among these clusters.

Core gene expression analysis: A dot plot displayed the proportion of expressing cells and the mean expression of STAT1, JUN, MYC, and EGFR across clusters, and feature plots mapped their distribution across cell populations on the UMAP embedding.
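The QC rule and the two quantities a dot plot encodes can be made concrete with a short sketch. It applies the thresholds stated above (≥ 200 detected genes, ≤ 2500 total counts) to raw counts held in plain dictionaries. The helper names and input layout are hypothetical, and Seurat's own `DotPlot` computes average expression from normalized data rather than from raw counts as done here.

```python
from collections import defaultdict


def qc_filter(cells, min_genes=200, max_counts=2500):
    """Keep cells with >= min_genes detected genes and <= max_counts total counts.

    cells: dict cell_id -> dict gene -> raw count (hypothetical input layout).
    """
    kept = {}
    for cell, counts in cells.items():
        n_genes = sum(1 for c in counts.values() if c > 0)
        total = sum(counts.values())
        if n_genes >= min_genes and total <= max_counts:
            kept[cell] = counts
    return kept


def dot_plot_stats(cells, clusters, gene):
    """Per cluster: (fraction of cells expressing `gene`, mean expression of `gene`)."""
    values = defaultdict(list)
    for cell, counts in cells.items():
        values[clusters[cell]].append(counts.get(gene, 0))
    return {
        cl: (sum(v > 0 for v in vals) / len(vals), sum(vals) / len(vals))
        for cl, vals in values.items()
    }


# Toy usage with made-up counts.
toy = {"a": {"STAT1": 2}, "b": {"STAT1": 0}, "c": {"STAT1": 4}}
print(dot_plot_stats(toy, {"a": 0, "b": 0, "c": 1}, "STAT1"))
# {0: (0.5, 1.0), 1: (1.0, 4.0)}
```

In a dot plot, the first value of each pair sets the dot size and the second sets the colour intensity.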

Table 1 Descriptive statistics.


Data availability

The datasets analyzed in this study are publicly available in the GEO (Gene Expression Omnibus) (https://www.ncbi.nlm.nih.gov/geo/) repository under accession numbers GSE55235, GSE55457, GSE55584, GSE12021 and GSE279838. All processed data and analysis results generated during this study are included in this published article.

References

Matteo, A. D., Bathon, J. M. & Emery, P. Rheumatoid arthritis. Lancet 402, 2019–2033 (2023).
Li, X. et al. Inflammation and aging: signaling pathways and intervention therapies. Signal. Transduct. Target. Ther. 8, 239 (2023).
Serhal, L., Lwin, M. N., Holroyd, C. & Edwards, C. J. Rheumatoid arthritis in the elderly: characteristics and treatment considerations. Autoimmun. Rev. 19, 102528 (2020).
Bauer, M. E. Accelerated Immunosenescence in rheumatoid arthritis: impact on clinical progression. Immun. Ageing. 17, 6 (2020).
Ting Hao, W., Huang, L., Pan, W. & Ren, Y. L. Antioxidant glutathione inhibits inflammation in synovial fibroblasts via PTEN/PI3K/AKT pathway: an in vitro study. Arch. Rheumatol. 37, 212–222 (2022).
Miura, M., Naito, T. & Saito, M. Current perspectives in human T-Cell leukemia virus type 1 infection and its associated diseases. Front. Med. 9, 867478 (2022).
Feng, S. et al. Identification of Ferroptosis-Related genes in schizophrenia based on bioinformatic analysis. Genes 13, 2168 (2022).
Ji, M. et al. Integrated phytochemical analysis based on UPLC–MS/MS and network Pharmacology approaches to explore the effect of odontites vulgaris Moench on rheumatoid arthritis. Front. Pharmacol. 12, 707687 (2021).
Janjic, V. & Przulj, N. Biological function through network topology: a survey of the human diseasome. Brief. Funct. Genomics. 11, 522–532 (2012).
Asano, T., Utsumi, T., Kagawa, R., Karakawa, S. & Okada, S. Inborn errors of immunity with loss- and gain-of-function germline mutations in STAT1. Clin. Exp. Immunol. 212, 96–106 (2023).
Kandhaya-Pillai, R. et al. TNF-α/IFN-γ synergy amplifies senescence-associated inflammation and SARS-CoV-2 receptor expression via hyper-activated JAK/STAT1. Aging Cell. 21, e13646 (2022).
Dey, P., Panga, V. & Raghunathan, S. A cytokine signalling network for the regulation of inducible nitric oxide synthase expression in rheumatoid arthritis. PLoS One. 11, e0161306 (2016).
Swanson, C. D. et al. Inhibition of epidermal growth factor receptor tyrosine kinase ameliorates collagen-induced arthritis. J. Immunol. 188, 3513–3521 (2012).
Zhang, W. et al. Immune cell-related genes in juvenile idiopathic arthritis identified using transcriptomic and single-cell sequencing data. Int. J. Mol. Sci. 24, 10619 (2023).
Lili, C. et al. Identification of potential biomarkers and immunoregulatory mechanisms of rheumatoid arthritis based on multichip co-analysis of GEO database. J. South. Med. Univ. 44, 1098–1108 (2024).
Ivashkiv, L. B. IFNγ: Signalling, epigenetics and roles in immunity, metabolism, disease and cancer immunotherapy. Nat. Rev. Immunol. 18, 545–558 (2018).
Jiao, S. et al. STAT1 mediates cellular senescence induced by angiotensin II and H2O2 in human glomerular mesangial cells. Mol. Cell. Biochem. 365, 9–17 (2012).
Schnell, A., Littman, D. R. & Kuchroo, V. K. Th17 cell heterogeneity and its role in tissue inflammation. Nat. Immunol. 24, 19–29 (2023).
Zhang, W. et al. Immune Cell-Related genes in juvenile idiopathic arthritis identified using transcriptomic and Single-Cell sequencing data. IJMS 24, 10619 (2023).
Loeser, R. F. et al. Deletion of JNK enhances senescence in joint tissues and increases the severity of Age-Related osteoarthritis in mice. Arthritis Rheumatol. 72, 1679–1688 (2020).
Levens, D. L. Reconstructing MYC. Genes Dev. 17, 1071–1077 (2003).
Yu, Y. & Chen, Y. Role of the PI3K-AKT signaling pathway in proliferation and apoptosis of synovial cells in rheumatoid arthritis. Chin. J. Cell. Mol. Immunol. 30, 1326–1329 (2014).
Jiawei, Y., Xiongfeng, X., Peng, Y. & Bo, Q. Screening of differential genes and validation of key genes in synovial tissue of osteoarthritis. Chin. J. Tissue Eng. Res. 26, 2881–2887 (2022).
Ge, Y. et al. Identification of differentially expressed genes, signaling pathways and immune infiltration in rheumatoid arthritis by integrated bioinformatics analysis. Hereditas 158, 5 (2021).
Killock, D. Targeting EGFR to fight synovitis. Nat. Rev. Rheumatol. 8, 247–247 (2012).
Huang, C. M. et al. Rheumatoid arthritis is associated with rs17337023 polymorphism and increased serum level of the EGFR protein. PLoS One. 12, e0180604 (2017).
Yuan, F. L. et al. Epidermal growth factor receptor (EGFR) as a therapeutic target in rheumatoid arthritis. Clin. Rheumatol. 32, 289–292 (2013).
Chen, S. J. et al. Immunopathogenic mechanisms and novel Immune-Modulated therapies in rheumatoid arthritis. Int. J. Mol. Sci. 20, 1332 (2019).
Jang, S., Kwon, E. J. & Lee, J. J. Rheumatoid arthritis: pathogenic roles of diverse immune cells. Int. J. Mol. Sci. 23, 905 (2022).
Azizov, V. & Zaiss, M. M. Alcohol consumption in rheumatoid arthritis: A path through the immune system. Nutrients 13, 1324 (2021).
Guo, Z. et al. Identification and validation of metabolism-related genes signature and immune infiltration landscape of rheumatoid arthritis based on machine learning. Aging (Albany NY). 15, 3807–3825 (2023).
Stolzer, I. et al. STAT1 coordinates intestinal epithelial cell death during Gastrointestinal infection upstream of Caspase-8. Mucosal Immunol. 15, 130–142 (2022).
Yin, G. et al. Classification of bladder cancer based on immune cell infiltration and construction of a risk prediction model for prognosis. Zhejiang Da Xue Xue Bao Yi Xue Ban. 53, 47–57 (2023).
Chen, Y., Shi, Z. W., Strickland, A. B. & Shi, M. Cryptococcus neoformans infection in the central nervous system: the battle between host and pathogen. JoF 8, 1069 (2022).
Yang, Q. et al. Unusual talaromyces Marneffei and Pneumocystis jirovecii coinfection in a child with a STAT1 mutation: A case report and literature review. Front. Immunol. 14, 1103184 (2023).
Jing, D. Progress in molecular diagnosis and treatment of chronic mucocutaneous candidiasis.
Marié, I. J. et al. Tonic interferon restricts pathogenic IL-17-driven inflammatory disease via balancing the Microbiome. eLife 10, e68371 (2021).
Chen, N. et al. Upregulation of PD-L1 by EGFR activation mediates the immune escape in EGFR-Driven NSCLC: implication for optional immune targeted therapy for NSCLC patients with EGFR mutation. J. Thorac. Oncol. 10, 910–923 (2015).
Nan, X., Ling, X., Wanfang, Z. & Fuxiang, Z. Immune-related genes and their determined immune cell microenvironment to predict the prognosis of gastric adenocarcinoma. Natl. Med. J. China 102, 840–846 (2022).
Peng, M. Research progress in the regulation of inflammatory response by macrophage polarization. Adv. Clin. Med. 12, 6796–6803 (2022).
Dhanasekaran, R. et al. MYC overexpression drives immune evasion in hepatocellular carcinoma that is reversible through restoration of Proinflammatory macrophages. Cancer Res. 83, 626–640 (2023).
Weber, L. I. & Hartl, M. Strategies to target the cancer driver MYC in tumor cells. Front. Oncol. 13, 1142111 (2023).
Tacutu, R. et al. Human ageing genomic resources: new and updated databases. Nucleic Acids Res. 46, D1083–D1090 (2018).
Avelar, R. A. et al. A multidimensional systems biology analysis of cellular senescence in aging and disease. Genome Biol. 21, 91 (2020).
Kanehisa, M., Furumichi, M., Sato, Y., Matsuura, Y. & Ishiguro-Watanabe, M. KEGG: biological systems database as a model of the real world. Nucleic Acids Res. 53, D672–D677 (2025).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Kanehisa, M. Toward Understanding the origin and evolution of cellular organisms. Protein Sci. 28, 1947–1951 (2019).

Author information

Authors and Affiliations

School of Nursing, Shanxi Medical University, Taiyuan, 030000, China
Xinxin Ji, Lingyun Li, Yuanzhuo Jiao & Hui Cheng

Contributions

X.J. conceived the experiment(s); X.J. and L.L. conducted the experiment(s); X.J. and Y.J. analysed the results; H.C. supervised the research and provided guidance. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hui Cheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ji, X., Li, L., Jiao, Y. et al. Aging associated immunosenescence in rheumatoid arthritis identified by machine learning and single cell profiling. Sci Rep 15, 31042 (2025). https://doi.org/10.1038/s41598-025-15370-5

Received: 18 February 2025
Accepted: 07 August 2025
Published: 23 August 2025
DOI: https://doi.org/10.1038/s41598-025-15370-5
