Background & Summary

Rheumatoid arthritis (RA) is a chronic inflammatory autoimmune disease that typically presents as symmetrical involvement of small joints and is characterized by joint swelling, progressive bone destruction, and permanent joint deformity1,2,3. Current clinical treatments for RA include glucocorticosteroids, non-steroidal anti-inflammatory drugs, and disease-modifying antirheumatic drugs (DMARDs). DMARDs, which are further classified into conventional synthetic DMARDs (csDMARDs), biological DMARDs (bDMARDs), and targeted synthetic DMARDs (tsDMARDs), play a critical role in inhibiting bone erosion and joint destruction. Among these, b/tsDMARDs target specific inflammatory molecules, offering rapid and effective symptom relief. Although RA remains incurable, 20–40% of RA patients meet the American College of Rheumatology 70% improvement criteria (ACR70) after six months of treatment with csDMARDs such as methotrexate (MTX). Among patients who respond insufficiently to MTX, approximately 20% achieve similar improvements with bDMARDs (e.g., adalimumab, targeting TNF-α) or tsDMARDs (e.g., tofacitinib, targeting the downstream JAK-STAT pathway)4,5,6,7,8. Current clinical guidelines recommend combining b/tsDMARDs with csDMARDs for patients who show an inadequate response to csDMARDs9. However, these guidelines do not prioritize between bDMARDs and tsDMARDs in detail and offer tailored recommendations only for specific patient groups. For instance, one guideline advises against the use of TNF inhibitors and instead recommends other bDMARDs or tsDMARDs for RA patients with heart failure10. Similarly, the European Medicines Agency (EMA) recommends that tofacitinib be used in patients aged ≥ 65 years, smokers, or those with cardiovascular or malignancy risk factors only when no suitable alternatives are available11.
Despite these recommendations, a significant proportion of RA patients in clinical practice do not have contraindications such as cancer, heart failure, or cardiovascular risks. For this broader population, the selection of the most appropriate b/tsDMARD and the development of precise treatment strategies remain inadequately supported by reliable evidence. Understanding the changes in cellular immune profiles before and after treatment with b/tsDMARDs is thus critical for optimizing drug selection and advancing precision medicine in RA management.

Previous studies investigating the cellular immune response in RA have primarily focused on synovial tissue (ST). While ST is a key target organ in RA, its inaccessibility outside surgical settings limits its utility for routine diagnosis or treatment evaluation. Synovial fluid (SF), in contrast, shares the inflammatory environment of the joint cavity with ST and is readily obtained through arthrocentesis. Previous studies have identified potential biomarkers in SF, such as C-reactive protein and calcium granule binding protein12,13, while flow cytometry has revealed imbalances in macrophage subpopulations. However, these methods lack the resolution to comprehensively characterize the transcriptional and functional diversity of SF cells in RA, or the communication between them. Single-cell RNA sequencing (scRNA-seq) addresses this gap by enabling high-throughput transcriptome profiling at single-cell resolution, offering a deeper understanding of the cellular heterogeneity and intercellular interactions of the immune microenvironment in RA.

In this study, we collected SF samples from RA patients before and after treatment with TNF-α/JAK inhibitors and constructed a single-cell atlas of RA SF. This dataset offers a valuable resource for identifying therapeutic targets, understanding the mechanisms of TNF-α/JAK inhibitors (i.e., adalimumab and tofacitinib), and exploring the pathogenesis of inflammatory diseases.

Methods

Ethical statement

This study was approved by the Ethics Committee of West China Hospital, Sichuan University (Approval Number: 2020-1151). All participants provided informed consent for the sharing of their transcriptomic data and were made aware of any associated risks. The open access use of these data was permitted by the Ministry of Science and Technology of China (Registration Number: 2022BAT2228).

Participant information

In our study, nine patients who met the 2010 revised criteria14 for RA established by the American College of Rheumatology were prospectively and randomly enrolled from the Department of Rheumatology and Immunology, West China Hospital, Sichuan University. Informed consent was obtained from all participants. At baseline, all patients were required to exhibit high or moderate disease activity and to have an inadequate response to methotrexate. In addition, patients were required not to have received glucocorticoids or any bDMARDs or tsDMARDs within 3 months prior to enrollment. Clinical and demographic details are provided in Supplementary Tables 1 and 2. To comprehensively define the transcriptional atlas of SF cells from RA patients with an insufficient response to MTX, we performed scRNA-seq on nine paired samples collected before and after one month of treatment with TNF-α or JAK inhibitors (referred to as RA-BT and RA-AT, respectively). Specifically, the nine enrolled RA patients were treated with either adalimumab (n = 5, 40 mg every 2 weeks) or tofacitinib (n = 4, 5 mg twice a day) in conjunction with a stable dose of MTX. A second knee arthrocentesis was performed one month later to collect SF cells from the same joint.

SF cell isolation

Synovial fluid was aspirated from each patient’s knee joint under aseptic conditions. Hyaluronidase (20426ES60, Yeasen) was added to the fresh SF sample at a final concentration of 100 U/ml, and the sample was incubated at 37 °C for 30 min. For each RA patient, the before- and after-treatment SF samples were obtained from the same joint to avoid potential bias.

Cell suspension preparation

To prepare single-cell suspensions for sequencing, the fresh SF samples were centrifuged at 800 × g for 10 min. The supernatant was collected and stored at −80 °C, while the cell pellets were resuspended in 1 ml of red blood cell lysis buffer (R1010, Solarbio) and incubated on ice for 6 min to lyse the red blood cells, followed by centrifugation at 800 × g for 5 min. Next, 800 μl of pre-chilled PBS (BL302A, Biosharp) and 300 μl of pre-chilled debris removal solution (130-109-398, Miltenyi; a reagent that removes cell debris through density gradient centrifugation) were added to resuspend the cell pellets. An additional 800 μl of PBS was then gently added, and the sample was centrifuged at 800 × g for 5 min, which separated the liquid into three phases. The top two phases were discarded, and 2 ml of PBS was added to wash the lower phase once, followed by centrifugation at 800 × g for 5 min to remove contaminants and obtain pure cells. After the supernatant was discarded, the cells were resuspended in 500 μl of PBS and a 20 μl aliquot was taken for cell counting. All centrifugation steps were carried out at 4 °C. The resulting high-quality single-cell suspensions were used for scRNA-seq.

Single cell library preparation and sequencing

The cell suspension was loaded onto Chromium microfluidic chips, and a 10× Chromium Controller (10× Genomics) was used to barcode the samples. Next, using components from a Chromium Single Cell 3’ (v3) reagent kit (10× Genomics), sequencing libraries were constructed from the RNA of the barcoded cells after reverse transcription, in accordance with the manufacturer’s instructions. Sequencing was carried out on the Illumina NovaSeq 6000 platform in accordance with the manufacturer’s instructions. Single-cell library preparation and subsequent next-generation sequencing were performed by Novogene Co., Ltd. All scRNA-seq data passed quality control.

Data processing and analysis

The scRNA-seq data were analyzed using our previously established workflow15,16,17,18,19,20,21,22,23. Briefly, raw FASTQ files generated by 10× Genomics were aligned and quantified using Cell Ranger (v4.3.0) against the GRCh38 human reference genome (v6.1.2). The Read10X function of the Seurat package was used to read the output of Cell Ranger. Doublets were identified and removed using the Scrublet tool in Python, with an expected doublet rate of 0.1. Next, to facilitate downstream analyses, cells were distinctly labeled using the RenameCells function and integrated into a single aggregate object using the merge function. To ensure data quality, stringent filtering parameters were applied to remove empty droplets, dead cells, and doublets: cells with fewer than 200 or more than 4,000 detected genes, or with mitochondrial content exceeding 15%, were excluded. Global scaling normalization was performed using the ‘LogNormalize’ method with a scaling factor of 10,000 to equalize overall gene expression across cells. The FindVariableFeatures function identified the top 2,000 highly variable genes, which were used for dimensionality reduction via principal component analysis (PCA). The first 30 principal components were selected, and the ScaleData function was applied to scale the expression of all genes, followed by regression of mitochondrial content to remove confounding variation. Sample batch effect correction was performed using the RunFastMNN function24, which is based on a mutual nearest neighbors algorithm. For clustering, the FindNeighbors and FindClusters functions with the built-in Louvain method were applied. Clusters were visualized using uniform manifold approximation and projection (UMAP). Cluster-specific marker genes were identified using the non-parametric Wilcoxon rank-sum test with Bonferroni correction (adjusted p value < 0.05) implemented in the FindAllMarkers function (min.pct = 0.25, logfc.threshold = 0.25).
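
The filtering and normalization steps described above can be illustrated with a minimal, pure-Python sketch. It is not the Seurat code used in this study: the function names, the toy cells, and the gene labels are hypothetical, and only the thresholds (200 to 4,000 detected genes, mitochondrial content of at most 15%) and the scaling factor (10,000) are taken from the text. The normalization follows the usual definition of ‘LogNormalize’, in which each count is divided by the cell's total count, multiplied by the scaling factor, and transformed with log1p.

```python
import math

# Thresholds and scaling factor as stated in the text.
MIN_GENES = 200
MAX_GENES = 4000
MAX_MITO_PERCENT = 15.0
SCALE_FACTOR = 10_000


def passes_qc(counts, mito_genes):
    """Return True if one cell's raw counts satisfy the filtering thresholds.

    counts: dict mapping gene name -> raw UMI count for a single cell.
    mito_genes: set of gene names treated as mitochondrial (an assumption
    supplied by the caller, e.g. human genes prefixed with "MT-").
    """
    total = sum(counts.values())
    if total == 0:
        return False
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene in mito_genes)
    mito_percent = 100.0 * mito / total
    return MIN_GENES <= n_genes <= MAX_GENES and mito_percent <= MAX_MITO_PERCENT


def log_normalize(counts, scale_factor=SCALE_FACTOR):
    """Scale each cell to a common total count, then apply log1p."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}


# Hypothetical toy cells, for illustration only.
good_cell = {f"GENE{i}": 1 for i in range(300)}
good_cell["MT-CO1"] = 20  # 20 of 320 counts are mitochondrial (6.25%)
sparse_cell = {f"GENE{i}": 5 for i in range(100)}  # only 100 genes detected

kept = [c for c in (good_cell, sparse_cell) if passes_qc(c, mito_genes={"MT-CO1"})]
normalized = [log_normalize(c) for c in kept]
```

In this toy input only the first cell is retained, because the second has fewer than 200 detected genes.
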
Cell populations were annotated based on canonical marker genes from published literature. Because neutrophils are inherently unstable during processing and originated predominantly from a small number of samples, this cell type was excluded from subsequent analyses to mitigate technical bias. The proportion of each cluster was estimated by dividing the count of cells within each cluster by the total cell count in the respective sample. These proportions were statistically compared across treatment groups to assess significant differences in cell composition. The detailed workflow for data processing, including the steps for transferring data and generating the figures and analyses presented in this study, is available in our GitHub repository (https://github.com/Xiaxy-XuLab/RA-SF-data_discriptor).
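
The per-sample proportion calculation described above amounts to the following minimal sketch. It is an illustration under stated assumptions rather than the code in the repository: the function name, sample identifiers, and cluster labels are hypothetical, and the input is assumed to already exclude neutrophils.

```python
from collections import Counter, defaultdict


def cluster_proportions(cells):
    """Compute the fraction of each cluster within each sample.

    cells: iterable of (sample_id, cluster_label) pairs, one pair per cell.
    Returns a dict of the form {sample_id: {cluster_label: fraction}}.
    """
    per_sample = defaultdict(Counter)
    for sample_id, cluster in cells:
        per_sample[sample_id][cluster] += 1
    return {
        sample_id: {
            cluster: n / sum(counts.values()) for cluster, n in counts.items()
        }
        for sample_id, counts in per_sample.items()
    }


# Hypothetical toy annotation: one patient, before (BT) and after (AT) treatment.
toy_cells = [
    ("P1_BT", "Macrophage"), ("P1_BT", "Macrophage"),
    ("P1_BT", "T cell"), ("P1_BT", "B cell"),
    ("P1_AT", "Macrophage"), ("P1_AT", "T cell"),
]
proportions = cluster_proportions(toy_cells)
```

Because each proportion is computed within its own sample, the values for a sample sum to 1 and can be compared between paired before- and after-treatment samples regardless of how many cells each sample yielded.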

Data Records

The dataset is available at the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo) under accession number GSE296117 (ref. 25). To facilitate the reproduction of single-cell transcriptomic profiling, RDS files (a file format for saving Seurat objects) for the RA SF scRNA-seq samples are included in the same GEO accession. The raw sequence data generated in this study have been deposited in the Genome Sequence Archive at the National Genomics Data Center, China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (https://ngdc.cncb.ac.cn/gsa-human/browse/HRA011646) and are available under accession number HRA011646 (ref. 26).

Technical Validation

Following SF cell isolation, cell suspension preparation, and single-cell library construction, scRNA-seq was conducted (Fig. 1). The raw scRNA-seq metrics for each sample are summarized in Supplementary Table 3. The total number of reads per sample exceeded 342 million. The percentage of bases at Q20 ranged from 94.34% to 97.22% and at Q30 from 89.28% to 93.84%, and the GC content centered around 45%. Cell-based quality control metrics (Supplementary Table 4) showed that each sample contained more than 2,000 captured cells, with mean reads per cell ranging from 31,819 to 192,851 and median UMI counts per cell ranging from 2,420 to 10,387. The number of genes detected per sample varied from 11,318 to 23,562, with the median genes per cell ranging from 1,259 to 2,875. Sequencing saturation ranged from 60% to 90%, and the number of potential doublets per sample ranged from 40 to 811 (expected doublet rate: 0.1). Mitochondrial RNA (mtRNA) levels were consistently low, reflecting high cell quality. After rigorous quality control, we obtained 100,387 high-quality cells across 18 samples, with cell counts per sample detailed in Supplementary Table 4. In summary, we generated high-quality single-cell transcriptomic data from nine matched pairs of synovial fluid samples collected from RA patients before and after treatment with adalimumab/tofacitinib.

Fig. 1

Schematic overview of the experimental design. We collected SF samples from the knee joint of patients with RA who had not received b/tsDMARD treatment for at least the previous three months. RA patients were treated with adalimumab (40 mg every 2 weeks) or tofacitinib (5 mg twice a day) for 1 month after enrollment. After 1 month, patients underwent a second knee arthrocentesis to obtain SF from the same joint. After SF cell isolation, cell suspension preparation, and single-cell library preparation, single-cell RNA sequencing was performed. RA, rheumatoid arthritis; ADA, adalimumab; TOF, tofacitinib.

To elucidate the cellular composition of SF from these RA patients, we employed the Seurat software package to stratify cells that passed quality control. A total of seven major cell clusters were identified and visualized using UMAP analysis (Fig. 2a). Cell clusters were annotated based on canonical marker genes derived from published literature27,28,29,30,31, as shown in a dot plot (Fig. 2b). Fibroblasts, a major component of ST samples in RA according to previous reports32,33, were present in lower proportions in SF and were primarily detected in one patient, while macrophages and T cells were the predominant cell types in SF (Fig. 3). Overall, we presented the cellular composition of each sample (Fig. 3a) and provided a statistical analysis of the cellular proportions based on the treatment status of samples (Fig. 3b).

Fig. 2

Cell clustering and identification of scRNA-seq samples. (a) Visualization of seven main clusters across 100,387 cells using uniform manifold approximation and projection (UMAP). (b) Dot plot illustrating the normalized expression level of cell type-specific marker genes across SF clusters. Dot size indicates the percentage of cells in each cell type that express a given marker gene, and dot color indicates its average expression (log1p transformed). (c) UMAP visualization of cells colored by the treatment status of the samples. BT, before treatment; AT, after treatment.

Fig. 3

Proportion analysis of cell clusters. (a) Quantification of relative proportions of distinct cell clusters in SF among patients. The horizontal coordinates depict cell proportions, the vertical coordinates depict patients. The left and right sides represent samples before and after treatment, respectively. (b) Proportional analysis. Proportions of distinct primary cell clusters within each group.

To investigate the molecular changes induced by adalimumab/tofacitinib treatment, we profiled the expression of established RA-related genes across cell types. Consistent with previous scRNA-seq-based findings in ST samples from RA32,34,35, IL6, IL1A, IL12A, and IL12B exhibited low positive expression, while JAK1 and JAK3 were expressed at higher levels compared to TNF (Fig. 4a). Expression of JAK1 and JAK3 was significantly elevated in RA-BT samples and decreased following treatment with adalimumab/tofacitinib (Fig. 4b). While the quantitative validation confirms the persistent low expression of JAK2/3 across most cellular subpopulations, tofacitinib selectively downregulated JAK1/JAK2/JAK3 in T cells and JAK2/JAK3 in macrophages, whereas adalimumab showed no such specificity (Fig. 4b).

Fig. 4

Expression of disease-related genes in SF cells. (a) Violin plot depicting the expression level of well-reported RA-related genes in the major cell types. (b) Dot plot illustrating pairwise comparisons of the expression levels of well-reported RA-related genes in different cell clusters. The p values were calculated using a two-sided Wilcoxon test. Dot size indicates the p value (-log10 transformed), and dot color indicates the fold change (log2 transformed) of gene expression.

Usage Notes

Our dataset comprises scRNA-seq data derived from SF samples of RA patients collected before and after treatment with tofacitinib/adalimumab. It provides a comprehensive profile of cellular composition, molecular signaling pathways, and functional characteristics. This dataset offers a valuable resource for addressing a range of research questions, including elucidating the mechanisms of action of tofacitinib and adalimumab, identifying novel therapeutic targets for RA, and exploring the pathogenesis of other inflammatory diseases. The raw sequence data generated in this study have been deposited in the GSA-human database, and researchers can apply for access to the Data Access Committee (DAC) via the GSA platform. DAC approval for access to the dataset will be granted to all users who agree to the terms and conditions of the data access agreement. For detailed application procedures, please refer to the official guide (https://ngdc.cncb.ac.cn/gsa-human/document/GSA-Human_Request_Guide_for_Users_us.pdf). Data access requests are typically processed within two weeks.