Introduction

B lymphocytes (or B cells) descend from bone marrow (BM) hematopoietic stem cells (HSC) and mediate humoral immunity by secreting antibodies. Early studies based on a selected set of surface and intracellular markers and on gene knockout mice have greatly expanded our knowledge of B cells. This knowledge includes, but is not limited to, i) B cell developmental stages as defined by the status of VDJ gene rearrangement of the heavy and light chains, ii) central and peripheral tolerance mechanisms that ensure the removal or anergy of self-reactive B cells, iii) well-organized BM niches required for B lymphopoiesis, and iv) the germinal center (GC) reaction model, which gives rise to affinity-matured B cells with the help of follicular helper T cells (Tfh) and follicular dendritic cells (FDC).

As pointed out by both Lee et al.1 and King et al.2, earlier low-dimensional and low-throughput methodologies fall short of fully demarcating distinct B cell subsets and deconvolving intrinsically connected biological processes. The advent of single-cell technologies has now bridged this gap and led to breakthroughs in understanding B cell heterogeneity, lineage trajectories, B cell receptor (BCR) repertoires, and cell-cell communication networks3. For example, Lee et al. coupled single-cell RNA sequencing (scRNA-seq) with Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) and revealed two distinct proliferative phases (i.e., pre-BCR-dependent and pre-BCR-independent) in the pre-B cell expansion stage in mice1. Furthermore, by correlating their data with canonical markers, they identified YBX3 and EBF1 as the defining features of the two phases, respectively. In another study, King et al. delineated B cell maturation pathways by integrating single-cell transcriptomic and antibody repertoire data2. They reported a pre-GC state programmed to undergo class switching and an antibody class-dictated B cell fate decision. Beyond these examples, single-cell techniques have also proven powerful in resolving B cell subset heterogeneity4,5, revealing new niches where B cells develop6, and shedding light on aberrations of the B cell compartment in B cell-related malignancies7.

Despite this progress, a comprehensive high-dimensional single-cell analysis covering the continuum of B cell stages, from early progenitors to the mature naïve compartment and then to the antigen-experienced memory compartment or terminally differentiated plasma cells (PC), is still lacking. Moreover, neither the heterogeneity of certain conventional B cell subsets nor the environmental factors contributing to B cell development in humans have been carefully examined.

Inspired by the success of the aforementioned studies, we analyzed single-cell transcriptomic and BCR sequencing data of B cells from BM, peripheral blood (PBL) and tonsil GC, obtained from either in-house experiments or external studies. We provided a gene regulation landscape for 18 well-defined B cell subpopulations covering a continuum of developmental stages. By dissecting the heterogeneity of conventional B cell subsets (i.e., immature, naïve and memory B cells), we revealed properties related to several biological processes, including B cell senescence, homeostatic proliferation (HP) and memory B cell differentiation. Finally, we constructed tissue-specific interaction networks between B and non-B cells and characterized these interactions both quantitatively and qualitatively along the B cell developmental axis. Our work resolved the gene regulation dynamics, the heterogeneity of conventional B cells, and the underlying role of surrounding non-B cell types and signaling as B cells develop and mature, and should serve as a valuable resource for future studies.

Results

Study design and the integrated single-cell dataset

This study aimed to identify the full spectrum of B cell subpopulations along the developmental axis, depict the underlying gene regulation and the relationships between subpopulations, and subsequently delineate cell-cell communications among them and with other coexisting cell types. The scRNA-seq data of enriched early B cell groups7, tonsil GC B cells2, and peripheral blood mononuclear cells (PBMCs)8 were obtained from previous studies (Table S1). To capture comprehensive B cell groups and their corresponding environmental cell types, we obtained BM aspirates and PBL from three individuals free from hematological disorders (FFHD). For each sample, we conducted scRNA-seq for both total mononuclear cells (BM only) and enriched B cells (CD3CD41aCD43CD235a) (see Materials and Methods). In addition, we performed scVDJ-seq for the B cells and obtained bulk antibody repertoires for the PBL and BM samples using DUMPArts, a method previously developed in our lab9. We then integrated our data with the datasets from previous studies, yielding a rich resource for B cell subpopulation identification and for the subsequent analyses of gene regulation and cell-cell interactions (CCIs) crucial for B cell development and maturation (Fig. 1A).

Fig. 1: Study design and the landscape of cell populations in the integrated single-cell dataset.

A Overview of the study design. UMAP of 42 cell populations (B) and data sources (C) in the integrated dataset comprising in-house enriched B cells and unsorted BMMCs, and external data (enriched early B progenitors, GC B cells and unsorted PBMCs; see Table S1 for details). UMAP of B cell subpopulations (D) and sample sources (E) in the enriched B cell dataset. F Expression of marker genes across 18 B cell subpopulations. G Tissue-specific composition of B cells for each donor and overall. D1, donor 1; D2, donor 2; D3, donor 3. H Frequency comparison of B cell subpopulations between BM and PBL (only the 10 subpopulations that represent at least 1% of the enriched B cells in either BM or PBL are shown).

This study involved 199,004 cells, of which 129,832 (65.2%) were newly generated in our lab and 69,172 (34.8%) came from external datasets. In terms of quality, the enriched B cells and unsorted BM mononuclear cells (BMMCs) had medians of 1420 and 1414 captured genes per cell, and medians of 3662 and 4134 captured transcripts per cell, respectively. Table S2 and Fig. S1 provide detailed quality metrics for the scRNA-seq and scVDJ-seq samples. Data integration was performed with Seurat's "fast integration" method to correct batch effects. We then employed two rounds of clustering to identify cell populations in an unbiased manner. The first round distinguished the major cell lineages (Fig. S2A), and the second round resolved specific cell populations within each lineage (Fig. S2B-E) (see Materials and Methods). These annotated populations were then mapped back onto the original integrated dataset and served as the starting point for downstream analyses.
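The quality summaries above reduce to two per-cell quantities, the number of genes with nonzero counts and the total count, each followed by a median across cells. A minimal sketch using only the Python standard library, assuming a toy representation in which each cell is a dict of gene name to raw count (the name `qc_medians` and the values are illustrative, not taken from the study's pipeline):

```python
from statistics import median

def qc_medians(cells):
    """Return (median genes detected per cell, median total counts per cell).

    `cells` is a list of dicts mapping gene name -> raw count for one cell,
    a simplified stand-in for a sparse count matrix.
    """
    # A gene counts as "captured" in a cell if its raw count is nonzero.
    genes_per_cell = [sum(1 for count in cell.values() if count > 0) for cell in cells]
    # Total captured transcripts per cell.
    counts_per_cell = [sum(cell.values()) for cell in cells]
    return median(genes_per_cell), median(counts_per_cell)

# Toy cells with made-up counts.
toy = [
    {"CD79A": 5, "MS4A1": 3, "FOS": 0},
    {"CD79A": 2, "MS4A1": 0, "FOS": 7},
    {"CD79A": 1, "MS4A1": 1, "FOS": 1},
]
print(qc_medians(toy))
```

In practice these metrics are computed on sparse count matrices by scRNA-seq toolkits; the dict form only makes the arithmetic explicit.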

As a result, we identified 42 cell populations, including 5 hematopoietic stem and progenitor cell (HSPC) populations, 18 B lineage populations, 6 T/NK lineage populations, 9 myeloid populations, 3 erythroid/megakaryocyte populations, and 1 non-hematopoietic population (Fig. 1B). The distribution of data sources across populations was also consistent with the assigned cell identities (Fig. 1C). Importantly, the cell population frequencies in our data reproduced those reported for CD45+ BMMCs10 and unsorted PBMCs8, supporting the reliability of our analysis approach (Fig. S3). Table S3 provides detailed statistics, including cell population frequencies in the integrated data. This well-annotated integrated dataset therefore provided a solid foundation for investigating B cell heterogeneity and the CCIs underlying B cell development and maturation.

Given that the B lineage is the primary focus of this study, we carefully examined the resulting B cell clusters. Combining curated canonical marker genes with computed cluster-specific genes, we identified 18 B cell subpopulations covering the entire spectrum of B cell development and maturation, from early B cell progenitors to terminally differentiated plasma cells (PC) (Fig. 1D-F and Fig. S4). The data and tissue source composition of each B cell subpopulation is provided in Fig. S5. Notably, aside from the 14 well-known B cell subpopulations, we identified four minor subpopulations with immature or naïve phenotypes, each marked by specific highly expressed genes (i.e., FTLhi and S100A8hi immature B cells, and FOShi and HSPA1Ahi naïve B cells) (Fig. 1F); these are examined in depth below.

We then quantified these B cell subpopulations in the in-house enriched B cell compartment. Overall, B lineage cells represented 85.6% and 92.9% of the enriched BM and PBL cells, respectively. The non-B compartment was dominated by myeloid cells (in both BM and PBL) and mesenchymal stem cells (in BM only) (Fig. S6). B cell subpopulation frequencies varied among donors (Fig. 1G and Table S4). For instance, D1 had a significantly higher proportion of BM PC (and plasmablasts (PB)) and D3 a higher proportion of immature B cells; D3 also had a higher proportion of PBL FOShi naïve B cells. Despite this donor variation, the proportions of major B cell subsets (immature, naïve, memory B cells, and PC) in the mixed samples recapitulated previously reported estimates11,12 (Fig. 1H). Nonetheless, we observed only a minimal fraction (<5‰) of early-stage B cells (i.e., pre-pro, (cycling) pro, and (cycling) pre-B cells) in the enriched BM B cell compartment. Besides their low frequencies among BM B lineage cells11, the underrepresentation of these early-stage subpopulations can also be attributed to the B cell enrichment strategy13 (see Materials and Methods).
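The display filter used for Fig. 1H (subpopulations representing at least 1% of enriched B cells in either BM or PBL) is a frequency computation per tissue followed by a union of the labels passing the cutoff. A stdlib sketch, assuming per-cell subpopulation labels for each tissue; the function names, labels and counts are hypothetical toy values. Multiplying a frequency by 1000 gives the per-mille (‰) form quoted for early-stage B cells.

```python
from collections import Counter

def subpopulation_frequencies(labels):
    """Fraction of cells carrying each subpopulation label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def subpopulations_to_show(bm_labels, pbl_labels, min_frac=0.01):
    """Labels reaching `min_frac` of cells in BM or in PBL (union of both tissues)."""
    keep = set()
    for labels in (bm_labels, pbl_labels):
        freqs = subpopulation_frequencies(labels)
        keep |= {label for label, f in freqs.items() if f >= min_frac}
    return sorted(keep)

# Hypothetical toy labels: 1,000 BM cells and 1,000 PBL cells.
bm = ["naive"] * 600 + ["immature"] * 395 + ["pro"] * 5
pbl = ["naive"] * 900 + ["memory"] * 100

print(subpopulations_to_show(bm, pbl))  # 'pro' (0.5%) falls below the cutoff
print(subpopulation_frequencies(bm)["pro"] * 1000, "per mille")
```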

A transcriptomic overview of the 18 identified B cell subpopulations

At the transcriptomic level, the number of captured genes varied considerably among subpopulations along the developmental axis (Fig. 2A). For example, the three proliferating groups, namely cycling pro, cycling pre and dark zone (DZ) GC, exhibited the highest numbers of genes (medians of 3664, 3296 and 2268, respectively). Apart from these proliferating subpopulations, gene numbers showed a V-shaped pattern as B cells matured, with the minimum occurring in the immature and PB subpopulations (577, 561 and 548 for immature, S100A8hi immature and PB, respectively). The lower number of captured genes in immature B cells may reflect their low metabolic rate, as reported in a previous study14. We confirmed these observations in independent projects to rule out potential bias caused by batch effects (Fig. S7).

Fig. 2: Gene regulation dynamics of B cell development revealed by scRNA-seq analysis.

A Distribution of the number of genes captured by scRNA-seq across the 18 identified B cell subpopulations. B BCR isotype composition for the 10 B cell subpopulations captured in our in-house scVDJ-seq data. C Connectivity of B cell subpopulations as shown by partition-based graph abstraction (PAGA). The connectivity threshold was set to 0.1 for a concise representation of these B cell subpopulations. Circle size is proportional to subpopulation size, and line width denotes connectivity strength. D Streamline-based visualization of RNA velocities of the 18 B cell subpopulations. E The differentiation speed of the 18 identified B cell subpopulations, given by the length of the velocity vector. F Subpopulation-specific pseudotime distribution inferred from RNA velocity. The inset at the top left shows a zoomed-in pseudotime window occupied by immature, naïve and memory B cell subpopulations. G Profile of the top-active genes (measured by velocity score) for the 18 B cell subpopulations. All transcription factors and a selected list of representative genes are marked at the side. The gene category is color-coded.

For scVDJ-seq, the BCR capture efficiency varied among subpopulations (Table S5). BCRs were captured for fewer than 20% of cells in the primary and S100A8hi immature subpopulations, compared with 89.4% and 86.6% of cells in PB and PC, respectively. IgM was the dominant isotype in all 3 immature (85.0%, 85.9% and 85.4%), 2 naïve (91.0% and 90.9%) and 2 memory subpopulations (49.5% and 64.9%) (Fig. 2B). Notably, IgM was not the exclusive isotype in the 3 immature subpopulations: IgD was also present in around 15% of the cells in each of them. In contrast, class-switched isotypes, mostly IgG and IgA, dominated the classical memory, PB and PC subpopulations.
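Capture rate and isotype composition use different denominators: the capture rate divides cells with a captured BCR by all cells in the subpopulation, while isotype fractions divide isotype counts by captured cells only. A stdlib sketch under an assumed encoding of (subpopulation, isotype-or-None) per cell; the function name and toy data are hypothetical and do not reproduce the study's counts.

```python
from collections import Counter, defaultdict

def bcr_summary(cells):
    """Per-subpopulation BCR capture rate and isotype fractions.

    `cells` is a list of (subpopulation, isotype) pairs; isotype is None when
    no BCR was captured for that cell (an assumed encoding).
    """
    n_cells = Counter(sub for sub, _ in cells)
    isotype_counts = defaultdict(Counter)
    for sub, isotype in cells:
        if isotype is not None:
            isotype_counts[sub][isotype] += 1

    summary = {}
    for sub, n in n_cells.items():
        captured = sum(isotype_counts[sub].values())
        # Isotype fractions are relative to cells with a captured BCR only.
        fractions = (
            {iso: k / captured for iso, k in isotype_counts[sub].items()}
            if captured
            else {}
        )
        summary[sub] = {"capture_rate": captured / n, "isotypes": fractions}
    return summary

# Hypothetical toy data.
toy = (
    [("PC", "IgG")] * 5 + [("PC", "IgA")] * 4 + [("PC", None)]
    + [("immature", "IgM")] * 2 + [("immature", None)] * 8
)
for sub, stats in bcr_summary(toy).items():
    print(sub, stats)
```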

Extensive experiments in mice have established a well-accepted paradigm of B cell development15. Although the temporal order of key stages in B cell development is not in dispute, our partition-based graph abstraction (PAGA) analysis revealed a seemingly non-successive transcriptomic transition along the pre-immature-naïve axis: pre B cells were more strongly connected to FOShi naïve B cells than to the intermediate immature stage (Fig. 2C). Despite this counterintuitive observation, the PAGA graph also reflected tight connections between the immature and naïve subpopulations and among the three cycling subpopulations (i.e., cycling pro, cycling pre and DZ GC), as well as the distinctness of PB and PC. The overall transcriptomic similarity among the cycling pro, cycling pre and DZ GC subpopulations most likely reflects a gene program shared by proliferating cells rather than a developmental relationship.
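The PAGA graph in Fig. 2C keeps only edges whose connectivity reaches the 0.1 threshold stated in the caption, and "more strongly connected" means a larger connectivity value. The sketch below applies such a threshold to a symmetric connectivity matrix; the cluster names and matrix values are invented for illustration and are not the study's PAGA output, and computing the connectivities themselves (the PAGA method proper) is out of scope here.

```python
def thresholded_edges(names, connectivity, threshold=0.1):
    """Return (node_a, node_b, weight) for each pair with weight >= threshold.

    `connectivity` is a symmetric matrix (list of lists) of PAGA-style
    connectivity scores between clusters; only the upper triangle is read.
    Edges are sorted from strongest to weakest.
    """
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            weight = connectivity[i][j]
            if weight >= threshold:
                edges.append((names[i], names[j], weight))
    return sorted(edges, key=lambda edge: edge[2], reverse=True)

# Invented connectivity values for three hypothetical clusters.
names = ["c1", "c2", "c3"]
conn = [
    [1.0, 0.05, 0.3],
    [0.05, 1.0, 0.6],
    [0.3, 0.6, 1.0],
]
print(thresholded_edges(names, conn))  # the c1-c2 edge (0.05) is dropped
```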

We then performed RNA velocity analysis to investigate the gene expression kinetics of B cell development (HSPA1Ahi naïve and GC B cells were excluded because the raw sequencing data required for this analysis were unavailable). The streamlines projected on the UMAP (uniform manifold approximation and projection) embedding showed B cell stage transitions in four compartments (part 1: pre-pro -> cycling pro -> pro -> cycling pre; part 2: immature -> naïve; part 3: IgM+ memory -> classical memory and IgM+ memory -> CD27-IgM+IgD+; part 4: PB -> PC), each delineating the local relationships between subpopulations (Fig. 2D). We next examined the differentiation speed of these B cell subpopulations and found markedly higher RNA velocities in both the earliest-stage B cells (from pre-pro to pre to S100A8hi immature) and the terminally differentiated PB and PC than in the remaining subpopulations (Fig. 2E). The lower differentiation speed of the immature, naïve and memory subpopulations was also reflected by the narrow pseudotime windows they occupied, indicating a relatively quiescent state (Fig. 2F). Lastly, we profiled the top-active genes (measured by velocity score) across all B cell subpopulations (Fig. 2G). Three gene clusters stood out for their restricted activity in early B cells (cluster 1 or C1), memory B cells (cluster 2 or C2) and PC (cluster 3 or C3), respectively. In contrast, the immature and naïve subpopulations were characterized by a sparse, low-velocity gene expression pattern, suggesting non-specific regulation and a comparatively resting state.
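Fig. 2E defines differentiation speed as the length of the velocity vector. Assuming a velocity vector is available for each cell, its speed is the Euclidean norm; summarizing per subpopulation by the median is an illustrative choice here, not necessarily the one used for the figure. The function names, labels and vectors are toy values.

```python
import math
from collections import defaultdict
from statistics import median

def velocity_speed(velocity):
    """Euclidean (L2) length of one cell's velocity vector."""
    return math.sqrt(sum(v * v for v in velocity))

def speed_by_subpopulation(labels, velocities):
    """Median velocity length per subpopulation label."""
    speeds = defaultdict(list)
    for label, vel in zip(labels, velocities):
        speeds[label].append(velocity_speed(vel))
    return {label: median(vals) for label, vals in speeds.items()}

# Toy 2-D velocity vectors for four cells.
labels = ["PC", "PC", "naive", "naive"]
vels = [[3, 4], [6, 8], [0, 1], [1, 0]]
print(speed_by_subpopulation(labels, vels))
```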

A selected list of genes, including transcription factors (TFs) and some canonical molecules (e.g., surface markers/receptors, recombination enzymes, BCR-encoding genes, and adhesion molecules), is highlighted (Fig. 2G and Fig. S8). Specifically, the TFs ERG, LEF1, SMAD1, and ZNF704 were most active at the earliest stage of B cells (i.e., pre-pro B cells), whereas the E2F family TFs peaked at a later stage (i.e., pre B cells). ZNF331 was the only TF found to be active in the naïve subpopulation. SOX5, TP63 and TFEC were active in memory B cells, and PRDM1, CREB3L2 and KLF6 were active in PC. Notably, MYBL2, IRF4 and VDR were active in both early B cells and PC.

Two atypical immature B cell subpopulations are characterized by senescence-associated secretory phenotype

Having identified two previously unappreciated immature B cell subpopulations, we sought to map them onto the developmental trajectory and decipher their functional status. Using Monocle2 for trajectory inference, we found that the two minor immature subpopulations were distributed on different branches, suggesting distinct developmental trajectories (Fig. 3A, B). This was consistent with the developmental topologies revealed by two additional methods, PAGA (Fig. 2C) and RNA velocity analysis (Fig. 3C). Notably, the FTLhi and S100A8hi immature B cell subpopulations expressed C1q and S100A8/A9, respectively (Fig. S4). These molecules are typically expressed by myeloid cells, such as dendritic cells, macrophages, monocytes and neutrophils, rather than by B lineage cells16,17. This unusual expression pattern further implied that the two subpopulations lie outside the normal B cell differentiation pathway and represent two atypical B cell subpopulations. To verify this, we examined 6 additional published human BM scRNA-seq datasets and found that the two subpopulations were uncommon in these external BM samples10,18,19,20,21,22. Briefly, only the S100A8hi immature B cell subpopulation was recovered, in two of the six datasets, and none of these datasets contained the FTLhi immature B cell subpopulation (see the last section for details).

Fig. 3: Two minor immature subpopulations are characterized by atypical differentiation and senescence-associated secretory phenotype.

Transcriptomic-based inference of developmental trajectory for three immature B cell subpopulations with monocle2. Both B cell subpopulation distribution (A) and pseudotime prediction (B) are shown along the trajectory. C Streamline-based visualization of RNA velocities of the immature and naïve B cell subpopulations. The color scheme is the same as that in Fig. 1D. The top 10 enriched biological processes as sorted by significance score (p-value) for S100A8hi (D) and FTLhi (E) immature B cell subpopulations. A longer and lighter bar indicates a more significant score. Fisher's exact test (two-sided, corrected), *, p < 0.05; **, p < 0.01; ***, p < 0.001. F The top 5 regulons specific to the FTLhi immature B cell subpopulation as inferred by the SCENIC workflow. Regulatory networks of the two regulons PPARG(+) (G) and NR1H3(+) (H) predicted by the SCENIC workflow. Transcription factors and their corresponding target genes are surrounded by orange diamonds and blue ellipses, respectively. C1q component genes, namely C1QA, C1QB, and C1QC, are highlighted in pink. BCR heavy chain variable (I) and joining (J) gene usage as determined by scVDJ-seq across the three immature B cell subpopulations. For variable genes, only those with a usage frequency of at least 1% are shown. K Pie charts showing the relative frequencies of the top 100 clonotypes for the 3 immature B cell subpopulations. L Venn diagram showing clonotype sharing among the three immature B cell subpopulations. The numbers on the diagram indicate the number of clonotypes in each compartment. The numbers with a pink background indicate clonal similarity as measured by the Jaccard similarity index.

We next performed GO enrichment analysis to interrogate the functional states of the two minor immature subpopulations. The S100A8hi immature B cell subpopulation was enriched for biological processes such as "Regulation Of Superoxide Anion Generation", "Positive Regulation Of Reactive Oxygen Species Metabolic Process", and "Inflammatory Response" (Fig. 3D), which may be linked to senescence-induced inflammation23. For the FTLhi immature B cell subpopulation, lipid metabolism-related pathways, including "Long-Chain Fatty Acid Transport", "Cholesterol Efflux" and "Glycolipid Transport", were enriched (Fig. 3E). Given the high expression of C1q components (i.e., C1QA, C1QB and C1QC) in this subpopulation, these lipid metabolism pathways may facilitate the export of this multifaceted effector component of the innate immune response. Serum C1q levels have been reported to increase with aging24, and the atypical expression of C1q by B lymphocytes may represent another source of this inflammatory molecule during aging. We then used the SCENIC workflow to investigate the regulatory factors driving the atypical expression of C1q. NR1H3 and PPARG were the two most specific TFs for the FTLhi immature B cell subpopulation among all B cell subpopulations (Fig. 3F and Fig. S9), and the three C1q component genes were among the target genes of their associated regulons (Fig. 3G, H).
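The GO enrichment captions (Fig. 3D, E) report a two-sided Fisher's exact test with correction. For one GO term, the test operates on a 2x2 table of genes: in the subpopulation's gene set or not, and annotated to the term or not. The stdlib sketch below computes the two-sided p-value by summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. The correction method is not named in this section, so Benjamini-Hochberg is shown as an assumption; both function names are hypothetical.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: genes in / not in the subpopulation's gene set.
    Columns: genes annotated / not annotated to the GO term.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x annotated genes in row 1, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    x_min, x_max = max(0, row1 - (n - col1)), min(row1, col1)
    # A small relative tolerance treats floating-point ties as "as extreme".
    p = sum(prob(x) for x in range(x_min, x_max + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(fisher_two_sided(8, 2, 1, 5))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

For the table [[8, 2], [1, 5]] the function returns 280/8008, about 0.035, the value given by standard implementations of the test.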

We further characterized the expressed BCR repertoires using scVDJ-seq data. Although variations in heavy chain variable gene usage were observed among the three immature B cell subpopulations, they were not reproducible in individual donors (Fig. 3I and Fig. S10A); the same was true for the heavy chain CDR3 length distribution (Fig. S10B). In contrast, heavy chain joining genes, light chain variable and joining genes, and light chain CDR3 showed comparable usage frequencies or length distributions across subpopulations (Fig. 3J and Fig. S10C-E), suggesting a BCR-independent induction pathway for the two atypical B cell subpopulations. Clonality analysis indicated that all three subpopulations had evenly distributed clone sizes without dominant clonotypes (Fig. 3K). Nevertheless, we found clonal relationships between these subpopulations, suggesting a modest homeostasis-like proliferation of immature B cells. Notably, the clonal similarity (measured by the Jaccard similarity index) between the primary and FTLhi subpopulations was tenfold higher than that between S100A8hi and either of the other two subpopulations, suggesting a closer relationship between the primary and FTLhi subpopulations (Fig. 3L). However, the shared clonotypes (n = 40) between the primary and FTLhi immature B cell subpopulations all came from a single donor (D3), so whether this observation can be generalized remains to be validated.
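The clonal similarity in Fig. 3L is the Jaccard index, the size of the intersection divided by the size of the union of two subpopulations' clonotype sets. A sketch assuming each clonotype is already reduced to a hashable key (for example a V gene, J gene and CDR3 tuple; that definition is an assumption here). The subpopulation names reuse the text's labels, but the clonotype IDs are toy values.

```python
from itertools import combinations

def jaccard(a, b):
    """Size of intersection / size of union of two clonotype collections (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_jaccard(clonotypes_by_subpop):
    """Jaccard index for every unordered pair of subpopulations, keyed by sorted name pair."""
    return {
        (x, y): jaccard(clonotypes_by_subpop[x], clonotypes_by_subpop[y])
        for x, y in combinations(sorted(clonotypes_by_subpop), 2)
    }

# Toy clonotype IDs standing in for (V gene, J gene, CDR3) keys.
clones = {
    "primary": ["c1", "c2", "c3", "c4"],
    "FTLhi": ["c3", "c4", "c5"],
    "S100A8hi": ["c5", "c6", "c7", "c8", "c9"],
}
for pair, score in pairwise_jaccard(clones).items():
    print(pair, round(score, 3))
```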

Naïve B cells show a confined HP and an individualized activation pattern

Naïve B cells are generally regarded as a homogeneous B cell subset. However, King et al. described two additional naïve-like subsets, termed "activated" and "preGC", in their GC response study2. We therefore compared the two naïve subpopulations in this work (i.e., FOShi and HSPA1Ahi naïve B cells) with these two subsets. The top 10 highly expressed genes of FOShi naïve B cells were highly consistent with those of the "activated" subset (Fig. S11), pointing to an activated phenotype of the FOShi subpopulation. This activated state was also implied by both the PAGA analysis (Fig. 2C) and the developmental trajectory constructed by Monocle2, which revealed a primary naïve -> FOShi -> HSPA1Ahi axis (Fig. 4A, B). Notably, both primary and FOShi naïve B cells were found in all three sources (BM, PBL, and tonsil), whereas nearly all cells in the HSPA1Ahi subpopulation came from the tonsil (Fig. S5B). This particular sample source, together with the position of the HSPA1Ahi subpopulation in the developmental trajectory, led us to hypothesize that it represents the "preGC" subset defined by King et al. However, we found no additional evidence supporting this hypothesis: the marker genes of the "preGC" subset were not prominent in the HSPA1Ahi subpopulation (Fig. S12), and tracing these cells back to King's dataset showed that they came primarily from the "naïve" subset.

Fig. 4: Naïve B cells are characterized by a confined HP and an individualized activation pattern.

Transcriptomic-based inference of developmental trajectory for three naïve B cell subpopulations with monocle2. Both B cell subpopulation distribution (A) and pseudotime prediction (B) are shown along the trajectory. The top 10 enriched biological processes as sorted by significance score (p-value) for FOShi (C) and HSPA1Ahi (D) naïve B cell subpopulations. A longer and lighter bar indicates a more significant score. Fisher's exact test (two-sided, corrected), *, p < 0.05; **, p < 0.01; ***, p < 0.001. E Expression of a TF (HMGA1) and 6 enzyme genes (GAPDH, NDUFB2, PRDX1, PKM, SOD1 and TPI1) that were highly expressed in the HSPA1Ahi subpopulation, shown across the three naïve subpopulations and the succeeding GC B cell subpopulations. F Overlap of commonly expressed genes (CEGs) among the three naïve subpopulations. G BCR isotype composition for HSPA1Ahi, LZ GC and DZ GC B cells (based on King's dataset). BCR heavy chain variable (H) and joining (I) gene usage for the primary and FOShi naïve B cell subpopulations. For variable genes, only those with a usage frequency of at least 1% are shown. J Pie charts showing the relative frequencies of the top 100 clonotypes for the 3 naïve B cell subpopulations. K Venn diagram showing clonotype sharing between primary and FOShi naïve B cell subpopulations for two donors (D2 and D3). Numbers on the diagram indicate the number of clonotypes in each compartment.

To gain insight into these two naïve-like subpopulations, we performed GO enrichment analysis of their underlying biological processes. Consistent with its activated phenotype, the FOShi subpopulation was enriched for the term "positive regulation of transcription by RNA polymerase II" (Fig. 4C). In contrast, the HSPA1Ahi subpopulation was enriched for multiple translation-related terms as well as the term "response to unfolded protein" (Fig. 4D). Notably, the latter term was driven by genes encoding heat shock proteins (HSPs), the markers of this subpopulation. HSP90 has been reported to be upregulated following CD3/CD28 stimulation, suggesting that HSP expression may be regulated via TCR signaling25. We therefore hypothesized that HSP upregulation in B lymphocytes shares a similar mechanism and performed a naïve B cell activation assay targeting the BCR (i.e., IgM). HSPA1A mRNA expression peaked at ~2 h after IgM stimulation, whereas FOS expression peaked as early as 10 min, confirming the sequential upregulation of FOS and HSPA1A upon activation indicated by the pseudotime analysis (Fig. S13). We also identified a TF (HMGA1) and 6 enzymes (GAPDH, NDUFB2, PRDX1, PKM, SOD1 and TPI1) involved in cellular metabolism whose expression increased as B cells matured and which may be implicated in the emergence of this subpopulation (Fig. 4E). Furthermore, we compared the commonly expressed genes (CEGs, defined as genes expressed in at least 50% of cells within a subpopulation) among the three naïve subpopulations. The HSPA1Ahi subpopulation expressed the smallest set of CEGs, most of which were also CEGs of the other two subpopulations (Fig. 4F). The biological significance of this simplified gene expression program remains to be elucidated.
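The CEG definition above (genes expressed in at least 50% of cells within a subpopulation) and the overlap in Fig. 4F amount to a per-gene detection fraction followed by set operations. A sketch assuming "expressed" means a nonzero raw count, since the detection criterion is not specified here; the gene names are real but the counts, variable names and function name are toy or hypothetical.

```python
from collections import Counter

def commonly_expressed_genes(cells, min_frac=0.5):
    """Genes with a nonzero count in at least `min_frac` of the cells.

    `cells` is a list of dicts mapping gene name -> raw count for one cell.
    """
    n_cells = len(cells)
    detected = Counter(gene for cell in cells for gene, count in cell.items() if count > 0)
    return {gene for gene, k in detected.items() if k / n_cells >= min_frac}

# Toy cells for two hypothetical subpopulations.
primary = [{"CD79A": 3, "FOS": 0}, {"CD79A": 1, "FOS": 2}, {"CD79A": 2}, {"FOS": 1}]
hspa1a_hi = [{"HSPA1A": 5, "CD79A": 1}, {"HSPA1A": 2}, {"CD79A": 0, "HSPA1A": 1}]

ceg_primary = commonly_expressed_genes(primary)
ceg_hspa = commonly_expressed_genes(hspa1a_hi)
print(sorted(ceg_primary), sorted(ceg_hspa))
print("shared:", sorted(ceg_primary & ceg_hspa))
```

With three or more subpopulations, the regions of an overlap diagram follow from the same set intersections and differences.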

We then characterized the expressed BCRs of these naïve subpopulations (King's BCR data were used to include the HSPA1Ahi subpopulation in the comparison). Notably, a higher percentage of HSPA1Ahi B cells carried BCRs of switched isotypes (IgG and IgA) than did the other two subpopulations (Fig. 4G vs Fig. 2B), suggesting the onset of class switch recombination. Although gene usage varied between the primary and FOShi naïve subpopulations (Fig. 4H-I and Fig. S14A, B; HSPA1Ahi came from a different donor source and was therefore not suitable for gene usage comparison), these variations were not reproducible across individuals (Fig. S14C, D), indicating no clear gene usage preference for BCRs primed for activation under physiological conditions. We also did not observe tissue-specific gene usage patterns for the two subpopulations (Fig. S14E, F). Lastly, clonality analysis revealed no clonal expansion in any of the three subpopulations (Fig. 4J). However, we observed a clonal relationship between the primary and FOShi naïve subpopulations in both donors (Fig. 4K). Remarkably, clonotypes were shared between subpopulations only within the same tissue source, reflecting a local and confined B cell HP26.

Evidence supporting two distinct development models for memory B cells coexists

The heterogeneity of memory B cells has been appreciated for decades27. A combination of CD27 and several Ig classes (e.g., IgM, IgD, IgG and IgA) is the most widely used surface marker set in research examining memory B cell subsets. With these surface markers, around a dozen memory subsets have been artificially delineated, some of which may overlap depending on the marker set chosen28. Thus, a limited, predefined set of markers does not guarantee biologically meaningful memory subsets with distinct origins or functions. Leveraging high-dimensional transcriptome data, we found two major memory B cell branches in three representative sources (BM, PBL and tonsil): classical (C) and IgM+ (M) memory B cells formed one branch, while CD27-IgM+IgD+ (MD) memory B cells formed the other. In the constructed developmental trajectory, MD appeared as an independent population with few cells in a transitional state towards the other two subpopulations, suggesting a gap between the two major branches (Fig. 5A-B). Similar results were observed for memory B cell subpopulations in independent sources (BM and PBL, Fig. S15A-D). The distinctness of the two branches was also demonstrated by the 3D UMAP plot (Supplementary Movie 1) and by a lower connectivity between MD and C/M than between C and M (Fig. 2C). Moreover, the C and M subpopulations had more similar IGHG and IGHA subclass compositions (Fig. 5C). These observations are consistent with the two well-recognized memory generation pathways, namely the T cell-dependent (TD, for C and M) and T cell-independent (TI, for MD) pathways29,30. Both M and C subpopulations have a TD activation history but differ in that the former arises from a primary GC response, whereas the latter arises from secondary or consecutive GC responses28,31. The longer maturation history of the C subpopulation was reflected by its later emergence along the developmental axis compared with the M subpopulation (Fig. 2F).

Fig. 5: Integrated analysis reveals two compatible development models for memory B cells.

Transcriptomic-based inference of developmental trajectory for three memory B cell subpopulations with monocle2. Both B cell subpopulation distribution (A) and pseudotime prediction (B) are shown along the trajectory. C IGHG and IGHA subclass composition for three memory B cell subpopulations. Numbers in parentheses indicate the number of B cells. D Doughnut plot showing the tissue-specific composition of three memory subpopulations for the mixed samples and three donors. E CDR3 length distribution. The paired smooth curves are distributions fitted from kernel density estimates. Classical, IgM+ and CD27-IgM+IgD+ subpopulations are denoted as "C", "M" and "MD", respectively. BCR heavy chain variable (F) and joining (G) gene usage for classical, IgM+ and CD27-IgM+IgD+ memory B cell subpopulations. For variable genes, only those with a usage frequency of at least 1% in classical memory B cells are shown. Error bars show 95% confidence intervals. H Isotype-specific comparison of SHM levels of heavy chain variable genes among three memory subpopulations. Numbers on the boxes indicate the number of unique clonotypes. I Venn diagram showing clonotype sharing among classical, IgM+ and CD27-IgM+IgD+ memory B cell subpopulations. The numbers with a pink background indicate clonotype similarity as measured by the Jaccard similarity index. J Barplots showing the fraction of clonotype sharing between donors, sources and isotypes. The statistical test used in (D-G) is the two-sided independent two-sample t test. *, p < 0.05; **, p < 0.01; ***, p < 0.001.

We then investigated the tissue-specific composition of the three memory subpopulations. We observed higher percentages of M and MD subpopulations in BM than in PBL (Fig. 5D). This tissue difference was reproduced in individual donors and in a comparison between the in-house BMMC and external PBMC datasets (Fig. 5D and Fig. S15E), consistent with a previous study32. Since M and MD subpopulations are less mature than C, their higher percentages suggest that BM is a niche with broader immune potential than PBL.

Furthermore, we characterized their BCR repertoires. The M subpopulation had a significantly shorter heavy chain CDR3 than the other two subpopulations (mean lengths for M, C and MD: 16.8, 17.4 and 17.1) (Fig. 5E), although this result was not reproduced in every individual donor (Fig. S15F). The CDR3 length difference between M and MD subpopulations has been reported previously33,34. However, we also observed a significant difference between C and M subpopulations (p < 0.001). Since these two subpopulations share the same origin, this CDR3 length difference probably reflects antigen-based selection in the subsequent GC response35,36,37. For heavy chain gene usage, the C subpopulation preferentially used IGHV1 family genes (e.g., IGHV1-69, IGHV1-18 and IGHV1-46) (p < 0.05) and IGHJ5 (p < 0.01) (Fig. 5F, G). In contrast, the M subpopulation preferentially used IGHV3-7 (p < 0.01), while the MD subpopulation preferentially used IGHV4-34 (p < 0.01). We also compared the somatic hypermutation (SHM) level of VH genes between the three memory subpopulations in an isotype-specific manner. For switched isotypes (IgG and IgA), the C subpopulation had a higher SHM level than both M and MD subpopulations, in agreement with previous studies (Fig. 5H). Remarkably, IgM showed a higher SHM level in M and MD than in C, and M had a higher IgM SHM level than MD. Finally, we investigated the clonal relationship between these memory subpopulations. The C subpopulation had a closer clonal relationship with M than with MD, consistent with the notion that C and M share the same generation pathway (Fig. 5I). However, the M subpopulation had a higher clonotype similarity score with MD than with C. A careful examination confirmed that all 32 clonotypes shared by M and MD came from donor D3 (Fig. S15G). Moreover, the two subpopulations had almost equal SHM levels for each of the shared clonotypes (Fig. S15H). We further found that every shared clonotype was confined to a single donor but could be shared across tissues and isotypes (Fig. 5J), suggesting both a high degree of privacy of the BCR repertoire and the circulation of mature B cells between BM and PBL. Overall, these results not only support the well-accepted TD and TI maturation paradigm but also provide evidence linking the M and MD memory B cell subpopulations.
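The clonotype similarity reported in Fig. 5I is a Jaccard index: the number of clonotypes shared by two subpopulations divided by the number of distinct clonotypes observed in either. The sketch below assumes clonotypes have already been assigned (the clonotype-calling criteria are not restated in this section), and the clonotype IDs are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two clonotype sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical clonotype IDs per memory subpopulation (toy data, not study data)
clones = {
    "C": {"cl1", "cl2", "cl3", "cl4"},
    "M": {"cl3", "cl4", "cl5"},
    "MD": {"cl5", "cl6"},
}
pairs = [("C", "M"), ("C", "MD"), ("M", "MD")]
scores = {p: jaccard(clones[p[0]], clones[p[1]]) for p in pairs}
```

Because the index is normalized by the union, a small subpopulation sharing a few clonotypes can score higher than a large one sharing more, which is worth keeping in mind when comparing C-M and M-MD similarities.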

Myeloid cells represent a pivotal component in B-cell development and maintenance

The development and maturation of human B cells are directly reflected by the phenotypic transitions of B cells themselves, which can occur at the genomic, epigenomic, transcriptomic and proteomic levels4. These multi-omic changes, however, are delicately regulated by distinct, well-organized microenvironments provided by, for example, BM niches and GCs in secondary lymphoid organs. Signals that B cells receive in these microenvironments can be mediated by surface-bound ligands on adjacent cells, by secreted signaling molecules such as cytokines and chemokines, or by adhesion molecules. Having identified both B cells and surrounding non-B cell types in three sources (i.e., BM, GC and PBL; non-B cell types in BM and PBL mainly come from unsorted samples), we were able to investigate the potential cell-cell interactions (CCIs) that may be critical for B cell development and maturation.

Since this study focuses on extrinsic factors contributing to B cell development, we considered only CCIs in which B cells serve as the signal receiver, together with CCIs mediated by adhesion molecules. Applying CellPhoneDB to CCI inference (a CCI is hereafter defined as a unique combination of two interacting molecules), we identified 144, 67 and 61 CCIs for BM, GC and PBL, respectively (see Materials and Methods). The higher number of CCIs in BM suggests a more complex microenvironment in this niche than in the other two sources. Specifically, among 23 non-B cell types in BM, MSCs had the largest number (88) of CCIs with B cell subpopulations (Fig. 6A), in agreement with the paradigm that BM stromal cells (BMSCs) represent the paramount component supporting B cell development38. Since MSCs are a heterogeneous population, we identified three MSC subsets (CXCL12hi MSCs, CXCL12lo MSCs and fibroblasts) together with endothelial cells according to the expression of canonical marker genes (Fig. S16A, B). Reanalyzing these four subsets, we found that CXCL12hi MSCs had the largest number of CCIs (Fig. S16C), in line with the classical view that CXCL12-abundant reticular (CAR) cells are the major functional component of niches for B cell development38. Second to MSCs, cDCs, a myeloid cell type, had 60 CCIs. This large number of CCIs between cDCs and B lineage subpopulations also points to a pivotal role of myeloid cells in maintaining B cell homeostasis in BM. From the perspective of B cell subpopulations, cycling pro-B cells (69), the three memory B cell subpopulations (75, 71 and 64) and PCs (82) had more CCIs with surrounding non-B cells (Fig. 6A). In contrast, the primary and S100A8hi immature B cells and PBs had the fewest CCIs.
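The counting rule above (a CCI is a unique pair of interacting molecules; B cells as receivers, plus adhesion CCIs) can be sketched over a simplified, CellPhoneDB-like list of significant interactions. The record layout, the toy interactions and the choice to treat adhesion pairs as non-directional are assumptions for illustration; this is neither the CellPhoneDB output format nor the authors' pipeline.

```python
from collections import defaultdict

# Hypothetical significant interactions: (interacting_pair, sender, receiver, is_adhesion).
# Toy data, not study results.
INTERACTIONS = [
    ("CXCL12_CXCR4", "MSC", "pro_B", False),
    ("IL7_IL7R", "MSC", "pro_B", False),
    ("ICAM1_ITGAL", "cDC", "pro_B", True),
    ("ICAM1_ITGAL", "pro_B", "cDC", True),   # same adhesion pair, reverse direction
    ("TNF_TNFRSF1A", "cDC", "pro_B", False),
    ("TGFB1_TGFBR1", "pro_B", "MSC", False),  # B cell as sender: excluded
]
B_TYPES = {"pro_B"}


def count_ccis(records, b_types):
    """Count unique CCIs (molecule pairs) per (non-B cell type, B subpopulation)."""
    ccis = defaultdict(set)
    for pair, sender, receiver, adhesion in records:
        if receiver in b_types and sender not in b_types:
            ccis[(sender, receiver)].add(pair)
        elif adhesion and sender in b_types and receiver not in b_types:
            # assumption: adhesion is non-directional, so count it for the B cell too
            ccis[(receiver, sender)].add(pair)
    return {key: len(pairs) for key, pairs in ccis.items()}


counts = count_ccis(INTERACTIONS, B_TYPES)
```

Storing molecule pairs in a set per cell-type pair is what makes a CCI "unique": the reverse ICAM1_ITGAL record does not inflate the cDC count.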

Fig. 6: Cell-cell interaction analysis of B and non-B cells in three tissues.

The number of CCIs predicted by CellPhoneDB between B and non-B cells in BM (A), GC (B) and PBL (C), shown as heatmaps. The number of CCIs between B and non-B cells in BM (D), GC (E) and PBL (F), shown by complex bar plots in a B cell subpopulation-specific and lineage-specific manner. The outer hollow bar represents the total number of CCIs predicted for a given B cell subpopulation. Within the outer hollow bar, solid bars represent the number of CCIs between a given B cell subpopulation and a given lineage. The black dots above the bar plot denote the median number of genes for a given B cell subpopulation. G Representative confocal image demonstrating the spatial juxtaposition of early-stage B cells (CD19+Kappa-Lambda-CD43±, pink arrows) and myeloid cells (CD11B+, green arrows) within the bone marrow of a wild-type C57BL/6J mouse. Solid arrows mark cells exhibiting positive fluorescence signals, whereas dashed arrows denote cells lacking detectable signals. Scale bar: 50 µm.

As in BM, myeloid cells also represented a vital lineage interacting with B cells in both GC and PBL (Figs. 6B, C and 6D-F). cDCs (38) and CD14+ monocytes (29) were the cell types with the largest number of predicted CCIs with B cells in GC and PBL, respectively. Remarkably, in GC, every CCI predicted for each B cell subpopulation could be recovered with myeloid cells as partners (Fig. 6E). Among B cell subpopulations in GC and PBL, the three memory B cell subpopulations remained those with the most CCIs (Fig. 6B, C). Notably, LZ GC B cells had the largest number of CCIs with cDCs (i.e., follicular DCs), recapitulating the GC response paradigm39 (Fig. 6B).

It should be noted that colocalization of the interacting cell types is a prerequisite for some of these predicted CCIs (e.g., CCIs mediated by membrane-bound molecules). Owing to limited access to human BM samples, we experimentally verified the spatial adjacency between early B cells (CD19+Kappa-Lambda-CD43±) and myeloid cells (CD11B+) in the BM of a wild-type C57BL/6J mouse (Fig. 6G). In addition, more direct evidence for the colocalization of B cells and myeloid cells in human BM and mouse PBL can be found in previous studies18,40. Overall, these results suggest a critical role of myeloid cells (particularly cDCs and monocytes) in B cell development and maintenance.

TNF signaling and adhesion interaction dominate B and non-B CCIs in BM, GC and PBL

We then investigated the signaling mediated by the predicted CCIs. Overall, 40, 20 and 20 signaling categories were identified for BM, GC and PBL, respectively (Fig. S17). Sorting the signaling categories by the number of constituent CCIs, we found that signaling by tumor necrosis factors (TNF) was the most frequent signaling category in all three tissues (Fig. 7A-C). The TNF superfamily is a multifunctional group of cytokines that activate signaling pathways for cell survival, apoptosis, inflammatory responses and cellular differentiation41. The enrichment of TNF signaling across tissues suggests an intricate machinery regulating cell differentiation, survival and apoptosis in B cell development and maintenance. Comparing the signaling categories of the three tissues, BM had the most diverse categories, reinforcing the complexity of its microenvironment (Fig. 7D). Despite sharing TNF signaling, the tissues differed in the CCIs involved. Specifically, BM and GC had three and two tissue-specific TNF CCIs, respectively (Fig. 7E). In BM, the tissue-specific CCIs were TNF_TNFRSF1A, TNFSF10_TNFRSF10D and TNFSF4_TNFRSF4. The first was predicted exclusively for B cell precursors (pre-pro, cycling pro and pro B cells), whereas the latter two were predicted exclusively for terminally differentiated PCs (Fig. S18). The two GC-specific CCIs, LTA_TNFRSF1B and LTA_TNFRSF14, were mediated by the same ligand, LTA (TNFSF1) (Fig. 7E). The former was memory B cell-specific, while the latter was additionally predicted for naïve B cells (Fig. S18). This stage-dependent regulation by the TNF superfamily was found not only in the tissue-specific CCIs but also in the shared compartment (Fig. S18).
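Ranking signaling categories by their number of CCIs and extracting tissue-specific CCIs reduce to counting and set differences. In the sketch below, the BM- and GC-specific TNF CCIs are the ones named in the text, while the `SHARED_*` entries are hypothetical placeholders standing in for the full CCI lists, which are not reproduced here.

```python
from collections import Counter

# CCI -> signaling category. Named TNF CCIs come from the text; SHARED_* are
# hypothetical placeholders, not study data.
CATEGORY = {
    "TNF_TNFRSF1A": "TNF",
    "TNFSF10_TNFRSF10D": "TNF",
    "TNFSF4_TNFRSF4": "TNF",
    "LTA_TNFRSF1B": "TNF",
    "LTA_TNFRSF14": "TNF",
    "SHARED_TNF_1": "TNF",
    "SHARED_TNF_2": "TNF",
    "SHARED_ICAM_1": "ICAM",
    "SHARED_OTHER_1": "OTHER",
}

SHARED = {"SHARED_TNF_1", "SHARED_TNF_2", "SHARED_ICAM_1"}
TISSUE_CCIS = {
    "BM": SHARED | {"TNF_TNFRSF1A", "TNFSF10_TNFRSF10D", "TNFSF4_TNFRSF4", "SHARED_OTHER_1"},
    "GC": SHARED | {"LTA_TNFRSF1B", "LTA_TNFRSF14"},
    "PBL": SHARED | {"SHARED_OTHER_1"},
}


def rank_categories(ccis):
    """Signaling categories sorted by number of CCIs, most frequent first."""
    return Counter(CATEGORY[c] for c in ccis).most_common()


def tissue_specific(tissue, ccis_by_tissue, category=None):
    """CCIs found in `tissue` but in no other tissue, optionally for one category."""
    others = set().union(*(v for k, v in ccis_by_tissue.items() if k != tissue))
    own = ccis_by_tissue[tissue] - others
    return {c for c in own if category is None or CATEGORY[c] == category}
```

For example, `tissue_specific("BM", TISSUE_CCIS, "TNF")` returns the three BM-specific TNF CCIs listed above.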

Fig. 7: Signaling category between B and non-B cells in three tissues.

Top 20 signaling categories for CCIs between B and non-B cells in BM (A), GC (B) and PBL (C). The top category, TNF signaling, is highlighted in red. Adhesion categories are highlighted in blue. D Signaling category overlap between three tissues. E Tumor necrosis factor family CCI overlap between three tissues. BM- and GC-specific CCIs are marked. F Adhesion category overlap between three tissues. All adhesion category subsets are marked. G Top 5 signaling categories across B cell subpopulations in BM. The number in parentheses is the total number of CCIs for a given B cell subpopulation. Signaling categories are color-coded according to the legend; to save space, only a subset of signaling categories is shown in the legend. For a complete list of signaling categories, please refer to Fig. S21. “S” in parentheses denotes “signaling”, in contrast to adhesion CCIs. H-K Four well-known CCIs critical for B cell development in BM (H, FLT3L; I, CXCL12; J, IL7; K, SCF) predicted by CellPhoneDB, shown as chord diagrams. Non-B cells are colored by lineage. Interactions are represented by ribbons linking two entities, with the entity the arrow points to being the signal receiver.

Apart from TNF signaling, multiple adhesion CCI categories also ranked high in the top signaling category list, particularly for BM B cells. Notably, all adhesion CCI categories predicted in GC and PBL (n = 5: ICAM, Collagen/Integrin, JAM, Prothrombin and Cadherin) were also found in BM (Fig. 7F and Fig. S17). Half of the adhesion CCI categories predicted in this study (n = 5: THY1, Fibronectin, Tenascin, Laminin and VCAM) were BM-specific (Fig. 7F and Fig. S17), including the well-known VCAM-mediated adhesion42 (Fig. S19).
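The adhesion-category counts above can be checked with set arithmetic using only the category names given in the text. GC and PBL are pooled in this sketch because the text reports only their combined set of five categories, not each tissue separately.

```python
# Adhesion categories named in the text; GC and PBL are pooled (combined n = 5).
GC_PBL_ADHESION = {"ICAM", "Collagen/Integrin", "JAM", "Prothrombin", "Cadherin"}
BM_ONLY_ADHESION = {"THY1", "Fibronectin", "Tenascin", "Laminin", "VCAM"}

# Every GC/PBL category also occurs in BM, so BM holds the union of both lists.
BM_ADHESION = GC_PBL_ADHESION | BM_ONLY_ADHESION
ALL_ADHESION = BM_ADHESION | GC_PBL_ADHESION

bm_specific = BM_ADHESION - GC_PBL_ADHESION
fraction_bm_specific = len(bm_specific) / len(ALL_ADHESION)
```

With ten adhesion categories in total, the five BM-specific ones account for the "half" stated in the text.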

We also investigated the lineage-specific (LS) signaling of non-B cells in the three tissues. In BM, 10 signaling categories were lineage-specific, 7 of which were MSC-specific (Fig. S20A). These MSC-specific categories included three of the aforementioned BM-specific adhesion categories (THY1, Fibronectin and Tenascin) and signaling mediated by Notch, R-spondin, transferrin and retinoic acid. In GC and PBL, by contrast, the lineage-specific signaling categories were mediated mainly by myeloid cells (Fig. S20B, C). From the viewpoint of B cell subpopulations, the top signaling category shifted from ICAM-mediated adhesion to TNF-mediated signaling along B cell development in BM. The enrichment of ICAM-mediated adhesion further underscores its fundamental role in early B cell development (including cycling pro, cycling pre and pre B cells) (Fig. 7G). As in BM, TNF- and ICAM-mediated signaling also dominated in B cell subpopulations in GC and PBL (Fig. S21A, B). Remarkably, DZ GC B cells had more ICAM-mediated CCIs than their LZ counterparts, suggesting a requirement for a more delicate adhesion pattern in this proliferating subpopulation.

FLT3L, CXCL12, IL-7, SCF and RANKL are five well-known factors essential for B lineage commitment and development38. CellPhoneDB reported all five, although RANKL signaling was not significant (p > 0.05, not shown) (Fig. 7H-K). Notably, SCF (KITLG_KIT) signaling was predicted only between MSCs and HSPCs (Fig. 7K). The absence of predicted SCF signaling between MSCs and early B cells might reflect its elusive role in early B cell development, given that the SCF-KIT axis is redundant in fetal, neonatal and young mice43 but appears essential in adult mice44. It is widely accepted that the delicate niche required for early B cell development is provided by a coordinated network of different BMSCs, including osteoblasts, reticular cells and IL-7-expressing cells38. Consistent with this notion, MSCs, which here comprise BMSCs, took part in all these canonical signaling pathways (Fig. 7H-K). Dissecting the inferred CCIs in an MSC subset-specific manner, we found that the CXCL12hi MSC subset participated in the largest number of these essential signaling pathways (Fig. S21C), consolidating its paramount role in B cell development. However, cell types of other lineages were also identified as potential signal senders (FLT3L by T/NK cells and CD16+ monocytes, CXCL12 by macrophages and IL7 by MEPs). These results suggest a role for cell populations other than BMSCs in building the niche for B cell development.

S100A8hi immature B cells represent an age-associated B cell subset

Calprotectin (S100A8/A9) is constitutively expressed by myeloid cells, especially neutrophils, monocytes and macrophages45,46,47,48. Although this protein has previously been detected in B cells under pathological conditions49,50, we reasoned that its expression in B cells under physiological conditions represented unusual ectopic expression that deserved further investigation. Considering the characteristics of the subjects enrolled in this study (older than 50 years) and the association of S100A8/A9 with aging in a previous report51, we hypothesized that this S100A8hi immature B cell subpopulation represents an age-associated B cell (ABC) subset and that the ectopic expression of S100A8 in B lymphocytes under physiological conditions is related to the senescence-associated secretory phenotype (SASP). To test this, we conducted in silico analyses, qPCR and in vitro cell culture experiments.

First, we validated the existence of S100A8hi B cells (C8 and C12) in two independent single-cell studies18,20 (Fig. 8A, B). In the former study, the cells were derived from the BMMCs of three healthy subjects used as controls in that study. In the latter, the cells were derived from hematologically healthy BM of patients undergoing total hip arthroplasty. All subjects from both studies were older than 50 years. To consolidate the association with age, we also examined BM samples from younger subjects (aged 2 to 50 years) in four other available single-cell studies10,19,21,22 and found no evidence of S100A8 expression in B cells (Fig. S22). Because of limited access to human BM samples, we investigated the age-associated expression of S100a8/a9 in B cells from multiple tissues (spleen, kidney, mammary gland, limb muscle, lung and others) of mice of different ages from an aging study (Fig. S23A, B)52. We observed a significant age-associated expression pattern at the mRNA level (Fig. 8C) and obtained a similar result with spleen-only B cells (Fig. S23C, D). Inspired by this mouse-based result, we performed quantitative PCR (qPCR) and FACS on B cells purified from the BM, spleen and PBL of a cohort of C57BL/6J mice of different ages (3 months, young (Y); 12 months, middle age (M); 22 months, aged (A); n = 3 per group). Both male and female mice were used in the qPCR experiment. In female mice, S100a8/a9 expression was significantly higher in older animals in BM, spleen and PBL (Fig. 8D and Fig. S24A-D). However, this age-associated expression pattern was not reproduced in male mice (Fig. S24E-H), which could be explained by the fact that females are more prone to autoimmune disease and might be expected to accumulate more S100A8/A9-expressing B cells with age53. In the FACS experiment, we measured the proportion of S100A8-positive cells among total B cells (CD45+CD19+) and observed a higher percentage of such cells in the aged group than in the middle-aged group in all three tissue sources (PBL, 26.2% vs 7.4% (p < 0.05); BM, 6.6% vs 3.7%; SP, 14.0% vs 9.0%) (Fig. S25). Moreover, this S100A8-positive population showed an immature-like (CD27-IgD-IgM+) and an even earlier (CD27-IgD-IgM-, e.g., progenitor) phenotype, the former of which is consistent with our human scRNA-seq data (Fig. S26).
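Fig. 8C summarizes the age association with Spearman's correlation coefficient (SCC), which is the Pearson correlation of rank-transformed values. A standard-library sketch with average ranks for ties follows; the ages and expression values are hypothetical, and `scipy.stats.spearmanr` gives the same coefficient together with a p-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ages (months) and per-mouse S100a8 expression (toy values)
age = [3, 3, 12, 12, 22, 22]
expr = [0.10, 0.20, 0.30, 0.25, 0.60, 0.50]
rho = spearman(age, expr)
```

Ranking makes the coefficient sensitive only to monotonic trends, which suits expression values that rise with age without necessarily rising linearly.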

Fig. 8: Age-associated expression of S100A8/A9 in B lymphocytes.

A, B UMAP of BM B cell clusters from aged healthy human subjects in two external single-cell studies and the average expression of S100A8 and S100A9 across clusters. The red dashed lines mark an average expression level of 0.1%. The error bar shows a 95% confidence interval. C Age-associated expression of S100A8/A9 in mouse B cells from a mixture of tissues, as measured by scRNA-seq. SCC, Spearman’s correlation coefficient. D Age-associated expression of S100a8/a9 in the BM B cells of C57BL/6J female mice as measured by qPCR. Y, young (3 m); M, middle age (12 m); O, old (22 m). The error bars represent the standard error. Two independent samples t-test (two-sided), *, p < 0.05; **, p < 0.01. E The average expression level of three ABC marker genes (TBX21, ITGAM and ITGAX) across B cell subpopulations. F TBX21 regulon specificity score comparison across B cell subpopulations. G ELISA demonstrating S100A8/A9 secretion by human BM B cells. The subjects’ ages ranged from 56 to 66 years. The concentration of S100A8/A9 in the cell culture medium was measured at four time points (12 h, 1 d, 2 d and 4 d) for each sample. ARH-77 and blank culture medium were used as negative controls, while unsorted human BMMCs were used as a positive control. H Chord diagram showing the predicted S100A8/A9-TLR4 signaling between B cell subpopulations and surrounding bone marrow non-B cell types. I Schematic diagram showing the occurrence of S100A8hi B cells in the aged population and how they participate in the inflammageing process.

Since ABCs have been reported in previous studies (reviewed by de Mol et al.54 and Yu et al.55), we compared their molecular signature with that of S100A8hi immature B cells to evaluate their developmental relationship. A widely accepted gene set, including TBX21, ITGAM and ITGAX, has been reported to distinguish ABCs from other B cell subsets in both humans and mice54. We thus profiled the expression of these three genes across B cell subpopulations and found weak expression of ITGAM and ITGAX in classical memory and CD27-IgM+IgD+ memory B cells, respectively (Fig. S27). However, a finer-scale comparison of mean expression showed relatively higher expression of these genes in S100A8hi immature B cells than in other non-memory B cell subpopulations, indicating a molecular signature similar to that of known ABCs (Fig. 8E). Among the three genes, TBX21 encodes a TF containing the T-box, a DNA-binding domain shared by its family. To investigate its transcriptional regulatory activity, we performed single-cell regulatory network inference and clustering (SCENIC) analysis and found that S100A8hi immature B cells had the highest TBX21 regulon specificity score among all non-memory B cell subpopulations (Fig. 8F). Notably, a small but readily identifiable memory B cell subset expressing T-bet (TBX21) has been documented in healthy adults56. This probably accounts for the high expression of these markers and the high TBX21 activity in memory B cell subpopulations. Therefore, the highest expression of known ABC marker genes and the highest TBX21 transcriptional activity among all non-memory B cells may suggest a shared mechanism underlying their aged phenotypes.
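The regulon specificity score (RSS) compares how a regulon's activity is distributed across cells with an indicator of membership in one subpopulation. It is commonly defined as RSS = 1 - sqrt(JSD), with the Jensen-Shannon divergence computed in base 2 so that RSS lies between 0 and 1. The sketch below restates that definition, assuming the SCENIC workflow used here follows it; the helper names, activity values and labels are hypothetical.

```python
import math


def _entropy(p):
    """Shannon entropy in bits, ignoring zero entries."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jsd(p, q):
    """Jensen-Shannon divergence (base 2, so 0 <= JSD <= 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return _entropy(m) - (_entropy(p) + _entropy(q)) / 2


def regulon_specificity(activity, labels, cell_type):
    """RSS = 1 - sqrt(JSD) between normalized regulon activity and a cell-type indicator."""
    total = sum(activity)
    indicator = [1.0 if lab == cell_type else 0.0 for lab in labels]
    n = sum(indicator)
    if total <= 0 or n == 0:
        raise ValueError("need positive total activity and at least one cell of cell_type")
    p = [a / total for a in activity]
    q = [x / n for x in indicator]
    # clamp tiny negative values caused by floating-point rounding
    return 1 - math.sqrt(max(jsd(p, q), 0.0))


# Hypothetical per-cell TBX21 regulon activity and subpopulation labels (toy data)
labels = ["S100A8hi", "S100A8hi", "other", "other"]
rss_specific = regulon_specificity([1, 1, 0, 0], labels, "S100A8hi")
rss_mixed = regulon_specificity([1, 0, 1, 0], labels, "S100A8hi")
```

A score of 1 means the regulon is active only in the chosen subpopulation; activity split evenly with other cells yields an intermediate score.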

Although we confirmed the age-associated expression of S100A8/A9 in B lymphocytes, it remained unclear whether this unusual expression pattern results in secretion of the molecule into the microenvironment and thereby contributes to SASP. To address this issue, we conducted short-term cell culture experiments with a serum-free medium (see Materials and Methods). Unsorted human BMMCs and blank medium were used as positive and negative controls, respectively. A human B lymphoblast cell line, ARH-77, was also included as a negative control. S100A8/A9 concentration in the cell supernatant or blank medium was measured by ELISA at 12 h, 1 d, 2 d and 4 d. The S100A8/A9 concentration in human B cell supernatant was clearly higher than that in blank medium but lower than that of the respective BMMC control, supporting the secretion ability of B lymphocytes (Fig. 8G). In addition, we observed a higher S100A8/A9 concentration in the older group than in the younger group, consolidating the age-associated expression of S100A8/A9 in B lymphocytes (Fig. S28). Following this result, we further predicted S100A8/A9_TLR4 signaling between S100A8hi immature B cells and certain myeloid cell types (monocytes, neutrophils and DCs) in our own data with CellPhoneDB (Fig. 8H). Moreover, Zhang et al. have revealed an aging driver role of S100A8/A9 with the recombinant human antibody in in vitro experiments51, demonstrating that S100A8/A9 could elicit oxidative stress, inflammatory responses and proliferative impairment. Considering these facts, S100A8/A9 secreted by B lymphocytes in aged individuals probably participates in the senescence process through multiple mechanisms (Fig. 8I). Taken together, these results indicate elevated expression of S100A8/A9 in B lymphocytes in the aged population and implicate B lymphocytes in senescence.

Discussion

Single-cell sequencing technology has significantly advanced our knowledge of the development and maturation of B lymphocytes, shedding light on their heterogeneity, developmental trajectory and BCR repertoire features in both physiological and disease states. However, a comprehensive single-cell, high-dimensional examination of human B lymphocytes from the earliest progenitors to terminally differentiated PCs has been lacking. In this work, we compiled scRNA-seq and scVDJ-seq data of B cells from human BM, PBL and tonsil and provided an integrative analysis of gene regulation, conventional B cell heterogeneity and the cell-cell communication network along the B cell development axis. We found that immature B cells represent the least transcriptionally active stage, that naïve B cells show a local and subject-specific HP pattern, and that two compatible development models account for memory B cell subpopulations. Moreover, we identified myeloid cells as a key cell category, in addition to stromal cells, in B cell development and maturation, with TNF and adhesion signaling dominating all external signaling types and each predominating in a stage-dependent manner. Last but not least, we identified two age-associated B cell subpopulations, S100A8- and C1q-expressing B cells, which probably contribute to the SASP.

The developmental trajectory of B cell progenitors in BM has been constructed in previous studies7,57,58, along with the kinetics of gene expression, particularly of TFs. However, the dynamics of transcriptional activity as B cells develop and mature remain unresolved. We reported immature B cells as the least transcriptionally active stage along the developmental axis, featuring the minimum number of expressed genes (Fig. 2A) and the lowest level of gene activity as measured by RNA velocity (Fig. 2E). This observation coincides with the lowest metabolic rate for immature B cells demonstrated by Zeng et al.14 (Fig. 1A in their publication). Given that both heavy and light chain gene rearrangements are complete at this stage, this downregulated gene expression activity reflects a resting state poised for egress from BM and subsequent peripheral tolerance before entry into the mature B cell repertoire.
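RNA velocity is used above as a readout of gene activity. Under the widely used kinetic model of transcription and splicing (an assumption here, since the model variant is not restated in this section), the unspliced (u) and spliced (s) abundances of a gene evolve as follows, with transcription rate α, splicing rate β and degradation rate γ:

```latex
\frac{du}{dt} = \alpha(t) - \beta u, \qquad
\frac{ds}{dt} = \beta u - \gamma s, \qquad
v \equiv \frac{ds}{dt} = \beta u - \gamma s .
```

A positive velocity v indicates that a gene is being induced and a negative one that it is being repressed. In the steady-state approximation, β is set to 1 and γ is estimated from the u/s ratio in cells assumed to be at equilibrium, giving v ≈ u - γs; uniformly small velocities across genes are consistent with a transcriptionally quiet stage.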

B-cell HP has been reported in both physiological59 and B-cell-deficient60,61 settings. It takes place in response to B cell depletion as a mechanism to compensate for cell loss and is therefore inhibited in a dose-dependent manner by feedback from mature B cells26. We provided evidence of HP by demonstrating clonal sharing between different naïve B cell subpopulations that showed no history of antigen-dependent proliferation (Fig. 4K). We further found that the clonal sharing between these naïve-phenotype subpopulations was tissue-confined in both BM and PBL. Since HP has previously been reported only for peripheral mature B cells59, our finding that naïve B cell HP also occurs in BM extends the scope of this process. Because peripheral B cell HP has been linked to the regulation of lymphogenesis in BM60,61, this local B cell HP could play a more direct role in this feedback. This hypothesis, together with the interplay between BM and peripheral HP, remains to be investigated. For the HSPA1Ahi naïve B cell subpopulation, Yang et al. recently reported a stress-response memory B cell subset with a similar gene expression pattern in the tumor microenvironment (TME) in a pan-cancer study62. In their work, the stress response was suggested as a common molecular characteristic across different cell lineages in tumors. Similarly, the HSPA1Ahi naïve B cell subpopulation in this study came from pediatric patients with recurrent tonsillitis, whose inflammatory microenvironment could have mimicked that of the TME.

The heterogeneity of memory B cells has long been appreciated. Using single-cell transcriptomic data, we identified, in an unbiased manner, three memory B cell subpopulations, namely classical (C), IgM+ (M) and CD27-IgM+IgD+ (MD). The subsequent developmental trajectory analysis and BCR repertoire characterization provided substantial evidence of a close relationship between C and M subpopulations, including a transitional developmental pathway (Fig. 5A), a similar Ig subclass composition (Fig. 5C) and a closer clonal relationship (in two of three donors) (Fig. S14B, C), agreeing with the notion that both are post-GC and developmentally continuous28. However, we also noticed a non-negligible clonal relationship between M and MD subpopulations in one of the three enrolled subjects (Fig. S14D). Examining the shared clonotypes (n = 32), we found that a substantial fraction of them carried distinct constant genes and varied SHM levels in the two memory subpopulations, suggesting a dynamic transition between the two (Fig. S14E). As reviewed by Seifert and Küppers63, the origin of the MD subpopulation is disputed among three viewpoints: naïve B cell diversification, T-dependent immune responses and T-independent immune responses. Here, we proposed distinct but compatible models to account for the origin of the MD subpopulation.

This integrated single-cell data also enabled us to investigate the CCIs involved in the development and maturation of B cells. According to the conventional paradigm, BMSCs, encompassing adipocytes, endothelial cells, osteoblasts and fibroblastic reticular cells, are the paramount component in maintaining the niches required for B cell lymphogenesis64. Upon encountering an antigen, activated B cells undergo several rounds of proliferation and affinity-based selection in the GC, facilitated by follicular helper T cells (TFH) and follicular DCs (FDC)39. However, several non-canonical cell types have also been reported to reside in GCs, such as macrophages, natural killer cells and CD8+ T cells (reviewed by Victora et al.65). In circulating blood, direct DC-to-B cell contact has also been observed in mice by flow cytometry and microscopic imaging40. Using the latest manually curated interaction database, we predicted a large number of candidate CCIs between B cells and myeloid cells across the three representative tissues (i.e., BM, GC and PBL) (Fig. 6D-F), implicating a pivotal role for myeloid cells in B cell lymphogenesis and peripheral homeostasis. Moreover, we detected a shift in the top CCI category from adhesion interactions to TNF signaling (Fig. 7G), revealing a feature of the external signal dynamics along the B cell development axis. Overall, our analysis revealed the complexity of the microenvironment supporting the development and maturation of B cells.

Lastly, it is worth mentioning that we identified two age-associated B cell subpopulations that express S100A8/A9 and C1q, respectively. Both proteins are typically expressed in myeloid cells, such as macrophages, monocytes, neutrophils and DCs. To the best of our knowledge, their expression in B cells has been reported only in disease settings or organ transplantation cases (i.e., S100A8/A9 in SLE49 and COVID-1950; C1q in transplantation66). However, we identified the two subpopulations in donors without reported inflammatory responses. For the S100A8-expressing subpopulation, we validated its presence in a subset of external BM scRNA-seq data from aged human donors and found a significant correlation between S100A8/A9 expression level and age in mice. Moreover, we experimentally validated the S100A8/A9 secretion ability of B cells, suggesting a role in SASP. In contrast, the C1q-expressing subpopulation was not found in any of the external datasets examined, which could be accounted for by its particularity and rarity. Both subpopulations were found in BM and had a transcriptomic phenotype similar to that of the primary immature B cell subpopulation. Whether this observation reflects their niche dependency remains to be elucidated.

It should be noted that our work has certain limitations. First, this study does not cover all well-known B cell subsets, such as transitional B cells, regulatory B cells and B1 cells. To ensure an unbiased delineation of gene regulation dynamics, we did not delve into the data for these minor subpopulations. Second, the existence and functionality of the novel B cell subpopulations presented in this study remain to be validated. Similarly, our CCI analysis provides only candidate interactions, which take place only when prerequisites such as cell juxtaposition are met. Moreover, the impact of these candidate interactions on B cell development calls for careful examination. Although the analyses presented in this study are primarily based on data from aged participants, in whom B cell development is expected to decline, we validated the conclusions potentially susceptible to this age effect with data from young participants only (Figs. S29 and S30).

Despite these limitations, our work reveals the heterogeneity of conventional human B cell subsets, resolves gene regulation dynamics and underlying CCIs along the B cell development axis, and sheds light on the attributes of several critical B cell biological processes. Thus, it represents a valuable resource for in-depth investigation of B-cell development.

Materials and methods

Sample collection and B-cell enrichment

All human studies were performed in compliance with the guidelines of the Research Ethics Committee of Guangdong Provincial People’s Hospital. BM and matching PBL were collected from three patients older than 50 years who had a herniated disk but no hematological disorders. Informed consent was obtained from all participants prior to sample collection. PBMCs and BMMCs were isolated by standard density-gradient centrifugation with Ficoll-Paque™ (Cat. no. LTS1077 for PBMCs and TBD2013LHU for BMMCs; TBD). B lymphocytes were purified from fresh BM or PBL using the Human B Lymphocyte Enrichment Set-DM (Cat. no. 558007, BD Biosciences) according to the manufacturer’s instructions. Cell counts and viability (trypan blue exclusion) were determined on a TC20 automated cell counter (Bio-Rad).

Single-cell transcriptomic and immunomic library preparation and sequencing

To evaluate the concentration and viability of cell suspensions, filtered trypan blue was added and cells were counted on an automated cell counter (Bio-Rad, TC20). The optimal concentration of the cell stock was 700–1200 cells/μl, to increase the likelihood of reaching the desired cell recovery target, and a sample viability of ≥90% was required to improve the recovery rate. Based on the recommended cell concentration and the cell counter results, cells were resuspended in an appropriate volume of cold PBS containing 0.04% BSA and loaded onto the 10x Genomics Chromium single-cell platform for the analysis of B lymphocytes and BMMCs. To ensure sufficient cell numbers, two portions of BMMCs and four portions of enriched B lymphocytes (three from BM and one from PBL) were loaded. For BMMC analysis, libraries were constructed using the Chromium Next GEM Single Cell 5’ Kit v2 (10x Genomics, 1000263); for B lymphocyte analysis, libraries were constructed using the same kit together with the BCR Amplification Kit (10x Genomics, 1000253). Briefly, GEM generation and barcoding were performed with 10,000 cells per reaction, followed by GEM-RT, post-GEM-RT cleanup and cDNA amplification to isolate and amplify cDNA for library construction. Amplified cDNA, target-enriched cDNA and final libraries were evaluated using Agilent Bioanalyzer chips (Cat. no. 5067-5582) on an Agilent Bioanalyzer 4200. Finally, the libraries were sequenced on an Illumina platform with 150-bp paired-end reads.

scRNA-seq data processing and cell population identification

Single-cell gene expression matrices for enriched B cell and unsorted BMMC samples were obtained using the CellRanger (v6.1.2) “count” utility with “refdata-gex-GRCh38-2020-A” as the reference transcriptome. The external scRNA-seq data of early B cell samples were obtained from the GSA repository (https://ngdc.cncb.ac.cn/gsa-human/, accession number: HRA000489) as raw sequencing reads in fastq format, and expression matrices were generated with the same procedure as for the in-house data. The external scRNA-seq data of tonsil samples were obtained from ArrayExpress (https://www.ebi.ac.uk/biostudies/arrayexpress, accession number: E-MTAB-9005) as processing-ready expression matrices. The scRNA-seq data of PBMCs from healthy subjects were obtained from the CNGB Nucleotide Sequence Archive (https://db.cngb.org/cnsa, accession number: CNP0001102) as a processed Seurat object.

With all scRNA-seq data ready, we used Seurat (v4.1.1) for data preprocessing, integration and the subsequent steps of a routine Seurat analysis pipeline. Specifically, in each sample we retained only cells expressing at least 200 genes and genes expressed in at least three cells. Cells with a high proportion (≥10%) of mitochondrial gene expression or predicted to be doublets by DoubletFinder (v2.0.3) were also discarded in the preprocessing step. Moreover, we removed BCR/TCR-related genes (variable, diversity and joining genes; constant region genes were retained) to avoid bias in the subsequent clustering step. Notably, the preprocessed PBMC dataset was not subjected to the above quality control. Afterwards, the preprocessed expression matrices were integrated following the Seurat “Fast integration using reciprocal principal component analysis (PCA)” workflow with default parameters, except that in the integration anchor identification step (“FindIntegrationAnchors”) we specified as the reference the unsorted BMMC sample (containing multiple cell types) with the largest number of cells (i.e., “D2-BM2”, see Table S2). Subsequently, the integrated expression values were processed with a routine workflow of data scaling, linear (PCA) and non-linear (UMAP) dimensionality reduction and cell clustering, using the “ScaleData”, “RunPCA”, “RunUMAP”, “FindNeighbors” and “FindClusters” utilities with default parameters, except that the first 30 PCs were retained and the clustering resolution was set to 0.8. Cell clusters identified in this first round of clustering were then manually annotated with lineage information (i.e., HSPC, B, T/NK, myeloid, erythroid/megakaryocyte and MSC) according to lineage markers (Fig. S2A).
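The cell- and gene-level filters above can be written out explicitly. The sketch below is a plain-Python illustration under stated assumptions, not the Seurat/DoubletFinder code used in the study: it assumes a gene-to-per-cell-counts mapping, the human "MT-" prefix for mitochondrial genes, and a hypothetical regular expression for V/D/J segment symbols that spares constant-region genes. Note that the bare symbol IGHD is the IgD constant gene, so D segments are matched only when followed by a number (e.g. IGHD3-10). Doublet removal is not reproduced.

```python
import re
from typing import Dict, List

# Hypothetical pattern for variable/diversity/joining gene symbols (an assumption,
# not the study's exact list). Constant genes (IGHM, IGHD, IGKC, TRBC1, ...) never match.
VDJ_PATTERN = re.compile(r"^(IG[HKL][VJ]|IGHD\d|TR[ABDG][VJ]|TR[BD]D\d)")


def is_vdj_gene(symbol: str) -> bool:
    return VDJ_PATTERN.match(symbol) is not None


def qc_filter(counts: Dict[str, List[int]],
              min_genes: int = 200,
              min_cells: int = 3,
              max_mito_pct: float = 10.0) -> Dict[str, List[int]]:
    """Apply gene/cell QC, then drop BCR/TCR V(D)J genes.

    counts maps a gene symbol to its UMI counts, one entry per cell.
    """
    n_cells = len(next(iter(counts.values())))
    # Keep genes detected (count > 0) in at least `min_cells` cells.
    genes = {g: v for g, v in counts.items()
             if sum(1 for x in v if x > 0) >= min_cells}
    keep = []
    for j in range(n_cells):
        detected = sum(1 for v in genes.values() if v[j] > 0)
        total = sum(v[j] for v in genes.values())
        # Mitochondrial genes are assumed to carry the human "MT-" prefix.
        mito = sum(v[j] for g, v in genes.items() if g.startswith("MT-"))
        mito_pct = 100.0 * mito / total if total else 0.0
        # Discard cells with too few genes or >= max_mito_pct mitochondrial UMIs.
        if detected >= min_genes and mito_pct < max_mito_pct:
            keep.append(j)
    # Finally remove V/D/J segment genes so they cannot drive clustering.
    return {g: [v[j] for j in keep]
            for g, v in genes.items() if not is_vdj_gene(g)}
```

On a toy matrix one would lower the thresholds, e.g. `qc_filter(counts, min_genes=2, min_cells=2)`; the defaults mirror the 200-gene, three-cell and 10% cut-offs stated above.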

After lineage identification, cell populations within individual lineages (except MSC) were identified in an unbiased manner through five parallel secondary rounds of clustering. Specifically, cells from a given lineage were extracted, and the expression values from cells of different projects were integrated as in the first round of clustering. Here, integration was performed at the project level rather than the sample level to avoid bias caused by the small number of cells in some samples after the lineage split. The integrated data were then processed with the same workflow as above to identify cell clusters within each lineage, which were annotated with canonical marker genes (Fig. S2B–E). For B cell subpopulations, known marker genes (Fig. 1F) were used in combination with differentially expressed (DE) genes computed with the “FindAllMarkers” utility (Fig. S4) to define their identities. After all lineages were annotated, the cell populations of different lineages were recombined into a single annotated integrated dataset for downstream analyses.

scVDJ-seq data preprocessing, clonotype assembly, and downstream analyses

The CellRanger (v6.1.2) “vdj” utility was used to obtain VDJ contigs (“filtered_contig.fasta”) from the scVDJ-seq datasets. Subsequently, novel allele identification, genotyping and gene reassignment for heavy chain V genes were performed successively to achieve accurate VDJ repertoire profiling (particularly for SHM level evaluation). IgDiscover (v0.15) was used to infer novel V alleles for each donor from bulk sequencing data, as recommended by Yang et al.67. We then applied the genotyping method of Zhu et al.68 before reassigning VDJ genes with IgBLAST using individualized reference sequence sets. The corrected V gene assignments for heavy chain contigs served as the basis for SHM-level quantification. After correcting gene assignments, we further filtered the data to retain one heavy and one light chain contig per cell. Cells without a productive heavy or light chain contig, or with multiple heavy or light chains supported by similar numbers of UMIs (fold change < 2, following the criteria employed in Dandelion69), were discarded. The preprocessed contigs were then integrated with the annotated scRNA-seq data to assign B cell subpopulation labels; as a result, contigs from cells not captured in the scRNA-seq data were also removed. Finally, the clean, annotated contigs were assembled into clonotypes, defined as groups of cells sharing the same V and J genes and CDR3 amino acid sequences in both heavy and light chains. The downstream BCR repertoire analyses (including gene usage, CDR3 length distribution, clonotype sharing and SHM level comparison) were all performed on a clonal basis, meaning that each unique clonotype contributed only one observation of the studied feature. The SHM level was defined as the percentage of nucleotide changes within the recombined variable gene, spanning from FR1 to the germline nucleotides at the start of CDR3.
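The chain-selection rule, clonotype definition and SHM level described above can be sketched as follows. This is a simplified reading of the rules, not the study's scripts or Dandelion's implementation: the `Contig` record and its field names are hypothetical, IGK and IGL contigs are pooled as "light", and `shm_percent` assumes the query and germline V sequences have already been aligned (e.g. by IgBLAST) and trimmed to the FR1-to-CDR3-start region, with gap positions skipped.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Contig(NamedTuple):
    # Hypothetical record layout; real contig annotations carry many more fields.
    cell: str
    chain: str        # "IGH", "IGK" or "IGL"
    productive: bool
    umis: int
    v_gene: str
    j_gene: str
    cdr3_aa: str


def pick_chain(contigs: List[Contig]) -> Optional[Contig]:
    """Return the dominant productive contig, or None if absent or ambiguous."""
    prod = sorted((c for c in contigs if c.productive),
                  key=lambda c: c.umis, reverse=True)
    if not prod:
        return None
    # Ambiguous when top/runner-up UMI fold change is below 2.
    if len(prod) > 1 and prod[0].umis < 2 * prod[1].umis:
        return None
    return prod[0]


def assemble_clonotypes(contigs: List[Contig]) -> Dict[Tuple[str, ...], List[str]]:
    """Group cells sharing V, J and CDR3 (aa) in both heavy and light chains."""
    by_cell: Dict[str, Dict[str, List[Contig]]] = defaultdict(lambda: {"H": [], "L": []})
    for c in contigs:
        by_cell[c.cell]["H" if c.chain == "IGH" else "L"].append(c)
    clonotypes: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for cell, chains in by_cell.items():
        h, l = pick_chain(chains["H"]), pick_chain(chains["L"])
        if h is None or l is None:
            continue  # cell discarded
        key = (h.v_gene, h.j_gene, h.cdr3_aa, l.v_gene, l.j_gene, l.cdr3_aa)
        clonotypes[key].append(cell)
    return dict(clonotypes)


def shm_percent(query: str, germline: str) -> float:
    """Percent mismatches between equal-length aligned V sequences (gaps skipped)."""
    if len(query) != len(germline):
        raise ValueError("sequences must be aligned to equal length")
    compared = mismatches = 0
    for q, g in zip(query, germline):
        if q in "-." or g in "-.":
            continue
        compared += 1
        mismatches += q != g
    return 100.0 * mismatches / compared if compared else 0.0
```

Because repertoire features were compared on a clonal basis, downstream statistics would iterate over the keys of the returned dictionary (one observation per clonotype) rather than over cells.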

RNA velocity analysis with scVelo

The spliced and unspliced expression matrices were obtained by first running the Python package velocyto70 (v0.17) on each sample individually (samples from King et al. were not included here because raw sequencing data were unavailable). The output loom files were combined using the “combine” function in loompy (v3.0.7) and imported using the “read” function in scVelo71 (v0.2.4). To use the pre-computed UMAP embedding, the AnnData object storing the combined spliced/unspliced matrices was merged with the AnnData object storing cell metadata and the associated UMAP embedding using the function “utils.merge”. Because this step yielded a preprocessed AnnData object (retaining only the highly variable genes used to integrate samples and the pre-computed PCA), we did not perform the further preprocessing described in the tutorial. Subsequently, each cell’s moments (means and uncentered variances) were computed using “pp.moments” with default parameters, and these moments were used for RNA velocity estimation with the function “tl.velocity”, with mode set to “stochastic”. Based on the estimated velocities, a velocity graph representing transition probabilities was constructed using the function “tl.velocity_graph”. The velocity graph was then used to project RNA velocities onto the pre-computed UMAP as streamlines with the function “pl.velocity_embedding_stream”. Top active/important genes were identified using the function “tl.rank_velocity_genes”. Velocity pseudotime was calculated using the function “tl.velocity_pseudotime” with default settings, except that “root_key” was manually set to a selected pre-pro B cell and “end_key” to a selected plasma cell.
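To make the idea behind RNA velocity concrete, the sketch below shows the simpler deterministic steady-state model for a single gene: spliced (s) and unspliced (u) counts are smoothed over each cell's neighbours (first moments), a degradation ratio γ is fitted so that u ≈ γ·s at steady state, and velocity is the residual u − γ·s. This is explicitly not the "stochastic" mode used above, which additionally uses second-order moments; scVelo also restricts the γ fit to cells at the extremes of the phase portrait, whereas this sketch fits all cells. The neighbour lists are assumed to be given.

```python
from typing import List, Sequence, Tuple


def first_moments(x: Sequence[float], neighbors: Sequence[Sequence[int]]) -> List[float]:
    """kNN-smoothed expression: mean of x over each cell's neighbour list (self included)."""
    return [sum(x[j] for j in nb) / len(nb) for nb in neighbors]


def steady_state_velocity(u: Sequence[float], s: Sequence[float]) -> Tuple[float, List[float]]:
    """Least-squares fit of u = gamma * s through the origin; return (gamma, u - gamma * s)."""
    gamma = sum(ui * si for ui, si in zip(u, s)) / sum(si * si for si in s)
    return gamma, [ui - gamma * si for ui, si in zip(u, s)]
```

A positive velocity means more unspliced RNA than expected at steady state (the gene is being induced in that cell), and a negative velocity suggests repression; aggregated over many genes, these residuals are what the velocity graph and streamlines summarize.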

Partition-based graph abstraction of B cell subpopulations

The Seurat object of single-cell data was first converted into “.h5ad” format, which is readable by the Python-based single-cell data processing toolkit Scanpy72 (v1.9.1), with the “SaveH5Seurat” and “Convert” utilities of the R package “SeuratDisk” (v0.0.0.9020). The converted data were then imported with the “read_h5ad” function in Scanpy. Partition-based graph abstraction was computed with the “tl.paga” utility and visualized with the “pl.paga” utility, both with default parameters, except that the edge weight threshold for visualization was set to 0.09 to give a concise view of the connectivity between B cell subpopulations.
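The effect of the 0.09 weight threshold can be illustrated on a small connectivity matrix. As far as we understand Scanpy, the `threshold` argument of `pl.paga` only controls which edges are drawn (edges with weights below it are not plotted) and does not alter the connectivities computed by `tl.paga`, which are stored as a sparse matrix. The sketch below uses a dense nested list and made-up weights for readability.

```python
from typing import List, Sequence, Tuple


def paga_edges(conn: Sequence[Sequence[float]], labels: Sequence[str],
               threshold: float = 0.09) -> List[Tuple[str, str, float]]:
    """Return undirected edges whose connectivity weight is at least `threshold`."""
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):  # upper triangle of a symmetric matrix
            if conn[i][j] >= threshold:
                edges.append((labels[i], labels[j], conn[i][j]))
    return edges
```

With hypothetical weights, `paga_edges([[0, 0.3, 0.05], [0.3, 0, 0.09], [0.05, 0.09, 0]], ["pre-B", "immature", "naive"])` keeps the pre-B to immature and immature to naive edges and hides the weak pre-B to naive link.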

Development trajectory inference with Monocle2

The development trajectory of immature, naïve and memory subpopulations was inferred by Monocle73 (v2.24.1) with default parameters. To improve computing efficiency and obtain a clear trajectory, 200 cells were randomly selected from each B cell subpopulation, either from all sources combined (Figs. 3A, B, 4A, B and 5A, B) or from individual sources (Fig. S15A–D), and the corresponding raw expression matrix was extracted to construct a CellDataSet. Then, “estimateSizeFactors” and “estimateDispersions” were used to normalize the data by total library size and to estimate negative binomial overdispersion for each gene, respectively. All genes identified as differentially expressed by the “differentialGeneTest” function (q < 0.01) were used for cell ordering with the “setOrderingFilter” function. Dimensionality reduction was performed with the DDRTree method in the “reduceDimension” step. Cells were then ordered along a pseudotime trajectory using the “orderCells” function and visualized using the “plot_cell_trajectory” function.
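The per-subpopulation subsampling step (200 cells per group here; the SCENIC analysis below samples 1000 cells per group in the same way) can be sketched in Python as follows. Monocle2 itself runs in R, so this is only an illustration of the sampling logic; the fixed seed is a hypothetical choice for reproducibility (the study does not report one), and keeping groups smaller than the target size whole is an assumption.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def subsample_per_group(cell_ids: Sequence[str], groups: Sequence[str],
                        n_per_group: int = 200, seed: int = 0) -> List[str]:
    """Randomly draw up to `n_per_group` cells from each group, without replacement."""
    by_group: Dict[str, List[str]] = defaultdict(list)
    for cid, grp in zip(cell_ids, groups):
        by_group[grp].append(cid)
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    picked: List[str] = []
    for grp in sorted(by_group):  # sorted for a deterministic group order
        members = by_group[grp]
        # Groups smaller than n_per_group are kept whole (assumption).
        picked.extend(rng.sample(members, min(n_per_group, len(members))))
    return picked
```

Equal-sized draws prevent abundant subpopulations from dominating the ordering genes and the DDRTree embedding, at the cost of discarding most cells from large groups.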

GO enrichment analysis

GO enrichment analysis was performed with the online web server Enrichr74 (https://maayanlab.cloud/Enrichr/, version updated on June 8, 2023). DE genes for a given B cell subpopulation were computed with the Seurat function “FindAllMarkers” using only the subpopulations being compared (for example, only the three naïve B cell subpopulations were considered when computing DE genes for a given naïve B subpopulation). All DE genes (padj < 0.05) for a B cell subpopulation were used as input, and enriched terms were sorted by p value.
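Enrichr reports, to our understanding, a Fisher's exact test p-value and a Benjamini-Hochberg adjusted p-value for each term. The sketch below computes both from overlap counts: the one-sided Fisher's exact test equals the hypergeometric upper tail P(X ≥ overlap). The background size is an input here; the background Enrichr actually uses is not reproduced.

```python
from math import comb
from typing import List, Sequence


def enrichment_pvalue(overlap: int, term_size: int, n_input: int, n_background: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value, P(X >= overlap)."""
    upper = min(term_size, n_input)
    hits = sum(comb(term_size, i) * comb(n_background - term_size, n_input - i)
               for i in range(overlap, upper + 1))
    return hits / comb(n_background, n_input)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, if all 5 input genes fall in a 5-gene term drawn from a 20-gene background, `enrichment_pvalue(5, 5, 5, 20)` is 1/C(20, 5) = 1/15504.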

SCENIC (Single-Cell rEgulatory Network Inference and Clustering) analysis

Single-cell gene regulatory network analysis was performed following the published pySCENIC (v0.12.1) protocol75. To reduce runtime, 1000 cells were randomly sampled from each B cell subpopulation, and log-transformed counts were used as input. The Seurat object of the sampled cells was converted into “.loom” format using the “as.loom” utility so that it could be imported by the recommended workflow. Gene regulatory network inference, candidate regulon generation and regulon prediction, and cellular enrichment were then executed consecutively with the command-line utilities “grn”, “ctx” and “auc”. The list of TFs, ranking databases and motif annotations were those specified in the protocol. Regulon specificity scores across B cell subpopulations were then calculated from the cellular enrichment scores with the built-in “regulon_specificity_scores” function. For the PPARG and NR1H3 regulon networks (visualized with Cytoscape v3.9.1), only regulons containing C1q components were considered.
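The regulon specificity score (RSS) compares how a regulon's AUC scores are distributed across cells with an indicator distribution for one cell type: RSS = 1 − sqrt(JSD), where JSD is the Jensen-Shannon divergence. The sketch below is a plain-Python rendering of that definition, not pySCENIC's code. It uses the natural logarithm, which we believe matches the SciPy default that pySCENIC relies on; the log base is an assumption and rescales the scores.

```python
import math
from typing import Sequence


def _entropy(p: Sequence[float]) -> float:
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(x * math.log(x) for x in p if x > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return _entropy(m) - (_entropy(p) + _entropy(q)) / 2


def regulon_specificity(auc: Sequence[float], labels: Sequence[str], cell_type: str) -> float:
    """RSS of one regulon for one cell type: 1 - sqrt(JSD(AUC distribution, type indicator))."""
    total = sum(auc)
    p = [a / total for a in auc]
    indicator = [1.0 if lab == cell_type else 0.0 for lab in labels]
    n_type = sum(indicator)
    q = [x / n_type for x in indicator]
    # Clamp tiny negative values caused by floating-point rounding.
    return 1.0 - math.sqrt(max(jensen_shannon_divergence(p, q), 0.0))
```

A regulon active exclusively in one subpopulation scores 1 for that subpopulation, while a regulon spread evenly across all cells scores lower for every subpopulation.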

Prediction of non-B and B cell interactions with CellPhoneDB

Non-B and B cell CCIs were predicted with CellPhoneDB76 (v5.0.0, https://github.com/ventolab/CellphoneDB). Before prediction, we split the entire dataset into three subsets (i.e., BM, GC and PBL) according to tissue source to avoid false-positive CCI predictions caused by tissue discrepancy (e.g., CCIs between cell populations that do not coexist in the same tissue, or CCIs mediated by molecules not co-expressed in the same tissue). After splitting, cell populations accounting for less than 0.1% of the cells in a subset were discarded, both to remove populations generally believed to be absent from that tissue and to retain only CCIs between cell populations of reasonable frequency. The resulting clean datasets were used for CCI prediction with the CellPhoneDB statistical framework with default parameters. Because this study focuses on factors contributing to B cell development, we considered only CCIs in which B cells serve as signal receivers and CCIs mediated by adhesion molecules. Signaling directionality was inferred from the columns “receptor_a” and “receptor_b” in the output file (statistical_analysis_pvalues.txt). Adhesion CCIs were those whose “directionality” was annotated as “Adhesion-Adhesion” or whose “classification” contained the keyword “Adhesion”. In-house scripts were developed to process and visualize the CellPhoneDB predictions.
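The per-tissue frequency filter applied before CellPhoneDB can be sketched as below. The input format, one (tissue, population) pair per cell, is an assumption for illustration; in practice the surviving cells would then be written out as the counts and metadata inputs for the CellPhoneDB statistical analysis, which this sketch does not cover.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def populations_to_keep(cells: Iterable[Tuple[str, str]],
                        min_frac: float = 0.001) -> Dict[str, Set[str]]:
    """Per tissue, keep populations making up at least `min_frac` of that tissue's cells.

    `cells` yields (tissue, population) pairs, one per cell (assumed format).
    """
    by_tissue: Dict[str, List[str]] = defaultdict(list)
    for tissue, population in cells:
        by_tissue[tissue].append(population)
    kept: Dict[str, Set[str]] = {}
    for tissue, populations in by_tissue.items():
        counts = Counter(populations)
        n = len(populations)
        # Populations below 0.1% of the tissue's cells are dropped.
        kept[tissue] = {p for p, c in counts.items() if c / n >= min_frac}
    return kept
```

Splitting by tissue before computing frequencies matters: a population that is common in GC but nearly absent from BM is kept for the GC subset and dropped from the BM subset.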

Analysis of external human and mouse scRNA-seq data

The external human BM scRNA-seq data were obtained in three formats: raw sequencing reads20,21, gene expression matrices10,19 and annotated Seurat objects18,22. Raw sequencing reads were first converted into gene expression matrices as described above. The resulting gene expression matrices, together with the downloaded ones, were processed with a standard scRNA-seq analysis workflow that includes data preprocessing, library size normalization, highly variable gene identification, dimensionality reduction and unsupervised clustering. Notably, different samples within a dataset were integrated with the Harmony (v1.2.0) algorithm for efficiency. After the primary clustering, B lineage cells were extracted; for the annotated Seurat objects, B cells were instead extracted directly based on the cell type labels provided by the original authors. In both cases, the extracted B cells were subjected to secondary clustering to identify B cell subpopulations and to examine C1QA, C1QB, C1QC, S100A8 and S100A9 expression (Fig. S22). In particular, the BMMC scRNA-seq dataset from young participants (<45 years) reported by Oetjen et al.10 was used to substantiate our principal conclusions (Figs. S29 and S30); cell populations in this dataset were identified according to the expression of lineage-specific marker genes, and RNA velocity and CCI analyses were conducted in the same manner as for the primary dataset.

For the scRNA-seq data of mice of different ages, processed data in h5ad format were obtained directly from figshare (https://figshare.com/projects/Tabula_Muris_Senis/64982). These annotated single-cell data were imported and processed with Scanpy (v1.9.1). First, B lymphocytes were extracted according to the cell labels, and the S100A8/A9 expression level of each cell was calculated with the “sc.pp.calculate_qc_metrics” utility. Subsequently, the average S100A8/A9 expression level was calculated, and its correlation with age (Spearman’s rank-order correlation) was measured using the “stats.spearmanr” function of SciPy (v1.7.3) in Python (v3.9.7).
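Spearman's rank-order correlation, as computed by SciPy's `stats.spearmanr`, is the Pearson correlation of the two variables' ranks, with tied values sharing the average of their rank positions. The standard-library sketch below shows that calculation (coefficient only; no p-value). Whether the study correlated per-cell values or per-mouse averages with age is not stated, so any input vectors, such as ages and mean S100a8/S100a9 levels, are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the result is insensitive to the scale of the expression values and robust to a few extreme cells.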

Peripheral naïve B cell activation assay

PBMCs were isolated from heparinized venous blood of healthy adult donors, obtained with informed consent and under institutional ethical approval, using density-gradient centrifugation with Ficoll-Paque™ PLUS (GE Healthcare). Naïve B cells were then purified from PBMCs with the Human Naïve B Cell Isolation Kit (Stemcell, 17254) following the manufacturer’s instructions. The resulting naïve B cell fraction was resuspended in pre-warmed RPMI-1640 medium supplemented with 10% fetal bovine serum (FBS; Sigma-Aldrich) and 1% penicillin/streptomycin.

For stimulation, purified naïve B cells (1 × 10^6 cells/mL) were treated with 10 µg/mL F(ab’)₂ goat anti-human IgM antibody (AffiniPure; Jackson ImmunoResearch, 109-036-129) at 37 °C in a humidified 5% CO₂ incubator in three time-course series (0 h–2 h–6 h, n = 4; 0 min–20 min–1 h–2 h, n = 3; 0 min–1 min–5 min–10 min–20 min, n = 3). Reactions were terminated immediately by rapid dilution into ice-cold phosphate-buffered saline (PBS) containing 10 µM sodium orthovanadate (Sigma-Aldrich). Cells were pelleted by centrifugation (1000 × g, 5 min, 4 °C), snap-frozen in liquid nitrogen, and stored at −80 °C until RNA extraction.

Total RNA was extracted with the RNeasy Mini Kit (Qiagen), including on-column DNase I digestion to eliminate genomic DNA. cDNA was synthesized from 0.1 µg of total RNA using the High-Capacity cDNA Reverse Transcription Kit following the manufacturer’s protocol. Quantitative real-time PCR (qPCR) was carried out in 10 µL reactions using Power SYBR™ Green Master Mix (Applied Biosystems) on a StepOnePlus™ Real-Time PCR System. FOS was amplified using (sense) 5’-CCGGGGATAGCCTCTCTTACT-3’ and (anti-sense) 5’-CCAGGTCCGTGCAGAAGTC-3’. HSPA1A was amplified using (sense) 5’-ACCTTCGACGTGTCCATCCTGA-3’ and (anti-sense) 5’-TCCTCCACGAAGTGGTTCACCA-3’. Gene expression levels were determined by the comparative threshold cycle (Ct) method (2^−ΔΔCt), normalized to both the unstimulated control (0 min) and the endogenous housekeeping gene (GAPDH); all samples were analyzed in triplicate.
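As a worked illustration of the 2^−ΔΔCt calculation (the mouse RT‒qPCR analyses described later use the same method, with Actb as the reference gene), the sketch averages technical-replicate Ct values, computes ΔCt = Ct(target) − Ct(reference) for the stimulated sample and for the 0-min control, and returns 2 raised to −(ΔCt_sample − ΔCt_control). Averaging replicates before differencing is an assumption, and the Ct values in the example are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target: Sequence[float], reference: Sequence[float],
                        target_ctrl: Sequence[float], reference_ctrl: Sequence[float]) -> float:
    """2^-ddCt fold change from technical-replicate Ct values (replicates averaged first)."""
    d_ct_sample = mean(target) - mean(reference)            # e.g. FOS - GAPDH, stimulated
    d_ct_control = mean(target_ctrl) - mean(reference_ctrl)  # same pair at 0 min
    return 2 ** -(d_ct_sample - d_ct_control)
```

For example, with hypothetical triplicate Ct values of 22.0 for FOS and 18.0 for GAPDH after stimulation, versus 25.0 and 18.0 at 0 min, ΔΔCt = 4 − 7 = −3, so FOS is induced 2³ = 8-fold.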

Murine bone marrow immunochemistry

Left and right femora were aseptically harvested from 8-week-old female C57BL/6J mice in accordance with approved Institutional Animal Care and Use Committee protocols. Residual muscle and connective tissues were meticulously removed. The isolated femora were fixed by immersion in freshly prepared 4% paraformaldehyde (PFA) in 0.1 M phosphate buffer (pH 7.4) under constant rotation at 4 °C for 24 h. Following fixation, bones were thoroughly rinsed in PBS (pH 7.4) and decalcified in 10% (w/v) ethylenediaminetetraacetic acid (EDTA, Sigma-Aldrich, E9884; pH 8.0 adjusted with NaOH) under gentle continuous stirring at 4 °C for 14 days, with solution refreshed every 48 h. Completion of decalcification was verified by the absence of resistance to fine-needle probing.

Decalcified femora were rinsed in PBS, dehydrated through a graded ethanol series (70–100%), cleared in xylene, and infiltrated with molten paraffin wax (Paraplast X-tra, Leica Biosystems) under vacuum at 60 °C before embedding in the longitudinal orientation. Serial sections (5 μm thickness) were prepared using a rotary microtome and mounted on positively charged glass slides.

For five-color immunofluorescence staining, deparaffinized sections underwent heat-induced antigen retrieval in 10 mM sodium citrate buffer (pH 6.0) at 95–98 °C for 20 min, followed by permeabilization and blocking with 5% bovine serum albumin (BSA, Sigma-Aldrich) and 0.3% Triton X-100 in PBS for 1 h at room temperature. Sections were sequentially incubated overnight at 4 °C with the following primary antibodies diluted in antibody diluent (Dako, Agilent): Anti-CD19 Rabbit pAb (ServiceBio, GB11061-1; 1:2000), Anti-CD11b Rabbit pAb (ServiceBio, GB115689; 1:5000), rabbit anti-mouse CD43 (Abcam, ab317313; 1:1000), Anti-Mouse Kappa chain antibody (abinScience, MB980013; 1:500) and Anti-Mouse Lambda chain antibody (abinScience, MP678013; 1:500). After thorough washing, sections were incubated with appropriate fluorophore-conjugated secondary antibodies for 1 h at room temperature in the dark. Nuclei were counterstained with DAPI (1 μg/mL), and sections were coverslipped with Fluoro-Gel II mounting medium. Images were acquired with a laser scanning confocal microscope (LSM 900, Carl Zeiss).

Cell isolation from mouse organs and fluids

Male (n = 9) and female (n = 9) C57BL/6J mice were purchased from the Guangdong Medical Laboratory Animal Center. All animal experiments were performed in strict accordance with the ethical guidelines of Guangdong Provincial People’s Hospital, were approved by the Ethics Committee of Guangdong Provincial People’s Hospital and adhered to the principles of animal research. The mice were housed under specific pathogen-free (SPF) conditions. Young (Y), middle-aged (M) and old (O) mice were 3, 12 and 22 months old, respectively.

Murine PBL was aseptically collected from the inferior palpebral vein into EDTA-coated tubes to prevent coagulation. The total BM was flushed from the marrow cavities of the femurs and tibias with a calibrated 1 ml syringe and passed through a 70-µm nylon mesh cell strainer to obtain a single-cell suspension in PBS. Spleens were excised and mechanically dissociated into a cell suspension by gently pressing them through a 70-µm nylon cell strainer with the rubber-tipped plunger of a 1 ml syringe. Red blood cells in the PBL, BM and splenic cell suspensions were then lysed with ammonium-chloride-potassium (ACK) lysis buffer, enriching the final suspensions for leukocytes for further analysis.

Cell staining and flow cytometric analysis

BM, PBMC and spleen cell suspensions were collected and centrifuged at 300 × g for 10 min at 4 °C. The cell pellet was gently resuspended in PBS supplemented with 0.2% bovine serum albumin (BSA) at a final concentration of 2 × 10^7 cells/ml. The cell suspensions were treated with Fc Block (BD Biosciences, 564765) at a predetermined concentration for 15 min at room temperature. The cells were then incubated in the dark at 4 °C for 30 min with a cocktail of fluorescently conjugated anti-mouse antibodies, including CD45-PE (BD, 553081), CD19-APC (BD, 561738), CD27-BV605 (BD, 563365), IgM-BUV395 (BD, 566217), IgD-BV510 (BD, 563110) and S100A8-FITC (Novus Biologicals, NBP2-25269F), and washed twice with PBS supplemented with 0.2% BSA. After washing, the cells were resuspended for flow cytometric analysis. Data were acquired on a BD FACSymphony™ S6 flow cytometer equipped with lasers and filters matching the fluorochromes in the antibody cocktail. Before acquisition, compensation was set using single-stained controls to correct for spectral overlap between fluorochromes. The flow cytometric data were analyzed with FlowJo software (v10).

RNA extraction and RT‒qPCR analyses

Cell pellets were lysed in 1 ml of TRIZOL reagent (Life, 15596026) through vigorous mixing until a uniform lysate was achieved. To facilitate phase separation, 0.2 ml of chloroform was added to the lysate, which was then thoroughly mixed and allowed to equilibrate for 10 min at room temperature. Following this, centrifugation was performed at 12,000 × g for 15 min at 4 °C to separate the phases. The upper aqueous layer, enriched in RNA, was carefully transferred to a clean tube. Subsequently, 0.5 ml of isopropanol was added to the aqueous phase, and the mixture was gently agitated and allowed to stand for an additional 10 min. The samples were then centrifuged at 12,000 × g for 10 min at 4 °C to precipitate the RNA. The resulting RNA pellets were washed with 1 ml of 75% (v/v) ethanol by centrifuging at 7500 × g for 5 min at 4 °C. After removing the ethanol, the RNA pellets were air-dried for 5–10 min to ensure the removal of residual solvents. Finally, the dried RNA pellets were resuspended in RNase-free water to yield the purified RNA samples for downstream applications.

Reverse transcription was performed with SuperScript™ IV Reverse Transcriptase (Invitrogen, 18091050). qPCR analyses were then performed with a TB Green-based PCR kit (Takara, RR82WR) on a Real-Time PCR system (Bio-Rad). Fold changes in gene expression were determined with the comparative cycle threshold (ΔΔCt) method, using Actb as the housekeeping gene for normalization. The primer sequences used in this study were as follows: S100a8, AAATCACCATGCCCTCTACAAG and CCCACTTTTATCACCATCGCAA; S100a9, ATACTCTAGGAAGGAAGGACACC and TCCATGATGTCATTTATGAGGGC; Actb, CCCTGAAGTACCCCATTGAAC and CCTTTCACGGTTGGCCTTAG.

Lymphocyte cultures

Human BM aspirates (5–10 ml) were collected, after written informed consent, from nine patients with herniated disks and no hematological disorders (53–66 years, n = 3; 12–76 years, n = 6). BMMCs were isolated by density gradient centrifugation with Ficoll-Paque (1.078 g/mL): the diluted BM fraction was layered onto Ficoll-Paque and centrifuged at 600 × g for 30 min at ambient temperature. BMMCs were then carefully aspirated from the buffy coat interface and washed twice with PBS. The BMMCs were divided into two fractions. One fraction was transferred directly into the culture medium; cells were counted on a TC10 automated cell counter (Bio-Rad Laboratories) and the density was adjusted to 1 × 10⁶ cells/mL. The second fraction was subjected to negative selection with the BD IMag™ Human B Lymphocyte Enrichment Set (BD, 558007) according to the manufacturer’s protocol to isolate Pan-B cells. Briefly, the BMMCs were incubated with the appropriate amount of BD IMag™ Human B Lymphocyte Enrichment Set-DM (a cocktail of biotinylated antibodies against non-B cell surface markers) for 15 min at room temperature. Streptavidin-coated magnetic particles were then added, and the mixture was incubated for another 15 min at room temperature. The tube was placed in a magnetic separator for 8 min to separate the non-B cells (bound to the magnetic particles) from the Pan-B cells (remaining in the supernatant), and the supernatant containing the Pan-B cells was carefully transferred to a new tube. The Pan-B cells were resuspended in SCGM serum-free culture medium (CellGenix) at 1 × 10⁶ cells/mL.
Then, 200 μl of cell suspension (2 × 10⁵ cells per well) was plated into each well of a 96-well flat-bottomed culture plate and cultured at 37 °C and 5% CO₂ for 96 h. The B-cell line ARH-77 was cultured in parallel as a reference control at 1 × 10⁶ cells/mL and 200 μl per well (2 × 10⁵ cells per well). In addition, complete culture medium without cells was incubated under identical conditions as a blank control. Cell viability, assessed with the TC10 automated cell counter (Bio-Rad Laboratories), was consistently above 85% at 96 h. Note that only the Pan-B cell population was included in the comparison between older and younger donors (Fig. S28).

ELISA

The concentrations of S100A8/A9 in the culture medium were determined with a Human Calprotectin ELISA Kit (S100A8/A9) (Abcam, ab267628) according to the manufacturer’s protocol. The standard curve was generated by diluting the provided standard according to the kit instructions, and 100 μl of each standard was dispensed in duplicate into the wells of the 96-well plate. For the experimental samples, 10 μl of culture medium from each well of the cell culture plates was transferred to the corresponding wells of the ELISA plate, into which 90 μl of diluent had been pre-added. The plate was then incubated at room temperature for 2 h with gentle shaking (approximately 100 rpm) to allow S100A8/A9 in the samples to bind to the capture antibodies coated on the plate. After incubation, the wells were washed three times with the provided wash buffer to remove unbound material. Then, 100 μl of biotinylated anti-calprotectin antibody was added to each well, and the plate was incubated at room temperature for 1 h with gentle shaking, followed by another three washes to remove unbound biotinylated antibody. Next, 100 μl of HRP-conjugated streptavidin was added to each well, and the plate was incubated for 45 min at room temperature with gentle shaking. After a final three washes, 100 μl of TMB (3,3′,5,5′-tetramethylbenzidine) substrate solution was added to each well, and the plate was incubated in the dark for 15–30 min for color development. The reaction was stopped by adding 100 μl of the stop solution provided in the kit. Optical densities (ODs) at 450 nm were recorded with a microplate reader, and S100A8/A9 heterodimer concentrations in the experimental samples were derived by interpolating absorbance values against the standard curve.
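Deriving a sample concentration from the standard curve can be sketched as below: average the replicate ODs, interpolate against the standards, and multiply by the dilution factor implied by loading 10 μl of medium into 90 μl of diluent (10-fold). Two assumptions are worth flagging: piecewise-linear interpolation is used for brevity, whereas many kit protocols recommend a four-parameter logistic fit, and whether the reported values were corrected for this dilution is not stated in the text. The standard concentrations in the example are made up.

```python
from statistics import mean
from typing import Sequence


def interpolate_concentration(od: float, std_od: Sequence[float],
                              std_conc: Sequence[float]) -> float:
    """Piecewise-linear interpolation of concentration from a standard curve (OD rising with concentration)."""
    points = sorted(zip(std_od, std_conc))
    for (od_lo, c_lo), (od_hi, c_hi) in zip(points, points[1:]):
        if od_lo <= od <= od_hi:
            return c_lo + (od - od_lo) / (od_hi - od_lo) * (c_hi - c_lo)
    raise ValueError("OD outside the standard-curve range")


def sample_concentration(replicate_ods: Sequence[float], std_od: Sequence[float],
                         std_conc: Sequence[float], dilution_factor: float = 10.0) -> float:
    """Interpolate the mean OD, then correct for the assumed 10 µl + 90 µl (10-fold) dilution."""
    return interpolate_concentration(mean(replicate_ods), std_od, std_conc) * dilution_factor
```

With hypothetical standards at ODs 0.1, 0.5, 1.0 and 2.0 for 0, 10, 25 and 60 ng/ml, a well reading 0.75 interpolates to 17.5 ng/ml in the diluted sample, or 175 ng/ml in the original medium. Readings outside the standard range raise an error instead of being extrapolated.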

Statistics and reproducibility

Statistical analyses in this study were performed using the Python SciPy module (v1.7.3). Specifically, the independent two-sample t-tests in Fig. 5E-H, Fig. 8D, Fig. S13, Fig. S24 and Fig. S25 were conducted with the “stats.ttest_ind” function, and the Spearman’s correlation coefficients shown in Fig. 8C and Fig. S23C, D were calculated with the “stats.spearmanr” function.

Fisher’s exact tests and multiple-testing corrections in Fig. 3D, E and Fig. 4C, D were performed with the Enrichr online web server, as described in “GO enrichment analysis” above. A p value < 0.05 was considered statistically significant.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.