Main

Hematopoiesis orchestrates the lifelong production of all differentiated blood and immune cells throughout development, adulthood and old age. Because of its close connection to virtually every organ in the human body, the hematopoietic system is a major contributor to systemic health1,2,3. Blood production by hematopoietic stem and progenitor cells (HSPC) is tightly regulated by both cell-intrinsic and cell-extrinsic mechanisms that balance HSPC self-renewal and multilineage differentiation, maintain hematopoietic homeostasis, allow rapid and controlled responses to stress and prevent disease.

The composition and function of the hematopoietic system undergo substantial changes throughout human life. Extensive research has elucidated key features of human hematopoietic aging, including a shift from lymphoid-biased to myeloid-biased output4,5, an increase in the relative frequency of HSPC4 and a concomitant decline in their regenerative capacity6,7,8. Although differences between hematopoiesis in the developing fetus, young adults and older individuals have been studied in detail6,9,10,11, there is a relative paucity of studies defining the cellular and molecular composition of hematopoiesis during human childhood11,12,13. Characterizing pediatric hematopoiesis is therefore crucial for understanding the development of the human hematopoietic system and the origins of pediatric hematologic diseases.

A major challenge in characterizing pediatric hematopoiesis is obtaining bone marrow (BM) samples from healthy children. Additionally, several key hematopoietic and nonhematopoietic cell types, including HSPC and mesenchymal stromal cells (MSC), are rare in BM aspirates, necessitating specific enrichment strategies to capture sufficient cells for in-depth profiling14. Consequently, much of our current understanding of pediatric hematopoiesis is based on studies that may not fully capture all relevant cell types. Technologies that simultaneously measure mRNA and surface protein expression in single cells provide unprecedented opportunities to dissect the composition of hematopoiesis in health and disease15,16. Compared to flow cytometry-based approaches or unimodal single-cell RNA sequencing (scRNA-seq), these methods allow improved cell-type identification and more accurate detection of cell states17. Furthermore, as mRNA levels do not always correlate with protein expression17, the surface protein modality allows confirmation that specific transcripts of interest are translated into surface protein and may therefore have functional consequences.

Here, we performed multimodal profiling of BM cells from healthy individuals aged 0–32 years, revealing age-related changes in cell frequencies, states and differentiation trajectories between young pediatric (YP) donors (<10 years) and adolescent and young adult (AYA) donors (≥13 years). We identified phenotypic and transcriptional differences within the lymphoid progenitors (LyP) cell population underlying B cell-biased hematopoiesis in young children. Interaction analysis and BM cytokine measurements uncovered age-dependent changes in signaling from MSC that may differentially prime LyP toward stable or B cell-biased output. This comprehensive, multimodal pediatric BM atlas provides a valuable reference for studies on hematopoietic development and hematologic diseases, many of which may originate in childhood.

Results

Reference map shows the single-cell composition of pediatric BM

To uncover the cellular and transcriptional landscape of human pediatric BM, we established an experimental pipeline enabling single-cell analysis of rare BM cell types, while preserving information on native cell frequencies (Fig. 1a and Supplementary Fig. 1). BM samples were obtained from seven healthy pediatric donors (4 female and 3 male, age 2.1–16.7 years, median of 9.6 years), all of whom served as donors for allogeneic hematopoietic cell transplantation (Supplementary Table 1). To enable both in-depth analysis of HSPC and MSC and accurate estimation of representative cell frequencies in a cost-efficient manner, three cell fractions were isolated from each donor: a nonenriched, erythrocyte-depleted CD235a− fraction, an HSPC-enriched CD235a−CD34+ fraction and an MSC-enriched CD235a−CD45loCD90+ or CD235a−CD45loCD271+ fraction. Enriched cell fractions from each donor were pooled with the nonenriched cell fractions of a genotypically distinct donor (Supplementary Fig. 1). Donor-specific single-nucleotide variants (SNVs) enabled postsequencing assignment of cells to enriched and nonenriched fractions to determine native cell frequencies. Multiplexed libraries were processed using 10x Genomics-based simultaneous single-cell profiling of transcriptome and surface protein expression and a customized panel of 138 oligonucleotide-conjugated antibodies (Fig. 1a).
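Assigning pooled cells back to their donor of origin from donor-specific SNVs can be illustrated with a simple binomial likelihood model. The sketch below is a minimal illustration under stated assumptions, not the demultiplexing pipeline used in this study: it assumes known diploid donor genotypes (encoded as the number of alternative alleles, 0, 1 or 2) and per-cell reference and alternative read counts at informative SNVs. The function names, the error rate and the example genotypes are hypothetical.

```python
import math


def alt_fraction(dosage: int, error: float = 0.01) -> float:
    """Expected ALT-read fraction for a diploid genotype with 0, 1 or 2 ALT alleles,
    kept away from exactly 0 or 1 by an assumed (hypothetical) sequencing error rate."""
    return min(max(dosage / 2, error), 1 - error)


def donor_log_likelihood(ref_counts, alt_counts, dosages, error=0.01):
    """Binomial log-likelihood of one cell's allele counts under one donor's genotypes.
    The binomial coefficient is dropped because it is identical for every donor."""
    ll = 0.0
    for ref, alt, dosage in zip(ref_counts, alt_counts, dosages):
        p = alt_fraction(dosage, error)
        ll += alt * math.log(p) + ref * math.log(1 - p)
    return ll


def assign_cell(ref_counts, alt_counts, donor_genotypes, error=0.01):
    """Return the donor whose genotypes best explain the cell's reads."""
    scores = {
        donor: donor_log_likelihood(ref_counts, alt_counts, dosages, error)
        for donor, dosages in donor_genotypes.items()
    }
    return max(scores, key=scores.get)


# Hypothetical genotypes of the two donors combined in one pool, at four SNVs
genotypes = {"donor_A": [0, 2, 1, 0], "donor_B": [2, 0, 1, 2]}
# Reads at SNVs 1 and 4 are mostly ALT, which matches donor_B
print(assign_cell([0, 5, 3, 1], [6, 0, 2, 4], genotypes))  # donor_B
```

Because each pool combined the enriched fractions of one donor with the nonenriched fraction of a genotypically distinct donor, the assigned donor also identifies the fraction of origin, which is what allows native frequencies to be estimated from nonenriched cells only. Dedicated genotype-based demultiplexing tools additionally model doublets and unknown genotypes, which this sketch omits.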

Fig. 1: Multimodal single-cell reference map of healthy pediatric BM.

a, Schematic overview of the experimental pipeline for CITE-seq, in which BM mononuclear cells were collected from seven healthy pediatric donors aged 2.1–16.7 years (3 male and 4 female; Supplementary Table 1). CD235a−CD34+ HSPC and CD235a−CD45loCD90+ or CD235a−CD45loCD271+ MSC were enriched from each sample by flow cytometry and combined with the nonenriched fraction from a genetically distinct donor. Pooled samples were subjected to CITE-seq using 138 oligonucleotide-conjugated antibodies on the 10x Genomics platform. b, CITE-seq-based weighted nearest neighbor (WNN) uniform manifold approximation and projection (wnnUMAP) of 68,094 cells from pediatric BM aspirates (n = 7) showing 28 clusters identified and annotated based on joint transcriptomic and protein expression patterns and grouped into eight major cell populations: HSPC (8 subsets), B and plasma cells (6 subsets), T and NK cells (7 subsets), myeloid populations (3 subsets), dendritic cells (2 subsets), erythroid cells (1 subset) and stromal populations (1 subset). c, wnnUMAP depicting the relative contribution of the combined HSPC-enriched and MSC-enriched (enriched) versus the nonenriched cell fractions as in b. d, Top differentially expressed genes for all 28 clusters, grouped by major cell populations as defined in b. e, Top differentially expressed surface markers for all 28 clusters, grouped by major cell populations as defined in b. MNC, mononuclear cell; HSPC, hematopoietic stem and progenitor cell; HSC, hematopoietic stem cell; LMPP, lympho-myeloid primed progenitor; MEP, megakaryocyte–erythroid progenitor; MkP, megakaryocyte progenitor; EryP, erythroid progenitor; MCP, mast cell progenitor; TN, naive T cell; TM, memory T cell; Treg cell, regulatory T cell; MAIT cell, mucosal-associated invariant T cell; ProgDC, dendritic cell progenitor; cDC, conventional dendritic cell; pDC, plasmacytoid dendritic cell; Ery, erythroid; DC, dendritic cell; Avg, average.

After quality control and thresholding, the pediatric dataset comprised 68,094 high-quality cells, with a median of 5,133 transcripts and 2,001 unique genes per cell for the RNA modality and 1,307 counts and 114 unique proteins per cell for the protein modality (Extended Data Fig. 1a–h). Integration across donors and measurement modalities revealed 28 high-resolution clusters. These clusters were annotated using published datasets11,12,13, canonical marker genes and differentially expressed transcripts and proteins, resulting in 8 major hematopoietic populations and 28 subpopulations: HSPC (8 subsets, including hematopoietic stem cells (HSCs), lympho–myeloid primed progenitors (LMPP), megakaryocyte–erythroid progenitors, megakaryocyte progenitors, erythroid progenitors, pro-myelocytes, mast cell progenitors and LyP), B cells and plasma cells (6 subsets), T cells and natural killer (NK) cells (7 subsets), myeloid populations (3 subsets) and dendritic cells (2 subsets; Fig. 1b and Supplementary Table 2). Erythroid and stromal cells contained no further subsets (Fig. 1b). All donors contributed to every subset (Extended Data Fig. 1i). HSPC and MSC were predominantly derived from the enriched fractions (13,225 HSPC in enriched versus 2,689 in nonenriched fractions, 4.9-fold enrichment; 541 MSC in enriched versus 29 in nonenriched fractions, 18.7-fold enrichment; Fig. 1c). All HSPC subsets expressed key HSC-defining transcripts (for example, NPM1; Fig. 1d) and surface proteins (for example, CD112 and CD34; Fig. 1e), whereas genes such as AVP (HSCs and LMPP) or MPO (pro-myelocytes) distinguished specific subsets (Fig. 1d,e and Supplementary Tables 2–6). Integration with public datasets18,19 confirmed cluster annotations (Extended Data Fig. 2). 
Because cell annotation in the external datasets relied on RNA expression only18,19, there was greater variability in the classification of cell types that require surface protein modality weights for accurate annotation, such as T cell subsets (Supplementary Fig. 2). This observation aligns with previous reports20, underscoring the added value of multimodal profiling for high-resolution cell-type mapping. The complete pediatric BM atlas generated in this study is available as a Seurat object and Loupe browser-compatible resource through Zenodo at https://doi.org/10.5281/zenodo.14168864 (ref. 21).

Age-related cellular and transcriptional shifts in BM

Next, we leveraged the CITE-seq reference atlas to explore developmental changes in BM composition from infancy to adolescence. Analysis of nonenriched cell fractions revealed a high abundance of B lineage cells in donors aged 2.1–9.6 years (n = 4, 2 male and 2 female; Supplementary Table 1), which constituted the largest share of BM cellularity (Fig. 2a), consistent with previous observations in peripheral blood22,23. By contrast, BM composition of the donors aged 13.3–16.7 years (n = 3, 1 male and 2 female; Supplementary Table 1) was dominated by myeloid and T cells (Fig. 2a). To test whether the observed compositional differences persisted into young adulthood, we expanded our dataset to include two adult BM donors (20.7 and 31.3 years, both female, 11,278 and 11,338 cells; Supplementary Table 1). BM composition in these two adult individuals closely resembled that of the donors aged 13.3–16.7 years (Fig. 2a), allowing us to define two age categories: YP (age <10 years) and AYA (age 13–31 years; Fig. 2b). B lineage cells were significantly more frequent in the YP group than in the AYA group (median 49.7% versus 13.9%; P = 0.016), whereas T cells (median 18.6% in AYA versus 10.4% in YP; P = 0.56) and myeloid cells (median 51.7% in AYA versus 24.0% in YP; P = 0.063) tended to be more frequent in the AYA group, although these differences did not reach statistical significance (Fig. 2c). To validate these age-related differences, we performed spatial transcriptomics on BM trephine biopsies from an independent cohort of three YP (two male participants and one female participant, ages 0.7, 1 and 1.5 years) and three AYA donors (all male participants, ages 13.5, 17 and 23 years; Supplementary Tables 7 and 8). Analyses of cell composition in these BM biopsies revealed a higher abundance of B lineage cells in YP samples (median 34.6% in YP versus 9.1% in AYA), whereas myeloid cells were more prevalent in AYA samples (median 36.1% in YP versus 58.2% in AYA; Fig. 2d,e and Extended Data Fig. 3). 
Together, these findings indicate that YP BM is distinct from AYA BM and is characterized by a shift from B cell lineage bias to increased T cell and myeloid output from infancy to young adulthood.
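With four YP and five AYA aspirates, the two-sided Wilcoxon rank-sum tests reported in the Fig. 2 legend are small-sample tests whose P values can be obtained exactly by enumerating every possible split of the pooled ranks. The stdlib sketch below assumes no tied values, and the example frequencies are hypothetical. It shows that complete separation of four versus five samples yields the smallest attainable two-sided P value, 2/126 ≈ 0.016, which is the value reported for B lineage cell frequencies.

```python
from itertools import combinations


def rank_sum_exact_p(x, y):
    """Exact two-sided Wilcoxon rank-sum P value, assuming no tied values.

    Under the null hypothesis every assignment of len(x) of the pooled ranks to
    group x is equally likely; the P value is the share of assignments whose rank
    sum lies at least as far from its null mean as the observed rank sum."""
    pooled = sorted(list(x) + list(y))
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # valid only without ties
    n_x, n = len(x), len(x) + len(y)
    observed = sum(rank[v] for v in x)
    null_mean = n_x * (n + 1) / 2
    observed_dev = abs(observed - null_mean)
    extreme = total = 0
    for ranks in combinations(range(1, n + 1), n_x):
        total += 1
        if abs(sum(ranks) - null_mean) >= observed_dev - 1e-9:
            extreme += 1
    return extreme / total


# Hypothetical B lineage percentages: every YP value exceeds every AYA value
yp = [49.7, 55.0, 44.1, 61.2]
aya = [13.9, 10.2, 20.5, 8.7, 15.0]
p = rank_sum_exact_p(yp, aya)
print(round(p, 3))  # 0.016, i.e. 2 of the 126 possible rank splits
```

The practical consequence is that, with groups this small, no two-sided test of this kind can return a P value below about 0.016, whatever the effect size.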

Fig. 2: B cell lineage bias in YP BM is driven by a distinct transcriptional program in LyP.

a, Frequencies of cell subsets (as in Fig. 1b) in CITE-seq data of nonenriched BM aspirates of four YP (2.1–9.6 years, two male and two female) and five AYA (13.5–31.3 years, one male and four female) healthy donors. B lineage cells (YP BM, median 49.7%; AYA BM, median 13.9%, P = 0.016), T cells (YP BM, median 10.4%; AYA BM, median 18.6%, P = 0.56) and myeloid cells (YP BM, median 24.0%; AYA BM, median 51.7%, P = 0.063); two-sided Wilcoxon rank-sum test. b, Principal component analysis of BM composition in YP (n = 4) and AYA (n = 5) samples as in a. Dots represent individual BM aspirates, and arrows represent vectors reflecting each cell population’s contribution to the principal components. c, Frequencies of cell subsets (as in Fig. 1b) in BM aspirates from YP (n = 4) and AYA donors (n = 5). d, Representative BM cores from Xenium-based spatial transcriptomics of BM trephine biopsies from YP (n = 3, 0.7–1.5 years, two male participants and one female participant) and AYA donors (n = 3, 13.5–23 years, all male participants). B lineage cells were overlaid to allow consistent comparisons across samples; scale bar, 300 μm; yrs, years. e, Relative abundance of BM cell subsets (as in Fig. 1b) in spatial transcriptomics data from YP (n = 3, 0.7–1.5 years, two male participants and one female participant) and AYA (n = 3, 13.5–23 years, all male participants) BM biopsies. f–h, Radar plots showing module expression of differentially expressed genes in HSCs (f), LMPP (g) and LyP (h) from YP (n = 4, 2.1–9.6 years, two male and two female participants) and AYA donors (n = 5, 13.5–31.3 years, one male and four female participants) across BM cell subsets as in Fig. 1b. Genes upregulated in YP or AYA BM cell subsets (Supplementary Table 9) were aggregated into module scores representing the average expression of age-associated transcriptional programs and assessed across all 28 cell clusters to evaluate their potential lineage-biasing effects. 
BaEoMaP, progenitors of basophils, eosinophils and mast cells; MkP/Mk, megakaryocyte progenitor cell or megakaryocyte; AEC/VSMC, arteriolar endothelial cell or vascular smooth muscle cells; SEC, sinusoidal endothelial cell; Nonclass., nonclassical.
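Panel b summarizes each aspirate's composition by principal component analysis, with arrows showing each population's contribution. The numpy sketch below illustrates that computation under stated assumptions: the input is a samples × populations matrix of percentages that is only mean-centered before a singular value decomposition, and the three populations and their values are hypothetical. Whether the original analysis scaled variables or used a particular biplot scaling is not stated in the legend.

```python
import numpy as np


def composition_pca(freqs: np.ndarray, n_components: int = 2):
    """PCA of a (samples x populations) frequency matrix via SVD.

    Returns sample scores (the dots of a biplot), population loadings (the arrow
    directions) and the fraction of variance explained by each component."""
    centered = freqs - freqs.mean(axis=0)          # center each population's frequency
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T                 # populations x components
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Hypothetical percentages of B lineage, T and myeloid cells in five aspirates
freqs = np.array([
    [50.0, 10.0, 25.0],
    [55.0, 12.0, 20.0],
    [45.0, 11.0, 28.0],
    [14.0, 19.0, 52.0],
    [12.0, 18.0, 55.0],
])
scores, loadings, explained = composition_pca(freqs)
print(np.round(explained, 3))
print(np.round(loadings, 3))
```

In this toy example the first component separates the B-rich from the myeloid-rich samples, and the loadings of B lineage and myeloid cells point in opposite directions along it.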

YP B lineage cell bias originates from LyP cells

We next explored whether the observed age-related differences in B, T and myeloid cell frequencies were accompanied by transcriptional changes in each of these populations or their upstream progenitors. To determine whether these frequency shifts reflected a general aging program shared across hematopoietic cell types or were restricted to specific hematopoietic cell populations, we compared single-cell transcriptomes and surface marker expression of each of the 28 BM subclusters in our CITE-seq data (as defined in Fig. 1b) between YP and AYA BM. Differential expression analysis identified ~5,000 genes (Extended Data Fig. 4a and Supplementary Table 9) and ~200 surface proteins differentially expressed between YP and AYA BM populations (Extended Data Fig. 4b and Supplementary Table 10). Among the ~5,000 differentially expressed genes, most were cell-type specific: 67% of genes upregulated in YP BM and 62% of genes upregulated in AYA BM were significantly altered in only a single cell population (for example, restricted to HSCs, pro-B cells or classical monocytes; Extended Data Figs. 4 and 5). By contrast, a small number of genes, such as CD59 (higher expression in AYA BM) and CD74 (higher expression in YP BM), were consistently up- or downregulated across multiple cell types (Extended Data Figs. 4 and 5). These shared genes included genes previously linked to aging in adults, such as CTSW24 or KLF6 (ref. 25; Supplementary Table 10). These observations implied that hematopoietic cells underwent predominantly cell-type-specific transcriptional changes, rather than a uniform program shared by all populations, during the transition from YP to AYA stages.
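The statement that most differentially expressed genes were altered in only a single population amounts to counting, for each gene, the number of populations in which it is significant. A minimal stdlib sketch, assuming a hypothetical mapping from each population to its set of genes significantly upregulated in one age group; the gene names G1 to G6 are placeholders, not results from this study.

```python
from collections import Counter


def single_population_fraction(up_by_population):
    """Fraction of differentially expressed genes significant in exactly one population."""
    counts = Counter(g for genes in up_by_population.values() for g in genes)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def shared_genes(up_by_population, min_populations=2):
    """Genes significant in at least `min_populations` populations."""
    counts = Counter(g for genes in up_by_population.values() for g in genes)
    return {g for g, n in counts.items() if n >= min_populations}


# Placeholder gene sets upregulated in one age group, per population
up_in_yp = {
    "HSC": {"G1", "G2", "G6"},
    "LyP": {"G3", "G4", "G6"},
    "pro-B": {"G4", "G5", "G6"},
}
print(single_population_fraction(up_in_yp))
print(sorted(shared_genes(up_in_yp)))
```

Here four of six genes are population specific, while G6 behaves like the shared genes (such as CD74) described above.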

To assess whether the differentially expressed genes between YP and AYA BM imposed a lineage bias toward B cell versus myeloid fates, we calculated lineage-specific module scores. These scores were generated by aggregating the genes differentially expressed between YP and AYA cells of each population (Supplementary Table 9) into gene modules and then scoring every individual cell in our CITE-seq data for each module. Average module expression was then visualized across all 28 BM cell populations (as defined in Fig. 1b and Supplementary Figs. 3 and 4), focusing on populations comprising the differentiation trajectory from HSC to the B cell lineage (HSC, LMPP and LyP; Fig. 2f–h). The modules derived from the differentially expressed genes in HSC (for example, IGFBP2 and PROK2; Supplementary Table 9) and LMPP (for example, CD24; Supplementary Table 9) were uniformly distributed between B cell and myeloid populations (Fig. 2f,g and Supplementary Table 9), indicating that although early progenitors (HSC and LMPP) underwent transcriptional changes during maturation from YP to AYA, these differences did not confer a lineage differentiation bias. By contrast, the modules derived from YP-upregulated LyP genes (for example, EBF1 and CD79A; Supplementary Table 9) were enriched in B cell populations (pro-B, pre-B, naive/memory B and plasma cells) compared to other BM cell populations (Fig. 2h and Supplementary Fig. 3). Conversely, modules derived from genes upregulated in AYA LyP versus YP LyP (for example, CD37 and SPINK2; Supplementary Table 9) were enriched in HSC, LMPP, pro-myelocytes, myelocytes, classical monocytes and nonclassical monocytes (Supplementary Fig. 3). At the single-gene level, expression of EBF1 and other B cell lineage genes, including BACH2 and CD79B, was higher in YP than in AYA samples, starting at the LyP stage and continuing through the pro-B, pre-B and naive/memory B cell stages, but not in earlier LMPP or HSC stages (Extended Data Fig. 6). 
Overall, these analyses indicate substantial developmental changes in BM composition and transcriptional programs between YP and AYA groups, with a prominent B cell lineage bias in HSPC emerging (and potentially regulated) at the LyP stage.
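Per-cell module scores of this kind are commonly computed in the spirit of Seurat's AddModuleScore: the average expression of the module genes minus the average expression of control genes drawn from matching average-expression bins. The text does not state which implementation or parameters were used, so the stdlib sketch below is only an illustration; the toy matrix, the gene names, the bin count and the number of control genes per module gene are hypothetical (Seurat's defaults are larger).

```python
import random
from statistics import mean


def module_scores(expr, module_genes, n_bins=3, n_ctrl=2, seed=0):
    """Per-cell module score: mean expression of the module genes minus the mean
    expression of control genes drawn from the same average-expression bins.

    expr maps each gene to a list of normalized expression values, one per cell."""
    rng = random.Random(seed)
    # Order genes by average expression and cut them into equal-sized bins
    genes = sorted(expr, key=lambda g: mean(expr[g]))
    bin_of = {g: i * n_bins // len(genes) for i, g in enumerate(genes)}
    # For each module gene, sample control genes from its own expression bin
    controls = []
    for g in module_genes:
        pool = [h for h in genes if bin_of[h] == bin_of[g] and h not in module_genes]
        controls.extend(rng.sample(pool, min(n_ctrl, len(pool))))
    n_cells = len(next(iter(expr.values())))
    scores = []
    for c in range(n_cells):
        module_mean = mean(expr[g][c] for g in module_genes)
        control_mean = mean(expr[h][c] for h in controls) if controls else 0.0
        scores.append(module_mean - control_mean)
    return scores


# Hypothetical genes x cells matrix: module genes M1, M2 vary, controls are flat
expr = {
    "M1": [3.0, 0.0, 1.5], "M2": [2.0, 0.0, 1.0],
    "C1": [1.0, 1.0, 1.0], "C2": [1.2, 1.2, 1.2], "C3": [0.8, 0.8, 0.8],
    "C4": [1.1, 1.1, 1.1], "C5": [0.9, 0.9, 0.9], "C6": [1.0, 1.0, 1.0],
}
print(module_scores(expr, ["M1", "M2"]))
```

Subtracting expression-matched controls keeps the score from simply tracking overall sequencing depth or generally highly expressed genes, which is why the score, rather than the raw module mean, is compared across populations.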

Distinct LyP subsets correlate with age-related lineage bias

Next, we further dissected the LyP population to investigate whether heterogeneity within this population could explain the differences in B cell lineage versus myeloid bias between YP and AYA BM. Subclustering of LyP cells revealed two distinct populations with unique transcriptomic and proteomic features (subclusters 1 and 2; Fig. 3a). Both subclusters exhibited hallmark features of LyP, including expression of canonical LyP transcripts (for example, IGLL1 and DNTT) and surface proteins (CD34+CD90−CD45RA+; Extended Data Fig. 7). This identity was further validated by mapping to the Deeply Integrated Human Single-Cell Omics (DISCO) external reference dataset, in which both subclusters were annotated as LyP (Extended Data Fig. 8)19,26,27. Subcluster 1 showed high expression of B lineage cell-associated genes (EBF1, CD79A and VPREB1) and high surface protein expression of CD127 (interleukin-7 (IL-7) receptor α-chain; Fig. 3b,c and Supplementary Tables 11 and 12) and is hereafter referred to as B cell-biased LyP (LyP-B). Subcluster 2 showed high expression of genes associated with stemness (for example, SPINK2, ABCB1 and LRMDA), the myeloid lineage (LGALS1, CSF3R and CLEC12A) and the lymphoid lineage (CD37 and SPIB; Fig. 3b). Cells in this cluster also displayed increased protein expression of both myeloid (for example, CD123 and CD33) and lymphoid (for example, CD18 and CD155) markers (Fig. 3c) and are referred to as stable LyP (LyP-S). Gene Ontology term enrichment analysis supported this functional distinction: LyP-B cells were enriched for pathways related to B cell receptor signaling and lymphocyte proliferation and differentiation, whereas LyP-S cells were enriched for pathways related to myeloid leukocyte activation, mononuclear cell differentiation and B cell activation (Fig. 3d). 
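Gene Ontology over-representation analysis of this kind (performed with clusterProfiler, per the Fig. 3 legend) rests on a one-sided hypergeometric test per term followed by Benjamini–Hochberg adjustment across terms. The stdlib sketch below reimplements only those two statistical steps for illustration; the term names, background size and overlap counts are hypothetical, and clusterProfiler's own background definition and term filtering may differ.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated to the term,
    n differentially expressed genes drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini–Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical: 20,000 background genes, 200 differentially expressed genes
terms = {"term_A": (12, 150), "term_B": (4, 300), "term_C": (1, 80)}  # (overlap k, term size K)
pvals = [hypergeom_sf(k, 20000, K, 200) for k, K in terms.values()]
padj = benjamini_hochberg(pvals)
for name, p, q in zip(terms, pvals, padj):
    print(name, f"{p:.2e}", f"{q:.2e}")
```

The adjusted values correspond to the "adjusted P value" dot color in Fig. 3d; the GeneRatio dot size is simply k divided by the number of input genes annotated in the database.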
Inference of gene regulatory networks showed increased activity of B lineage cell-associated regulons in LyP-B cells (for example, PAX5, LEF1 and TCF3)28,29, whereas LyP-S cells were enriched for regulons associated with general lymphoid development and stemness (for example, ELF4, RXRA and KLF4; Fig. 3e and Supplementary Table 13)30,31. To visualize both LyP subsets within the hematopoietic hierarchy, we created a diffusion map from our CITE-seq data, incorporating the B cell and myeloid trajectories (from HSCs to LMPP, LyP, cycling pro-B cells and pro-B cells and from HSCs to LMPP, pro-myelocytes, myelocytes and classical monocytes; Fig. 1b and Extended Data Fig. 7b). When projected onto this trajectory, LyP-B cells localized closer to pro-B cell populations, whereas LyP-S cells positioned closer to LMPP and pro-myelocytes (Extended Data Fig. 7e), supporting the notion that although both LyP subsets arose downstream of LMPP, LyP-B cells exhibited a B cell lineage bias whereas LyP-S cells maintained broader, more multipotent potential. Although both LyP subsets were present across all individuals, LyP-B cells were significantly more abundant in YP BM, whereas LyP-S cells predominated in AYA BM (median 74.4% versus 26.9% LyP-B; Fig. 3f). Of note, in one of the two adult individuals (age 20.7 years), LyP-B cells were only detected in the enriched fraction (Fig. 3f), emphasizing the rarity of this subset in adults. Cell cycle analysis showed a trend toward higher proportions of cycling LyP-B cells (S + G2/M) in YP samples than in AYA samples (median 45.2% versus 24.5%; Fig. 3g), which might have contributed to their increased abundance. Moreover, the frequency of LyP-B cells in each individual was significantly correlated with the overall percentage of B lineage cells (Fig. 3h). 
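A diffusion map embeds cells by the leading nontrivial eigenvectors of a Markov transition matrix built from pairwise similarities, so that cells along a continuous differentiation path remain ordered. The numpy sketch below assumes a small cells × features matrix (for example, principal components) and a fixed Gaussian kernel width; packages used in practice (such as destiny or scanpy's `tl.diffmap`) add adaptive bandwidths and density normalization, and the tool used for this analysis is not stated here. The input positions are hypothetical.

```python
import numpy as np


def diffusion_map(x: np.ndarray, sigma: float = 1.0, n_components: int = 2):
    """Diffusion components of a (cells x features) matrix using a Gaussian kernel."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (2 * sigma**2))
    degree = kernel.sum(axis=1)
    # The symmetric matrix D^-1/2 K D^-1/2 has the same eigenvalues as the
    # row-stochastic transition matrix P = D^-1 K but can be solved with eigh.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Map back to right eigenvectors of P and drop the trivial constant one
    psi = eigvecs / np.sqrt(degree)[:, None]
    return psi[:, 1 : n_components + 1] * eigvals[1 : n_components + 1]


# Hypothetical cells spread along a single one-dimensional trajectory
positions = np.linspace(0.0, 5.0, 20)[:, None]
dcs = diffusion_map(positions, sigma=1.0)
print(np.round(dcs[:, 0], 3))
```

In this toy case the first diffusion component orders the cells along the trajectory, which is the property exploited when placing LyP-B cells nearer pro-B cells and LyP-S cells nearer LMPP.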
To validate these age-associated patterns, we reanalyzed a public scRNA-seq dataset of CD34+ cells from 15 BM donors (ages 2–77 years; 8 male and 7 female donors)32 and reclassified LyP into LyP-B or LyP-S cells (Extended Data Fig. 8a). This analysis supported our findings: LyP-B cells were more abundant in young donors (ages 2–12 years, median 70.1% of all LyP) than in older donors (ages 17–77 years, median 60.4% of all LyP; Extended Data Fig. 8b,c). Conversely, LyP-S cells were more frequent in older donors (median 29.4% in young versus 39.6% in older donors; Extended Data Fig. 8b,c). Thus, in line with previous reports22,33, we found that two phenotypically and functionally distinct subpopulations of LyP exist in healthy BM, whose abundance was age-dependent and associated with a systemic bias toward B lineage cells in YP BM.

Fig. 3: Distinct LyP subsets correlate with age-related lineage bias.

a, UMAP plot of LyP-S cells (n = 690 cells) and LyP-B cells (n = 649 cells) in CITE-seq data of BM aspirates of YP and AYA donors combined (n = 9; 2.1–31.3 years, three male and six female donors). b,c, Volcano plot depicting genes (b) and surface markers (c) differentially expressed between LyP-B and LyP-S subclusters, as in a. d, Gene Ontology biological process enrichment of differentially expressed genes between LyP-B and LyP-S cells using clusterProfiler. Dot size indicates the GeneRatio, and dot color represents the adjusted P value. Enrichment significance was assessed using a hypergeometric test with Benjamini–Hochberg false discovery rate correction. e, Volcano plot of differentially active transcription factor regulons between LyP-B and LyP-S cells, inferred using single-cell regulatory network inference and clustering in Python (pySCENIC) and quantified using AUCell scores at the single-cell level. Differential regulon activity was assessed using a two-sided Wilcoxon rank-sum test as implemented in Seurat, with Bonferroni correction for multiple testing. f, Relative proportions of LyP-B and LyP-S subsets within the total LyP population (nonenriched fraction) for each individual (n = 9; Supplementary Table 1). LyP-B (YP BM, median 74.4%; AYA BM, median 26.9%; Wilcoxon rank-sum test P = 0.016). g, Stacked bar plots showing cell cycle phase assignment in LyP-B and LyP-S cells based on the combined enriched and nonenriched fractions for each individual (LyP-B cells in S + G2/M (YP, median 45.2%; AYA, median 24.5%; P = 0.11, Wilcoxon rank-sum test); LyP-S cells in S + G2/M (YP, median 35.3%; AYA, median 41.4%; P = 0.11, Wilcoxon rank-sum test)). h, Correlation plot depicting the Pearson correlation between the relative abundance of LyP-B cells within total LyP cells versus the percentage of B lineage cells (nonenriched) in each BM aspirate. Each dot represents one individual.
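Panel h relates two per-individual percentages by Pearson correlation. A dependency-free sketch, assuming hypothetical values for nine individuals; the P value would come from comparing the t statistic with a t distribution on n − 2 degrees of freedom (for example, via `scipy.stats.pearsonr`), which is omitted here to keep the example stdlib only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, to be compared with t(n - 2)."""
    return r * sqrt((n - 2) / (1 - r**2))


# Hypothetical per-individual values: LyP-B share of all LyP and B lineage share of BM
lyp_b_pct = [80, 74, 70, 78, 30, 25, 27, 20, 35]
b_lineage_pct = [55, 48, 45, 52, 15, 12, 14, 10, 18]
r = pearson_r(lyp_b_pct, b_lineage_pct)
print(round(r, 3), round(t_statistic(r, len(lyp_b_pct)), 2))
```

With only nine individuals, a single extreme donor can drive r, so plotting each individual (as in panel h) is a useful check alongside the coefficient.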

Age-dependent stromal signals drive LyP lineage bias

B lineage cell differentiation is regulated, in part, by cell-extrinsic signals from MSC within the BM niche34,35,36,37,38. To explore whether stromal signaling regulates LyP subsets in an age-dependent manner, we used NicheNet to infer ligand–receptor interactions that differed between LyP-B and LyP-S cells39. We prioritized expressed ligands based on two criteria: their predicted regulatory influence on genes differentially expressed between LyP-B and LyP-S cells (for example, EBF1, ZCCHC7 and GLRX in LyP-B cells and CD37, ATP8B4 and LGALS1 in LyP-S cells) and the differential expression of their cognate receptors between the two subsets. The growth factor IGF2, the lymphoid-supporting cytokine IL-7 and the immunoglobulin superfamily member BTLA emerged as the top candidate ligands for LyP-B cells (Fig. 4a). IL-7 demonstrated the strongest predicted influence on B lineage cell transcription factors, including EBF1, LEF1 and PAX5 (Fig. 4a). By contrast, the pleiotropic cytokine transforming growth factor-β1 (TGFβ1) was identified as the top candidate regulator for LyP-S cells (Fig. 4a). Both ligands showed cross-regulatory potential: IL-7 affected certain LyP-S target genes (for example, LGALS1, RFLNB and MAP3K8), whereas TGFβ1 influenced certain LyP-B genes (for example, HMGB1, SOCS2 and IL7R; Fig. 4a), consistent with their reported reciprocal regulation34,35,37. Receptor analysis of our CITE-seq data showed significantly increased mRNA expression of the IL-7 receptor α-chain (IL-7Rα, encoded by IL7R; fold change (FC): 7.01) and its co-receptor, the common γ-chain (IL2RG; FC: 1.45), in LyP-B cells. This was confirmed at the protein level by increased expression of CD127 (IL-7Rα; Fig. 3c). Conversely, LyP-S cells expressed more TGFBR1 mRNA (FC: 2.26; Fig. 4b). To identify the cellular source of these ligands in BM, we mapped IL7 and TGFB1 expression across all BM cell types in our CITE-seq data (as defined in Fig. 1b). 
Although TGFB1 was broadly expressed across dendritic, myeloid, NK and T cells, IL7 expression was mostly restricted to MSC (Fig. 4c). Both IL7 and TGFB1 showed age-dependent patterns: IL7 mRNA expression was higher in YP MSC than in AYA MSC, whereas TGFB1 mRNA was higher in AYA BM cells than in YP BM cells (Fig. 4d,e). To validate the age-dependent availability of IL-7, we quantified concentrations of 12 cytokines and growth factors, including IL-7, in BM plasma (defined as the cell-free fraction of BM aspirates) using a bead-based multiplex immunoassay in an independent cohort of 78 donors (‘LegendPlex cohort’, 0–20 years; median age of 8 years; 47 male and 31 female donors; Supplementary Table 14). IL-7 concentrations were significantly higher in YP than in AYA BM plasma (2.6 versus 2.0 pg ml−1; Fig. 4f). Other cytokines, such as IL-15, IL-3 or IL-34, showed no age-dependent differences (Fig. 4f). These findings aligned with prior reports identifying BM and thymic stromal cells as the primary sources of IL-7 (ref. 40) and with studies showing elevated concentrations of IL-7 in the blood plasma of young children (0–5 years) relative to AYAs (22–53 years)41.
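NicheNet ranks candidate ligands by how well their prior-model regulatory potential scores predict an observed set of target genes; among the ligand activity metrics it reports is the Pearson correlation between a ligand's regulatory potential across background genes and a 0/1 indicator of target-set membership. The stdlib sketch below reproduces only that scoring step on a toy matrix. The ligand names and regulatory potential values are invented for illustration, and NicheNet's prior model, background definition, other metrics (such as AUPR) and receptor-based prioritization are not reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def ligand_activities(regulatory_potential, target_genes):
    """Rank ligands by the Pearson correlation between their regulatory potential
    over background genes and a 0/1 indicator of target-set membership."""
    activities = {}
    for ligand, potential in regulatory_potential.items():
        background = sorted(potential)
        scores = [potential[g] for g in background]
        in_set = [1.0 if g in target_genes else 0.0 for g in background]
        activities[ligand] = pearson(scores, in_set)
    return dict(sorted(activities.items(), key=lambda kv: kv[1], reverse=True))


# Invented regulatory potentials for two hypothetical ligands (not NicheNet's prior model)
potential = {
    "ligand_A": {"EBF1": 0.9, "PAX5": 0.8, "LEF1": 0.7, "LGALS1": 0.3, "CD37": 0.1, "GAPDH": 0.05},
    "ligand_B": {"EBF1": 0.1, "PAX5": 0.2, "LEF1": 0.1, "LGALS1": 0.8, "CD37": 0.7, "GAPDH": 0.05},
}
lyp_b_targets = {"EBF1", "PAX5", "LEF1"}
print(ligand_activities(potential, lyp_b_targets))
```

A ligand whose predicted targets overlap the observed LyP-B genes scores high; one whose predicted targets sit mainly outside the set scores negative, which mirrors how IL-7 and TGFβ1 separate in Fig. 4a.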

Fig. 4: Age-dependent HSPC–stromal interactions are linked to LyP lineage bias.

a, NicheNet interaction analysis of differential interactions between LyP-B (n = 690 cells) and LyP-S (n = 649 cells) cells in CITE-seq data from BM aspirates of YP and AYA donors combined (n = 9; 2.1–31.3 years, three male, six female donors; Supplementary Table 1), showing the top three predicted ligands underlying the observed upregulation of LyP-B cell genes (top rows) and LyP-S cell genes (bottom rows), prioritized based on regulatory potential and differential receptor expression between the two LyP cell subsets. The genes shown represent the top 30 differentially expressed genes in LyP-B (left) or LyP-S (right) cells predicted to be regulated by these ligands. The regulatory potential of LyP-B cell ligands on LyP-S cell genes (top right quadrant) and vice versa (bottom left quadrant) is also shown. b, Dot plot showing the average expression of IL7R, IL2RG, TGFBR1 and TGFBR2 in LyP-B and LyP-S cell subsets (IL7R (LyP-B versus LyP-S; FC: 7.01, P < 0.001), IL2RG (LyP-B versus LyP-S; FC: 1.45, P = 0.002) and TGFBR1 (LyP-S versus LyP-B; FC: 2.26, P < 0.001; two-sided Wilcoxon rank-sum test with Benjamini–Hochberg false discovery rate correction)). Dot color indicates scaled receptor expression, and dot size represents the percentage of expressing cells. c–e, Dot plot showing the average mRNA expression of IL7 and TGFB1 in all BM cell types from CITE-seq data (c), in all BM cell types combined per individual (d) and in MSC only per individual (e). f, Box plots showing the concentrations of IL-7, IL-3, IL-15 and IL-34 protein measured by multiplex immunoassay (LegendPlex) in the BM of 78 individuals (age range of 0–20 years, median age of 8 years, 47 male and 31 female donors; Supplementary Table 14) with localized solid tumors without BM involvement. Boxes represent the median and the first and third quartiles, and whiskers indicate data points within 1.5 times the interquartile range. P values were calculated using a two-sided Wilcoxon rank-sum test.
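The box plot convention in panel f (median, first and third quartiles, whiskers covering data points within 1.5 times the interquartile range) can be written out explicitly. This stdlib sketch assumes the common Tukey convention that whiskers end at the most extreme observations inside the 1.5 × IQR fences; quartile definitions differ slightly between plotting libraries, and here `statistics.quantiles` with its default "exclusive" method is used. The concentrations are hypothetical.

```python
from statistics import median, quantiles


def box_stats(values, whisker=1.5):
    """Median, quartiles, whisker ends and outliers for a Tukey-style box plot."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical IL-7 concentrations (pg/ml) for one age group
il7 = [1.8, 2.1, 2.4, 2.6, 2.7, 2.9, 3.1, 9.5]
print(box_stats(il7))
```

Points beyond the fences are drawn individually rather than extending the whiskers, so a single high concentration does not stretch the box plot.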

Adult stromal cell atlases have defined various subsets of BM MSC expressing distinct HSPC-supporting factors and identified adipo-MSC and Thy-1+ MSC as the main producers of IL-7 in BM14. When mapped onto this atlas, most MSC in our CITE-seq data were annotated as adipo-MSC (Extended Data Fig. 9a,b). However, the limited number of MSC in these data (n = 588) precluded subset-specific analyses (Extended Data Fig. 9a). In the spatial transcriptomics dataset, which captured 8.4-fold more stromal cells (n = 4,950 versus 588 in CITE-seq; Fig. 5a–c), the stromal cell population in YP samples showed significantly increased frequencies of osteo-MSC (median 12.2% versus 1.4% of total MSC) and reduced frequencies of adipocytes (median 2.0% versus 14.0% of total MSC) compared to AYA samples (Fig. 5c–e). Adipo-MSC frequencies were comparable between YP and AYA samples (median 73.6% versus 66.1% of total MSC; Fig. 5c–e). These findings suggest that the elevated concentrations of IL-7 protein in YP BM might arise from increased per-cell IL-7 production by adipo-MSC or from a broader range of IL-7-producing stromal cells than in AYA BM. In summary, age-dependent expression of IL7 and TGFB1, along with their respective receptors, paralleled the age-dependent dynamics of LyP subsets, supporting a model in which signals from BM stromal cells cooperate with LyP-intrinsic differences to shape lineage bias during human development.
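The stromal comparison works on within-MSC proportions: each subset's cell count divided by the total MSC count of the same biopsy, then summarized by the group median. A stdlib sketch with hypothetical per-donor counts; the mixed-effects logistic regression used for testing (Fig. 5 legend) is not reproduced, and the last line simply rechecks the 8.4-fold figure quoted above (4,950 / 588).

```python
from statistics import median


def within_msc_fractions(counts):
    """Convert one donor's subset counts into percentages of that donor's total MSC."""
    total = sum(counts.values())
    return {subset: 100 * n / total for subset, n in counts.items()}


def group_median(donor_counts, subset):
    """Median percentage of one subset across the donors of a group."""
    return median(within_msc_fractions(c)[subset] for c in donor_counts)


# Hypothetical stromal subset counts, three donors per age group
yp = [
    {"adipo-MSC": 300, "osteo-MSC": 50, "adipocyte": 8},
    {"adipo-MSC": 280, "osteo-MSC": 45, "adipocyte": 10},
    {"adipo-MSC": 320, "osteo-MSC": 40, "adipocyte": 6},
]
aya = [
    {"adipo-MSC": 260, "osteo-MSC": 4, "adipocyte": 50},
    {"adipo-MSC": 240, "osteo-MSC": 5, "adipocyte": 45},
    {"adipo-MSC": 270, "osteo-MSC": 3, "adipocyte": 55},
]
for subset in ("adipo-MSC", "osteo-MSC", "adipocyte"):
    print(subset, round(group_median(yp, subset), 1), round(group_median(aya, subset), 1))
print(round(4950 / 588, 1))  # 8.4
```

Normalizing within each biopsy before taking medians keeps donors with many captured stromal cells from dominating the group summary; the mixed-effects model addresses the same issue formally through the donor random intercept.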

Fig. 5: BM stromal composition differs between YP and AYA BM.

a–c, Xenium spatial transcriptomics-based UMAP of 163,325 cells from BM trephine biopsies of YP (n = 3, age 0.7–1.5 years, two male and one female donor) and AYA donors (n = 3, age 13.5–23 years, all male donors; Supplementary Table 7), showing the identification of 27 clusters, annotated and grouped into 11 major populations: HSPC (3 subsets), B and plasma cells (5 subsets), T and NK cells (3 subsets), myeloid populations (5 subsets), stromal cells (5 subsets), erythroid cells, dendritic cells, megakaryocytes, erythroid cells, sinusoidal endothelial cells and arteriolar endothelial cell or vascular smooth muscle cells, annotated per major population (a) and subcluster (b). c, UMAP of cells as in a showing adipo-MSC (n = 3,571 cells), adipocytes (n = 478 cells), fibro-MSC (n = 38 cells), fibro/osteo-MSC (n = 598 cells), osteo-MSC (n = 265 cells) and osteoclasts (n = 82 cells). Key markers for each subpopulation are available in Supplementary Table 2. d, Representative BM cores from each individual as in a showing the spatial distribution of adipo-MSC, adipocytes, fibro-MSC, fibro/osteo-MSC, osteo-MSC and osteoclasts; scale bar, 300 μm. e, Stacked bar plot showing the relative abundance of adipo-MSC, adipocytes, fibro-MSC, fibro/osteo-MSC, osteo-MSC and osteoclasts in each individual. Osteo-MSC (YP BM median 12.2% of total MSC; AYA BM median of 1.4% of total MSC, mixed-effects logistic regression with donor as random intercept, Benjamini–Hochberg-adjusted P = 3.2 × 10−16), adipocytes (YP BM median 2.0% of total MSC; AYA BM median 14.0% of total MSC; Benjamini–Hochberg-adjusted P = 2.3 × 10−36) and adipo-MSC (YP BM median 73.6% of total MSC; AYA BM median 66.1% of total MSC; Benjamini–Hochberg-adjusted P = 0.60).

Discussion

Here, we compiled a comprehensive multimodal single-cell analysis of healthy pediatric BM, including single-cell transcriptomics, cell surface proteomics and spatial transcriptomics, to map HSPC lineage fate decisions during human development from infancy to young adulthood (0–32 years). We showed that YP BM (<10 years) was biased toward B lineage differentiation, whereas AYA BM was dominated by T cell and myeloid populations. This age-dependent lineage shift was driven by two distinct LyP subsets: CD127+ LyP, transcriptionally primed toward B cell output (LyP-B) and enriched before 10 years of age, and CD127− LyP with T cell and myeloid features, predominating from 13 years of age onward. These changes correlated with age-related differences in BM MSC signaling, characterized by increased IL-7 production in early childhood and increased TGFβ signaling in AYA BM, suggesting that niche-derived signals are critical regulators of HSPC lineage bias across development.

We identified two transcriptionally and phenotypically distinct subpopulations of LyP cells that differed in surface marker expression, lineage differentiation programs and age-dependent prevalence. Our data are consistent with previous flow cytometry-based studies that reported similar heterogeneity within human LyP22,33,42. Up to 57% of multipotent LyP (CD34+CD38−Thy-1neg-loCD45RA+) from human adult BM generate B cell, T cell and myeloid progeny in vitro and in vivo35, reflecting residual myeloid potential. Studies in humanized mice have demonstrated that CD34+CD38loCD45RA+ITGB7+ progenitor cells can be subdivided into CD127− and CD127+ LyP subsets biased toward NK cell–innate lymphoid cell–T cell or B cell lineages, respectively22,33,42. Although CD127− LyP cell differentiation has been linked to FLT3 signaling, expansion of CD127+ LyP cells has been considered largely cell autonomous28. Our data refined this model by identifying niche-derived IL-7 as an HSPC-extrinsic regulator of CD127+ LyP cells, shaping the developmental balance between NK cell–innate lymphoid cell–T cell and B cell lineage output.

Our work highlighted the critical role of BM stromal signaling in driving age-dependent changes in hematopoiesis. Stromal-derived factors, including FLT3L, CXCL12 and IL-7, are established regulators of B cell lymphopoiesis36,37,38. We showed that stromal expression of IL7 progressively declined from birth to young adulthood, coinciding with a shift from B cell lineage to myeloid differentiation in LyP cells. Measurement of IL-7 protein concentrations in BM-derived plasma supported the functional relevance of our transcriptomic data. The mechanisms underlying age-related cytokine production by BM MSC remain incompletely understood. Sex hormones represent one plausible contributor: both testosterone and estrogen suppress IL-7 production, and estrogen inhibits lymphopoiesis in mice, partly by inducing apoptosis of LyP43. Estrogen depletion increases BM IL-7 expression44. Additionally, pregnancy-induced elevations in estrogen and progesterone are associated with reduced IL-7 levels and reduced B cell lymphopoiesis, effects reversible by recombinant IL-7 (ref. 45). Our dataset did not allow identification of the precise age at which the shift from B cell lineage to myeloid bias occurs, nor its relationship with pubertal onset in individual donors. Rather than a discrete event, this shift likely represents a gradual developmental process occurring between ages 10 and 13, with interindividual variability in onset and duration. Alternatively, or in conjunction with sex hormones, developmental changes in MSC composition may contribute to the increased IL-7 production in YP BM. Adult stromal atlases have identified distinct MSC populations, with adipo-MSC and Thy-1+ MSC as the principal sources of BM IL-7 (ref. 14), alongside contributions of epithelial cells and reticular cells. In our CITE-seq data, MSC most closely resembled adipo-MSC, but numbers were insufficient for robust subset-specific analyses. Although our spatial datasets captured larger numbers of MSC, panel design precluded precise identification of IL-7-producing stromal subsets, which might differ between YP individuals and adults. Further studies are needed to define the precise stromal source of BM IL-7 across human development.

Finally, the developmental states observed within the healthy pediatric lymphoid lineage closely resembled the transcriptomic heterogeneity reported in B cell acute lymphoblastic leukemia46. Bulk leukemic samples from young children displayed stronger B lineage transcriptional signatures than those from AYA individuals, which had more multipotent transcriptional profiles46. These age-dependent leukemic states mirrored the age-dependent transcriptomic programs of LyP cells identified here. Because the cell of origin for pediatric B cell acute lymphoblastic leukemia likely emerges years before clinical diagnosis, possibly during fetal development47,48,49, these findings suggest that BM niche-mediated instruction of B cell lineage differentiation is preserved in the malignant setting. Elevated IL-7 in YP BM, which can induce a preleukemic state49,50, may therefore facilitate acquisition of additional oncogenic events. Given the increasing application of lymphoid-targeted therapies and the potential for lineage switch as a mechanism of escape, manipulating lineage-instructive signals from the BM niche may provide a relevant therapeutic strategy.

In conclusion, we provide a comprehensive reference of healthy pediatric BM spanning from birth to young adulthood (0–32 years) and have made our dataset openly available as an online resource accompanying this manuscript (https://doi.org/10.5281/zenodo.14168864)21. The differences between pediatric and adult BM underscore the importance of age-matched references when interpreting disease-associated changes. By enabling such comparisons, our pediatric single-cell map provides a foundation for future research on pediatric blood disorders. In addition, it reveals age-specific regulatory mechanisms that may underlie the timing and susceptibility of hematologic malignancies in childhood and adolescence.

Methods

Human donors

BM samples were obtained from three independent cohorts, each used for distinct analyses. All samples were obtained through the Biobank of the Princess Máxima Center for Pediatric Oncology, Utrecht, the Netherlands. Cohort demographics, including donor numbers, age ranges and sex distributions, are provided in Supplementary Tables 1, 7 and 14. For CITE-seq, BM aspirates from healthy individuals (n = 9, 2.1–31.3 years, three male and six female donors; Supplementary Table 1) were acquired. BM aspirates had been collected bilaterally from the posterior superior iliac crests as part of a stem cell donation to an affected relative. None of the healthy donors received any HSC-mobilizing treatment before donation. Written informed consent was obtained from all donors or legal guardians. Approval was granted by the Institutional Review Board of the Princess Máxima Center (PMCLAB2022.0328).

For Xenium-based spatial transcriptomics, formalin-fixed paraffin-embedded (FFPE) BM trephine biopsies were analyzed from an independent cohort of YP (n = 3, 0.7–1.5 years, two male and one female donor) and AYA donors (n = 3, 13.5–23 years, all male donors). As BM biopsies are invasive and not routinely performed in healthy individuals, samples were obtained from individuals undergoing diagnostic evaluation for suspected BM metastasis of solid tumors. All selected individuals were ultimately diagnosed with localized disease and showed no evidence of BM involvement (Supplementary Table 7). Written informed consent was obtained from all donors or legal guardians. Approval was granted by the Institutional Review Board of the Princess Máxima Center (PMCLAB2022.0351).

For BM cytokine measurements, BM plasma from cancer-free BM aspirates (n = 78, ages 0–20 years, 47 male and 31 female donors; Supplementary Table 14) was acquired. All aspirates were derived from individuals diagnosed with localized solid tumors, including low-grade embryonal rhabdomyosarcoma, Ewing sarcoma, ganglioneuroblastoma, malignant rhabdoid tumor, neuroblastoma or small cell sarcoma, without any evidence of BM involvement. Informed consent was obtained from all donors or legal guardians. Approval was granted by the Institutional Review Board of the Princess Máxima Center (PMCLAB2025.0620).

Sample preparation for CITE-seq

Residual mononuclear cells from BM aspirates, leftover after graft infusion, were isolated by Ficoll (Cytiva Life Sciences) density gradient centrifugation and cryopreserved in liquid nitrogen until further use. For CITE-seq, cryopreserved BM samples were thawed rapidly in a water bath at 37 °C. An equal volume of prewarmed thawing medium (DMEM, high glucose, pyruvate, no glutamine and 20% fetal calf serum) was added to the viably frozen cell suspension in a dropwise fashion. The cell suspension was transferred to a 50-ml Falcon tube and was further diluted (1:10) by dropwise addition of prewarmed thawing medium. Cells were centrifuged at 400g for 5 min at 4 °C. The cell pellet was resuspended in thawing medium containing DNase (100 µg ml−1; Roche) with MgCl2 (10 mM; Merck) and incubated for 30 min at 4 °C. After incubation, cells were centrifuged at 400g for 5 min at 4 °C and resuspended in cell staining buffer (CSB; Biolegend). The number of cells in the suspension was determined using the Countess II cell counter (Invitrogen).

Cell sorting and multiplexing for CITE-seq

Before library preparation, each individual’s sample was enriched for HSPC and MSC, which were multiplexed with the nonenriched cell fraction of another genetically distinct individual. For this, FcR blocking reagent (Human TruStain FcX, BioLegend) was added to the cells at a 1:10 dilution and incubated on ice for 5 min. Cells were then incubated with Zombie NIR viability dye (BioLegend) on ice in the dark for 15 min. After incubation, the cell suspension was washed by adding CSB and centrifuging at 400g at 4 °C, followed by resuspension. Next, the cells were incubated on ice in the dark for 30 min with a customized mix of fluorophore-conjugated antibodies and oligonucleotide-conjugated antibodies (Supplementary Tables 15 and 16). Following incubation, the cells were washed three times with CSB and centrifuged at 400g at 4 °C after each wash. The cell suspension was then filtered through a 35-µm filter and sorted using a Sony SH800S cell sorter (SONY SH800S system software v2.1), with a uniform gating strategy for all samples (Supplementary Fig. 1). In general, we sorted 2.5 × 10⁴ nonenriched, erythrocyte precursor-depleted cells (Zombie NIR−CD235a−) from one individual and combined them with 1 × 10⁴ HSPC (Zombie NIR−CD235a−CD45CD34+) and up to 2 × 10³ MSC (Zombie NIR−CD235a−CD45loCD34−CD271+ or Zombie NIR−CD235a−CD45loCD34−CD90+) from another genetically distinct individual. During subsequent data analysis (described below), cells were demultiplexed and assigned to their original sample based on SNVs specific to each individual. After mixing, the resulting cell suspensions were counted using Trypan Blue and a Bürker counting chamber.
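Because each multiplexed pool combines cells from two donors at fixed target numbers, the expected contribution of each donor can be worked out directly. The Python sketch below does this arithmetic, assuming the target counts are reached exactly (in practice MSC numbers are only an upper bound); the function name `pool_composition` is hypothetical.

```python
def pool_composition(nonenriched: int, hspc: int, msc: int) -> dict:
    """Return the share of the pooled suspension contributed by each sorted population."""
    total = nonenriched + hspc + msc
    return {
        "nonenriched (donor A)": nonenriched / total,
        "HSPC (donor B)": hspc / total,
        "MSC (donor B)": msc / total,
    }


# Target numbers from the text; MSC counts are an upper bound ("up to 2 x 10^3").
composition = pool_composition(nonenriched=25_000, hspc=10_000, msc=2_000)
for population, fraction in composition.items():
    print(f"{population}: {fraction:.1%}")
```

With these targets, roughly two thirds of the pooled cells come from the nonenriched donor and about one third from the HSPC/MSC-enriched donor, which is why genotype-based demultiplexing is required downstream.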

Library preparation and sequencing

Approximately 40,000 cells per multiplexed sample were loaded onto a Chromium Single Cell G chip and used for library preparation using a Chromium Next GEM Single Cell 3′ Library and Gel Bead Kit v3.1 (10x Genomics) according to the manufacturer’s instructions. For each multiplexed sample, two libraries were prepared, one for the RNA and one for the antibody capture modality. Each library was sequenced on a NovaSeq 6000 (Illumina) with the following numbers of cycles: read 1: 28; read i7: 10; read i5: 10; read 2: 91.

Xenium experiments

In collaboration with the Department of Pathology, representative regions from each BM biopsy (n = 6 biopsies, one to three regions per biopsy; Supplementary Table 7) were identified and incorporated into a tissue microarray (TMA), comprising one to three BM cores per individual (1.5 mm in diameter each). A 5-µm TMA section was prepared for spatial transcriptomics according to the Demonstrated Protocols Xenium In Situ for FFPE Tissue Preparation Guide (CG000578, 10x Genomics) and Xenium In Situ for FFPE Tissues Deparaffinization and Decrosslinking (CG000580, 10x Genomics). The Human Multi-Tissue and Cancer Panel probe set was supplemented with probes targeting 100 custom genes (Supplementary Table 8), and Cell Segmentation Reagents were included. Probe hybridization, ligation and rolling circle amplification were performed by the Leiden Genome Technology Center at the Leiden University Medical Center, following the manufacturer’s protocol (CG000582 Rev E, 10x Genomics).

CITE-seq data preprocessing

CITE-seq data were processed using CellRanger count with feature barcoding (version 7.1.0, 10x Genomics) using the refdata-gex-GRCh38-2020-A transcriptome and a modified Feature Reference file (Supplementary Table 14).

CITE-seq genotype demultiplexing and barcode filtering

Cells from multiplexed samples were genotype-demultiplexed on the basis of SNVs using souporcell (singularity image created 1 December 2021)51. Barcodes with fewer than 1,500 transcripts and/or a percentage of mitochondrial transcripts above 10% were removed. In addition, barcodes classified as doublets or unassigned by souporcell51 and barcodes classified as doublets in more than five of ten runs of scDblFinder (version 1.18.0)52 were discarded.
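The barcode filters above can be expressed as a single predicate. This is a minimal Python sketch, not the authors’ code: the `Barcode` record and its field names are hypothetical, and souporcell status values are assumed to be "singlet", "doublet" or "unassigned".

```python
from dataclasses import dataclass


@dataclass
class Barcode:
    """Hypothetical per-barcode QC summary; field names are illustrative only."""
    barcode: str
    n_transcripts: int        # total UMI count for the barcode
    pct_mito: float           # percentage of UMIs from mitochondrial genes
    souporcell_status: str    # assumed values: "singlet", "doublet", "unassigned"
    scdbl_doublet_runs: int   # scDblFinder runs (out of 10) calling this barcode a doublet


def keep_barcode(b: Barcode, min_transcripts: int = 1500,
                 max_pct_mito: float = 10.0, max_doublet_runs: int = 5) -> bool:
    """Return True if the barcode passes all filters described in the text."""
    if b.n_transcripts < min_transcripts:          # fewer than 1,500 transcripts
        return False
    if b.pct_mito > max_pct_mito:                  # mitochondrial fraction above 10%
        return False
    if b.souporcell_status != "singlet":           # souporcell doublet or unassigned
        return False
    if b.scdbl_doublet_runs > max_doublet_runs:    # doublet in more than 5 of 10 runs
        return False
    return True


example = [
    Barcode("AAACCTG", 4200, 3.1, "singlet", 1),
    Barcode("AAACGGG", 900, 2.0, "singlet", 0),
    Barcode("AAAGATG", 5100, 12.5, "singlet", 0),
    Barcode("AACTCTT", 6000, 4.0, "doublet", 0),
    Barcode("AAGCCGC", 7000, 5.0, "singlet", 6),
]
kept = [b.barcode for b in example if keep_barcode(b)]
```

Only the first toy barcode survives; each of the others fails exactly one rule, which makes the thresholds easy to check individually.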

CITE-seq normalization, dimensional reduction, feature deconfounding and integration

Further processing and analyses were performed in R (version 4.4.0) using Seurat (version 5.1.0)20. For each donor, gene expression data were normalized with SCTransform (v2, number of variable features = 3,000). Dimensional reduction was conducted with the RunPCA function from Seurat. SCT integration features were calculated using the SelectSCTIntegrationFeatures function from Seurat. Using gene lists from the SCutils package (version 1.123), the following genes were filtered out from the SCT integration features, provided they were also found as variable features: genes specific to the S or G2M cell cycle phase, donor-specific genes correlating with the S or G2M phase, male- and female-specific genes, stress-related genes and ribosomal protein genes. Gene expression data were integrated across donors with canonical correlation analysis (CCA), using the IntegrateLayers function in Seurat and the filtered SCT integration features. Antibody capture data were normalized per library using DSB antibody normalization (version 1.0.3)20. For each donor, FindVariableFeatures and ScaleData were run, and dimensional reduction was performed with the RunPCA function from Seurat. Antibody data were also integrated with CCA using the IntegrateLayers function in Seurat, with all antibodies as integration features.
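The deconfounding step removes a gene from the integration features only if it appears in one of the confounder lists and is also a variable feature. The Python sketch below illustrates that set logic; the gene lists are small hypothetical examples, not the SCutils lists used in the analysis.

```python
def deconfound_features(integration_features, variable_features, confounder_lists):
    """Drop confounder genes from the integration features.

    A gene is removed only if it is in at least one confounder list AND is a
    variable feature; the order of the remaining features is preserved.
    """
    confounders = set().union(*confounder_lists.values())
    variable = set(variable_features)
    return [g for g in integration_features if not (g in confounders and g in variable)]


# Hypothetical example lists (not the SCutils gene lists).
confounder_lists = {
    "cell_cycle": {"MKI67", "TOP2A"},
    "sex": {"XIST", "RPS4Y1"},
    "stress": {"FOS"},
    "ribosomal": {"RPL13"},
}
integration = ["CD34", "MKI67", "XIST", "RPS4Y1", "FOS", "RPL13", "CD79A", "TOP2A"]
variable_genes = ["CD34", "MKI67", "XIST", "RPS4Y1", "FOS", "RPL13", "CD79A"]
filtered = deconfound_features(integration, variable_genes, confounder_lists)
```

In this toy example TOP2A is retained only because it is absent from the variable features, which is the case the "provided they were also found as variable features" condition covers.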

CITE-seq visualization, clustering and cell-type annotation

A WNN graph was created with the FindMultiModalNeighbors function in Seurat, using the integrated reductions of both RNA and ADT modalities. A wnnUMAP was created from this WNN graph with RunUMAP, using 30 principal components for both modalities. Clustering was performed with FindClusters on the wsnn graph, using the SLM modularity optimization algorithm (as recommended in the Seurat Weighted Nearest Neighbor Analysis vignette) and a resolution of 0.4.

Cell-type annotation was performed by combining three complementary annotation approaches, using log-normalized gene expression data. First, cell types were inferred with SingleR (version 2.6.0)53. Second, individual cells were mapped to two external reference BM datasets11,13. Mapping to the CITE-seq dataset bmcite (version 0.3.0) from SeuratData (version 0.2.2.9001) was performed as recommended by the Seurat multimodal reference mapping vignette18. The Gene Expression BM dataset in DISCO19 was downsampled to a maximum of 3,000 cells per cell type and SCTransform normalized, followed by principal component analysis and UMAP, which were in turn used for reference mapping according to the Seurat Mapping and annotating query datasets vignette. Third, antibody capture data and RNA expression data from known marker genes were used to confirm cell-type annotations.

For in-depth analysis of the erythroid lineage, myeloid lineage and T and NK cells, the respective clusters were subsetted, followed by subset-specific SCT normalization, dimensional reduction, feature deconfounding, integration and clustering at resolutions of 0.7, 0.5 and 0.3, respectively. Cell-type annotations were then redefined as described above. The resulting Seurat object provided cell-type annotations at multiple levels of resolution, ranging from broad categories (major groups, for example, T cells) to more detailed classifications (high-resolution clusters, for example, γδT cells).

CITE-seq subclustering of LyP cells

For in-depth analysis of LyP cells, cells were subsetted and analyzed as described above, with the following deviations: for clustering, a resolution of 0.1 was used; CCA integration between donors was not performed.

CITE-seq differential gene and protein expression

To compare each major group and high-resolution cell type against all other cell types, we used the wilcoxauc function in presto (version 1.0.0) to find cell-type-specific, significantly differentially expressed genes and proteins. We used the following thresholds: p-adj < 0.01, pct_in – pct_out ≥ 0, pct_in > 20, auc > 0.5 and logFC > 0. For each cell type, the FindMarkers function (test.use = ‘wilcox’, logfc.threshold = 0, min.pct = 0.01) was used to identify genes and proteins with significant differential expression between YP and AYA cells (thresholds: p-adj < 0.01 and FC > 1.5 and pct.1 > pct.2 or p-adj < 0.01 and FC < –1.5 and pct.1 < pct.2). From these cell-type-specific young and old gene lists, we computed module scores using Seurat’s AddModuleScore function and assessed scores per cell. Module scores were visualized as mean module scores per cluster using the ggradar package (version 0.2), capping negative average scores at 0. For LyP cell subclusters, the FindMarkers function (test.use = ‘wilcox’, logfc.threshold = 0, min.pct = 0.01) was used to identify significantly differentially expressed genes and proteins between clusters 0 and 1 (thresholds: p-adj < 0.01 and FC > 2 and pct.1 > pct.2 or p-adj < 0.01 and FC < –2 and pct.1 < pct.2). For all differential gene expression analyses, male- and female-specific genes were removed. For comparisons of major groups and high-resolution clusters, confounder genes mentioned above were also removed. Gene Ontology enrichment analysis was performed using the enrichGO function from the clusterProfiler (version 4.12.0) package, using biological process terms.
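The significance and specificity thresholds above amount to simple boolean rules on each row of the test output. A hedged Python sketch follows; the function names are hypothetical, and the fold-change cutoffs are applied exactly as written, because the text does not state whether FC refers to a linear or a log2 fold change. For the YP versus AYA comparison, "up" is assumed to mean higher in YP (ident.1 = YP).

```python
def is_cell_type_marker(padj, pct_in, pct_out, auc, logfc):
    """Marker rule for presto wilcoxauc rows (pct_in and pct_out are percentages)."""
    return (padj < 0.01 and pct_in - pct_out >= 0 and pct_in > 20
            and auc > 0.5 and logfc > 0)


def classify_de(padj, fc, pct1, pct2, cutoff=1.5):
    """Classify a FindMarkers row as 'up' (higher in ident.1), 'down' or None.

    Use cutoff=1.5 for YP versus AYA and cutoff=2 for LyP cluster 0 versus 1.
    """
    if padj >= 0.01:
        return None
    if fc > cutoff and pct1 > pct2:
        return "up"
    if fc < -cutoff and pct1 < pct2:
        return "down"
    return None
```

Note that the detection-rate condition (pct.1 versus pct.2) must agree in direction with the fold change; a large fold change driven by a few high-expressing cells in the less frequently expressing group is rejected.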

Transcription factor activity analysis in CITE-seq data

Inference of transcription factor network activity was performed using pySCENIC (image version aertslab-pyscenic-0.11.2.sif)54. For this, we used a loom file with raw transcript counts of a downsampled dataset as input, along with the ‘hs_hgnc_tfs.txt’ transcription factor list, the ‘motifs-v9-nr.hgnc-m0.001-o0.0.tbl’ motifs and the ‘hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather’ input databases. Only transcription factors that were identified in two or more of three independent runs were analyzed. Activity per cell was calculated as the mean AUCell values across runs. For LyP cell subclusters, FindMarkers (test.use = ‘wilcox’, logfc.threshold = 0, min.pct = 0.01) was used to identify significant differential regulon activity between clusters 0 and 1 (thresholds: p-adj < 0.01 and FC > 1).
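The consensus rule for repeated pySCENIC runs (keep regulons found in at least two of three runs, then average AUCell per cell) can be sketched in plain Python. The data structures and regulon names below are hypothetical, and the mean is assumed to be taken only over the runs in which a regulon was detected.

```python
from collections import Counter
from statistics import mean


def consensus_regulons(runs, min_runs=2):
    """Keep regulons found in >= min_runs runs and average their AUCell values per cell.

    runs: list of dicts, one per pySCENIC run, mapping regulon -> {cell: AUCell}.
    """
    counts = Counter(regulon for run in runs for regulon in run)
    consensus = {}
    for regulon, n_runs in counts.items():
        if n_runs < min_runs:
            continue  # regulon not reproducible across runs
        scored = [run[regulon] for run in runs if regulon in run]
        cells = set().union(*scored)
        consensus[regulon] = {
            cell: mean(s[cell] for s in scored if cell in s) for cell in cells
        }
    return consensus


# Toy AUCell values for three runs; regulon names are placeholders.
example_runs = [
    {"TF_A(+)": {"c1": 0.2, "c2": 0.4}, "TF_B(+)": {"c1": 0.1, "c2": 0.0}},
    {"TF_A(+)": {"c1": 0.4, "c2": 0.2}, "TF_B(+)": {"c1": 0.3, "c2": 0.2}},
    {"TF_A(+)": {"c1": 0.3, "c2": 0.3}, "TF_C(+)": {"c1": 0.9, "c2": 0.9}},
]
result = consensus_regulons(example_runs)
```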

Cell interaction analysis in CITE-seq data

Interaction analysis and ligand prioritization were performed using NicheNet (nichenetr package, version 2.2.0)39. Ligand activity analysis was performed on genes upregulated in LyP-B or LyP-S cells (identified using the FindMarkers function as described above), using the top n = 5,000 downstream targets for each ligand and a quantile cutoff of 0.001. Ligands were prioritized by equally weighting the following two criteria: (1) the predicted ligand activity and (2) LyP subset-specific receptor expression.
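To illustrate what equal weighting of two prioritization criteria means, the sketch below min-max scales each criterion across ligands and ranks ligands by their weighted mean. This is an illustrative scheme under stated assumptions, not the nichenetr implementation; the function name, ligand names and scores are invented toy inputs.

```python
def prioritize_ligands(activity, receptor_specificity, w_activity=1.0, w_receptor=1.0):
    """Rank ligands by a weighted mean of two min-max-scaled criteria (equal weights by default)."""

    def minmax(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        return {k: (v - lo) / span for k, v in scores.items()}

    ligands = activity.keys() & receptor_specificity.keys()
    a = minmax({lig: activity[lig] for lig in ligands})
    r = minmax({lig: receptor_specificity[lig] for lig in ligands})
    total_weight = w_activity + w_receptor
    combined = {lig: (w_activity * a[lig] + w_receptor * r[lig]) / total_weight
                for lig in ligands}
    return sorted(combined, key=combined.get, reverse=True)


# Toy inputs: ligand names and scores are invented for illustration.
toy_activity = {"ligand_A": 0.12, "ligand_B": 0.10, "ligand_C": 0.05}
toy_receptor = {"ligand_A": 2.0, "ligand_B": 0.5, "ligand_C": 1.0}
ranking = prioritize_ligands(toy_activity, toy_receptor)
```

Scaling each criterion to [0, 1] before averaging keeps one criterion from dominating simply because it is measured on a larger numeric range.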

Xenium data preprocessing, filtering, normalization and dimensional reduction

Imaging and signal decoding were performed using the Xenium on-board analysis pipeline (10x Genomics, version xenium-3.1.1.0). The Xenium Ranger relabel function was applied after removing the FLT3LG feature, owing to the lack of specificity of this custom probe. Cell segmentation was performed using a probabilistic segmentation method (ProSeg) to infer refined cell boundaries55. Regions of interest corresponding to the individual cores were selected in Xenium Explorer (version 3). For the selected cells, metadata and gene expression matrices were imported into scanpy (version 1.11.3)56 for downstream analyses. Quality control metrics were computed for each cell, and gene counts were normalized to total cell volume. Cells with a summed log1p-scaled gene count greater than 3.5 were retained for downstream analyses. A variational autoencoder model, ResolVI (scvi-tools, version 1.3.3), was applied, treating all BM cores as one batch (model parameters: n_hidden = 32, n_latent = 10, n_layers = 2, dropout_rate = 0.05)57, to obtain a ten-dimensional latent representation, which was used to construct a UMAP.
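Under one reading of the quality filter, in which the cutoff applies to log1p of a cell’s total count, the threshold of 3.5 corresponds to more than e^3.5 − 1 ≈ 32.1 counts per cell. The sketch below shows that conversion together with an illustrative volume normalization; the helper names and the reference-volume scaling are assumptions, not the pipeline’s actual code.

```python
import math

LOG1P_CUTOFF = 3.5  # threshold reported in the text


def normalize_to_volume(gene_counts, cell_volume, reference_volume):
    """Illustrative only: scale a cell's counts by reference_volume / cell_volume."""
    factor = reference_volume / cell_volume
    return [count * factor for count in gene_counts]


def passes_qc(gene_counts, cutoff=LOG1P_CUTOFF):
    """Assumed reading: keep a cell if log1p(total count) exceeds the cutoff."""
    return math.log1p(sum(gene_counts)) > cutoff


# Under this reading, the cutoff corresponds to more than e**3.5 - 1 counts per cell.
min_total = math.expm1(LOG1P_CUTOFF)
print(f"minimum total count: > {min_total:.1f}")
```

If the threshold were instead applied to the sum of per-gene log1p values, the equivalent raw-count cutoff would depend on how counts are spread across genes, so the conversion above holds only for the stated reading.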

Xenium visualization, clustering and cell-type annotation

Leiden clustering (resolution 2.4, ten iterations) was performed on the ResolVI latent space. For cell-type annotation, we used reference mapping with Tangram58 against the adult BM atlas as published by Bandyopadhyay et al.14 and Zeng et al.59 and CITE-seq data from this manuscript. Cells were then subdivided into major groups: HSPC, T/NK cells, megakaryocyte lineage, B cell lineage, myeloid lineage, erythroid lineage and nonhematopoietic clusters. These subsets were reclustered at resolutions of 0.3, 0.5 and 1.0. Final annotations were assigned by integrating reference-based predictions with known marker gene expression, yielding both broad and fine-grained cell-type labels.

Comparison of LyP cell subtypes in CITE-seq data and a public dataset of CD34+ cells

A public scRNA-seq dataset of CD34+ cells isolated from BM aspirates of 15 donors (ages 2–77 years; 8 male and 7 female donors)32 was downloaded from the Gene Expression Omnibus (GEO) under accession number GSE189161. Transcripts in all cells were log-normalized per sample (orig.ident) to a total of 10,000 transcripts. LyP cells with an LyP score > 0 and more than 1,500 transcripts were retained for downstream analyses. Module scores were constructed from differentially expressed genes in the LyP, LyP-B and LyP-S populations from the CITE-seq data in this manuscript, excluding sex- and cell cycle-related genes. The relative fractions of LyP-B cells (LyP-B score > 0 and LyP-S score ≤ 0), LyP-S cells (LyP-S score > 0 and LyP-B score ≤ 0) and double-negative LyP cells (LyP-B score ≤ 0 and LyP-S score ≤ 0) were compared.
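The three-way classification of retained LyP cells follows directly from the signs of the two module scores. The Python sketch below implements those rules with hypothetical function names. How cells scoring above zero for both signatures were handled is not stated, so they are placed in a separate bin here, and fractions use all retained LyP cells as the denominator; both choices are assumptions.

```python
from collections import Counter


def retain_lyp(lyp_score, n_transcripts):
    """Retention rule from the text: LyP score > 0 and more than 1,500 transcripts."""
    return lyp_score > 0 and n_transcripts > 1500


def classify_lyp(lyp_b_score, lyp_s_score):
    """Assign a retained LyP cell to LyP-B, LyP-S or double-negative LyP."""
    if lyp_b_score > 0 and lyp_s_score <= 0:
        return "LyP-B"
    if lyp_s_score > 0 and lyp_b_score <= 0:
        return "LyP-S"
    if lyp_b_score <= 0 and lyp_s_score <= 0:
        return "double-negative LyP"
    return "double-positive (not compared)"  # assumption: kept as a separate bin


def relative_fractions(score_pairs):
    """Fraction of retained LyP cells per class (denominator: all retained cells)."""
    labels = Counter(classify_lyp(b, s) for b, s in score_pairs)
    total = sum(labels.values())
    return {label: n / total for label, n in labels.items()}


fractions = relative_fractions([(0.3, -0.1), (-0.2, 0.4), (-0.1, -0.2), (0.2, 0.1)])
```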

Cytokine measurements in BM plasma

BM plasma (the cell-free fraction of BM aspirates) was collected by Ficoll (Cytiva Life Sciences) density gradient centrifugation of EDTA-anticoagulated BM aspirates (n = 78 samples; Supplementary Table 14). The resulting plasma was stored at –80 °C until further analysis. IL-7 and IL-15 were measured using the LEGENDplex Human Hematopoietic Stem Cell Panel (Biolegend). Samples were processed, and cytokine levels were measured according to the manufacturer’s instructions, with the sole deviation being an extended overnight incubation during the initial incubation step to optimize cytokine detection. Cytokine concentrations were analyzed in R and visualized using the ggplot2 (version 3.5.1) and ggbeeswarm (version 0.7.2) packages. Cytokine concentrations lower than the detection limit (IL-15, <271.56 pg ml−1; IL-7, <0.317 pg ml−1) were assigned half the value of the detection limit. Outliers were identified and removed.
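The below-detection-limit rule (substituting half the limit of detection) is sketched below with the limits given in the text. Treating a missing value (`None`) as a non-detect is an assumption, and the function name is hypothetical.

```python
DETECTION_LIMITS_PG_ML = {"IL-15": 271.56, "IL-7": 0.317}  # limits from the text


def substitute_below_lod(concentration, analyte):
    """Replace values below the detection limit with half the limit (LOD/2).

    `concentration` is in pg/ml; None is treated as a non-detect (an assumption).
    """
    lod = DETECTION_LIMITS_PG_ML[analyte]
    if concentration is None or concentration < lod:
        return lod / 2
    return concentration
```

LOD/2 substitution keeps non-detects in rank-based comparisons such as the Wilcoxon rank-sum test, where they tie at the lowest rank rather than being dropped.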

Statistics and reproducibility

Dimensional reduction of BM composition data was performed using principal component analysis and the prcomp function from the stats package (version 4.4.0). The relative abundance of cell types, LyP-B and LyP-S cell subsets and cycling versus noncycling LyP between YP BM and AYA BM was compared using a two-sided Wilcoxon rank-sum test. Correlations between B cell and LyP-B cell frequencies were calculated using the stat_correlation function (method = ‘pearson’) from the ggpmisc package (version 0.6.0). To allow reliable comparisons of rare cell types (cycling versus noncycling LyP and MSC subsets), the enriched and nonenriched fractions were pooled. For all other comparisons, the nonenriched fractions were used. For comparison of spatial cell-type composition, mixed-effects logistic regression models were used (lme4 package, version 1.1-35.5) with donor as a random effect to account for multiple cores per individual; P values were adjusted for multiple testing using the Benjamini–Hochberg false discovery rate method. YP and AYA sample cytokine concentrations were compared using a Wilcoxon rank-sum test. No statistical methods were used to predetermine sample sizes, but our sample sizes are similar to those reported in previous publications60.
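The mixed-effects models were fitted in R with lme4 and are not reproduced here; the Benjamini–Hochberg step applied to their P values can, however, be sketched in a few lines of Python. This follows the standard step-up procedure, which gives the same output as R’s `p.adjust(method = "BH")`; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, raw P values of 0.01, 0.04, 0.03 and 0.20 become 0.04, 0.053, 0.053 and 0.20: the running minimum pulls the third-ranked value (0.04 × 4/3) down onto the second-ranked one (0.03 × 4/2 = 0.06 would otherwise exceed it).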

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.