Introduction

The maturation and selection of T cells are mediated by the thymus, which is crucial for the development of adaptive immunity and central tolerance1. T cell progenitors from the fetal liver or bone marrow interact with different thymic microenvironments, giving rise to distinct T cells2. Histologically, the human thymus can be divided into the cortex, medulla, and septa3. In the cortex, cortical thymic epithelial cells (cTECs) support T lineage commitment and subsequent positive selection. Surviving thymocytes then enter the medulla, where a diverse pool of antigen-presenting cells (APCs) is present, including medullary thymic epithelial cells (mTECs) and dendritic cells (DCs). Those medullary APCs, which express a wide range of peripheral tissue antigens (PTAs), facilitate negative selection to eliminate overreactive thymocytes. Interestingly, studies have identified TECs expressing extra-thymic lineage transcription factors (TFs) and proposed that these TEC mimetics (i.e. to mimic peripheral counterparts) play an important role in central tolerance via compartmentalised self-antigen expression4.

The existence of many different thymic cell types and their complex interactions remains a considerable challenge for understanding T cell development. To date, a fine dissection of various thymic cell types has been performed by different studies5,6,7,8 using scRNA-seq transcriptome profiling and recently using spatial data9,10. These efforts have generated an overview of thymic cellular composition. Due to the comparatively lower resolution of spatial data, these studies have performed the projection of scRNA-seq data onto the spatial tissues to infer the location of the cell types11,12. However, a high-resolution spatial atlas is needed for the discovery of rare cell types directly on spatial tissue using cell type marker genes to understand the nuances of T cell maturation and specification processes.

In this study, we employed Stereo-seq (ref. 13) spatial transcriptomics to create a high-resolution spatial atlas of the human fetal and paediatric thymi. Stereo-seq is a high-resolution spatial transcriptomics technology that captures genome-wide gene expression directly from intact tissue sections using dense DNA nanoball arrays and in situ RNA capture. Complementing this approach, we used MACSima (ref. 14), a multiplexed immunofluorescence imaging platform, to profile selected protein markers at cellular resolution. Further, we employed Stereo-CITE, which combines spatial transcriptomic and proteomic profiling of the same tissue. Together, these complementary platforms provide a comprehensive spatial omics framework for studying thymic architecture, cell-type locations, and cellular communication.

Spatial profiling enabled us to segregate known histological regions directly from the expression of marker genes of different cell types. Furthermore, we resolved each region into sub-regional niches and examined the cellular composition and active signalling pathways within these niches. Recently, several studies have profiled the different types of rare TEC mimetics15,16,17. It has also been proposed that mTECs utilise lineage-defining transcription factors (TFs) to promote their differentiation into diverse subtypes with distinct molecular signatures, for instance, different stages of muscle mimetic TECs18. Here, we employ high-resolution spatial profiling to map rare mimetic TEC cell types and uncover their core transcriptional features.

In this study, we generate a high-resolution spatial atlas of the human fetal and paediatric thymi using Stereo-seq spatial profiling. This atlas resolves canonical histological regions into fine-grained cellular niches and reveals the spatial organisation of different known and rare thymic cell types. These findings provide a detailed map of thymic microenvironments and their signalling landscapes. We systematically identify TFs expressed by mimetic TECs using spatial transcriptomics and proteomics, characterising their potential tissue antigen-presenting function. More broadly, our human thymus atlas enables deeper investigation of T-cell maturation dynamics and serves as a resource for future studies on thymic development, central tolerance and immune dysregulation.

Results

High-resolution spatial transcriptomic atlas of human thymus

To build a spatial transcriptomic atlas for the human fetal and paediatric thymi, we collected thymus from the early fetal (F) stages of development, including 13 weeks (w), 14w, 17w and 18w post conception. We also collected paediatric (P) thymic samples comprising the ages of 7w, 2 years (y), 5 y and 6 y old (Fig. 1a and Supplementary Fig. 1a). The thymus comprises two lobes, and each lobe comprises sub-structures called lobules3. Histologically, the thymus can be further divided briefly into cortex and medulla (Supplementary Fig. 1b). We spatially and transcriptionally profiled one lobe of the human thymus using Stereo-seq and scRNA-seq techniques (Fig. 1a, b and Supplementary Data 1). We optimised our tissue processing and analytical pipeline using mouse thymus samples from 1- and 3-month-old mice (Supplementary Fig. 1c, d). The Stereo-seq methodology is based on DNA nanoball technology, and in this study, we use bin 50, where the spatial spots are 25 μm in diameter. The number of spots represents the lobe size across the different fetal ages, where 13w has 2868 spots, 14w has 28,800 spots, 17w has 65,438 spots, and 18w has 65,730 spots (Supplementary Data 1 and Fig. 1c). In addition, we also observed that the number of lobule structures increases, indicating the growing complexity of the thymus organ during fetal development. The paediatric thymus samples are usually 5 cm or longer in length (Supplementary Fig. 1a). Therefore, we spatially profiled a portion of the thymic lobe. For paediatric samples ages 2 y, 5 y and 6 y, the 1 × 1 cm chip was used. The number of spatial spots sequenced for 2 y was 106,933, 5 y was 67,617 and 6 y was 82,716 (Fig. 1c). For the paediatric 7w sample, we used a larger chip (2 × 3 cm) to capture the spatial delineation of a larger thymic area with 297,572 spots (Fig. 1b, c). We noticed that using a bigger chip for larger tissues enables better spatial profiling as it facilitates the capture of intact thymus morphology.
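The relation between bin size, spot width and tissue area covered can be checked with a short calculation. The sketch below is illustrative only: the 0.5 μm capture-site spacing is an assumption inferred from the statement that bin 50 corresponds to 25 μm, and the function names are hypothetical.

```python
# Assumed centre-to-centre spacing of capture sites, inferred from
# "bin 50 = 25 um" in the text; not taken from the Methods.
PITCH_UM = 0.5

# Spot counts per section at bin 50, as reported in the text.
SPOTS = {
    "F:13w": 2868, "F:14w": 28800, "F:17w": 65438, "F:18w": 65730,
    "P:7w": 297572, "P:2y": 106933, "P:5y": 67617, "P:6y": 82716,
}

def bin_width_um(bin_size, pitch_um=PITCH_UM):
    """Side length (um) of a square bin spanning bin_size x bin_size sites."""
    return bin_size * pitch_um

def approx_area_mm2(n_spots, bin_size=50):
    """Approximate tissue area covered by n_spots square bins, in mm^2."""
    side_mm = bin_width_um(bin_size) / 1000.0
    return n_spots * side_mm ** 2

def total_spots(spots):
    """Total number of spots across all sections."""
    return sum(spots.values())

for sample, n in SPOTS.items():
    print(sample, n, round(approx_area_mm2(n), 1), "mm^2")
print("total spots:", total_spots(SPOTS))
```

Under these assumptions the 2 y section covers roughly 67 mm² of a 1 × 1 cm chip and the 7w section roughly 186 mm² of a 2 × 3 cm chip, which is in keeping with the chip sizes stated above.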

Fig. 1: High-resolution spatial transcriptomics atlas of human fetal and paediatric thymus.

a Schematic of the overall workflow for profiling the human thymus in this study. The thymus tissue samples were collected from human fetal (13w, 14w, 17w, 18w) and paediatric (7w, 2 y, 5 y, 6 y) ages. Spatial transcriptome and single-cell profiling of thymus tissue sections were performed. Spatially inferred clusters were determined based on the transcription profile of the spatial spots. The thymus sections across developmental ages were integrated using PCA on the pseudobulk gene expression profile of the spatial clusters, and then K-means clustering was used to determine the groups. The major thymus domains were defined by the expression of key marker genes. Created in BioRender. Chen, Y. (2026) https://BioRender.com/wx66ksu. b Spatial visualisation of the number of genes detected with Stereo-seq spatial transcriptomics at bin 50 spot resolution or 25 μm diameter spot across fetal and paediatric tissues. c Total number of spots that were sequenced in the thymus tissue section across ages. d The number of unique genes detected across the spots in c of the tissue section at the developmental ages. F fetal, P paediatric.

Next, we determined the transcriptional depth of the samples and found an average of around 900 genes detected in 13w, 700 in 14w, 1100 in 17w and 500 in 18w across the fetal tissues and around 300 genes detected in 7w, 700 in 2 y, 800 in 5 y and 900 in 6 y in paediatric samples (Fig. 1d). However, across all these samples, the sequencing depth enabled the detection of marker genes for various thymic cell types (Fig. 2a and Supplementary Fig. 1e). Thymocyte markers such as CD4 and CD8A were detected throughout the tissue, as expected. Tissue compartmentalisation was revealed by comparing PRSS16 (a cTEC marker) and AIRE (a mTEC marker). PRSS16 was localised mainly in the outer cortical compartment, while AIRE was primarily detected in the inner medullary region. The CD34 expression pattern was concentrated in the septa regions, which mark the precursors of hematopoietic stem cells and early thymocytes, such as double-negative T cells19. Besides, different immunological markers, including MS4A1, CD83 and CD14, were detected, suggesting the presence of other thymic cell types such as B cells, DCs and macrophages20. Indeed, co-immunostaining of CD14 and CD68 identified the presence of macrophages (Supplementary Fig. 1f). Moreover, as revealed by COL1A2, thymic fibroblasts displayed a preferential localisation to septa and outlined areas of the thymic lobe, suggesting a distinct compartmentalisation function. Overall, this revealed the capabilities of spatial transcriptomics to capture diverse thymic cell types and their distinct spatial distribution patterns within the thymus.
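The per-sample averages quoted above are counts of genes detected per spot. A minimal sketch of that metric follows, assuming a gene is "detected" when it has at least one count in a spot; the threshold, function names and toy values are assumptions for illustration rather than the pipeline used in this study.

```python
from statistics import mean

def genes_detected(spot_counts):
    """Number of genes with at least one count in a single spot."""
    return sum(1 for count in spot_counts.values() if count > 0)

def mean_genes_per_spot(spots):
    """Average number of detected genes across a list of spots."""
    return mean(genes_detected(s) for s in spots)

# Toy spots: each dict maps gene name to its count in that spot.
toy_spots = [
    {"CD4": 3, "CD8A": 0, "PRSS16": 1},
    {"CD4": 0, "CD8A": 2, "PRSS16": 0},
    {"AIRE": 1, "CD4": 1, "CD8A": 1},
]
print(mean_genes_per_spot(toy_spots))
```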

Fig. 2: Expression profile of marker genes in the spatial and single-cell transcriptomics atlas.

a The spatial gene expression of known markers for thymocytes (CD4 and CD8A) and TECs (PRSS16 and AIRE). b Dotplot shows the expression of marker genes for all the cell types present in the thymus based on scRNA-seq data. F fetal, P paediatric.

To comprehensively delineate thymic cellular composition, we performed scRNA-seq of thymic tissue and integrated it with publicly available fetal and paediatric data5. This integrated scRNA-seq comprises fetal ages 10w to 18w and paediatric ages 7w to 13 y (Supplementary Fig. 1g–j). The abundant cell types, such as double-positive T cells (DPs), macrophages and fibroblasts, are represented across different studies. In contrast, rare cell types such as mTEC-III are detected in a higher proportion in EPCAM+ sorted scRNA-seq public samples (Supplementary Fig. 1k). Hence, to ensure the representation of most of the cell types in the thymus, we used the integrated scRNA-seq data for further analysis. The different cell types identified in the scRNA-seq data expressing known marker genes5,19 are shown in Fig. 2b. These annotated cell types provide a comprehensive reference for resolving spatial thymic composition and organisation.

Thymus architecture and geopositioning of cell types

In human thymus, the major anatomic regions (cortex and medulla) can be distinguished via the differential cellular density using H&E staining (Supplementary Fig. 1b). To define similar regional compartments on our spatial atlas, we performed spatially constrained clustering based on the spatial transcriptome in fetal and paediatric samples (Supplementary Fig. 2a). These spatial clusters take into account the neighbourhood of the spatial spots and the gene expression profile. Next, to integrate samples across different ages, we generated a pseudo-bulk transcriptional profile for each cluster using a curated set of genes representing the major cell types found in the thymus (see Methods). The resulting pseudo-bulk profiles were then batch-corrected across samples, followed by PCA transformation and K-means clustering, which stratified into 10 transcriptionally and spatially distinct groups (Supplementary Fig. 2b, c). The groups that expressed AIRE were referred to as the medulla, PRSS16 as the cortex, COL1A2, VWF and CD34 as septa and the group with the least overall gene expression as low quality (LQ) (Fig. 3a and Supplementary Fig. 2d). The expression of marker genes in the groups across the sample ages indicates the medulla of fetal and paediatric thymic samples systematically separated into different groups as Medulla I and II (Fig. 3b). We found that the LQ region exhibits higher expression of KRT1, a marker of mTEC III. This suggests that the LQ region may partially capture medullary regions enriched for mTEC III-like cell states.
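The pseudo-bulk and domain-labelling steps can be illustrated with a simplified sketch. It averages expression per cluster and then labels each cluster by its strongest marker set, calling the cluster with the least overall expression LQ. Batch correction, PCA and K-means are deliberately omitted, so this is not the pipeline described in the Methods; the function names and toy values are hypothetical.

```python
from collections import defaultdict

# Marker genes used in the text to name the domains.
DOMAIN_MARKERS = {
    "medulla": ["AIRE"],
    "cortex": ["PRSS16"],
    "septa": ["COL1A2", "VWF", "CD34"],
}
GENES = sorted({g for markers in DOMAIN_MARKERS.values() for g in markers})

def pseudobulk(spots, genes=GENES):
    """Mean expression per cluster; spots is a list of (cluster, {gene: value})."""
    sums = defaultdict(lambda: dict.fromkeys(genes, 0.0))
    sizes = defaultdict(int)
    for cluster, expr in spots:
        sizes[cluster] += 1
        for g in genes:
            sums[cluster][g] += expr.get(g, 0.0)
    return {c: {g: sums[c][g] / sizes[c] for g in genes} for c in sums}

def label_groups(profiles):
    """Label each cluster by its top marker set; lowest total expression -> LQ."""
    totals = {c: sum(p.values()) for c, p in profiles.items()}
    lq_cluster = min(totals, key=totals.get)
    labels = {}
    for cluster, profile in profiles.items():
        if cluster == lq_cluster:
            labels[cluster] = "LQ"
            continue
        scores = {d: sum(profile[g] for g in ms) / len(ms)
                  for d, ms in DOMAIN_MARKERS.items()}
        labels[cluster] = max(scores, key=scores.get)
    return labels

toy_spots = [
    ("c1", {"AIRE": 4, "COL1A2": 1}),
    ("c1", {"AIRE": 2, "PRSS16": 1}),
    ("c2", {"PRSS16": 5, "CD34": 1}),
    ("c3", {"COL1A2": 3, "VWF": 2, "CD34": 4}),
    ("c4", {"AIRE": 0.1, "PRSS16": 0.1}),
]
print(label_groups(pseudobulk(toy_spots)))
```

Restricting the pseudo-bulk profile to a small curated gene set, as done here, keeps the labelling driven by known cell-type markers rather than by depth differences between sections.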

Fig. 3: Geopositioning of cell types in thymic niches across age.

a The heatmap shows the normalised gene expression of key markers to define the thymus domains from the integration of spatially aware transcriptome-based groups combining all ages. Domains were defined as medulla (AIRE and MS4A1), cortex (PRSS16 and LY75) and septa (COL1A2, CD34 and VWF). The colour indicates the normalised gene expression as a z-score. b The heatmap shows the gene expression of marker genes across thymus age within each integrated group identified. c The spatial domains are concentrically divided into three niches from the outer to inner axis using the TissueTag method. d The heatmap illustrates the normalised cell type enrichment based on the cell-type-specificity score. This score is derived from the expression of cell-specific genes within spatial spots, which were identified through differential scRNA-seq analysis. e The co-localisation of DP T cells (SMPD3) with cTECs (PRSS16 and PSMB11) in the cortex is shown using marker genes in red, blue and green, and the spots with overlapping expression of genes are shown in the respective overlapping colours. f The co-localisation of mTECs (AIRE), macrophages (CD14) and B cells (MS4A1) in the medulla is shown using marker genes in red, blue and green, and the spots with overlapping expression of genes are shown in the respective overlapping colours. g Immunostaining of paediatric thymic slide (1 y) with PSMB11 (cTEC marker) revealed the cortical compartment. C cortex, M medulla. Scale bar: 50 μm. A representative region is shown from three independent regions. h Immunostaining of different TEC markers (AIRE, KRT15 and EPCAM) in the paediatric thymic slide (7w) revealed localisation of different mTEC subtypes (mTEC I: KRT15+EPCAM+; mTEC II: AIRE+EPCAM+) in the medulla. Scale bar: 50 μm. A representative region is shown from four independent regions. 
i MACSima spatial profiling of 22 proteins was used to define the spatial domains, and these domains were concentrically divided into three niches from outer to inner axis in P:7w thymus. j The dot plot shows the cell types based on the abundance of proteins in the niches of the MACSima spatial profiled sample. F fetal, P paediatric.

We further subdivided the cortex, medulla and septa groups into three concentric niches using the continuous tissue axis method9 (Fig. 3c). Overall, these niches comprise more than 45% cortex, 10% medulla, 25% septa and 10% LQ regions (Supplementary Fig. 2e). Since the spatial data are of higher resolution, we used two statistically based methods instead of a single-cell projection-based approach to determine the cellular composition. Firstly, we computed a cell-type-specificity score to determine the cell types present within each spatial spot. The score comprised known cell-type marker genes from Fig. 2b and the top ten differentially expressed genes derived from integrated thymus scRNA-seq data. The cell-type-specificity scores of the cell types within each niche across samples are shown in Fig. 3d. The cortical niches showed a higher enrichment for cortex-associated cell types such as cTECs and DPs (Fig. 3d and Supplementary Fig. 2f). The genes highly expressed in cTECs (PRSS16 and PSMB11) and DPs (SMPD3) show a cortical spatial signature (Fig. 3e, g and Supplementary Fig. 2g). Meanwhile, the medulla niches show higher enrichment for mTECs, dendritic cells (DCs), B cells and single-positive T cells (SPs) (Fig. 3d). mTEC II marked by AIRE, DCs by CD14 and B cells by MS4A1 show a medullary spatial signature (Fig. 3f, h and Supplementary Fig. 2h). Further, to validate the cell types in their spatial locations, we profiled protein abundance in thymus tissue using MACSima, and these regions were similarly segregated into niches for comparison (Fig. 3i). In line with the enrichment of cell types by gene expression (Fig. 3d), the proteins show a higher abundance of proliferating T cells in the cortex, immune cells such as B cells and macrophages and TECs in the medulla, and fibroblasts and endothelial cells in the septa (Fig. 3j and Supplementary Fig. 2i). Our findings highlight distinct cellular compositions within the thymic niches, further substantiated by both gene expression and protein abundance analyses, demonstrating a spatial organisation that reflects the functional diversity across the cortex, medulla, and septa regions.
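A marker-based score of this kind can be sketched as follows. The sketch assumes the score for a cell type in a spot is the mean expression of that cell type's marker genes, and that niche-level enrichment is the mean score over the spots of a niche. The actual definition is given in the Methods and may differ; only markers named in the text are used, and the function names and toy values are hypothetical.

```python
from statistics import mean

# Marker genes named in the text; the full score also uses the top
# differentially expressed genes from scRNA-seq, which are omitted here.
MARKERS = {
    "cTEC": ["PRSS16", "PSMB11"],
    "DP": ["SMPD3"],
    "mTEC II": ["AIRE"],
    "B cell": ["MS4A1"],
}

def spot_scores(expr):
    """Assumed score: mean marker expression per cell type in one spot."""
    return {ct: mean(expr.get(g, 0.0) for g in genes)
            for ct, genes in MARKERS.items()}

def niche_enrichment(spots):
    """Mean score per cell type in each niche; spots is a list of (niche, expr)."""
    by_niche = {}
    for niche, expr in spots:
        by_niche.setdefault(niche, []).append(spot_scores(expr))
    return {niche: {ct: mean(s[ct] for s in scores) for ct in MARKERS}
            for niche, scores in by_niche.items()}

toy_spots = [
    ("cortex", {"PRSS16": 2, "PSMB11": 4, "SMPD3": 3}),
    ("cortex", {"PRSS16": 4, "SMPD3": 1}),
    ("medulla", {"AIRE": 2, "MS4A1": 6}),
]
for niche, scores in niche_enrichment(toy_spots).items():
    print(niche, scores)
```

In the toy data the cortical niche scores highest for cTEC and DP markers and the medullary niche for mTEC II and B cell markers, mirroring the pattern described for Fig. 3d.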

Signalling pathways active during positive selection of T cells

Given the dependence of thymocytes on interactions with thymic APCs for maturation, we next sought to decipher the signalling pathways in the microenvironment within different thymic niches. The high-resolution thymus atlas enables the direct detection and comparison of signalling pathways across the developmental stages (Fig. 4a and Supplementary Fig. 3a). Across the niches, prominent temporal changes were seen in pathways such as MK and IGF in cortex niche 1 in fetal samples. This might be due to the profiling of the whole thymus lobe in fetal samples, which contains the capsule intact, compared to the paediatric samples, where a small portion of the lobe was sequenced. Spatial expression of MDK (ligand) and LRP1 (receptor) for the MK pathway revealed a pattern that closely resembled that of COL1A1 (Fig. 4c and Supplementary Fig. 3c). MDK is expressed by cTECs and fibroblasts, whereas LRP1 is expressed by fibroblasts (Fig. 4b). Consistently, previous studies have shown the positive effects of MDK in stimulating fibroblast growth21, further implying the role of the MDK pathway in supporting cortical fibroblast maintenance. Similarly, the IGF pathway involved fibroblasts and DN(Q) (Fig. 4b and Supplementary Fig. 3b). Together, these findings highlight niche-specific signalling interactions that support fibroblast function.

Fig. 4: Signalling pathways active during positive selection of T cells.

a The heatmap shows the normalised enrichment of the signalling pathways that are active within the cortical niches across the thymus ages. b The dot plot shows the gene expression of selected ligands and receptors in the cell types found in scRNA-seq data. c Spatial co-localisation of the LRP1 and MDK and the cell type involved, like fibroblasts, is marked by COL1A1 in the fetal 17w thymus. d The spatial co-localisation of CXCL12, CXCR4, and the cell type involved, like cTECs, is marked by PRSS16. e Immunostaining of human thymic slides (P:1 y) revealed cortical enrichment of CXCL12/CXCR4 pathway. Co-staining of CXCL12 and CXCR4 showed preferential expression in cortex rather than in medulla. Two representative regions are shown from four independent regions. f Immunostaining of human thymic slides (P:7w) revealed co-expression of Col-I (fibroblast marker) and CXCL12, especially at the capsule layer, as indicated by white arrows, thus supporting cortical fibroblasts as another source for CXCL12 in cortex. Two representative regions were shown from three independent regions. g Immunostaining of human thymic slides (P:1 yr) showed co-expression between PSMB11 (cTEC marker) and CXCL12, as indicated by white arrows, thus supporting cTECs as a source for CXCL12 in cortex. Two representative regions are shown from three independent regions. h Immunostaining of human thymic slides (P:1 yr) of T cell marker CD3 with CXCR4 showed their co-expression, supporting T cells as a source for CXCR4 in cortex. Two representative regions are shown from three independent regions. F fetal, P paediatric.

The major signalling pathway enriched in cortical regions is CXCL (Fig. 4a). The expression of CXCL12 (ligand) and CXCR4 (receptor) in the CXCL pathway was more prevalent in cortical regions (Fig. 4d, e and Supplementary Fig. 3d, e). scRNA-seq showed expression of CXCL12 by some fibroblasts, and co-immunostaining of the fibroblast marker Col-I and CXCL12 revealed their co-localisation or at least close proximity, suggesting that thymic fibroblasts might be one of the sources of CXCL12 (Fig. 4b, f and Supplementary Fig. 3f). Consistent with recent studies22,23,24, CXCL12 showed a higher enrichment in TEC populations (Fig. 4b), whereas CXCR4 was detected preferentially in thymocytes (Fig. 4b). These observations were further supported by co-immunostaining of the respective TEC and T cell markers with CXCL12 or CXCR4 (Fig. 4g, h and Supplementary Fig. 3g, h), consistent with the role of CXCL12 in stimulating thymic migration25 (Fig. 4d and Supplementary Fig. 3d). Besides, IL16 (ligand) and CD4 (receptor) from the IL16 pathway show enrichment in cortex and medulla (Fig. 4a and Supplementary Fig. 3b), in line with the reported roles of the IL16 pathway in T cell development26,27,28. Together, these results highlight the role of niche-specific signalling pathways in mediating thymocyte maturation and migration, underscoring the importance of the thymic microenvironment in T cell development.
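The spatial overlays of ligand and receptor expression can be summarised by a simple co-detection fraction per domain. The sketch below is a simplified proxy and not the pathway enrichment analysis behind Fig. 4a: it assumes a gene is detected when its count is above zero, and the function name and toy values are hypothetical. At 25 μm spots, co-detection indicates proximity of ligand- and receptor-expressing cells rather than expression in the same cell.

```python
def codetection_fraction(spots, ligand, receptor):
    """Fraction of spots per domain in which ligand and receptor are both detected.

    spots is a list of (domain, {gene: count}) pairs.
    """
    total, both = {}, {}
    for domain, expr in spots:
        total[domain] = total.get(domain, 0) + 1
        if expr.get(ligand, 0) > 0 and expr.get(receptor, 0) > 0:
            both[domain] = both.get(domain, 0) + 1
    return {d: both.get(d, 0) / total[d] for d in total}

toy_spots = [
    ("cortex", {"CXCL12": 2, "CXCR4": 5}),
    ("cortex", {"CXCL12": 1, "CXCR4": 1}),
    ("cortex", {"CXCL12": 0, "CXCR4": 3}),
    ("cortex", {"CXCL12": 4, "CXCR4": 2}),
    ("medulla", {"CXCL12": 0, "CXCR4": 1}),
    ("medulla", {"CXCL12": 1}),
]
print(codetection_fraction(toy_spots, "CXCL12", "CXCR4"))
```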

Signalling pathways active during negative selection of T cells

Similarly, we determined the major signalling pathways within the medulla niches across samples from different developmental ages and found enrichment for pathways such as CCL, MIF, IL16, GALECTIN and TGFB (Fig. 5a). The primary function of the medulla is the establishment of self-tolerance, which requires multiple components for efficient antigen presentation. For instance, CD74 is involved in the MIF pathway and is a component of the MHC-II antigen presentation process29. Consistently, it shows an expression pattern specific to medullary regions (Fig. 5c and Supplementary Fig. 3b, i). MIF is expressed in mTECs, and CD44 is expressed by SPs, DCs and B cells (Fig. 5b). The spatial expression map of CD74, CD44 and MS4A1 (B cell marker) showed spots co-expressing CD74 and MS4A1 as well as spots co-expressing CD44 and MS4A1, supporting an antigen-presentation function of B cells in the medulla. This suggests a central role of MIF signalling in mediating antigen presentation within the medulla.

Fig. 5: Signalling pathways active during negative selection of T cells.

a The heatmap shows the normalised enrichment of the signalling pathways that are active within the medullary niches across the thymus ages. b The dot plot shows the gene expression of selected ligands and receptors in the cell types found in scRNA-seq data. c Spatial co-localisation of CD74 and CD44 with the cell type involved (B cells, marked by MS4A1). d Spatial visualisation of the significantly expressed ligand (CCL19) and receptor (CCR7) of the CCL pathway and IL3RA (aDC). e Immunostaining of human thymic slides (P: 2 m) revealed medullary enrichment of CCR7/CCL19 pathway. Two representative regions are shown from five independent regions. f Immunostaining of human thymic slides (P: 2 m) showed co-expression between EPCAM (TEC marker) and CCL19 in medulla, as indicated by white arrows, thus supporting mTECs as a source for CCL19 in medulla. Two representative regions are shown from four independent regions. g Immunostaining of human thymic slides (P: 2 m) revealed co-expression of CD3 (T cell marker) and CCR7, as indicated by white arrows, thus supporting T cells as a source for CCR7 in medulla. Two representative regions are shown from three independent regions. F fetal, P paediatric.

The CCL signalling pathway is known to regulate the migration of selected thymocytes30, which could help establish medullary self-tolerance. One of the receptors on thymocytes mediating this process is CCR7, which interacts with CCL19 and CCL21 (refs. 31,32). Notably, the expression patterns of CCL19 and CCR7 were highly restricted to medullary regions (Fig. 5d, e and Supplementary Fig. 3j, k). Besides, CCL19 was more enriched in mTEC populations (Fig. 5f and Supplementary Fig. 3l), while CCR7 was found in SPs, B cells and activated DCs (aDCs) (Fig. 5b, g and Supplementary Fig. 3m). Overlaying CCL19 and CCR7 with IL3RA, another medullary APC marker (DCs), revealed multiple spots co-expressing them (Fig. 5d and Supplementary Fig. 3j). In line with the negative selection of SP T cells leading to apoptosis, the medulla is enriched for the Galectin pathway, which is involved in cell death signalling. Additionally, galectins are known to affect T cell activation and differentiation via APC modulation33. In summary, the analysis of the medulla niches reveals significant enrichment of key signalling pathways such as CCL, MIF, IL16, GALECTIN, and TGFB, which are crucial for promoting self-tolerance and efficient antigen presentation.

TFs regulating the cell states of mimetic TECs

The mimetic TECs refer to a group of TECs with distinct expressions of TFs that are usually from non-thymic lineages4. A recent study implied that these unconventional mTEC subsets could regulate diverse processes and functionally mimic endocrine, microfold or myocyte cell types15. The mTECs have the potential to differentiate into various molecularly distinct cells, including different stages of muscle differentiation18. However, a comprehensive spatial understanding of these cell states and the TFs that regulate mimetic TECs is still lacking. To systematically identify the TFs implicated in mimetic TECs, we utilised our high-resolution spatial atlas and scRNA-seq data (Fig. 6a). Since mTECs are epithelial cells located in the medulla, we focused on spatial spots of bin 50 or 25 μm resolution that expressed EPCAM, a well-known epithelial marker. We then screened for all known human TFs that were spatially present within EPCAM+ spots and expressed in EPCAM-sorted TECs within the scRNA-seq datasets5,17. To determine the TFs responsible for the molecular variation and cell states of mimetic TECs, we employed a majority voting method to assign the TFs identified in the EPCAM spots to the cortex, medulla, or septa (Fig. 6b and Supplementary Fig. 4a–e). It is important to note that mimetic TECs are relatively rare and were detected in 2674 spots across all samples, representing 0.4% of tissue sections sequenced.
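The majority-voting step can be sketched as below. The sketch assumes each EPCAM+ spot casts one vote, for the domain it lies in, for every TF detected in that spot, and that a TF is assigned to the domain with the most votes. Whether votes were cast per spot or per sample is not specified here, so this is an assumption; the function name is hypothetical, and `TF_X` is a placeholder rather than a real gene. The last lines check the stated fraction of mimetic TEC spots against the sum of the reported bin 50 spot counts.

```python
from collections import Counter

def assign_tf_domains(epcam_spots):
    """Assign each TF to the domain in which most of its EPCAM+ spots lie.

    epcam_spots is an iterable of (domain, set_of_detected_TFs) pairs.
    Ties go to the domain that was seen first for that TF.
    """
    votes = {}
    for domain, tfs in epcam_spots:
        for tf in tfs:
            votes.setdefault(tf, Counter())[domain] += 1
    return {tf: counts.most_common(1)[0][0] for tf, counts in votes.items()}

toy_epcam_spots = [
    ("medulla", {"AIRE", "MYOG"}),
    ("medulla", {"AIRE"}),
    ("cortex", {"AIRE", "TF_X"}),   # TF_X: placeholder, not a real gene
    ("cortex", {"TF_X"}),
]
print(assign_tf_domains(toy_epcam_spots))

# Fraction of sequenced spots called mimetic TEC spots, from numbers in the text.
TOTAL_SPOTS = 717_674   # sum of the bin 50 spot counts reported per section
MIMETIC_SPOTS = 2674
percent_mimetic = 100 * MIMETIC_SPOTS / TOTAL_SPOTS
print(round(percent_mimetic, 2))
```

TFs assigned to the medulla by such a vote correspond to what the text calls mimeTFs; the percentage computed at the end rounds to the 0.4% quoted above.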

Fig. 6: Identification of the TFs implicated in rare mimetic TECs.

a The schematic illustration depicts the identification of mimetic TFs (mimeTFs) expressed by mimetic TECs. The TEC spots on the spatial section at a bin 50 (~25 μm) resolution were determined using the expression of EPCAM. The TFs expressed within the spatial TEC spots were determined and then screened for the respective TF’s expression in scRNA-seq data of TECs. A majority voting strategy was used to stratify the identified TFs into the domains, and the TFs associated with the medulla were called mimeTFs. Created in BioRender. Chen, Y. (2026) https://BioRender.com/4jzktkg. b The stacked bar chart shows the percentage of spots expressing mimeTFs compared to all EPCAM spots within each sample. The colours in the bar chart indicate the sample. The grey colour represents the TFs identified from Stereo-CITE at bin 20, whereas the rest of the colours represent the samples that detected TF using Stereo-seq at bin 50. The labels highlighted in red are the TFs known in mTECs and mimetic TECs, and the TFs highlighted in blue are the mimeTFs found and verified in this study. c The niches are defined for the spatial multi-omics (Stereo-CITE) profiling of P:7w thymus, which included the information on abundance of 31 proteins coupled with whole transcriptomics at bin 20 (~10 μm) resolution. d The spatial distribution of EPCAM gene expression and the corresponding protein (CD326) abundance of the gene EPCAM in spatial multi-omics. F fetal, P paediatric.

A previous study estimated the location of a few known mimetic TECs, such as mTEC III keratinocytes, myoTEC, neuroTEC, ciliated TECs and tuft cells to be in the medulla and near Hassall’s corpuscles (HCs) based on scRNA-seq cell type projection onto spatial data9. However, the lower spatial resolution and the limited area of tissue sequenced hindered the unbiased identification of mimeTFs (Supplementary Fig. 4f–i). For instance, in a previous study, though the myoTECs are projected onto the spatial data, the key TF MYOG in myoTECs is not detected in conjunction with EPCAM due to lower spatial resolution (Supplementary Fig. 4i). In this study, we aim to address this limitation by utilising our high-resolution thymus atlas to identify TFs that are involved in mimetic TEC regulation systematically. We identified 70 mimeTFs specific to the medullary region, and the percentage of EPCAM+ spots expressing these mimeTFs across the samples displayed rare occurrences (Fig. 6b, Supplementary Fig. 4e and Supplementary Data 3). Similarly, we also identified TFs specific to the cortex and septa; however, these TFs showed relatively less abundance compared to the medulla (Supplementary Fig. 4a, b). Notably, the mimeTFs did not consistently exhibit differences based on fetal or paediatric origins. Key mTEC markers, such as AIRE and FEZF2, known for their role in regulating mimetic TECs, were detected34,35. We catalogued the known TEC mimetics based on the expression of lineage-specific TFs and genes found in previous studies15,16,17 (Supplementary Data 4). We uncovered several well-studied TFs in mimetic TECs, including ASCL1, INSM1, NEUROD1, MYOG, POU4F1, SOX2, SPIB and JUN (Supplementary Fig. 4j). Additionally, our approach revealed 62 TFs that may influence mimetic TEC cell states, suggesting extensive regulatory complexity within these rare cell populations.

To validate the detection of mimeTFs by our spatial transcriptomics approach, we performed Stereo-CITE spatial profiling to capture both the transcriptome and proteome (including EPCAM) simultaneously. We analysed the data at a resolution of 10 μm, or bin 20 (instead of 25 μm), which better matches the diameter of epithelial cells. We identified the cortex, medulla, and septa niches for the Stereo-CITE sample, and these regions are comparable to those found in the samples profiled by Stereo-seq (Fig. 6c). The spatial plot indicates that EPCAM gene expression is sparse, whereas the corresponding protein profile reveals a concentration of EPCAM protein (CD326) in the medulla region, as expected (Fig. 6d). To determine whether the same mimeTFs are observed at 10 μm resolution, we focused on the abundant EPCAM protein spots in the medulla, where almost all the predicted mimeTFs were detected (Fig. 6b). Further co-immunostaining of mimeTFs with the pan-TEC marker EPCAM shows that TECs express these mimeTFs. Previously reported mimeTFs involved in microfoldTEC, myoTEC and neuroTEC, such as SPIB, MYOG and SOX2, can be detected (Fig. 7a). Notably, other mimeTFs identified in our analysis, including BHLHE40, DLX5, EGR3, and NFKB2, are expressed along with the TEC marker (Fig. 7b and Supplementary Fig. 5a), suggesting their involvement in mimetic TEC cell states. Overall, this study demonstrates a robust approach for spatially resolving mimetic TECs and dissecting their regulatory programmes.

Fig. 7: Immunostaining of the TFs expressed by rare mimetic TECs.

a Co-immunostaining of reported mimetic TFs with pan-TEC marker EPCAM in the human thymic slides. Co-staining of mimetic TF SPIB or MYOG with EPCAM revealed their rarity in number, while co-staining of mimetic TF SOX2 with EPCAM showed a high number of SOX2+ TECs. Rare mimetic TF+ TECs are indicated by white arrows. Nuclei were stained with Hoechst. A representative region is shown from at least three independent regions. b Co-immunostaining of mimetic TFs and EPCAM in human thymic slides of 5 y sample. Examples of cells co-expressing mimetic TFs and EPCAM were marked with arrows. Further zoom-in of selected cells was marked by yellow arrows. A representative region is shown from two to six independent regions. Nuclei were stained with Hoechst.

Abundance and geopositioning of mimeTFs expressing cells

The mimetic TECs are rare cell types in the thymus, as indicated by the low percentage of spots expressing these mimeTFs (Fig. 6b). These TFs are generally also expressed at low levels, and spatial profiling is more susceptible to missing low-expressing genes. Thus, we analysed the number of EPCAM-sorted TEC cells that expressed these TFs in scRNA-seq data17 (Fig. 8a). TFs such as JUN, HES6, MEIS2, TSC22D1 and STAT3 are present in more than 80% of the cells and could be pan-mTEC TFs. Some known mimetic TFs such as INSM1, NEUROD1, POU4F1 and SOX2 are expressed in more than 50% of the cells, whereas TFs such as AIRE, FEZF2, MYOG, and SPIB are rare. Likewise, the novel mimeTFs BHLHE40, DLX5, EGR3 and NFKB2 are rare. Overall, these findings reinforce the rarity of lineage-defining mimeTFs and illustrate the technical and biological challenges of capturing such low-frequency TEC states.
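The abundance estimate described above can be sketched as a per-TF percentage of cells whose normalised expression exceeds a cut-off. This is a minimal illustration only: the function name `percent_expressing`, the input layout and the toy values are hypothetical, and the default cut-off of one is taken from the Fig. 8a legend.

```python
def percent_expressing(expr_by_tf, threshold=1.0):
    """Return {TF: percentage of cells with normalised expression > threshold}.

    expr_by_tf maps a TF name to a list of per-cell normalised expression
    values (hypothetical layout, chosen for illustration).
    """
    result = {}
    for tf, values in expr_by_tf.items():
        n_pos = sum(1 for v in values if v > threshold)
        result[tf] = 100.0 * n_pos / len(values) if values else 0.0
    return result


# Toy values, not data from this study.
toy = {
    "TF_A": [2.0, 1.5, 3.0, 0.0, 1.2],  # above the cut-off in 4 of 5 cells
    "TF_B": [0.0, 0.0, 1.1, 0.0, 0.0],  # above the cut-off in 1 of 5 cells
}
pct = percent_expressing(toy)
```

A TF whose percentage is high across sorted TECs would be read as broadly expressed, whereas a low percentage is consistent with a rare, lineage-defining factor.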

Fig. 8: Abundance of mimeTFs and enrichment in KRT1+ spots.

a The points in the graph indicate the percentage of TEC cells that have a normalised expression of the identified mimeTFs greater than one in the scRNA-seq dataset comprising EPCAM+ sorted cTECs and mTECs. b Spatial visualisation of KRT1 spots in the medulla and LQ regions. KRT1 expression greater than one is referred to as KRT1+, and greater than 10 as KRT1++. c The relative percentage of spots containing mimeTFs within each niche and KRT1 high region, which includes KRT1+ and KRT1++ spots. d The identified mimeTFs are ranked based on the frequency of occurrence within the KRT1-expressing spots. e The network shows the colocalisation of mimeTFs that occur in the same spots. The node size represents the degree or number of connections, and the pie chart displays the relative abundance of mimeTFs in each sample. The thickness of the edges shows the relative number of spots that express the connecting mimeTFs together. f Immunostaining of AIRE+ and MYOG+ TECs in human thymic slides. Co-staining of mimetic TF MYOG and AIRE, with pan-TEC marker EPCAM revealed the instances where TECs expressing these mimetic TFs were close (within 50 μm). Nuclei were stained with DAPI. A representative region is shown from four independent regions. F fetal, P paediatric.

One limitation of our thymus spatial atlas is that the HC region cannot be precisely pinpointed, as these areas are typically defined by H&E staining. We observed that some of the HC-like structures present within the medulla were assigned to LQ regions, mainly because very few genes are expressed within the HC (Fig. 3c). Additionally, the LQ region exhibited a high expression of KRT1, which is a mTEC-III marker (Fig. 3b, d). A previous study suggests that mimetic TECs and mTEC-III are closer to HCs9. Therefore, we focused on the KRT1 spots in our samples and found an enrichment of KRT1-high cells near the HC, along with some cells forming a boundary around it (Fig. 8b and Supplementary Fig. 5b). Next, we calculated the relative percentage of spots in each niche that expressed mimeTFs within the KRT1-high region. When comparing the outer medulla to the innermost medulla (M3), we found that M3 had the highest proportion of spots expressing mimeTFs (Fig. 8c and Supplementary Fig. 5c). The KRT1-high region showed the highest number of spots with mimeTFs expression (Fig. 8c). The ranking of mimeTFs based on the frequency of occurrence within the KRT1-high region is shown in Fig. 8d. Notably, AIRE and FEZF2, known for tissue-restricted antigen expression regulation34,35 showed the highest specificity in the KRT1-high region, followed by other mimeTFs such as ASCL1, INSM1, DLX5 and MYOG. These results therefore pinpoint mimeTF-expressing TECs to discrete KRT1-high regions, establishing their geopositioning within specialised medullary microenvironments and HCs.
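The KRT1-based grouping can be sketched as follows. The KRT1+ (expression greater than one) and KRT1++ (greater than 10) cut-offs follow the Fig. 8b legend; everything else is an assumption made for illustration, including the spot layout, the helper names and the rule that a spot counts as mimeTF-expressing when any listed TF has non-zero expression. The sketch computes a plain percentage per group and does not reproduce the exact normalisation behind Fig. 8c.

```python
from collections import defaultdict


def krt1_class(krt1_value):
    """Classify a spot by KRT1 expression (cut-offs from the Fig. 8b legend)."""
    if krt1_value > 10:
        return "KRT1++"
    if krt1_value > 1:
        return "KRT1+"
    return "KRT1-"


def percent_with_mimetf(spots, mimetfs):
    """Percentage of spots expressing at least one mimeTF, per group.

    Each spot is a dict with a 'niche' label and an 'expr' dict (hypothetical
    layout). KRT1+ and KRT1++ spots are additionally pooled as 'KRT1-high'.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for spot in spots:
        groups = [spot["niche"]]
        if krt1_class(spot["expr"].get("KRT1", 0)) != "KRT1-":
            groups.append("KRT1-high")
        has_tf = any(spot["expr"].get(tf, 0) > 0 for tf in mimetfs)
        for group in groups:
            totals[group] += 1
            positives[group] += int(has_tf)
    return {g: 100.0 * positives[g] / totals[g] for g in totals}


# Toy spots, not data from this study.
toy_spots = [
    {"niche": "M3", "expr": {"KRT1": 12, "AIRE": 2}},
    {"niche": "M3", "expr": {"KRT1": 0, "FEZF2": 1}},
    {"niche": "cortex", "expr": {"KRT1": 2}},
    {"niche": "cortex", "expr": {"MYOG": 1}},
    {"niche": "cortex", "expr": {}},
]
pct_by_group = percent_with_mimetf(toy_spots, ["AIRE", "FEZF2", "MYOG"])
```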

In addition to TFs, several genes are linked to mimetic TECs, such as KRT10 in keratinocyte-TEC, SPINK5 and LY6D in corneo-TEC, and DES and MYL1 in muscle-TEC15,16,17 (Supplementary Data 4). Most mimeTF spots express KRT10, TTN, and UBB4B, while some genes are more specific; for instance, KRT7 is associated with SOX9, and SPINK5 is expressed to varying degrees in spots containing mimeTFs such as SOX2, IRF1, NEUROD2, and SPEN (Supplementary Fig. 5d). The NEUROD2 mimeTF-containing spots show high expression of KRT10, GNG8, SFTPD, IL25, and SPINK5. This suggests that these mTECs have the potential to differentiate into various molecularly distinct cell types based on the expression of mimeTFs.

Subsequently, to examine the physical proximity of mimeTFs expressed by the mimetic TECs, we computed the co-occurrence of mimeTFs (Fig. 8e). AIRE and FEZF2, known for tissue-restricted antigen expression regulation34,35, were primarily co-localised with each other and formed a hub across all samples. Besides, a higher degree of co-localisation was observed among the mimeTFs INSM1, HES6, and ASCL1. Furthermore, we analysed the spatial neighbours of mimeTFs within the same spot, five spots and 10 spots (Supplementary Data S5). The network illustrates that mimetic TECs expressing mimeTFs are typically located in close proximity to one another. Likewise, co-immunostaining of MYOG and AIRE supported their physical proximity, with 6.5% of MYOG and AIRE co-occurring at the same spots, 58.1% within five spots, and 74.2% within 10 spots, and significantly co-localised by Fisher’s exact test with p values of 0.009, <2.2e−16 and <2.2e−16, respectively (Fig. 8f and Supplementary Fig. 5e, f). This shows a similar trend to that observed for the MYOG and AIRE spatial transcriptional co-localisation. Together, these multilayered analyses reveal that mimetic TECs are rare but spatially organised into tightly clustered, HC-proximal microenvironments, where their coordinated mimeTF expression and close physical proximity suggest a specialised regulatory hub for self-antigen presentation and central tolerance.
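The same-spot co-occurrence network can be sketched as a pair count over the set of mimeTFs detected in each spot. This is a minimal sketch under stated assumptions: the input layout and function names are hypothetical, edge weight is the raw number of shared spots, and node degree is the number of distinct partners. The five-spot and 10-spot neighbourhood analyses are not covered here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(spot_tfs):
    """Count, for every TF pair, the number of spots expressing both.

    spot_tfs is an iterable of sets, one set of detected mimeTFs per spot
    (hypothetical layout).
    """
    edges = Counter()
    for tfs in spot_tfs:
        for a, b in combinations(sorted(tfs), 2):
            edges[(a, b)] += 1
    return edges


def node_degree(edges):
    """Number of distinct co-occurring partners for each TF."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Toy spots, not data from this study.
toy_spot_tfs = [
    {"AIRE", "FEZF2"},
    {"AIRE", "FEZF2", "INSM1"},
    {"INSM1", "HES6"},
    {"MYOG"},
]
toy_edges = cooccurrence_edges(toy_spot_tfs)
toy_degree = node_degree(toy_edges)
```

Sorting the TF names before pairing keeps each edge key in a single orientation, so (AIRE, FEZF2) and (FEZF2, AIRE) are never counted separately.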

Cell type mimicking function by mimeTFs expressing TECs

To better understand the role of mimeTFs in regulating cell type mimicking function, we conducted a Gene Ontology (GO) biological processes enrichment analysis on the differentially expressed genes in spots that express mimeTFs (Supplementary Data 6). This revealed many non-thymic cell fate/lineage-related terms, from which the origin of the counterpart cell type could be inferred. For instance, the GO term enrichment for known mimeTFs such as INSM1 correlated with pancreatic B cell differentiation. This finding aligns with observed autoimmune phenotypes in the murine pancreas following an Insm1 conditional knockout in TECs36. Additionally, ASCL1 was associated with terms related to neuronal development, MYOG was connected to skeletal muscle development, and POU4F1 indicated roles in bone and nerve development. SOX2 was found to be involved in endocrine, eye, and pituitary gland development, while BHLHE40 was implicated in the nervous system, DLX5 in the bone, skeletal and nervous systems, and EGR3 in endothelial and muscle development. Subsequently, we aimed to map the lineage-specific TFs that control the mimetic TECs presenting antigens for each lineage. For this, we assessed the knockout effects of these TFs using known zebrafish knockout phenotypes from the ZFIN database37 and human diseases from the Disease Ontology database38 (Supplementary Fig. 6a and Supplementary Data 7). The mimeTFs exhibited similar functions based on the involved organs, with major organ groups including the nervous system, brain, cardiac system, and eye. For instance, similar to the GO analysis, ASCL1, DLX5 and BHLHE40 were also associated with the nervous system. These findings highlight the potential roles of these factors in presenting antigens by mimicking other cell types for T cell maturation.

Next, we resolved the cell types located in close proximity to the mimeTF-expressing TECs. As expected, immune cell types such as B cells and DCs, mTECs, and SP thymocytes showed higher enrichment (Supplementary Fig. 6b, c). Notably, the mimetic TEC subtypes myo and neuro exhibited some of the highest enrichment across all spots expressing mimeTFs (Supplementary Fig. 6c). Since TFs are known to regulate tissue-specific expression, in this study, we aim to investigate whether these mimeTFs influence the genes responsible for mimicking other cell types. Specifically, we sought to determine if spatial data can be used to directly quantify genes that are not specific to the thymus, as well as antigen-presenting genes. For instance, Supplementary Data 8 presents the TFs alongside their target genes, which include known tissue-specific and antigen-presenting genes. The tissue-specific genes listed are not related to the thymus, suggesting a cell-mimicking function regulated by these mimeTFs. Next, the TF regulon for each mimeTF was computed, and for some mimeTFs, such as DLX5, SOX9 and MEIS2, the top TF regulon identified was the corresponding TFs (Supplementary Fig. 6d). For these, we explored the expression of tissue-specific genes from the Human Protein Atlas39 within the TF regulon target genes and found a varying number of tissue-specific genes regulated. For instance, MEIS2 regulates FBXO32, which is specifically expressed in skeletal muscle (Supplementary Data 8). Some of the target genes were also involved in the antigen presentation process, including the MHC class I and II. In line with the enrichment of APC cells in the mimetic spots, we also observed varying expression of HLA variants, DC, and NK cell receptors (Supplementary Fig. 6e). The associations with tissue-specific GO terms, disease phenotypes, and regulon targets indicate that mimeTFs drive cell type mimicking functions that broaden the self-antigen landscape presented to developing T cells. 
Overall, this study demonstrates the ability of high-resolution spatial omics to geoposition, characterise and investigate rare cell types transcriptionally primed to support central tolerance within the thymus.

Discussion

Our study presents a comprehensive, high-resolution spatial atlas of the fetal and paediatric human thymi, featuring a fine dissection of cellular compositions for each regional niche. The cartographic nature of our integrative spatial map enables us not only to directly locate the previously reported cell types via gene marker visualisation but also to resolve the spatial distances among subtypes of each thymic cell type. Our study has demonstrated the geo-position of thymic niches, paving the way for further examination of sub-regional functions during thymus development. Moreover, using the defined cell-type-specificity scoring metrics, we can directly assign the geo-position of the cell type in the thymus tissue.

Recent advances in both single-cell and spatial transcriptomics have enhanced our understanding of human thymic development, enabling the definition of the structural domains (i.e., cortex, medulla, and septa) despite the compact nature of thymic tissue. The application of spatial Stereo-seq technology serves as one approach to decipher the regional complexity of the human thymus. Compared to previous studies9,10,12, our high-resolution atlas enables the capture of cell types via direct detection of gene markers. Known histological regions, including cortex, medulla and septa, are identified via spatial profiling of regional markers, such as PRSS16 (cortex) and AIRE (medulla). Each thymic cell type comprises multiple subtypes, usually differentiated by a few gene markers. Our high-resolution map allows for a direct comparison of their regional distribution.

We examined the signalling pathways active during T-cell selection in thymic niches. Since the capsule was captured intact in the fetal samples, pathways such as MK and IGF were prominent, indicating interactions between cTECs and fibroblasts, while the CXCL pathway was key to the migration of DP thymocytes in the cortex. In contrast, the thymic medulla showed enrichment in pathways like CCL, MIF, and TGFB, which are crucial for establishing self-tolerance via antigen presentation. MIF was associated with MHC-II presentation, while CCL19 and CCR7 were essential for thymocyte migration and maturation, as discussed previously9,10. This highlights the intricate signalling networks within thymic niches that are essential for T cell maturation and self-tolerance.

Recent studies have profiled many different types of TEC mimetics at the single-cell level15,16,17,18. However, a systematic method for unbiased detection of TFs involved in TEC mimetics using spatial omics and their spatially interacting information remains less examined. In our study, we employed our high-resolution thymus atlas and multi-omic spatial data, including proteomic and transcriptomic information, to identify TFs that regulate the mimetic TEC. We found that around 0.4% of cells spatially expressed mimeTFs associated with mimetic TECs. Several well-studied TFs in mimetic TECs, including ASCL1, INSM1, NEUROD1, MYOG, POU4F1, SOX2, SPIB and JUN were identified. We observe a significant increase in the presence of mimeTFs in the KRT1-high region located near and surrounding the HCs. We estimate the functionality of these novel mimeTFs, such as BHLHE40 to be implicated in the nervous system, DLX5 in the nervous system and bone, and EGR3 in endothelial and muscle-related cell type presentation. Further, we establish that the mimeTFs are spatially co-localised with AIRE, FEZF2, HES6 and INSM1, forming a hub. Our study has a unique advantage in determining the spatial geopositioning of the mimeTFs with their interaction partners and estimated antigen presentation from different tissue origins.

Nevertheless, this study also presents some limitations. Owing to technical challenges, the spatial atlas does not yet reach single-cell resolution. Currently, larger cells such as fibroblasts can be captured near the single-cell level at bin 50 (25 μm spots), but thymocytes are considerably smaller. Here, we focus on thymic epithelial cells, which are approximately 10 μm in diameter, to identify mimetic TECs. Hence, we try to overcome this limitation by defining the spots at bin 20 (10 μm resolution), which is close to the epithelial cell size, in the multi-omic transcriptome and proteome data. However, the thymus is a highly compact tissue, making it more challenging to identify the borders of individual cells within it. Therefore, a higher-resolution atlas with cell border imaging could allow for better single-cell visualisation of cell-cell interactions among the subtypes of thymic cells. In addition, higher spatial resolution would enable detailed mimeTF-based TF-gene regulon identification. Further, the spatial aspect can be integrated with multimodal data, including scRNA-seq40,41,42, chromatin accessibility43, histone profiling44, DNA modification45 and RNA modification46 to capture the overall cellular dynamics in the thymus.

In summary, our study has established an integrated high-resolution spatial atlas for the human fetal and paediatric thymi, enabling fine dissection of thymic niches for major histological structures. We leverage the spatial profiling to identify TFs implicated in rare mimetic TECs. This spatial atlas enables the identification of rare cell types, as the mimeTFs are detected in less than 0.4% of the tissue section and spatially co-localised to form a hub that matures T cells via antigen presentation. This atlas could serve as a resource for future studies dissecting thymic microenvironments, central tolerance and immune dysregulation.

Method

Sample collection

Mouse thymic samples were harvested from euthanised C57BL/6 mice aged 1 month and 3 months (IACUC: 221691). All work involving animals was conducted following the ethical guidelines of A*STAR. The mice were housed at the Biological Resource Centre, A*STAR, a pathogen-free facility. The housing temperature was maintained at 21 ± 1 °C, with 45–70% humidity and a 12/12 h dark/light cycle.

Human thymic samples were collected after obtaining consent from patients, with approval from the SingHealth Centralised Institutional Review Board and National Healthcare Group Health Domain Specific Review Board (CIRB Ref: 2013/837/D; NHG DSRB Ref: 2023/00049 – “Understanding Human Thymic Ageing and Its Mechanisms”). Fetal thymic samples were obtained from aborted fetuses at the indicated weeks of gestation (i.e. 13w, 14w, 17w and 18w). Informed consent was obtained from all women for the donation of their fetal tissues for research. Paediatric thymic samples (i.e., 7w, 2 y, 5 y, and 6 y) were collected from patients undergoing cardiac surgery after obtaining consent from them or their legal guardians (prior to surgery). Freshly harvested thymus samples from donors were kept and transported in cold DMEM medium (Gibco, 11965092).

Sample embedding and freezing

Prior to embedding, each thymic tissue was wiped dry with tissues and Kimwipe to remove unwanted surface liquid. Subsequently, the tissue was dissected and trimmed to smaller sizes comparable to that of a cryomold (Peel-A-Way® 22 mm square, 20 mm deep, 27110 for paediatric samples; SAKURA Finetek Cryomold Intermediate, 15 × 15 × 5 mm, 4566 for fetal samples). Trimmed tissues were embedded in OCT (Tissue-Tek O.C.T. Compound, SAKURA, 4583) with adjustment to desired orientation. After that, the cryomold with tissues embedded in OCT was placed in a mixture of dry ice and ethanol for freezing. Upon a complete change in OCT from transparent to fully white in appearance, the freezing was finished. The OCT-tissue blocks were then stored on dry ice and subsequently at −80 °C for long-term storage.

For all subsequent spatial profiling, frozen human thymic tissues were sectioned with the same cryostat (LEICA CM3050S).

Stereo-seq spatial transcriptomics profiling

The samples were processed for Stereo-seq according to the previously described protocol13 as well as the manufacturer’s instructions. For fetal thymic samples (13 week, 14 week, 17 week, 18 week), Stereo-seq was performed with the standard procedure. Briefly, the tissues were sectioned at 10 μm and mounted onto a 1 × 1 cm Stereo-seq chip (T chip, 110CT114). The mounted tissue was dried at 37 degrees, followed by fixation with cold methanol at −20 degrees. Nuclei staining was performed with Qubit ssDNA reagent (Invitrogen, Q10212A) as recommended and imaged at ×10 magnification (Olympus, FV3000RS Upright). Subsequently, the tissue section was permeabilized (37 degrees) and then reverse transcribed (42 degrees) for at least 3 h. Permeabilization duration was pre-determined with permeabilization test kits (P chip kit, 111SP118). Tissues were then removed (55 degrees, 10 to 20 min), and cDNA was released overnight (55 degrees, between 3 and 16 h). Released cDNA was collected and purified following the manufacturer’s instructions with SPRI beads (SPRIselect, B23318). Raw cDNA was amplified further and purified again with SPRI beads. A total of 20 ng of amplified cDNA was used as input for library processing. Libraries were prepared based on the manufacturer’s instructions (Stereo-seq Library Preparation Kit, 111KL114). Processed libraries were sequenced with an MGI DNBSEQ-Tx sequencer.

In-house modifications were tested for paediatric thymic samples (2 years, 5 years, 6 years) to improve gene capture. Two key modifications were implemented: (1) mounted tissues were dried at −20 degrees with desiccant for around 30 min (instead of 37 degrees) prior to fixation to reduce RNA degradation; (2) the overall workflow was changed to Stereo-CITE protocol without antibody added, to accommodate the change of fixative from methanol to 4% paraformaldehyde (PFA) (Boster,AR1068). In brief, tissue sections were mounted to a 1 × 1 cm Stereo-seq chip slide (10 μm for 2 years old sample, 5 μm for 6 years old sample, T slide, 210CT114), followed by low temperature drying at −20 degrees. Mounted tissue was subsequently fixed with 4% PFA (Boster,AR1068) at room temperature (RT) for 10 min, followed by blocking at RT with pre-prepared blocking buffer as recommended by the manufacturer (Horse Serum, Thermo Fisher, 26050070; Goat Serum, Thermo Fisher, 16210064; Triton X-100 Solution, Sigma Aldrich, 93443-100 ML; Blocking reagent from Stereo-CITE seq’s protein assist kit; RNase Inhibitor, Thermo Fisher; Salmon Sperm DNA sheared, Invitrogen,AM9680) for 65 min. No FcR blocking reagent was added as no antibody would be added later. Nuclei staining was then performed with DAPI (Thermo Fisher, 62248) and imaged at ×10 magnification (Olympus, FV3000RS Upright). De-crosslinking was conducted at 70 degrees for 15 min. Upon cooling down to room temperature, the section was permeabilized at 37 degrees with duration pre-determined using Permeabilization test kits (211SP118), followed by reverse transcription for at least 3 h at 42 degrees. The downstream steps were similar to those for fetal sample processing, including tissue removal, cDNA release, cDNA purification and amplification, as well as library processing.

Stereo-CITE seq spatial transcriptomics profiling

Stereo-CITE seq was performed only for the paediatric 7-week sample, following the manufacturer’s instructions. The tissue was sectioned at 10 μm and mounted onto a 1 × 1 cm Stereo-seq T chip slide (210CT114). The section was dried at 37 degrees, followed by 4% PFA fixation. The remaining steps were the same as described above, except for the blocking and imaging steps. Blocking was performed at room temperature for 20 min with FcR blocking reagent (Biolegend, 422301) added. A cocktail of 31 primary antibodies was prepared and added to the tissue section (45 min at room temperature). Secondary antibody incubation was performed at room temperature for 15 min. DAPI staining was performed after secondary antibody incubation, at room temperature for 2 min. Nuclei and protein imaging were then performed. The primary antibody list is shown in Supplementary Table 1.

MACSima spatial profiling

MACSima spatial profiling was performed by the Singapore Immunology Network (SIgN) as a contracted service. Frozen human thymic tissue was sectioned at 7 μm and mounted onto a SuperFrost Plus slide (Fisher Scientific, FIS12-550-15). The slide was transported on dry ice to SIgN for downstream MACSima processing, which followed the guidelines described by the earlier study14. Briefly, the mounted section underwent fixation and permeabilization (in blocking buffer) prior to cyclic antibody staining. A total of 22 primary antibodies were used, each conjugated to fluorescein isothiocyanate (FITC), phycoerythrin (PE) or allophycocyanin (APC). Each cycle of antibody staining was followed by immediate imaging, before the start of the next antibody staining cycle. The entire process was automated using the MACSima instrument. The primary antibody list is shown in Supplementary Table 2.

Hematoxylin and eosin (H&E) stain of thymic slides

The OCT blocks of both murine and human thymic tissues were sectioned using a cryostat at 10 μm per section for H&E staining. All H&E staining of thymic slides was performed with the H&E Staining Kit (Hematoxylin and Eosin) (Abcam, ab245880) according to the manufacturer’s protocol. Briefly, the thymic section was fixed with cold methanol at −20 degrees for 30 min. Subsequently, the fixed slides were stained with Hematoxylin, Mayer’s solution at room temperature for 5 min, followed by bluing reagent for 10–15 s. Then, the section was stained with Eosin Y Solution for 3 min. The stained tissue was mounted with DPX Mountant for histology (Sigma, 06522-100 ML). Between staining steps, sections were washed with distilled water or absolute ethanol as instructed in the manufacturer’s protocol. All H&E slides were imaged with the Zeiss AxioObserver7 HCS at ×10 magnification.

Immunostaining of thymic slides

Human thymic slides (10 μm per section) were fixed with 4% paraformaldehyde at room temperature for 10 min, followed by one to two washes with 1X phosphate-buffered saline (PBS, Gibco, 10010031). Fixed sections were subsequently permeabilized with 0.25% Triton-X (Sigma) at room temperature for 10 min, followed by one to two washes with 1X PBS. Blocking was performed with 7% fetal bovine serum (FBS, heat-inactivated, Gibco, 16140071) or 1% bovine serum albumin (BSA, Capricorn Scientific, BSA-1S) at room temperature for at least 1 h. Sections were then incubated with primary antibodies followed by secondary antibodies. Primary antibody incubation was performed at 4 degrees for 16 h, and secondary antibody incubation was performed at room temperature in the dark for 1–2 h. Between incubations, six washes with 1X PBS were performed. Prior to mounting with a mounting medium (VectorShield, H-1000-10), sections were stained with Hoechst for nuclei (1000x dilution in blocking buffer, Invitrogen, H3570) at room temperature for 10 min if the mounting medium did not contain DAPI. All antibodies were diluted in the respective blocking reagents. Primary antibodies (all diluted to 1 μg/mL) are listed in Supplementary Table 3. Secondary antibodies (1000x diluted) were Goat anti-Rabbit Alexa Fluor 488 (Life Technologies, A11008), Goat anti-Mouse Alexa Fluor 555 (Life Technologies, A21422), Donkey anti-Mouse Alexa Fluor Plus 488 (Thermo Fisher, A32766), Donkey anti-Rabbit 594 (Thermo Fisher, A21207), and Donkey anti-Goat 647 (Thermo Fisher, A32814). All slides were imaged with the Olympus FV3000RS Upright Microscope or the Zeiss LSM700inv Microscope.

Quantification of co-localisation using immunofluorescence

Quantification analysis on the immunofluorescence slides was performed using open-source software, Fiji47, which is a distribution of ImageJ. A custom ImageJ macro was written to perform the analysis in Fiji. Due to the rare nature of myo-TECs, images were taken on regions with at least 1 myo-TEC present and with this ImageJ macro, those images can be batch processed. This macro uses Bio-Formats48 plugins to open the microscope images and nuclei on Hoechst channel are segmented using the StarDist plugin. The segmented nuclei are then added as regions of interest (ROIs) to ImageJ ROI manager. Nuclei ROIs in the ROI manager that do not have EPCAM signal or are too small are then removed. The remaining nuclei ROIs are the TEC nuclei.

MYOG staining and AIRE staining were then used to identify which TEC nuclei ROIs in the ROI manager are myo-TEC and mTECII, respectively. Staining control slides (i.e. slides stained with only secondary antibodies) were used to set the minimal intensity thresholds to reduce false-positive nuclei. For each positive myo-TEC nucleus, the macro checks whether any mTECII nucleus lies within a circle of diameter 50, 125 or 250 μm. A total of 23 images across six patients were included in the quantification analysis.
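The proximity check can be illustrated outside ImageJ with a short Python sketch. It is not the macro used in this study. It assumes that each nucleus is reduced to a centroid in μm and that the circle is centred on the myo-TEC nucleus, so that the search radius is half the stated diameter; the function names and coordinates are hypothetical.

```python
import math


def has_neighbour_within(center, others, diameter_um):
    """True if any point in `others` lies inside a circle of the given
    diameter centred on `center` (assumption: radius = diameter / 2)."""
    radius = diameter_um / 2.0
    return any(
        math.hypot(center[0] - x, center[1] - y) <= radius for x, y in others
    )


def fraction_near(myo_tec, mtec2, diameter_um):
    """Fraction of myo-TEC centroids with at least one mTECII centroid nearby."""
    if not myo_tec:
        return 0.0
    hits = sum(has_neighbour_within(c, mtec2, diameter_um) for c in myo_tec)
    return hits / len(myo_tec)


# Toy centroids in micrometres, not measurements from this study.
toy_myo = [(0.0, 0.0), (100.0, 0.0)]
toy_aire = [(20.0, 0.0), (160.0, 0.0)]
near_50 = fraction_near(toy_myo, toy_aire, 50)    # radius 25 um
near_125 = fraction_near(toy_myo, toy_aire, 125)  # radius 62.5 um
```

If the circle is instead meant to have the stated value as its radius, only the `radius` line changes.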

At a defined spatial diameter (50, 125 or 250 μm), EPCAM⁺ spots were classified based on MYOG and AIRE expression into four groups: MYOG⁺/AIRE⁺ double-positive, MYOG⁺ only, AIRE⁺ only, and double-negative. Counts for each group were extracted, and the conditional proportion of MYOG⁺ cells co-localised with AIRE was computed. A 2 × 2 contingency table was then constructed to assess the statistical association between MYOG and AIRE expression using Fisher’s exact test.
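The contingency-table step can be sketched in plain Python. The implementation below is a generic two-sided Fisher’s exact test that sums the hypergeometric probabilities of all tables no more likely than the observed one; it is offered as an illustration and is not necessarily the software used for the reported p values. The table layout (rows MYOG⁺/MYOG⁻, columns AIRE⁺/AIRE⁻), the function names and the example counts are assumptions.

```python
from math import comb


def conditional_proportion(double_pos, myog_only):
    """Proportion of MYOG+ spots that are also AIRE+."""
    total = double_pos + myog_only
    return double_pos / total if total else 0.0


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a = MYOG+/AIRE+, b = MYOG+ only, c = AIRE+ only, d = double-negative.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x double-positive spots
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Textbook-sized toy table, not counts from this study.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
prop = conditional_proportion(3, 1)
```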

Sample dissociation to single cells

All fresh paediatric samples were dissected and trimmed prior to tissue embedding and freezing in OCT as described above. Small tissue pieces were randomly selected and collected for tissue dissociation, yielding single-cell suspensions. In brief, fresh thymic samples were minced into small pieces, followed by enzymatic dissociation with dissociation solution. Dissociation solution was prepared by mixing Collagenase IV (Gibco, 17104019) at 2 U/μL and DNase I (STEMCELL Technologies, 07900) at 10 μg/mL. Minced tissues were incubated with dissociation solution at 37 degrees for 30 min, with agitation by pipetting every 5 min. At the end of incubation, dissociated tissues were passed through a 70 μm cell strainer followed by a 40 μm cell strainer to create a single-cell suspension. Subsequently, red blood cells were lysed with ACK lysis buffer (Gibco, A1049201) by incubating at room temperature for 10 min. Cells were resuspended in 0.5% BSA for cell counting with an automated cell counter. Cell viability was measured with trypan blue (Gibco, 15250061). For long-term storage, cells were frozen in freezing media (DMEM/10% FBS supplemented with 10% DMSO) in liquid nitrogen.

snRNA-seq and scRNA-seq profiling

Due to limited recruitment of fetal samples, no extra fetal samples were collected for dissociation to yield fresh single-cell suspensions. Hence, snRNA-seq was performed instead. Five to ten sections of frozen thymic tissues (at 10 μm thickness) were used to extract nuclei via commercial nuclei isolation kits (10x Genomics). Extracted nuclei were fixed and subsequently subjected to fixed RNA profiling kits (10x Genomics). All steps were performed following the manufacturer’s instructions.

For paediatric samples with dissociated single cells, frozen vials were thawed, followed by dead cell removal (Dead Cell Removal Kit, Miltenyi, 130-090-101). Viable cells were recovered and subsequently subjected to MACS sorting prior to scRNA-seq. MACS sorting was performed according to the manufacturer’s protocols (Miltenyi) with the following microbeads: CD3 (Miltenyi, 130-097-043), EPCAM (Miltenyi, 130-061-101). Briefly, total thymic cells were incubated with CD3 MACS beads at 4 °C for 15 min, followed by magnetic separation with MS columns (Miltenyi, 130-042-201) or LS columns (Miltenyi, 130-042-401). Flow-through containing CD3-ve populations were then collected and incubated with EPCAM MACS beads at 4 degrees for 30 min. Upon magnetic separation, both populations were collected. MACS-sorted CD3+, CD3−/EPCAM+ and CD3−/EPCAM− populations were subjected to scRNA-seq with 3’scRNA-seq kits (10x Genomics, 1000127, 1000268, 1000190). All sample processing and subsequent library construction were performed according to the manufacturer’s instructions.

sc/snRNA-seq data processing

Cell Ranger v7.1 was used to map the sequencing data from 10x scRNA-seq and fixed RNA profiling snRNA-seq to the respective mm10 v32 mouse and GRCh38 v32 human reference genomes. The count data was processed for downstream analysis using Seurat v4.349. Cells were filtered out if they had fewer than 500 detected genes or more than 10% mitochondrial reads. Additionally, cells with UMI counts below the 0.01% quantile or above the 95% quantile across all cells were removed. Genes were filtered out if they were detected in fewer than 10 cells. Only genes with log10 of average UMI count across all cells greater than −2.5, genes with at least 2 UMIs and genes detected in at least 10 cells were used for downstream analysis. To identify and eliminate doublets, DoubletFinder v2.0.350 was used. The gene count was normalised using SCTransform51 and dimension reduction was performed using PCA and UMAP.
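The cell-level filters can be sketched as follows. The analysis itself was run in Seurat; this Python version only illustrates the stated thresholds. The per-cell field names, the linear-interpolation quantile and the choice to compute quantiles before the other filters are applied are assumptions.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile (q in [0, 1]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("quantile of empty list")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def filter_cells(cells, min_genes=500, max_mito_pct=10.0,
                 low_q=0.0001, high_q=0.95):
    """Keep cells passing the gene, mitochondrial and UMI-quantile filters.

    Each cell is a dict with 'n_genes', 'pct_mito' and 'n_umi' (hypothetical
    field names). UMI quantiles are computed across all input cells.
    """
    umis = sorted(c["n_umi"] for c in cells)
    low_cut, high_cut = quantile(umis, low_q), quantile(umis, high_q)
    return [
        c for c in cells
        if c["n_genes"] >= min_genes
        and c["pct_mito"] <= max_mito_pct
        and low_cut <= c["n_umi"] <= high_cut
    ]


# Toy cells, not data from this study: ten ordinary cells plus one UMI outlier.
toy_cells = [
    {"id": i, "n_genes": 600, "pct_mito": 5.0, "n_umi": 1000 + 100 * i}
    for i in range(10)
]
toy_cells.append({"id": 10, "n_genes": 600, "pct_mito": 5.0, "n_umi": 50000})
toy_cells[3]["n_genes"] = 400   # fails the gene-count filter
toy_cells[5]["pct_mito"] = 15.0  # fails the mitochondrial filter
kept_ids = [c["id"] for c in filter_cells(toy_cells)]
```

With an interpolated lower quantile, the cell with the smallest UMI count always falls below the cut-off, which is why cell 0 is dropped in the toy example.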

sn/scRNA-seq integration with public data

The sc/snRNA-seq data from this study were integrated with a publicly available thymus scRNA-seq dataset. First, the cells from the public dataset were grouped by sample. Then, the samples from both the public dataset and this study were integrated. Integration anchors were identified using the ‘FindIntegrationAnchors’ function with RPCA reduction, incorporating the first 50 PCs. The integrated data were scaled, and a UMAP projection was computed. The integrated dataset comprised 290,401 cells representing the diverse cell types present in the thymus. The cell types were annotated in Seurat by projecting the cells from this study onto the cell types of the public reference dataset5. Differentially expressed genes were identified using the Seurat FindAllMarkers function with MAST differential expression testing, a minimum proportion of expressing cells of 0.25 and a minimum log fold change of 0.25.

Spatial transcriptomics data processing

We performed spatial transcriptomics on thymus tissues using Stereo-seq13. The SAW v613 pipeline was used to process the spatial image and transcriptome to obtain the gene expression profile for each DNB spot. The sequencing reads of mouse and human thymus were aligned to the mm10 v32 mouse and GRCh38 v32 human reference genomes, respectively. The downstream analysis was performed at bin 50 resolution, corresponding to a 25 μm diameter spot in the tissue. The analysis was performed using Stereopy v1.013 and Scanpy v1.9.652. Low-quality spots were identified based on the number of genes expressed in each spot. At bin 50, for samples in which the average number of unique genes detected was greater than 500, spots expressing more than 100 genes were retained. For samples with an average of fewer than 500 unique genes, spots expressing more than 75 genes were retained. Spots with more than 10% mitochondrial reads were removed. The spots were then normalised using SCTransform normalisation. Spatial neighbours were identified using the top 30 PCs. Then, spatially constrained Leiden clustering13 was performed with default parameters to identify transcriptionally and spatially similar spots. The spatial neighbourhood enrichment analysis was performed using Squidpy v1.3.053. The identified Leiden clusters were then visualised as edge-weighted networks based on the neighbourhood z-score between clusters.
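
As a rough illustration of the spot-level filter described above, the sample-dependent gene threshold can be written as a small rule. The sketch below is not the Stereopy/Scanpy code used in the study. The function names are hypothetical, the 0.5 μm DNB pitch is an assumption inferred from "bin 50 = 25 μm", and samples averaging exactly 500 genes (not specified in the text) are assigned the lower threshold.

```python
DNB_PITCH_UM = 0.5  # assumed centre-to-centre DNB distance, inferred from bin 50 = 25 um

def bin_width_um(bin_size, pitch_um=DNB_PITCH_UM):
    """Physical width of a square bin of bin_size x bin_size DNB spots."""
    return bin_size * pitch_um

def min_genes_for_sample(mean_genes_detected):
    """Gene-count threshold: 100 for gene-rich samples, 75 otherwise."""
    return 100 if mean_genes_detected > 500 else 75

def keep_spot(n_genes, mito_frac, mean_genes_detected):
    """True if a bin 50 spot passes both the gene-count and the 10% mito filter."""
    enough_genes = n_genes > min_genes_for_sample(mean_genes_detected)
    return enough_genes and mito_frac <= 0.10
```

The same helper gives the 10 μm width quoted for bin 20, which is a quick consistency check on the assumed pitch.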

Integration of spatial clusters across samples

First, for each sample, we identified the Leiden clusters through spatial inference. To obtain the gene signature of these clusters, we utilised decoupler v1.8.054 to sum the raw counts, creating pseudobulk gene expression profiles for the clusters across all samples. We used a set of genes representing the major cell types in the thymus as features: LY75, PRSS16, PSMB11, FABP5, RUNX3, AQP3, ZBTB7B, FOXP3, TOX2, LAMP3, IGHM, DNTT, MKI67, TRAF1, MS4A1, TRAF5, CD8A, CLEC10A, CD14, CD68, MS4A7, TPSAB1, TPSB2, KRT15, EPCAM, AIRE, FEZF2, KRT1, KRT10, VWF, PECAM1, CDH5, HBB, CD34, COL1A1, COL1A2, LUM and ACTA2. The pseudobulk transcriptional profiles of these features were then batch-corrected using pyComBat v0.3.355. Following this, we performed PCA on the batch-corrected pseudobulk cluster profiles and used the top five principal components to fit the model. The K-means algorithm was applied to cluster the pseudobulk profiles into ten groups. These ten groups were then assigned to specific regions: the cortex, medulla, and septa. The assignments were based on the expression of key marker genes: AIRE and MS4A1 for the medulla, PRSS16 and LY75 for the cortex, and COL1A2, CD34, and VWF for the septa. The group exhibiting the lowest overall gene expression was classified as low quality. For comparison between different technologies, such as transcriptome and proteome analysis, we utilised TissueTag v0.2.29, which defines a common continuous axis to subdivide the cortex, medulla, and septa. The cortex was further divided into three niches, from the outer cortex I to the innermost cortex III, by splitting the region into equal parts containing a similar number of cells. Similarly, the medulla and septa were each separated into three niches.
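
The final step, splitting a region into three niches of similar size along the continuous axis, can be sketched as a rank-based split. This is an illustrative assumption about how such a split might be done, not the TissueTag implementation; the function name split_into_niches is hypothetical, and the axis is assumed to increase from the outer (niche 1) to the innermost (niche 3) part of the region.

```python
def split_into_niches(axis_values, n_niches=3):
    """Assign each spot a niche index 1..n_niches by its rank along the axis.

    Ranking (rather than cutting the axis at fixed values) gives niches
    that contain a similar number of spots.
    """
    n = len(axis_values)
    order = sorted(range(n), key=lambda i: axis_values[i])
    niches = [0] * n
    for rank, i in enumerate(order):
        niches[i] = rank * n_niches // n + 1
    return niches
```

For six spots with axis values 0.9, 0.1, 0.5, 0.3, 0.7 and 0.2, each niche receives two spots.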

Spatial cell type annotation

The cell-type deconvolution of spatial bin 50 spots was carried out using RCTD v2.2.156. RCTD uses a supervised learning approach in which the integrated scRNA-seq data serve as a reference to decompose each spatial spot and predict the cell types present in it. RCTD was run in full mode across all samples. We set a maximum of 3 cells per 25 μm spot, based on the assumption that an inactive T cell is approximately 8 μm, an epithelial cell is about 12 μm, and a fibroblast is around 15 μm in size. The number of cell types assigned to each cluster or niche was scaled to a range of 0 to 1 based on the total number of spots within each niche. Cell types with values greater than 0.5 were retained and visualised using a heatmap.

To calculate the cell-type-specificity score, we combined known cell-type marker genes with the top ten differentially expressed genes identified from integrated fetal and paediatric scRNA-seq data using the FindMarkers function in Seurat. The known cell-type marker genes are detailed in Fig. 2b. A complete list of the cell type-specific genes utilised in this study for calculating the specificity score can be found in Supplementary Data 2. These cell type-specific genes were used to compute the expression score via the score_genes function from Scanpy. Spots with cell type-specific scores equal to or greater than 0.75 were assigned to the corresponding cell type. The number of cell types assigned to each cluster or niche was scaled to a range of 0 to 1 based on the total number of spots within each niche. Cell types with values greater than 0.5 were retained and visualised using a heatmap.
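
One way to read the assignment and scaling steps is sketched below. It is an interpretation, not the code used in the study: the scaled value is taken to be the fraction of a niche's spots assigned to a given cell type, and the function names and toy cell-type labels are hypothetical.

```python
from collections import Counter, defaultdict

def assign_by_score(scores, cutoff=0.75):
    """Cell types whose specificity score in one spot is >= cutoff."""
    return [ct for ct, s in scores.items() if s >= cutoff]

def niche_fractions(spots):
    """spots: list of (niche, cell_types assigned to that spot).

    Returns {niche: {cell_type: fraction of the niche's spots with that type}}.
    """
    n_spots = Counter()
    hits = defaultdict(Counter)
    for niche, cell_types in spots:
        n_spots[niche] += 1
        for ct in set(cell_types):
            hits[niche][ct] += 1
    return {n: {ct: c / n_spots[n] for ct, c in hits[n].items()}
            for n in n_spots}

def retained(fractions, cutoff=0.5):
    """Cell types per niche whose scaled value exceeds the cutoff."""
    return {n: sorted(ct for ct, f in cts.items() if f > cutoff)
            for n, cts in fractions.items()}
```

Under this reading, a cell type present in exactly half of a niche's spots is not retained, because the text requires values greater than 0.5.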

Three gene co-expression

The enrichment of each gene in a three-gene combination was calculated using the Scanpy score_genes function, and spots with a score greater than a threshold of 0.1 were considered to express the gene. Red, blue and green were used to mark spots expressing only one gene, and the corresponding mixture of colours was used to indicate co-expression of two or three genes. To quantify the spots co-expressing two or more genes, we plotted a heatmap in which each cell shows the exact number of spots and the colour gradient is normalised across domains within each sample.
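
The colour coding and the co-expression counts can be sketched as follows. This is a minimal illustration that assumes per-spot enrichment scores have already been computed; the function names and the placeholder gene names are hypothetical, and the mapping of the first, second and third gene to red, blue and green follows the order given in the text.

```python
from collections import Counter

CHANNELS = ("red", "blue", "green")  # colour for gene 1, gene 2, gene 3

def expressed(scores, threshold=0.1):
    """Per-gene flags: True where the enrichment score exceeds the threshold."""
    return tuple(s > threshold for s in scores)

def spot_rgb(scores, threshold=0.1):
    """(R, G, B) colour of a spot; mixed colours mark co-expression."""
    on = dict(zip(CHANNELS, expressed(scores, threshold)))
    return (float(on["red"]), float(on["green"]), float(on["blue"]))

def count_combinations(all_scores, genes, threshold=0.1):
    """Number of spots co-expressing each combination of two or more genes."""
    counts = Counter()
    for scores in all_scores:
        flags = expressed(scores, threshold)
        combo = tuple(g for g, e in zip(genes, flags) if e)
        if len(combo) >= 2:
            counts[combo] += 1
    return counts
```

A spot expressing the first and second genes is drawn in magenta (red plus blue), and a spot expressing all three is drawn in white.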

MACSima spatial image analysis

The cyclic MACSima run for the 22 proteins stated above was performed on a paediatric sample aged 7 weeks. The image datasets for each protein were imported into the MACS iQ View Software v1.3.1 for analysis. Segmentation and manual contrast adjustments were applied to the images, and the mean fluorescent intensities for each cell were calculated against the background. The protein abundance values for each spot, together with its x and y coordinates, were exported for further analysis using Seurat. The data were log-normalised, and PCA was performed. The cells were clustered into niches based on a continuous tissue axis, and the protein abundance within each niche was visualised using a dot plot.

Signalling pathways

For the spatial transcriptomic data used in the study, cell-cell communication patterns within the cortical and medullary niches at bin 50 resolution were computed using CellChat v2.1.257. The spots were spatially constrained to 200 μm in diameter for the ligand-receptor pair interaction calculation. The communication probability was calculated with the type set to “truncatedMean”, the trim parameter set to 0.05 and the scale distance set to 0.1 for all samples. Communication pathways were retained only if they were present in at least 10 spots. The signalling role score across all the samples was obtained and normalised to a 0 to 1 scale within each sample. The overall signalling pathways with scores combining the incoming and outgoing signals were then visualised using a ggplot2 v3.5.1 tile graph.
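
The within-sample normalisation of signalling role scores is not specified further in the text. The sketch below assumes simple min-max scaling, which is one common way to map scores onto a 0 to 1 range; the function name and the example pathway scores are hypothetical.

```python
def minmax_scale(scores):
    """Rescale one sample's pathway scores to the range 0 to 1.

    scores: {pathway: combined incoming + outgoing signalling role score}
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:                      # all pathways equal: avoid division by zero
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}
```

Scaling each sample separately makes pathway rankings comparable across samples with different sequencing depth, at the cost of discarding absolute differences between samples.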

Identification of mimeTFs in spatial transcriptome

Since mimetic TECs are a group of TECs with distinct expression of TFs that are usually from non-thymic lineages, we used an in silico approach to screen for all possible TFs expressed by these TECs. The pipeline was implemented in Python using Scanpy v1.9.6, pandas v2.2.3, and matplotlib v3.7.5.

Step I: Identifying and extracting the TEC spots

The mimetics are TECs, and the scRNA-seq data show that EPCAM is expressed only by TECs in the thymus. We therefore used EPCAM, a pan-TEC marker, to mark the spots that contained TECs. Spots with EPCAM expression greater than one were selected.

Step II: Selecting the TFs that are expressed in the TEC spots

The mimetic TECs are known to express lineage TFs; hence, 1639 known human TFs were obtained from the previously published human transcription factor database58. Next, within the EPCAM+ spots, the TFs expressed with normalised expression levels greater than 10 and in at least three spots on the tissue were selected.

Step III: Determination of TF expression that is specific to TECs

Next, the ZNF TFs within the filtered TFs were removed, as the biological function of these TFs is complex to interpret. A spot in which a TF is expressed may contain other cell types in addition to TECs. To ensure that the TFs are expressed by the TECs, we used publicly available scRNA-seq data on EPCAM+ sorted TECs5,17. The TFs were further sub-selected based on whether they had a normalised expression level of at least five in TECs from the scRNA-seq data.

Step IV: Determining spatially specific TFs

Based on the distribution of spots, the TFs were assigned to the cortex, medulla, and septa domains using a majority voting approach. Seventy TFs were assigned to the medulla, 138 to the cortex, and 46 to the septa. The TFs were present in 2674 spots across all samples, which accounts for 0.37% of all the tissue spots profiled. The TFs assigned to the medulla were referred to as mimeTFs and were present in 783 spots across all ages.
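
Steps I to IV can be summarised in a compact sketch. It is a toy re-implementation in plain Python under stated assumptions, not the Scanpy/pandas pipeline used in the study: the function name screen_tfs and the input layout are hypothetical, the Step II criterion is read as "expression greater than 10 in at least three EPCAM+ spots", and ties in the majority vote are resolved by first occurrence.

```python
from collections import Counter

def screen_tfs(spots, known_tfs, tec_sc_expr, epcam_min=1,
               spot_expr_min=10, min_spots=3, sc_expr_min=5):
    """Return {TF: assigned domain} for TFs that pass all four steps.

    spots       : list of {"domain": str, "expr": {gene: normalised expression}}
    known_tfs   : iterable of TF gene symbols (e.g. from a TF database)
    tec_sc_expr : {TF: normalised expression in sorted-TEC scRNA-seq}
    """
    # Step I: keep spots that contain TECs (EPCAM expression above the cutoff)
    tec_spots = [s for s in spots if s["expr"].get("EPCAM", 0) > epcam_min]
    assigned = {}
    for tf in sorted(known_tfs):
        # Step III (first part): drop zinc-finger (ZNF) TFs
        if tf.startswith("ZNF"):
            continue
        # Step II: TF expressed above the cutoff in enough TEC spots
        hits = [s for s in tec_spots if s["expr"].get(tf, 0) > spot_expr_min]
        if len(hits) < min_spots:
            continue
        # Step III (second part): TF must also be expressed in TEC scRNA-seq
        if tec_sc_expr.get(tf, 0) < sc_expr_min:
            continue
        # Step IV: majority vote over the domains of the expressing spots
        assigned[tf] = Counter(s["domain"] for s in hits).most_common(1)[0][0]
    return assigned
```

TFs assigned to the medulla by this vote would correspond to the mimeTFs; lowering spot_expr_min to 5 gives the relaxed cutoff described for lower-depth data.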

Spatial multi-omics data processing

We performed spatial transcriptomics and proteomics on thymus tissue using Stereo-CITE. The SAW v7.1.2 pipeline processed the spatial image and transcriptome to obtain the gene expression profile for each DNB spot. The sequencing reads of the human thymus were aligned to the GRCh38 v32 reference genome. The Stereo-CITE data were analysed at bin 20 (10 μm) resolution, which is closer to the size of an epithelial cell. The transcriptome data were log-normalised using Scanpy functions. The proteome data for each protein were normalised across spots using SpatialGlue v1.1.559.

Identification of mimeTFs in spatial multiome

The Stereo-CITE spatial transcriptomics and proteomics were carried out simultaneously on the same tissue section of the P:7w sample, resulting in 635,400 spots. The analysis was performed at bin 20 resolution, and the cortex, medulla and septa domains were defined using the marker genes PRSS16 and AIRE in the transcriptome data. The niches were defined using the continuous tissue axis, as for the transcriptome-only data. Proteome spots with CD326 (EPCAM protein) abundance greater than the mean of its distribution were selected as EPCAM+ protein spots. The corresponding EPCAM+ transcriptome spots were then used to identify mimeTFs, following the approach described in the previous section. The only exception was that the gene expression requirement for a TF in EPCAM+ spots was reduced from 10 to 5 to accommodate the lower gene expression profile at bin 20 resolution. The mimeTFs were present in 17,459 spots in the medulla, which is 2.7% of the tissue.
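
The protein-based selection of EPCAM+ spots and the reported tissue percentage can be checked with a short sketch. The function names and input layout are hypothetical; only the mean-based cutoff and the spot counts come from the text.

```python
from statistics import mean

def epcam_positive(cd326):
    """Spots whose CD326 (EPCAM protein) abundance is above the mean.

    cd326: {spot_id: normalised CD326 abundance}
    """
    cutoff = mean(cd326.values())
    return {spot for spot, value in cd326.items() if value > cutoff}

def percent_of_tissue(n_hits, n_total):
    """Share of all tissue spots, in percent."""
    return 100 * n_hits / n_total
```

With the counts given above, percent_of_tissue(17459, 635400) rounds to 2.7.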

Defining KRT1-high regions

Since KRT1 expression was seen in the medulla and low-quality (LQ) regions, the determination of KRT1-high spots was restricted to these regions. Spots with KRT1 expression greater than zero were grouped as KRT1+ and those with expression greater than 10 as KRT1++. Collectively, these spots are called KRT1-high regions. The relative frequency of occurrence of mimeTFs within the KRT1-high spots was determined.
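
The two-level classification and the relative frequency can be sketched as follows. This is an illustration under assumptions: the function names and input layout are hypothetical, spots above 10 are labelled KRT1++ rather than KRT1+, and the relative frequency is read as the fraction of KRT1-high spots in which a given mimeTF is detected.

```python
def krt1_class(krt1_expr, region):
    """Label a spot KRT1+ or KRT1++; None outside medulla/LQ or if not expressed."""
    if region not in ("medulla", "LQ") or krt1_expr <= 0:
        return None
    return "KRT1++" if krt1_expr > 10 else "KRT1+"

def mimetf_frequency(spots, mimetfs):
    """Fraction of KRT1-high spots (KRT1+ or KRT1++) in which each TF is detected.

    spots: list of {"region": str, "KRT1": float, "tfs": set of detected TFs}
    """
    high = [s for s in spots if krt1_class(s["KRT1"], s["region"]) is not None]
    if not high:
        return {}
    return {tf: sum(tf in s["tfs"] for s in high) / len(high) for tf in mimetfs}
```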

Differential GO function

The spots expressing mimeTFs were grouped, and differential expression analysis was carried out for GO_Biological_Process_2023 using drug2cell v0.1.060. The top 10 differential gene ontology terms were identified, and development-related terms were highlighted.

Co-localisation of mimeTFs

The co-localisation of mimeTF expression with other cell types was computed based on the cell-type-specificity score. Additionally, the co-localisation of different mimeTFs in the same spot was calculated based on the number of spots expressing mimeTFs. The co-localisation network was visualised using Networkx v3.161 and Cytoscape v3.10.362. Nearest neighbours within five bin 50 spots (100 μm) and within 10 spots (200 μm) were calculated using the Euclidean distance between the x-y coordinates.
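
The neighbour search can be sketched as a radius query on spot coordinates. This brute-force version is for illustration only and assumes coordinates already expressed in micrometres; the function name is hypothetical, and a spatial index such as a k-d tree would be preferable for hundreds of thousands of spots.

```python
from math import dist

def neighbours_within(coords, radius_um):
    """Map each spot to the sorted ids of other spots within radius_um.

    coords: {spot_id: (x_um, y_um)}; distances are Euclidean.
    """
    ids = sorted(coords)
    return {a: [b for b in ids
                if b != a and dist(coords[a], coords[b]) <= radius_um]
            for a in ids}
```

Calling the function with 100 and then 200 gives the two neighbourhood sizes described in the text.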

TF knockout phenotypes and disease phenotypes

For each TF identified, the developmental knockout effect was obtained from the experimental phenotypes reported in the ZFIN database37, which comprises zebrafish knockout phenotypes. mimeTFs associated with human diseases were obtained from the Disease Ontology database38. Affected organ labels were assigned manually based on the affected organ in the zebrafish knockout phenotype or human disease.

Public Visium data analysis

The publicly available spatial Visium data were obtained from a previous study9. A similar procedure was employed to compute the mimeTFs from the spatial spots. Spots that expressed EPCAM greater than one and were located in the medulla region were selected. Since the Visium data had lower resolution, the number of genes detected was comparatively lower. We therefore reduced the gene expression cutoff for selecting TFs in EPCAM+ spots from 10 to 5. TFs that were expressed in at least three spots were then selected as mimeTFs.

TF and target gene analysis

The TF-based gene-regulatory network for the spatial mimeTF spots was determined using pySCENIC v0.12.163 and cistargetDB with the TFs and targets in the human genome. The target genes were screened for tissue-specific genes from the Human Protein Atlas39 and for antigen processing and presentation genes from the KEGG pathway.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.