Main

The emergence of somatic diversification of antigen receptor genes required a radical re-organization of adaptive immune facilities to avoid the potentially fatal autoimmunity associated with quasi-random receptor specificities. It is conceivable, however, that the initial diversity of antigen receptor repertoires in early vertebrates was much lower than that of extant vertebrates13, allowing a grace period during which appropriate tolerance mechanisms could have evolved in step with a gradually increasing diversity of antigen receptor repertoires. For instance, the emergence of new antigen presentation pathways and/or the repurposing of ancient antigen presentation pathways14 is thought to have contributed to the diverse central and peripheral tolerance mechanisms that characterize the extant vertebrate immune systems. In the thymus, central tolerance is induced by at least two functionally connected mechanisms. The expression of peripheral genes by medullary epithelial cells1 and the presence of peripheral mimetic cells2,3,4,5,6,7 combine to provide developing T cells with a means to probe the specificity of their antigen receptors against the universe of peripheral self antigens15. Peripheral cell types, such as muscle-like cells, goblet and mucous cells, have long been known to be present in the thymic microenvironment of many different vertebrate species9,16,17. For instance, as early as 1905, Hammar concluded that myoid cells are peculiarly differentiated epithelial cells and therefore unrelated to true muscle cells17, although their origin from extrathymic tissue sources has later also been considered18. A tolerance-inducing function of peripheral cell types in the thymic microenvironment was suggested more than 50 years ago19, but this attribute of thymic mimetic cells has only recently been experimentally verified2,3,4,5,6,7. The presence of peripheral cell types in the thymus of all jawed vertebrates9 prompted us to examine further aspects of their development and evolutionary origin.

Development of mimetic TEC subsets

Focusing on major thymic mimetic cell types2, we re-examined the cellular heterogeneity of EPCAM+CD45 thymic epithelial cells (TECs) at various developmental time points, for which we had already identified two progenitor populations, in addition to cortical TECs (cTECs) and medullary TECs (mTECs)20. Signature gene sets for mimetic cell types were derived from single-cell differential gene expression data2 and used as input to AUCell21 to score each signature per cell (Extended Data Fig. 1a,b). Cells with signatures of early progenitors and cTECs dominate the epithelial compartment at embryonic day 16.5 (E16.5), whereas those characterizing postnatal progenitors and mTECs constitute the majority at postnatal day 28 (P28); at birth (P0), the epithelial populations have an intermediate composition (Extended Data Fig. 1c and Supplementary Table 1). The 11 mimetic cell types2 examined here are robustly detected at P28 (Extended Data Fig. 1c). Whereas at P28 essentially all Aire-stage cells express the genes encoding the tolerogenic factors AIRE22 and FEZF223, only about a quarter of mimetic cells express Aire (compatible with their post-Aire phenotype2); by contrast, more than 70% of mimetic cells express Fezf2 (Extended Data Fig. 1d). At E16.5, mimetic cells are essentially undetectable, with the exception of a few muscle and goblet cells (Extended Data Fig. 1c). At P0, muscle cells represent a major fraction of mimetic cells; at this stage, Myog-expressing cells often occur in small medullary clusters (Extended Data Fig. 2a,b), whereas they appear to be more scattered at P28 (Extended Data Fig. 2c). Enumeration of Myog-expressing cells by RNA in situ hybridization (ISH) indicates that they are about three times more frequent at P0 than at P28 (Extended Data Fig. 2d), validating the single-cell RNA-sequencing (scRNA-seq) data (Extended Data Fig. 1c and Supplementary Table 1). Other mimetic cell types are also predominantly found in the medulla (Extended Data Fig. 2e,f). To quantify these changes among cell types during development, we performed differential abundance analysis of cellular compositions using scCODA24 (Extended Data Fig. 1e). Compared with the P28 time point, TECs at E16.5 exhibit a greater abundance of early progenitors and cTECs, and fewer postnatal progenitors, mTECs, Aire-stage cells and tuft cells. At P0, we still found larger early progenitor and cTEC populations, and slightly smaller muscle populations (Extended Data Fig. 1e).

However, the sensitivity of differential abundance analysis for rare cell types (such as the thymic mimetic cells) is relatively low, complicating its use for comparative studies across a variety of developmental and evolutionary conditions. We therefore turned to the use of bulk RNA-sequencing (RNA-seq) analysis of purified TECs as a more versatile analytical approach. To identify relative shifts among canonical and mimetic TEC signatures, we performed competitive enrichment analysis of specific cell signatures against relevant control conditions25 (Fig. 1a,b and Supplementary Fig. 1). At E15.5 and P0, early progenitor and cTEC signatures were highly enriched compared with P28, whereas mTEC and most mimetic signatures were depleted (Fig. 1a,b), mirroring the scRNA-seq data (Extended Data Fig. 1c). By contrast, signature gene sets of ciliated, goblet, ionocyte and muscle cells showed less variation over developmental time (Fig. 1a,b and Supplementary Fig. 2). These observations indicated that individual groups of mimetic cells may develop in an asynchronous fashion, possibly related to the successive activities of the previously identified early and postnatal TEC progenitors20.

Fig. 1: Development of mouse thymic mimetic cells.
figure 1

Enrichment of canonical TEC and mimetic signatures in bulk RNA-seq data of purified TECs from E15.5 (n = 4) and P1 (n = 2) time points compared with the P28 (n = 3) time point. a, Visualization of changes in gene expression between conditions. Each line represents a gene of the indicated signature, and its position on the x axis shows the value of the t-statistic derived from differential expression analysis. Numeric values listed in the left column represent log10(adjusted P) from enrichment analysis with camera25. Log-transformed P values of upregulated sets were multiplied by −1 so that positive values indicate upwards directionality and negative values indicate downwards directionality. b, Log-transformed and signed P values from camera (two-sided, Benjamini–Hochberg adjusted) as shown in a, represented as a heat map. Values beyond the limits of the colour scale were rounded to the nearest limit. Adj., adjusted. c, Overall Foxn1 expression levels (log2 of counts per 10,000 reads (CP10K)) in Aire-stage cells and mimetic cells from scRNA-seq data (n = 4 mice; age, P28). The proportion of cells with detectable Foxn1 is indicated; P = 1.2 × 10−33 (two-sided binomial test, Benjamini–Hochberg adjusted). d, Proportion of cells with detectable CRISPR–Cas9-induced barcodes at the Hprt locus for each signature. n indicates total number of cells, pooled from three mice at P28. Data are mean ± 95% confidence interval of proportions (Wilson/Brown method51); P values are derived via likelihood ratio test between logistic regressions with or without ‘signature’ as a predictor.

Source data

To explore this possibility further, we examined the expression levels of the Foxn1 gene, which encodes a key transcription factor that is required for the differentiation of canonical TECs26,27,28, in the major TEC populations and mimetic cells. At P28, the cTEC population exhibits the highest proportion of Foxn1-expressing cells (Extended Data Fig. 3a). When compared with Aire-stage cells (Fig. 1c), fewer mimetic cells express Foxn1, although some differences exist among the various mimetic cell types (Extended Data Fig. 3b). Using a Foxn1-directed CRISPR–Cas9 barcoding system20, we found that both canonical TEC populations and mimetic cells were marked by the presence of barcodes (Extended Data Fig. 3c) in similar proportions (Fig. 1d); as only few muscle cells could be detected in this experiment, no conclusion can be reached for this cell type. When analysed at the level of individual informative barcodes, it becomes apparent that mimetic cells as a group share their barcodes more often (but not exclusively) with postnatal progenitors than with early progenitors (Extended Data Fig. 3c). This difference can be explained by the small number of mimetic cells such as muscle cells that belong to the developmentally early group, and the large expansion of the postnatal progenitor population and its mimetic descendants, such as enterohepatic and skin cells (Extended Data Fig. 1c). The spatial distribution of mimetic cells as analysed by multicolour RNA ISH suggests a close affinity of Aire-expressing cells to the Igfbp5-expressing20,29 postnatal progenitor population, as expected; the same is true for Foxi1-expressing ionocytes, whereas Foxj1-expressing ciliated cells are not found in close vicinity to the presumptive postnatal progenitor population (Extended Data Fig. 4). Likewise, reconstruction of the developmental trajectory of mimetic cells with CellRank29,30 identified some mimetic cell types, such as ciliated cells (Extended Data Fig. 5 and Supplementary Fig. 3) and muscle cells (Supplementary Fig. 3), as early branching mimetic cell types. We conclude that mimetic cells and canonical TECs initially pass through a developmental stage marked by Foxn1 gene expression but subsequently follow different developmental trajectories.

Genetic determinants of mimetic cells

We sought additional clues to support the hypothesis of a developmentally orchestrated appearance of mimetic cells in the thymic microenvironment by analysis of a number of genetic models. To this end, we first explored whether the presence and proportion of mimetic cells is affected by mouse strain and size of the thymus (https://phenome.jax.org/measureset/10415) (Supplementary Fig. 4). Compared with strains such as C57BL/6 or CBA, the thymus of PWK mice supports approximately half of the number of thymocytes (Fig. 2a), and about 5% the number of TECs (Fig. 2b). The thymi of F1 hybrids of CBA and PWK mice exhibit intermediate values for these parameters31 (Extended Data Fig. 6). Phenotyping of reciprocal backcrosses revealed clear separations in the distributions for thymocytes and TEC numbers (Extended Data Fig. 6), suggesting that thymopoietic activity is determined by only few large-effect modifiers. Comparative analysis of TEC transcriptomes indicated only subtle differences; notably, whereas an enrichment of the cTEC signature was observed, no significant differences among mimetic cell types were detected when comparing PWK mice to C57BL/6 mice (Fig. 2c, Extended Data Fig. 7a and Supplementary Fig. 5a). Thus, despite the size differences of the organ, the representation of mimetic cells in the thymic microenvironment appears to be unchanged.

Fig. 2: Malleability of mimetic cell populations.
figure 2

Thymic cellularity of C57BL/6 (B6), PWK and CBA strains at P28. a, Absolute numbers of CD45+ thymocytes. b, Absolute numbers of EPCAM+CD45 TECs. a,b, n is indicated in panels; each data point represents one thymus explant from one animal. P values between groups were derived from pairwise two-sided t-tests with Bonferroni correction after significant (P < 0.05) ANOVA. Boxes encapsulate the first to third quartile, the line indicates the median and whiskers extend to the furthest point with a distance of up to 1.5 times the interquartile range from the boxes. cf,h, Signature enrichment of canonical TEC and mimetic signatures in bulk RNA-seq data of purified TECs for PWK versus C57BL/6 (c), Foxn1+/– versus wild type (WT) (d), tgBmp4;Foxn1+/+ and Ascl1–/–;Foxn1+/+ versus wild type (e), tgFgf7;Foxn1+/+ and tgFgf7;Foxn1+/− versus wild type (f) and YFP+mCardinal+ versus YFP+mCardinal (h). Wild-type PWK, n = 4; wild-type C57BL/6, n = 3; Foxn1+/−, n = 4; tgBmp4;Foxn1+/+, n = 3; Ascl1−/−, n = 3; tgFgf7;Foxn1+/+, n = 3; tgFgf7;Foxn1−/−, n = 3; YFP+mCardinal+, n = 2; YFP+mCardinal, n = 2. ch, Values beyond the limits of the colour scale were rounded to the nearest limit. Adj., adjusted. g, Schematic illustrating the principle of the YFP/mCardinal dual reporter system. Activity of the Foxn1 promoter at any time point during development results in Cre recombinase expression and permanent activation of YFP expression. Acute activity of the Foxn1 promoter is assessed by the mCardinal (mCard) reporter.

Source data

Given the known role of Foxn1 for the TEC differentiation and maturation processes, we next examined the phenotype of Foxn1+/− heterozygous mice, as Foxn1 haploinsufficiency is known to be associated with a smaller thymus32. We observed a relative increase of early progenitor and cTEC signatures, a less pronounced enrichment of the postnatal progenitor signature, and a depletion of the pan-mTEC signature, compatible with a partial block of TEC differentiation (Fig. 2d, Extended Data Fig. 7b and Supplementary Fig. 5b). Among the mimetic cell signatures, a reduction was observed for skin (Fig. 2d), compatible with the known extrathymic function of Foxn1 in the differentiation of keratinocytes33. However, other mimetic signatures and the Aire-stage mTECs were either unchanged or only marginally affected (Fig. 2d). We conclude that only minor quantitative—but no qualitative—changes occur in the mimetic cell pool in Foxn1-heterozygous mice.

To explore the developmental origin of the mimetic cell compartment further, we used additional models that were suitable for genetic interference with the maturation of the thymic microenvironment. First, we overexpressed Bmp4 under the control of the Foxn1 promoter (tgBmp4;Foxn1+/+) to induce an immature thymic microenvironment34,35, and analysed the relative shifts of mimetic cell signatures in the transgenic thymus at P28. Compared with non-transgenic mice, the signature of the early progenitor population was increased, as expected (Fig. 2e, Extended Data Fig. 7c and Supplementary Fig. 5c). However, changes in the gene signatures of the different mimetic cell populations were again not uniform; ciliated, goblet and muscle cell signatures were enriched, whereas others remained largely unchanged (Fig. 2e), reminiscent of the relative increase in the former group of mimetic signatures at earlier time points in non-transgenic mice (Fig. 1b). Second, using Ascl1–/–;Foxn1+/+ mice, we examined the effect of TEC-specific inactivation of Ascl1, which encodes a neuronal tissue-specific transcription factor36 that is also highly expressed in cells of the postnatal progenitor population (Extended Data Fig. 7d). In keeping with the expression profile of Ascl1 (Extended Data Fig. 7d), we observed a shift among the canonical TEC signatures; the mTEC signature was decreased, whereas the cTEC and both progenitor cell signatures were correspondingly increased. Under these conditions, the greatest reductions among mimetics were observed for neuroendocrine and tuft cell signatures as noted previously6. This finding is indicative of an affinity of these cell types to the neuronal lineage (Fig. 2e, Extended Data Fig. 7c and Supplementary Fig. 5c). Third, we examined the effect of Fgf7 overexpression (via tgFgf7;Foxn1+/+ mice) in the thymic microenvironment20, which in accordance with the expression profile of the cognate Fgfr2 receptor gene (Extended Data Fig. 7d), led to an enrichment of progenitor signatures and a decrease in the mTEC signature20, whereas other signatures were not significantly affected (Fig. 2f, Extended Data Fig. 7e and Supplementary Fig. 5d). Combining the effects of Fgf7 overexpression and Foxn1 haploinsufficiency (via tgFgf7;Foxn1+/− mice) exacerbated the maturation block of TECs, accompanied by a depletion of skin and tuft cell signatures (Fig. 2f, Extended Data Fig. 7e and Supplementary Fig. 5d). Collectively, these results support the notion of unexpected developmental heterogeneity among mimetic cell types and their malleability through extrinsic signals.

Although the precursors of mimetic cells pass through a stage of Foxn1 gene expression (Fig. 1d), their development and/or maintenance may not depend on the activity of the FOXN1 transcription factor itself. To address this possibility, we turned to a triple-transgenic mouse model20, which enabled us to distinguish TECs that were actively transcribing the Foxn1 gene from those that ceased expression of this gene (Fig. 2g). Activation of the Foxn1 gene during any time of TEC development indelibly marks such cells with YFP fluorescence. This is achieved by the combination of two transgenes; the Foxn1 promoter drives the expression of the Cre recombinase, which results in the excision of a stop cassette in the Rosa26 locus to allow YFP gene expression. The additional presence of a Foxn1-mCardinal transgene marks cells with red fluorescence as a sign of acute Foxn1 promoter activity (Fig. 2g). RNA-seq analysis of YFP+mCardinal+ (acutely Foxn1-expressing) TECs and YFP+mCardinal (acutely Foxn1-negative, but with a history of Foxn1 expression) TECs indicated a preponderance of the mTEC gene signature in the former; correspondingly, gene signatures characteristic for cTECs and the two progenitor populations were underrepresented (Fig. 2h, Extended Data Fig. 7f and Supplementary Fig. 6a). Likewise, Aire-stage, enterohepatic, microfold, pancreatic and skin gene signatures were enriched in YFP+mCardinal+ TECs, whereas ciliated, goblet, ionocyte, lung (basal), neuroendocrine and tuft signatures were not grossly changed, or slightly reduced (muscle cell signature) (Fig. 2h and Extended Data Fig. 7f), hinting at a cell-specific effect of FOXN1 transcription factor levels for mimetic cell development.

Foxn1 gene and mimetic cell development

Previously, we showed that two distinct progenitor populations contribute to the thymic microenvironment20, a cTEC-biased early progenitor, and an mTEC-biased postnatal progenitor. Analysis of the cTEC/mTEC ratio via flow cytometry, using Ly51 (also known as CD249) and Ulex europaeus agglutinin-1 (UEA1) as markers for cTECs and mTECs, respectively, indicated a prominent nadir at around 3 weeks of age (P21) (Fig. 3a). We reasoned that this inflection point might mark the transition period during which the TEC compartment becomes less dependent on the early progenitor and more reliant on its postnatal counterpart.

Fig. 3: FOXN1 influences the development of mimetic cells.
figure 3

a, Ratio of Ly51+ (cTEC) and UEA1+ (mTEC) EPCAM+CD45 TECs cells over developmental time. Each point shows the ratio from one thymus. The trend line was determined via LOESS with tenfold cross-validation. The minimum occurs at P21. Error bands show 95% confidence interval around the predicted trend line. b, Multiple sequence alignment of the C-terminal end of FOXN1 protein sequences encoded by exon 2 of the gene in various jawed vertebrate species; identical amino acid residues are shaded. Cm, Callorhinchus milii; Dr, Danio rerio; Xl, Xenopus laevis; Gg, Gallus gallus; Oa, Ornithorhynchus anatinus; Mm, Mus musculus. Sequences obtained from ref. 38. c, Schematic illustrating the coding content of Foxn1 exons and the deleted 3′ region of Foxn1 coding exon 2 in the Δ3ex2 mutant; three exons contribute to the DNA-binding domain as indicated. d, Principal component analysis of bulk RNA-seq samples from purified TECs of Δ3ex2 mutants, collected from mice around P21. e, Numbers of CD4/CD8-double positive thymocytes in thymi of wild-type FVB mice and Δ3ex2 mutants during collapse and recovery phases. n is indicated; each data point is one thymus explant from one animal. P values between groups were derived from pairwise two-sided t-tests with Bonferroni correction after significant (P < 0.05) ANOVA result. Boxes encapsulate the first to third quartile, the line indicates the median and whiskers extend to the furthest point with a distance of up to 1.5 times the interquartile range from the boxes. f, Signature enrichment analysis in bulk RNA-seq data of purified TECs in Δ3ex2 mice (FVB (wild type), n = 6; collapse, n = 9; recovery, n = 15). P values from camera (two-sided, Benjamini–Hochberg adjusted) were log-transformed and multiplied by −1 for upregulated sets, so that positive values indicate upwards directionality and negative values indicate downwards directionality. Values beyond the limits of the colour scale were rounded to the nearest limit. Adj., adjusted.

Source data

When the endogenous Foxn1 gene is replaced in mice by its more ancient paralogue Foxn4, the demarcation of cortex and medulla is impaired37,38. Indeed, domain-swap experiments demonstrated that sequences in the second coding exon of Foxn1 are particularly important to establish the maturation of the epithelial compartment38, and thus may be required for the developmental transition from early to postnatal TEC progenitor activities. A multiple sequence alignment of FOXN1 proteins from representative species of jawed vertebrates indicated that the sequences encoded by the 3′ part of this exon are common to FOXN1 proteins of all jawed vertebrates (Fig. 3b). Accordingly, we generated a mouse strain (Δ3ex2) that expresses a modified FOXN1 protein lacking this domain (Fig. 3c). In this strain, the embryonic and perinatal development of the thymus proceeded normally; however, thymopoietic activity collapsed in the third postnatal week, coincident with the nadir in Ly51/UEA1 ratios. Principal component analysis of transcriptomes of purified TECs collected at this time point separated the mice into two groups (Fig. 3d), mirrored in altered thymic cellularities (Fig. 3e and Extended Data Fig. 8a) that are suggestive of inactive (collapsing) and active (recovering) microenvironments. Enrichment analysis of cell-type-specific signatures indicated that during the phase of diminished thymopoietic activity (Extended Data Fig. 8a and Supplementary Fig. 6b), the epithelial compartment exhibits signs of a bias towards an immature stage, with a prominent enrichment of the early progenitor in conjunction with depletion of mTEC and Aire-stage signatures (Fig. 3f). Of note, with respect to mimetic cell signatures, we found that the signatures of enterohepatic, microfold, pancreatic and skin cells were significantly reduced (Fig. 3f and Extended Data Fig. 8b), again highlighting the association of ongoing mTEC maturation and their development. The mimetic cell composition of thymopoietically active thymi (that is, those in the recovery phase) was essentially indistinguishable from the wild type, accompanied by a slight increase in early progenitor and cTEC signatures (Fig. 3f, Extended Data Fig. 8b and Supplementary Fig. 6c), strongly suggesting that only some mimetic cell types (for instance, those with an enterohepatic signature) arise from the postnatal progenitor population.

Next, we directly examined the possibility that the degrees of dependence on FOXN1 transcription factor activity may substantiate the classification of mimetic cell types. To this end, we isolated TECs from Foxn1-deficient mice. As expected, mTEC and Aire-stage signatures were clearly reduced, alongside enterohepatic, microfold, pancreatic, skin and tuft cell signatures (Fig. 4a, Extended Data Fig. 9a and Supplementary Fig. 7). By contrast, expression levels of signature genes characteristic of other mimetic cells, such as ciliated, goblet and muscle cells, did not differ from those of Foxn1-wild-types, whereas those of ionocytes were increased (Fig. 4a and Extended Data Fig. 9a). The results of RNA ISH experiments confirmed the presence of cellular heterogeneity in the Foxn1-deficient epithelium (Fig. 4b and Extended Data Fig. 9b). For instance, the expression pattern of Foxi1, encoding a key transcription factor for ionocytes39, overlapped at least partially with that of Foxn1 (Fig. 4b). By contrast, cells expressing Foxj1, encoding the key transcription factor of ciliated cells40, were found in areas devoid of Foxn1 gene activity, suggesting that the FOXN1 transcription factor is not required for their development in the thymic anlage41. To directly assess whether the ciliated cells in the Foxn1-deficient thymic rudiment match their counterpart in the Foxn1-sufficient thymic microenvironment, we determined their transcriptional signatures by single-nucleus RNA-seq (snRNA-seq) (Extended Data Fig. 9c). Cells compatible with a ciliated signature formed a singular cluster comprising cells from both genotypes, indicative of highly similar overall transcriptomic landscapes (Fig. 4c). This indicates that the differentiation of ciliated cells in the thymic microenvironment is independent of FOXN1 transcription factor activity. Owing to the distorted differentiation of Foxn1-deficient epithelium (Extended Data Fig. 9c), other TEC subsets, such as ionocytes, are more difficult to reliably compare between genotypes (Fig. 4d and Extended Data Fig. 9d,f). Unlike Foxj1 and Foxi1, expression of Hnf4a, a key transcription factor of the enterohepatic lineage, was not detectable in the Foxn1-deficient epithelium (Fig. 4b), indicating that development of enterohepatic cells5 depends, at least transiently, on FOXN1 transcription factor activity. Thus, whereas most mimetic cell types require Foxn1 activity for their development, some cell types, such as ciliated cells, can develop in the absence of FOXN1.

Fig. 4: Requirement of Foxn1 for mimetic cell development.
figure 4

a, Signature enrichment analysis in bulk RNA-seq data of purified TECs of Foxn1−/− mutants compared with wild-type controls (wild type, n = 3; Foxn1−/−, n = 4). P values from camera (two-sided, Benjamini–Hochberg adjusted) were log-transformed and multiplied by −1 for upregulated sets, so that positive values indicate upwards directionality, and negative values indicate downwards directionality. Values beyond the limits of the colour scale were rounded to the nearest limit. Adj., adjusted. b, Micrographs of thymic rudiments in Foxn1−/− mice after RNA ISH with the indicated probes; top and bottom rows depict consecutive sections. Cells expressing the indicated genes are labelled in blue. Data representative of five mice. Scale bars, 0.1 mm. c, Characterization of the ciliated and tuft cell clusters (see Extended Data Fig. 9c). Each dot represents a cell, with the genotype of origin indicated by colour (left). Expression levels of Foxj1 (middle) and Pou2f3 (right) are provided as log2-normalized counts. d, Contributions of individual samples to the indicated clusters; genotypes are coloured and replicates are identified by different shades.

Source data

Evolutionary history of mimetic cells

To explore the evolutionary history of mimetic cells, we examined the thymus of the cartilaginous fish Scyliorhinus canicula as a representative of the most basal jawed vertebrate group. We examined the presence and location of cells expressing characteristic mimetic genes (Extended Data Fig. 10a and Supplementary Fig. 8) in the thymus (Fig. 5a–c and Extended Data Fig. 10b) by RNA ISH. Similar to the situation in mice2 (Extended Data Fig. 2) and zebrafish7, putative mimetic cells, such as those marked by the expression of TTR (a liver-specific gene42), FOXI1, FOXJ1 and POU2F3 (a marker of tuft cells) were found in the medullary region (Fig. 5d,e and Extended Data Fig. 10c). FOXN1 and its paralogue FOXN4 were expressed in the thymic microenvironment of cartilaginous43 (Fig. 5c and Extended Data Fig. 10c) and bony fishes37.

Fig. 5: Evolutionary trajectory of mimetic cells.
figure 5

a, Macroscopic view of the gill basket of a juvenile S. canicula specimen; gc, gill chamber; sc, spinal chord; H&E, haematoxylin and eosin. b, Higher magnification of the thymus region in a, showing the cortical (c) and medullary (m) structures of the thymus. Images in a,b are representative of n = 3 animals. c, Micrographs of the shark thymus after RNA ISH with FOXN1 (blue). d,e, Micrographs of the shark thymus after RNA ISH with TTR (d) and FOXI1 (e). Images on the right show the medullary regions at higher magnification. ce, Images are representative of n = 2 animals. f, Signature enrichment analysis of mouse Foxn1−/− mutants expressing ancient Foxn1 and Foxn4 genes under the control of the mouse Foxn1 promoter compared against corresponding wild-type controls. tgFoxn4Cm, Foxn4 gene from the cartilaginous fish C. milii (n = 4); tgFoxn1Cm, Foxn1 gene from C. milii (n = 3); tgFoxn1Cm;tgFoxn4Cm (double-transgenic mice, n = 5); wild type, n = 3. The last column represents a comparison of TECs from the two single-transgenic strains. g, Signature enrichment analysis in bulk RNA-seq of whole thymi from foxn1−/− zebrafish (n = 3) compared with foxn1+/+ wild types (n = 3). h, Signature enrichment analysis of mouse Foxn1−/− mutants expressing the Foxn4 gene from the cephalochordate B. lanceolatum (n = 4); wild type, n = 3. i, Left, micrograph depicting a gill filament of Lampetra planeri (H&E staining). thy, thymoid; sl, secondary lamellae. Right, further magnified view of the thymoid, indicating the tissue heterogeneity (H&E staining); the blood vessel is filled with nucleated erythrocytes. Image representative of n = 20 animals. jl, Micrographs of thymoids after RNA ISH with probes specific for CDA1 (j), TTR (k) and MYHC1 (l). Rows depict consecutive sections; images are representative of n = 3 animals. Scale bars: 0.1 mm (a); 0.2 mm (be); 0.1 mm (il).

Source data

We previously hypothesized that a primordial thymopoietic activity of vertebrates was driven by Foxn4, the ancestor of Foxn137,38. We suggest that the emergence of new cell types and organs, such as the liver, in the ancestor common to all vertebrates12, was accompanied by the transition from an invertebrate form of Foxn4 (via a primordial vertebrate version of Foxn4) to its vertebrate-specific paralogue Foxn1 to achieve the necessary adaptation of central tolerance mechanisms. Because genetic manipulation of cartilaginous fishes is not possible, we examined the individual thymopoietic capacities of the shark Foxn1 and Foxn4 genes using a Foxn1-replacement model. In this system, the mouse Foxn1 gene is replaced by other evolutionarily distinct members of the Foxn1/4 gene family38. The TEC compartments driven by the shark Foxn1 and Foxn4 gene of a cartilaginous fish (C. milii), the most basal vertebrate group, are not identical (Fig. 5f and Extended Data Fig. 10d), although in Foxn1;Foxn4 double-transgenic thymi (resembling the physiological situation in the shark thymus), the mimetic cell composition is more similar to the mouse wild type than is either of the two single-transgenic compartments (Fig. 5f and Extended Data Fig. 10d). The results of the reconstitution experiment thus suggest that FOXN4 and FOXN1 transcription factors in the shark thymic epithelium non-redundantly contribute to achieve coverage of peripheral cell types.

We directly tested the non-redundancy by examining the composition of the thymic microenvironment in teleosts lacking the foxn1 gene, the evolutionarily younger paralogue of foxn443. The mimetic signatures in the mutant zebrafish thymus can be categorized into two groups (Fig. 5g); goblet, ionocyte, muscle, lung (basal), neuroendocrine and skin signatures were enriched compared with those representing enterohepatic, microfold, pancreatic, tuft and ciliated cells (Fig. 5g, Extended Data Fig. 11a and Supplementary Figs. 7 and 8). This phenotype is similar to the bias introduced by Foxn4 in the shark replacement model (Fig. 5f). In the teleost thymus, additional aspects of central tolerance formation are also dependent on the evolutionarily younger FOXN1 transcription factor. For instance, expression of Psmb11, which encodes a thymus-specific component of the proteasome and is important for positive selection of CD8+ thymocytes44, is undetectable in the Foxn4-driven microenvironment (Extended Data Fig. 11b), in line with the requirement of FOXN1 for Psmb11 expression in the mouse thymus45. Moreover, despite the fact that the Aire-stage cell signature remains unchanged (Fig. 5g), the expression levels of aire itself are nonetheless reduced in the foxn1-deficient microenvironment (Extended Data Fig. 11b). Collectively, these data suggest that the contribution of foxn1 to central tolerance formation with respect to supporting the presence of mimetic cells appears to be biased towards evolutionarily more recent tissue and cell types.

To substantiate this conclusion, we examined the thymopoietic activity of the Foxn4 gene of amphioxus (Branchiostoma lanceolatum). Amphioxus is a representative of cephalochordates, a distant relative of vertebrates without adaptive immunity. In the Foxn1-replacement model, the microenvironment driven by B. lanceolatum Foxn4 supports T cell development only up to the CD4/CD8-double positive stage, and these mice exhibit autoimmune phenomena38, compatible with the pre-adaptive function of this transcription factor. Gene signature analysis of reconstituted TECs revealed an enrichment of the early progenitor and cTEC signatures, and a paucity of mTEC and Aire-stage cells (Fig. 5h, Extended Data Fig. 11c and Supplementary Fig. 7), indicative of blocked maturation of the epithelium. Notably, whereas gene signatures for ciliated, goblet, ionocyte and muscle cells were not significantly different from those of wild-type TECs, we noted a distinct reduction of other mimetic cell signatures, which in the mouse environment likely causes incomplete representation of peripheral self antigens. Indeed, with the exception of the notable presence of the characteristic cTEC signature, the pattern in B. lanceolatum Foxn4 transgenic mice (Fig. 5h) is essentially indistinguishable from that of Foxn1-deficient mice (Fig. 4a), in which T cell development is completely blocked and hence no autoimmunity is observed. We conclude that also in this evolutionary reconstruction, the group of ciliated, goblet, ionocyte and muscle cell behaves distinctly differently from the other mimetic cell types, revealing an evolutionarily plausible trend of stepwise addition of mimetic cell types. The development of certain mimetic cells, such as the enterohepatic group, are clearly dependent on the formation of the mTEC compartment, the full maturation of which is driven by FOXN1. This group of mimetic cells represents cell types that are vertebrate-specific innovations (no liver is detectable outside the vertebrate lineage12), whereas other mimetic cell types, such as goblet, muscle and others, are representatives of evolutionarily more ancient cell types and thus may have been added to the portfolio of mimetic cells at an earlier stage of vertebrate evolution.

Considering the present results and earlier work2,7, peripheral antigen expression in the thymus appears to be a feature of all jawed vertebrates, ranging from sharks to humans. To explore the evolutionary origin of this central tolerance mechanism further, we turned our attention to lampreys, the best-studied representative of jawless vertebrates, the sister group of jawed vertebrates. To this end, we determined whether cells that express peripheral (tissue-restricted) antigens are present in the thymoid, the thymus equivalent of lampreys, which is situated at the tip of gill filaments and distal secondary lamellae. Although the anatomical structure of the thymoid in lampreys is less well studied than that of the thymus of jawed vertebrates, it is clearly a structured tissue (Fig. 5i). The expression of CDA1, which encodes a cytidine deaminase, identifies developing lamprey T-like cells in the distal (outer) area of the thymoid46, a localization that is reminiscent of the thymic cortex of jawed vertebrates (Fig. 5j). By contrast, cells that express the TTR gene, which is most highly expressed in the lamprey liver47, are predominantly situated in the inner region of the thymoid (Fig. 5k). We also detected cells that express the gene for myosin heavy chain (Myhc1; Fig. 5l). Thus, our results raise the intriguing possibility that intrathymic tolerogenic mechanisms, including self-antigen-displaying mTECs and/or thymic mimetic cells, are present in both jawless and jawed vertebrates.

Conclusion

In mice, AIRE-mediated peripheral antigen expression by thymic epithelial cells22, the activity of Fezf2 in mTECs23, and the presence of mimetic cells2 in the thymic microenvironment non-redundantly contribute to the formation of a self-tolerant T cell repertoire. The Aire gene is found only in the genomes of jawed vertebrates48 and is not present in jawless vertebrates; by contrast, Fezf2 is a pan-vertebrate-specific gene (a paralogue of the evolutionarily ancient Fezf1 gene49) found in both jawed and jawless vertebrates (Extended Data Fig. 11d). These findings indicate the possibility that Fezf2-driven peripheral self-antigen expression in the thymus might have emerged already in the ancestor common to all vertebrates, before AIRE-specific functions were added in jawed vertebrates. Thus, the tolerogenic capacity of the thymus might have evolved in successive steps, mirrored in the presence or absence of tolerogenic factors and the distinct developmental sequence of mimetic cell types.

In mice, FEZF2 and AIRE regulate only partially overlapping aspects of peripheral antigen expression4,23. Both tolerogenic factors—FEZF2 and AIRE—are required for the differentiation of some mimetic cells2,4. FEZF2 appears to be particularly relevant for the differentiation of the enterohepatic lineage of mimetic TECs4; notably, in mTECs of Fezf2-deficient mice, Ttr, which encodes transthyretin, is the most highly downregulated gene23. The presence of TTR-expressing cells in the thymoid is compatible with the control of intrathymic TTR gene expression by FEZF2 rather than AIRE (which is absent from the lamprey genome). At present, the mechanistic underpinnings of antigen receptor repertoire development in the lamprey T cell lineage50 are largely unexplored, complicating attempts at establishing the functional importance of peripheral antigen expression in the lamprey thymoid. Further work is required to establish the frequency and distribution of cells expressing different peripheral antigens in the gill filaments and to clarify the mechanism by which developing T cells gain access to these antigens. Nonetheless, our results reveal an unexpected similarity in tolerogenic traits between the two sister groups of vertebrates. Despite species-specific variations7, the exposure of developing thymocytes to peripheral tissue antigens thus emerges as a general component of vertebrate adaptive immune systems.

Methods

Animals

C57BL/6 mice were maintained at the Max Planck Institute of Immunobiology and Epigenetics. Foxn1−/− (ref. 27), Foxn1:cre52, Rosa26-LSL-EYFP53, Foxn1:mCardinal54, Ascl1fl/fl (ref. 55), Foxn1:BlFoxn4, Foxn1:CmFoxn4, Foxn1:CmFoxn138 and Foxn1:Fgf720 transgenic mice have been described previously. The Foxn1:Bmp4 transgene was created by T. Schlake and B.K. by inserting a cDNA fragment corresponding to nucleotides 497–1729 in GenBank accession number NM_007554.3 as a NotI fragment into pAHB1434. The Δ3ex2 Foxn1 deletion mutant (internal designation Chi6) transgene was generated by deletion of nucleotides 504–728 of mouse Foxn1 cDNA (Genbank accession number NM_008238.2) and insertion as a NotI fragment into pAHB1434. To generate transgenic mice, constructs were linearized and injected into FVB pronuclei according to standard protocols. The Foxn1Δ3ex2 mice were bred to a Foxn1-deficient background. Genotyping information is summarized in Supplementary Table 2. Mice carrying the original nu mutation (CByJ.Cg-Foxn1nu/J) were purchased from Charles River and used for snRNA-seq experiments. Mice were analysed at the age of 4–6 weeks, unless otherwise stated.

The zebrafish line carrying an internal deletion of the foxn1 gene has been described56. Adult zebrafish (3 months of age) were used for experiments.

Mice and zebrafish were kept in the animal facility of the Max Planck Institute of Immunobiology and Epigenetics under specific pathogen-free conditions (mice: 14 h light, 10 h dark; temperature 22 ± 2 °C; relative humidity 55 ± 10%; zebrafish: 13 h light, 11 h dark, water temperature 28 °C). All animal experiments were performed in accordance with the relevant guidelines and regulations, approved by the review committee of the Max Planck Institute of Immunobiology and Epigenetics and the Regierungspräsidium Freiburg, Germany (mice: licenses 35–9185.81/G-12/85; 35–9185.81/G-16/67; zebrafish: license 35–9185.81/G-14/41). All strains are made available from the corresponding author upon request, subject to standard material transfer agreements.

Ammocoete larvae of L. planeri (body length, 8–10 cm) were caught by a licensed fisherman from the wild in the Freiburg region (Riedgraben, March-Neuershausen) under permission by the local governmental authority (Landratsamt Breisgau-Hochschwarzwald, license 420.1.13-2024-034414). Juvenile cat shark (S. canicula) specimens were kindly supplied by Markéta Kauka (Max Planck Institute for Evolutionary Biology, Plön, Germany); juvenile bamboo sharks (Chiloscyllium punctatum) were purchased from a local pet shop. Upon arrival at the laboratory, lampreys and sharks were euthanized using 0.02% tricaine methanesulfonate. For RNA isolation, tissues were removed under a dissection microscope and dissolved in TRI reagent (T9424, Sigma-Aldrich); for histological studies, dissected tissues were fixed in 4% neutral formalin.

Sample sizes were based on our experience and accepted practice in the respective fields, balancing statistical robustness, resource availability and animal welfare. No statistical methods were used to predetermine sample size. Provided the transgenic status and age matched the experimental requirements, mice were randomly assigned to experimental groups, irrespective of sex. Blinding was not possible because the thymus phenotype—the transgenic status of the respective mouse—is evident from flow cytometry, imaging analysis or genotyping information.

RNA in situ hybridization

Tissues were fixed with modified Davidson’s fluid and dehydrated with Li-ethanol and ethanol in a stepwise manner and finally embedded in paraffin; RNA ISH on paraffin sections was performed using DIG-labelled probes57. Double ISH was carried out as follows. DIG- and fluorescein-labelled RNA antisense probes were simultaneously hybridized to RNA in tissue sections. The DIG-labelled probe was detected first, either with an alkaline-phosphatase-conjugated anti-DIG antibody (1:2,000 dilution in maleic acid buffer (MAB); 100 mM maleic acid, pH 7.5, 150 mM NaCl, 2 mM Levamisol, 1% blocking reagent (Roche), 0.1% Tween-20) for chromogenic detection, or with an anti-DIG-POD antibody (1:300 dilution in MAB) for fluorescent detection. The presence of the DIG-hapten was revealed by staining with BM Purple (Roche) or by Cy3 fluorescence, using the Tyramide Signal Amplification Plus system (AkoyaBioscience). The sections were washed several times in PBS; the fluorescein-labelled probe was detected by a peroxidase-conjugated anti-Fluorescein-POD (1:300 dilution in MAB) and revealed by Cy5 fluorescence.

Sequence coordinates in GenBank accession numbers for the probes were as follows:

Image analysis

Images were acquired on Zeiss microscopes (Axioplan 2) equipped with an Mrc 5 camera; in some figure panels, Cy5 signals were converted to false (yellow) colour for better visualization.

Flow cytometry and cell sorting

Single-cell suspensions of TECs for preparative flow cytometry were obtained as described31,58. Thymic epithelial cells have the surface phenotype EPCAM+CD45; thus, cell surface staining was performed using anti-EPCAM (G8.8), conjugated with APC (1:1,000, BioLegend) or anti-EPCAM (G8.8), conjugated with biotin (1:1,000, BioLegend), in combination with streptavidin, conjugated with eFluor 450 (1:1,000, eBioscience), and anti-CD45 (30-F11), conjugated with PE Cy7 (1:2,000, BioLegend) at 4 °C in PBS supplemented with 0.5% BSA and 0.02% NaN3. In order to differentiate cells with past and acute Foxn1 expression, triple-transgenic Foxn1:cre; Rosa26-LSL-eYFP; Foxn1:mCardinal mice were used for cell sorting. Cells with past expression of Foxn1 were sorted as EPCAM+YFP+ mCardinal cells, whereas cells with acute Foxn1 expression were sorted as EPCAM+YFP+ mCardinal+ cells. EPCAM+CD45 cells (after negative enrichment using anti-CD45 magnetic-activated cell sorting (MACS) beads and anti–Ter-119 MACS beads, Miltenyi Biotec) were sorted directly into TRI reagent (T9424, Sigma-Aldrich). Cell sorting was carried out using the MoFlow instrument (Dako Cytomation-Beckman Coulter) controlled with the Summit (5.5) software. Analytical flow cytometry was performed for TECs as follows: anti-EPCAM (G8.8), conjugated with APC (1:1,000, BioLegend); anti-Ly51 (alias BP-1; 6C3), conjugated with PE (1:1,600, eBioscience); UEA1, conjugated with FITC (1:1,000, Vector Labs) or UEA1, conjugated with biotin (1:600, Vector Labs), in combination with streptavidin, conjugated with eFluor 450 (1:1,000, eBioscience). When analysis of haematopoietic fractions was desired, thymocyte suspensions were prepared in parallel by mechanical liberation, best achieved by gently pressing thymic lobes through 40 μm sieves. Cell surface staining (anti-CD45 (30-F11), conjugated with PE/Cy7 (1:2,000, BioLegend); anti-CD4 (GK1.5), conjugated with FITC (1:1,000, BioLegend); anti-CD8a (53-6.7), conjugated with APC (1:800, eBioscience); anti-TCRβ (H57-597), conjugated with PE (1:400, eBioscience); anti-CD19 (eBio1D3), conjugated with PerCP/Cy5.5 (1:500, eBioscience) or PE/Cy7 (1:1,000, eBioscience); anti-B220 (alias CD45R; RA3-6B2), conjugated with biotin (1:200, eBioscience); anti-IgM (II/4.1), conjugated with PE (1:300, eBioscience), anti-CD93 (alias C1qRp; AA4.1), conjugated with APC (1:300, eBioscience); streptavidin conjugated with eFluor 450 or FITC (1:1,000, eBioscience)) was performed at 4 °C in PBS supplemented with 0.5% BSA and 0.02% NaN3. Flow cytometry experiments were evaluated using FACSDiva (8.0.2) and FlowJo (9.3.1) software. The relevant gating strategies20,37,38 are shown in Supplementary Fig. 4.

Isolation of cell nuclei from thymus tissue

Thymus tissues were recovered under a dissection microscope; thymocytes were mechanically liberated by applying gentle pressure on the tissue on a 40-μm sieve and extensive washing to deplete as many haematopoietic cells as possible. For nude mice, thymic rudiments from two individuals were pooled to constitute one sample. The resulting tissue remnants were transferred into a 1.5 ml Eppendorf tube containing 150 µl of freshly prepared ice-cold lysis buffer (10 mM Tris-HCl pH 8, 10 mM NaCl, 3 mM MgCl2, 0.1% non-denaturing detergent Igepal CA-630, 0.1% Tween-20, 1% BSA), supplemented with 1 U μl−1 Protector RNase inhibitors (Roche, 3335402001). Tissues were homogenized using a RNAse-free disposable pestle for 1.5 ml tubes (Fisher Scientific, 12141364) with a circular motion until the majority of clumps were homogenized. Nuclei quality was assessed under a phase-contrast microscope. Subsequently, 500 µl of ice-cold wash buffer (10 mM Tris-HCl pH 8, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 1% BSA) with 0.2 U μl−1 of Protector RNase inhibitors was added, mixed by inversion, and spun briefly to collect liquid from the cap of the tube. The nucleus suspension was then filtered through a 70 µm Flowmi Cell Strainer (H13680-0070, Bel-Art) and centrifuged at 300g for 5 min at 4 °C. The supernatant was removed, and the nuclear pellet was resuspended in 200 μl of 0.5% BSA in PBS, supplemented with 0.2 U μl−1 of Protector RNase inhibitors and filtered again through a 40-µm Flowmi Cell Strainer (H13680-0040, Bel-Art). Nuclei were quantified by Trypan blue staining using the Countess 3 automated cell counter (Thermo Fisher Scientific). Only Trypan blue-positive nuclei were counted (to exclude fat droplets). Each nucleus suspension was normalized to a concentration of 800 nuclei per µl using the resuspension buffer.

PCR with reverse transcription

RNA was isolated from total thymus tissue using TRI reagent (T9424, Sigma-Aldrich). Thymus tissue was isolated from specimens of unspecified sex from juvenile brown-banded bamboo shark (C. punctatum)13 and foxn1+/+ and foxn1−/− adult zebrafish56 on the ikzf1:EGFP background43. For each RNA extraction from zebrafish thymi, three organs were pooled and a total of three such pools were processed. RNA was quantified using the Qubit RNA HS Assay Kit (ThermoFisherScientific, Q32852) and the Qubit 4 Fluorometer (ThermoFisherScientific, Q33226). RNA quality was checked by determining the 18S/28S rRNA ratio using the Fragment Analyzer RNA Kit (ThermoScientific, DNF-471-0500) and the 5200 Fragment Analyzer System (ThermoScientific, M5310AA). cDNA was prepared using random hexamers and the SMARTScribe Reverse Transcriptase (Clontech).

The primers used for amplification from bamboo shark cDNA and the respective amplicon sizes are listed in Supplementary Table 3; the primers used for amplification from zebrafish cDNA and the respective amplicon sizes are listed in Supplementary Table 4. In the PCR reaction, primers were used at a concentration of 10 pmol μl−1. For comparative analyses, the amounts of input cDNA were calibrated by comparison to zebrafish ef1a expression levels. Amplicons, independently generated for three thymic samples each per genotype, were pooled prior to gel electrophoresis. For gel source data, see Supplementary Fig. 8.

Bulk RNA-seq of TECs

TECs derived from mouse thymus were sorted directly into TRI reagent (T9424, Sigma-Aldrich), while for zebrafish total thymus tissue was used. RNA isolation was performed according to standard protocols. RNA was quantified using the Qubit RNA HS Assay Kit (ThermoFisherScientific, Q32852) and the Qubit 4 Fluorometer (ThermoFisherScientific, Q33226). RNA quality was checked by determining the 18S/28S rRNA ratio using the Fragment Analyzer RNA Kit (ThermoScientific, DNF-471-0500) and the 5200 Fragment Analyzer System (ThermoScientific, M5310AA). For each zebrafish library, RNA from three animals was pooled and three such pools were sequenced. Libraries were prepared using the Ultra RNA Library Prep Kit (NEB). Samples were run on Illumina HiSeq 2500, HiSeq 3000 or NovaSeq 6000 instruments and sequenced to a depth of 10 × 106 to >150 × 106 reads per sample. Transcriptomes were analysed on the Galaxy platform59 using Trim Galore! version 0.4.3.1 (developed by Felix Krueger at the Babraham Institute), HISAT2 version 2.1.060 and featureCounts version 1.6.1.061.

snRNA-seq of thymic tissue

The Chromium GEM-X Single Cell 3′ v4 protocol (CG000731, Rev B) was followed starting from step 1.1 according to the manufacturer’s guidelines. The Chromium GEM-X Single Cell 3′ Kit v4 (PN-1000686) and the Chromium GEM-X Single Cell 3′ Chip Kit v4 (PN-1000690) were used to process each sample. A total of 36.6 µl of normalized nucleus suspension (see ‘Isolation of cell nuclei from thymus tissue’) was added to the master mix to target a recovery of 20,000 nuclei. The ready GEM-X chip was loaded onto a 10x Genomics Chromium Xo instrument, and final libraries were completed as per 10x Genomics guidelines. Libraries were quantified using the Qubit High Sensitivity DNA Assay (Invitrogen, Q32851), and their molar concentrations were calculated on the basis of their size distribution using the Fragment Analyzer NGS 1–6,000 bp High Sensitivity DNA kit. Libraries were pooled, cleaned of adapter dimers and denatured according to Illumina guidelines. Libraries were sequenced in paired-end mode with a read length of 10 × 100 × 100 × 10 bp (i5 index, R1, R2, i7 index) on a NovaSeq 6000 instrument (Illumina), aiming for a minimum sequencing depth of 20,000 read pairs per nucleus.

Analysis of snRNA-seq data

Counts were generated with 10x Genomics Cell Ranger (8.0.1) and background noise was reduced with CellBender (0.3.0, --fpr 0.01)62. Droplets were excluded if they exhibited low total UMI counts (<500), high proportions of mitochondrial counts (>5%), or abnormal numbers of detected genes (beyond 1 median absolute deviation (MAD) from the median) or complexity (ratio of detected genes over total count; beyond 3 MADs from the median). Doublets were removed with scDblFinder63. After initial dimensionality reduction and clustering of the data, each cell was scored for mimetic and background marker gene sets with AUCell21. Clusters predominantly comprising cells scoring highly for signatures compatible with expected contaminants (thymocytes and adipocytes for control and nude samples, respectively) were excluded from further analysis. The reduced dataset was clustered again. Parameters were selected to maximize the mean silhouette width. Subclustering was performed on clusters selected manually based on their mean silhouette width, again performing parameter sweeps and selecting parameters based on the total within cluster sum of squares and mean silhouette widths of subclusters. The final clusters were annotated by investigation of dominant signature scores and marker genes.

Assessment of mimetic signatures in bulk RNA-seq data

Bulk RNA-seq data from all samples were processed jointly. Genes were included if their counts exceeded 5 in at least 50% of samples. Differential expression of genes between samples was assessed with voom64 and limma with treat(., lfc = log2(1.2), robust = TRUE)65,66 to generate the t-statistic. Relative enrichment of gene signatures was determined with the competitive gene set test camera25 with cameraPR, employing the t-statistic as output by treat as the ranking statistic. The output of camera is a two-sided and Benjamini–Hochberg adjusted P value. Results are reported as log10(adj. P), conditionally multiplied by −1 for signatures with an upwards direction of change. Although we were mainly interested in TEC signatures2,20, we also collected background signatures of unrelated cell types from PanglaoDB67, Tabula Muris68 (bladder, spleen, kidney, liver, marrow, muscle, lung, non-myeloid brain, from FACS) and MSigDB69,70 (M8: mouse cell-type signature gene sets, except for Tabula Muris Senis signatures). From the collection of signatures, those that comprised up to 10% less or 10% more genes than the smallest and largest TEC or mimetic signature, respectively, were retained as background signatures (see Supplementary Fig. 1). To investigate mouse-derived signatures in RNA-seq samples from D. rerio, we performed orthology transfer using data from the Ensembl database (release 112)71,72, allowing many-to-many relationships.

Identification of mimetic cells in scRNA-seq data

Signature gene sets for mimetic cells were derived from single-cell differential gene expression results from Supplementary Table 2 in Michelson et al.2. The gene lists were filtered (adjusted P < 0.01, |log2(FC)| >1, expressed in at least 10% of cells) and used as input to AUCell21 to score signatures in each cell. Based on the resulting histograms of areas under the curve (AUCs), thresholds were defined manually for each mimetic signature (see Extended Data Fig. 1a). A cell that met thresholds for multiple signatures was disambiguated by ordering all eligible signatures and choosing the highest-ranked one, first by the cell’s rank within the signature AUCs (that is, a signature is ranked higher if the cell’s position on the histogram is further to the right), and second (in case of ties) by the z-score standardized AUC. The signatures ‘Tuft1’ and ‘Tuft2’, as well as ‘Skin_basal’ and ‘Skin_keratinized’ led to the identification of an overlapping set of cells which were therefore collapsed into unified Tuft and Skin populations (Extended Data Fig. 1b). Using the final signatures, mimetic cells were identified in the uniform manifold approximation and projection (UMAP) graphs calculated from single-cell datasets20.

Shifts in population composition were evaluated with scCODA24. The model was run multiple times, using each of the major populations (early and postnatal progenitors, mTEC, cTEC, Aire-stage and unassigned) as the reference. Detected changes were minimal for the ‘unassigned’ population and it was therefore used as the final reference. A population was deemed ‘overall credibly changed’ if more than half the models detected a credible change.

Lineage tracing of mimetic cell lineages

CRISPR–Cas9 barcodes for single cells20 were binarized (barcode present or absent) and a logistic regression model was fitted with fixed covariates ‘signature’ and ‘sample’. A P value for the ‘signature’ term was determined by likelihood ratio test with a reduced model. Proportions of Foxn1-expressing cells between populations were compared with a binomial test, implemented in the findMarkers function from the scran package73.

Determination of fate probabilities in scRNA-seq data

scRNA-seq data from barcoded and unbarcoded samples from three time points (embryo, newborn, 4 week) were integrated with fastMNN74. The integrated dataset was re-clustered and clusters with expression of Pth/Chga (ectopic parathyroid) and Cd3/Cd4/Cd8a (thymocytes) were excluded. Diffusion pseudotime was calculated, starting from the cell with the highest AUC for the early progenitor signature in the embryonic sample75. The pseudotime and integrated representation of the data were used for generation of a PseudotimeKernel and ConnectivityKernel with CellRank29,30, respectively. Both kernels were combined and analysed with a GPPCA estimator76 to identify between 10 and 16 macrostates (gppca.fit(n_states = [10, 16])). The optimal number of macrostates based on the minChi criterion76 was 12. Macrostates were manually labelled according to their composition of annotated cell types and defined as initial (early progenitors) or terminal (Aire-stage and mimetic populations) states. Intermediate states were not retained. Fate probabilities were calculated with the remaining macrostates and the overlap of cells exceeding fate probability thresholds in pairs of fates was quantified by the Jaccard index. Because the automatically determined macrostates did not reflect all mimetic populations, we additionally ran CellRank after manually identifying terminal cells, picking representatives with the highest signature AUC for each population of interest. Fate probabilities and Jaccard indices were also calculated for these data.

Data handling and statistics

Box plots encapsulate the first to third quartile, a line indicates the median. Whiskers extend to the furthest point with a distance of up to 1.5 times the interquartile range from the boxes. For the data analyses presented in Figs. 2a,b and 3e and Extended Data Figs. 6 and 8a, differences in means between groups were compared by ANOVA. Separate ANOVAs were conducted for each dependent variable. The resulting P values were corrected for multiple testing with the conservative Bonferroni method. For tests with adjusted P < 0.05, we rejected the null hypothesis of equal means and performed pairwise two-sided t-tests between all (Figs. 2a,b and 3e and Extended Data Fig. 6a) groups, or between selected groups (Extended Data Fig. 8, F2 samples only). P values from pairwise t-tests were corrected for multiple testing with the conservative Bonferroni method. For Extended Data Figs. 2d and 4d, groups were compared via a two-sided t-test without (Extended Data Fig. 2d) or with (Extended Data Fig. 4d) Welch correction (GraphPad Prism 9.5.1). For LOESS (Fig. 3a), we tested span parameters between 0.4 and 0.9 in increments of 0.01 with tenfold cross-validation. The minimal mean squared error was achieved with a span of 0.53. The numbers of biological replicates are indicated in the figure panels or figure legends.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.