Main

The stability of somatic cell identities is essential for the coordination of specialized organ systems, and aberrant deviations from differentiated states can lead to disease4. However, alterations in somatic cell fates can promote tissue repair and enable adaptation to changing microenvironments1. How functional variation in a somatic lineage is constrained or promoted to regulate the balance between cellular stability and plasticity remains unclear. Here we investigate the underlying mechanisms in medullary thymic epithelial cells (mTECs), which express nearly the entire coding genome and adopt states that mirror specialized cells in disparate tissues for the selection of T cells that can protect the host but remain tolerant to self-constituents3,5,6. The breakdown of this selection in autoimmune polyglandular syndrome type 1 revealed the transcriptional activator AIRE to be an important determinant of thymic epithelial plasticity7. However, chromatin accessibility and transcriptional initiation at AIRE-regulated loci do not require AIRE8,9, indicating that there is an orthogonal mechanism that poises mTECs for cellular plasticity. We investigated the nature of this mechanism in individual mTECs by jointly profiling their transcriptome and chromatin accessibility landscapes along their developmental trajectory (Fig. 1a).

Fig. 1: Chromatin accessibility noise is associated with ectopic transcription.

a,b, A scATAC-seq uniform manifold approximation and projection (UMAP) (b) visualization of 9,928 mTECs from Multiome profiling (a); the colours show the cluster annotation. c, Aire expression data overlaid on the UMAP. d, Number of expressed αTSGs overlaid on the UMAP. e, The indicated αTSG expressions overlaid on the UMAP. f, Sum expression of each αTSG (n = 3,184) in each UMAP cluster (colours as in a) as a fraction of the total. g, Histogram of minimum false discovery rate (FDR) values of differentially accessible scATAC-seq peaks (two-sided Mann–Whitney–Wilcoxon tests) for cells expressing a particular αTSG (αTSGpos) versus not expressing it (αTSGneg) (black, n = 3,184 αTSGs) or early AIRE+ versus immature mTECs (red arrow, Pmin = 8.04 × 10−17). h, Histogram of significant peaks detected (FDR ≤ 0.1) from αTSGpos versus αTSGneg (black) or early AIRE+ versus immature mTECs (red arrow, n = 35,204 peaks) comparisons. i, Schematic of differential chromatin accessibility analysis visualized as heatmaps at tissue-specific loci between αTSGpos versus αTSGneg mTECs. j, Heatmaps of differential OOP (left) or WIP (right) accessibility at indicated αTSGs between αTSGpos versus αTSGneg mTECs. k, Heatmaps of differential OOP (left) or WIP (right) accessibility at all 3,184 αTSGs detected between αTSGpos versus αTSGneg mTECs. l,m, Aggregate ratios of OOP or WIP scATAC-seq fragments from αTSGpos versus αTSGneg mTECs (l) or MIGpos versus MIGneg mTECs (m) (Extended Data Fig. 1u) at the indicated loci. n, scATAC-seq fragment size distributions from mature mTECs. o, Histogram of indicated P-values (two-sided likelihood ratio tests) for the probability of each αTSG being expressed as a function of local OOP or WIP fragments. p, Distribution of the logistic regression coefficient P-value ratios (OOP/WIP) from o. Ratio = 1 (black dotted line); geometric mean ratio = 0.007 (red dotted line, the indicated P-value from one-sided Mann–Whitney U-test). 
q, Volcano plot of regression coefficients β versus the P-values from o. Number of P-values ≤ 0.1 for non-zero coefficients β from out-of-peak (red) or within-peak (blue) analyses. r, Cumulative distribution of probabilities that within a αTSGpos versus αTSGneg mTEC, another αTSG within 50 kb (local) or a random distance is expressed (for 3,184 αTSGs). P-value (local αTSGpos versus αTSGneg) from one-sided Mann–Whitney–Wilcoxon test.

Epigenetic noise is linked to plasticity

To identify the molecular levers that give rise to mTEC plasticity, we used the 10X Genomics Chromium Multiome platform on mTECs from a four-week-old C57BL/6 mouse that were sorted by fluorescence-activated cell sorting (FACS). We obtained 9,928 high-quality cells after quality-control filtering and exclusion of further differentiated ‘mimetic’ mTECs that extinguished Aire expression5,6 to focus on AIRE-independent mechanisms that poise mTECs for somatic plasticity (Fig. 1a, Extended Data Fig. 1a–h and Supplementary Fig. 1). We delineated a developmental trajectory consisting of five clusters of cells that corresponded well with the known mTEC developmental program10 (Fig. 1b,c and Extended Data Fig. 1i–p). We next investigated the role of chromatin accessibility states in promoting mTEC plasticity and found that the expression of nearly all AIRE-dependent tissue-specific genes (αTSGs) was not confined to any particular chromatin state across the developmental axis (Fig. 1d–f and Extended Data Fig. 1q–t). Moreover, we rarely found differentially accessible peaks between mTECs expressing (αTSGpos) versus not expressing (αTSGneg) a particular αTSG (Fig. 1g,h), indicating that the repertoire of accessible elements does not confer the potential for ectopic transcription.

However, we often noticed differences in Tn5 inserts outside the single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) peaks at regions flanking αTSGs between αTSGpos and αTSGneg mTECs (Fig. 1i–k). The differential out-of-peak (OOP) signal (defined by the same union peak set applied to all cells) accumulated symmetrically about the transcriptional start site (TSS) across a characteristic length scale of around 100 kilobases (kb), eventually converging with the differential within-peak (WIP) signal to a common normalized baseline (Fig. 1l). The differential OOP signal was not observed at AIRE-independent loci induced during mTEC maturation, indicating that it is not a general feature of active chromatin (Fig. 1m and Extended Data Fig. 1u). Furthermore, we found that OOP scATAC-seq fragments consisted of longer nucleosomal lengths than WIP fragments (Fig. 1n), which is indicative of nucleosome-dense regions becoming more labile for Tn5 integration. The OOP ATAC-seq fragments are generally considered to be ‘noise’, and the fraction of reads within peaks is commonly used as a metric for ATAC-seq signal enrichment11,12. These results led us to hypothesize that the differential OOP signal is predictive of ectopic gene expression and indicates destabilization of chromatin barriers (Extended Data Fig. 1v).

To test this hypothesis, we performed a series of logistic regressions. We fit the probability (P) of expressing an αTSG to the normalized OOP fragments within the characteristic length scale of destabilized chromatin (L ≈ ±50 kb). We also included the number of scATAC-seq fragments (nFrags) in each cell as a regression covariate to control for the extent of sampling per cell:

$$P(\alpha\mathrm{TSG}\mid \mathrm{OOP}) \sim \frac{1}{1+\mathrm{e}^{-\left(\beta_{0}+\beta_{\mathrm{OOP}}^{\pm L}\log_{10}(\mathrm{OOP}+1)+\beta_{N}\log_{10}(\mathrm{nFrags}+1)\right)}}.$$

As a comparative association, we fit the probability of expressing the αTSG to the normalized scATAC-seq WIP fragments:

$$P(\alpha\mathrm{TSG}\mid \mathrm{WIP}) \sim \frac{1}{1+\mathrm{e}^{-\left(\beta_{0}+\beta_{\mathrm{WIP}}^{\pm L}\log_{10}(\mathrm{WIP}+1)+\beta_{N}\log_{10}(\mathrm{nFrags}+1)\right)}}.$$

We then profiled the magnitudes, directions and statistical significance of the fitted regression coefficients for local OOP \((\beta_{\mathrm{OOP}}^{\pm L})\) and WIP \((\beta_{\mathrm{WIP}}^{\pm L})\) fragments. The coefficients \(\beta_{\mathrm{OOP}}^{\pm L}\) for OOP fragments were far more statistically significant than the coefficients \(\beta_{\mathrm{WIP}}^{\pm L}\) for WIP fragments, with the regression P-values for \(\beta_{\mathrm{OOP}}^{\pm L}\) being around 140-fold smaller than those for \(\beta_{\mathrm{WIP}}^{\pm L}\) for the same αTSGs. These results indicate that variation in local OOP fragments was a much more reliable predictor of ectopic transcription than local WIP fragments (Fig. 1o,p). Furthermore, statistically significant regression coefficients \(\beta_{\mathrm{OOP}}^{\pm L}\) for OOP fragments were almost always greater than zero, meaning that an increase in OOP fragments was associated with a higher probability of αTSG expression when controlling for the extent of sampling per cell (Fig. 1q). Moreover, the expression of a given αTSG by an mTEC (αTSGpos) substantially increased the likelihood of expressing a neighbouring αTSG within the same 100-kb region of destabilized chromatin compared with the likelihood in αTSGneg cells, at a significance level (P = 9.15 × \(10^{-166}\)) that was far greater than the difference in the likelihood of expressing a random subset of αTSGs (P = 4.34 × \(10^{-4}\)) (Fig. 1r). Together, these results indicate that enhanced fluctuations in background chromatin accessibility are strongly predictive of ectopic expression of the local tissue-specific genes.
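The per-gene regression described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the authors' pipeline: the function names, the Newton-Raphson fitting routine, the simulated cells and the choice of a depth-only null model for the likelihood ratio test are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(rows, y, n_iter=100, tol=1e-10):
    """Newton-Raphson fit; returns (coefficients, maximized log-likelihood)."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, yi in zip(rows, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = mu * (1.0 - mu)
            for i in range(p):
                grad[i] += (yi - mu) * x[i]
                for j in range(p):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for x, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, x))
        # log-likelihood term: y * eta - log(1 + exp(eta)), computed stably
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return beta, loglik


def lrt_for_fragments(frag_counts, n_frags, expressed):
    """Coefficient and likelihood-ratio P-value (1 d.f.) for local fragment counts.

    Assumption: the null model keeps the intercept and the depth covariate only.
    """
    depth = [math.log10(n + 1) for n in n_frags]
    local = [math.log10(c + 1) for c in frag_counts]
    full = [[1.0, l, d] for l, d in zip(local, depth)]
    null = [[1.0, d] for d in depth]
    beta, ll_full = fit_logistic(full, expressed)
    _, ll_null = fit_logistic(null, expressed)
    stat = max(2.0 * (ll_full - ll_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, 1 d.f.
    return beta[1], p_value


def simulate_cells(n_cells=2000, beta_local=2.0, seed=0):
    """Hypothetical toy data: expression probability rises with local fragment counts."""
    rng = random.Random(seed)
    n_frags = [rng.randint(5000, 50000) for _ in range(n_cells)]
    counts = [rng.randint(0, 40) for _ in range(n_cells)]
    expressed = [
        1 if rng.random() < sigmoid(-3.0 + beta_local * math.log10(c + 1)) else 0
        for c in counts
    ]
    return counts, n_frags, expressed
```

Calling `lrt_for_fragments(*simulate_cells())` returns the fitted local-fragment coefficient and its P-value for one simulated gene. Running the same function once with OOP counts and once with WIP counts for each αTSG would give the paired P-values whose ratio is compared in the text.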

Epigenetic noise is AIRE-independent

To identify when in mTEC development chromatin accessibility noise becomes amplified, we followed the proportion of scATAC-seq fragments within peaks genome-wide (the WIP fraction) and found a prominent decrease at the early mature stage that was maintained through the later stages (Fig. 2a,b and Extended Data Fig. 2a–c). This decrease was not dependent on the magnitude of the transcriptome detected, nor the number, size or significance of peaks called (Extended Data Fig. 2d–l). Furthermore, we did not observe similar decreases in WIP fraction in quiescent versus cycling cells from Multiome datasets of embryonic day 18 (E18) mouse brain, indicating that this decrease was not a general feature of postmitotic cells (Extended Data Fig. 2m–o).
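As a rough illustration of the WIP fraction, the sketch below counts, for each cell barcode, the share of fragments that overlap a peak. It assumes half-open genomic intervals, fragment records shaped as `(chrom, start, end, barcode)` and an "any overlap" rule for calling a fragment within-peak; these conventions and all function names are assumptions for the example rather than details taken from the study.

```python
from bisect import bisect_left
from collections import defaultdict


def build_peak_index(peaks):
    """Merge overlapping peaks per chromosome into sorted (starts, ends) lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_peak(index, chrom, start, end):
    """True if the half-open fragment [start, end) overlaps any merged peak."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_left(starts, end) - 1  # last peak that starts before the fragment ends
    return i >= 0 and ends[i] > start


def wip_fraction_per_cell(fragments, peaks):
    """Return {barcode: fraction of that cell's fragments overlapping a peak}."""
    index = build_peak_index(peaks)
    total = defaultdict(int)
    within = defaultdict(int)
    for chrom, start, end, barcode in fragments:
        total[barcode] += 1
        if overlaps_peak(index, chrom, start, end):
            within[barcode] += 1
    return {barcode: within[barcode] / n for barcode, n in total.items()}
```

The OOP fraction for a cell is one minus its WIP fraction under these conventions, so the same function covers both quantities.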

Fig. 2: Chromatin accessibility noise is AIRE-independent and associated with p53 repression.

a, Fraction of scATAC-seq fragments within scATAC-seq peaks (WIP) across the mTEC developmental axis defined in Fig. 1b. b, Violin and box (median, 25th and 75th percentile; whiskers show 1.5 × the interquartile range) plots depicting the distributions of the fraction of scATAC-seq WIP fragments across annotated clusters (n = 9,928: immature, 2,107; transitional, 1,790; early mature, 2,825; mid mature, 2,511; late mature, 695). P-values were calculated by one-sided Mann–Whitney U tests. c, Bulk ATAC-seq fragment size distributions from mature and immature mTECs from published datasets9. d, Mean number of expressed AIRE-dependent tissue-specific genes (αTSGs) (blue) and mean WIP fraction (purple) across mTEC development. e, Cluster annotations overlaid on a UMAP of merged scATAC-seq data5 from Aire+/+ and Aire−/− mTECs. f, WIP fraction overlaid on the scATAC-seq UMAP defined in e. g, Paired violin plots comparing the distributions of WIP fraction from the indicated genotypes across the annotated developmental clusters defined in e. P-values calculated by one-sided Mann–Whitney U tests. KO, knockout. h, Rank-sorted differences in motif prevalence within accessible genomes (chromVAR deviation scores) of mTECs between the indicated developmental stages for 884 known transcription factor (TF) motifs. i, Distributions of the prevalence of p53-target motifs in accessible genomes (chromVAR deviation scores) of mTECs within the indicated cluster. j, Transcription factor footprinting at p53-target motifs (highlighted region) within the indicated mTEC developmental cluster. k, Aggregate expression of p53-target genes across mTEC development (overlaid on the scATAC-seq UMAP from Fig. 1b). l, Scatter plot of differential expression (TPM, transcripts per million) of known p53 regulators for the indicated comparisons (n = 2 biological replicates). 
Highly significant differentially expressed genes (Benjamini–Hochberg FDR ≤ 1 × \(10^{-9}\), fold-change ≥ 2 or ≤ 0.5) are indicated as repressors (brown text) or promoters (green text) of p53 activity. m,n, Expression of the p53 regulators Cop1 (m) and Sirt1 (n) across mTEC development (scATAC-seq UMAP from Fig. 1b).

We also observed a reciprocal increase in the prevalence of nucleosomal fragments in mature mTECs compared with immature progenitors or peripheral T cells from previously published bulk ATAC-seq studies9,13 (Fig. 2c and Extended Data Fig. 2p–r). The progressive loss in WIP fraction and gain in OOP fragments mirrored the progressive increase in the number of αTSGs expressed per maturing mTEC (Fig. 2d). We also found these features to be conserved in human mTECs (Supplementary Fig. 2a–i and Supplementary Notes), further linking chromatin destabilization with ectopic transcription during mTEC maturation.

To determine whether this association is dependent on transcription, we analysed published scATAC-seq data5 from Aire−/− mTECs and found no substantial differences in WIP fraction between Aire+/+ and Aire−/− mTECs (Fig. 2e–g and Extended Data Fig. 2s–v). Rather, we observed a small increase in median nucleosomal fragments in the accessible genomes of mature Aire−/− versus Aire+/+ mTECs (Extended Data Fig. 2w), which may be associated with the previously reported repressive influence of AIRE on chromatin accessibility9,14. These data indicate that increased chromatin accessibility noise was not driven by AIRE-dependent expression of tissue-specific genes, nor by the AIRE-dependent facets of mTEC maturation.

mTECs repress p53 during maturation

To identify potential drivers of chromatin accessibility noise, we conducted transcription factor motif enrichment and found that the greatest differential feature was the depletion of p53-binding motifs in the accessible genome of mature versus immature mTECs (Fig. 2h,i and Extended Data Fig. 3a,b). We also observed losses in footprinting at sites containing p53-binding motifs and expression of validated p53 target genes15 in mature versus immature mTECs (Fig. 2j,k and Extended Data Fig. 3c). These data, in conjunction with the well-characterized roles of p53 in enforcing differentiation along committed lineage trajectories16 and inhibiting somatic reprogramming17, led us to hypothesize that mTECs repress p53 activity to amplify chromatin accessibility noise for cellular plasticity.

To investigate how p53 is repressed in mTECs, we assessed the differential expression of known p53 regulators and found highly significant induction of Mdm2 (the primary regulator of p53 that promotes its proteasomal degradation18) during mTEC maturation at both the transcript and protein levels (Fig. 2l and Extended Data Fig. 3d–g). Other negative regulators of p53 were also induced in mature versus immature mTECs, such as COP1, another E3 ubiquitin ligase of p53 (ref. 18), and SIRT1, which deacetylates p53 to inhibit its transactivation potential19 (Fig. 2l–n).

We also observed repression of p53 regulators that promote p53 activity, including genes that encode SET7/9 and PCAF (which respectively methylate20 and acetylate21 p53 to promote its transcriptional activity), c-ABL and CHK1 (which respectively phosphorylate p53 (ref. 22) and MDM2 (ref. 23) to inhibit p53 degradation), RASSF1A (which promotes MDM2 ubiquitination24) and ATF3 and p63 (which facilitate cooperative binding of p53 target genes25,26) in mature versus immature mTECs (Fig. 2l and Extended Data Fig. 3h–n). By contrast, we did not detect significant differences in the expression of Trp53 (which encodes p53), except for a brief induction in transit-amplifying mTECs that was extinguished before the AIRE+ state (Fig. 2l and Extended Data Fig. 3o). Importantly, we observed the systematic repression of p53 activity in human mTECs (Supplementary Fig. 2j–u and Supplementary Notes) and in mature AIRE-deficient mTECs (Extended Data Fig. 3p,q), which is consistent with the AIRE-independent nature of amplified chromatin accessibility noise.

Augmenting p53 activity stabilizes chromatin

To test the role of p53 in regulating chromatin accessibility noise, we generated Trp53LSL-QM/wt;Foxn1cre (p53-cHyper) mice that conditionally express in mTECs (from the endogenous Trp53 locus) a transactivation-dead p53 mutant that impairs MDM2 binding, allowing it to stabilize heteromeric wild type–mutant complexes to bring about modest increases in p53 activity27. Compared with sex-matched Trp53wt/wt;Foxn1cre (p53 wild-type) littermate controls, thymi from p53-cHyper mice had comparable total cellularity, frequency of AIRE+ mTECs, mean expression of AIRE, frequencies of FOXP3+ regulatory T cells and other thymocyte compartments, indicating that p53 hyperactivity did not disrupt mTEC or thymocyte differentiation in p53-cHyper mice (Extended Data Fig. 4a,b).

To investigate the effect of p53 hyperactivity, we FACS-sorted mTECs from sex-matched p53-cHyper and p53 wild-type littermates and jointly profiled the transcriptome and chromatin accessibility landscapes (Fig. 3a,b and Extended Data Fig. 4c). Differential enrichment of p53-binding motifs and enhanced footprinting at these sites in mature p53-cHyper versus p53 wild-type mTECs validated the perturbation (Extended Data Fig. 4d,e). We next quantified the proportion of scATAC-seq fragments within peaks (the WIP fraction) across the developmental trajectory and found highly significant increases in p53-cHyper compared with p53 wild-type mTECs (Fig. 3c and Extended Data Fig. 4f). Notably, the increase in WIP fraction became progressively larger across the developmental stages, such that the median WIP fraction of mature p53-cHyper mTECs was greater than that of immature p53 wild-type controls (Fig. 3c). Furthermore, we observed the reciprocal loss in prevalence of nucleosomal scATAC-seq fragments in mature p53-cHyper versus p53 wild-type mTECs, indicating greater stability of nucleosome-dense regions in p53-cHyper mTECs (Extended Data Fig. 4g).
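The genotype comparisons of WIP fraction are reported with one-sided Mann–Whitney U tests. A minimal standard-library sketch of such a test follows; it uses the normal approximation with a tie correction and no continuity correction, which is an assumption made for brevity and may differ from the software used in the study. The function name is hypothetical.

```python
import math


def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U test of the alternative that x tends to exceed y.

    Returns (U statistic for x, approximate P-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all observations are identical; the test is undefined")
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return u, p_value
```

For example, `mann_whitney_greater(wip_hyper, wip_wt)` would test whether per-cell WIP fractions in one genotype tend to exceed those in the other, where `wip_hyper` and `wip_wt` are hypothetical lists of per-cell values from a single developmental cluster.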

Fig. 3: p53 hyperactivity suppresses chromatin noise and mTEC plasticity.

a, UMAP visualization of merged scATAC-seq profiles of mTECs from the indicated genotypes (colours) generated from Multiome datasets. b, Cluster annotations overlaid on the UMAP from a. c, Paired violin plots comparing distributions of the fraction of scATAC-seq fragments within scATAC-seq peaks (WIP) from the indicated genotypes across annotated clusters of mTEC maturation. P-values were calculated by one-sided Mann–Whitney U tests. d, Differences in the number (left) or sum of mRNA (right) of expressed αTSGs between neighbouring p53-cHyper and p53-WT mTECs overlaid on the UMAP from a. e, Scatter plot comparing the differential expression of genes by bulk RNA-seq resulting from mTEC maturation (horizontal axis) or p53 hyperactivity in mature mTECs (vertical axis) (n = 3 biological replicates). Fold-change densities are shown on the top and right margins. Statistically significant (Benjamini–Hochberg FDR ≤ 0.05) differential expression between p53-cHyper and p53-WT mTECs is highlighted in blue for all genes and red for αTSGs. f, UMAP of merged scRNA-seq profiles of p53-WT and p53-cHyper mTECs with cluster annotations overlaid. g, Differential density of genotypes (p53-cHyper versus p53-WT) across the mimetic mTEC clusters defined in f. h, Fraction of the indicated mimetic mTEC subtypes represented by each genotype. i, Comparison of the numbers of the indicated mimetic mTEC compartments between p53-WT and p53-cHyper sex-matched littermates (n = 6). P-values for two-sided paired ratio tests are shown.

To determine whether the suppressed chromatin accessibility noise in p53-cHyper mTECs affected their potential for ectopic transcription, we quantified the expression levels of αTSGs and found both the diversity and magnitude of ectopic gene expression to be compromised in mature p53-cHyper versus p53 wild-type mTECs (Fig. 3d). To assess the full scope of this effect, we conducted bulk RNA sequencing (RNA-seq) on FACS-sorted mature and immature mTECs from sex-matched p53-cHyper and p53 wild-type littermates. We found that p53 hyperactivity in mature p53-cHyper mTECs significantly affected the expression of 6,279 genes, 77% of which were repressed, which is consistent with the heightened stability of nucleosomal barriers (Fig. 3e). We also found that 1,653 tissue-specific genes, which were normally induced during mTEC maturation, were repressed in mature p53-cHyper versus p53 wild-type mTECs, with around 70% of these being AIRE-dependent, indicating a strong concordance (P < 2.3 × 10−308) between p53 repression and AIRE-mediated ectopic expression of tissue-specific genes (Fig. 3e). At the same statistical thresholds, this effect resulting from p53 hyperactivity was more than 3.4-fold more deleterious than the effect of dysregulated mTEC maturation resulting from p53 deficiency in Trp53fl/fl;Foxn1cre (p53-cKO) mice on AIRE-dependent ectopic gene expression (Supplementary Fig. 3 and Supplementary Notes).

Augmenting p53 activity limits plasticity

To determine whether p53 hyperactivity in mTECs affected the differentiation of downstream ‘mimetic’ phenotypes5,6, we re-integrated the mimetic populations into the p53-cHyper/wild-type Multiome analysis (Fig. 3f and Extended Data Fig. 4h–m). We detected a relative paucity of p53-cHyper mTECs in the microfold, enterocyte, tuft and secretory mimetic compartments (3.1-fold, 2.8-fold, 1.4-fold and 1.3-fold fewer than p53 wild-type mTECs, respectively), in contrast to the almost 1:1 ratios observed in the keratinocyte and ciliated compartments (Fig. 3g,h). To confirm and extend these findings to other mimetic compartments, we used previously established mimetic flow cytometry panels6 (Extended Data Fig. 4n–s) and found a significant decrease in numbers of keratinocyte (about 28% fewer), ciliated (about 44% fewer) and myoid (about 52% fewer) mimetic mTECs in p53-cHyper versus p53 wild-type thymi, along with confirmed decreases in tuft (about 29% fewer) and overall mimetic mTEC numbers (30% fewer) (Fig. 3i). These data indicate that suppression of chromatin accessibility noise by p53 hyperactivity constrained the potential of mTECs to deviate from the established state, preventing the activation of genes restricted to other tissues and compromising the differentiation of mimetic subtypes.

p53 stabilizes chromatin indirectly

To identify the mode by which p53 suppresses chromatin accessibility noise, we conducted p53-targeted cleavage under targets & release using nuclease28 (CUT&RUN) in sorted immature and mature mTECs. We detected little p53 occupancy near αTSGs with heightened chromatin accessibility noise compared with the focal signal within p53 CUT&RUN peaks and p53 target genes, consistent with the relative dearth of p53 target motifs within 50 kb of αTSGs (Fig. 4a,b and Extended Data Fig. 5a–c). Moreover, we observed a prominent loss of p53 occupancy at p53-binding sites (p53 CUT&RUN peaks) in mature versus immature mTECs (Fig. 4c,d), consistent with the depletion of p53-binding motifs within scATAC-seq peaks during mTEC maturation. These results indicate that p53 does not physically localize to genomic regions with elevated chromatin accessibility noise and instead imposes its suppressive influence indirectly.

Fig. 4: p53 suppresses chromatin noise indirectly.

a,b, Aggregate histograms of p53 occupancy at αTSGs (blue), p53 CUT&RUN (C&R) peaks (green) (a) or p53-induced target genes (red) (b) in total mTECs from wild-type mice. c, Heatmaps of p53 occupancy in immature (left) or mature (right) mTECs at p53-binding sites (CUT&RUN peaks). d, Aggregate histograms of p53 occupancy at p53 CUT&RUN peaks in immature (purple) or mature (blue) mTECs. e, MA plot of the differential expression of known p53 target genes between mature p53-cHyper and p53-WT mTECs by bulk RNA-seq (n = 3 biological replicates). Point colours show Benjamini–Hochberg FDR levels. Red text is used for genes known to promote cell death. Inset bar plots (right) indicate the most enriched Gene Ontology (GO) terms for p53-induced (top) or p53-repressed (bottom) genes in p53-cHyper mTECs. Neg reg DDR, negative regulators of DNA damage response. f,g, Differences in the expression of the genes Bax (f) and Perp (g) for neighbouring p53-cHyper and p53-WT mTECs overlaid on the UMAP from Fig. 3b. h, MA plot of the differential expression of genes encoding proteins of the intrinsic apoptosis pathway between mature and immature p53-cHyper mTECs by bulk RNA-seq (n = 3 biological replicates). Point colours show the Benjamini–Hochberg FDR levels; red text indicates genes that encode pro-apoptotic proteins and green text indicates pro-survival proteins. i,j, Differences in the expression of the genes Bcl2l11 (i) and Mcl1 (j) between neighbouring p53-cHyper and p53-WT mTECs overlaid on the UMAP from Fig. 3b. k–n, Aggregate histograms of OOP ChIP–seq dyads over the indicated loci in mature mTECs from the indicated genotypes; γH2AX (k), TOP2α (l), TOP1 (m) and POL2 (n). o,p, Comparison of transcription factor motif enrichment within scATAC-seq fragments from mature mTECs mapping to OOP regions within 100 kb of αTSGs (o) or silent genes (p) versus the adenine/thymine content of each motif. 
The trend line (red) with two-sided 95% confidence intervals (t-distribution) for the linear regression fit (grey) is indicated.

To understand how p53 stabilizes chromatin indirectly, we conducted differential expression analysis of validated p53 target genes15 between p53-cHyper and p53 wild-type mature mTECs and found p53 hyperactivity-induced genes encoding effectors of cell death (Fig. 4e–g and Extended Data Fig. 5d,e). The connection between p53 activity and BAX (an effector of the intrinsic apoptosis pathway29) was particularly notable because the conditional deletion of Bax in mTECs (on a Bak−/− background) caused a selective increase in the number of immature mTECs30, the only developmental subset with substantial p53 activity in wild-type mice (Fig. 2h–k and Extended Data Fig. 3a–c). We also found significant upregulation of other genes encoding pro-apoptotic BH3-only proteins31 (such as NOXA, BIM, BIK and BLK) and Hrk, which blocks the pro-survival function of Bcl2l1 (encoding BCL-XL)32, as well as downregulation of pro-survival Bcl2 in mature versus immature p53-cHyper mTECs (Fig. 4h).

Skewed levels of pro-apoptotic versus pro-survival genes were also observed in p53 wild-type mature versus immature mTECs (Extended Data Fig. 5f), indicating that mature mTECs are inherently predisposed to BAX/BAK-driven apoptosis31. Thus, the differential upregulation of Bax and other genes encoding the pro-apoptotic proteins BIM, BAD and BID by p53 hyperactivity, and the downregulation of the essential pro-survival protein MCL-1 (ref. 33), differentially sensitized p53-cHyper versus p53 wild-type mTECs to apoptosis (Fig. 4e,f,h–j and Extended Data Fig. 5g,h). Together with the data indicating normal thymic cellularity and frequencies of mTEC and thymocyte compartments in p53-cHyper mTECs, these results indicate that p53 hyperactivity imposes a selective advantage for more-stable p53-cHyper mTECs with low chromatin accessibility noise by triggering apoptosis of those exhibiting cellular plasticity, which is potentially similar to the modes by which p53 eliminates premalignant cancers34.

Epigenetic noise is linked to DNA damage

Because DNA damage is the primary trigger for p53 activation in many contexts34, we reasoned that it may serve as a connection between chromatin accessibility noise and p53-mediated apoptosis in p53-cHyper mTECs. To investigate this possibility, we profiled the deposition of γH2AX, which marks sites of DNA double-strand breaks35, is the substrate for ataxia telangiectasia mutated (ATM) kinase (which also phosphorylates p53 for activation35), and is upregulated in mature versus immature p53-cHyper mTECs, along with a cohort of other genes encoding DNA damage-response proteins (Extended Data Fig. 5i). Using published chromatin immunoprecipitation followed by sequencing (ChIP–seq) datasets from mTECs36, we found elevated levels of γH2AX at OOP regions flanking αTSGs compared with those at unexpressed silent loci with low levels of chromatin accessibility noise (Fig. 4k and Extended Data Fig. 5j). The elevated deposition of γH2AX was largely maintained in Aire−/− mTECs, indicating that AIRE-induced transcription and AIRE-mediated recruitment of topoisomerases36,37 did not contribute significantly to the DNA damage at destabilized regions near αTSGs (Fig. 4k). However, we did observe AIRE-independent differential localization of both TOP2α and TOP1 near αTSGs compared with silent loci, indicating the recruitment of enzymatic activity that causes DNA breaks (Fig. 4l,m).

To understand how topoisomerases are recruited to genomic regions with increased chromatin accessibility noise, we focused on their role in generating long-lived DNA breaks at transcription units to resolve the torsional stress caused by RNA polymerase II (Pol II)38. We explored the possibility that stochastic loading of Pol II may be enhanced at destabilized chromatin flanking αTSGs, because nucleosome density has a role in preventing spurious transcriptional initiation39. Towards this goal, we profiled global Pol II occupancy in mTECs using published ChIP–seq datasets36 and found elevated levels of OOP Pol II recruitment at destabilized regions near αTSGs compared with those near silent loci (Fig. 4n). The elevated Pol II occupancy was not a consequence of AIRE-induced transcription because the levels were largely maintained in Aire−/− mTECs (Fig. 4n). Furthermore, annotated distal enhancers40 in destabilized regions near αTSGs also exhibited elevated loading of Pol II, γH2AX deposition and topoisomerase recruitment compared with enhancers near silent loci (Supplementary Fig. 4a–p and Supplementary Notes). Together, these results indicate a concordance between amplified chromatin accessibility noise, spurious transcriptional initiation, recruitment of topoisomerases, increased DNA damage and p53-mediated triggering of apoptosis in p53-cHyper mTECs versus cellular plasticity in wild-type mTECs.

Epigenetic noise is genome-encoded

To identify the determinant that initially causes chromatin accessibility noise in mTECs, we conducted de novo motif enrichment on the destabilized chromatin fragments and found that the enrichment of a near-homopolymeric 10-mer adenine tract stood apart from that of the other motifs (Fig. 4o). This was notable because poly(dA:dT) tracts are known to destabilize nucleosomes at magnitudes proportional to the length and perfection of the tracts41, and an 18-mer poly-A tract was the most-enriched motif in genomic regions associated with allelic imbalances in chromatin accessibility and gene expression of nearby αTSGs in mTECs from NOD × B6 F1 hybrid mice37. We observed a highly significant positive correlation between motif enrichment in OOP fragments flanking αTSGs and the motif adenine/thymine (AT) content (R = 0.42, P = 4.9 × \(10^{-24}\)), with the top three enriched motifs consisting of an average of 93.3% AT (Fig. 4o). By contrast, the motif enrichment in fragments within peaks overlapping the same genomic regions was negatively correlated with motif AT content (R = −0.14, P = 1.9 × \(10^{-3}\)) with the top three motifs containing an average of 16.1% AT (Extended Data Fig. 5k). Similarly, neither OOP fragments from silent loci (R = −0.14, P = 1.8 × \(10^{-3}\); top three enriched motifs, around 16.6% AT) nor highly expressed housekeeping loci (R = 0.05, P = 0.10; top three enriched motifs, around 25.0% AT) showed a substantial positive correlation between motif enrichment and motif AT content (Fig. 4p and Extended Data Fig. 5l). This association between chromatin accessibility noise and enrichment of low-complexity AT-rich motifs suggests that the intrinsic resistance of AT-rich sequences to adopt nucleosome-favourable conformations41,42 may be a determinant of chromatin accessibility noise in vivo.
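The relationship between motif enrichment and motif AT content can be illustrated with a short sketch. It assumes each motif is summarized by a consensus string and a single enrichment score, and it uses the Pearson coefficient; the text does not state which correlation coefficient underlies the reported R values, and real motifs are usually position weight matrices, so both choices are simplifying assumptions. The function names are hypothetical.

```python
import math


def at_fraction(motif):
    """Fraction of A or T bases in a consensus motif string."""
    seq = motif.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def at_enrichment_correlation(motif_enrichment):
    """Correlate AT content with enrichment for {consensus: score} input."""
    motifs = list(motif_enrichment)
    at_values = [at_fraction(m) for m in motifs]
    scores = [motif_enrichment[m] for m in motifs]
    return pearson_r(at_values, scores)
```

Given a dictionary of hypothetical scores such as `{"AAAAAAAAAA": 3.0, "GCGCGCGC": 0.0, ...}`, `at_enrichment_correlation` returns a single coefficient that is positive when AT-rich motifs are the more enriched ones.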

The selective enrichment of low-complexity AT-rich motifs at OOP fragments near αTSGs could be a product of transcription factor activity and/or an inherent prevalence of these motifs in the genomic sequence. To investigate the first possibility, we assessed the expression levels of the transcription factors that target these motifs in maturing versus immature mTECs. We found their expression to be around 1–3 orders of magnitude lower than those encoding known regulators of mTEC differentiation (such as Foxn1 or Relb) and even lower than that for insulin (Ins2) transcripts in mature AIRE+ mTECs (Extended Data Fig. 5m), indicating that they were unlikely to contribute substantially to chromatin accessibility noise.

To investigate the inherent prevalence of poly-AT tracts in the genomic sequence near αTSGs, we generated a set of 500-base pair contiguous tiles spanning 100-kb regions flanking αTSGs minus any tile that overlapped with scATAC-seq peaks. We observed no positive correlation between motif enrichment and motif AT content; however, of the top 5% of the enriched motifs (n = 26), 4 had more than 75% AT content, 3 of which consisted of imperfect 10–12-mer AT tracts (Extended Data Fig. 5n). Another motif that met the top 10% threshold was 91.6% AT with an imperfect 10-mer AT tract (Extended Data Fig. 5n). By contrast, none of the significantly enriched motifs at the silent loci had an AT content of 75% or more, or long imperfect AT tracts (Extended Data Fig. 5o). These results reinforce a concordance between the genomic prevalence of nucleosome-disfavourable poly-AT tracts and amplified chromatin accessibility noise at regions flanking αTSGs, including at nearby tissue-specific enhancers (Supplementary Fig. 4q,r and Supplementary Notes).

Augmenting p53 activity in mTECs causes autoimmunity

To determine whether the suppression of chromatin accessibility noise and mTEC plasticity by p53 hyperactivity affected T cell repertoire selection, we assayed for evidence of the escape of self-reactive T cells in the organs of p53-cHyper mice. We found significant increases in activated T cells that became progressively larger as a function of age, as well as reciprocal decreases in naive T cells compared with their p53 wild-type sex-matched littermates (Fig. 5a,b and Extended Data Fig. 6a–c). We also found substantial lymphocytic infiltration in multiple organs of 6–12-month-old p53-cHyper mice, most frequently in the salivary gland, lung, kidney, lacrimal gland, pancreas, liver and prostate/ovaries, that was not present or less severe in p53 wild-type mice (Fig. 5c–e and Extended Data Fig. 6d), indicating autoreactive tissue damage by the activated effector T cells. Together, these results indicate that enhanced p53 activity in mTECs causes a failure in central tolerance induction, leading to systemic hyperactivity of peripheral T cells and autoimmune manifestations in multiple organs.

Fig. 5: Augmenting p53 activity in mTECs causes multi-organ autoimmunity.

a, Representative flow-cytometry plots of the effector memory T (TEM) cell and naive T (TN) cell compartments of splenic CD4+ T cells from 9-month-old p53-WT (left) and p53-cHyper (right) sex-matched littermates. b, Frequency of CD4+ TEM cells in the spleen from p53-WT and p53-cHyper mice as a function of age and lines of best fit for each genotype. c, Histological analysis of the indicated tissues for infiltrating lymphocytes through haematoxylin and eosin (H&E) staining (filled triangles indicate a histopathology score ≥ 2) from 7–12-month-old p53-WT and p53-cHyper mice. Each triangle represents an individual mouse. d, Comparison of the mean histopathology scores (based on the number and size of lymphocytic infiltrates) from the organs assessed in c and in Extended Data Fig. 6d for sex-matched littermates of the indicated genotypes. The P-value was calculated by a one-sided t-test. e, Representative H&E staining of the indicated organs from p53-WT and p53-cHyper mice (n = 8). The arrows indicate lymphocytic infiltration. Scale bars, 100 μm.

Epigenetic noise is regulated by p53 in cancer

The importance of p53 in tumour suppression34 and the role of somatic plasticity in tumour progression2 prompted us to explore whether the link between chromatin accessibility noise, p53 repression and cellular plasticity in mTECs can be extended to tumour contexts. Towards this goal, we investigated the well-characterized lung adenocarcinoma (LUAD) model43 by using published Multiome datasets of lung epithelial cells from Trp53wt/wt (p53 wild-type), Trp53−/− (p53-knockout) and Trp53LSL-F53Q,F54S (p53-Hyper) backgrounds44 10 weeks after oncogenic KRAS-G12D activation. We based our analysis on the established stages of LUAD development in p53 wild-type and knockout backgrounds43, and identified four main transcriptional states of LUAD progression43: the native alveolar type-2 (AT2) state; a mixed AT1/AT2 state mimicking alveolar progenitors; an embryonic liver-like state representing the loss of alveolar identity; and an epithelial-to-mesenchymal transition (EMT) state representing the loss of epithelial identity (Extended Data Fig. 7a–f).

We identified cells in four clusters (5, 8, 6 and 7) with both a low-WIP fraction and a high prevalence of nucleosomal scATAC-seq fragments indicating destabilized chromatin (Extended Data Fig. 7b,e–j). A common feature across all these cells was the loss of AT2-specific gene expression, including the loss in expression and activity of the lung lineage-defining transcription factor NKX2.1 (refs. 43,45) (Extended Data Fig. 7c,k,l). Furthermore, cells from cluster 6 exhibited high expression and activity levels of RUNX2, a driver of LUAD metastatic transition46 (Extended Data Fig. 7m,n). Moreover, ‘transitional’ cells coexpressing Nkx2-1 and Runx2 (cluster 3), or exhibiting transcription factor activity for both NKX2.1 and RUNX2 (cluster 4), had intermediate WIP fractions with a low prevalence of nucleosomal fragments (Extended Data Fig. 7g–n). Together, these results indicate a concordance between the magnitude of chromatin destabilization and the degree to which cells deviated from the native lung epithelial lineage.

We next re-integrated the p53-Hyper cells on the developmental axis we established with p53 wild-type and p53-knockout cells (Extended Data Fig. 8a–j) and quantified the differential WIP fraction and prevalence of nucleosomal scATAC-seq fragments as a function of the Trp53 genotype for each cluster (Extended Data Fig. 8l–o). For all clusters except the embryonic liver-like cluster 5 (perhaps owing to the paucity of p53-Hyper cells in this cluster), we observed significant increases in WIP fraction in p53-Hyper versus p53 wild-type cells, indicating that p53 hyperactivity suppressed chromatin accessibility noise (Extended Data Fig. 8n). Reciprocally, we observed significant decreases in the WIP fraction in p53-knockout cells compared with p53 wild-type cells in all clusters except the knockout-enriched Runx2posNkx2-1pos cluster 3, indicating less stable nucleosomal barriers in p53-deficient cells (Extended Data Fig. 8n). By and large, we also observed consistent associations between the Trp53 genotype and the prevalence of nucleosomal scATAC-seq fragments (Extended Data Fig. 8o). Together, these results indicate that the distinct roles of p53 in suppressing chromatin accessibility noise and promoting lineage fidelity work in concert to inhibit phenotypic plasticity during LUAD tumorigenesis.

Discussion

The mechanistic basis of somatic plasticity has remained enigmatic despite its broad relevance to organismal adaptation, tissue repair and tumorigenesis1,2. Our findings indicate a central role for amplified fluctuations in background chromatin accessibility that increase the entropy of the epigenetic landscape, which is similar to raising a ‘statistical temperature’ to facilitate a more permissive state. In the developmental framework of cellular differentiation, these results indicate that somatic plasticity is not initially driven by the induction of alternative cell fates, but rather by the stochastic erosion of chromatin barriers.

Our findings show that amplified accessibility noise at chromatin barriers contributes to a highly mixed transcriptional state that lacks clear regulatory logic, as observed in the coexpressed genes in individual AIRE+ mTECs47,48,49 or high-plasticity states of LUAD progression43 that are not related by tissue-specific function, lineage-specific regulation or developmental origin. This high-plasticity state then promotes heterogeneity in downstream phenotypes, as observed in the differentiation of mimetic mTECs3,5,6 or the emergence of primordial gut programs and the EMT state in advanced LUAD tumours43,44 (Supplementary Discussion).

Notably, the destabilization of chromatin barriers and the repression of p53 in mTECs did not require AIRE, in concordance with previously published studies demonstrating that AIRE acts late in the transcription cycle8,9,14. Considering the affinity of AIRE to positive elongation factors of transcription50,51,52, it may function to facilitate Pol II pause release at sites of destabilized chromatin, increasing the probability and frequency of transcriptional bursts. AIRE could target these sites through multiple modes, including: interactions with topoisomerases and DNA repair machinery36,37,53,54; the formation of Z-DNA, resulting from the negative supercoils generated by nascent transcription37,38,55; AIRE’s histone-binding module specific for unmodified amino-terminal histone H356,57,58,59; and the interaction of AIRE with the ATF7ip–MBD1 complex, which targets repressive chromatin enriched in H3K9me3 and DNA methylation60.

Our findings identify molecular levers that regulate fluctuations in nucleosome dynamics at chromatin barriers that influence a cell’s potential for alternative fates. Understanding the molecular and biophysical mechanisms that govern noise in the chromatin landscape may illuminate principles of gene regulation and cellular differentiation, and enable therapeutic innovation for diseases linked to chromatin instability.

Methods

Mice

The mice used in this study were housed in pathogen-free facilities at the University of Chicago and Stanford University. All mice were housed in positively pressurized, individually ventilated cage racks and changed in biological safety cabinets. Cage supplies were sanitized using hot water (82 °C). Bedding and shredded-paper enrichment were autoclaved and cages were provided with irradiated food. Reverse Osmosis water was provided by an automated watering system directly to each cage. Rodent housing rooms were maintained at a 12 h:12 h light:dark cycle. Temperature and humidity were within the Guide for the Care and Use of Laboratory Animals recommended ranges: 20–26 °C and 30–70% humidity. All experiments and animal-use procedures were conducted in compliance with the Guide for the Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee (IACUC) at the University of Chicago. B6.129-Trp53LSL-L25Q,W26S,F53Q,F54S heterozygous mice27,61 were provided by Laura Attardi (Stanford University) and were bred with B6-Foxn1cre homozygous mice62 purchased from Jackson Laboratories to generate Trp53LSL-L25Q,W26S,F53Q,F54S/wt;Foxn1cre/wt and Trp53wt/wt;Foxn1cre/wt littermates. Trp53fl/fl mice were purchased from Jackson Laboratories and bred with B6-Foxn1cre mice to generate Trp53fl/fl;Foxn1cre/wt mice. C57BL/6J mice were purchased from Jackson Laboratories. mTECs and thymocytes were collected from mice 4–5 weeks old. Sex-matched littermates were used for all comparisons of genetic perturbations.

Isolation, sorting and analysis of mouse mTECs

Thymic epithelial cells were isolated as previously described63 with minor modifications. In brief, thymi from 4–6-week-old mice were excised and connective tissue was removed. Stromal tissue was perforated using scissors and incubated with rotation in DMEM-F12 (Gibco) at room temperature for 10 min to liberate the thymocytes. The remaining stromal tissue was enzymatically digested (0.5 mg ml−1 Collagenase D (MilliporeSigma), 0.2 mg ml−1 DNaseI (MilliporeSigma), 0.5 mg ml−1 Papain (Worthington Biochemical)). Cells were stained with anti-EpCAM antibodies conjugated to APC-Cy7 (clone G8.8, BioLegend, 3 µl per 100 million cells) and EpCAM+ cells were enriched by positive selection using magnetic anti-Cy7 beads (Miltenyi, 10 µl per 100 million cells). The enriched fraction was stained with the appropriate panel of fluorochrome-conjugated antibodies to CD45 (clone 30-F11, Invitrogen, 1:100), Ly-51 (clone 6C3, BioLegend, 1:100), MHC-II I-A/I-E (clone M5/114.15.2, Invitrogen, 1:100), CD104 (clone 346-11A, BD Biosciences, 1:200), GP2 (clone 2F11-C3, MBL, 1:10), CD177 (clone 1171 A, R&D, 1:25), Ly-6D (clone 49-H4, Invitrogen, 1:200), Sca-1 (clone D7, BioLegend, 1:200), AIRE (clone 5H12, Invitrogen, 1:500), Ki-67 (clone SolA15, Invitrogen, 1:100), SynCAM (clone 3E1, MBL, 1:100), CD171/L1CAM (clone 555, Miltenyi, 1:25) along with fluorescein-labelled UEA-I (Vector Labs, 1:100), Zombie Aqua (BioLegend, 1:500) and DAPI (Invitrogen, 1:20). Intracellular staining for AIRE and Ki-67 was subsequently done using the eBioscience FoxP3 transcription factor staining kit (Invitrogen) according to the manufacturer’s instructions. 
Intracellular staining for MDM2 (clone EPR22256-98, Abcam, 1:25) was also done using the eBioscience FoxP3 transcription factor staining kit (Invitrogen) according to the manufacturer’s instructions with the addition of a 1-h incubation in blocking buffer (eBioscience permeabilization buffer with 5% normal donkey serum) before a secondary stain (BV421 donkey anti-rabbit, Jackson Immuno, 1:50). Cells were sorted using FACS Symphony S6, FACSAria Fusion or FACSAria II equipped with a 100-μm nozzle (BD Biosciences). Flow-cytometry data for thymic mimetic cells were acquired using a Cytek Aurora. All other flow-cytometry data were acquired using a BD LSRII or Fortessa. All flow-cytometry data were analysed using FlowJo (v.10).

Human thymic tissue acquisition and processing

Thymus fragments were obtained from a 12-week-old human patient with no known genetic abnormalities undergoing standard-of-care cardiac surgery. The patient was de-identified on receipt with written informed consent for the release of genomic sequence data in accordance with IRB protocol 20–1392 approved by the Biological Sciences Division and University of Chicago Medical Center Institutional Review Boards at the University of Chicago and protocol 2020-203 approved by the Advocate Aurora Health Research Subject Protection Program and Advocate Aurora Health Care Institutional Review Board. Connective tissue was removed and the remaining tissue was minced, then incubated with rotation in DMEM-F12 (Gibco) at 4 °C for 20 min to liberate the thymocytes. Stromal tissue was enzymatically digested using 0.5 mg ml−1 Collagenase D (MilliporeSigma) and 0.2 mg ml−1 DNase I (MilliporeSigma) at 37 °C for 20 min. The remaining fragments were incubated with rotation in 0.5 mg ml−1 Papain (Worthington), 0.25 mg ml−1 Collagenase D and 0.1 mg ml−1 DNase I at 37 °C for 20 min. Cells were stained with anti-EpCAM antibodies conjugated to APC-Cy7 (clone 9C4, BioLegend, 1:100) and EpCAM+ cells were enriched by positive selection with magnetic anti-Cy7 beads (Miltenyi). The enriched fraction was stained with DAPI (Invitrogen, 1:20), CD45 (clone 2D1, BioLegend, 1:100), LY51/CD249 (clone 2D3/APA, BD Biosciences, 1:100) and HLA-DRA (clone L243, BioLegend, 1:100) and sorted on a Symphony S6 (BD Biosciences).

Flow cytometry of thymocytes and splenocytes

Thymi from 4–6-week-old mice were removed and small cortical incisions were made before mechanical agitation with wide-bore glass pipettes in DMEM/F-12 (Gibco) to liberate the thymocytes. Spleens from mice aged 4 weeks to 12 months old were isolated in RPMI (Gibco) supplemented with 10% FCS. Cells were liberated by mincing with a syringe plunger and filtered through a 40-μm strainer. Following red blood cell lysis (BD PharmLyse), cells were stained with fluorochrome-conjugated antibodies specific for mouse CD4 (GK1.5, 1:100), CD8α (53-6.7, 1:100), CD25 (PC61, 1:100), CD44 (IM7, 1:100), CD69 (H1.2F3, 1:100), CD62L (MEL-14, 1:100), TCRβ (H57-597, 1:100) and DAPI (Invitrogen, 1:20). Intracellular staining for FoxP3 (clone FJK-16s, eBioscience, 1:100) was done using an eBioscience FoxP3 transcription factor staining kit (Invitrogen) according to the manufacturer’s instructions. Flow-cytometry data were acquired using a BD LSRII or Fortessa and analysed using FlowJo (v.10).

Bulk RNA-seq sample preparation

We FACS-sorted 75,000 primary mTECs directly into RULT lysis buffer (Qiagen RNeasy UCP Micro Kit) and total RNA was extracted following the manufacturer’s instructions. The mRNA was enriched and RNA-seq libraries were constructed using an Illumina TruSeq Stranded mRNA kit. Paired-end, dual-index sequencing was performed on an Illumina NovaSeq 6000 platform.

Bulk RNA-seq data processing

RNA-seq reads were mapped to the mm10 mouse genome assembly using TopHat (v.2.1.1) with the setting --microexon-search. Unmapped, unpaired and low-quality reads (MAPQ < 5) were removed using samtools (v.1.9) view with settings -q 5 -f 2. Paired reads were counted for each gene using featureCounts from Subread (v.2.0.1). TPM values were calculated for each gene to quantify the relative abundance of transcripts for clustering analysis. Trimmed mean of M values (TMM) normalization factors were calculated for each sample for differential comparisons across samples using edgeR (v.4.0.2) (calcNormFactors()). Common dispersions were estimated using estimateCommonDisp() and Benjamini–Hochberg FDRs were calculated for pairwise comparisons using exactTest(). Genes with FDR ≤ 0.05 were regarded as significant.
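The TPM step can be illustrated with a minimal sketch. It uses the standard TPM definition (length-normalized counts rescaled to sum to one million); the function name `tpm` and the gene lengths are illustrative assumptions, because the text does not state which gene-length annotation was used.

```python
def tpm(counts, lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM).

    counts     -- read (or read-pair) counts per gene
    lengths_bp -- gene lengths in base pairs (assumed; not specified in the text)
    """
    # Reads per kilobase of gene length
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    # Rescale so that values across all genes sum to one million
    return [r / total * 1e6 for r in rates]


# Three hypothetical genes whose counts scale with their lengths
example_tpm = tpm([100, 200, 300], [1000, 2000, 3000])
```

Because each hypothetical gene has the same count per kilobase, all three receive the same TPM value, which shows that TPM removes the gene-length bias present in raw counts.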

Definition of tissue-specific and AIRE-dependent genes

Previously published transcriptional data64 from Aire wild-type and Aire-knockout mTEChi were analysed according to the bulk RNA-seq pipeline outlined above. Genes that exhibited at least 1.5-fold induction in Aire wild type relative to Aire knockout and had Benjamini–Hochberg FDR ≤ 0.05 were regarded as Aire-induced. TSGs were classified as previously described64, and αTSGs were taken to be the intersection of these two gene sets. For human TSGs, the Shannon entropy \(S=-\sum p\,{\log }_{2}\,p\) across tissues was calculated for each gene from GTEx65 expression counts (median TPM). Genes with an entropy S ≤ 3 were included for downstream analyses.
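The entropy criterion can be sketched as follows. The sketch assumes that p is the fraction of a gene's summed median TPM contributed by each tissue, which the text does not state explicitly; the function name `tissue_entropy` is hypothetical.

```python
import math


def tissue_entropy(median_tpm):
    """Shannon entropy (bits) of a gene's expression distribution across tissues.

    median_tpm -- median TPM of the gene in each tissue
    Assumption: p is each tissue's share of the gene's summed expression.
    """
    total = sum(median_tpm)
    if total == 0:
        return None  # gene not detected in any tissue
    # Tissues with zero expression contribute nothing to the sum
    shares = [v / total for v in median_tpm if v > 0]
    return -sum(p * math.log2(p) for p in shares)


# A gene expressed equally in 8 tissues versus a gene restricted to one tissue
uniform_entropy = tissue_entropy([5.0] * 8)
restricted_entropy = tissue_entropy([0.0, 12.0, 0.0, 0.0])
```

Under this assumption, S ≤ 3 corresponds to expression that is at most as dispersed as a gene expressed evenly across eight tissues.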

Multiome sample preparation and sequencing

For all Multiome experiments, we used an ATAC + GEX single-cell kit and protocol (10X Genomics 1000236 with protocol CG000338 RevE) with minor modifications to sample preparation. In brief, 40,000 mTECs were FACS-sorted into 1× PBS supplemented with 2% BSA and centrifuged at 300g for 5 min. Cells were gently washed in 50 μl lysis buffer (10 mM Tris, 10 mM NaCl, 3 mM MgCl2 in nuclease-free water) and centrifuged at 300g for 5 min. Cells were resuspended in 50 μl permeabilization buffer (10 mM Tris, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween20, 0.01% digitonin and RNase inhibitor (Invitrogen) in nuclease-free water) and incubated for 5 min on ice. Nuclei were gently washed with wash buffer (10 mM Tris, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween20 and RNase inhibitor in nuclease-free water) and centrifuged at 500g for 5 min. Finally, nuclei were resuspended in 5 μl chilled diluted nuclei buffer (10X Genomics) and added to the transposition mix. Paired-end, dual-index sequencing was performed on an Illumina NovaSeq 6000 platform.

Multiome data quality control

After sequencing, bcl files were converted to fastq using cellranger-arc (v.2.0.2) mkfastq. FASTQ files were aligned to the mm10 or hg38 genome assembly using cellranger-arc count. ATAC-seq fragment files were used as inputs to the ArchR66 (v.1.0.2) analysis pipeline in R (v.4.3.2). Transcript count matrices were used as inputs to the Seurat (v.5.1.0) gene expression analysis pipeline. For gene expression quality control, cells with nFeature_RNA ≥ 250 and ≤ 6,000, nCount_RNA ≤ 25,000 and percent_mitochondrial ≤ 25 were included for downstream analyses. Transcript counts were log-normalized. For scATAC-seq quality control, cells with n_ATAC_Frags ≥ 3,000 and TSS_Score ≥ 10 were included for downstream analyses. Doublet inference was conducted using ArchR addDoubletScores(), and presumed doublets were excluded. Cells that passed each filter were admitted for downstream analyses. Finally, based on gene expression markers, contaminating cells (thymocytes) and putative mTEC mimetic cells were excluded from analysis (except for targeted analyses of mimetic compartments). In the wild-type multiome (Fig. 1), a further cluster of cells that exhibited uncharacteristically low TSS enrichment scores was excluded.
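The per-cell thresholds above can be summarized in a short sketch. The dictionary keys reuse the metric names from the text; representing each cell as a dictionary and the function name `passes_qc` are illustrative assumptions, and doublet removal and marker-based exclusions are not covered.

```python
def passes_qc(cell):
    """Apply the gene expression and scATAC-seq thresholds stated in the text."""
    rna_ok = (
        250 <= cell["nFeature_RNA"] <= 6000
        and cell["nCount_RNA"] <= 25000
        and cell["percent_mitochondrial"] <= 25
    )
    atac_ok = cell["n_ATAC_Frags"] >= 3000 and cell["TSS_Score"] >= 10
    # A cell is retained only if it passes both modalities
    return rna_ok and atac_ok


# Two hypothetical cells: the second fails the TSS enrichment threshold
cells = [
    {"nFeature_RNA": 1800, "nCount_RNA": 7000, "percent_mitochondrial": 4.0,
     "n_ATAC_Frags": 12000, "TSS_Score": 14.2},
    {"nFeature_RNA": 1500, "nCount_RNA": 5000, "percent_mitochondrial": 3.0,
     "n_ATAC_Frags": 9000, "TSS_Score": 6.5},
]
kept = [c for c in cells if passes_qc(c)]
```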

Multiome data processing

Dimensionality reduction, scATAC-seq clustering, projections, pseudotime, transcription factor motif enrichment (except for scATAC-seq fragments or genomic tiles, which was computed using HOMER2 (v.5.1) findMotifsGenome.pl with settings -size given), and transcription factor footprinting were performed using the ArchR pipeline with default parameters. For UMAP plots overlaid with continuous colour scales, MAGIC67 (v.2.0.3) imputation was used for data smoothing to facilitate better visualization. MAGIC-imputed values were used for UMAP display purposes only; imputed values were not used anywhere else in the analysis of scATAC-seq or scRNA-seq datasets (such as violin plots or heatmaps). For scATAC-seq peak calling, the standard ArchR workflow was used with MACS2 (v.2.2.9.1). To maximize the detection of open chromatin regions specific to each sample and stage in the mTEC developmental trajectory, fixed-width 501-bp scATAC-seq peaks were called (extendSummits = 250) on the Tn5-corrected single base insertions (shift = −75, extsize = 150, --nomodel) for each scATAC-seq cluster identified per sample (groupBy = Clusters, reproducibility = 1) using the ArchR wrapper function addReproduciblePeakSet(). The significance of each called peak was calculated as a false discovery rate (q-value) comparing the observed number of Tn5 insertions in the sliding window (300 bp) and the expected number of insertions (total number of insertions/genome size (--nolambda)). A q-value cutoff (cutOff = 0.1) and an upper limit for the number of peaks called per cell (peaksPerCell = 1,000, minCells = 100) were applied to prevent consideration of low-quality peaks. We also excluded peaks that mapped to the mitochondrial or Y chromosomes (excludeChr = c(chrM, chrY)). Peak sets called from each scATAC-seq cluster from respective samples were combined and trimmed for overlap using an iterative procedure that discarded any peak that directly overlapped with the most significant peak66. 
The resultant ‘union peak set’ was applied to all cells for WIP and OOP count-based and motif-based analyses. The fraction of fragments within peaks was computed automatically as a product of the addReproduciblePeakSet() function. Subnucleosomal and mononucleosomal fractions for each cell or sample were computed as the fraction of the cell’s scATAC-seq fragments whose length L ≤ 100 bp (subnucleosomal) or 100 < L ≤ 200 bp (mononucleosomal). To ensure reproducibility of bioinformatic analysis results, for each dataset, a single script was used for all the quality control and pre-processing, including purging of low-quality cells, doublet removal, peak calling, motif enrichment, dimensionality reduction and clustering. A file representing the full processed data was saved using saveArchRProject() and loaded for all subsequent analyses (this file was not edited after pre-processing). Separate scripts were then used to load the processed data and perform specific analyses or generate specific figures.
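The fragment-length classification can be written as a minimal sketch using the length cut-offs given above. The function name `nucleosomal_fractions` and the input format (a list of fragment lengths for one cell) are illustrative assumptions.

```python
def nucleosomal_fractions(fragment_lengths):
    """Return (subnucleosomal, mononucleosomal) fractions for one cell.

    fragment_lengths -- scATAC-seq fragment lengths L in base pairs
    Subnucleosomal: L <= 100 bp; mononucleosomal: 100 < L <= 200 bp.
    """
    n = len(fragment_lengths)
    sub = sum(1 for length in fragment_lengths if length <= 100)
    mono = sum(1 for length in fragment_lengths if 100 < length <= 200)
    return sub / n, mono / n


# Six hypothetical fragments, including values at the class boundaries
sub_fraction, mono_fraction = nucleosomal_fractions([50, 100, 101, 200, 201, 400])
```

Fragments longer than 200 bp fall into neither class, so the two fractions need not sum to one.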

Peak-centric differential accessibility analysis

Differential chromatin accessibility analysis across peaks was done using ArchR getMarkerFeatures() with the following arguments: useMatrix = PeakMatrix, bias = c(TSSEnrichment, log10(number of scATAC-seq fragments)), testMethod = wilcoxon.

Processing of OOP scATAC-seq fragments

For each Multiome dataset, WIP and OOP fragments near genes of interest (such as αTSGs, housekeeping genes and maturation-induced genes) were retrieved using the ArchR and GenomicRanges R packages. For each gene: first, a search window, search_window, was established around the TSS (\(\text{search\_window}=\text{TSS}\pm \ell\)); and second, scATAC-seq fragments intersecting the search_window were retrieved from cells of interest, cell_subset, using the ArchR getFragmentsFromProject() function with arguments subsetBy = search_window and cellNames = cell_subset. Fragments were then partitioned based on whether they overlapped the data’s union peak set using subsetByOverlaps() with arguments invert = FALSE to retrieve WIP fragments, or invert = TRUE to retrieve OOP fragments. Finally, fragments were binned and/or tallied for the specific application (see below).
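The partitioning logic can be sketched in plain Python, independent of the ArchR and GenomicRanges calls that were actually used. The sketch assumes a single chromosome, closed intervals given as (start, end) tuples, and that any shared base counts as an overlap; the function names are hypothetical.

```python
def overlaps(a, b):
    """True if closed intervals a = (start, end) and b = (start, end) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


def partition_fragments(fragments, peaks, tss, ell):
    """Split fragments near a TSS into within-peak (WIP) and out-of-peak (OOP) sets."""
    search_window = (tss - ell, tss + ell)
    # Keep only fragments that intersect the search window
    local = [f for f in fragments if overlaps(f, search_window)]
    # WIP: overlaps at least one union peak; OOP: overlaps none
    wip = [f for f in local if any(overlaps(f, p) for p in peaks)]
    oop = [f for f in local if not any(overlaps(f, p) for p in peaks)]
    return wip, oop


# Hypothetical fragments and a single union peak around a TSS at position 10,000
fragments = [(9_900, 10_050), (10_400, 10_600), (12_000, 12_150), (80_000, 80_100)]
peaks = [(10_000, 10_500)]
wip, oop = partition_fragments(fragments, peaks, tss=10_000, ell=50_000)
```

In this example the last fragment lies outside the search window and is dropped before partitioning, so it appears in neither set.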

Analyses comparing αTSGpos and αTSGneg mTECs

Cells from early mature, mid mature and late mature clusters expressing any αTSGi > 0 were selected as the αTSGpos cohort and a size-matched cohort of αTSGneg cells was sampled randomly from the remaining cells from the same three clusters. These cohorts were then used as inputs to getMarkerFeatures() in ArchR for differential accessibility of peaks between αTSGpos and αTSGneg mTECs. For local OOP and WIP analysis, ATAC-seq fragments within peaks and outside of peaks from αTSGpos and αTSGneg cohorts were intersected with a ±5 kb sliding window with 1 kb increments, normalized to the total number of ATAC-seq fragments per cell, and tallied in each window within a region flanking αTSGi. For αTSG coexpression analysis, the probability of detecting each αTSGi neighbouring αTSG0 within the specified length scale (or a randomly selected alternative αTSG as a control) was computed for each of the αTSGpos and αTSGneg cohorts.
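The sliding-window tally can be sketched as below. Several details are assumptions made for illustration: each fragment is weighted by the reciprocal of its cell's total fragment count, a fragment is assigned to every window it overlaps, and the flanking region extends 50 kb on either side of the TSS. The function name and input format are hypothetical.

```python
def sliding_window_tally(fragments, cell_totals, tss,
                         flank=50_000, half_width=5_000, step=1_000):
    """Tally depth-normalized fragments in +/-5 kb windows stepped by 1 kb.

    fragments   -- list of (cell_id, start, end) tuples on one chromosome
    cell_totals -- dict mapping cell_id to that cell's total fragment count
    Returns a dict mapping window-centre offset from the TSS to the tally.
    """
    tallies = {}
    for centre in range(tss - flank, tss + flank + 1, step):
        lo, hi = centre - half_width, centre + half_width
        tallies[centre - tss] = sum(
            1.0 / cell_totals[cell]
            for cell, start, end in fragments
            if start <= hi and end >= lo  # fragment overlaps the window
        )
    return tallies


# Two hypothetical OOP fragments from cells with different sequencing depths
oop_fragments = [("cell_1", 100_000, 100_200), ("cell_2", 120_000, 120_100)]
totals = {"cell_1": 1000, "cell_2": 2000}
profile = sliding_window_tally(oop_fragments, totals, tss=100_000)
```

Running the same function separately on the αTSGpos and αTSGneg cohorts, and on WIP and OOP fragments, gives per-window values that can be compared as ratios.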

Regression analysis

For each αTSGi, the total number of OOP and WIP scATAC-seq fragments within the characteristic window of instability \((\ell =\pm 50\,\text{kb})\) was computed for each mTEC in the early mature, mid mature and late mature clusters. A logistic regression framework was used (glm() with family = binomial) to estimate the probability of expressing a given αTSG based on the number of log10(OOP + 1) or log10(WIP + 1) fragments using log10(n_ATAC_Frags) per cell as a covariate. P-values for regression coefficients were generated using the likelihood ratio χ2 test (anova(test = ‘LR’)).
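The regression and likelihood ratio test can be illustrated with a self-contained sketch. This is not the glm() and anova() code used in the study: it is a plain Newton-Raphson fit, it compares a model with and without the fragment predictor while keeping the depth covariate in both, and it uses the fact that a χ2 statistic with one degree of freedom has survival function erfc(√(x/2)). All function names and the toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def predict(beta, rows):
    """Fitted probabilities of a logistic model for each design-matrix row."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for row in rows]


def fit_logistic(rows, y, n_iter=50, tol=1e-10):
    """Newton-Raphson logistic fit; returns (coefficients, log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        p = predict(beta, rows)
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(rows, y, p))
                for j in range(k)]
        hess = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(rows, p))
                 for j in range(k)] for i in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = predict(beta, rows)
    loglik = sum(math.log(pi) if yi else math.log(1 - pi) for yi, pi in zip(y, p))
    return beta, loglik


def lrt_p_value(predictor, covariate, expressed):
    """Likelihood ratio test for the predictor, adjusting for the covariate."""
    full = [[1.0, x, c] for x, c in zip(predictor, covariate)]
    reduced = [[1.0, c] for c in covariate]
    beta, ll_full = fit_logistic(full, expressed)
    _, ll_reduced = fit_logistic(reduced, expressed)
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    # Survival function of a chi-squared distribution with 1 degree of freedom
    return beta[1], math.erfc(math.sqrt(stat / 2.0))


# Toy data: 32 hypothetical cells; expression becomes more likely with OOP count
pattern = {0: ([0, 0, 0, 0], [1, 0, 0, 0]), 9: ([1, 0, 0, 0], [1, 0, 0, 0]),
           99: ([1, 1, 1, 0], [1, 1, 1, 0]), 999: ([1, 1, 1, 0], [1, 1, 1, 1])}
counts, depth, expressed = [], [], []
for count, (low, high) in pattern.items():
    for log_depth, outcomes in ((3.5, low), (4.0, high)):
        for outcome in outcomes:
            counts.append(count)
            depth.append(log_depth)  # stands in for log10(n_ATAC_Frags)
            expressed.append(outcome)
log_oop = [math.log10(c + 1) for c in counts]
coef, p_value = lrt_p_value(log_oop, depth, expressed)
```

Repeating the test with WIP counts in place of OOP counts for the same gene would give the second P-value needed for an OOP/WIP ratio of the kind shown in Fig. 1p.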

CUT&RUN sample preparation

CUT&RUN was performed as previously described28 with minor modifications. In brief, 350,000–500,000 cells were washed 3 times in wash buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM spermidine, 1× EDTA-free protease inhibitor cocktail (Roche)) then bound to Concanavalin-A beads (Bangs Laboratories) according to the manufacturer’s instructions. Cells were incubated with 1:100 dilution of anti-p53 antibody (Leica NCL-L-p53-CM5p) for 2 h or overnight at 4 °C in permeabilization buffer (1× permeabilization buffer (eBioscience), 0.5 mM spermidine, 1× EDTA-free protease inhibitor cocktail, 2 mM EDTA). The sample was then incubated with 700 ng ml−1 pA-MNase (S. Henikoff) in permeabilization buffer at 4 °C for 1 h. Digestion was done in 0.5× permeabilization buffer supplemented with 2 mM CaCl2 at 4 °C for 1 h. The reaction was stopped by the addition of 2× stop buffer (final concentration 100 mM NaCl, 10 mM EDTA, 2 mM EGTA, 20 μg ml−1 glycogen, 25 μg ml−1 RNase A (Thermo Fisher)) and the sample was incubated at 37 °C for 20 min. Protein in the sample was then digested in 0.1% SDS and 250 μg ml−1 Proteinase K (New England Biolabs) for 2 h at 56 °C, shaking gently. CUT&RUN fragments were purified by phenol chloroform extraction. CUT&RUN libraries were generated using NEBNext UltraII DNA Library Prep Kit for Illumina coupled with NEBNext Multiplex Oligos for Illumina (New England Biolabs) with modifications optimized for small fragments, as detailed in https://doi.org/10.17504/protocols.io.wvgfe3w. Paired-end, dual-index sequencing was performed on the Illumina NextSeq500 platform.

CUT&RUN data processing

CUT&RUN reads were mapped to mm10 mouse genome assembly using Bowtie2 (v.2.2.9) with settings --local --very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700. PCR duplicates were removed using Picard (v.2.21.8) MarkDuplicates REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=LENIENT. Reads with MAPQ scores below 30 were excluded from downstream analysis using samtools (v.1.9) view -b -q 30 -f 2 -F 1804. Peaks were called for each sample using MACS2 (v.2.2.7.1) with settings --shift 0 --extsize 200 --nomodel --call-summits --keep-dup all -p 0.01. For each sample, a 301-bp fixed-width peak set was generated by extending the MACS2 summits by 150 bp in both directions. Peaks were ranked by significance (MACS2 peak score) and overlapping peaks with lower peak scores were removed iteratively to create non-overlapping sample peak sets. Peaks mapping to chrY, as well as any that spanned genomic regions containing “N” nucleotides, were removed. Robust peaks were defined by a score per million (SPM) (each peak score divided by one-millionth of the sum of all peak scores in the sample), and we retained only those peaks with SPM ≥ 5. We defined p53 CUT&RUN peaks by further filtering for peaks that overlapped with known p53-binding motifs (HOMER2, v.5.1) from samples with characterized p53 activity (mTEClo samples). CUT&RUN fragment counts across regions of interest were normalized by the number of unique fragments in the sample library.
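The iterative overlap removal and SPM filter can be sketched as follows. The sketch assumes peaks on a single chromosome given as (start, end, score) tuples and reads SPM as the peak score divided by the sample's summed peak score expressed in millions; the function names and example scores are hypothetical.

```python
def non_overlapping_peaks(peaks):
    """Keep the highest-scoring peak in each group of overlapping peaks.

    peaks -- list of (start, end, score) tuples on one chromosome
    """
    kept = []
    # Visit peaks from most to least significant
    for peak in sorted(peaks, key=lambda p: p[2], reverse=True):
        # Retain a peak only if it overlaps no peak that was already kept
        if all(peak[1] < k[0] or peak[0] > k[1] for k in kept):
            kept.append(peak)
    return sorted(kept)


def score_per_million(peaks):
    """Score per million: peak score divided by (sum of all peak scores / 1e6)."""
    total = sum(score for _, _, score in peaks)
    return [score / (total / 1e6) for _, _, score in peaks]


# Four hypothetical fixed-width peaks, three of which form an overlapping chain
candidates = [(100, 400, 50), (300, 600, 80), (550, 850, 30), (1000, 1300, 10)]
peak_set = non_overlapping_peaks(candidates)
spm = score_per_million(peak_set)
robust = [p for p, s in zip(peak_set, spm) if s >= 5]
```

Note that the peak at 550–850 is discarded because it overlaps the top-scoring peak, even though it does not overlap the peak at 100–400 that was also discarded; this is a property of removing peaks greedily by rank.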

ChIP–seq data processing

ChIP–seq reads were mapped to mm10 mouse genome assembly using Bowtie2 (v.2.2.9) with settings --very-sensitive -X 2000. PCR duplicates were removed using Picard (v.2.21.8) MarkDuplicates REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=LENIENT. Reads with MAPQ scores below 30 were excluded from downstream analysis using samtools (v.1.9) view -b -q 30 -F 1796. ChIP–seq read counts were normalized by the number of unique reads in the sample library.

Histopathology

Histopathology experiments were carried out as previously described9. In brief, tissues were fixed in buffered 10% formalin and paraffin-embedded. H&E staining was done by the standard methods. Histopathology scores were assigned using a four-tier system based on the degree and distribution of lymphocytic infiltration observed in the tissue sections. A score of 0 was assigned when no lymphocyte infiltration was detected; a score of 1 corresponded to minimal infiltration, characterized by very few small, isolated clusters; a score of 2 corresponded to moderate infiltration, in which several small to moderately sized clusters of lymphocytes were observed; a score of 3 corresponded to severe, diffuse infiltration, indicated by the presence of numerous large clusters distributed throughout the tissue.

Statistical analysis

De novo and known transcription factor motif P-values were determined using HOMER2 (v.5.1). For bulk RNA-seq, P-values for differentially expressed genes were computed using edgeR (v.4.0.2) (exactTest()) and corrected for multiple testing using the Benjamini–Hochberg FDR method. For scATAC-seq and scRNA-seq, FDR-corrected Wilcoxon test P-values for differentially accessible ATAC peaks and differentially expressed genes were computed using ArchR (v.1.0.2) (getMarkerFeatures(testMethod = “wilcoxon”)). Logistic regression coefficient estimate P-values were computed using analysis of variance (ANOVA; anova(test = “Chisq”)) to compare the regression results from glm(). Box plots show the median (centre line), 25th and 75th percentiles (edges), and whiskers show ±1.5 times the interquartile range. Outliers beyond the whiskers are represented as individual dots. All other P-values and statistical tests were computed in R or Prism and are specified in the figure legends.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.