Abstract
Transposable elements (TEs) are DNA sequences that expand selfishly in the genome, possibly causing severe cellular damage. While normally silenced, TEs have been shown to activate during aging. DNA 5-methylcytosine (5mC) is one of the main epigenetic modifications by which TEs are silenced and has been used to train highly accurate age predictors. Yet, one common criticism of such predictors is that they lack interpretability. In this study, we investigate the changes in TE 5mC methylation that occur during aging in human blood using published methylation array data. We find that evolutionarily young long interspersed nuclear elements 1 (L1s), the only known TEs capable of autonomous transposition in humans, undergo the fastest loss of 5mC methylation, suggesting an active mechanism of de-repression. The same young L1s also showed preferential gain in chromatin accessibility but not expression. The long terminal repeat retrotransposons THE1A and THE1C also showed very rapid 5mC loss. We then show that accurate age predictors can be trained on both 5mC methylation of individual TE copies and average methylation of TE families genome wide. Lastly, we show that while old L1s gradually lose 5mC during the entire lifespan, demethylation of young L1s only happens late in life and is associated with cancer.
Main
Repetitive elements (REs) are DNA sequences found in high copy number in the genome1. Transposable elements (TEs), or selfish REs, are REs that have the ability to copy themselves and move to new genomic locations, either directly as DNA (DNA transposons) or through an RNA intermediate that is reverse-transcribed (long interspersed nuclear elements (LINEs); short interspersed nuclear elements (SINEs); long terminal repeat (LTR) retrotransposons). The selfish replication of TEs has led them to occupy a large portion of genomes (around 40% in mammals). TE activity can cause cellular damage by causing insertional mutagenesis2, generation of double-stranded DNA breaks2, stimulation of inflammatory pathways3,4,5,6,7 and disruption of gene regulatory networks8. Because of their pathogenic potential, TEs are kept under tight control by the host with multiple regulatory layers9. TEs are recognized by transcription factors (TFs), notably KRAB-zinc fingers, and PIWI-interacting RNAs (piRNAs) leading to silencing by H3K9 trimethylation and DNA 5-cytosine methylation (5mC)10. Notably, other rarer forms of DNA methylation, such as 5-hydroxymethylcytosine (5hmC) and N6-methyladenine (6mA) are involved in transcriptional regulation, and are associated with gene expression11,12. However, for the rest of this study, we will focus on 5mC methylation. DNA methylation patterns are established in bulk during development and are then largely maintained throughout the lifespan, although de novo methylation and active demethylation still occur13. Previous studies in multiple organisms and tissues found that methylation patterns undergo a slow drift during aging, with many normally hypermethylated regions becoming less repressed14,15,16. At the same time, TEs are activated during aging in invertebrates, mice, human senescent cells and certain cancers2,7,17,18. 
Thus, it is possible that age-related alterations of DNA methylation could have a role in TE activation in humans, as has been observed in other species and during development11,19,20.
Aging clocks are statistical models trained to predict age and age-related phenotypes, including time to death21. In addition to predicting the age of samples of unknown age, for example in forensics, aging clocks have been used to study health conditions, lifestyles, and genetic or pharmacological treatments that alter an organism’s biological age. Typically, age predictions are based on omic data types, including gene expression22,23, protein abundance24, chromatin accessibility25 and, most commonly, DNA methylation26,27,28,29,30,31. One common criticism of aging clocks deals with the difficulty in interpreting the biological meaning of observed changes in DNA methylation patterns. One strategy previously used to improve clock interpretability is to group clock CpGs into different modules corresponding to different biological processes32,33.
In this study, we explored the use of TE methylation as a biomarker of age and disease. First, we reanalyzed public human blood methylation data to determine the trajectory of TE methylation during aging, comparing evolutionarily young and old TEs. We then constructed age predictors for mice and humans. Lastly, we investigated associations between accelerated age prediction, and more generally loss of methylation at TEs, and disease.
Results
Data description
To investigate changes in RE methylation that occur during aging, we collected publicly available human blood methylation array data (GSE40279: n = 656; GSE64495: n = 113; GSE87648: n = 382; GSE147221: n = 679; GSE157131: n = 1,218). Later, we investigate the association between TE methylation and disease using the Women’s Health Initiative (WHI) BA23 dataset (n = 2,175). The characteristics of the datasets are summarized in Fig. 1a. All datasets were generated with the Illumina Infinium 450k array, which measures methylation at 485,578 CpGs. We annotated array CpGs based on the type of RE and genic region (exon, intron, promoter, 5′ UTR, 3′ UTR, intergenic) they lay within. Array CpGs were generally biased toward genic regions, whereas complex repeats were generally found in intergenic regions or introns (Extended Data Fig. 1a). Nonetheless, 69,426 CpGs were contained within REs, mainly LINEs, SINEs, LTR retrotransposons, DNA transposons and simple repeats (Extended Data Fig. 1b). While most RE CpGs were primarily intergenic and intronic (Extended Data Fig. 1b), simple repeats and low-complexity regions were predominantly found in promoters. It should be noted that because methylation array CpGs are not a random selection of all CpGs, but are instead biased toward certain regions, array data may not be generalizable to the whole genome.
a, Public human blood DNA methylation datasets and age distributions. b, Youthful methylation level and age-related drift of CpGs inside and outside of repetitive elements. c, Methylation drift rate of CpGs grouped according to major repeat class. Selfish (transposons) and nonselfish repeats were grouped separately. Only RE classes represented by 100 or more CpGs in the Infinium array are shown. d, Methylation drift rate of CpG in L1s, grouped according to family and sorted according to average sequence length, a proxy of evolutionary age. e, Methylation drift rate of CpG in LTR retrotransposons, grouped according to family and sorted according to average sequence length. d,e, Only families represented by 40 or more CpGs in the Infinium array are shown. c–e, Boxes show the median, and the 25th and 75th percentiles. The whiskers extend to the 25th and 75th percentiles ± 1.5 times the interquartile range (IQR). Points outside the whiskers are not shown.
TEs, especially young L1s, become demethylated during aging
First, we investigated the age dynamics of RE and non-RE CpGs. We used limma34 to fit linear regression models to the methylation levels of all array CpGs, including age, sex and the study of origin as independent variables (Supplementary Data 1). Patients with reported health conditions in the original studies were not included in the analysis to initially focus on RE methylation changes that are associated with aging rather than disease. RE CpGs were hypermethylated in young individuals (20 years old), but were more likely to have decreased methylation in older individuals compared to non-RE CpGs (Fig. 1b). When investigating different classes of REs individually, we found that TEs (LINEs, SINEs, LTR retrotransposons, DNA transposons, SINE-VNTR-Alu (SVA) retrotransposons) were much more prone to losing methylation than nonselfish REs (tRNA, rRNA, satellites, simple repeats, low-complexity regions; Wilcoxon rank-sum test P < 2.2 × 10−16; Fig. 1c). We initially focused on L1s because they are the only TEs known to be active and autonomous in humans35. Therefore, de-repression of L1s could be sufficient to cause cellular damage. Fortunately, most L1 copies are truncated or have mutated over evolutionary timescales and are thus inactive36,37. Conversely, competent, evolutionarily young L1 copies are closer to 6,000-bp long. We found an association between the average copy length of L1 families and their propensity to become demethylated with age (Spearman’s r = -0.28, P = 0.0012; Fig. 1d). The L1 families with the most consistent methylation loss across measured CpGs were L1HS, L1PA2, L1PA3 and L1PA4, which are the four youngest L1 families present in the human genome36. Middle-aged, partially truncated L1s, such as L1PB1, L1PA15 and L1PA16, showed intermediate rates of methylation loss. Older, highly truncated families were also generally prone to methylation loss, but to a much smaller extent. 
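The per-CpG regression described above can be sketched with ordinary least squares. This is only an illustrative stand-in: the study used limma, which additionally moderates variances across CpGs, and that step is not reproduced here. The sample values, the study labels "A" and "B", and all function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augment the matrix so row operations apply to both sides at once.
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_ols(design, response):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def design_row(age, is_male, study, studies):
    """Intercept, age, sex, and one indicator per non-reference study."""
    indicators = [1.0 if study == s else 0.0 for s in studies[1:]]
    return [1.0, float(age), float(is_male)] + indicators


# Hypothetical samples for one CpG: (age, is_male, study of origin).
samples = [
    (20, 0, "A"), (40, 1, "A"), (60, 0, "A"), (80, 1, "A"),
    (25, 1, "B"), (45, 0, "B"), (65, 1, "B"), (85, 0, "B"),
]
studies = ["A", "B"]
# Simulated methylation values with a known drift of -0.002 per year.
betas = [
    0.95 - 0.002 * age + 0.01 * male - 0.02 * (study == "B")
    for age, male, study in samples
]

design = [design_row(age, male, study, studies) for age, male, study in samples]
coefficients = fit_ols(design, betas)
age_drift = coefficients[1]  # methylation change per year of age
print(f"age drift per year: {age_drift:.4f}")
```

In this sketch the age coefficient plays the role of the drift rate of a single CpG; repeating the fit for every array CpG would give one drift rate per CpG, which can then be grouped by RE class or family.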
We wondered if this discrepancy in drift rate could be due to representation of different parts of the L1 sequence in the array. Specifically, if the regions most affected by methylation loss are found on the L1 5′ end, the truncation commonly observed in older families could explain why older L1s may have slower drift, as they would often lack 5′ ends. We aligned CpG probe sequences to the consensus of several representative L1 families and found that indeed most young L1 array CpGs were located in the 5′ UTR, whereas older L1 CpGs were more spread throughout the L1 sequence, with a 3′ bias (Extended Data Fig. 2). However, 5′ UTR CpGs did not lose methylation at a faster rate than CpGs found in the rest of the element; thus, differences in CpG localization between young and old L1 elements do not explain differences in methylation drift rate. We then investigated other TE classes. Among LTR retrotransposons, the THE1A and THE1C families showed the fastest methylation loss (Fig. 1e). While not retrotransposition-competent, de-repression of these families was found to drive expression of oncogenes8. Most SINEs, SVAs and DNA transposons were also biased toward losing methylation during aging, but the median drift rate was relatively small and no particular family stood out (Extended Data Fig. 3a–c).
Young L1s gain accessibility but not expression during aging
We asked if the age-related changes in DNA methylation we observed in TEs and other REs were associated with coherent changes in chromatin accessibility and expression. We used matched RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin with sequencing (ATAC–seq) data from 159 human peripheral blood mononuclear cell (PBMC) samples (GSE193142, n = 155 and n = 143 after quality control for RNA-seq and ATAC–seq, respectively; Supplementary Data 3)25. Chromatin accessibility of TEs followed similar trends as DNA methylation: most LINE, LTR, SVA and DNA transposon families showed increased accessibility during aging (Extended Data Fig. 4b). We also observed a stronger gain in accessibility at young L1s compared to old, and age-related changes in L1 family methylation and accessibility were highly correlated (Pearson’s r = -0.443, P < 0.001; Extended Data Fig. 4c). Interestingly, tRNAs and rRNAs showed a decline in accessibility with age, whereas they had not shown particular gains in DNA methylation. TE expression, on the other hand, did not follow similar trends as DNA methylation or chromatin accessibility (Extended Data Fig. 4a). L1 families in general showed a mild increase in expression during aging, but the youngest L1 families did not increase in expression (Extended Data Fig. 4c). Similar findings were recently reported, where inducing large-scale demethylation using 5-aza-2′-deoxycytidine only led to expression of few L1HS loci, with refractory loci being primarily located in heterochromatic regions38. However, in our findings, we also saw increased accessibility at young L1s, which indicates heterochromatin relaxation at these elements. Then, perhaps expression of these young L1s is prevented by RNA silencing pathways, which are crucial in TE repression.
Demethylation of young L1s outpaces passive methylation loss
The difference in demethylation rate between young and old L1s could indicate that they become de-repressed by different means: de-repression of old L1s may be a result of global age-related methylation loss, which has been previously documented and is often attributed to imperfect maintenance of methylation marks by DNA methyltransferase 1 (DNMT1)15. Conversely, young L1s may be actively de-repressed by recruiting activating TFs at their 5′ UTR39. Alternatively, this discrepancy may be explained by differences between the CpG landscape of young and old L1 families. For example, young L1s have a higher CpG density, which is gradually lost over evolutionary timescales because of C>T mutations40, and CpG density affects the rate of passive methylation loss41,42. Additionally, the initial (after development) level of CpG methylation may affect the methylation drift rate simply because highly polarized states (for example, fully methylated) can only lose methylation, while intermediate methylation states can both gain and lose methylation during aging. Thus, we modeled the average methylation drift rate of CpGs based on local CpG density, youthful methylation level and the interaction of the two (Fig. 2a,b). This model explained 24.7% of age coefficient variation and confirmed previous reports that low CpG density is associated with age-related methylation loss. Hypomethylated CpGs (<20% methylated) were more likely to gain methylation during aging; however, hypermethylated CpGs (>80% methylated) were not particularly biased toward methylation loss. We then adjusted the previously calculated age coefficients with this information (Supplementary Data 1). These adjusted age coefficients should be interpreted as ‘the age drift rate of a given CpG, compared to what would be expected from the average CpG with the same local CpG density and youthful methylation level’. 
The adjusted coefficients of most TE families of all five major classes were close to zero or even slightly positive, meaning that their age-related methylation loss could be explained by the local CpG context and youthful methylation state and is probably a passive phenomenon (Fig. 2c,d and Extended Data Fig. 3d–f). Conversely, L1HS, L1PA2, L1PA3 and L1PA4 retained a higher-than-expected rate of methylation loss, reinforcing the hypothesis that their de-repression may be, at least in part, an active process.
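One way to read the adjusted age coefficient is as a residual: the observed drift rate minus the drift rate expected from local CpG density, youthful methylation and their interaction. The sketch below assumes exactly that reading, which the text implies but does not spell out. The parameter values are placeholders chosen for illustration and are not estimates from the study.

```python
def expected_drift(cpg_density, youthful_meth, params):
    """Drift predicted from density, youthful methylation and their interaction."""
    return (
        params["intercept"]
        + params["density"] * cpg_density
        + params["youthful"] * youthful_meth
        + params["interaction"] * cpg_density * youthful_meth
    )


def adjusted_drift(observed_drift, cpg_density, youthful_meth, params):
    """Observed drift relative to the average CpG with the same context."""
    return observed_drift - expected_drift(cpg_density, youthful_meth, params)


# Placeholder model parameters (hypothetical, NOT values from the study).
params = {
    "intercept": -0.001,
    "density": 0.01,
    "youthful": -0.002,
    "interaction": 0.02,
}

# Hypothetical CpG: local CpG density 0.05, 90% methylated in young
# individuals, observed drift of -0.004 per year.
adjusted = adjusted_drift(-0.004, 0.05, 0.9, params)
print(f"adjusted drift: {adjusted:.4f}")
```

Under this reading, an adjusted value near zero means the loss is accounted for by the CpG context, while a clearly negative value means methylation is lost faster than the context alone would predict.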
a, Trends of methylation drift based on youthful methylation levels. b, Trends of methylation drift based on local CpG density. c,d, Age coefficient of methylation at LINEs (c) and LTR retrotransposons (d) after adjustment for CpG density and youthful methylation level. Only families represented by 40 or more CpGs in the Infinium array are shown. e, TF binding motifs enriched at young L1s and associated with increased or decreased methylation drift rate in young L1s (two-sided Wilcoxon rank-sum test, P < 0.05 for both adjusted and unadjusted age drift rate coefficient). P values are only shown for comparisons in young L1s; the exact P values are, in order, 0.021, 0.039, 0.028, 0.016. n of CpGs in old L1s: 341; n CpGs in young L1s: 621, of which 416 are flanked by an ARNTL motif, 589 by an NFKB2 motif, 375 by a FOXO1 motif and 390 by a HIC2 motif. Additional motifs are shown in Extended Data Fig. 5. c–e, Boxes show the median and the 25th and 75th percentiles. The whiskers extend to the 25th and 75th percentiles ± 1.5 times the IQR. Points outside the whiskers are not shown. * P < 0.05.
TF binding motifs associating with differences in methylation drift rate between L1s
Because age-related methylation loss at young L1s could not be fully described as a passive process, we searched for TF binding motifs whose presence or absence may be associated with faster methylation loss at young L1s compared to older L1s (after accounting for CpG density and youthful methylation). Because of the evolutionary arms race between TEs and host, we expected to find that younger L1s may have lost motifs important for repression, such as those of KRAB-zinc fingers, and potentially may have gained new activators43,44. We used simple enrichment analysis (SEA)45 to search for motifs enriched within 250 bp upstream and downstream of CpGs in L1HS, L1PA2, L1PA3 and L1PA4, compared to CpGs in older L1 families. In particular, we selected L1PB1, L1PA15, L1PA16, L1M1, L1M2 and L1MA3 as representative older L1s because, while older than L1HS, L1PA2, L1PA3 and L1PA4, they showed the lowest amount of truncation among L1 families adequately represented in the Infinium array (>40 CpGs per family; Fig. 1d). Moreover, L1PA15, L1PA16, L1PB1 and L1MA3 are primate-specific families36. We took this precaution to ensure that the sequences used for motif enrichment would be as complete and as comparable as possible. This yielded 621 sequences in young L1s and 341 in older ones. Motif enrichment using SEA found 620 motifs enriched in young L1s (Supplementary Data 2), including motifs of known regulators of L1 expression (YY1 (ref. 46), SRY47, ZFP57 (ref. 48)). To narrow down this list, we further restricted our analysis to motifs found in the majority of young L1s (>50%) and a minority of old ones (<20%), yielding 103 motifs. These motifs could be either activating or repressive, as both types of motifs could be subject to mutation over evolutionary timescales and thus be absent in older L1s, leading to an enrichment in young L1s compared to old ones.
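The prevalence filter applied after enrichment (present in more than 50% of young L1 sequences and fewer than 20% of old ones) can be sketched as follows. The motif names, sequence identifiers and counts are hypothetical, and the enrichment step performed with SEA is not reproduced.

```python
def select_motifs(young_hits, old_hits, n_young, n_old, min_young=0.5, max_old=0.2):
    """Keep motifs found in most young sequences and few old ones.

    young_hits and old_hits map a motif name to the set of sequence
    identifiers in which that motif occurs.
    """
    selected = []
    for motif, young_ids in young_hits.items():
        young_fraction = len(young_ids) / n_young
        old_fraction = len(old_hits.get(motif, set())) / n_old
        if young_fraction > min_young and old_fraction < max_old:
            selected.append(motif)
    return sorted(selected)


# Hypothetical toy data: 10 young and 10 old sequences.
young_hits = {
    "MOTIF_A": {f"young_{i}" for i in range(7)},  # 70% of young sequences
    "MOTIF_B": {f"young_{i}" for i in range(8)},  # 80% of young sequences
    "MOTIF_C": {f"young_{i}" for i in range(3)},  # 30% of young sequences
}
old_hits = {
    "MOTIF_A": {"old_0"},                         # 10% of old sequences
    "MOTIF_B": {f"old_{i}" for i in range(5)},    # 50% of old sequences
}

selected = select_motifs(young_hits, old_hits, n_young=10, n_old=10)
print(selected)
```

In the study the corresponding denominators would be the 621 young and 341 old L1 sequences.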
To infer the regulatory role of these motifs, we further filtered these 103 motifs to ones whose presence was associated with accelerated or attenuated methylation loss at young L1 CpGs. Thus, we considered motifs found at young L1s with decreased methylation loss as putative repressive; conversely, we considered motifs found at young L1s with accelerated methylation loss as putative activating. However, no motif fell in the latter category (Fig. 2e and Extended Data Fig. 5a). The putative repressive set included 20 motifs. Of these, 13 had a similar consensus sequence (GTAACA) belonging to the forkhead box (FOX) TF family. FOXA1 was reported to activate, rather than repress, L1s37; however, given the extreme similarity between the recognition sequences of many FOX family members, we suspect that FOXA1 may not be the sole regulator of L1 activity and that other members of the family may have repressive effects. Although FOXA1 motifs were enriched at young L1 CpGs, young L1 CpGs with FOXA1 motifs did not lose methylation faster than those without. Instead, the presence of motifs for several other factors, such as FOXO1, FOXE1 and FOXL1, was associated with a mildly attenuated methylation drift rate (full results in Supplementary Data 2). The remaining motifs belonged to the TFs NFKB2, NR1I2, AR, RXRA::VDR, hypermethylated in cancer 2 (HIC2), SP5 and BMAL1 (also called ARNTL). NFKB2 is a member of the noncanonical NFKB pathway; a previous study found that NFKB activation by TRIM5α induced L1 promoter silencing49. ARNTL is a circadian regulator recently shown to be an important repressor of L1 (ref. 50). Decreased ARNTL expression leads to an accelerated aging phenotype in human and macaque cells because of stimulation of cytosolic DNA sensing pathways by L1. HIC2 and the highly homologous HIC1 were downregulated in multiple cancer types51,52,53. Interestingly, HIC2 also induced SIRT1 expression54, which itself may be involved in L1 silencing55.
Lastly, we performed motif enrichment directly comparing young L1 copies with faster-than-median methylation drift to those with slower-than-median methylation drift, regardless of enrichment compared to older families. The only motif enriched at ‘fast demethylating’ L1HS CpGs was that of BHLHE22, a factor with no known associations with TE activity (Extended Data Fig. 5c). Two studies investigated its role as a modulator of inflammation, finding an anti-inflammatory effect in prostate cancer, but a pro-inflammatory effect in endometrial cancer56,57. Future efforts may want to investigate this apparent contradiction and investigate if the pro-inflammatory effect may be mediated by L1HS activation. Fast demethylating L1PA2 CpGs showed enrichment for the most TFs, in part because of large representation in the Infinium array (318 CpGs in total; Extended Data Fig. 5b). A notable number of TFs were members of the AP-1 complex (JUN–BATF–FOSL1), which initiates the senescence program and upregulates part of the senescence-associated secretory phenotype58,59. It has also been observed that inflammation in senescent cells is in part driven by L1 and ERV expression18,60. No motifs were enriched at fast demethylating L1PA3 CpGs, whereas motifs enriched at fast demethylating L1PA4 CpGs included RUNX3, a known L1 activator (Extended Data Fig. 5d)61.
TE methylation as an accurate and interpretable biomarker of age
Next, we investigated if the methylation state of TEs could be used to predict chronological age. Thus, we selected CpGs found in TEs (LINEs, SINEs, LTR retrotransposons, DNA transposons, n CpGs = 56,352; Fig. 3a) and trained an elastic net model on a portion of our data (n = 999), leaving out a portion of each dataset (n = 248) and the entirety of GSE64495 (n = 104) as external validation (Fig. 3b,d). The coefficients are available in Supplementary Data 4. This individual CpG TE clock was in both cases highly accurate (root mean square error (RMSE) = 5.58, median absolute error (MAE) = 2.96, r = 0.95 on GSE64495). We compared this performance with other state-of-the-art chronological age clocks and found that the individual CpG TE clock performed better than the Hannum clock and Horvath Pan-tissue clock but worse than the Horvath Skin & Blood clock. Thus, the methylation state of individual CpGs in TEs can be used to construct a remarkably accurate clock. We additionally visualized how many CpGs selected by the individual CpG TE clock were shared with the aforementioned published clocks and their importance in the model (Extended Data Fig. 6b and Supplementary Data 4). In general, there was little overlap between the clocks, probably because of redundancy in the biological information provided by CpGs, and the imposed restriction to use CpGs in TEs only.
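The three accuracy measures reported for the clock (RMSE, median absolute error and Pearson's r) can be computed as in the sketch below. The ages are invented for illustration and the function names are hypothetical; note that MAE here denotes the median, not the mean, absolute error, as in the text.

```python
import math
import statistics


def rmse(predicted, observed):
    """Root mean square error between predicted and chronological age."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def median_absolute_error(predicted, observed):
    """Median of the absolute prediction errors."""
    return statistics.median(abs(p - o) for p, o in zip(predicted, observed))


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical chronological and predicted ages for four samples.
chronological = [30, 40, 50, 60]
predicted = [32, 39, 53, 60]

print(f"RMSE = {rmse(predicted, chronological):.2f}")
print(f"MAE  = {median_absolute_error(predicted, chronological):.2f}")
print(f"r    = {pearson_r(predicted, chronological):.3f}")
```

RMSE penalizes large individual errors more heavily than the median absolute error does, which is why the two can differ substantially on the same predictions.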
a, Feature construction strategy. b, Test set performance of the individual CpG clock. c, Test set performance of the combined CpG clock. d, Benchmark of individual and combined CpG clocks against state-of-the-art methylation clocks. The benchmark was performed on GSE64495, which was not included in the training set of any of the clocks shown. e, Performance of a combined CpG clock trained on multi-tissue mouse RRBS data, tested using nested cross-validation. f, Age prediction on long-lived mouse strains compared to matching controls (two-sided Wilcoxon rank-sum test: *P = 0.026 and **P = 0.002, respectively). Boxes show the median and 25th and 75th percentiles. The whiskers extend to the 25th and 75th percentiles ± 1.5 times the IQR. Points outside whiskers (outliers) are shown individually.
While constructing a biomarker on a particular biological process such as TE de-repression can help with interpretability, further considerations should be made. Most importantly, transposons are disseminated everywhere in the genome, including near genes and very commonly in introns. Thus, while the state of methylation of a single TE CpG may be representative of the status of that TE copy, it may also be affected by the local chromatin context (for example, whether a nearby gene is transcribed or not). To further improve interpretability, we trained a new clock, this time on the average genome-wide methylation state of TE families, separating genic and intergenic TE copies. We chose not to completely discard genic TE copies because a sizable portion of TEs, including some active L1s, is found in introns. Additionally, we only kept groups of at least five CpGs to reduce the impact of the local regulatory context at each CpG and ensure that each feature could be interpreted as the global methylation of a given TE family (Fig. 3a). Validation was again performed on a portion of each dataset (n = 248) and the entirety of GSE64495 (n = 104). The coefficients are available in Supplementary Data 4. We were surprised to see that while performance of this combined CpG TE clock was worse than that of the individual CpG TE clock, it was still satisfactory (Fig. 3c,d). In particular, it still had an r of 0.90 when validated on the external dataset GSE64495.
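The feature construction for the combined CpG clock can be sketched as averaging methylation over all CpGs of a TE family, separately for genic and intergenic copies, and keeping only groups with at least five CpGs. The CpG identifiers, annotation and methylation values below are hypothetical, and applying the size threshold to the CpGs available in a sample is an assumption of this sketch.

```python
from collections import defaultdict


def family_features(sample_betas, annotation, min_cpgs=5):
    """Average methylation per (TE family, genic/intergenic) group.

    sample_betas maps CpG id -> methylation value for one sample.
    annotation maps CpG id -> (family, region), region being
    "genic" or "intergenic".
    """
    groups = defaultdict(list)
    for cpg, key in annotation.items():
        if cpg in sample_betas:
            groups[key].append(sample_betas[cpg])
    # Drop small groups so each feature reflects family-wide methylation
    # rather than the local context of a few CpGs.
    return {
        key: sum(values) / len(values)
        for key, values in groups.items()
        if len(values) >= min_cpgs
    }


# Hypothetical annotation: five genic L1HS CpGs, two intergenic L1PA2 CpGs.
annotation = {f"cpg_{i}": ("L1HS", "genic") for i in range(5)}
annotation.update({"cpg_5": ("L1PA2", "intergenic"), "cpg_6": ("L1PA2", "intergenic")})

sample_betas = {
    "cpg_0": 0.80, "cpg_1": 0.90, "cpg_2": 0.70, "cpg_3": 0.85, "cpg_4": 0.75,
    "cpg_5": 0.60, "cpg_6": 0.65,
}

features = family_features(sample_betas, annotation)
print(features)
```

Each surviving group yields one feature per sample, so the model input shrinks from tens of thousands of CpGs to a much smaller set of family-level averages.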
Lastly, we applied the same combined CpG training strategy on the reduced representation bisulfite sequencing (RRBS) data of multiple mouse tissues. Because of limited data availability, the predictor was trained and validated using nested cross-validation, once again only including wild-type (WT) untreated mice (n = 276). The coefficients are available in Supplementary Data 4. This again yielded an accurate predictor, with r = 0.90 (Fig. 3e). Thus, our feature construction strategy was successful on multiple sequencing platforms, tissues and organisms. We note that while the strategy was indeed successful across different species, generating a single TE-based biomarker for multiple species would be difficult because TEs evolve very rapidly. For example, mice and humans have a very different number and set of active TEs36,62.
Accelerated TE methylation age is associated with health status
Next, we investigated associations between age acceleration (the difference between predicted and chronological age) and health status. We tested our biomarkers on methylation data from the WHI, a long-term study that deeply phenotyped postmenopausal women. Specifically, we used data from substudy BA23, consisting of 2,175 women aged 50–79 years at baseline, of whom ~1,070 developed coronary heart disease (CHD) during the follow-up. We examined associations between age acceleration and time to death, diagnosis of any cancer and CHD using Cox regression, including chronological age as a covariate. Accelerated aging according to the individual CpG TE clock was significantly associated with higher risk for all three outcomes (Fig. 4a). Age acceleration according to PhenoAge29, an aging biomarker trained on clinical phenotypes rather than chronological age alone, had similar associations with risk of cancer and mortality as our individual CpG TE clock. However, increased CHD risk was most associated with age acceleration according to PhenoAge. Our combined CpG TE clock, on the other hand, showed no significant associations with cancer or CHD risk, but was still associated with risk of death. We suspect this may be because of the decreased accuracy of this predictor, which relies on genome-wide methylation features. We additionally tested our mouse RRBS clock on data from ref. 63, which included long-lived growth hormone receptor knockout (GHRKO) and Snell dwarf mice. Matching WT controls were not used to train the RRBS clock. Excitingly, both GHRKO and Snell dwarf mice were predicted as significantly younger than matching controls (Fig. 3f; Wilcoxon rank-sum test: P = 0.026 and 0.002, respectively). Thus, we conclude that both the individual CpG TE clock and the combined CpG TE clock show an association with the health status of the individual and not only their chronological age.
a, Association between age acceleration and risk of cancer, CHD and mortality according to the individual and combined CpG clocks in the WHI BA23 dataset. Results were benchmarked against state-of-the-art chronological age (Horvath Pan-tissue and Horvath Skin & Blood) and biological age (Levine PhenoAge) clocks. Associations between predicted age acceleration and risks were determined with a Cox proportional hazards model, accounting for age. The bars represent coefficients ± s.e. b, Age trajectory of methylation at young (first row) and old L1s with the largest coefficients in the combined CpG clock (second row). Data are from GSE40279. The orange dashed line shows a linear fit ± 95% confidence interval (CI), excluding patients older than 65 years. The teal line shows a locally estimated scatterplot smoothing fit ± 95% CI on the full age range. **P < 0.01, ***P < 0.001. c, Effect of cancer within 3 years and age on methylation of young and old L1s in the WHI data. Coefficients and P values were determined by fitting a linear model with the following formula: methylation ~ age + any cancer in 3 years. d, Performance of predictors of risk of cancer, CHD and mortality within 3 years, based on young L1 CpGs. The bars represent the mean area under the receiver operating characteristic curve (ROC a.u.c.) ± s.d. in tenfold cross-validation. Best and parsimonious models are shown.
Properties of young and old L1s as biomarkers
Finally, we investigated the TE families selected by our combined CpG clock. Among the notable TE families we identified, only L1HS (genic) was chosen as a feature by the human combined CpG clock, with methylation loss associating with increasing age. However, several older L1 families were chosen with stronger coefficients (for example, L1MEi, L1PA11, L1MA4A, L1M7). We found this puzzling, because we expected that the strong age association of younger L1s (L1HS, L1PA2, L1PA3 and L1PA4) would make them useful for age prediction. Thus, we investigated the exact trajectory of young L1 de-repression in greater detail (Fig. 4b). We were surprised to see that young L1s had negligible methylation loss under the age of 65 and then rapidly lost methylation in older patients with a nonlinear trajectory. In comparison, the older L1 families selected by our combined CpG predictor showed a more linear trajectory and began demethylating at younger ages. This led us to suspect that older, ‘passively demethylating’ TE families may be better predictors of chronological age, whereas methylation loss at younger TEs, in particular those with pathogenic potential, may be better predictors of disease risk. Thus, we modeled average methylation at young L1s (L1HS, L1PA2, L1PA3, L1PA4) and old L1s with large clock coefficients (L1MEi, L1PA11, L1MA4A, L1M7) as a function of age, this time including whether individuals would be diagnosed with any cancer within 3 years of sample collection (linear model: methylation ~ age + any cancer in 3 years; Fig. 4c). We found that cancer was significantly associated with decreased methylation of young L1s, but not older ones, although a trend was still present. Conversely, when accounting for cancer, age was associated with decreased methylation at older L1s but not at young ones. 
We found that most cancer types in the dataset showed lower than average L1 methylation, although the sample size for most types was insufficient to draw strong conclusions (Extended Data Fig. 7a). The main exception to this trend was lung cancer, which showed higher than average young L1 methylation compared to healthy patients. With this knowledge, we trained predictors of cancer, CHD and mortality within the next 3 years solely based on young L1 CpGs (n = 621) in the WHI data. These events were quite rare (cancer: n = 52; CHD: n = 140; death: n = 39; total: n = 2,175), making training challenging. Nonetheless, the resulting models had mild predictive ability (Fig. 4d). Interestingly, while the mortality and CHD predictors were rather complex, even when choosing the optimal model with parsimony (best mortality predictor: n CpGs = 136; parsimonious mortality predictor: n CpGs = 36; best CHD predictor: n CpGs = 135; parsimonious CHD predictor: n CpGs = 52; Supplementary Data 5), the cancer predictors were remarkably simple, using only a handful of CpGs. The simplest model based predictions on just two CpGs: cg07575166, found in an intergenic L1HS 5′ UTR, and cg26106149, located in a full-length L1PA3 in an intron of FBXL4, a gene with no known role in cancer initiation. The more complex model used four more CpGs but assigned the most weight to the aforementioned two. We trained an additional set of models allowing THE1A and THE1C CpGs to be selected by the model, intending to determine if the reported links between THE1 element activation and cancer could improve predictive performance; however, this only led to improvements in the performance of the CHD predictor and not of the cancer and mortality predictors (Extended Data Fig. 7b).
Discussion
In summary, we studied the age dynamics of TE methylation, finding that most TEs, from evolutionarily young to ancestral ones, were likely to lose methylation during the course of aging. However, this tendency was accentuated for young L1 elements (L1HS, L1PA2, L1PA3 and L1PA4), and two LTR retrotransposon families (THE1A and THE1C). We observed similar patterns of de-repression in chromatin accessibility data, but not in gene expression data, suggesting that RNA interference pathways may be able to compensate for the looser chromatin state, as found in other studies38. Future work should investigate whether RNA interference pathways or other epigenetic mechanisms are indeed responsible for this discrepancy.
Local CpG density and youthful methylation affect methylation drift rate during aging. The rate of methylation loss at most TEs was well described by those two factors, but this was not the case for young L1s. Thus, we hypothesize that most TEs have lost their regulatory sequences, and thus lose methylation passively. Conversely, young L1s probably still contain regulatory sequences that enable recruitment of activating epigenetic machinery. Therefore, we searched for TF binding motifs enriched at young L1s compared to older ones and which associated with faster or slower methylation loss in young L1s. Several TF binding motifs, including FOX, BMAL1 (ARNTL), HIC2 and NFKB2, were associated with slower drift. However, we did not find motifs associated with faster methylation loss. An additional way to identify evidence of active de-repression of young L1s would be to measure methylation marks such as 5hmC or 6mA, which are associated with transcriptional activation, as opposed to 5mC11,12.
We next explored the use of TE methylation loss as a biomarker of age and disease. An age predictor based on individual CpGs found in TEs had remarkable accuracy and showed associations with cancer and mortality comparable to PhenoAge. Another study recently obtained similar results using individual CpGs within full-length L1s and human endogenous retroviruses64. We generated additional predictors based on the average methylation of TEs genome-wide, for both human blood methylation array data and multi-tissue mouse RRBS data. While less accurate than their individual CpG counterparts, these predictors were still satisfactory (r > 0.9) and showed associations with health status. Future studies may want to explore clinical uses for such biomarkers to prevent disease and inform lifestyle choices. An important question is which health conditions beyond cancer and overall mortality can be predicted by these biomarkers. Additionally, restricting feature selection to TEs provides some benefits in terms of interpretability; however, the models remain complex and more work is required to establish the precise links between TE de-repression and aging or disease, especially for older transposon families. Indeed, we were surprised to see that these predictors did not mainly rely on young L1s, despite their strong age association, prompting us to investigate the exact timing of young L1 de-repression. We found that young L1s rapidly de-repressed only after the age of 65 and were otherwise very stable beforehand. This age coincides with the age of onset of many age-related diseases. Thus, we explored associations between loss of methylation and disease, finding that methylation loss at young L1s was associated with cancer but not age, while the opposite was true for the older L1s selected by the clock. Finally, we trained predictors for cancer, CHD and mortality within 3 years of the methylation measurement, solely based on young L1 CpGs. 
The mortality and cancer predictors were mildly successful; in particular, the cancer predictor made use of only two CpGs in young L1s. Future studies may want to investigate the mechanism behind this seemingly direct relationship. An obvious question is whether young L1 de-repression is the cause or consequence of cancer. Indeed, both mechanisms are possible as mutations of epigenetic machinery are common in cancer65. However, because the loss of CpG methylation was detected in the blood and was predictive of cancer events in other organs, it is possible that TE de-repression may promote cancer by accelerating inflammation or by promoting other pathological processes through other non-cell-autonomous mechanisms. Finally, loss of methylation at young L1s could be neither the cause nor the consequence of cancer; instead, both events could have common drivers. Clonal hematopoiesis is a probable suspect as the most common mutation in clonal hematopoiesis is loss of function of DNMT3A, a de novo methyltransferase66,67,68, which may also contribute to the loss of methylation on TEs.
Methods
Datasets
We used four public human blood array datasets (GSE64495 (ref. 69), GSE40279 (ref. 26), GSE157131 (ref. 70) and GSE147221 (ref. 71)) to determine the associations between age and TE methylation loss, and to train and validate the human age predictors. GSE87648 (ref. 72) was only included in predictor training and validation because it had an internal batch effect (determined by principal component analysis). The WHI human blood dataset BA23 (https://www.whi.org/study/BA23) and related metadata were used to investigate the relationships between TE clock age acceleration and risk of disease and mortality, and later to investigate the associations between young L1 methylation loss and disease. The mouse multi-tissue datasets GSE60012 (ref. 73), GSE93957 (ref. 74) and GSE80672 (ref. 63) were used to train and validate the mouse age predictor. All data were used as preprocessed by the original authors with the exception of GSE60012, as the needed processed files were unavailable. The matched ATAC–seq and RNA-seq dataset GSE193142 (ref. 25) from human PBMCs was used to determine if the observed alterations in RE methylation were associated with consistent changes in chromatin accessibility and expression.
Annotation of CpGs and REs
The coordinates of Infinium array CpGs were obtained from the Illumina manifest. We used RepeatMasker to annotate repeats in the GRCh37 and GRCm38 genomes. ChIPSeeker75 was used to annotate the genomic context of CpGs. Consensus sequences for TEs were obtained from Repbase76. The position of 5′ UTR, open reading frames 1 and 2, 3′ UTR, LTR and internal sequences were derived from the information included with the consensus sequences in the EMBL format. CpGs were then located in consensus sequences by aligning the probe target sequences with the consensus using pairwiseAlignment in the Biostrings R package (http://bioconductor.org/packages/Biostrings/).
RRBS data processing
Raw reads were downloaded from the Sequence Read Archive (SRA) and trimmed using Trim Galore! (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) with the --rrbs option. We aligned the trimmed reads to the GRCm38 genome build using Bismark (https://www.bioinformatics.babraham.ac.uk/projects/bismark/) and quantified methylation using bismark_methylation_extractor and bismark2bedGraph.
ATAC–seq data processing
Raw reads were downloaded from the SRA, trimmed with Trim Galore! using default settings and aligned to GRCh38 using Bowtie 2 with the following options: --very-sensitive -X 1000 --dovetail. Multimapping reads, mitochondrial reads and PCR duplicates were removed using SAMtools77 and Picard MarkDuplicates (https://broadinstitute.github.io/picard/). Reads over REs were counted with featureCounts78 with the options -p -B --read2pos 5, over regions defined by an SAF file derived from the RepeatMasker output. All RE loci were then collapsed by summing the counts of loci of the same TE family. Age coefficients were fitted using edgeR79 by including age and sex as independent variables. Comparison with DNA methylation data was only performed for REs that were adequately represented in the Infinium array (no. of CpGs per RE class ≥ 100; no. of CpGs per RE family > 40).
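The family-level collapsing step can be illustrated with a minimal Python sketch. The input layout (tuples of locus ID, family name and read count), the function name and the toy values are hypothetical and serve only as an illustration; the original analysis operated on featureCounts output.

```python
from collections import defaultdict


def collapse_counts_by_family(locus_counts):
    """Sum per-locus read counts into per-family totals.

    locus_counts: iterable of (locus_id, family, count) tuples.
    This layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    for _locus_id, family, count in locus_counts:
        totals[family] += count
    return dict(totals)


# Toy input with made-up loci and counts (not data from the study)
example = [
    ("chr1:100-6100", "L1HS", 12),
    ("chr2:500-6500", "L1HS", 8),
    ("chr3:10-360", "THE1A", 5),
]
family_totals = collapse_counts_by_family(example)
```

Summing raw counts before model fitting means that each family contributes a single count per sample to the downstream age model.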
RNA-seq data processing
Raw reads were downloaded from the SRA, trimmed with Trim Galore! using default settings and aligned to GRCh37 using STAR with the following options: --outFilterMultimapNmax 100 --winAnchorMultimapNmax 200 --outFilterMismatchNoverLmax 0.04. Reads over REs were counted and collapsed to family using TEcount, using the RE annotations provided on the TEtranscripts website (https://www.mghlab.org/software/tetranscripts). Age coefficients were fitted using edgeR by including age and sex as independent variables, filtering out REs expressed in fewer than three samples. Comparison with DNA methylation data was only performed for REs that were adequately represented in the Infinium array (no. of CpGs per RE class ≥ 100; no. of CpGs per RE family > 40).
Motif analysis
TF binding motif enrichment was performed with SEA45 on 500-bp sequences flanking young and old L1 CpGs (250 bp upstream and downstream). We used the JASPAR (nonredundant) vertebrate DNA motif database and an E-value threshold of less than ten. The SEA output was then further filtered to focus on motifs highly enriched on young L1s (present near more than 50% of young L1 CpGs and less than 20% of old L1 CpGs) and whose presence in young L1s was associated with faster or slower methylation loss (Wilcoxon rank-sum test P < 0.05 for both adjusted and unadjusted age coefficients). We performed additional motif enrichment analysis contrasting young L1 CpGs with faster-than-median methylation loss to those with slower-than-median methylation loss. No further filtering was applied to this analysis.
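The post-SEA filtering criteria can be expressed as a short Python sketch. The dictionary layout, key names and motif names below are hypothetical; only the thresholds (more than 50% of young L1 CpGs, less than 20% of old L1 CpGs, P < 0.05 for both sets of age coefficients) come from the text above.

```python
def filter_motifs(motif_stats, min_young_frac=0.5, max_old_frac=0.2, alpha=0.05):
    """Return motifs that are enriched at young L1s and associated with drift.

    motif_stats maps a motif name to a dict with (hypothetical) keys:
      young_frac   - fraction of young L1 CpGs flanked by the motif
      old_frac     - fraction of old L1 CpGs flanked by the motif
      p_unadjusted - Wilcoxon P value using unadjusted age coefficients
      p_adjusted   - Wilcoxon P value using adjusted age coefficients
    """
    selected = []
    for name, s in motif_stats.items():
        enriched = s["young_frac"] > min_young_frac and s["old_frac"] < max_old_frac
        associated = s["p_unadjusted"] < alpha and s["p_adjusted"] < alpha
        if enriched and associated:
            selected.append(name)
    return sorted(selected)


# Toy values, not results from the study
toy_stats = {
    "motif_A": {"young_frac": 0.83, "old_frac": 0.05, "p_unadjusted": 0.003, "p_adjusted": 0.01},
    "motif_B": {"young_frac": 0.90, "old_frac": 0.40, "p_unadjusted": 0.001, "p_adjusted": 0.001},
    "motif_C": {"young_frac": 0.70, "old_frac": 0.10, "p_unadjusted": 0.20, "p_adjusted": 0.04},
}
kept = filter_motifs(toy_stats)
```

In this toy example, motif_B fails the enrichment criterion and motif_C fails the association criterion, so only motif_A is retained.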
Predictor training and validation
All predictors in this study are a form of elastic net, implemented using the glmnet R package. Age predictors use the Gaussian family argument whereas the disease and mortality predictors use the binomial (logistic) family argument. Age predictions were evaluated using the RMSE, MAE and Pearson’s r. Disease and mortality predictions were evaluated using the ROC a.u.c. Before training and predicting, we transformed ages using the same age transformation used by Horvath in the Pan-tissue27 and Skin & Blood28 clocks. Briefly, ages below the age of maturity (20 years for humans, 6 weeks for mice) were log-transformed to linearize the relationship between age and methylation in developmental stages. When sufficient samples were available, we validated our predictors by leaving out a portion of all data and an entire dataset (GSE64495) for testing, and training and choosing hyperparameters on the remainder of the data using cross-validation. When the number of samples was limited, we used nested cross-validation. Hyperparameters were explored using grid search and selected to give the lowest cross-validation mean squared error or the highest ROC a.u.c., with the exception of the models we called ‘parsimonious’, for which hyperparameters were selected to give the simplest model within 1 s.d. of the best performance. Individuals with known health conditions or treatments were excluded from model training. The matching WT controls of the GHRKO and Snell dwarf strains were also excluded from clock training to have a fair comparison.
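As an illustration of the age transformation, the following Python sketch implements the piecewise log-linear form published with the Horvath Pan-tissue clock, as we understand it. The function names are ours, and the exact implementation used for the predictors should be checked against refs. 27 and 28 and the study code; ages and the age of maturity must be given in the same unit.

```python
import math


def transform_age(age, adult_age=20.0):
    """Log-transform ages up to maturity, linear transform afterwards."""
    if age <= adult_age:
        return math.log(age + 1) - math.log(adult_age + 1)
    return (age - adult_age) / (adult_age + 1)


def inverse_transform_age(y, adult_age=20.0):
    """Map a transformed value (for example, a model prediction) back to age."""
    if y < 0:
        return (1 + adult_age) * math.exp(y) - 1
    return (1 + adult_age) * y + adult_age


# Round trip for a child, a young adult and an older adult (human setting)
round_trip = [inverse_transform_age(transform_age(a)) for a in (5.0, 20.0, 70.0)]
```

The two branches meet at the age of maturity, where the transformed value is zero, so the transformation is continuous and invertible over the whole age range.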
Predictor benchmarking
We downloaded the clock coefficients published with the original manuscripts. Ages were transformed (and inverse-transformed) for prediction if required (Horvath Pan-tissue and Skin & Blood). All clocks were then applied to the same samples of GSE64495 and the WHI BA23 dataset. Clock features with missing values in the WHI BA23 (1.5% of all values) were imputed using the makeX R function.
Statistics and reproducibility
No statistical methods were used to predetermine sample sizes. Instead, we estimated the number of samples required to successfully train age predictors based on previous publications26,27,28,29 and assembled the public datasets accordingly. Individuals with known health conditions were excluded from any analyses investigating normal aging and from predictor training. Further outlier removal was performed before predictor training and benchmarking based on principal component analysis. Briefly, samples outside of the 95% confidence ellipse calculated using the stat_ellipse R function were excluded. This was performed separately for each dataset, sex and health status. Where possible, nonparametric statistical tests were used; in other cases, data distribution was not formally tested. Associations between age and Infinium array CpG methylation were determined using limma34, with the age + sex + study design. The fitted coefficients were used as methylation drift rates, whereas methylation at 20 years of age was calculated as intercept + coefficient × 20. Our fitting of expected age drift as a function of CpG density and youthful methylation level used a generalized additive model (GAM) with covariates for CpG density within 100 bp of the CpG in question, the methylation of that CpG at 20 years of age and the interaction of the two covariates (age_coef ~ s(methylation at 20 years, bs = 'cs') + s(CpG_density, bs = 'cs') + s(methylation at 20 years, bs = 'cs', by = CpG_density)). Associations between age acceleration and mortality or disease risk were tested using a Cox regression model (coxph in R) with the formula Surv(time-to-event, status) ~ acceleration + age. The boxplots show the median, lower and upper quartiles and 1.5 times the IQR, with outliers shown individually unless indicated otherwise. The barplot error bars represent the s.d.
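The relationship between the fitted coefficients, the drift rate and the youthful methylation level can be sketched in Python for a single CpG. This is a simplified illustration under stated assumptions: it uses ordinary least squares on age alone, whereas the analysis above used limma with sex and study as additional covariates, and the toy values are invented.

```python
def fit_drift(ages, betas):
    """Ordinary least squares of methylation (beta value) on age for one CpG.

    Returns (intercept, slope); the slope plays the role of the drift rate.
    Simplification: sex and study covariates are omitted here.
    """
    n = len(ages)
    mean_age = sum(ages) / n
    mean_beta = sum(betas) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (b - mean_beta) for a, b in zip(ages, betas))
    slope = sxy / sxx
    intercept = mean_beta - slope * mean_age
    return intercept, slope


def methylation_at(age, intercept, slope):
    """Predicted methylation at a given age: intercept + coefficient x age."""
    return intercept + slope * age


# Toy CpG losing 0.1% methylation per year (made-up values)
ages = [20, 40, 60, 80]
betas = [0.80, 0.78, 0.76, 0.74]
intercept, drift_rate = fit_drift(ages, betas)
youthful_methylation = methylation_at(20, intercept, drift_rate)
```

Evaluating the fit at 20 years rather than using the intercept directly avoids extrapolating to age zero, where the linear model is not expected to hold.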
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All data analyzed in this study are publicly available on Gene Expression Omnibus under accession nos. GSE40279, GSE64495, GSE87648, GSE147221, GSE157131, GSE60012, GSE80672, GSE93957 and GSE193142. The WHI BA23 dataset is under controlled access, which can be requested at https://www.whi.org/. The genomic references GRCh37, GRCh38 and GRCm38 were sourced from ENSEMBL (https://www.ensembl.org/index.html).
Code availability
The code used for preprocessing and analyzing the data is available at https://github.com/SunScript0/TE-clock.
References
Liao, X. et al. Repetitive DNA sequence detection and its role in the human genome. Commun. Biol. 6, 954 (2023).
Gorbunova, V. et al. The role of retrotransposable elements in ageing and age-associated diseases. Nature 596, 43–53 (2021).
Decout, A., Katz, J. D., Venkatraman, S. & Ablasser, A. The cGAS–STING pathway as a therapeutic target in inflammatory diseases. Nat. Rev. Immunol. 21, 548–569 (2021).
Stetson, D. B., Ko, J. S., Heidmann, T. & Medzhitov, R. Trex1 prevents cell-intrinsic initiation of autoimmunity. Cell 134, 587–598 (2008).
Thomas, C. A. et al. Modeling of TREX1-dependent autoimmune disease using human stem cells highlights L1 accumulation as a source of neuroinflammation. Cell Stem Cell 21, 319–331 (2017).
Gázquez-Gutiérrez, A., Witteveldt, J., Heras, S. R. & Macias, S. Sensing of transposable elements by the antiviral innate immune system. RNA 27, 735–752 (2021).
Simon, M. et al. LINE1 derepression in aged wild-type and SIRT6-deficient mice drives inflammation. Cell Metab. 29, 871–885 (2019).
Babaian, A. & Mager, D. L. Endogenous retroviral promoter exaptation in human cancer. Mob. DNA 7, 24 (2016).
Di Stefano, L. All quiet on the TE front? The role of chromatin in transposable element silencing. Cells 11, 2501 (2022).
Jansz, N. DNA methylation dynamics at transposable elements in mammals. Essays Biochem. 63, 677–689 (2019).
Sturm, Á. et al. Downregulation of transposable elements extends lifespan in Caenorhabditis elegans. Nat. Commun. 14, 5278 (2023).
Cui, X.-L. et al. A human tissue map of 5-hydroxymethylcytosines exhibits tissue specificity through gene and enhancer modulation. Nat. Commun. 11, 6161 (2020).
Greenberg, M. V. C. & Bourc’his, D. The diverse roles of DNA methylation in mammalian development and disease. Nat. Rev. Mol. Cell Biol. 20, 590–607 (2019).
Xiao, F.-H., Kong, Q.-P., Perry, B. & He, Y.-H. Progress on the role of DNA methylation in aging and longevity. Brief. Funct. Genomics 15, 454–459 (2016).
Wang, K. et al. Epigenetic regulation of aging: implications for interventions of aging and diseases. Signal Transduct. Target. Ther. 7, 374 (2022).
Sen, P., Shah, P. P., Nativio, R. & Berger, S. L. Epigenetic mechanisms of longevity and aging. Cell 166, 822–839 (2016).
De Cecco, M. et al. Transposable elements become active and mobile in the genomes of aging mammalian somatic tissues. Aging 5, 867–883 (2013).
De Cecco, M. et al. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature 566, 73–78 (2019).
Walter, M., Teissandier, A., Pérez-Palacios, R. & Bourc’his, D. An epigenetic switch ensures transposon repression upon dynamic loss of DNA methylation in embryonic stem cells. eLife 5, e11418 (2016).
Kaluscha, S. et al. Evidence that direct inhibition of transcription factor binding is the prevailing mode of gene and repeat repression by DNA methylation. Nat. Genet. 54, 1895–1906 (2022).
Bell, C. G. et al. DNA methylation aging clocks: challenges and recommendations. Genome Biol. 20, 249 (2019).
Meyer, D. H. & Schumacher, B. BiT age: a transcriptome‐based aging clock near the theoretical limit of accuracy. Aging Cell 20, e13320 (2021).
LaRocca, T. J., Cavalier, A. N. & Wahl, D. Repetitive elements as a transcriptomic marker of aging: evidence in multiple datasets and models. Aging Cell 19, e13167 (2020).
Lehallier, B., Shokhirev, M. N., Wyss-Coray, T. & Johnson, A. A. Data mining of human plasma proteins generates a multitude of highly predictive aging clocks that reflect different aspects of aging. Aging Cell 19, e13256 (2020).
Morandini, F. et al. ATAC-clock: an aging clock based on chromatin accessibility. Geroscience 46, 1789–1806 (2024).
Hannum, G. et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell 49, 359–367 (2013).
Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 14, 3156 (2013).
Horvath, S. et al. Epigenetic clock for skin and blood cells applied to Hutchinson Gilford Progeria Syndrome and ex vivo studies. Aging 10, 1758–1775 (2018).
Levine, M. E. et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging 10, 573–591 (2018).
Lu, A. T. et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging 11, 303–327 (2019).
Lu, A. T. et al. DNA methylation GrimAge version 2. Aging 14, 9484–9549 (2022).
Levine, M. E., Higgins-Chen, A., Thrush, K., Minteer, C. & Niimi, P. Clock Work: deconstructing the epigenetic clock signals in aging, disease, and reprogramming. Preprint at bioRxiv https://doi.org/10.1101/2022.02.13.480245 (2022).
Moqri, M. et al. PRC2-AgeIndex as a universal biomarker of aging and rejuvenation. Nat. Commun. 15, 5956 (2024).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Beck, C. R., Garcia-Perez, J. L., Badge, R. M. & Moran, J. V. LINE-1 elements in structural variation and disease. Annu. Rev. Genomics Hum. Genet. 12, 187–215 (2011).
Khan, H., Smit, A. & Boissinot, S. Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates. Genome Res. 16, 78–87 (2006).
Boissinot, S. & Sookdeo, A. The evolution of LINE-1 in vertebrates. Genome Biol. Evol. 8, 3485–3507 (2016).
Lanciano, S. et al. Locus-level L1 DNA methylation profiling reveals the epigenetic and transcriptional interplay between L1s and their integration sites. Cell Genom. 4, 100498 (2024).
Protasova, M. S., Andreeva, T. V. & Rogaev, E. I. Factors regulating the activity of LINE1 retrotransposons. Genes 12, 1562 (2021).
Zhou, W., Liang, G., Molloy, P. L. & Jones, P. A. DNA methylation enables transposable element-driven genome expansion. Proc. Natl Acad. Sci. USA 117, 19359–19366 (2020).
Bertucci, E. M. & Parrott, B. B. Is CpG density the link between epigenetic aging and lifespan? Trends Genet. 36, 725–727 (2020).
Higham, J. et al. Local CpG density affects the trajectory and variance of age-associated DNA methylation changes. Genome Biol. 23, 216 (2022).
Bruno, M., Mahgoub, M. & Macfarlan, T. S. The arms race between KRAB-zinc finger proteins and endogenous retroelements and its impact on mammals. Annu. Rev. Genet. 53, 393–416 (2019).
Zuo, Z. Quantifying the arms race between LINE-1 and KRAB-zinc finger genes through TECookbook. NAR Genom. Bioinform. 5, lqad078 (2023).
Bailey, T. L. & Grant, C. E. SEA: simple enrichment analysis of motifs. Preprint at bioRxiv https://doi.org/10.1101/2021.08.23.457422 (2021).
Becker, K. G., Swergold, G. D., Ozato, K. & Thayer, R. E. Binding of the ubiquitous nuclear transcription factor YY1 to a cis regulatory sequence in the human LINE-1 transposable element. Hum. Mol. Genet. 2, 1697–1702 (1993).
Tchénio, T., Casella, J.-F. & Heidmann, T. Members of the SRY family regulate the human LINE retrotransposons. Nucleic Acids Res. 28, 411–415 (2000).
Shi, H. et al. ZFP57 regulation of transposable elements and gene expression within and beyond imprinted domains. Epigenetics Chromatin 12, 49 (2019).
Volkmann, B. et al. Human TRIM5α senses and restricts LINE-1 elements. Proc. Natl Acad. Sci. USA 117, 17965–17976 (2020).
Liang, C. et al. BMAL1 moonlighting as a gatekeeper for LINE1 repression and cellular senescence in primates. Nucleic Acids Res. 50, 3323–3347 (2022).
Luo, F. et al. Hypermethylation of HIC2 is a potential prognostic biomarker and tumor suppressor of glioma based on bioinformatics analysis and experiments. CNS Neurosci. Ther. 29, 1154–1167 (2023).
Fleuriel, C. et al. HIC1 (hypermethylated in cancer 1) epigenetic silencing in tumors. Int. J. Biochem. Cell Biol. 41, 26 (2009).
Fujii, H. et al. Methylation of the HIC-1 candidate tumor suppressor gene in human breast cancer. Oncogene 16, 2159–2164 (1998).
Song, J.-Y. et al. HIC2, a new transcription activator of SIRT1. FEBS Lett. 593, 1763–1776 (2019).
Maugeri, A. et al. Resveratrol modulates SIRT1 and DNMT functions and restores LINE-1 methylation levels in ARPE-19 cells under oxidative stress and inflammation. Int. J. Mol. Sci. 19, 2118 (2018).
Darmawi et al. BHLHE22 expression is associated with a proinflammatory immune microenvironment and confers a favorable prognosis in endometrial cancer. Int. J. Mol. Sci. 23, 7158 (2022).
Yin, C. et al. BHLHE22 drives the immunosuppressive bone tumor microenvironment and associated bone metastasis in prostate cancer. J. Immunother. Cancer 11, e005532 (2023).
Martínez-Zamudio, R. I. et al. AP-1 imprints a reversible transcriptional programme of senescent cells. Nat. Cell Biol. 22, 842–855 (2020).
Wang, Y., Liu, L., Song, Y., Yu, X. & Deng, H. Unveiling E2F4, TEAD1 and AP-1 as regulatory transcription factors of the replicative senescence program by multi-omics analysis. Protein Cell 13, 742–759 (2022).
Liu, X. et al. Resurrection of endogenous retroviruses during aging reinforces senescence. Cell 186, 287–304 (2023).
Yang, N., Zhang, L., Zhang, Y. & Kazazian, H. H. Jr An important role for RUNX3 in human L1 transcription and retrotransposition. Nucleic Acids Res. 31, 4929–4940 (2003).
Sookdeo, A., Hepp, C. M., McClure, M. A. & Boissinot, S. Revisiting the evolution of mouse LINE-1 in the genomic era. Mob. DNA 4, 3 (2013).
Petkovich, D. A. et al. Using DNA methylation profiling to evaluate biological age and longevity interventions. Cell Metab. 25, 954–960 (2017).
Ndhlovu, L. C. et al. Retro-age: a unique epigenetic biomarker of aging captured by DNA methylation states of retroelements. Aging Cell 23, e14288 (2024).
Muntean, A. G. & Hess, J. L. Epigenetic dysregulation in cancer. Am. J. Pathol. 175, 1353–1361 (2009).
Fabre, M. A. et al. The longitudinal dynamics and natural history of clonal haematopoiesis. Nature 606, 335–342 (2022).
Mitchell, E. et al. Clonal dynamics of haematopoiesis across the human lifespan. Nature 606, 343–350 (2022).
Uddin, M. D. M. et al. Clonal hematopoiesis of indeterminate potential, DNA methylation, and risk for coronary artery disease. Nat. Commun. 13, 5350 (2022).
Walker, R. F. et al. Epigenetic age analysis of children who seem to evade aging. Aging 7, 334–339 (2015).
Kho, M. et al. Epigenetic loci for blood pressure are associated with hypertensive target organ damage in older African Americans from the Genetic Epidemiology Network of Arteriopathy (GENOA) study. BMC Med. Genomics 13, 131 (2020).
Hannon, E. et al. DNA methylation meta-analysis reveals cellular alterations in psychosis and markers of treatment-resistant schizophrenia. eLife 10, e58430 (2021).
Ventham, N. T. et al. Integrative epigenome-wide analysis demonstrates that DNA methylation may mediate genetic risk in inflammatory bowel disease. Nat. Commun. 7, 13507 (2016).
Reizel, Y. et al. Gender-specific postnatal demethylation and establishment of epigenetic memory. Genes Dev. 29, 923–933 (2015).
Stubbs, T. M. et al. Multi-tissue DNA methylation age predictor in mouse. Genome Biol. 18, 68 (2017).
Yu, G., Wang, L.-G. & He, Q.-Y. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31, 2382–2383 (2015).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Acknowledgements
This research was supported by grants from the US National Institutes of Health (NIH) to A.S. and V.G., and the Milky Way Research Foundation to V.G. A.H.S. was supported by grant no. RF1AG074345 from the National Institute on Aging. The WHI program is supported by contracts from the National Heart, Lung, and Blood Institute (NHLBI) and NIH. We thank the WHI investigators and staff for their dedication, and the study participants for making the program possible. A list of WHI investigators can be found at https://www.whi.org/doc/WHI-Investigator-Long-List.pdf.
Author information
Contributions
F.M. and J.Y.L. assembled the data, performed the analysis and validated the results. V.G., A.S. and J.Y.L. conceptualized the study. F.M., C.R., J.Y.L., V.G. and A.S. provided intellectual contribution as the study developed. C.R. and F.M. designed the data visualizations. F.M. wrote the initial manuscript. A.H.S., R.C. and B.M.S. provided advice on the statistical analysis. All authors contributed to reviewing and editing the manuscript. V.G. and A.S. supervised the study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Aging thanks Tibor Vellai and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Genomic context of RE CpGs.
(A) Genomic context of all REs, all probes in the Infinium array, RE probes in the Infinium array. (B) Genomic context of Infinium probes by major RE class.
Extended Data Fig. 2 Position of CpGs on TEs and methylation drift.
Consensus position of CpGs and methylation drift rate of selected L1s and LTR retrotransposons.
Extended Data Fig. 3 Age trends of SINE, SVA and DNA transposon methylation.
(A) Methylation drift rate of CpGs in SINEs, grouped by family. (B) Methylation drift rate of CpGs in SVAs, grouped by family. (C) Methylation drift rate of CpGs in DNA transposons, grouped by family. (D, E, F) Age coefficient of methylation at SINE, SVA and DNA transposon CpGs after adjustment for CpG density and youthful methylation level. (A-F) Only families represented by 40 or more CpGs in the Infinium array were shown. Families are sorted by average sequence length. Boxes show the 25th, median, and 75th percentiles. Whiskers extend to the 25th/75th percentiles ± 1.5*IQR. Points outside of whiskers not shown.
Extended Data Fig. 4 Age-related changes in expression and chromatin accessibility of REs in human PBMCs (GSE193142).
(A) Expression drift rate of REs grouped by major repeat class. Selfish (transposons) and non-selfish repeats grouped separately. (B) Chromatin accessibility drift rate of REs grouped by major repeat class. Selfish (transposons) and non-selfish repeats grouped separately. (C) Expression drift rate of L1s, grouped by family. (D) Chromatin accessibility drift rate of L1s, grouped by family. (A, B) Boxes show the 25th, median, and 75th percentiles. Whiskers extend to the 25th/75th percentiles ± 1.5*IQR. Points outside of whiskers not shown. (C, D) The same families are shown as in Fig. 1 D, E. No bar is shown if the family was not expressed or accessible.
Extended Data Fig. 5 Additional motif enrichment analysis.
(A) Additional transcription factor binding motifs enriched at young L1s and associated with increased or decreased methylation drift rate within young L1s (Two-sided Wilcoxon rank-sum test p < 0.05 for both adjusted and unadjusted drift rate coefficient). P-values are only shown for comparisons within young L1s (yL1s); the exact P-values are 0.0029, 0.003, 0.038 and 0.046, in order. N CpGs within old L1s: 341, N CpGs within young L1s: 621, of which 515 are flanked by an AR motif, 562 by an NR1I2 motif, 499 by an RXRA::VDR motif, and 454 by an SP5 motif. Boxes show the 25th, median, and 75th percentiles. Whiskers extend to the 25th/75th percentiles ± 1.5*IQR. Points outside of whiskers not shown. (B) Motifs enriched near L1PA2 CpGs with faster-than-median methylation drift vs those with slower-than-median drift. TFs that have similar motifs, or are part of the same TF family, are grouped together. (C) Motifs enriched near L1HS CpGs with faster-than-median methylation drift vs those with slower-than-median drift. (D) Motifs enriched near L1PA4 CpGs with faster-than-median methylation drift vs those with slower-than-median drift.
Extended Data Fig. 6 TE clock feature information.
(A) TE class of features in the individual CpG TE clock, combined CpG TE clock and mouse RRBS TE clock. (B) Overlap of CpGs selected by the individual CpG TE clock with published clocks. The top panel shows a histogram of coefficients for all selected CpGs; the bottom panel shows which CpGs are also used by other clocks, with the x coordinate representing coefficient size.
Extended Data Fig. 7 Detail on young L1 hypomethylation in cancer and additional disease/mortality predictors.
(A) Young L1 methylation in patients who developed cancer within 3 years of sampling compared to healthy individuals, broken down by cancer site. Dots represent individual (future) cancer patients, and the mean is represented by a red triangle. The blue dashed line represents the average methylation in healthy individuals, while the orange dashed line represents average methylation over patients diagnosed with any cancer within 3 years. (B) Additional cancer, CHD and mortality predictors, trained as those in Fig. 4 D but allowing THE1A and THE1C CpGs into the feature selection process in addition to young L1 CpGs. Bars represent mean ROC-AUC ± SD in 10-fold cross-validation. Best and parsimonious models are shown.
Supplementary information
Supplementary Data 1
Infinium array CpG annotation, and adjusted and unadjusted age coefficients.
Supplementary Data 2
SEA motif enrichment results.
Supplementary Data 3
Repetitive element differential accessibility and expression results.
Supplementary Data 4
Chronological age predictor coefficients.
Supplementary Data 5
Disease and mortality predictor coefficients and CpG annotation.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Morandini, F., Lu, J.Y., Rechsteiner, C. et al. Transposable element 5mC methylation state of blood cells predicts age and disease. Nat Aging 5, 193–204 (2025). https://doi.org/10.1038/s43587-024-00757-2
DOI: https://doi.org/10.1038/s43587-024-00757-2