Main

Plants undergo photorespiration because ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) poorly discriminates between CO2 and O2 (ref. 1). Changes in CO2 levels, temperature and light intensity alter carboxylation and oxygenation rates, making photorespiration highly dynamic2. Detoxification of the oxygenation product and carbon salvage through the photorespiratory pathway consume ATP and release CO2, thereby reducing net photosynthesis by up to 50% (ref. 3). Reducing oxygenation and improving photorespiratory efficiency can therefore increase crop yields3. However, photorespiration is tightly connected to other cellular processes, including folate-mediated one-carbon metabolism (FOCM)2,4,5.

During photorespiration, glycine accumulates and is transported into mitochondria, where it is oxidized by the glycine decarboxylase complex (GDC), transferring a C1 unit to tetrahydrofolate (THF)6,7 (Fig. 1a). In parallel, mitochondrial serine hydroxymethyltransferase (SHMT) transfers this C1 unit to a second glycine molecule to form serine, regenerating THF8,9. A portion of the C1 units generated by the GDC supports mitochondrial FOCM, while surplus C1 is released by 10-formyl-THF deformylase (FDF) as formate to sustain flux and avoid the build-up of photorespiratory intermediates10. Mitochondrial formate dehydrogenase readily oxidizes accumulating formate to CO2 (ref. 11).
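The carbon cost of this mitochondrial step can be made explicit with simple bookkeeping. The sketch below assumes the textbook stoichiometry in which two glycine molecules yield one serine plus one CO2 (and NH3). It ignores C1 units retained for mitochondrial FOCM or released as formate by FDF, and the function name is illustrative only.

```python
# Carbon atoms per molecule for the species in the GDC/SHMT step.
CARBONS = {"glycine": 2, "serine": 3, "CO2": 1}


def glycine_to_serine_carbon_balance(glycine_pairs: int) -> dict:
    """Carbon in and out when `glycine_pairs` pairs of glycine are converted.

    Per pair: GDC cleaves one glycine (CO2 + NH3 + a C1 unit on THF) and SHMT
    adds that C1 unit to the second glycine to form one serine.
    """
    carbon_in = 2 * glycine_pairs * CARBONS["glycine"]
    to_serine = glycine_pairs * CARBONS["serine"]
    released = glycine_pairs * CARBONS["CO2"]
    if carbon_in != to_serine + released:
        raise AssertionError("carbon is not conserved")
    return {
        "carbon_in": carbon_in,
        "to_serine": to_serine,
        "released_as_CO2": released,
        "fraction_released": released / carbon_in,
    }


print(glycine_to_serine_carbon_balance(1))
# {'carbon_in': 4, 'to_serine': 3, 'released_as_CO2': 1, 'fraction_released': 0.25}
```

One in every four glycine carbons leaves as CO2 at this step under these assumptions, which is the origin of the photorespiratory CO2 release described in the opening paragraph.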

Fig. 1: Growth defects and loss of transcriptional silencing in mthfd1-1 are suppressed by thfs.

a, Cross-compartmental biochemical pathways of FOCM. (1a) and (1b) SHMT; (2) MTHFR; (3) methionine synthase; (4) S-adenosylmethionine (SAM) synthetase; (5) SAM-dependent methyltransferase, including DNA methyltransferases; (6) SAH hydrolase (SAHH); (7a) methylenetetrahydrofolate dehydrogenase and (7b) methenyltetrahydrofolate cyclohydrolase (MTHFD); (8) THFS; (9) 10-FDF; (10) GDC. Hcy, homocysteine. b, Representative pictures of 3-week-old mthfd1-1 (m), mthfd1-1 thfs double mutant (mt), thfs (t) and wild-type (WT) plants grown under LD conditions, and leaf area quantification from automated phenotyping (right). Scale bars, 1 cm. The data are presented as mean values ± s.d. (n = 8). c, Corrected total cell fluorescence (CTCF) from SDCpro-GFP expression quantified via confocal laser scanning microscopy. The box plots represent the median (centre lines), the 25th (bottom) and 75th (top) percentiles, and the minimum and maximum points within 1.5× the interquartile range (IQR) (n = 10). Lowercase letters represent significant differences (P < 0.05, one-way analysis of variance followed by post-hoc Tukey test). See Supplementary Table 8 for the P values. d, Hierarchically clustered heat map of mean scaled mCG ratios (z scores) of all DMRs (rows) from pairwise comparisons to the WT (left); distributions of mCG ratios in four main clusters, A1 (n = 9,959), A2 (n = 2,537), A3 (n = 245) and A4 (n = 268) (centre); and per cent overlaps of DMRs with hypervariable DNA methylation (HV) sites, CMT2-dependent methylated sites, RNA-directed DNA methylation (RdDM) sites, gbM sites and the remaining reference mthfd1-1 hypo-DMRs (Other m), as well as the remaining non-overlapping DMRs (Rest) per cluster (right). mCG ratios were computed from per-site C/T counts pooled across two biological replicates for each genotype. The box plots indicate the medians (centre lines), IQRs (boxes) and 1.5 × IQR (whiskers); the violin width reflects the kernel density. n indicates the number of DMRs.
e, Heat map of mean (n = 3) normalized transcript levels (left) and mean mCG ratios (right) of all differentially expressed transposable elements (DETEs; rows) compared to the WT. The rows are hierarchically clustered by transcript level. The upper right plot shows the correlation between transcript levels and mCG ratios over DETEs per cluster. TSS, transcription start site; TTS, transcription termination site.


While activated C1 units are not exchanged between compartments, both serine and formate produced in mitochondria can supply cytosolic C1 metabolism: serine via cytosolic SHMT, generating 5,10-methylene-THF, and formate via 10-formyl-THF synthetase (THFS)4,8,9,12 (Fig. 1a). 5,10-methylene-THF serves directly as the C1 donor for thymidylate synthesis (required for DNA replication) or is reduced to 5-methyl-THF for homocysteine-to-methionine recycling and production of the universal methyl donor S-adenosylmethionine (SAM)13,14. The THFS product, 10-formyl-THF, is converted to 5,10-methylene-THF via 5,10-methenyl-THF by 5,10-methylene-THF dehydrogenase/cyclohydrolase 1 (MTHFD1)12,15. The SHMT and THFS/MTHFD1 reactions operate reversibly12,15,16. The THFS/MTHFD1 pathway is essential for de novo purine biosynthesis in yeast and animals (through the trifunctional C1-THF synthase)17, but its role in plant FOCM has been unclear12,18. We previously showed that mutations in MTHFD1 strongly impair DNA methylation in Arabidopsis, implicating the THFS/MTHFD1 branch in methylation reactions19.
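The cytosolic routes described above form a small directed network in which serine (via cytosolic SHMT) and formate (via THFS and MTHFD1) converge on 5,10-methylene-THF. The sketch below encodes only that topology, in the forward direction, although the SHMT and THFS/MTHFD1 reactions are reversible in vivo; it ignores fluxes, compartments and regulation, and the data layout and function name are illustrative.

```python
from collections import deque

# (substrate, product, enzyme) edges for the cytosolic C1 routes in the text,
# written in the forward direction only.
REACTIONS = [
    ("serine", "5,10-methylene-THF", "SHMT"),
    ("formate", "10-formyl-THF", "THFS"),
    ("10-formyl-THF", "5,10-methenyl-THF", "MTHFD1"),       # cyclohydrolase step
    ("5,10-methenyl-THF", "5,10-methylene-THF", "MTHFD1"),  # dehydrogenase step
    ("5,10-methylene-THF", "thymidylate", "thymidylate synthase"),
    ("5,10-methylene-THF", "5-methyl-THF", "MTHFR"),
    ("5-methyl-THF", "methionine", "methionine synthase"),  # remethylates homocysteine
    ("methionine", "SAM", "SAM synthetase"),
]


def reachable(source, target, blocked=frozenset()):
    """Breadth-first search from source to target, skipping blocked enzymes."""
    graph = {}
    for substrate, product, enzyme in REACTIONS:
        if enzyme not in blocked:
            graph.setdefault(substrate, []).append(product)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Complete loss of MTHFD1: which C1 sources still reach SAM?
print(reachable("formate", "SAM", blocked={"MTHFD1"}))  # False
print(reachable("serine", "SAM", blocked={"MTHFD1"}))   # True
```

Topology alone therefore predicts that the serine route fully compensates when MTHFD1 is blocked, which the DNA methylation defects of MTHFD1 mutants contradict; whatever regulation explains this lies outside the sketch.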

In plants, distinct DNA methyltransferases catalyse methyl transfer from SAM to cytosine in CG, CHG and CHH sequence contexts (H represents A, C or T)20. The byproduct S-adenosylhomocysteine (SAH) is a competitive inhibitor of these methyltransferases and must be cleared by SAH hydrolase to sustain methylation21. Stable DNA methylation is essential for transposable element (TE) silencing and is faithfully maintained during replication22. Still, epimutations (that is, heritable DNA methylation changes) accumulate stochastically over generations and occasionally generate phenotypic variation23,24; climate and environmental stress, including drought and cold, are associated with epigenetic variation and adaptation in natural populations and crops24,25,26. Although most stress-induced DNA methylation changes are somatic, they can act as cellular memory to support transient environmental adaptations27,28. How stress drives such epigenetic changes remains poorly understood.
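The three sequence contexts can be assigned mechanically from the bases that follow each cytosine. The sketch below classifies cytosines on the forward strand only (the reverse strand would be handled by scanning the reverse complement) and skips positions whose context is incomplete or contains N; it is an illustration, not the pipeline used in this study.

```python
def cytosine_contexts(seq: str) -> list[tuple[int, str]]:
    """Classify each forward-strand C as CG, CHG or CHH (H = A, C or T).

    Cytosines too close to the sequence end, or followed by N, are skipped.
    """
    seq = seq.upper()
    contexts = []
    for i, base in enumerate(seq):
        if base != "C" or i + 1 >= len(seq):
            continue
        nxt = seq[i + 1]
        if nxt == "G":
            contexts.append((i, "CG"))
        elif nxt in "ACT":
            if i + 2 >= len(seq):
                continue  # context not fully determined
            after = seq[i + 2]
            if after == "G":
                contexts.append((i, "CHG"))
            elif after in "ACT":
                contexts.append((i, "CHH"))
    return contexts


print(cytosine_contexts("CGACAGCTT"))  # [(0, 'CG'), (3, 'CHG'), (6, 'CHH')]
```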

Because SAM supply and SAH clearance directly affect DNA methylation, we dissected the cytosolic folate network and its coupling with photorespiration in Arabidopsis. This revealed that photorespiratory formate is reassimilated into FOCM to sustain DNA methylation and silencing, implying that environmental modulation of photorespiration may influence plant epigenetic regulation.

Results

Suppression of mthfd1 by thfs

To dissect cytosolic FOCM genetically, we used the hypomorphic mthfd1-1 mutant (hereafter mthfd1), which exhibits SAH accumulation and DNA hypomethylation19. These defects raised the question of why FOCM impairment in mthfd1 is not compensated by the SHMT4-dependent serine branch. To address this, we screened for mutants that modified the growth defects and derepression of the DNA-methylation-sensitive SDCpro-GFP reporter in mthfd1 (ref. 19). Unexpectedly, a loss-of-function mutation in THFS fully suppressed the mthfd1 phenotype (Extended Data Fig. 1a,b). Instead of phenocopying mthfd1, homozygous thfs single and mthfd1 thfs double mutants (mt) were indistinguishable from the wild type in leaf growth, GFP fluorescence, flowering time and root length (Fig. 1b,c and Extended Data Fig. 1c–e).

To exclude effects of second-site mutations in the genetic background, we generated CRISPR–Cas9 THFS knockouts in the mthfd1 background and confirmed the restoration of global DNA methylation (Extended Data Fig. 2a–e). To verify that suppression was not specific to mthfd1-1, we crossed thfs with a plant heterozygous for the strong mthfd1-3 allele19 (Extended Data Fig. 1a). In double mutant offspring, thfs likewise suppressed the DNA hypomethylation and dwarf phenotype of mthfd1-3 (Extended Data Fig. 2f). Together, these results indicate that the THFS/MTHFD1 branch of cytosolic FOCM is dispensable for growth and global DNA methylation under standard conditions, consistent with compensation by SHMT4.

To examine DNA methylation changes in detail, we performed whole-genome bisulfite sequencing (WGBS) on leaf tissue of the wild type, mthfd1, thfs and mt (Supplementary Table 1). Principal component analysis of methylome profiles and genome tracks of CG, CHG and CHH methylation (mCG, mCHG and mCHH) showed that the widespread DNA hypomethylation across pericentromeric regions and chromosome arms of mthfd1 was largely suppressed in mt (Extended Data Fig. 3a,b). Consistently, analysis of differentially methylated regions (DMRs) and hierarchical clustering of CG DMRs identified a major cluster of mthfd1 hypo-DMRs fully suppressed in mt (A1), a partially suppressed cluster (A2) and two small clusters, one comprising hypo-DMRs shared by the mutants and the other comprising DMRs that were hypermethylated in thfs and mt (Fig. 1d and Supplementary Table 2). Although dispensable for maintaining global DNA methylation levels, the THFS/MTHFD1 branch appeared to stabilize DNA methylation patterns at specific genomic regions. We therefore defined regional DNA methylation classes (detailed in Methods) and quantified their overlap with DMRs per cluster. Heterochromatic regions, defined here as Chromomethylase 2 (CMT2)-dependent methylated sites, were enriched in A1 and thus preferentially restored, whereas gene-body-methylated (gbM) regions were enriched in A2 (Fig. 1d and Extended Data Fig. 3c). Accordingly, mCG over gbM sites was significantly lower in mt than in the wild type (Wilcoxon rank-sum test; false discovery rate (FDR) < 2 × 10−16), but mCG over CMT2 sites was not (FDR > 0.01) (Extended Data Fig. 3d). These regional patterns indicate that perturbations to FOCM affect methylation differently in heterochromatin and euchromatin.
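The figure legends state that mCG ratios were computed from per-site C/T counts pooled across two biological replicates and, for regional analyses, binned into 100-bp windows. A minimal sketch of that computation follows. It assumes that pooling means summing read counts (so better-covered sites carry more weight) rather than averaging per-site ratios; the input layout and function name are hypothetical.

```python
from collections import defaultdict


def pooled_window_ratios(replicates, window=100):
    """Methylation ratio per window from per-site (C, T) counts pooled over replicates.

    replicates: list of dicts {0-based cytosine position: (C, T)} where C counts
    unconverted (methylated) and T counts converted (unmethylated) reads.
    Returns {window start: C / (C + T)} for windows with at least one read.
    """
    totals = defaultdict(lambda: [0, 0])
    for replicate in replicates:
        for pos, (c, t) in replicate.items():
            start = (pos // window) * window  # left edge of the window
            totals[start][0] += c
            totals[start][1] += t
    return {start: c / (c + t) for start, (c, t) in sorted(totals.items()) if c + t > 0}


rep1 = {10: (3, 1), 150: (0, 2)}
rep2 = {10: (1, 3), 20: (4, 0)}
print(pooled_window_ratios([rep1, rep2]))  # {0: 0.6666666666666666, 100: 0.0}
```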

TE expression analysis via mRNA sequencing (mRNA-seq) showed that thfs suppressed mthfd1-induced TE derepression genome-wide, with the exception of a small cluster comprising primarily autonomous Mutator-like DNA transposons (VANDAL21), which encode anti-silencing proteins that can antagonize DNA methylation29 (Fig. 1e and Extended Data Fig. 3e–g).

Photoperiod controls FOCM

We observed that the mthfd1 phenotype is sensitive to photoperiod—a major regulator of Arabidopsis development30—as growth under short days (SD) mitigated the growth defects exhibited under long days (LD) (Figs. 1b and 2a). Methylome comparisons confirmed that the hypomethylation seen in mthfd1 under LD was largely attenuated under SD (Fig. 2b and Extended Data Fig. 4a,b). Furthermore, shifting plants from SD to LD for 48 hours led to the upregulation of SDC and selected TEs previously found derepressed in mthfd1 (ATCOPIA28 and SADHU3)19, underscoring the dynamic influence of photoperiod on folate-mediated C1 supply for DNA methylation (Fig. 2c).

Fig. 2: THFS and MTHFD1 control FOCM dynamics in a photoperiod-dependent way.

a, Representative pictures of 4-week-old m, mt, t and WT plants grown under SD, as well as leaf area quantification from automated phenotyping. Scale bars, 1 cm. The data are presented as mean values ± s.d. (n = 10). b, Distribution of mCG ratios in MET1-targeted gbM regions (n = 420) and CMT2-targeted regions (n = 1,623) in rosette leaves of m and WT plants grown under SD. The values were computed from per-site C/T counts binned into 100-bp windows and pooled across two biological replicates for each genotype. The box plots indicate the medians (centre lines), IQRs (boxes) and 1.5 × IQR (whiskers); the violin width reflects the kernel density. n indicates the number of regions. c, Transcript levels (ΔΔCt) of three marker loci for transcriptional silencing (SADHU3, ATCOPIA28 and SDC) that were analysed via reverse transcription quantitative PCR (RT–qPCR). Plants were grown continuously under SD or shifted from SD to LD for 48 h before sampling at day 28 after germination. The data are presented as mean values ± s.d. (n = 3). The indicated P values are from two-sided Welch’s t-tests. d,e, Diurnal steady-state levels of target metabolites in leaves under LD (d) and SD (e). The grey shading depicts dark periods. The data are presented as mean values ± s.d. (n = 5). For glycine and homocysteine, values below the detection limit were treated as not available. The P values and sample sizes are provided in Supplementary Tables 12 and 13 for plants grown under LD and SD, respectively. FW, fresh weight. f,g, Steady-state levels of selected THF intermediates in leaves under LD (f) and SD (g). The grey shading depicts dark periods. The data are presented as mean values ± s.d. (n = 3). Values below the detection limit were treated as not available. The P values and sample sizes are provided in Supplementary Tables 12 and 13 for plants grown under LD and SD, respectively.
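Relative transcript levels from RT–qPCR are commonly derived with the ΔΔCt method, and the legend reports two-sided Welch's t-tests, which do not assume equal variances. The sketch below computes ΔΔCt, the corresponding fold change under the usual assumption of roughly 100% amplification efficiency, and Welch's t statistic with Welch–Satterthwaite degrees of freedom. The reference gene and calibrator are not specified here, so the inputs are generic. The two-sided P value would then follow from Student's t distribution (for example via scipy.stats.ttest_ind(a, b, equal_var=False)); that step is omitted to keep the sketch dependency-free.

```python
import math
import statistics


def delta_delta_ct(ct_target, ct_ref, calib_target, calib_ref):
    """ΔΔCt of a sample versus a calibrator, each normalized to a reference gene."""
    return (ct_target - ct_ref) - (calib_target - calib_ref)


def fold_change(ddct):
    """Relative expression 2^(-ΔΔCt), assuming ~100% amplification efficiency."""
    return 2.0 ** (-ddct)


def welch_t(a, b):
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


ddct = delta_delta_ct(ct_target=25.0, ct_ref=20.0, calib_target=27.0, calib_ref=20.0)
print(ddct, fold_change(ddct))  # -2.0 4.0
```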

Diurnal profiling showed that, in mthfd1 leaves, homocysteine and SAH accumulate strongly in the morning (zeitgeber time (ZT) 0–6, that is, the first 6 h after dawn) and decline at night, when MTHFD1 appears dispensable (Fig. 2d). Whereas the wild type, thfs and mt accumulated methionine and SAM during the day, mthfd1 showed an attenuated and prolonged methionine rise (Fig. 2d). Sustained SAM levels in mthfd1 indicate that its DNA hypomethylation is caused by SAH accumulation rather than by SAM limitation (Fig. 2d). These metabolic defects in mthfd1 were mitigated under SD (Fig. 2e).
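The argument that SAH accumulation, rather than SAM limitation, drives hypomethylation rests on SAH acting as a competitive inhibitor of methyltransferases, as noted in the introduction. A minimal Michaelis–Menten sketch with competitive inhibition illustrates the effect. The parameter values are hypothetical placeholders in arbitrary units, not measured constants for plant DNA methyltransferases.

```python
def methyltransferase_rate(sam, sah, vmax=1.0, km=1.0, ki=1.0):
    """Rate of a SAM-dependent methyltransferase with SAH as competitive inhibitor.

    v = Vmax * [SAM] / (Km * (1 + [SAH] / Ki) + [SAM])
    Default parameters are hypothetical placeholders in arbitrary units.
    """
    return vmax * sam / (km * (1.0 + sah / ki) + sam)


# Same SAM, rising SAH: the rate falls although the substrate is not limiting.
for sah in (0.0, 1.0, 3.0):
    print(sah, round(methyltransferase_rate(sam=1.0, sah=sah), 3))
# 0.0 0.5
# 1.0 0.333
# 3.0 0.2
```

Because raising SAM partly overcomes competitive inhibition, the SAM/SAH ratio, rather than SAM alone, is often used as an index of methylation potential.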

Among folate intermediates, 5-methyl-THF varied only modestly, and methylenetetrahydrofolate reductase (MTHFR) activity was similar across genotypes (Extended Data Fig. 4c,d). Homocysteine accumulation despite stable 5-methyl-THF in mthfd1 leaves indicates that bulk folate pools may obscure subcellular constraints and that steady-state 5-methyl-THF is a poor proxy for C1 flux through the methionine cycle19.

By contrast, 5-formyl-THF was consistently elevated in mthfd1 under LD, but not in other genotypes or under SD (Fig. 2f,g). 5-formyl-THF has no known function as a cofactor, but it can act as a folate storage form and an FOCM regulator; its inhibition of serine–glycine interconversion by SHMT is well established31,32,33. Accordingly, the accumulation of 5-formyl-THF in 5-fcl and fdf1,2 double mutants is associated with the build-up of photorespiratory glycine10,32. In comparison, mthfd1 exhibited elevated serine (but not glycine) during the day (Fig. 2d), suggesting that cytosolic 5-formyl-THF accumulation may inhibit SHMT4. Moreover, THF peaked at ZT12, consistent with simultaneous blocking of serine- and formate-derived C1 flux in mthfd1 under LD (Fig. 2f). Upregulation of SHMT4 in mthfd1 (Extended Data Fig. 4e), which mirrored methionine dynamics, suggests a compensatory response to C1 restriction, whose most apparent sign is homocysteine build-up. Homocysteine activates 3-phosphoglycerate dehydrogenase (PGDH)34, the first enzyme of the phosphorylated serine biosynthesis pathway (PSBP), further suggesting that C1 supply via serine is upregulated in mthfd1.

5-formyl-THF arises from 5,10-methenyl-THF via a side reaction of SHMT35 (Fig. 1a) and is reconverted by mitochondrial 5-FORMYLTETRAHYDROFOLATE CYCLOLIGASE (5-FCL)36. The arginine-to-glutamine substitution in mthfd1 is predicted to impair NADP+/NADPH binding and thus the dehydrogenase step between 5,10-methenyl-THF and 5,10-methylene-THF (supported by diminished MTHFD1 activity in mthfd1 and mt; Extended Data Fig. 4d), while leaving the cyclohydrolase conversion between 10-formyl-THF and 5,10-methenyl-THF intact37,38. Accordingly, our data suggest that, in mthfd1-1, C1 flux through the THFS/MTHFD1 branch is diverted towards 5-formyl-THF. This folate sink is eliminated by blocking 10-formyl-THF synthesis, as we confirmed in thfs and mt (Fig. 2f and Extended Data Fig. 4d).

While THFS did not show diurnal regulation, the relative expression of MTHFD1 increased during the day (Extended Data Fig. 4e). Together, the profiles and genetic analyses are consistent with predominant daytime activity and increased engagement of the THFS/MTHFD1 branch under LD, assimilating formate into 10-formyl-THF and converting it via 5,10-methenyl-THF to 5,10-methylene-THF.

Formate serves as a C1 source

To confirm the assimilation of formate into cytosolic FOCM, we established [13C]formate labelling with targeted quantification by gas chromatography–mass spectrometry (Extended Data Figs. 5a,b and 10). We observed THFS- and MTHFD1-dependent labelling of 5-methylcytosine and methionine, with stronger incorporation during the day (Fig. 3a,c). Glycine and serine also showed label enrichment that peaked at the start of the photoperiod upon [13C]formate supplementation, but this enrichment was independent of THFS and MTHFD1 (Extended Data Fig. 5c). Unlike the THFS/MTHFD1-dependent labelling of methionine and 5-methylcytosine, glycine/serine labelling therefore probably resulted from the oxidation of [13C]formate by formate dehydrogenase and the reassimilation of 13CO2 into 3-phosphoglycerate18. Short-term labelling with [2-13C]glycine indicated that photorespiratory glycine-to-serine conversion was unaffected in mthfd1 and thfs (Fig. 3b). Moreover, thymine was labelled from [13C]formate in a THFS- and MTHFD1-dependent way, demonstrating that 5,10-methylene-THF produced by MTHFD1 also contributes to de novo thymidylate synthesis (Fig. 3c). In contrast, adenine labelling was independent of THFS and MTHFD1, indicating that the cytosolic folate network is not involved in de novo purine biosynthesis, which is plastid-localized39; this result aligns with the viability of mthfd1-3 thfs double mutants.
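Label enrichment from GC–MS data must account for naturally occurring heavy isotopes. The sketch below shows one simplified way to do this for a singly labelled C1 unit: the M+1 fraction in an unlabelled control is subtracted from that in a [13C]formate-fed sample. This is an assumption for illustration only; the correction used in this study is not described in this section and may differ, and a full correction would deconvolve the entire isotopologue distribution of the derivatized fragment.

```python
def isotopologue_fractions(intensities):
    """Normalize M+0, M+1, ... ion intensities to fractions that sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [x / total for x in intensities]


def excess_m1(labelled, unlabelled):
    """M+1 fraction of a labelled sample minus that of an unlabelled control."""
    return isotopologue_fractions(labelled)[1] - isotopologue_fractions(unlabelled)[1]


# Toy intensities for M+0 and M+1 (not data from this study).
print(round(excess_m1([70, 30], [95, 5]), 3))  # 0.25
```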

Fig. 3: The THFS/MTHFD1-dependent pathway supplies C1 units from formate for homocysteine-to-methionine recycling, DNA methylation and thymidylate synthesis.

a, Label enrichment in methionine after [13C]formate supplementation at the beginning (left) or end (right) of the photoperiod. The grey shading depicts dark periods. The data are presented as mean values ± s.d. (n = 3). See Supplementary Table 16 for the P values. b, Label enrichment in serine after [2-13C]glycine supplementation at the beginning of the photoperiod. Dual labelling (M + 2) results from combined glycine cleavage and glycine-to-serine conversion in mitochondria. The grey shading depicts dark periods. The data are presented as mean values ± s.d. (n = 3). See Supplementary Table 17 for the P values. c, Label enrichment in targeted nucleobases after [13C]formate supplementation. Guanine was not quantified due to low recovery (Methods). The data are presented as mean values ± s.d. (n = 3). Lowercase letters represent significant differences between genotypes for each nucleobase (P < 0.05, one-way analysis of variance followed by post-hoc Tukey test). See Supplementary Table 18 for the P values. 5mC, 5-methylcytosine.
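The origin of the M + 2 serine isotopologue in panel b can be illustrated with a simple binomial model. It assumes that every serine is built from two glycines drawn from one well-mixed mitochondrial pool in which a fraction p is labelled at C2 (one glycine donating its C2 as a C1 unit via the GDC, the other accepting it via SHMT), and it ignores natural abundance and pre-existing unlabelled serine. The function name is illustrative.

```python
def serine_isotopologues(p):
    """Expected serine M+0/M+1/M+2 fractions from [2-13C]glycine labelled at fraction p.

    Two independent draws from one glycine pool: the C1 donor (via GDC) and
    the acceptor (via SHMT) are each labelled with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return {"M+0": (1 - p) ** 2, "M+1": 2 * p * (1 - p), "M+2": p ** 2}


print(serine_isotopologues(0.5))  # {'M+0': 0.25, 'M+1': 0.5, 'M+2': 0.25}
```

Under this model the M + 2 fraction scales with p², so it specifically reports the combined cleavage and conversion reactions rather than either step alone.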


Photorespiration stabilizes DNA methylation

Given that non-enzymatic oxidation of photorespiratory glyoxylate and probably mitochondrial FDF1/2 activity are major physiological sources of formate10,40, we hypothesized that suppressing photorespiration would restrict C1 flux through the THFS/MTHFD1 branch and partially rescue the mthfd1 phenotype. To test this, we profiled DNA methylation in wild-type and mthfd1 plants grown in control air (cCO2, 394 ± 39 ppm) and in non-photorespiratory high CO2 (hCO2, 3,141 ± 53 ppm) conditions that rescue photorespiratory cat2-15 and shmt1 mutants41,42 (Extended Data Fig. 6a). In the wild type, hCO2 caused only minor genome-wide changes—that is, slight CG hypomethylation, no apparent changes in mCHG and weak CHH hypermethylation in pericentromeric regions (Extended Data Fig. 6b–e). This indicates that the C1 supply for DNA methylation is largely maintained when photorespiration is suppressed, consistent with a serine-derived C1 supply via SHMT4 compensating for reduced formate-derived C1 units. In contrast, mthfd1 showed pronounced mCG, mCHG and mCHH gains under hCO2 relative to cCO2 (Extended Data Fig. 6b–e). Clustering of CG DMRs delineated mthfd1 hypo-DMRs that strongly (B1) or weakly (B2) regained mCG under hCO2, as well as a wild-type-specific hCO2-sensitive cluster (B3) (Fig. 4a and Supplementary Table 3). Cluster B3 accounted for approximately 10% of the combined CG DMRs and showed a significant decrease in mean CG ratios in the wild type under hCO2 versus cCO2 (Wilcoxon rank-sum test, FDR < 6 × 10−76), revealing that suppressing photorespiration induces CG hypomethylation in the wild type in addition to mitigating methylation defects in mthfd1 (Fig. 4a). As with cluster A1 (Fig. 1d), B1 and B3, which were both enriched in CMT2 sites, exhibited greater mCG dynamics than gbM-enriched B2 (Fig. 4a–c and Extended Data Fig. 6f). 
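The DMR comparisons above are reported as FDR values, but this section does not state how the Wilcoxon P values were adjusted for multiple testing. Assuming the widely used Benjamini–Hochberg procedure, adjusted values could be computed as sketched below; the input is a toy list, not data from this study.

```python
def benjamini_hochberg(pvalues):
    """Benjamini–Hochberg adjusted P values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.5])])
# [0.04, 0.0533, 0.0533, 0.5]
```

The running minimum enforces monotonicity, so a smaller raw P value never receives a larger adjusted value than a bigger one.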
As expected, hCO2 was less effective than thfs in restoring DNA methylation in mthfd1, probably because formate is also produced independently of photorespiration (for example, by the oxidation of methanol released during pectin demethylation) under hCO2 (ref. 43).

Fig. 4: Suppressing photorespiration alters DNA methylation patterns in the wild type and mitigates methylome and transcriptome defects in mthfd1-1.

a, Hierarchically clustered heat map of mean scaled mCG ratios (z scores) of all DMRs (rows) compared to the WT under cCO2 (left); distributions of mCG ratios in four main clusters, B1 (n = 9,019), B2 (n = 13,430), B3 (n = 2,499) and B4 (n = 76) (centre); and per cent overlaps of DMRs with categorized DNA-methylated regions per cluster (right). mCG ratios were computed from per-site C/T counts pooled across two biological replicates for each genotype and CO2 condition. The box plots indicate the medians (centre lines), IQRs (boxes) and 1.5 × IQR (whiskers); the violin width reflects the kernel density. b, Complete-linkage clustering dendrogram (Euclidean distance) of genotypes and CO2 conditions based on mCG, shown above distributions of mCG ratios in union DMRs (covered across samples) from clusters A2 (n = 2,374) and B1 (n = 5,277). The cluster labels follow a and Fig. 1d. DMR mCG ratios were computed from per-site C/T counts pooled across two biological replicates for each genotype and CO2 condition. The box plots indicate the medians (centre lines), IQRs (boxes) and 1.5 × IQR (whiskers); the violin width reflects the kernel density. n indicates the number of DMRs. c, DMR overlaps between CMT2-dependent methylation (A1 and B1) and gbM (A2 and B2) enriched clusters. d, Volcano plot of TE expression in m under hCO2 compared to m under cCO2 (n = 3). The top eight TEs ranked by q value are indicated by subfamily names. e, Heat map of mean (n = 3) expression changes relative to the WT under cCO2 of mthfd1-1 DEGs encoding enzymes involved in FOCM, photorespiration and other related metabolic pathways (see row annotation; SMM, S-methylmethionine cycle). The rows were hierarchically clustered by scaled normalized transcript levels.
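Complete-linkage clustering with Euclidean distance, as used for the dendrogram in panel b, merges at each step the two clusters whose most distant members are closest. A naive, dependency-free sketch follows; it recomputes all linkages at every step, so it suits only a handful of samples. The sample labels and vectors in the example are invented mCG values, not data from this study; in practice a library routine such as scipy.cluster.hierarchy.linkage with method="complete" would normally be used.

```python
import math
from itertools import combinations


def complete_linkage(samples):
    """Agglomerative clustering with complete linkage and Euclidean distance.

    samples: dict mapping a label to an equal-length vector (e.g. mean mCG per DMR).
    Returns the merge history as (cluster_a, cluster_b, height) tuples, where each
    cluster is a frozenset of labels and height is the linkage distance at the merge.
    """
    def dist(u, v):
        return math.dist(samples[u], samples[v])  # Euclidean distance

    def linkage(a, b):
        # Complete linkage: the largest distance between any two members.
        return max(dist(x, y) for x in a for y in b)

    clusters = [frozenset([label]) for label in samples]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


profiles = {"WT_c": [0.80, 0.70], "WT_h": [0.78, 0.66], "m_c": [0.40, 0.30], "m_h": [0.55, 0.45]}
for a, b, height in complete_linkage(profiles):
    print(sorted(a), sorted(b), round(height, 3))
```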

Photorespiration drives expression changes in mthfd1

TE derepression in mthfd1 was broadly reduced under hCO2 (Extended Data Fig. 7a–c), but with TE-family-specific differences. Compared with all mthfd1 differentially expressed transposable elements (DETEs), CO2-insensitive mthfd1 DETEs, which remained highly expressed despite increased DNA methylation, were enriched for En-Spm DNA transposons (pairwise comparison of proportions, P = 3.3 × 10−8), whereas CO2-responsive mthfd1 DETEs were enriched for Gypsy family retrotransposons (such as ATHILA) (P = 1.1 × 10−4) (Fig. 4d and Extended Data Fig. 7d,e).
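The family enrichments above are reported as pairwise comparisons of proportions. One common form of such a test, a two-sided z test for two independent proportions with pooled variance, is sketched below. Whether the reported analysis used a continuity correction or a multiple-testing adjustment is not stated here, so this version applies neither, and the counts in the example are invented.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test for equal proportions x1/n1 and x2/n2 (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Toy counts: 30 of 100 elements in one set versus 10 of 100 in another.
z, p = two_proportion_z_test(30, 100, 10, 100)
print(round(z, 3), p)
```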

Consistent with its pleiotropic phenotype, mthfd1 exhibits broad gene expression changes compared with the wild type19. Under hCO2, these changes were attenuated, and the number of differentially expressed genes (DEGs) in mthfd1 decreased by 60% compared with cCO2; consistently, mthfd1 showed ~10-fold more CO2-responsive DEGs than the wild type (Extended Data Fig. 7f). Among the most dynamic DEGs in mthfd1 was GLN1;4, which encodes a cytosolic glutamine synthetase; this ammonium-assimilating enzyme is inhibited by 5-formyl-THF binding35. Prompted by this observation, we tested for enrichment and found that 18 of 51 genes encoding reported 5-formyl-THF-binding proteins were among the hCO2-responsive mthfd1 DEGs (hypergeometric test, P = 1.4 × 10−5) (Extended Data Fig. 7g), suggesting that 5-formyl-THF accumulation may contribute to these expression changes in mthfd1. Consistent with the attenuation of mthfd1 defects under hCO2, SDC expression was reduced, and DNA methylation at the SDC promoter was increased (Extended Data Figs. 7g and 8).
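The enrichment of 5-formyl-THF-binding genes among hCO2-responsive DEGs is an upper-tail hypergeometric probability. The sketch below computes it exactly with math.comb. The size of the tested gene universe and the total number of DEGs are not given in this section, so the example uses toy numbers and does not reproduce the reported P value.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes tested; successes: genes in the category of interest
    (e.g. annotated 5-formyl-THF binders); draws: genes called as DEGs.
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


print(hypergeom_upper_tail(5, 10, 5, 5))  # equals 1/252
```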

Photorespiration and serine biosynthesis are transcriptionally upregulated in mthfd1

Querying a curated gene annotation list, we identified 42 DEGs involved in FOCM, photorespiration and associated metabolic pathways, 36 of which were differentially expressed in mthfd1 under cCO2 but not under hCO2 (Supplementary Tables 4 and 5). Hierarchical clustering of these 42 DEGs revealed strongly co-expressed members of the photorespiratory glycolate and associated ammonia recycling pathways1, including GLUTAMINE SYNTHETASE 2 (GS2); DiT1, encoding the plastidic 2-oxoglutarate/malate transporter that supplies 2-oxoglutarate for the subsequent glutamine:2-oxoglutarate aminotransferase (GOGAT) reaction, which together with GS2 constitutes the N-assimilatory GS/GOGAT cycle; and genes encoding the T and H subunits of the GDC (Fig. 4e and Extended Data Fig. 9). Together with the P subunit, which was also among the cCO2-upregulated mthfd1 DEGs (Extended Data Fig. 7g), the H subunit is known to limit the rate of glycine cleavage42. Additional cCO2-upregulated mthfd1 DEGs included key enzymes of assimilatory nitrate reduction; two of the three PSBP enzymes, including homocysteine-activated PGDH3; and several FOCM components, including SHMT4 and FDF2 (Fig. 4e). The patterns align with published transcriptome changes in shmt1 and PSBP mutants5 and suggest that the photorespiratory glycolate pathway and the PSBP (the two principal routes for serine production) are upregulated in mthfd1 under photorespiratory conditions, probably in response to perturbed C1 and sulfur-amino-acid homeostasis.

Discussion

Our study reveals a previously unrecognized link between photorespiration and DNA methylation: photorespiratory formate feeds the cytosolic folate cycle via THFS and MTHFD1, supplying C1 units to the methionine cycle in Arabidopsis leaves during the photoperiod (Fig. 5). While formate is a recognized C1 donor in yeast and animals, direct evidence in plants has been scarce5. Gashu et al.4 quantified photorespiratory carbon fluxes in Arabidopsis grown under SD using 13CO2 labelling and identified serine as the principal cytosolic C1 source, with formate contributing little and not fitting their flux model. Our results draw a contrasting picture in which serine and formate act complementarily to maintain cytosolic C1 homeostasis. Notwithstanding possible bias from high [13C]formate supplementation, we observed THFS- and MTHFD1-dependent labelling of methionine and 5-methylcytosine, with free methionine labelling conspicuously strong during the photoperiod—consistent with photorespiration driving C1 flux. Together, these observations suggest that the relative contributions of formate and serine to cytosolic FOCM vary with time of day and photoperiod, as reflected by the conditional mthfd1 phenotype.

Fig. 5: Model of photorespiration acting as an interface between the environment and DNA methylation.

Formate produced during photorespiration enters C1 metabolism through THFS and MTHFD1, supplying C1 units for thymidylate synthesis, recycling of Hcy to methionine (Met) and SAM-dependent DNA methylation in leaves. Photorespiratory C1 flux (yellow) is modulated by environmental variables. Negative feedback from THFS/MTHFD1 controls serine-derived C1 flux through SHMT4 (blue), stabilizing C1 metabolism under fluctuating conditions. When fluctuations in photorespiratory C1 flux are not compensated by SHMT4, local SAM limitation and SAH build-up drive DNA methylation changes.

Under LD, mthfd1 shows daytime build-up of homocysteine and SAH along with DNA hypomethylation and TE derepression, whereas these signatures are largely attenuated under SD. A longer photoperiod relaxes carbon constraints and increases ATP and NADPH generation by light reactions, which is expected to favour C1 supply via the ATP- and NAD(P)H-dependent steps of the THFS/MTHFD1 branch2,3,12,15. Conversely, a shorter photoperiod probably increases reliance on the serine-derived supply via SHMT4.

This photoperiod dependence intersects with tissue specificity. Earlier studies showed that photorespiratory formate can supply cytosolic C1 for glycine-to-serine conversion in GDC mutants and heterotrophic tissue8,9,44. THFS was recently characterized in Arabidopsis as a cytosolic formate-activating enzyme and appeared to mainly support serine production in roots, where thfs mutants showed reduced growth under SD12. By contrast, under our LD conditions, root growth in thfs was indistinguishable from that in the wild type. Moreover, in leaves, [13C]formate labelling of serine (unlike methionine) was independent of THFS and MTHFD1. Nevertheless, complete suppression of the pleiotropic mthfd1 phenotype—including short roots—by thfs in mt double mutants indicates that the THFS/MTHFD1 branch operates in leaves as well as in roots.

Our genetic analysis of the cytosolic folate network indicates that the THFS/MTHFD1 branch is largely compensable by the SHMT4 branch. The restoration of FOCM in mt double mutants further suggests that THFS negatively regulates SHMT4 activity (Fig. 5), probably via competition for THF8,12,16 and/or accumulation of inhibitory 5-formyl-THF32. This built-in redundancy and feedback regulation underscore the flexibility of the cytosolic folate network across organisms in balancing C1 sources45. By contrast, selectively blocking MTHFD1 while THFS remains intact perturbs this balance, reduces serine- and formate-derived C1 supply for homocysteine-to-methionine recycling, alters diurnal SAM and SAH dynamics, and impairs DNA methylation and TE silencing in Arabidopsis.

Correspondingly, carbon throughput from photorespiration to folate metabolism can directly influence epigenetic regulation. Suppressing photorespiration reduces serine-derived C1 flux by approximately fivefold4. Our results indicate that formate-derived C1 throughput diminishes as well, as inferred from the partial restoration of DNA methylation in mthfd1 under hCO2. Thus, curbing photorespiratory C1 supply can induce DNA methylation changes, as seen in wild-type plants grown under hCO2, albeit to a lesser extent than in mthfd1, probably because alternative serine production (particularly the PSBP) becomes upregulated, as observed in shmt1 (ref. 18) and in mthfd1.

The C1 metabolic bridge provides a framework for environmentally induced DNA methylation changes24,27,28,46 (Fig. 5). Consistent with this view, a recent study on acclimation to elevated CO2 (1,000 ppm) showed transgenerational DNA methylation changes in Arabidopsis that correlated with enhanced growth47. Persistent DMRs were enriched in gbM and underrepresented in heterochromatic regions47, resembling methylome dynamics in mthfd1. This suggests that gbM may be more vulnerable to limited C1 supply than heterochromatin, potentially because replication-coupled mCG maintenance becomes error-prone when SAM is scarce and the methionine cycle competes with thymidylate synthesis for activated C1 units38. The nuclear localization of SAM synthetase48 further raises the possibility that SAM is preferentially supplied to heterochromatin. In addition, stabilizing mechanisms, including feedback between non-CG methylation and H3K9 dimethylation49,50, may facilitate the recovery of heterochromatic methylation upon restoration of C1 supply. Our model remains a simplification, as illustrated by the pericentromeric CG hypomethylation accompanied by CHH hypermethylation in the wild type under hCO2.

Because photorespiration is highly sensitive to ambient fluctuations, the formate-based C1 supply described here provides a basis for predicting how climate change may modulate plant methylomes. Although differences in DNA methylation among Arabidopsis accessions are largely explained by genetic variation51, they also correlate with climate of origin25, raising the question of how photorespiration contributes to methylome variation under a changing climate. Canopy-scale models project that photorespiratory losses will decline under future climates: in soybean, the penalty on midday net CO2 assimilation is predicted to drop by ~50% by 2100 under a high-emissions scenario (~1,000 ppm atmospheric CO2; +3.7 °C warming)3, implying lower photorespiratory provision of formate and serine to cytosolic FOCM. That trend may be offset by heat or drought, which increase photorespiration52. However, prior work and our data argue against simple proportionality between photorespiration, C1 flux and DNA methylation, because FOCM is tightly coupled to other pathways, notably the methionine cycle53. Plant FOCM is responsive to cellular demand and can sustain the synthesis of methylated secondary metabolites at levels far exceeding the C1 equivalents in nucleic acids5. Defining the regulatory network that couples environmental fluctuations to the C1 supply–demand balance and DNA methylation warrants further investigation, with implications for predicting adaptation to future climates and for strategies to reduce photorespiratory losses in crops54.

Methods

Plant materials and growth conditions

Arabidopsis thaliana (L.) Heynh. accession Columbia-0 was used throughout. The hypomorphic mthfd1-1 allele, the mthfd1-3 null allele, and wild-type and mthfd1-1 lines carrying SDCpro-GFP have been described previously19. The transfer DNA (T-DNA) insertion mutant thfs (SALK_067510) was obtained from the Arabidopsis Biological Resource Center (Ohio State University), and homozygosity of the T-DNA insertion in THFS (AT1G50480) was confirmed via PCR genotyping (the primer sequences are listed in Supplementary Table 6). The thfs line was crossed with mthfd1-1SDCpro-GFP and the wild type carrying SDCpro-GFP. Homozygous thfsSDCpro-GFP and mthfd1-1thfsSDCpro-GFP F2/F3 offspring (mt) were identified via PCR genotyping.

Plants were grown in controlled-environment growth chambers (Weiss Technik & Vötsch) at 23/20 °C (day/night), 65% relative humidity and 100 μmol m−2 s−1 photosynthetically active radiation (400–700 nm) under LD (16 h light/8 h dark) or SD (10 h light/14 h dark) cycles. Unless stated otherwise, the plants were germinated and grown on soil (4:1 (v/v) Floragard multiplication substrate:quartz sand). For isotope labelling and root length measurements, plants were grown in liquid and solid half-strength Murashige and Skoog (MS) medium (Duchefa Biochemie), respectively (see ‘Isotopic labelling’ below and ‘Root length analysis’ in Supplementary Methods).

For experimental suppression of photorespiration, plants were grown in an exposure chamber at the Environmental Simulation phytotron (Helmholtz Munich)55 under an LD cycle at 23/21 °C (day/night), 65% relative humidity and 130 µmol m−2 s−1 photosynthetically active radiation at plant height. The plants were grown in two Plexiglas cuvettes (width × depth × height, 55 × 28 × 110 cm). One cuvette was flushed with 60 m3 h−1 air at the ambient CO2 level (394 ± 39 ppm, cCO2) and the other with air continuously enriched with CO2 (Air Liquide) to 3,141 ± 53 ppm (hCO2). Airflows were controlled using a 1-l mass flow controller (MKS Instruments GmbH) calibrated with a mass flow meter (ADM 3000, Agilent). CO2 levels were monitored with an infrared gas analyser (BINOS 100, Rosemount Analytical Inc.) calibrated against a certified 2,000-ppm gas standard (Air Liquide) (Supplementary Fig. 1). At ambient CO2, visual inspection revealed no morphological changes attributable to the phytotron growth systems in either the wild type or mthfd1. Nevertheless, all analyses were performed on plants grown within these systems to minimize potential system effects.

Unless stated otherwise, samples consisted of rosette leaf tissue harvested 21 days after germination under LD and 28 days after germination under SD. Tissue was harvested 6 h after dawn (ZT6) for single-time-point experiments and at the end of the night (ZT0), ZT6, ZT12 and ZT18 for time-course experiments. The samples were flash-frozen in liquid nitrogen and pulverized in a bead mill (Retsch TissueLyser, Qiagen) using two 2-mm steel beads.

Confocal microscopy

SDCpro-GFP expression was quantified via confocal microscopy of wild-type, mthfd1-1, thfs, mthfd1-1thfs and CRISPR–Cas9 lines (Supplementary Methods). CTCF was calculated as described in ref. 56 using the formula CTCF = integrated density − (area × mean background fluorescence). For each genotype, data were collected from ten biological replicates (that is, individual plants) for statistical analysis.
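The CTCF formula above can be illustrated with a minimal Python sketch. The function name `ctcf`, the use of several background regions averaged into one mean, and the numbers in the example are assumptions for illustration (for instance, values as exported from an image-analysis tool); the text only specifies the formula itself.

```python
from statistics import mean


def ctcf(integrated_density: float, area: float, background_means: list[float]) -> float:
    """Corrected total cell fluorescence (CTCF).

    integrated_density: summed pixel intensity of the region of interest
    area: area of the region of interest (same pixel units as the density)
    background_means: mean intensities of one or more background regions
    """
    # Subtract the fluorescence expected from background over the same area.
    return integrated_density - area * mean(background_means)


# Hypothetical measurements for one image.
value = ctcf(15000.0, 100.0, [18.0, 22.0])
print(value)  # 13000.0
```

Averaging several background regions before multiplying by the area is one common way to reduce noise from a single background selection; whether this was done here is not stated.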

mRNA-seq

The mRNA-seq experiments were performed with three biological replicates. Total RNA was isolated from 100 mg (fresh weight) of leaf tissue pooled from three or four plants using the RNeasy Plant Mini Kit (Qiagen), followed by digestion of residual DNA with amplification-grade DNase I (Thermo Fisher Scientific). Library construction and sequencing were performed by Novogene. For the preparation of non-directional libraries, mRNA was purified from total RNA using oligo(dT)-attached magnetic beads. After fragmentation, first-strand cDNA was synthesized using random hexamer primers, followed by second-strand cDNA synthesis, end repair, A-tailing, adapter ligation, size selection, amplification and purification. The libraries were sequenced as 150-bp paired-end reads on an Illumina NovaSeq platform.

WGBS

The WGBS experiments were performed with two biological replicates. Genomic DNA was isolated from 100 mg (fresh weight) of leaf tissue pooled from three or four plants per sample using the DNeasy Plant Mini Kit (Qiagen), except for the comparison of DNA methylation under SD and LD conditions, for which 14-day-old seedlings grown on Phyto agar (RPI Corp.) with half-strength MS basal salt mixture (MP Biomedicals) were used. The isolated DNA was sheared into 200–400-bp fragments by ultrasonication (Covaris S2/S220/E220). WGBS of the wild type, mthfd1-1 and thfs single mutants and mt double mutants was performed by BGI TECH Solutions; library preparation comprised end repair of the sheared DNA, deoxyadenylation and ligation of methylated sequencing adaptors, followed by bisulfite conversion using the EZ DNA Methylation-Gold kit (ZYMO Research), size selection, PCR amplification and final size selection of libraries. Libraries for WGBS of seedlings grown under SD and LD conditions and of CRISPR–Cas9 lines were prepared in the same way, except that the KAPA HyperPrep Kit (Roche), TruSeq DNA LT (Illumina)/NEBNext Multiplex Oligos (New England Biolabs) methylated adaptors and the EpiTect Bisulfite Kit (Qiagen) were used, and were sequenced at Novogene. WGBS of wild-type and mthfd1-1 plants grown under cCO2 and hCO2 was performed by Novogene using post-bisulfite adaptor tagging with the Accel-NGS Methyl-Seq DNA Library Kit for Illumina (Swift Biosciences). All libraries were sequenced on Illumina platforms (HiSeq 2000/X-Ten, NovaSeq 6000/X Plus) as 150-bp paired-end reads, except for libraries from seedlings grown under SD and LD, which were sequenced as 50-bp single-end reads.

Isotopic labelling

Seeds of wild-type, mthfd1-1, thfs and mthfd1-1thfs plants were surface-sterilized with 40% (v/v) commercial bleach at 1,500 rpm and 22 °C for 10 min in a thermoblock, followed by six washes with sterile water. Sterilized seeds were stratified for 2 days at 4 °C and subsequently grown under SD or LD conditions on a shaker at 100 rpm. Approximately 30 seeds were sown per well in six-well plates containing 2.5 ml of half-strength MS medium (Duchefa Biochemie) supplemented with 0.5% sucrose and adjusted to pH 5.7 using KOH.

After 9 days of culture, seedlings had established upright growth, and the medium was replaced according to the labelling treatment. For short-term stable isotope tracing in free amino acids, seedlings were transferred to half-strength MS medium without sucrose. For long-term stable isotope tracing in nucleobases, growth media were supplemented with 1 mM [13C]formate. Control samples received 1 mM unlabelled formate to account for the natural abundance of heavy isotopes. For long-term labelling, both labelled and control media were refreshed after 11 and 13 days of culture, and samples were harvested at 6 h after dawn (ZT6) after 15 days of culture under SD conditions. The seedlings were rinsed thoroughly with deionized water to remove residual medium and carefully patted dry; the roots were excised, and the shoots were harvested and immediately flash-frozen in liquid nitrogen. Long-term labelling was carried out under SD conditions to avoid the growth constraints observed in mthfd1-1 under LD conditions.

For short-term isotope tracing in free amino acids, half-strength MS medium was replaced after 13 days of culture under LD with labelled (1 mM [13C]formate or 1 mM [2-13C]glycine) or control (1 mM unlabelled formate or 1 mM unlabelled glycine) media. To assess label incorporation during the day and night, media were exchanged at the onset of the light or dark period, respectively. Samples were collected after 4, 8 and 16 h of incubation as described above. Shoot tissues were ground in liquid nitrogen using a bead mill (Retsch TissueLyser, Qiagen).

We used 100 mg of pulverized plant material per sample for genomic DNA isolation. DNA was dissolved in ultrapure water by ultrasonication for 10 min, transferred to 2-ml amber glass vials with 11-mm crimp tops (Thermo Fisher Scientific) and dried under a constant stream of nitrogen gas for 30 min. For hydrolysis of DNA into nucleobases, 100 μl of neat formic acid was added to each sample. The vials were briefly flushed with nitrogen gas and wiped with lint-free Kimwipes (Kimtech) before being sealed with 11-mm crimp caps with PTFE/silicone septa (Agilent). The samples were incubated at 150 °C for 6 h and subsequently dried under a constant stream of nitrogen gas. For nucleobase derivatization, 50 μl of acetonitrile and 50 μl of N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) containing 1% trimethylchlorosilane (TMCS) were added to each sample. The vials were again flushed with nitrogen gas, wiped with Kimwipes and tightly sealed with crimp caps (Agilent). The samples were briefly mixed and incubated at 150 °C for 60 min. The derivatized nucleobases were then transferred to 250-μl glass inserts (Merck) using glass pipettes and placed back into 2-ml amber glass vials.

For the isolation of free amino acids, 100 mg of pulverized plant material was incubated in 1 ml of 1 M HCl at 60 °C and 500 rpm for 30 min in a thermoblock. After centrifugation at 10,621 g and 4 °C for 15 min, 800 μl of the supernatant was transferred to a new reaction tube. The centrifugation step was repeated, and 250 μl of the clear supernatant was transferred to 2-ml amber vials and dried under a constant stream of nitrogen gas for 60 min. Free amino acids were derivatized as described above, except that BSTFA was replaced with N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide containing 1% TMCS and derivatization was performed for 90 min at 60 °C.

Stable isotope enrichment in derivatized nucleobases and free amino acids was determined by thermal desorption–gas chromatography–mass spectrometry (thermal desorption unit, Gerstel; gas chromatograph, 7890A, and mass spectrometer, 5975C, both Agilent Technologies). One microlitre of each sample was injected into the thermal desorption unit and vaporized by increasing the temperature from 40 °C to 280 °C at 360 °C min−1, followed by a 2.5-min hold. Vaporized compounds were cryofocused at −50 °C in a cooled injection system (Gerstel) and subsequently desorbed in splitless mode by heating to 280 °C at 6 °C s−1, followed by a 2.5-min hold. Compounds were separated on a DB-5MS column with a 10-m DuraGuard pre-column (J&W 122-5562G, Agilent Technologies; 70 m × 250 μm × 0.25 μm) using helium as the carrier gas at a constant flow of 1 ml min−1. The 31.5-min oven program started at 90 °C, ramped to 150 °C at 30 °C min−1, then to 260 °C at 5 °C min−1 and finally to 310 °C at 100 °C min−1, with a final hold of 7 min. Target compounds were unambiguously identified by comparing mass spectra and retention times with those of pure standards (Extended Data Fig. 10).
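As a quick arithmetic check, the ramp rates and temperatures of the oven program described above account for the stated 31.5-min separation time. The minimal Python sketch below assumes no hold at the 90 °C starting temperature (none is mentioned); the function name is hypothetical.

```python
def oven_program_minutes(start_c: float, ramps: list[tuple[float, float]], final_hold_min: float) -> float:
    """Total GC oven program time, assuming no hold at the start temperature.

    ramps: (rate in °C per min, target temperature in °C) for each segment
    """
    total = 0.0
    temp = start_c
    for rate, target in ramps:
        total += (target - temp) / rate  # minutes spent on this ramp
        temp = target
    return total + final_hold_min


runtime = oven_program_minutes(90, [(30, 150), (5, 260), (100, 310)], 7)
print(runtime)  # 31.5  (2 + 22 + 0.5 + 7)
```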

Mass spectral data of tert-butyldimethylsilyl- and trimethylsilyl-derivatized compounds were acquired in total ion chromatogram and selected ion monitoring (SIM) modes. To determine label enrichment in target compounds, SIM ions were monitored in groups at the following mass-to-charge (m/z) ratios:

| Compound | SIM group start time (min) | M + 0 (m/z) | M + 1 (m/z) | M + 2 (m/z) | Dwell time per ion (ms) |
|---|---|---|---|---|---|
| Glycine | 12.1 | 246.0 | 247.0 | 248.0 | 10 |
| Methionine | 19.1 | 320.0 | 321.0 | 322.0 | 5 |
| Serine | 19.9 | 390.0 | 391.0 | 392.0 | 10 |
| 5-Methylcytosine | 11.78 | 254.0 | 255.0 | 256.0 | 5 |
| Adenine | 17.3 | 264.0 | 265.0 | 266.0 | 10 |
| Thymine | 8.5 | 255.0 | 256.0 | 257.0 | 5 |

Guanine was not quantified because of weak, tailing peaks indicative of adsorption or degradation under the hydrolysis/derivatization conditions.

M + 0, M + 1 and M + 2 denote ions carrying zero, one and two heavy isotopes (for example, 13C), respectively, and were used to define the isotopic composition of each target compound. Isotopologue fractions (k = 0, 1, 2) were calculated as percentages of the summed isotopologue abundances. Natural isotope abundances were then removed by subtracting the mean isotopologue fractions of labelling controls supplied with the same concentrations of unlabelled substrates, where the superscripts 13C and 12C denote the labelled sample and the unlabelled controls, respectively, and n is the number of control biological replicates (indexed by j):

$$\mathrm{M}+k = \frac{\mathrm{Abundance}_{\mathrm{M}+k}^{{}^{13}\mathrm{C}}}{\sum_{i=0}^{2} \mathrm{Abundance}_{\mathrm{M}+i}^{{}^{13}\mathrm{C}}} \times 100 - \frac{1}{n} \sum_{j=1}^{n} \frac{\mathrm{Abundance}_{\mathrm{M}+k,\,j}^{{}^{12}\mathrm{C}}}{\sum_{i=0}^{2} \mathrm{Abundance}_{\mathrm{M}+i,\,j}^{{}^{12}\mathrm{C}}} \times 100$$
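The natural-abundance correction in the formula above can be sketched in Python for a single labelled sample. This is an illustrative implementation under the stated definitions, not the authors' processing script; the function names and the ion abundances in the example are hypothetical.

```python
def isotopologue_percent(abundances: list[float]) -> list[float]:
    """Convert raw M+0, M+1, M+2 abundances into percentages of their sum."""
    total = sum(abundances)
    return [100.0 * a / total for a in abundances]


def corrected_enrichment(labelled: list[float], controls: list[list[float]]) -> list[float]:
    """Subtract the mean natural-abundance pattern of n unlabelled controls
    from the isotopologue percentages of one 13C-labelled sample."""
    sample = isotopologue_percent(labelled)
    ctrl = [isotopologue_percent(c) for c in controls]
    n = len(ctrl)
    return [sample[k] - sum(c[k] for c in ctrl) / n for k in range(len(sample))]


# Hypothetical ion abundances (M+0, M+1, M+2).
labelled = [700.0, 250.0, 50.0]
controls = [[900.0, 90.0, 10.0], [880.0, 100.0, 20.0]]
print(corrected_enrichment(labelled, controls))  # [-19.0, 15.5, 3.5]
```

Because both terms are percentages that sum to 100, the corrected values sum to zero: the excess label appears as positive M + 1 and M + 2 values and is balanced by a negative M + 0.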

Enzymatic activity analyses

Spectrophotometric MTHFD and THFS activity measurements were adapted from previous methods57. The liquid chromatography–mass spectrometry-based MTHFR activity assay was adapted from methods previously described58. Details of the protocols are available in Supplementary Information.

Thiol, amino acid and adenosine nucleotide quantification

Amino acids, thiol compounds and adenosine nucleotides were analysed as described in refs. 59, 60 and 61, respectively. A complete description of the methods is included in Supplementary Information.

Folate quantification

Folate quantification was performed as described previously62 and is outlined in detail in Supplementary Information.

mRNA-seq data analysis

mRNA-seq reads were processed using the nf-core/rnaseq pipeline (v.3.10.1)63 in Nextflow (v.22.10.6), executed with Singularity containers on the high-performance computing cluster at Helmholtz Munich. Read trimming/filtering, alignment and quantification were performed with Trim Galore! (v.0.6.7)64, STAR (v.2.6.1d)65 and Salmon (v.1.9.0)66, respectively, using default parameters except as stated below. TEs and genes were analysed separately. For TEs, reads were aligned to the TAIR10 reference genome using a transcript-based TE annotation67 and the following additional STAR alignment arguments: outMultimapperOrder, Random; outFilterMultimapNmax, 50; alignIntronMax, 10,000; alignSJoverhangMin, 3. For genes, Araport11 gene models and the following additional STAR alignment arguments were used: outFilterMultimapNmax, 50; alignIntronMax, 10,000; alignSJoverhangMin, 3. The read numbers are summarized in Supplementary Table 1.

Transcript abundance estimates from Salmon were used to generate TE- and gene-level count matrices for visualization and differential expression analysis with DESeq2 (v.1.38.3)68 in R (v.4.2.2)69. TEs/genes with fewer than six counts across all samples were removed. Principal component analysis plots were generated after variance stabilizing transformation using within-group variability. Log fold changes were shrunk using the ashr method for effect size shrinkage. TEs/genes with FDR ≤ 0.05 and a fold change of at least 2 in either direction were considered differentially expressed. Araport11 gene annotations, including gene names, were retrieved from Ensembl Plants with the R package biomaRt (v.2.54.1)70. TE annotations, including (sub)family names and the genome positions of start and stop sites, were imported from the annotated BED file in ref. 67. Volcano plots were generated with the R package EnhancedVolcano (v.1.16.0)71. Overlaps between differentially expressed TEs/genes from different comparisons were determined on the basis of TE/gene identifiers (TAIR IDs) and visualized with the R package VennDetail (v.1.14.0)72. Heat maps (Figs. 1e and 4e) were generated using the R package ComplexHeatmap (v.2.14.0)73 from transcripts per million (TPM) values obtained from the Salmon output files. For this, mean TPM values across replicates were log2-transformed after adding a pseudocount of 1 and subsequently standardized (z score) across groups. Heat map rows were hierarchically clustered by Euclidean distance using Ward clustering, and groups were clustered by Euclidean distance using complete linkage clustering. For visualization of co-expression, Pearson correlation coefficients were calculated from z scores and clustered by Euclidean distance using Ward clustering. Read coverage at SDC was visualized with IGV (v.2.6.1)74 after averaging bigWig coverage files from the nf-core/rnaseq pipeline across replicates at single-base bin size using deepTools (v.3.5.1)75.
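The heat map preprocessing described above (log2(TPM + 1), row-wise z scores, Ward clustering on Euclidean distances) can be sketched in Python with NumPy and SciPy. This is a stand-in for the R/ComplexHeatmap workflow, not a reproduction of it: SciPy's `method="ward"` corresponds to R's `hclust(method = "ward.D2")`, and which Ward variant the original analysis used is not stated. The function name and the example matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def zscore_ward_order(mean_tpm: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Row z scores of log2(TPM + 1) and a Ward-clustered row order.

    mean_tpm: rows are TEs/genes, columns are groups (means across replicates).
    Rows with zero variance would give NaN z scores and should be removed first.
    """
    logged = np.log2(mean_tpm + 1.0)
    centred = logged - logged.mean(axis=1, keepdims=True)
    z = centred / logged.std(axis=1, ddof=1, keepdims=True)  # ddof=1 as in R's scale()
    tree = linkage(z, method="ward", metric="euclidean")
    return z, leaves_list(tree)


mean_tpm = np.array([
    [1.0, 3.0, 7.0],
    [0.0, 1.0, 3.0],
    [15.0, 7.0, 3.0],
    [31.0, 15.0, 7.0],
])
z, order = zscore_ward_order(mean_tpm)
print(order)  # rows 0 and 1 (rising) and rows 2 and 3 (falling) end up adjacent
```

Standardizing each row before clustering groups features by the shape of their expression profile across genotypes rather than by absolute expression level.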

WGBS data analysis

The raw reads were pre-processed with Trimmomatic (v.0.39)76 using the parameters shown in Supplementary Table 7. Pre-processed reads were mapped to TAIR10 using Bismark (v.0.23.1)77 with -N 1 (allowing one mismatch in the seed alignment) and -X 700 (maximum insert size for valid paired-end alignments). Duplicate reads and reads flagged as unconverted (three or more consecutive methylated CHH sites) were discarded. Methylation ratios were calculated separately for CG, CHG and CHH sites as the number of C reads divided by the sum of C and T reads (C/(C + T)). Non-conversion ratios were calculated from chloroplast cytosine methylation ratios. The read numbers and non-conversion ratios are summarized in Supplementary Table 1. Genome browser tracks showing CG, CHG and CHH methylation ratios at single-cytosine resolution were generated from Bismark genome-wide cytosine methylation output files for cytosine sites covered by at least four reads.
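The per-site ratio C/(C + T), the four-read coverage cut-off and the chloroplast-based non-conversion estimate described above can be sketched as follows. The sketch assumes tab-separated lines in the layout of Bismark's genome-wide cytosine report (chromosome, position, strand, methylated count, unmethylated count, C context, trinucleotide context) and a chloroplast sequence named `ChrC`; both are assumptions, and pooling chloroplast counts (rather than averaging per-site ratios) is one possible reading of the text.

```python
def site_ratios(report_lines: list[str], min_cov: int = 4) -> list[tuple[str, int, str, float]]:
    """Per-cytosine methylation ratio C / (C + T) for sites with >= min_cov reads."""
    sites = []
    for line in report_lines:
        chrom, pos, _strand, n_c, n_t, context, _tri = line.rstrip("\n").split("\t")
        n_c, n_t = int(n_c), int(n_t)
        if n_c + n_t >= min_cov:
            sites.append((chrom, int(pos), context, n_c / (n_c + n_t)))
    return sites


def non_conversion_ratio(report_lines: list[str], plastid: str = "ChrC") -> float:
    """Pooled apparent methylation of chloroplast cytosines, which are expected
    to be unmethylated; used as an estimate of the bisulfite non-conversion rate."""
    n_c = n_t = 0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == plastid:
            n_c += int(fields[3])
            n_t += int(fields[4])
    return n_c / (n_c + n_t)


# Hypothetical report lines.
report = [
    "Chr1\t100\t+\t3\t1\tCG\tCGA",
    "Chr1\t105\t+\t1\t1\tCHH\tCTT",   # only 2 reads: excluded from site ratios
    "ChrC\t10\t+\t1\t99\tCHH\tCAA",
    "ChrC\t20\t-\t0\t100\tCG\tCGT",
]
print(site_ratios(report)[0])        # ('Chr1', 100, 'CG', 0.75)
print(non_conversion_ratio(report))  # 0.005
```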

Bismark genome-wide cytosine methylation output files were imported into the R package methylKit (v.1.20.0)78 for further analysis. Cytosine sites with fewer than four reads were excluded. Chromosomal methylation plots were generated by summing the numbers of C and T in 10-kb bins with 10 or more covered cytosines in each sample across the genome and then calculating CG, CHG and CHH methylation ratios from pooled replicates. Metaplots were generated by dividing each TE/gene from the TAIR10 genome release into 20 bins of equal size between start and stop sites and into 100-bp bins in the 1-kb flanking regions, and averaging weighted methylation levels79, calculated from pooled replicates, for each bin across all TEs/genes. Principal component analysis plots were generated from methylation ratios in 100-bp bins with four or more covered cytosines in each sample across the genome (100-bp tiles). DMRs were called from 100-bp tiles using a logistic regression test with SLIM-adjusted P values (FDR ≤ 0.01) and minimum absolute methylation differences of 0.4, 0.2 and 0.1 for the CG, CHG and CHH sequence contexts, respectively. Overlaps between DMRs from different pairwise comparisons were determined and visualized with Intervene (v.0.6.4)80. DMRs from all pairwise comparisons to the control group were combined, and all DMRs separated by 100 bp or less were merged for cluster analysis and heat map visualization using the R package pheatmap (v.1.0.12)81. CG methylation ratios in merged DMRs were calculated from the numbers of C and T after pooling replicates. DMR CG methylation ratios were then standardized (z score) across groups and hierarchically clustered by Euclidean distance using Ward clustering. Groups were clustered by Euclidean distance from CG methylation ratios using complete linkage clustering.
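The DMR cut-offs and the merging of DMRs separated by 100 bp or less, as described above, are sketched below in Python. This only mirrors the stated thresholds and is not methylKit itself; note that methylKit reports methylation differences in percentage points, so a fractional cut-off of 0.4 corresponds to a difference of 40 there. The function names are hypothetical, and the sketch assumes 1-based closed intervals (so tiles 1–100 and 201–300 are 100 bp apart).

```python
# Minimum absolute methylation difference (fraction) per sequence context.
MIN_DIFF = {"CG": 0.4, "CHG": 0.2, "CHH": 0.1}


def is_dmr(context: str, q_value: float, meth_diff: float, max_q: float = 0.01) -> bool:
    """Apply the FDR and effect-size cut-offs to one 100-bp tile."""
    return q_value <= max_q and abs(meth_diff) >= MIN_DIFF[context]


def merge_regions(regions: list[tuple[str, int, int]], max_gap: int = 100) -> list[tuple[str, int, int]]:
    """Merge 1-based, closed intervals separated by max_gap bp or less."""
    merged: list[tuple[str, int, int]] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] - 1 <= max_gap:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


tiles = [("Chr1", 1, 100), ("Chr1", 201, 300), ("Chr1", 501, 600), ("Chr2", 1, 100)]
print(merge_regions(tiles))
# [('Chr1', 1, 300), ('Chr1', 501, 600), ('Chr2', 1, 100)]
```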

DNA methylation is maintained by different pathways, including the CMT2 pathway, which acts on pericentromeric heterochromatin, and RdDM, which targets repetitive sequences in chromosome arms82. In addition, mCG in the coding sequences of expressed genes constitutes gene body methylation (gbM)82. Overlap percentages of clustered DMRs with defined methylated genomic regions were determined using the function annotateWithFeatures from the R package genomation (v.1.30.0)83. Hypervariable methylation sites (HV) were defined as in ref. 84, except that the HV regions, which were retrieved from Gene Expression Omnibus (GEO) series GSE165095, sample GSM5026060, were merged with 249 spontaneous DMRs identified in ref. 85. gbM regions were defined by the genome coordinates of gbM genes identified in ref. 86. MET1 regions correspond to met1 CG hypo-DMRs identified in ref. 82 that were retrieved from GEO series GSE39901. Regions methylated by the RdDM pathway were defined as RNA polymerase V target sites identified in ref. 87, and the genome coordinates were retrieved from GEO series GSE100010. CMT2 regions representing constitutive heterochromatin were defined as cmt2 CHH hypo-DMRs identified in ref. 82, and the genome coordinates were retrieved from GEO series GSE39901. CG hypo-DMRs from mthfd1-1 mutants, as previously identified19, that did not overlap with HV, gbM, RdDM or CMT2 regions were defined as Other m. Bins separated by 100 bp or less within each of the defined methylated genomic regions were merged. TAIR10 genomic regions that did not overlap with any of the defined methylated genomic regions were called Rest. In case of multiple overlaps, the order of precedence was HV > gbM > RdDM > CMT2 > Other m > Rest. Genome coordinates of Arabidopsis chromatin states (CS1–36) were retrieved from the Plant Chromatin State Database88.
Overlaps between clustered DMRs and each chromatin state (CS1–36) were quantified using the function countOverlaps from the R package GenomicRanges (v.1.50.2)89 and visualized with pheatmap81.
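The precedence rule for DMRs that overlap several region classes (HV > gbM > RdDM > CMT2 > Other m > Rest) can be sketched in Python as follows. This illustrates the assignment logic only and is not genomation's annotateWithFeatures; the function names, the 1-based closed-interval convention and the example coordinates are hypothetical.

```python
from collections import Counter

PRECEDENCE = ("HV", "gbM", "RdDM", "CMT2", "Other m")


def overlaps(a: tuple[str, int, int], b: tuple[str, int, int]) -> bool:
    """True if two closed intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def assign_category(dmr: tuple[str, int, int], features: dict[str, list[tuple[str, int, int]]]) -> str:
    """Return the highest-precedence region class overlapping the DMR, else 'Rest'."""
    for name in PRECEDENCE:
        if any(overlaps(dmr, f) for f in features.get(name, ())):
            return name
    return "Rest"


def percent_by_category(dmrs, features) -> dict[str, float]:
    """Per cent of DMRs in a cluster assigned to each region class."""
    counts = Counter(assign_category(d, features) for d in dmrs)
    return {name: 100.0 * counts[name] / len(dmrs) for name in (*PRECEDENCE, "Rest")}


features = {"gbM": [("Chr1", 1, 500)], "CMT2": [("Chr1", 200, 1000)]}
dmrs = [("Chr1", 300, 400), ("Chr1", 600, 700), ("Chr2", 1, 100), ("Chr1", 450, 460)]
print(percent_by_category(dmrs, features))
# {'HV': 0.0, 'gbM': 50.0, 'RdDM': 0.0, 'CMT2': 25.0, 'Other m': 0.0, 'Rest': 25.0}
```

The first DMR overlaps both gbM and CMT2 regions and is counted once, as gbM, which keeps the per-cluster percentages summing to 100.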

To plot correlations between DNA methylation and TE expression in different expression clusters, each differentially expressed TE was divided into four bins of equal size between start and stop sites and into 250-bp or 500-bp windows in the 1-kb flanking regions, and a matrix of weighted methylation levels79 was generated from pooled replicates for each bin. Spearman's rank correlation coefficients between these methylation levels and log-transformed TPM values (see ‘mRNA-seq data analysis’) were then calculated in R.
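The per-bin correlation described above can be sketched in Python with SciPy's `spearmanr`, as a stand-in for the R calculation. The function name and example values are hypothetical, and log2(TPM + 1) is assumed as the log transform, following the heat map procedure.

```python
import numpy as np
from scipy.stats import spearmanr


def bin_correlations(methylation: np.ndarray, tpm: np.ndarray) -> np.ndarray:
    """Spearman's rho between each methylation bin and TE expression.

    methylation: TEs x bins matrix of weighted methylation levels
    tpm: TPM value per TE (same row order)
    """
    expression = np.log2(np.asarray(tpm, dtype=float) + 1.0)
    rhos = []
    for j in range(methylation.shape[1]):
        rho, _p_value = spearmanr(methylation[:, j], expression)
        rhos.append(rho)
    return np.array(rhos)


# Hypothetical data: 5 TEs, 2 bins.
tpm = np.array([0.0, 1.0, 3.0, 7.0, 15.0])
methylation = np.array([
    [0.9, 0.1],
    [0.7, 0.2],
    [0.5, 0.3],
    [0.3, 0.4],
    [0.1, 0.5],
])
print(bin_correlations(methylation, tpm))  # rho of -1 for bin 0 and +1 for bin 1
```

Because Spearman's coefficient is computed on ranks, any monotonic transform of TPM (including the log transform) leaves rho unchanged; the transform is kept only to mirror the described workflow.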

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.