Main

Peatlands are globally important ecosystems and the largest terrestrial carbon store1,2,3,4. Despite covering just 3% of the Earth’s surface, they are estimated to contain up to one-third of the global soil carbon due to accumulation of undecomposed organic matter in anoxic, waterlogged soils1,2,3,4. However, peatlands are sensitive to environmental disturbances such as drainage and desiccation caused by land-use changes, climate change and other anthropogenic impacts5,6,7. When peatlands are degraded, they shift from being carbon sinks to becoming carbon sources, releasing stored carbon as CO2 and exacerbating global climate change7,8,9.

Restoration efforts, often focused on rewetting, aim to reverse these effects by restoring the natural hydrology that maintains peatland carbon storage. However, outcomes for soil carbon storage and function are variable, highlighting a need to identify key ecological drivers of recovery10,11,12. Soil microbiomes are central to peatland function, regulating carbon retention and loss3,13. Restoration success probably depends on reestablishing vital microbial processes alongside hydrology and vegetation. Although viruses are known to influence microbiomes and global biogeochemistry, their roles in peatland ecology and recovery remain poorly understood.

Soil viruses are now recognized as ubiquitous members of microbiomes and potent regulators of nutrient cycles, as shown in marine environments. Research on peatland viruses is nascent, but pioneering studies indicate that viruses can strongly influence carbon cycling. For instance, a peatland undergoing permafrost thaw revealed viral communities that respond strongly to environmental change: as permanently frozen ‘bogs’ transition to wetter, thawed ‘fens’, viral community composition shifts from ‘soil-like’ viruses to ‘aquatic-like’ assemblages, mirroring changes in hydrology and host communities14. In some peatlands, viruses encode auxiliary metabolic genes (AMGs) that may enhance carbon cycling14,15,16,17. These findings indicate that peatland soil viruses are not passive bystanders, but dynamic, active drivers of carbon flow and ecosystem function. However, comprehensive understanding of their interactions with microbiomes and peatland health remains lacking.

As peatlands are increasingly targeted for restoration to mitigate climate change and preserve biodiversity, understanding how viral dynamics influence microbial processes is crucial for predicting and improving restoration outcomes. We hypothesized that (1) peatlands of varying levels of restoration harbour unique viral assemblages; (2) viral populations mirror host shifts across ecosystem states, including in metabolic contexts; and (3) increased microbial activity in damaged sites favours ‘piggyback-the-winner’ dynamics, boosting lysogeny and favouring fast-growing hosts. By examining the relationships between viral communities, microbial hosts and environmental factors, our study enhances our understanding of the ecological and functional roles of viruses in peatlands.

Results

Viral communities in UK peatlands

We sampled soils from seven UK peatlands (Fig. 1a and Supplementary Table 1) spanning a gradient of ecosystem health statuses (EHSs): near-natural (undrained/undisturbed reference, hereafter natural), damaged (drained/eroded) and restored (formerly damaged, then rewetted)18. Damaged peatlands were probably drained for decades, although the exact duration is unknown. Restoration age varied by site but was within 10 years of sampling. Compared with natural sites, damaged sites have drier, more oxygenated soils and higher community-wide growth rates18. Restored sites show signs of recovery but remain chemically distinct from natural peatlands18.

Fig. 1: Geography and ecosystem health structure peatland environmental variation.

a, Map of the seven peatland sites. Natural, restored and damaged sampling blocks of EHS are coloured by their average EHI. The map shading indicates peatland cover. No natural blocks were sampled at Stean. b, PCoA of viral community Bray–Curtis dissimilarities (n = 60 soil samples). ANOSIM (999 permutations) shows significant separation by site (R = 0.656, P = 1.0 × 10−3, unadjusted). c, Site-specific PCoAs with ANOSIM statistics for separation of communities by EHS (999 permutations; exact R and P values shown on plots). d, The PCoA from b coloured by EHI. PerMANOVA (two-sided, 999 permutations) reports the marginal R2 for variance in community dissimilarity explained by EHI (R2index = 0.029, P = 4.5 × 10−3 BH-adjusted) and EHS (R2status = 0.051, P = 1.0 × 10−4, BH-adjusted). e, Linear regression of PCoA axis 1 against EHI, where the black line represents the fitted regression mean and the shaded band indicates the 95% confidence interval around the fitted line. Regression statistics from a linear model of the two axes are provided (R2 = 0.30, P = 5.28 × 10−6).

We sequenced community DNA of soil samples and co-assembled metagenomes by combining triplicate sequence read libraries from each sampling site and EHS, yielding 22 assemblies (one per site × EHS combination) of high quality (Supplementary Tables 1 and 2). We identified 3,177 viral scaffolds across all sites and EHS from metagenome co-assemblies, which were binned into 2,281 viral metagenome-assembled genomes (vMAGs) (Supplementary Table 3). These genomes were dereplicated and clustered into 1,548 virus species-level clusters which were analysed downstream using virus species-level representative genomes.

Environmental differences between EHSs

A principal components analysis (PCA) across sites revealed that samples generally separated by EHS according to their environmental parameters (Extended Data Fig. 1a). PCA loadings indicated that total carbon (0.72), pH (0.63) and oxygen concentration (0.61) contributed most strongly, followed by moisture (0.57), total nitrogen (0.51) and conductivity (0.38), supporting EHS’s role in structuring peatland soils. Site-specific effects were strong, as shown by separate PCAs within sites, where the relative influence of environmental variables differed (Extended Data Fig. 1b). To capture the complex variation across sites and EHSs, an ecosystem health index (EHI) was previously calculated for each sample18 (Fig. 1a), incorporating peat chemistry, oxygen, moisture and vegetation. This index provides a holistic, continuous measure of peatland ecosystem health that effectively reflects variation across all samples.

While EHS grouped samples within sites, the degree of separation varied, suggesting that local environmental conditions matter. This is consistent with the fact that these sites span a climatic gradient and vary in their level of degradation and restoration18. Nevertheless, soils from damaged peatlands were less waterlogged, more oxygenated and more acidic than natural soils, whereas soils from restored peatlands showed signs of mitigation. Overall, these results indicate that while site-level differences are prominent, EHS captures meaningful environmental variation across peatland sites. Likewise, changes in EHI offer a useful means to track relative improvements in ecosystem health when comparing areas of varying EHSs within a given peatland.

Geography and ecosystem health structure peatland virus communities

To explore the drivers of viral community composition, we performed principal coordinate analysis (PCoA) and found that geography was the primary structuring factor (Fig. 1b). Although the viral communities of some samples from different sites were similar in composition, samples in the PCoA were mostly grouped by their geographic origin (R = 0.656, P = 0.001, analysis of similarities (ANOSIM)). The influence of EHS on community structure became more apparent when analysing sites separately. Within sites, we observed a strong separation of samples by EHS (R > 0.5, P < 0.05, ANOSIM; Fig. 1c). The exception was Stean, where lack of natural reference samples probably reduced power. Alongside overall EHS grouping, we mapped the previously calculated EHI for each sample18 to our PCoA (Fig. 1d) and found that EHI also significantly impacted viral community structure (R2 = 0.029, P = 4.5 × 10−3, PerMANOVA; Supplementary Table 4) independently of EHS (R2 = 0.051, P = 1 × 10−4). EHI was also strongly positively correlated with virus community PCo1 (Fig. 1e), providing further evidence that ecosystem health is a significant factor that drives viral community structure. Host community composition (Extended Data Fig. 2) also significantly impacted viral community structure, but this did not overshadow the independent effects of sample site and ecosystem health (Supplementary Table 4 and Supplementary Results).
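
For readers unfamiliar with the statistic, the ANOSIM test reported above can be sketched as follows. This is a minimal, standard-library illustration and not the pipeline used in this study: the function names and toy abundance table are hypothetical, and a dedicated ecology package would normally be used. R is computed from ranked Bray–Curtis dissimilarities as the difference between the mean between-group and mean within-group rank, divided by M/2, where M is the number of sample pairs; the P value comes from permuting group labels.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def average_ranks(values):
    """Rank values 1..n (ascending); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def anosim(samples, groups, permutations=999, seed=0):
    """Return (R, P) for the grouping `groups` over abundance rows `samples`."""
    pairs = list(combinations(range(len(samples)), 2))
    ranks = average_ranks([bray_curtis(samples[i], samples[j]) for i, j in pairs])

    def r_statistic(labels):
        within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
        between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
        mean_within = sum(within) / len(within)
        mean_between = sum(between) / len(between)
        return (mean_between - mean_within) / (len(pairs) / 2)

    observed = r_statistic(groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if r_statistic(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy table: rows are samples, columns are virus species.
toy_samples = [[10, 0, 1], [9, 1, 0], [8, 0, 2], [0, 10, 5], [1, 9, 6], [0, 8, 4]]
toy_sites = ["A", "A", "A", "B", "B", "B"]
r_value, p_value = anosim(toy_samples, toy_sites)
print(f"R = {r_value:.3f}, P = {p_value:.3f}")
```

With only six toy samples the permutation P value cannot become very small, because few distinct label arrangements exist; the 60-sample dataset analysed here does not have that limitation.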

Peatland soils contain a mix of endemic and shared viral populations

Considering the strong effects of geography and ecosystem health on structuring environmental variation across sites, we examined the degree of endemism among our identified virus species. Most viral genomes were detected in soil metagenomes from multiple sample sites (76% of species representatives, Fig. 2a). However, 54% of virus species were endemic to individual EHS (found exclusively in one of natural, damaged or restored soils across all sites) compared with 46% that were shared (Fig. 2b). We also assessed whether the viral genomes identified were largely novel or instead represented in published soil virus databases. We gathered a comprehensive collection of genomes from the three largest and most recent soil virus databases17,19,20. We found that more viruses from this study formed genus-level genome clusters with viruses from other databases than other viruses from this study (Fig. 2c). Thus, many viral genomes clustered with known soil viral genomes from other ecosystems, indicating that not all are unique to peatlands at the genus level. These results suggest that soil viruses in UK peatlands share a core of virus lineages with other soils, alongside a substantial fraction of locally endemic viruses.
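
The endemic-versus-shared split described above reduces to a set-membership check once each virus species has a list of the EHSs in which it was detected. The sketch below assumes such detection calls are already available (the detection criterion itself is not restated here); the function names and toy data are hypothetical.

```python
from collections import Counter


def classify_endemism(detections):
    """Label each virus 'endemic:<EHS>' if seen in one EHS only, else 'shared'."""
    labels = {}
    for virus, statuses in detections.items():
        unique = set(statuses)
        if len(unique) == 1:
            labels[virus] = "endemic:" + next(iter(unique))
        elif len(unique) > 1:
            labels[virus] = "shared"
        # Viruses with no detections are left unlabelled.
    return labels


def endemic_fraction(labels):
    """Fraction of labelled viruses that are endemic to a single EHS."""
    counts = Counter(
        "endemic" if label.startswith("endemic:") else "shared"
        for label in labels.values()
    )
    total = sum(counts.values())
    return counts["endemic"] / total if total else 0.0


# Hypothetical detections: virus species -> EHSs where it was detected.
toy_detections = {
    "virus_1": ["natural"],
    "virus_2": ["natural", "damaged"],
    "virus_3": ["restored", "restored"],
    "virus_4": ["natural", "restored", "damaged"],
}
toy_labels = classify_endemism(toy_detections)
print(toy_labels, endemic_fraction(toy_labels))
```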

Fig. 2: UK peatland soil viruses are widely distributed across EHSs, sample sites and databases.

a,b, Detection of species-representative virus genomes across soils from different sample sites (a) and soil with different EHSs (b). For simplicity, intersections with <15 genomes are omitted in a and b. c, Intersections of databases represented in genus-level clusters of the viruses here and other genomes from three soil virus databases17,19,20 (n = 729,998 genomes). Bars represent clusters with genomes originating from each database in the intersection. Numbers above or inside bars indicate the total number of genome clusters with genomes originating from each database in the intersection.

Viruses are differentially abundant across EHSs

Having established that ecosystem health significantly shapes viral communities, we next identified viruses that were differentially abundant across EHSs. Using DESeq2 (ref. 21), we created ecosystem health ‘trend groups’ for a qualitative analysis of functions (Supplementary Results and Extended Data Fig. 3). Host genomes were also differentially abundant across EHSs and were clustered into trend groups (EHS group; Supplementary Results and Extended Data Fig. 4). For detailed information on the distribution and clustering of viral species-representative genomes, see Supplementary Results. Across all sites, there was a greater proportion of damaged-enriched viruses (37%) than restored-enriched viruses (33%) and natural-enriched viruses (29%) (Fig. 3a). This contrasted with trends for hosts, indicating that damaged peatlands host a greater share of enriched viruses among differentially abundant groups. In summary, the differential abundance of viral species across EHSs shows that ecosystem health strongly influences viral population sizes, which differ markedly between natural, restored and damaged peatland soils.

Fig. 3: Relative abundance of viruses and hosts across EHSs.

a, The relative abundance of virus (n = 1,448 genomes) and host (n = 411 genomes) genomes differentially abundant across EHS ‘trend groups’. b, The relative abundance of host genomes (n = 411 host genomes, top row) and virus genomes (n = 1,351 virus genomes, bottom row), categorized by host class. Only differentially abundant host and viral genomes are included. Host classes are labelled as follows: Domain (A = Archaea, B = Bacteria); Phylum; Class. Individual classes that were <1% in relative abundance were grouped into the same ‘<1% abundant’ category. c, Abundance ratios of differentially abundant hosts encoding eight key metabolic functions relevant to peat soils (n = 407 host genomes) and their predicted viruses (n = 550 virus genomes).

The abundance of viruses across EHSs is discordant with dominant host taxa

We examined the relative abundance of differentially abundant bacterial and archaeal MAGs and their predicted viruses at the host class level within each EHS (Fig. 3b). Relative abundances of viruses and their hosts varied substantially across sites. However, when examining all sites together, viruses infecting hosts in the phyla Actinomycetota, Desulfobacterota and Planctomycetota showed a marked decrease in abundance from the natural to the restored trend groups. This was met with an increase in viruses infecting Pseudomonadota hosts, particularly Alphaproteobacteria. Viruses of Alphaproteobacteria and Desulfobaccia also increased in abundance from natural to damaged trend groups. These viral abundance shifts did not mirror host changes. For example, while the relative abundance of Pseudomonadota viruses surged from natural to restored groups, the abundance of Pseudomonadota hosts remained stable. Similarly, Desulfobaccia hosts made up only 1.8% of damaged-enriched hosts, yet Desulfobaccia viruses represented 6.6% of damaged-enriched viruses. These findings show that viral and host dynamics across EHSs are discordant, suggesting that viral responses to environmental changes may depend on factors beyond host availability.

Viral and host dynamics across key biogeochemical functions in peatlands

Given that peatlands at different EHSs are chemically distinct, we explored whether viruses infecting microbes with key biogeochemical functions changed across EHSs, and whether these changes reflected overall viral and host abundance trends. We calculated the relative abundance of viruses infecting hosts with eight metabolic functions (Supplementary Table 5) within each EHS group, normalized by overall viral abundance within that same trend group (Fig. 3c). Similar calculations were done for predicted hosts. Below, we focus on notable trends, but comprehensive results for all eight metabolic functions are provided in Supplementary Results.

Across the eight metabolic functions, several key trends stood out. For oxidative phosphorylation, viral abundance (n = 451 virus genomes) increased from natural (1.00) to restored soils (1.12, +11%), but decreased again from restored to damaged (0.89, −25%). Host abundance (n = 365 host genomes), in contrast, decreased from natural to restored (0.82, −32%). For fermentation, viral abundance (n = 491 virus genomes) remained stable between natural (1.07) and restored soils (1.08, +1.7%) but showed a decrease in damaged soils (0.87, −22%), paralleling a similar decrease in host abundance (−32%, n = 477 host genomes). Carbohydrate degradation showed minor changes, with viral abundance (n = 542 virus genomes) being stable between natural and restored soils (1.05) but decreasing slightly in damaged soils (0.91, −15%), alongside minor fluctuations in host abundance (n = 590 host genomes). For assimilatory and dissimilatory sulfate reduction, there were decreases in viral abundance (−52% and −36%, n = 20 and n = 105 virus genomes) from natural to restored soils, accompanied by an even greater decrease for their hosts (−159% and −48%, n = 34 and n = 68 host genomes). From restored to damaged soils, viruses infecting hosts with these functions increased by 25% (assimilatory) and 17% (dissimilatory). Although assimilatory sulfate-reducing hosts showed a major increase of 70% from restored to damaged soils, dissimilatory sulfate-reducing hosts declined by 25% over the same transition, contrasting with the pattern observed for their viruses. A similar pattern was observed for thiosulfate oxidation (n = 33 virus genomes, n = 16 host genomes). It is important to note that these percentage changes reflect descriptive trends based on aggregated ratios and were not subjected to null hypothesis testing (see Methods). 
These patterns suggest that while virus and host dynamics often align, the enrichment of hosts with specific metabolic functions, such as oxidative phosphorylation and sulfur cycling, can sometimes diverge from the enrichment of viruses that infect them across different EHSs.

Viral proteins are functionally distinct across EHSs

To assess viral functional differences across EHSs at the protein level, we clustered protein-coding viral genes from all sites and examined their distribution across EHSs (Fig. 4a). The three largest groups were protein clusters unique to individual EHS, suggesting that soils of each EHS harbour viruses encoding proteins with distinct functions. This degree of adaptation is notably greater than what we previously observed at the genome level (Fig. 2b). These patterns indicate that common pools of viral genomes exist across EHSs, but their functional potential is locally adapted to their specific environmental conditions. Despite these functional distinctions, the distribution of functional categories (based on PHROG22) across the intersections remains consistent (Fig. 4a). Therefore, while viruses are specialized at the protein level, they perform similar high-level functions across all EHSs. Overall, our results demonstrate that viral protein functions are finely tuned to their environments, even when broader categories are conserved.

Fig. 4: Viral protein-coding genes and AMGs across EHSs.

a, UpSet plot showing amino acid identity-based clustering of all viral protein-coding genes (n = 77,662 genes). Intersections represent protein clusters with viral proteins from multiple EHSs, while non-intersecting groups represent proteins unique to a single EHS. The distribution of PHROG22 functional categories across these intersections is shown in the stacked bar plot at the top. b, UpSet plot of unique KEGG KOfams24 (n = 59 families) among viral AMGs (n = 100 genes) across EHSs. Intersections indicate KOfams shared across different EHSs, while non-intersecting groups highlight KOfams unique to a single EHS. The stacked bar plot at the top illustrates the distribution of KEGG metabolism categories associated with these viral KOfams across the intersections.

We also focused on viral AMGs23 and their distribution across EHSs (Fig. 4b). AMGs are host-derived proteins with metabolic functions that provide viruses with evolutionary and fitness benefits. Similar to the all-protein results, the largest intersections correspond to KEGG24 protein families unique to individual EHS, reinforcing the idea that the metabolic functions encoded by these viral genomes are distinct across different environmental conditions. Likewise, the distribution of high-level KEGG metabolism categories across the major intersections remained largely similar, with categories such as ‘Carbohydrate metabolism,’ ‘Metabolism of cofactors and vitamins’ and ‘Amino acid metabolism’ being well represented. Yet, there was a small increase in the proportion of energy metabolism genes in the damaged-only samples compared with the natural-only and restored-only samples, with predicted functions involved in sulfur metabolism (K20034 3-(methylthio)propionyl-CoA ligase), methane metabolism (K16370 6-phosphofructokinase 2 and K15229 methylamine dehydrogenase heavy chain) and oxidative phosphorylation (K02107 V/A-type H+/Na+-transporting ATPase subunit G/H). This subtle shift may indicate functional adaptation, with viruses in damaged, oxygenated soils potentially playing a more active role in processes linked to electron transport in their hosts for their selfish benefit25,26. Altogether, viral proteins and AMGs are not distinct at high-level functions across EHSs, yet they are locally adapted to specific restoration contexts.

Virus–host infection dynamics change with EHS

Viruses depend on their hosts to replicate, but their modes and rates of replication vary27,28. We therefore investigated virus–host infection dynamics using genome abundances of our bioinformatically predicted virus–host pairs. Linear regressions between total virus abundance and total host abundance across EHSs reveal complex interactions that vary by phylum (Fig. 5a). Notably, while the slopes of these regressions change within each phylum depending on EHS, all slopes are consistently less than 1. For example, in Acidobacteriota, the slopes are 0.567 (R2 = 0.84, BH-adjusted P = 2.67 × 10−7, n = 18 soil samples) in natural soils, 0.812 (R2 = 0.55, BH-adjusted P = 3.81 × 10−4, n = 19 soil samples) in restored soils and 0.719 (R2 = 0.84, BH-adjusted P = 3.19 × 10−6, n = 15 soil samples) in damaged soils, indicating that host genomes are generally more abundant than their associated viral genomes across all EHSs at the phylum level. This pattern suggests chronic or non-lytic modes of infection at high host densities, known as ‘piggyback-the-winner’ dynamics28,29, where viruses coexist with their hosts through non-lethal replication strategies, such as lysogeny, involving integration into the host genome. This pattern was also observed for viruses and hosts of other dominant phyla, but the strengths of these relationships were susceptible to changes in EHS (see Supplementary Results).
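
The slope comparison above rests on an ordinary least-squares fit of virus abundance against host abundance, checked against the reference slope of 1. A minimal sketch is given below; the function name and toy abundances are hypothetical, the values are used as supplied (any transformation applied before fitting in this study is not assumed here), and P values are omitted because they would require a t-distribution routine outside the standard library.

```python
def ols_fit(host_abundance, virus_abundance):
    """Least-squares fit of virus ~ host; returns (slope, intercept, r_squared)."""
    n = len(host_abundance)
    mean_x = sum(host_abundance) / n
    mean_y = sum(virus_abundance) / n
    sxx = sum((x - mean_x) ** 2 for x in host_abundance)
    syy = sum((y - mean_y) ** 2 for y in virus_abundance)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(host_abundance, virus_abundance)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical per-sample totals for one phylum within one EHS.
toy_hosts = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_viruses = [0.6, 1.1, 1.7, 2.2, 2.8]
slope, intercept, r_squared = ols_fit(toy_hosts, toy_viruses)
print(f"m = {slope:.3f}, R2 = {r_squared:.3f}, sublinear = {slope < 1}")
```

A slope below 1 means that virus abundance rises less than proportionally as host abundance rises, which is the feature interpreted here as consistent with ‘piggyback-the-winner’ dynamics.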

Fig. 5: Dynamics of virus–host interactions across EHSs.

a, Linear regressions of virus and host abundances by EHS. For clarity, only the six host phyla with the highest number of observations are shown. Linear model statistics (two-sided) for each phylum × EHS combination are provided, where m is the slope of the best-fit line (shown as a solid line representing the fitted regression mean) and P gives BH-adjusted P values for the significance of each regression slope. Dotted lines represent a hypothetical slope of m = 1. b, Average trimmed mean genome coverage of lysogenic viruses per sample (normalized by the total trimmed mean coverage of all virus genomes; n = 60 soil samples) across all sites, grouped by EHS. Significant pairwise contrasts among EHSs are shown (estimated marginal means, two-sided, *P ≤ 0.05, BH-adjusted), determined from a linear mixed-effects model with sample site as a random intercept. Boxplots: centre line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, individual data points. c, Linear mixed-effects model predicting normalized lysogenic virus abundance per sample in b (n = 60 soil samples) from EHI, with site as a random intercept. The black line shows the marginal fitted values (population-level mean predictions) from the linear mixed-effects model, and the shaded band represents the corresponding 95% confidence intervals. The marginal (R2marg. = 0.18) and conditional (R2cond. = 0.42) R2 of the model fit are shown, and P = 6.9 × 10−4 (unadjusted) reflects the result of a Type II ANOVA assessing the significance of EHI as a fixed effect in the model.

Lineage-specific shifts in lysogeny and induction across EHSs

While ‘piggyback-the-winner’ dynamics prevailed in UK peatland soils, patterns of temperate (hereafter lysogenic) and actively replicating viral abundance across EHSs highlighted significant shifts in virus–host interactions. We identified 297 lysogenic viruses in total, 13% of all identified viruses, and analysed their abundances in each sample. Patterns of lysogenic virus abundance varied across sites, with no significant differences in their raw mean abundances when aggregating all sites (Extended Data Fig. 5). However, when normalizing lysogenic virus abundance by the total virus population in each sample (Fig. 5b), we found that the proportion of lysogenic viruses was significantly lower in natural and restored soils compared with damaged soils (estimated marginal means, BH-adjusted P = 0.0300 and P = 0.0398, respectively). This suggests that lysogenic viruses contributed more substantially to the overall viral community in damaged soils. Furthermore, when modelling normalized lysogenic virus abundance as a function of the EHI while accounting for site-level variation (Fig. 5c), we observed a significant negative relationship (marginal R2 = 0.18, conditional R2 = 0.42, χ2 = 11.52, unadjusted P = 6.9 × 10−4, Type II ANOVA, n = 60). This indicates that the relative abundance of lysogenic viruses increases with peatland degradation. Together, these findings suggest an increase in the replication of lysogenic viruses as peatlands shift from natural to damaged states.
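
The normalization used for Fig. 5b amounts to dividing the summed coverage of lysogenic viruses in a sample by the summed coverage of all viruses in that sample. The following sketch illustrates the calculation only; the function name, identifiers and coverage values are hypothetical, and the mixed-effects modelling that follows it is not reproduced.

```python
def lysogenic_fraction(coverage, lysogenic_ids):
    """Share of a sample's total viral coverage contributed by lysogenic viruses."""
    total = sum(coverage.values())
    if total == 0:
        return 0.0  # no viral signal in this sample
    lysogenic = sum(
        value for virus, value in coverage.items() if virus in lysogenic_ids
    )
    return lysogenic / total


# Hypothetical trimmed-mean coverages for two samples.
toy_lysogenic = {"virus_2", "virus_4"}
toy_natural = {"virus_1": 30.0, "virus_2": 5.0, "virus_3": 15.0}
toy_damaged = {"virus_1": 10.0, "virus_2": 20.0, "virus_4": 10.0}
print(lysogenic_fraction(toy_natural, toy_lysogenic))
print(lysogenic_fraction(toy_damaged, toy_lysogenic))
```

Because the denominator is sample-specific, two samples with the same raw lysogenic coverage can differ in their normalized values, which is why the raw and normalized comparisons above can disagree.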

We aimed to identify actively replicating viruses in our samples by calculating virus-to-host abundance ratios (also known as virus:microbe ratio, or VMR) (Extended Data Fig. 6). We considered a virus to be ‘active’ if the virus:host ratio exceeded 10. Using this threshold, we identified 51 active viruses across 46 samples. This represented 10% of all viruses with host predictions and non-zero virus and host abundances. Of the 51 active viruses, 27 (53%) were also predicted to be lysogenic, accounting for 9.1% of all predicted lysogenic viruses. Thus, these active lysogenic viruses probably underwent recent induction at the time of soil sampling. Among them, 26% were active in natural soils, 41% in restored soils and 67% in damaged soils (13 lysogenic viruses were active in more than one sample, explaining why the total exceeds 100%). We also found that EHS had a significant effect on virus:host ratios, but the effects varied by the host family (see Supplementary Results). In summary, these results support our observation that both overall viral genome abundance and the proportion of lysogenic virus genomes increase in damaged soils, with a subset of these viruses probably undergoing greater induction and replication compared with those in natural and restored peatlands.
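
The activity criterion described above can be written as a simple filter over predicted virus–host pairs. The sketch assumes per-sample abundances are already available for each pair; the record layout and function name are hypothetical. Pairs with zero virus or host abundance are skipped, and a ratio of exactly 10 does not count as active because the threshold must be exceeded.

```python
def active_viruses(pairs, threshold=10.0):
    """Return sorted virus IDs whose virus:host ratio exceeds `threshold`.

    `pairs` is an iterable of (virus_id, host_id, virus_abundance, host_abundance).
    """
    active = set()
    for virus_id, _host_id, virus_abundance, host_abundance in pairs:
        if virus_abundance <= 0 or host_abundance <= 0:
            continue  # ratio undefined or uninformative
        if virus_abundance / host_abundance > threshold:
            active.add(virus_id)
    return sorted(active)


# Hypothetical virus-host pairs within one sample.
toy_pairs = [
    ("virus_1", "host_a", 50.0, 2.0),   # ratio 25  -> active
    ("virus_2", "host_a", 20.0, 2.0),   # ratio 10  -> not active (must exceed 10)
    ("virus_3", "host_b", 5.0, 0.0),    # host absent -> skipped
    ("virus_4", "host_c", 0.5, 4.0),    # ratio 0.125 -> not active
]
print(active_viruses(toy_pairs))
```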

Discussion

Peatlands are the world’s largest terrestrial carbon stores1,2,3,4 but are increasingly threatened by habitat destruction, shifting from being carbon sinks to becoming carbon sources5,6,7,8,9. Since carbon cycling in peatlands is primarily driven by soil microorganisms3,13, understanding how environmental damage and restoration affect soil microbiomes is crucial for managing peatlands and mitigating their carbon emissions. Here we show that restoration of peatland ecosystem health (1) significantly shaped viral community composition, (2) enriched viruses infecting specific microbial lineages and functional groups, (3) selected for distinct viral protein functions and (4) altered virus–host population dynamics, advancing our understanding of how environmental change impacts soil viruses and their roles in global carbon cycling (Fig. 6).

Fig. 6: Summary of dynamic viral communities across an ecosystem health gradient in peatland topsoils.

Along a gradient from natural peatlands (high water table, anoxic, net carbon sink) to restored and damaged sites (low water table, more oxic conditions, net carbon source), viral relative abundance, infection strategy (lysogeny) and host associations shift. These changes lead to distinct viral communities and protein functions, and are met with enrichment of viruses of sulfate reducers in natural soils, aerobic hosts in restored soils and depletion of viruses that infect fermenters and carbohydrate degraders in damaged soils. Illustration created with BioRender.com.

We found that viral abundance and composition often diverge from those of their microbial hosts across EHSs, rather than mirroring host populations. This decoupling, particularly notable in carbon- and sulfur-cycling hosts, suggests that viral responses are influenced by factors beyond host availability, potentially by environmental stressors such as nutrient shifts30 or soil chemistry changes15, or by host physiological responses affecting susceptibility to infection30. In parallel, viral proteins showed local adaptation to ecosystem health, with distinct metabolic functions detected in damaged soils, including AMGs involved in methane metabolism, oxidative phosphorylation and sulfur metabolism. However, our metagenomic approach captures only potential functions, and functional assays or transcriptomics are needed to clarify the impact of these viral adaptations on ecosystem recovery.

Restoration also shifted viral replication strategies, with an increase in lysogeny and increased activity among a subset of viruses in damaged soils. This aligns with ‘piggyback-the-winner’ dynamics28,29, and is in line with microbial studies showing that damaged peatlands have higher microbial growth rates and population sizes18, and that environmental changes can induce switches in virus lifestyle20,31,32,33. These results highlight the sensitivity of soil viral communities to environmental disturbances, and suggest that shifts in viral replication strategies could serve as indicators of host population densities and EHS in peatlands undergoing restoration. Microbial metagenomic approaches are often biased towards viruses in an intracellular state34,35, but despite this limitation, current bioinformatic tools can recover most environmental viruses from metagenomes36,37 and increasingly offer reliable host range predictions38.

As peatlands are restored to mitigate climate change, understanding virus–host interactions is essential not only for predicting microbial responses, but also for identifying how viral regulation of host populations and metabolism may influence the recovery of carbon storage and other ecosystem functions. Our findings suggest that viruses do not simply track host populations but actively respond to environmental conditions associated with degradation and restoration. Such responses may influence microbial turnover rates, metabolic activity and biogeochemical cycling as they have in other soil ecosystems39,40,41, all of which are critical for peatland functioning as net carbon sinks. Therefore, viruses probably play an underappreciated role in shaping restoration trajectories. Future studies should verify these findings with functional and experimental approaches, particularly focusing on viral influences over key microbial functional groups. Integrating viral community dynamics into restoration monitoring will strengthen our ability to assess and enhance peatland ecosystem recovery.

Methods

Soil sampling and metagenome sequencing

Soil samples were collected between May and October 2021 from seven upland peatland sites across Britain covering a gradient of climatic conditions18. At each site, we sampled three areas with different EHSs: a near-natural reference (natural), damaged by drainage or erosion (damaged) and restored by rewetting through drain blocking (restored). Using a standard soil corer or a Russian peat corer, three replicates were sampled per EHS, with the replicates being locally adjacent to minimize the impact of underlying geology and climatic conditions. Each replicate was sampled 5 m apart across a 10-m transect after removing surface vegetation. Most sites underwent restoration within the past 10 years. The exact duration since drainage is unknown, but it probably spanned several decades. Samples from damaged and restored areas were taken 2 m away from drainage features. Samples from each EHS were taken from areas with similar peatland lawns. Overlaying peatland land cover data were obtained from ArcGIS Hub at hub.arcgis.com/datasets/Defra::peaty-soils-location-england (England), hub.arcgis.com/datasets/theriverstrust::unified-peat-map-for-wales (Wales) and hub.arcgis.com/datasets/snh::carbon-and-peatland-2016-map (Scotland). See Supplementary Table 1 for sample metadata and locations.

Total community DNA was extracted from 0.25-g aliquots of homogenized soil collected from the upper 10-cm layer of samples, using the DNeasy PowerSoil Pro kit (Qiagen) following manufacturer instructions. DNA concentration and integrity were evaluated using Nanodrop spectrophotometry and Qubit fluorometric assays. Library preparation was performed with the NEBNext Ultra II FS DNA kit following manufacturer guidelines. Sequencing was conducted on an Illumina NovaSeq platform at the NERC Environmental Omics Facility18.

Analysis of soil environmental parameters

Oxygen concentrations were measured at depths of 0–5 cm and 5–10 cm using a fibre optic oxygen sensor (OXROB10, PyroScience); we then averaged the two measurements for each sample and used the mean in all subsequent analyses. Moisture content was determined gravimetrically and expressed as a percentage of the total mass. pH and soil conductivity were measured from a slurry prepared by mixing 5 g of peat with 25 ml of deionized water. Soil samples were dried, ball milled and subsampled, with 10–12 mg weighed into tin capsules for total carbon and total nitrogen measurements, obtained using an NA 2500 Series elemental analyser (CE Instruments). All environmental measurements were obtained from the same sample aliquots and consolidated into a single dataset keyed by sample identifier and annotated with sampling site and EHS (Supplementary Table 1). For subsequent analyses, soil environmental parameters were either globally scaled (centred and scaled across all sites) or site-specifically scaled (centred and scaled within each site), depending on whether we aimed to emphasize across-site variability or within-site variability.
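The difference between the two scaling schemes can be illustrated with a minimal sketch. The function names, the record layout (one dictionary per sample with a 'site' key) and the use of the sample standard deviation (n - 1 denominator) are assumptions made for illustration and are not taken from the authors' scripts.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore(values):
    """Centre to mean 0 and scale to unit sample SD (n - 1 denominator)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def scale_globally(records, key):
    """Centre and scale one parameter across all samples from all sites."""
    scaled = zscore([r[key] for r in records])
    return [{**r, key: z} for r, z in zip(records, scaled)]


def scale_within_site(records, key):
    """Centre and scale one parameter separately within each site."""
    rows_by_site = defaultdict(list)
    for row, r in enumerate(records):
        rows_by_site[r["site"]].append(row)
    out = [dict(r) for r in records]
    for rows in rows_by_site.values():
        scaled = zscore([records[row][key] for row in rows])
        for row, z in zip(rows, scaled):
            out[row][key] = z
    return out
```

Global scaling preserves differences between sites, whereas within-site scaling removes each site's mean and spread so that only contrasts among EHSs at the same site remain. The sketch does not handle a parameter that is constant within a site (zero standard deviation) or missing values.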

Because some site × EHS combinations had missing values for at least one environmental parameter, we used a mixed-effects modelling approach to impute these data before PCAs. Specifically, for each variable with missing values, we fitted a linear mixed-effects model with EHS as a fixed effect and site as a random intercept, and used the resulting model to predict missing observations. This allowed us to retain samples that would otherwise be omitted while preserving site- and EHS-specific trends. Oxygen concentration (10/66 samples), pH (6/66) and conductivity (6/66) required imputation; the globally scaled dataset was used when imputing for the overall PCA, and the site-specific dataset was used for each site-specific PCA. No other parameters required imputation. PCAs were then conducted on the imputed datasets using the ‘prcomp’ function in R. We performed one PCA including all samples (globally scaled) and separate site-specific PCAs (site-specifically scaled).

Sequencing read quality control and metagenome co-assembly

Metagenome sequence reads underwent quality control, filtering, assembly and formatting using the Anvi’o v.8 metagenomics workflow42. Raw sequence reads were quality filtered with Illumina-utils (v.2.13)43. Filtered read libraries, generated in triplicate from the same sample site × EHS combinations (Supplementary Table 1), were co-assembled into metagenomes using MEGAHIT (v.1.2.9)44, utilizing the ‘meta-large’ preset to optimize k-mer selection for large complex metagenomes such as those found in soil. A minimum contig length of 1 kb was enforced. Metagenome assembly statistics were evaluated with metaQUAST (v.5.2.0)45, and filtered reads were mapped back to their respective metagenomes using Bowtie2 (v.2.5.1)46 to assess read recruitment (Supplementary Table 2).

Host genome binning, quality control and taxonomic assignment

For each metagenome co-assembly, contigs were binned into MAGs using MetaBAT 2 (v.2.15)47, utilizing the metagenome read-mapping files described above to aid in binning. A minimum percent identity of 97% and a contig length of at least 1 kb for mapped reads were required. Binning was performed with a minimum contig size of 2.5 kb, and default MetaBAT 2 parameters were applied. The completeness and contamination of the bins were assessed with CheckM (v.1.2.2)48, using the lineage workflow. On the basis of CheckM results, bins were categorized into high-quality (≥90% completeness and ≤10% contamination), medium-quality (≥50% completeness and ≤10% contamination) and low-quality (<50% completeness or >10% contamination) MAGs. Taxonomic assignments for medium- and high-quality MAGs were determined using the GTDB-tk v.2.3.2 de novo workflow49. Patescibacteria and Altiarchaeota were selected as the bacterial and archaeal outgroups for phylogenetic tree inference, as the CheckM marker gene lineage results indicated that these phyla were underrepresented among the medium- and high-quality MAGs. Only high- and medium-quality MAGs were included in subsequent analyses involving host genomes.
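The quality tiers stated above can be written as a short rule. This is a sketch only: the function names are hypothetical, and completeness and contamination are assumed to be the percentage values reported by CheckM.

```python
def mag_quality(completeness, contamination):
    """Classify a bin with the thresholds given in the text (values in percent)."""
    if completeness < 50 or contamination > 10:
        return "low"
    if completeness >= 90:
        return "high"
    return "medium"


def retained_for_host_analyses(bins):
    """Keep only medium- and high-quality bins.

    `bins` maps a bin name to a (completeness, contamination) tuple.
    """
    return {
        name: stats
        for name, stats in bins.items()
        if mag_quality(*stats) in ("medium", "high")
    }
```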

Viral sequence identification, binning, host prediction and species cluster formation

ViWrap (v.1.3.0)50 was used to process each metagenome co-assembly, running on the metagenome contigs along with their associated triplicate filtered read pairs. The parameters ‘--identify_method genomad’ and ‘--input_length_limit 2000’ were specified to identify viral contigs using geNomad (v.1.7.4)37 and to enforce an initial minimum viral contig length of 2 kb. ViWrap utilized Bowtie2 v.2.4.5 in ‘end-to-end’ mode to map each filtered read pair to the viral contigs identified by geNomad, generating the necessary coverage files for binning. ViWrap then binned viral contigs into vMAGs using vRhyme (v.1.1.0)51 with multisample read coverage statistics. Both binned viral contigs and unbinned single-contig viral genomes are hereafter referred to as vMAGs.

Upon completion of ViWrap for each co-assembly, vMAGs and their summary information were extracted and renamed using custom Python scripts. Host genomes and taxonomy for all generated vMAGs were predicted with iPHoP (v.1.3.3)38, using a custom host genome database that included both the default ‘iPHoP_db_Aug23_rw’ genomes and the high- and medium-quality host MAGs described earlier. To ensure that iPHoP did not treat individual contigs in multicontig vMAGs as separate genomes, these contigs were linked by sequences of 1,500 Ns using the vRhyme auxiliary script ‘link_bin_sequences.py’, and the iPHoP parameter ‘--no_qc’ was used to prevent N-linked vMAGs from being discarded. Host predictions with a minimum confidence score of 90% were retained using the default iPHoP parameter ‘--min_score 90’.
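The N-linking step amounts to concatenating the contigs of one vMAG with a fixed spacer. The sketch below is a hypothetical stand-in that illustrates the idea; it is not the vRhyme ‘link_bin_sequences.py’ script, and the function names and FASTA formatting are assumptions.

```python
def link_contigs(contigs, n_spacer=1500):
    """Join the contigs of one vMAG into a single sequence separated by runs of N."""
    return ("N" * n_spacer).join(contigs)


def as_fasta_record(name, sequence, width=60):
    """Format one linked vMAG as a FASTA record with fixed-width sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

A single-contig vMAG passes through unchanged because there is nothing to join.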

vMAGs from all co-assemblies were dereplicated into viral ‘species’-level clusters using dRep (v.3.5.0)52. A minimum representative genome size of 5 kb was enforced with the parameter ‘-l 5000’. The parameters ‘--ignoreGenomeQuality -pa 0.8 -comW 0 -conW 0 -strW 0 -N50W 0 -sizeW 1 -centW 0’ were applied as recommended by the dRep documentation for non-bacterial/archaeal genomes. In addition, the parameters ‘-sa 0.95’ and ‘-nc 0.85’ were used to form species clusters at 95% average nucleotide identity (ANI) with a minimum aligned coverage of 85%, employing skani (v.0.2.1)53 for genome comparisons.

vMAG genome clustering with soil viral genome databases

To assess how well the vMAGs generated in this study are represented among other described soil viral genomes, we obtained viral genomes from publicly available soil virus databases. To ensure a comprehensive collection, we selected three databases: PIGEON (v.1)17 (filtered to include only viral genomes assembled from soil samples), the Global Soil Virome20 and the Global Soil Virus Atlas19. We clustered the vMAGs with viral genomes from these databases on the basis of amino-acid identity (AAI). Protein-coding genes in all viral genomes were predicted and translated using pyrodigal-gv (v.0.3.1)37,54 (github.com/althonos/pyrodigal-gv). Pairwise AAI measurements were obtained by first creating a protein sequence database with MMseqs2 (v.15.6f452)55, followed by running mmseqs search with the protein database against itself. This was done with a minimum amino-acid sequence identity of 0% to retain all possible pairwise comparisons (mmseqs parameter --min-seq-id 0.0) and a minimum alignment coverage of 30% (mmseqs parameter -c 0.3). Pairwise AAI values were then computed from the search results and used to form approximate genus-level genome clusters using custom Python scripts. See Supplementary Methods for details.

vMAG and host MAG abundance, coverage estimation and presence/absence

To perform differential abundance and beta-diversity analyses of microbial communities from metagenomes, it is essential to use ‘species’ counts from a non-redundant set of taxa21. Filtered metagenome reads were mapped to the dereplicated, species-representative vMAGs using Bowtie2 v.2.4.5 with the ‘--sensitive’ parameter in ‘end-to-end’ mode. Read-mapping files were then sorted and indexed using SAMtools (v.1.17)56. Following community-established standards57, read-mapping files were filtered to remove reads with <90% identity using CoverM (v.0.6.1)58. CoverM was also used to generate three tables of species-representative vMAG abundance for each metagenome read sample, as is common in other viral community studies17,35,41: (1) absolute mapped read counts, (2) trimmed mean genome coverages (with the top and bottom 5% of covered bases removed) and (3) genome coverage fraction (also known as ‘breadth’). A minimum coverage fraction of 0 was used in generating each table.
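For readers unfamiliar with the two coverage statistics, the following sketch computes them from a list of per-base read depths for one genome in one sample. It illustrates the definitions only and is not CoverM's implementation; the function names are hypothetical, and details such as how the 5% cut is rounded are assumptions.

```python
def trimmed_mean_coverage(depths, trim=0.05):
    """Mean per-base depth after dropping the lowest and highest `trim` fraction of positions."""
    ordered = sorted(depths)
    k = int(len(ordered) * trim)  # number of positions removed from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def breadth(depths, min_depth=1):
    """Fraction of genome positions covered by at least `min_depth` reads."""
    return sum(d >= min_depth for d in depths) / len(depths)
```

Trimming makes the estimate robust to short regions with extreme depth, such as conserved genes that recruit reads from related genomes, whereas breadth indicates whether reads are spread along the genome rather than piled onto a few loci. Both functions assume a non-empty depth list.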

Host MAGs were dereplicated using dRep, with the only changes to default parameters being the use of ‘--ignoreGenomeQuality’ (since quality had already been assessed) and ‘--S_algorithm skani’ to use skani for genome comparisons. Filtered metagenome reads were mapped to the species-representative host MAGs, and abundance and coverage statistics were generated using the same tools and parameters applied to the vMAGs.

To assess the distribution of species-representative viral genomes across sample sites and EHSs, we used a minimum genome breadth of 50% to consider a viral genome as present in a given sample.

Statistical analyses of viral and host community composition

We assessed viral and host community composition across EHSs by calculating Bray–Curtis dissimilarities from normalized, species-representative genome coverage data, followed by PCoA. We restricted viral community analyses at the Langwell site to replicates with the longest post-restoration duration. A minimum genome breadth of 0.50 was used to filter genome abundances before analysis to avoid false positives. We tested for separation of samples by sampling site and by EHS using ANOSIM. To identify ecological drivers of viral community structure, we performed permutational multivariate analysis of variance (PerMANOVA) with site as a blocking factor, testing contributions from host community composition (via host PCoA axes), EHS and a continuous EHI. We confirmed homogeneity of dispersion before all PerMANOVA analyses. Variance partitioning and distance-based redundancy analyses (dbRDA) were used to quantify the relative contributions of host composition, site and EHS to viral community structure. See Supplementary Methods for more detail on statistical analyses of viral and host community composition.
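The breadth filter and the dissimilarity measure can be sketched as follows. This is an illustration under stated assumptions: genomes failing the breadth cutoff in a sample are assumed to be set to zero coverage in that sample, the function names are hypothetical, and the ordination and permutation tests themselves are not reproduced.

```python
def mask_by_breadth(coverage, breadth, min_breadth=0.5):
    """Zero a genome's coverage in a sample where its breadth is below the cutoff.

    `coverage` and `breadth` are equal-length lists, one entry per genome.
    """
    return [c if b >= min_breadth else 0.0 for c, b in zip(coverage, breadth)]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both samples have zero total abundance")
    return sum(abs(a - b) for a, b in zip(x, y)) / total
```

The value ranges from 0 for identical samples to 1 for samples that share no genomes. Computing it for every pair of samples yields the matrix used as input to PCoA.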

Viral and host genome differential abundance and EHS group assignment

The table of absolute mapped read counts for species-representative viral genomes, generated as described above, was used as the input for differential abundance analysis. Counts were not normalized beforehand, following software recommendations21. The genome count table was then split by sample site, and differential abundance analysis was performed separately for each site with the R package DESeq2 (v.1.44.0)21. EHS was included as a factor in the negative binomial generalized linear models fitted with DESeq2, and a likelihood ratio test was used to compare the full model (including EHS) with a reduced model (intercept only). P values from these tests were adjusted using the false-discovery rate (FDR) method, and viral genomes with an FDR-adjusted P value ≤0.05 were considered differentially abundant across EHSs at each site. This workflow was also applied to host genome counts to identify differentially abundant host genomes.
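The multiple-testing correction can be made concrete with a short sketch, assuming that the FDR adjustment refers to the Benjamini–Hochberg procedure. The function name is hypothetical, and missing P values (which DESeq2 can report for some genomes) are not handled.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    n = len(pvalues)
    # Visit P values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_abundant(genome_pvalues, alpha=0.05):
    """Names of genomes whose adjusted P value is at or below `alpha`."""
    names = list(genome_pvalues)
    adjusted = benjamini_hochberg([genome_pvalues[g] for g in names])
    return [g for g, q in zip(names, adjusted) if q <= alpha]
```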

To determine which of the differentially abundant viral and host genomes were enriched in soils corresponding to each EHS, we performed hierarchical clustering of their trimmed mean genome coverages in R, following similar approaches used in past viral community ecology studies that analyse abundance patterns across groups17,35. For each sample site, the same normalized trimmed mean genome coverages used in the community composition analyses were filtered to include only the differentially abundant viral/host genomes. These filtered trimmed mean coverages were converted into relative abundances (relative to the total abundance within each sample). Z-scores were calculated for the relative abundances of each viral/host genome in each sample, and the mean z-score was calculated for each genome across the different EHSs. Euclidean distances were calculated from the resulting mean z-scores, with ‘NA’ or missing values set to zero to maintain compatibility for clustering. The resulting viral and host distance matrices were hierarchically clustered using the R function ‘hclust’ with the ‘ward.D’ method. The cluster trees were cut into three groups, as there were three EHSs. The previously calculated mean z-scores for each viral/host genome in each EHS were plotted for the three groups (Extended Data Figs. 3 and 4). These plots for each sample site were then inspected to assign the viral/host genomes in each group to one of three ecosystem health trend groups (EHS groups): ‘Natural-enriched’, ‘Restored-enriched’ or ‘Damaged-enriched’, on the basis of their abundance patterns.
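The z-score summary that precedes clustering can be sketched for a single genome as follows. The function names, the use of the sample standard deviation and the lower-case EHS labels are assumptions; the Ward clustering and tree cutting performed in R are not reproduced here.

```python
from collections import defaultdict
from statistics import mean, stdev


def mean_z_by_ehs(rel_abund, ehs_labels):
    """Z-score one genome's relative abundances across samples, then average per EHS."""
    m, s = mean(rel_abund), stdev(rel_abund)
    # A genome with constant abundance has no defined z-score; use 0, as for missing values.
    z = [(v - m) / s if s > 0 else 0.0 for v in rel_abund]
    groups = defaultdict(list)
    for label, value in zip(ehs_labels, z):
        groups[label].append(value)
    return {label: mean(values) for label, values in groups.items()}


def euclidean(a, b, ehs_order=("natural", "damaged", "restored")):
    """Euclidean distance between two genomes' mean z-score profiles (missing EHS = 0)."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in ehs_order) ** 0.5
```

A genome whose mean z-score is highest in natural samples and lowest in damaged samples would, for example, fall into a group that is then inspected and labelled ‘Natural-enriched’.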

Host MAG metabolic function predictions

Putative metabolic functions encoded by host MAGs were predicted using METABOLIC (v.4.0)59. We focused on eight functions relevant to peatland soil ecosystems: oxidative phosphorylation, methanogenesis, fermentation, carbohydrate degradation, aromatics degradation, assimilatory sulfate reduction, dissimilatory sulfate reduction and thiosulfate oxidation. Function presence was inferred from KEGG module and functional annotations reported by METABOLIC-C. For full pathway definitions, the justification for their inclusion and KEGG module-level criteria, see Supplementary Methods.

Viral and host relative abundance across EHS groups

We quantified virus and host genome relative abundances across EHSs using normalized genome coverage data, filtered to retain differentially abundant genomes assigned to one of the three EHS groups. Relative abundances were compared across host class and groups of host genomes encoding the metabolic functions identified above, with enrichment of a metabolic function assessed by calculating relative abundance ratios of genomes encoding the function normalized to overall viral and host abundance in each EHS group. For full details on relative abundance calculations across EHS groups, see Supplementary Methods.

Total virus over total host abundance regressions

To analyse the relationship between the abundance of viruses and their hosts across EHSs, normalized trimmed mean genome coverages of vMAGs and host MAGs from all sample sites were filtered to include only predicted virus–host pairs and only values >0. In addition, we required that both viruses and hosts were assigned to the same EHS group within a given sample to ensure that abundance comparisons accurately reflected shared ecological contexts and trends, preventing potential bias from mismatched EHS dynamics. The remaining data were then summarized at the host phylum level to provide a broad overview of virus–host relationships for specific host lineages. Specifically, the total abundance of hosts within each phylum and their associated predicted viruses were calculated for each sample, site and EHS combination. Linear models of virus-to-host abundance relationships were fitted for each host phylum–EHS combination using the R function ‘lm’, adjusting the resulting P values using the Benjamini–Hochberg (BH) method.

Lysogenic virus abundance and statistical analysis of active viruses

To assess temperate (hereafter lysogenic) virus population dynamics across EHSs, we identified temperate phages using the classifications provided by ViWrap (see github.com/AnantharamanLab/ViWrap#notes and github.com/AnantharamanLab/vRhyme#interpreting-vrhyme-binsvmags-) and normalized their abundances by the total virus abundance per sample. Differences in normalized lysogenic virus abundance across EHSs and its relationship with a continuous EHI were evaluated using linear mixed-effects models. Viral replication activity was estimated by calculating virus-to-host abundance ratios, with active viruses defined as those exceeding a 10:1 ratio. See Supplementary Methods for more detail on our statistical analyses of lysogenic virus abundance.
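The two quantities defined above can be sketched as follows. The function names and input layouts are hypothetical, and ‘exceeding’ is interpreted as a strict inequality; the mixed-effects models are not reproduced.

```python
def lysogenic_fraction(virus_cov, is_temperate):
    """Share of total viral coverage in one sample contributed by temperate viruses.

    `virus_cov` maps virus name to coverage; `is_temperate` maps virus name to bool.
    """
    total = sum(virus_cov.values())
    if total == 0:
        return 0.0
    temperate = sum(c for v, c in virus_cov.items() if is_temperate.get(v, False))
    return temperate / total


def active_pairs(pairs, min_ratio=10.0):
    """Virus-host pairs whose virus:host coverage ratio exceeds `min_ratio`.

    `pairs` is an iterable of (virus, host, virus_cov, host_cov) tuples.
    """
    return [
        (virus, host, v_cov / h_cov)
        for virus, host, v_cov, h_cov in pairs
        if h_cov > 0 and v_cov / h_cov > min_ratio
    ]
```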

vMAG protein clustering

To assess the distribution of homologous proteins across EHSs, we clustered the translated amino acid sequences of all protein-coding viral genes, obtained as described previously (see ‘vMAG genome clustering with soil viral genome databases’). Protein clustering was performed with MMseqs2 v.15.6f452 using the ‘mmseqs cluster’ command and the parameters ‘--cluster-mode 0 --cov-mode 0 -s 7.5’ as well as ‘--min-seq-id 0.25 -c 0.5’ to enforce a minimum sequence identity of 25% and an alignment coverage of 50%, ensuring that alignments were representative of whole proteins rather than individual domains. The resulting protein cluster table was used to analyse the intersections of EHS membership for proteins within protein clusters as described below.
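One way to tabulate the intersections of EHS membership from a protein cluster table is sketched below. The input layout is an assumption: each protein is taken to carry the set of EHSs in which its vMAG was detected, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def ehs_intersections(protein_to_cluster, protein_to_ehs):
    """Count protein clusters by the combination of EHSs their members come from."""
    cluster_ehs = defaultdict(set)
    for protein, cluster in protein_to_cluster.items():
        cluster_ehs[cluster].update(protein_to_ehs.get(protein, ()))
    return Counter(frozenset(ehs) for ehs in cluster_ehs.values())
```

Each key of the returned counter is one intersection, for example clusters found only in natural soils or clusters shared by natural and restored soils, which is the form of input needed for an UpSet-style or Venn-style summary.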

vMAG protein functional annotations, AMG prediction and curation

An HMMsearch60 was performed on all vMAG-encoded amino acid sequences using profile HMMs from multiple databases, including PHROGs (release 2022-01-17x)22 and KEGG KOfam (March 2019 release)24. See Supplementary Methods for a full list of databases used, their versions and details on HMM searches. To identify putative AMGs encoded by vMAGs, we employed a conservative approach that utilized functional annotations and genomic context statistics to avoid false-positive predictions, following community standards61. Briefly, we searched protein functional annotations for metabolic functions, removed proteins with functions that are commonly misannotated as AMGs17,61, and removed likely non-viral protein contamination. For more detail, see Supplementary Methods. Although AMG functional assignments were available from multiple databases, only the remaining filtered and curated AMGs with KEGG KOfam annotations were retained for analysis of intersections across EHSs. This decision was made to simplify visualization and because KEGG KOfams encompass broader functional categories.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.