Abstract
P2-like viruses (P2Vs), temperate phages of the Peduoviridae family, possess a lysogenic lifestyle that renders traditional isolation unsuitable, hindering our understanding of their diversity and ecological roles. Using profile hidden Markov models to analyze public databases, this study achieved four major advances: (1) identification of 5945 P2V genomes and construction of the P2V Genome Dataset (P2VGD), expanding known genomes 48-fold; (2) phylogenetic classification into 13 superclades and 4671 genus-level clusters, a 106-fold diversity increase; (3) identification of 757 auxiliary metabolic genes with biome-specific metatranscriptomic activities: human gut P2Vs encode antibiotic efflux and cell-wall remodeling enzymes, while oligotrophic marine P2Vs express glycan-scavenging enzymes; (4) demonstration of their widespread distribution across diverse biomes, dominated by host-associated environments but extending to under-sampled natural ecosystems. These findings greatly expand the genomic and functional understanding of P2Vs, highlighting their biological strategies in host adaptation, virus-host co-evolution, and microbiome dynamics.
Introduction
Viruses, recognized as the most abundant biological entities in the global oceans, are considered important drivers of marine ecosystems1,2,3. Within these viral communities, bacteriophages constitute the dominant component and significantly shape microbial community structure, functional dynamics, and global biogeochemical processes3,4,5. Viruses can influence host diversity, evolution, and metabolism through multiple mechanisms, including horizontal gene transfer (HGT) among viral genomes and the expression of virus-encoded auxiliary metabolic genes (AMGs) that modulate host nutrient metabolism, transport, motility, and biofilm formation6. Temperate phages, such as P2-like viruses (P2Vs), can integrate into bacterial chromosomes during lysogeny7. Bacteriophage P2 is a temperate model phage that was first isolated from Escherichia coli in 19518. Phages sharing key genomic and phylogenetic features with bacteriophage P2, including a conserved set of hallmark genes, similar genome organization, and a contractile tail sheath, are collectively referred to as P2-like viruses (P2Vs)9, and have been reported to infect diverse hosts such as Serratia, Klebsiella, and Yersinia10. In 2021, the International Committee on Taxonomy of Viruses (ICTV) created a new family for P2Vs, named Peduoviridae, separating P2 viruses from the former family Myoviridae within the class Caudoviricetes. P2Vs can enhance host fitness during lysogeny, transfer genes between strains and species11, and act as natural HGT vectors for antibiotic resistance and virulence genes12.
Currently, our knowledge of P2Vs derives mainly from isolates (123 complete genomes deposited in the NCBI database) obtained from clinical and terrestrial environments. However, the adaptation mechanisms, host-phage coevolution patterns, and metabolic reprogramming capabilities of P2Vs across diverse microbiomes (such as the human gut and environmental habitats) remain poorly understood. Only recently was one marine P2V strain confirmed to regulate host biogeochemical cycling13. Traditional plaque-based methods are biased against the isolation of temperate phages, as lysogenic phages frequently fail to form clear plaques, which hampers the characterization of marine P2Vs. Consequently, the genomic diversity, distribution, and evolutionary dynamics of P2Vs on a global scale remain largely unexplored. Over the past decades, culture-independent approaches (e.g., metagenomics) have greatly expanded the known genomic diversity of viruses worldwide14,15,16,17. Based on these viral genomes, the genomic diversity of several important viral groups has been significantly expanded (e.g., Inoviridae from 56 to 10,295 genomes; nucleocytoplasmic large DNA viruses, NCLDVs, from 205 to 2074 genomes)18,19.
Here we systematically mined publicly available viral genome datasets to expand the known diversity of P2-like viruses (P2Vs). A total of 5945 P2V genomes were identified from public datasets, including GenBank and the Integrated Microbial Genomes with Viruses database (IMG/VR v4), expanding the number of P2V genomes by about 48-fold. This work provides a comprehensive database of P2-like viral genomes and a robust baseline for further investigation of their potential roles in multiple hosts and habitats.
Results
Expansion and global distribution of P2Vs
To construct the P2V Genome Dataset (P2VGD), sequences were extracted from several public datasets, including GenBank and the IMG/VR v4 database20. This process yielded 5945 P2V genomes: 123 viral isolates; 4853 HQ-UViGs (high-quality uncultivated viral genomes, completeness >80% and contamination <5%, including 4315 IMG/VR entries annotated as “Provirus” in the public metadata); 57 genome-integrated P2-like proviruses identified from bacterial chromosomes; and 912 genome fragments that did not meet the HQ-UViG criteria. Together, these represent a 48-fold expansion relative to the 123 isolate genomes. The genome sizes of P2Vs showed a relatively constrained distribution, with high-quality genomes enriched around ~35–40 kb (Supplementary Fig. 1A). GC content spanned a broad range (~35–70%) and displayed multiple density peaks across genome types (Supplementary Fig. 1B), suggesting the presence of distinct genomic groups within P2Vs. In addition, genome size differed among isolates, proviruses and UViGs, with proviruses showing a wider spread and occasional larger genomes (Supplementary Fig. 1C). The average amino acid length of predicted proteins also varied across genome types (Supplementary Fig. 1D). Importantly, the distributions of both genome size and GC content were highly consistent between isolated P2Vs and mined P2Vs, supporting the robustness of our genome mining strategy21.
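The composition of the dataset and the fold expansion quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the dictionary keys are illustrative labels, not identifiers from the study.

```python
# Genome counts per category, as quoted in the text above.
composition = {
    "isolates": 123,
    "hq_uvigs": 4853,
    "integrated_proviruses": 57,
    "fragments_below_hq": 912,
}

# Total size of the dataset and expansion relative to the isolate genomes.
total = sum(composition.values())
fold_expansion = total / composition["isolates"]

print(total)                     # 5945
print(round(fold_expansion, 1))  # 48.3
```

The four categories sum to the reported 5945 genomes, and the ratio to the 123 isolates gives the roughly 48-fold figure.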
Building on previous reports and incorporating the genomes identified in this study, P2Vs were found to be widely distributed across geographical locations and ecosystems, including host-associated, aquatic, terrestrial, engineered, and air environments (Fig. 1A and Supplementary Data 1). Within the P2VGD, host-associated environments contained the majority of the P2V genomes detected (genome count = 2389) (Fig. 1B), and most of these were human-associated (genome count = 1240). In addition, a substantial fraction of the P2Vs was found in engineered (genome count = 574), terrestrial (genome count = 331), aquatic (genome count = 303), and air (genome count = 7) environments, emphasizing the habitat diversity of P2Vs. P2V genomes were also detected in marine environments (genome count = 113) (Fig. 1C). Recruitment analysis showed that both marine and host-associated habitats had a high normalized abundance of P2Vs, even though host-associated environments contributed the majority of the genomes detected (n = 2389) (Supplementary Figs. 2 and 3).
A Global map showing the sampling locations of P2Vs, with symbol shape and color indicating ecosystem type and symbol size reflecting contig abundance. B Proportional distribution of P2Vs across five ecosystem categories. C Sub-environmental classification of P2Vs, illustrating their ecological breadth across both host-associated and environmental niches.
In addition to the original hosts in which the P2Vs were integrated, the host range of P2Vs was further quantitatively assessed by clustered regularly interspaced short palindromic repeat (CRISPR) spacer matching. Through this analysis, 4921 P2Vs were successfully matched to their putative hosts, accounting for 83% of the total P2V genomes. This CRISPR-based host assignment rate is comparatively high relative to many viral metagenomic surveys, which prompted further consideration of the biological features of P2Vs underlying this pattern. The majority of P2Vs were linked to members of the phylum Pseudomonadota (n = 4906), with only one P2V linked to the family Mycobacteriaceae within the phylum Actinomycetota (Fig. 2 and Supplementary Data 2).
P2Vs are grouped by host family and corresponding order (colored blocks). Panels show: (1) virus count per host family (log scale); (2) genome length distribution (log₁₀ scale); (3) virus quality classification; (4) ecosystem source; and (5) host prediction method. Dominant host taxa include Enterobacteriaceae and Burkholderiaceae, with substantial variation in genome completeness, length, ecological origin, and prediction approach. Box plots show the median (center line), interquartile range (box), and 1.5× IQR whiskers; points represent individual observations. Box-plot elements are defined as above for all box plots in this study.
Genomic characterization of P2Vs
To evaluate the global diversity of P2Vs, 123 publicly available P2V genomes were recovered and analyzed, resulting in the identification of seven core viral hallmark proteins: capsid completion/stabilization protein (P2-gpL, PeduoPF0001), terminase small subunit (P2-gpM, PeduoPF0002), major capsid protein (P2-gpN, PeduoPF0003), capsid scaffolding protein (P2-gpO, PeduoPF0004), terminase large subunit (P2-gpP, PeduoPF0005), portal protein (P2-gpQ, PeduoPF0006), and an additional P2-like viral hallmark protein, the tail tape measure protein (P2-gpT, PeduoPF0007). This finding not only updates the original list of six viral hallmark proteins proposed by the ICTV for the family Peduoviridae, but also sets more precise criteria for the identification and classification of P2Vs.
Classification and phylogeny of P2Vs
Genome-based network and clustering analyses were conducted to examine the taxonomic diversity of P2-like viruses (P2Vs). A total of 5945 P2Vs were grouped into 169 viral clusters (VCs), which were further organized into 13 provisional superclades (SpCs) (Fig. 3A and Supplementary Data 1). The largest superclade, SpC_3, included 25 VCs and 1341 P2V genomes, comprising 11 previously isolated P2Vs and 1330 high-quality uncultivated viral genomes (HQ-UViGs) identified in this study (Fig. 4A and Supplementary Data 2). SpC_3 exhibited strong host specificity, with all members predicted to infect Enterobacterales, the dominant known host lineage for cultured P2Vs. In contrast, SpC_1 displayed markedly broader host diversity, spanning the Pseudomonadales and Enterobacterales, with the Enterobacterales hosts encompassing a large number of distinct families, such as Vibrionaceae, Pasteurellaceae, Aeromonadaceae, Alteromonadaceae, and Shewanellaceae (Fig. 4B and Supplementary Data 2). Additionally, SpC_6, SpC_9, and SpC_13 exhibited distinct patterns of host specificity similar to that of SpC_3.
A The genome-content-based network of P2Vs, where nodes represent virus clusters (VCs) and different colors represent different superclades (SpCs). B The number of species and genera in each SpC, analyzed using the VIRIDIC tool, reflecting the diversity of viruses at the species and genus level.
Panel A (SpC_3) and Panel B (SpC_1) show the genome similarity networks of P2V viral clusters. Each node represents either a viral cluster (VC) or an outlier P2V genome not assigned to any cluster (unlabeled nodes). Node size is proportional to the number of P2V genomes in the cluster. The colored outer ring of each node indicates the proportion of predicted host taxa at the bacterial family level. Edges represent pairwise genome similarity between VCs and outliers based on shared gene content. Host families are color-coded as shown in the legend.
The ICTV currently classifies the 123 isolated P2Vs in Peduoviridae into 44 genera, based on clustering of inter-genomic relationships with VIRIDIC (Supplementary Data 3). In this study, the VCs generated in each genome-content-based network were clustered using the same approach to provide a taxonomic view of the 5945 P2Vs. Using VIRIDIC, a total of 4671 genus-level units were generated (Fig. 3B), increasing the number of P2V genera about 106-fold. This underscores the limitations of a P2V classification system based solely on isolated strains.
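Genus-level grouping from pairwise intergenomic similarities can be illustrated with a small sketch. It assumes the commonly used 70% genus-level cutoff (the text does not state the threshold applied in this study) and uses single-linkage grouping as a simplification of VIRIDIC's own clustering; the genome names and similarity values are hypothetical.

```python
from itertools import combinations

GENUS_THRESHOLD = 70.0  # assumed genus-level cutoff (% intergenomic similarity)


def cluster_by_similarity(genomes, similarity, threshold=GENUS_THRESHOLD):
    """Single-linkage clusters: genomes are joined whenever a pair meets the threshold."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # Missing pairs are treated as having no detectable similarity.
        sim = similarity.get((a, b), similarity.get((b, a), 0.0))
        if sim >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy input, for illustration only.
genomes = ["g1", "g2", "g3", "g4"]
similarity = {
    ("g1", "g2"): 82.5,
    ("g2", "g3"): 71.0,
    ("g1", "g3"): 64.0,
    ("g3", "g4"): 12.0,
}
genera = cluster_by_similarity(genomes, similarity)
print(genera)  # [['g1', 'g2', 'g3'], ['g4']]

# Fold increase in genus-level units quoted in the text: 4671 versus 44 genera.
print(round(4671 / 44))  # 106
```

Note that single linkage places g1 and g3 in the same unit through g2 even though their direct similarity is below the cutoff; a stricter linkage rule would split them.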
Phylogenetic analysis of P2Vs
A phylogenetic tree based on the concatenation of the seven core hallmark proteins of the identified P2Vs was constructed. It expanded the diversity of Peduoviridae from 123 to 5945 viral genomes, a 48-fold increase (Fig. 5A). The superclades (SpCs) identified from the VC–VC similarity network (Fig. 3A) were generally consistent with the major clades observed in the genome-based phylogenetic tree (Fig. 5A, Supplementary Fig. 4), although some SpCs showed partial overlap.
A Maximum-likelihood phylogenetic tree of 5,945 P2Vs constructed from concatenated hallmark proteins using IQ-TREE2 (model: Q.pfam+F+R10). From outer to inner, concentric rings represent: superclades (SpCs), environmental origin and predicted host order. Node styles denote sequence types: isolates (green dots), proviruses (red lines), and UViGs (navy lines). B Co-variation analysis of hallmark proteins at the amino acid level using mfDCA. Protein pairs in the top 1% of co-variation scores are shown; those exceeding control thresholds are considered significantly co-variable. Negative controls: Lambdavirus integrase–terminase; positive controls: portal–terminase. Statistical significance was assessed by t-test.
The linkages within and between the viral hallmark proteins of Peduoviridae have not been well characterized. A detailed analysis of the phylogenetic coherence among the seven hallmark proteins of P2Vs showed that, although these genes may follow different evolutionary paths in different P2Vs, they still exhibit a high degree of phylogenetic consistency (Supplementary Fig. 5). Calculation of the single Reconstruction Fidelity Values (sRFVs) between pairs of core hallmark genes showed that all gene pairs had sRFVs lower than 0.6 (Supplementary Fig. 6A). Further analysis of the co-evolutionary trees uncovered differences in topological similarity between core hallmark gene pairs. Notably, the P2-gpL and P2-gpP gene pair had the lowest topological similarity (sRFV = 0.567), while the P2-gpM and P2-gpN gene pair displayed the highest topological similarity (sRFV = 0.09) (Supplementary Figs. 6B, C). Furthermore, recombination analysis using the PHI test revealed no significant recombination signals (P > 0.05) in the analyzed markers, supporting the conclusion that the phylogenetic reconstruction was not confounded by recombination events.
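The text does not define how sRFVs are computed. As an illustration of a tree-to-tree topological distance that behaves the same way (0 for identical topologies, larger values for more discordant ones), the sketch below computes a normalized, clade-based Robinson-Foulds distance between two rooted trees. This is an assumed stand-in, not the study's metric; the nested-tuple tree encoding and the example trees are hypothetical.

```python
def clades(tree):
    """Return the leaf sets (clades) under every internal node of a rooted tree.

    Trees are nested tuples of leaf names, e.g. (("a", "b"), ("c", "d")).
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def normalized_rf(tree_a, tree_b):
    """Fraction of non-root clades that are present in only one of the two trees."""
    clades_a, clades_b = clades(tree_a), clades(tree_b)
    root_a, root_b = max(clades_a, key=len), max(clades_b, key=len)
    if root_a != root_b:
        raise ValueError("trees must share the same set of leaves")
    # The root clade contains every leaf and is shared trivially, so drop it.
    clades_a, clades_b = clades_a - {root_a}, clades_b - {root_b}
    denominator = len(clades_a) + len(clades_b)
    return len(clades_a ^ clades_b) / denominator if denominator else 0.0


# Hypothetical gene trees over the same five genomes.
tree_1 = ((("a", "b"), "c"), ("d", "e"))
tree_2 = ((("a", "c"), "b"), ("d", "e"))

print(normalized_rf(tree_1, tree_1))            # 0.0
print(round(normalized_rf(tree_1, tree_2), 3))  # 0.333
```

Under this kind of metric, the value of 0.09 reported for P2-gpM and P2-gpN would correspond to nearly congruent trees, and 0.567 for P2-gpL and P2-gpP to the most discordant pair.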
A mean-field direct coupling analysis (mfDCA) algorithm22 was used to investigate co-variation constraints among the core hallmark proteins of P2-like phages (P2Vs). The analysis identified significant amino acid-level co-variation between the terminase large subunit (P2-gpP) and the portal protein (P2-gpQ), as well as between the major capsid protein (P2-gpN) and the capsid scaffolding protein (P2-gpO), suggesting structural and functional coupling between these protein pairs. Specifically, the co-evolution of P2-gpP and P2-gpQ implies that they may evolve together to maintain the functional integrity of the genome packaging machinery in P2Vs. This is consistent with their spatial synergy in the virion structure, where gpP and gpQ mediate the viral genome packaging process. Similarly, P2-gpN and P2-gpO also show co-evolution, which supports their collaborative roles in capsid assembly. The median mfDCA values for these pairs surpassed both randomized permutations and non-interacting controls (Fig. 5B).
Putative AMGs of P2Vs
Bacteriophages can obtain AMGs from their hosts and reprogram the host’s metabolism by expressing AMGs during infection5. However, the metabolic potential of P2Vs remains poorly understood. In this study, a systematic annotation and screening of AMGs across the 5945 P2V genomes was performed to provide a comprehensive view of P2V-mediated host metabolic reprogramming. In total, 757 putative AMGs were identified, of which 655 could be assigned to functional modules based on DRAM-v categorization (Supplementary Data 4). These AMGs were classified into five primary modules: organic nitrogen, carbon utilization, transporters, energy, and miscellaneous (MISC) (Fig. 6). Given the large and uneven number of genomes screened across habitats and host groups, AMG counts should be interpreted relative to the underlying number of P2V genomes analyzed. When normalized to the number of genomes screened, P2Vs from host-associated, plant-associated, and terrestrial environments exhibited relatively higher AMG densities. Similarly, P2Vs infecting Enterobacteriaceae showed an enrichment of AMGs. Among the identified AMGs, genes involved in methionine degradation and cysteine biosynthesis were particularly frequent.
Bubble heatmap showing the distribution of P2Vs encoding AMGs across different functional categories (DRAM-v) and stratified by sequence types, derived environments, and predicted host taxa (order level). Bubble size indicates the log₂ number of contigs; bubble color corresponds to functional modules.
These five modules were further subdivided into 35 functional modules covering a wide range of aspects from amino acid metabolism to carbohydrate processing. Specific AMGs involved in nucleic acid modification, such as DNA (cytosine-5)-methyltransferase (e.g., dcm) and rRNA N6-adenosine-methyltransferase (mettl5), can help P2Vs evade host immunity and enhance replication by regulating gene expression and stability23. Regarding carbohydrate metabolism, genomic screening identified a specific repertoire of enzymes targeting energy generation and structural remodeling rather than a broad, non-specific collection, such as endorhamnosidase, oligogalacturonate lyase/oligogalacturonide lyase, sialidase or neuraminidase and β-1,4-mannosyl-glycoprotein β-1,4-N-acetylglucosaminyltransferase (Supplementary Data 4). Energy-generating enzymes, such as 6-hydroxyhexanoate dehydrogenase and pyruvate kinase, may support viral production by generating energy in the hosts24. Furthermore, structural remodeling enzymes involved in host cell wall or surface structure modifications were detected, specifically those categorized under carbon utilization modules. Additionally, proteins associated with the cobalamin transport system (BtuC permease; KO: K25027) and a putative ABC transport system (antimicrobial peptide resistance transporter permease, YknZ/YvrN family; KO: K24948) suggest that P2Vs may modulate host membrane transport and nutrient uptake25. Other AMGs, including antimicrobial peptide resistance transport proteins, and enzymes related to carbohydrate metabolism, may be involved in host recognition, resistance to antimicrobial stress, and regulation of energy metabolism, thus providing the energy necessary for P2Vs replication26.
An analysis of the transcriptional activity distribution of potential AMGs within the 189 human gut metatranscriptomic samples (Supplementary Fig. 7) was conducted. In the carbon utilization category, genes encoding L-Ala-D/L-Glu epimerase have demonstrated significant transcriptional activity. Regarding organic nitrogen metabolism, the gene encoding Guanidinoacetic acid N-methyltransferase (GAMT) exhibited high transcriptional activity. GAMT is a key enzyme in amino acid metabolism, facilitating nitrogen turnover27. This observation aligns with recent global studies on viral-encoded AMGs, which highlight the importance of such enzymes in viral-host interactions28. In the transport systems category, two genes associated with antibiotic resistance, macB and efrA, had high transcriptional abundance29,30.
The distribution of potential AMG transcriptional activity in the metatranscriptome samples from the TARA Ocean project (Supplementary Fig. 8) was analyzed. Within the carbon utilization category, transcripts corresponding to mannosylglycoprotein endo-β-mannosidase were consistently detected across multiple samples. This enzyme is a glycoside hydrolase that cleaves the Manβ1–4GlcNAc linkage in the core structure of N-linked glycoproteins and is involved in the degradation of mannose-containing glycans in cellular metabolic pathways31. Its recurrent transcription suggests that P2V-encoded AMGs may contribute to host carbohydrate metabolism during infection, potentially facilitating the utilization of complex glycan-derived carbon sources.
Discussion
Ecological expansion of P2Vs across diverse environments
While P2Vs in intestinal environments have been well-documented32, this analysis substantially expands their known ecological range (Fig. 1A). The numerical dominance of P2Vs in host-associated ecosystems (genome count = 2389, particularly within the human gut) points to a close association with mammalian microbiomes. However, the observed distribution is strongly influenced by sampling bias in public resources. The number of datasets containing P2V signals varies markedly among habitat categories: host-associated datasets represent the largest share (n = 1825 positive datasets), followed by engineered (n = 470), terrestrial (n = 258), and aquatic (n = 235) environments (Supplementary Data 1). The apparent dominance of P2Vs in host-associated ecosystems therefore likely reflects, at least in part, greater sampling depth in this category (Fig. 1B). Diversity density (calculated as genome counts per P2V-positive dataset) was used to account for uneven sampling effort among habitat categories. This normalized metric remained consistent across major environments, with values of 1.31, 1.28, and 1.20 for host-associated, terrestrial, and aquatic ecosystems, respectively, indicating broadly similar P2V richness per positive dataset. Consequently, the lower absolute counts observed in aquatic and terrestrial habitats likely reflect their underrepresentation in current databases rather than true ecological scarcity. Importantly, while host-associated microbiomes serve as the primary reservoir for known P2Vs, their verified detection in terrestrial, atmospheric, and marine environments (n = 113) indicates that P2Vs possess a highly versatile biological blueprint capable of adaptation across vastly different environmental stresses.
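The diversity density defined above (genome count divided by the number of P2V-positive datasets) can be recomputed from the counts quoted in the Results and in this paragraph. The sketch below is only that arithmetic; the dictionary names are illustrative. With these counts the aquatic ratio comes to roughly 1.29 rather than the 1.20 quoted, so the reported value may rest on a slightly different tally than the numbers given in the text.

```python
# Genome counts per habitat (from the Results) and P2V-positive dataset counts
# (from this paragraph).
genome_counts = {
    "host-associated": 2389,
    "engineered": 574,
    "terrestrial": 331,
    "aquatic": 303,
}
positive_datasets = {
    "host-associated": 1825,
    "engineered": 470,
    "terrestrial": 258,
    "aquatic": 235,
}

# Diversity density: genomes recovered per P2V-positive dataset.
density = {
    habitat: genome_counts[habitat] / positive_datasets[habitat]
    for habitat in genome_counts
}

for habitat, value in density.items():
    print(f"{habitat}: {value:.2f}")
# host-associated: 1.31
# engineered: 1.22
# terrestrial: 1.28
# aquatic: 1.29
```

All four ratios fall within a narrow band of about 1.2 to 1.3 genomes per positive dataset, which is the basis for the argument that absolute counts mainly track sampling effort.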
Although fewer P2V genomes were recovered from terrestrial and air environments (Fig. 1C), this likely reflects the scarcity of available datasets rather than a biological absence. Their detection in these niches may point to cross-environmental transmission, for example via aerosols. Importantly, the identification of 113 marine-derived genomes, supported by recruitment plots and metatranscriptomics, indicates that the ocean represents a significant, yet historically overlooked, reservoir of P2Vs (Supplementary Figs. 2, 3, 8). The 48-fold expansion captured in the P2VGD provides a critical counterweight to historical data skews, broadening the understanding of Peduoviridae beyond host-associated ecosystems.
Host specificity and evolutionary diversification of P2Vs
CRISPR spacer matching yielded an unusually high host assignment rate for P2Vs compared with many viral metagenomic surveys; however, previous large-scale analyses have shown that although CRISPR-based links are highly specific, they typically capture only a subset of viral populations due to uneven CRISPR–Cas distribution and spacer acquisition dynamics17. The elevated assignment rate observed here is likely associated with the predominantly temperate lifestyle of P2Vs, as temperate phages persist as prophages and engage in prolonged virus–host coexistence, increasing the likelihood of being recorded by host CRISPR arrays33.
In terms of host associations, the clustering of SpC_3 exclusively with Enterobacterales and the broader host range of SpC_1 suggest divergent evolutionary strategies in host adaptation. The highly specialized narrow-host range of SpC_3 implies strict receptor constraints and a deep evolutionary lock-in with Enterobacterales, optimizing infection efficiency within specific niches like the animal gut. In contrast, the broad-host promiscuity of SpC_1 highlights a flexible biological strategy, likely facilitated by highly adaptive tail fiber proteins, allowing them to cross species barriers and exploit diverse bacterial taxa.
These evolutionary patterns are further illuminated by phylogenetic comparisons (Fig. 5A). P2Vs from the five habitat categories were broadly distributed across the branches of the phylogenetic tree and intermingled with one another. This not only reflects the survival capabilities of P2Vs in diverse environments but also suggests their potential ecological roles in different habitats23. In the genome-based phylogenetic tree, most P2Vs were associated with hosts in the order Enterobacterales. This pattern likely reflects the over-representation of Enterobacterales genomes and metagenomes in public databases, especially host-associated datasets, rather than an intrinsic dominance of Enterobacterales-infecting P2Vs.
Further support for evolutionary adaptation is provided by co-evolutionary signals detected among core structural genes (Fig. 5B). The high mfDCA scores between interacting pairs, such as the terminase-portal (P2-gpP/Q) and capsid-scaffold (P2-gpN/O) complexes, suggest a model of modular protein co-adaptation. Biologically, this suggests that compensatory mutations are selected to maintain the structural integrity of the protein-protein interfaces required for essential functions, specifically the genome packaging motor and capsid assembly34,35. In the context of rapid viral evolution, such co-evolutionary constraints act as a mechanism of evolutionary robustness, ensuring that the stability of critical molecular machines is preserved even as individual gene sequences diverge to adapt to new ecological niches.
Metabolic reprogramming potentials via AMGs
Bacteriophages heavily rely on the metabolic machinery of active host cells for lytic replication. This dependency has led to the prevailing hypothesis that phage-encoded auxiliary metabolic genes (AMGs) serve as metabolic accelerators—reprogramming host metabolism to overcome physiological bottlenecks or nutrient depletion, thus favoring phage replication23. Global metagenomic surveys indicate that this strategy is widespread, with ~19% of marine virus populations carrying AMGs that target critical “hot spots” in microbial metabolism, such as nucleotide, amino acid, and carbohydrate pathways28. Notably, for phages with compact genomes, carrying non-essential genes incurs high fitness costs, suggesting strong selective retention only for AMGs conferring essential advantages36.
Previously, a systematic evaluation of P2Vs’ capacity for host metabolic reprogramming was lacking. To address this gap, we performed global annotation and screening across the entire P2V spectrum (5945 genomes) and identified 757 putative AMGs. Unlike the broad targeting seen in global ocean viromes, P2V-encoded AMGs show specific enrichment in organic nitrogen, carbon utilization, membrane transport (including BtuC permease, KO: K25027; and a putative ABC permease, KO: K24948), and energy metabolism modules (Fig. 6). Beyond these module-level patterns, we identified representative AMGs that may expand host resource acquisition and stress-adaptation capacity, including enzymes involved in aromatic compound catabolism (e.g., 3-methylcatechol/catechol 2,3-dioxygenase), iron-scavenging siderophore biosynthesis (e.g., petrobactin synthase), and secondary metabolite/toxin-associated pathways (e.g., dihydrophenazinedicarboxylate synthase and toxoflavin synthase). Together, these functions suggest that P2Vs may modulate host metabolic potential and ecological fitness, potentially benefiting viral proliferation under nutrient limitation or competitive conditions.
Beyond amino acid metabolism, P2Vs may modulate carbohydrate metabolism through a specific “dual strategy”: regulating central carbon flow to fuel viral production and expanding the host’s substrate range to adapt to environmental constraints. First, regarding energy production, the identification of pyruvate kinase—a rate-limiting enzyme in glycolysis that catalyzes the final step of ATP generation—suggests that P2Vs actively accelerate host glycolysis. Along with 6-hydroxyhexanoate dehydrogenase, these enzymes likely ensure a sufficient ATP supply required for the energetically costly processes of viral capsid assembly and DNA packaging. Second, P2Vs appear to employ habitat-dependent strategies to scavenge complex nutrients or remodel host structures. In marine environments, the recurrent transcription of mannosylglycoprotein endo-β-mannosidase highlights a specialized scavenging mechanism. This enzyme degrades N-linked glycoproteins often found in dissolved organic matter (DOM), empowering hosts to utilize recalcitrant glycan-derived carbon sources—a critical survival advantage in oligotrophic marine waters. Conversely, in the human gut, the high transcriptional activity of L-Ala-D/L-Glu epimerase points to involvement in peptidoglycan metabolism. By converting amino acids used in cell wall synthesis, P2Vs may remodel the host cell wall to facilitate viral assembly or modify the cell surface to influence host recognition and superinfection exclusion. This biome-specific AMG expression highlights a remarkable biological flexibility: while gut P2Vs reprogram hosts to survive antimicrobial stress, marine P2Vs reprogram hosts to overcome carbon limitation. Together, these functional traits demonstrate that P2Vs do not merely parasitize their hosts, but actively tailor their host’s metabolic and defensive arsenals to match the specific ecological demands of their shared habitat.
Furthermore, functional integration extends to transport and resistance mechanisms, as evidenced by transcriptional activity in human gut metatranscriptomes. Putative AMGs such as macB and methyltransferases showed active transcription in human gut metatranscriptomes, pointing to real-time modulation of host physiology (Supplementary Fig. 7). This modulation has significant implications for antimicrobial resistance. For instance, macB, a member of the ABC transporter family, is known to efflux macrolide antibiotics (e.g., erythromycin) in Escherichia coli to reduce intracellular drug concentrations37. Both macB and efrA exhibited high transcriptional abundance in gut samples, suggesting that P2Vs may act as reservoirs for resistance traits, actively conferring survival advantages to their hosts in antibiotic-laden environments, such as the human gut. The activity of AMG-encoded proteins detected in both human gut and marine metatranscriptomes demonstrates functional versatility in P2Vs (Supplementary Figs. 7, 8). Their potential involvement in nitrogen and carbon cycling, host physiological modulation, and microbial community structuring points to broader roles in shaping ecosystem dynamics. These findings provide a strong rationale for further investigation into P2Vs as active participants in biogeochemical feedback loops and microbial ecological networks.
Compared to obligately lytic phage groups (such as marine Autographiviridae or cyanophages), which frequently encode broadly acting AMGs for rapid resource extraction and immediate energy generation during cell lysis, the temperate P2Vs exhibit a distinctly different biological strategy. Because P2Vs are capable of long-term lysogeny (as evidenced by the 4372 provirus-associated contigs identified in our dataset), their metabolic reprogramming is highly targeted towards long-term host survival and competitive fitness. Rather than merely accelerating host metabolism for an immediate viral burst, P2V-encoded AMGs—such as the aforementioned antimicrobial efflux pumps (macB) in the gut or specific glycan-scavenging enzymes in the ocean—function to sustain the host lineage, thereby ensuring the stable vertical transmission of the integrated prophage.
Conclusion
In this study, we constructed a P2V genome dataset (P2VGD) comprising 5945 P2V genomes, a 48-fold increase in the number of known P2V genomes and a significant expansion in their known distribution, which is predominantly in host-associated microbiomes but extends globally. The 5945 P2V genomes were grouped into 13 superclades (SpCs) using genome-content-based networks, and the majority exhibited significant host specificity. Based on taxonomic evaluations, we propose the classification of P2Vs into 4671 candidate genera. Additionally, a total of 757 putative AMGs were identified. Importantly, metatranscriptomic data revealed that P2Vs deploy biome-specific metabolic reprogramming: conferring antimicrobial resistance and cell-wall remodeling in the human gut, while facilitating complex glycan scavenging in the ocean. In conclusion, this study not only provides a comprehensive P2V genome resource but also offers biological insights into how temperate phages evolve, adapt, and manipulate their hosts across diverse global ecosystems.
Materials and methods
Metagenomic retrieval of P2-like high-quality uncultured viral genomes (UViGs)
A total of 123 complete P2-like viral genomes were downloaded from GenBank (download date: 7/2024) as the reference sequence dataset (Supplementary Data 3). The open reading frames (ORFs) of these genomes were predicted by Prodigal (v.2.6.3)38. ORFs shorter than 50 amino acids were removed, and the remaining ORFs were subjected to an all-to-all BLASTp search with Diamond (v.2.0.10)39 using the following parameters: an E-value threshold of 1e−10, a minimum query coverage of 50%, a maximum of 1 high-scoring pair per hit, and a maximum of 10,000 reported targets. Protein sequences were clustered into orthologous groups using the OrthoMCL software (v2.0.9)40, rather than a precompiled database, to define P2-like viral protein families (PeduoPFs). Briefly, all-versus-all BLASTP results were used as input, and protein families were inferred using the Markov clustering algorithm implemented in OrthoMCL (inflation factor = 1.5, default parameters). PeduoPFs that were single-copy and present in ~95% of P2V genomes were considered viral hallmark proteins of P2Vs. Protein sequences corresponding to each viral hallmark protein family were first aligned using Muscle5 with default parameters41, and the resulting multiple sequence alignments were then used to construct profile hidden Markov models (HMMs) using hmmbuild with default parameters42.
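The hallmark selection rule described above can be illustrated with a minimal Python sketch. It assumes one reading of the criterion, namely that a family qualifies when it occurs as exactly one copy in at least 95% of the reference genomes; the function name, the input layout, and the exact cutoff are illustrative assumptions and not the authors' code.

```python
from collections import Counter, defaultdict


def select_hallmark_families(memberships, n_genomes, min_prevalence=0.95):
    """Return protein families that are single-copy in >= min_prevalence of genomes.

    memberships: iterable of (family_id, genome_id) pairs, one pair per protein.
    n_genomes:   total number of reference genomes (123 in the text above).
    NOTE: the 0.95 cutoff mirrors the "~95%" in the text; treating it as an
    inclusive threshold on single-copy presence is an assumption.
    """
    copies = defaultdict(Counter)
    for family, genome in memberships:
        copies[family][genome] += 1  # count copies of each family per genome

    hallmarks = []
    for family, per_genome in copies.items():
        single_copy = [g for g, n in per_genome.items() if n == 1]
        if len(single_copy) / n_genomes >= min_prevalence:
            hallmarks.append(family)
    return sorted(hallmarks)
```

Under this reading, a family that is duplicated in more than 5% of genomes would be excluded even if it is present everywhere.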
Following the designed bioinformatics workflow (Supplementary Fig. 9), P2Vs were searched for in the IMG/VR v4 database. Initially, an iterative search was performed across the datasets using HMMs built from the viral hallmark proteins. After identifying conserved motifs in these genes (“DxxxGW” between positions 181 and 186, and “TxxxNL” between positions 313 and 318), redundant protein sequences were removed to reduce overrepresentation of highly similar or near-identical sequences. To this end, protein sequences were dereplicated using MMseqs2 (easy-linclust) with a minimum sequence identity threshold of 95% and a minimum alignment coverage of 80% (cov-mode 1, cluster-mode 2), retaining one representative sequence per cluster for downstream analyses43. Subsequently, an iterative search was conducted in the NCBI prokaryotic nr database using the viral hallmark proteins as queries. The search was performed using hmmsearch from HMMER, which enabled HMM-to-protein sequence comparisons. In each iteration, sequences with an E-value below 1e−10 were selected as statistically significant matches. Only genomes exceeding 100 kb in length were retained for further analysis, and these were confirmed as prophages by CheckV (v.0.8)44 validation, with criteria of completeness >80% and contamination <5%. Genomes that met these criteria were labeled as “Provirus.”
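As an illustration of the E-value selection step, the following Python sketch filters hits from the tabular output that hmmsearch writes with its --tblout option. It assumes the standard whitespace-delimited layout in which the target name, query name, and full-sequence E-value occupy the first, third, and fifth columns; the function name and the choice to keep the best E-value per target and query pair are assumptions made for the example.

```python
def significant_hits(tblout_lines, max_evalue=1e-10):
    """Collect (target, query) pairs whose full-sequence E-value is below max_evalue.

    tblout_lines: iterable of text lines from an hmmsearch --tblout file.
    Returns a dict mapping (target_name, query_name) -> best E-value seen.
    """
    hits = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue < max_evalue:
            key = (target, query)
            hits[key] = min(evalue, hits.get(key, evalue))
    return hits
```

In an iterative search, the sequences retained in one round would typically be used to rebuild or extend the profiles for the next round; that loop is omitted here.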
The global distribution base map was generated using the open-source visualization platform ObservableHQ (https://observablehq.com/d/7c95cdc90c25c100).
Genome-content-based clustering of P2Vs
All ORFs longer than 50 amino acids encoded by the high-quality genomes of the P2VGD were subjected to all-versus-all BLASTp using Diamond (v2.0.10) with the following parameters: E-value < 1e−10, query cover >50%, percentage of identity >30%, and a maximum of 1,000,000 reported target sequences. The filtered all-against-all BLASTp results were used as input to vConTACT2, run with default parameters, to group the P2VGD genomes into viral clusters (VCs)45. The genome-content-based correlations were visualized in Gephi (v.0.92)46 using the Fruchterman Reingold layout, retaining only edges with weights greater than 30. To summarize higher-level structure in the VC network, we defined superclades (SpCs) as groups of closely related VCs. A VC–VC similarity network (edge weights based on shared protein clusters) was constructed and imported into Gephi, where community detection was performed using the Louvain modularity algorithm (resolution = 1.0, edge weights enabled). Each resulting community was defined as an SpC. “Unclustered” genomes are those that vConTACT2 did not assign to any VC or SpC due to insufficient gene-sharing connectivity, even if they appear within broader clades in the phylogenomic tree.
To further examine how the P2VGD extends the taxonomic breadth of P2Vs, each viral cluster delineated by vConTACT2 was further classified according to the genus-level classification principle currently used by the ICTV. The genus- and species-level units of high-quality P2Vs were automatically classified by VIRIDIC, a tool that calculates pairwise intergenomic similarity based on BLASTN to robustly group viral genomes into taxonomic units according to nucleotide similarity thresholds. The all-to-all BLASTn search used the following parameters (word size, 7; reward, 2; penalty, 3; gap open, 5; gap extend, 2)47. The all-to-all BLASTn results were clustered using 95% as the species similarity threshold and 70% as the genus similarity threshold. The similarity was normalized by the total length of each genome.
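The effect of the two thresholds can be sketched in Python as follows. This is a simplified single-linkage grouping implemented with a union-find structure, not VIRIDIC's own clustering procedure; the function name, the input format, and the use of an inclusive threshold are assumptions made for illustration.

```python
def cluster_by_similarity(genomes, similarity, threshold):
    """Group genomes whose pairwise intergenomic similarity (%) is >= threshold.

    genomes:    list of genome IDs.
    similarity: dict mapping (genome_a, genome_b) -> similarity in percent.
    threshold:  e.g. 95 for species-level or 70 for genus-level units.
    Single linkage: two genomes share a cluster if a chain of pairs above the
    threshold connects them (a simplification of the published tool).
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), sim in similarity.items():
        if sim >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())
```

Running the same similarity table at 95 and then at 70 yields nested units, with each species-level group contained in exactly one genus-level group.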
Estimation of host range for P2Vs and auxiliary metabolic genes
For host prediction of P2VGD, information was obtained according to the origin of the viral genomes. For isolated P2Vs, host information was retrieved directly from the corresponding records in the NCBI database, where host assignments are provided based on experimental isolation. For P2Vs identified from prokaryotic genomes, host information was inferred from the bacterial host genome in which the P2V was integrated. The corresponding biome and metadata were obtained by extracting BioSample information associated with the host genomes from the NCBI database. For P2Vs derived from IMG/VR v4, host and biome annotations were retrieved from the JGI Genome Portal, where environmental and host metadata are provided as part of the IMG/VR framework20. To ensure consistency in taxonomic annotation, all host genomes were annotated using GTDB-Tk (version 2.2.0) with the GTDB database r220 (refs. 48,49).
To evaluate the potential of P2V genomes to reprogram host metabolism, viral functional annotation was performed using a two-step pipeline comprising VirSorter2 (v.1.0.6)50 and DRAM-v51. First, all P2V genomes were processed with VirSorter2 using the following parameters: --prep-for-dramv --min-length 5000 --min-score 0 --include-groups dsDNAphage, which produced an “affi-contigs.tab” file. Subsequently, genes were evaluated using DRAM-v, and those with auxiliary scores of 1, 2, or 3 were identified as putative auxiliary metabolic genes (AMGs). Additionally, to further investigate the functional characteristics of P2Vs, a DIAMOND BLASTp search (E-value < 1e−10; max target sequences = 1; minimum query coverage of 50%) was conducted against two reference databases, Virulence Factors of Pathogenic Bacteria (VFDB)52 and the Comprehensive Antibiotic Resistance Database (CARD)53, to identify sequences related to virulence factors and antibiotic resistance genes. Finally, the genomic neighborhoods of candidate AMGs were manually inspected to identify viral hallmark genes, virus-specific genes, hypothetical proteins, or microbial genes.
Phylogenetic analysis of P2Vs
To comprehensively evaluate the phylogenetic characteristics of P2Vs, a species tree was constructed from the concatenation of seven hallmark proteins: the capsid completion/stabilization protein (P2-gpL, PeduoPF0001), terminase small subunit (P2-gpM, PeduoPF0002), major capsid protein (P2-gpN, PeduoPF0003), capsid scaffolding protein (P2-gpO, PeduoPF0004), terminase large subunit (P2-gpP, PeduoPF0005), portal protein (P2-gpQ, PeduoPF0006), and an additional P2-like viral hallmark protein, the tail tape measure protein (P2-gpT, PeduoPF0007). A multi-step protein sequence alignment and phylogenetic analysis method, combining Muscle5 (ref. 41) and IQ-Tree2 (ref. 54), was employed. The seven corrected MSA files were concatenated using SeqKit55, and a maximum likelihood tree of the concatenated proteins was constructed using IQ-Tree2 based on the Q.pfam+F+R10 model with 1000 iterations.
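The concatenation step can be sketched in Python as below. The study performed this step with SeqKit; the sketch only illustrates the underlying operation of joining per-gene alignments by genome ID. Padding genomes that lack a gene with gap characters is an assumption of the example, since the text does not state how missing genes were handled.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix keyed by genome ID.

    alignments: list of dicts (one per hallmark protein) mapping
                genome ID -> aligned sequence of equal length within a dict.
    Genomes absent from a given alignment are padded with '-' (assumption).
    """
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    supermatrix = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}
```

The resulting dictionary can be written out as FASTA and passed to a tree-building program; column boundaries between genes would need to be recorded separately if a partitioned model were desired.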
To explore the co-phylogenetic relationships among the seven hallmark proteins of P2Vs, single-gene maximum likelihood trees were constructed for each protein and combined into co-phylogenetic trees. Consistent with the multi-gene concatenation method described above, the seven hallmark proteins of P2Vs were analyzed using IQ-Tree2 based on the LG+F+R10, LG+F+R10, LG+R10, LG+F+R10, LG+R10, LG+R10, and Q.pfam+F+R10 models, respectively, with 1000 iterations. Only support values ≥ 90% are indicated by circular markers on internal nodes to highlight well-supported clades.
The topological congruence between hallmark genes of P2-like viruses was quantified using single Reconstruction Fidelity Values (sRFVs) calculated by TreeKO56. The sRFV metric evaluates phylogenetic agreement by measuring the number of conflicting leaf splits between two compared trees. To visualize these comparisons, corresponding subtrees for high-quality P2Vs were extracted and plotted using an R script. To verify the robustness of the phylogenetic inference, the Pairwise Homoplasy Index (PHI) test was performed using PhiPack on the nucleotide alignments of the corresponding coding sequences57. A P-value of <0.05 was considered statistically significant evidence for the presence of recombination.
The interprotein Correlated Mutations Server (iCOMS) was used to study the co-mutational relationships at the amino acid level among the seven hallmark proteins of P2Vs58. First, the sequences of the hallmark proteins of P2Vs were aligned using Muscle5 and trimmed using trimAL, and the 21 pairwise combinations of the seven proteins were submitted to the server for quantitative analysis of their co-mutational characteristics by calculating the mfDCA values. During the analysis, the integrase and terminase of Lambda phage were used as negative controls, while the portal protein and terminase of Lambda phage were used as positive controls59. The top 1% of protein pairs with the highest mfDCA scores were selected as the final results and were then visualized using an R script. If the median mfDCA value of a group of protein pairs exceeded the median of the control groups, they were considered to have high co-mutational relationships. A t-test was conducted to confirm the significance of differences between the protein pairs and the control groups.
Ecological distribution of P2Vs
CoverM (v0.3.1) (https://github.com/wwood/CoverM) was used with the parameters -m rpkm, --min-read-percent-identity 0.95, and --min-read-aligned-percent 0.75 to calculate the relative abundances of P2V genomes in the GOV 2.0 database15, the Qingdao coastal virome (QDCV) datasets60, datasets from virioplankton overlaying the Challenger Deep (VOCD)61, and several human gut viromes62,63,64,65,66. In this study, “genome count” refers to the number of P2V genome contigs recovered from each habitat category. “Abundance” refers exclusively to normalized metagenomic read recruitment values (RPKM/TPM), as described below.
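For readers unfamiliar with the normalization, the following Python sketch shows the conventional RPKM calculation together with the two read-level filters named above. It assumes the usual definition (reads mapped per kilobase of genome per million mapped reads) and inclusive filter thresholds; it is a worked illustration with hypothetical function names, not a reimplementation of CoverM.

```python
def passes_read_filters(identity, aligned_fraction,
                        min_identity=0.95, min_aligned=0.75):
    """Return True if a read alignment meets both thresholds (fractions, 0-1)."""
    return identity >= min_identity and aligned_fraction >= min_aligned


def rpkm(mapped_reads, genome_length_bp, total_mapped_reads):
    """Reads per kilobase of genome per million mapped reads.

    mapped_reads:       reads recruited to one genome after filtering.
    genome_length_bp:   genome (contig) length in base pairs.
    total_mapped_reads: all mapped reads in the sample (assumed denominator).
    """
    if genome_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # (reads / (length / 1e3)) / (total / 1e6) == reads * 1e9 / (length * total)
    return mapped_reads * 1e9 / (genome_length_bp * total_mapped_reads)
```

For example, 500 reads recruited to a 33,000 bp genome in a sample with 2,000,000 mapped reads correspond to an RPKM of about 7.58.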
Metatranscriptome relative abundance estimates for P2Vs’ AMGs
To rigorously assess the transcriptional activity of P2V-encoded AMGs and distinguish viral expression from host homologs, read mapping was performed using two distinct metatranscriptomic datasets: (1) 189 human gut metatranscriptomic samples from NCBI (Supplementary Data 5), and (2) marine metatranscriptomes obtained from the TARA Ocean project (PRJEB6608). Reads were mapped specifically against the P2V genomes identified in this study. To ensure mapping specificity and preclude false-positive recruitment from host genomes, relative abundances were calculated using CoverM (v0.3.1) with stringent filtering parameters: -m rpkm, --min-read-percent-identity 0.95, and --min-read-aligned-percent 0.75 (refs. 44,67).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The datasets presented in this study can be found in the article/Supplementary Material.
Code availability
The custom code and models supporting the findings of this study are provided in Supplementary Data 6. This ZIP archive contains the seven HMM profiles used for identifying P2-like viruses and the R script developed to explore the co-phylogenetic relationships among the hallmark proteins of P2Vs.
References
Wommack, K. E. & Colwell, R. R. Virioplankton: viruses in aquatic ecosystems. Microbiol. Mol. Biol. Rev. 64, 69–114 (2000).
Suttle, C. A. Viruses in the sea. Nature 437, 356–361 (2005).
Suttle, C. A. Marine viruses–major players in the global ecosystem. Nat. Rev. Microbiol. 5, 801–812 (2007).
Ackermann, H.-W. 5500 Phages examined in the electron microscope. Arch. Virol. 152, 227–243 (2007).
Zimmerman, A. E. et al. Metabolic and biogeochemical consequences of viral infection in aquatic ecosystems. Nat. Rev. Microbiol. 18, 21–34 (2020).
Luo, X.-Q. et al. Viral community-wide auxiliary metabolic genes differ by lifestyles, habitats, and hosts. Microbiome 10, 1–18 (2022).
Clokie, M. R., Millard, A. D., Letarov, A. V. & Heaphy, S. Phages in nature. Bacteriophage 1, 31–45 (2011).
Bertani, G. Studies on lysogenesis I: the mode of phage liberation by lysogenic Escherichia coli. J. Bacteriol. 62, 293–300 (1951).
Christie, G. E. & Calendar, R. Bacteriophage P2. Bacteriophage 6, e1145782 (2016).
Haggård-Ljungquist, E., Halling, C. & Calendar, R. DNA sequences of the tail fiber genes of bacteriophage P2: evidence for horizontal transfer of tail fiber genes among unrelated bacteriophages. J. Bacteriol. 174, 1462–1477 (1992).
Nilsson, A. S. & Haggård-Ljungquist, E. Evolution of P2-like phages and their impact on bacterial evolution. Res. Microbiol. 158, 311–317 (2007).
Boyd, E. F. Bacteriophage-encoded bacterial virulence factors and phage–pathogenicity island interactions. Adv. Virus Res. 82, 91–118 (2012).
Liu, X. et al. Symbiosis of a P2-family phage and deep-sea Shewanella putrefaciens. Environ. Microbiol. 21, 4212–4232 (2019).
Roux, S. et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537, 689–693 (2016).
Gregory, A. C. et al. Marine DNA viral macro- and microdiversity from pole to pole. Cell 177, 1109–1123.e14 (2019).
Zheng, K. et al. Diversity of reverse-transcriptase-containing viruses through global metagenomics. hLife 3, 52–56 (2025).
Paez-Espino, D. et al. Uncovering Earth’s virome. Nature 536, 425–430 (2016).
Roux, S. et al. Cryptic inoviruses revealed as pervasive in bacteria and archaea across Earth’s biomes. Nat. Microbiol. 4, 1895–1906 (2019).
Schulz, F. et al. Giant virus diversity and host interactions through global metagenomics. Nature 578, 432–436 (2020).
Camargo, A. P. et al. IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata. Nucleic Acids Res. 51, D733–D743 (2023).
Nilsson, H., Cardoso-Palacios, C., Haggard-Ljungquist, E. & Nilsson, A. S. Phylogenetic structure and evolution of regulatory genes and integrases of P2-like phages. Bacteriophage 1, 207–218 (2011).
Wangchuk, J., Chatterjee, A., Patil, S., Madugula, S. K. & Kondabagil, K. The coevolution of large and small terminases of bacteriophages is a result of purifying selection leading to phenotypic stabilization. Virology 564, 13–25 (2021).
Hurwitz, B. L. & U’Ren, J. M. Viral metabolic reprogramming in marine ecosystems. Curr. Opin. Microbiol. 31, 161–168 (2016).
Liu, Y. et al. Diversity and function of mountain and polar supraglacial DNA viruses. Sci. Bull. 68, 2418–2433 (2023).
Nijland, M., Martinez Felices, J. M., Slotboom, D. J. & Thangaratnarajah, C. Membrane transport of cobalamin. Vitam. Horm. 119, 121–148 (2022).
Chaturongakul, S. & Ounjai, P. Phage-host interplay: examples from tailed phages and Gram-negative bacterial pathogens. Front. Microbiol. 5, 442 (2014).
Libell, J. L. et al. Guanidinoacetate N-methyltransferase deficiency: case report and brief review of the literature. Radiol. Case Rep. 18, 4331–4337 (2023).
Tian, F. et al. Prokaryotic-virus-encoded auxiliary metabolic genes throughout the global oceans. Microbiome 12, 159 (2024).
Greene, N. P., Kaplan, E., Crow, A. & Koronakis, V. Antibiotic resistance mediated by the MacB ABC transporter family: a structural and functional perspective. Front. Microbiol. 9, 950 (2018).
Lerma, L. L. et al. Role of EfrAB efflux pump in biocide tolerance and antibiotic resistance of Enterococcus faecalis and Enterococcus faecium isolated from traditional fermented foods and the effect of EDTA as EfrAB inhibitor. Food Microbiol. 44, 249–257 (2014).
Ishimizu, T. et al. Endo-β-mannosidase, a plant enzyme acting on N-glycan: purification, molecular cloning, and characterization. J. Biol. Chem. 279, 38555–38562 (2004).
Nilsson, A. S., Karlsson, J. L. & Haggard-Ljungquist, E. Site-specific recombination links the evolution of P2-like coliphages and pathogenic enterobacteria. Mol. Biol. Evol. 21, 1–13 (2004).
Touchon, M., Bernheim, A. & Rocha, E. P. Genetic and life-history traits associated with the distribution of prophages in bacteria. ISME J. 10, 2744–2754 (2016).
Morcos, F. et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl. Acad. Sci. 108, E1293–E1301 (2011).
Hopf, T. A. et al. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3, e03430 (2014).
Breitbart, M., Bonnain, C., Malki, K. & Sawaya, N. A. Phage puppet masters of the marine microbial realm. Nat. Microbiol. 3, 754–766 (2018).
Davidson, A. L. & Chen, J. ATP-binding cassette transporters in bacteria. Annu. Rev. Biochem. 73, 241–268 (2004).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma. 11, 1–11 (2010).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59 (2015).
Chen, F., Mackey, A. J., Stoeckert, C. J. Jr & Roos, D. S. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 34, D363–D368 (2006).
Edgar, R. C. Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nat. Commun. 13, 6968 (2022).
Arndt, W. Modifying HMMER3 to run efficiently on the Cori supercomputer using OpenMP tasking. In 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 239–246 (IEEE, 2018).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Nayfach, S. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat. Biotechnol. 39, 578–585 (2021).
Bin Jang, H. et al. Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nat. Biotechnol. 37, 632–639 (2019).
Bastian, M., Heymann, S. & Jacomy, M. Gephi: an open source software for exploring and manipulating networks. Proc. Int. AAAI Conf. Web Soc. Media 3, 361–362 (2009).
Moraru, C., Varsani, A. & Kropinski, A. M. VIRIDIC—A novel tool to calculate the intergenomic similarities of prokaryote-infecting viruses. Viruses 12, 1268 (2020).
Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794 (2022).
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics 38, 5315–5316 (2022).
Guo, J. et al. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome 9, 1–13 (2021).
Shaffer, M. et al. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res. 48, 8883–8900 (2020).
Liu, B., Zheng, D., Zhou, S., Chen, L. & Yang, J. VFDB 2022: a general classification scheme for bacterial virulence factors. Nucleic Acids Res. 50, D912–D917 (2022).
Alcock, B. P. et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res. 51, D690–D699 (2023).
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One 11, e0163962 (2016).
Marcet-Houben, M. & Gabaldón, T. TreeKO: a duplication-aware algorithm for the comparison of phylogenetic trees. Nucleic Acids Res. 39, e66–e66 (2011).
Bruen, T. C., Philippe, H. & Bryant, D. A simple and robust statistical test for detecting the presence of recombination. Genetics 172, 2665–2681 (2006).
Iserte, J., Simonetti, F. L., Zea, D. J., Teppa, E. & Marino-Buslje, C. I-COMS: Interprotein-COrrelated mutations server. Nucleic Acids Res. 43, W320–W325 (2015).
Marks, D. S. et al. Protein 3D structure computed from evolutionary sequence variation. PLoS One 6, e28766 (2011).
Han, M. et al. Spatiotemporal dynamics of coastal viral community structure and potential biogeochemical roles affected by an Ulva prolifera green tide. mSystems 8, e01211–e01222 (2023).
Gao, C. et al. Virioplankton assemblages from Challenger Deep, the deepest place in the oceans. iScience 25, 104680 (2022).
Lim, E. S. et al. Early life dynamics of the human gut virome and bacterial microbiome in infants. Nat. Med. 21, 1228–1234 (2015).
Minot, S. et al. Rapid evolution of the human gut virome. Proc. Natl. Acad. Sci. 110, 12450–12455 (2013).
Zhao, G. et al. Intestinal virome changes precede autoimmunity in type I diabetes-susceptible children. Proc. Natl. Acad. Sci. 114, E6166–E6175 (2017).
Chehoud, C. et al. Transfer of viral communities between human individuals during fecal microbiota transplantation. mBio 7, e00322 (2016).
Zheng, K. et al. Genomic diversity and ecological distribution of marine Pseudoalteromonas phages. Mar. Life Sci. Technol. 5, 271–285 (2023).
Roux, S. et al. Minimum information about an uncultivated virus genome (MIUViG). Nat. Biotechnol. 37, 29–37 (2019).
Acknowledgements
This work was supported by the China Postdoctoral Science Foundation (2025M770867), the Natural Science Foundation of China (No. 42120104006 and 42176111), the Ocean Negative Carbon Emissions (ONCE) Program, the Fundamental Research Funds for the Central Universities (202172002 and 202072001), the National Key Research and Development Program of China (2022YFC2807500) and the Laoshan Laboratory (No. LSKJ202203201). We acknowledge the support of the Large Instrument Sharing Platform of College of Marine Life Sciences, Marine Microbial Resource Center, Ocean University of China. We appreciate the computing resources provided by the Center for High Performance Computing and System Simulation, Laoshan Laboratory; IEMB-1, a high-performance computation cluster operated by the Institute of Evolution and Marine Biodiversity; the Marine Big Data Center of Institute; and the High-Performance Biological Supercomputing Center for Advanced Ocean Study of Ocean University of China.
Author information
Contributions
Y.T.L. and M.W.: conceptualization and project administration. Y.D.L.: methodology, writing, and original draft preparation. Z.Y.W. and K.Y.Z.: software. R.Z.L.: Data curation. H.B.S.: validation. A.M.: review and editing. Y.T.L., M.W., and A.M.: funding acquisition. All authors contributed to the article and approved the submitted version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks Asier Zaragoza-Solas, Gamaliel López-Leal and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Tobias Goris.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, Y., Liu, R., Zheng, K. et al. Global genomic diversity of temperate P2-like viruses. Commun Biol 9, 554 (2026). https://doi.org/10.1038/s42003-026-09823-4