Introduction

As the smallest and most abundant biological entities in the ocean, viruses play essential roles in shaping the structure, function, and evolution of the microbial communities, thereby influencing the biogeochemical cycling of carbon and nutrients1,2,3,4. The enormous diversity of marine viruses has been investigated using culture-dependent and culture-independent approaches. Over the last decade, an unexpected diversity of marine viruses has been discovered through extensive culture-independent viromic studies5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20. Billions of uncultivated viral populations have been reconstructed, promoting the understanding of the diversity and population composition of marine viruses as well as their potential to regulate the metabolism of their hosts5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21. Moreover, marine viromic datasets are of great value in providing the basis for elucidating the geographical distribution characteristics of phages22,23,24,25,26,27,28,29. However, the majority of marine viral populations have not been well characterized because they belong to novel viral species without cultured representatives, and accurately determining the host of most viral populations is challenging. To characterize the huge amount of viral dark matter, the combined use of culture-dependent and metagenomic mining approaches, which elucidate the voluminous information contained in metagenomic datasets, has been applied to identify and characterize some important marine phage groups26,28,30,31,32,33,34. These highly abundant marine phage groups have been isolated in the laboratory, and culture-independent approaches have further enhanced the understanding of their diversity and ecological distribution.

Within the class Caudoviricetes, Autographiviridae is the largest family of double-stranded DNA (dsDNA) viruses, containing at least 9 defined subfamilies and 132 defined genera. This family was initially termed “T7-like phages” after the first isolate, Escherichia coli phage T7, was discovered35. T7-like phages are characterized by the possession of an RNA polymerase (RNAP) gene and were initially classified under the subfamily Autographivirinae within the family Podoviridae36. In 2019, Autographivirinae was updated to the family Autographiviridae based on comparative genomic, proteomic, and marker gene phylogenetic analyses37. Elevation in the taxonomy of T7-like phages to the level of a family reflects their great diversity and increasing importance in bacteriophagology. Currently, members of Autographiviridae infect a myriad of bacterial hosts from diverse environments (marine, freshwater, and terrestrial ecosystems). The genomes of more than 70 marine Autographiviridae isolates, most of which were isolated from Cyanobacteria (Synechococcus and Prochlorococcus), SAR11 (Pelagibacterales order), and Roseobacter strains, have been reported22,25,38,39,40,41,42,43. A novel Autographiviridae cyanophage group without DNA polymerase (DNAP) genes was discovered recently, thereby expanding the known diversity of marine Autographiviridae phages44. Currently, studies on the diversity and evolutionary relationships of marine Autographiviridae phages are restricted to the above-mentioned isolates22,41,42,43, and limited understanding has been gained regarding their geographic distributions22,25,27,44. In addition, the genomes of several other marine Autographiviridae phages infecting Marinomonas, Citrobacter, Alteromonas, Stappia, and Vibrio are available in NCBI, highlighting the broad host range and important ecological significance of Autographiviridae phages in the ocean.
Metagenomic-based analysis has also provided evidence for the dominance and prevalence of Autographiviridae phages in the ocean. For example, in the GOV (Global Ocean Virome) study, 358 metagenomic viral populations were classified into a viral cluster (VC_9) with Autographiviridae isolates7. VC_9 is among the most abundant and ubiquitous marine viral clusters7.

Despite previous studies, a comprehensive understanding of the diversity, evolution, and distribution of the Autographiviridae family in marine environments has been hindered by the scarcity of representative genomes. To address these gaps, we recovered 1253 Autographiviridae uncultivated viral genomes (UViGs) using metagenomic data mining and performed a suite of phylogenomic and comparative genomic analyses. Phylogenetic analyses uncovered substantial diversity and identified several previously unrecognized groups within the Autographiviridae family. Genome comparison revealed both conservation and divergence across distinct Autographiviridae groups. Furthermore, metagenomic read-mapping provided a detailed view of their global distribution, highlighting widespread and clade-specific biogeographic patterns.

Materials and methods

Autographiviridae UViGs retrieval

To retrieve marine-derived Autographiviridae genomes, approximately 7 million UViGs were downloaded from IMG/VR v421, Global Ocean Viromes (GOV and GOV 2.0)7,9, MedDCM fosmid library5, Station ALOHA assembly-free virus genomes11, ALOHA 2.0 virome14, San Pedro Ocean Time-series Viromes13, Red Sea Viromes15, and viromes from the oxic surface (10 m) and oxygen-starved basin (200 m) waters of Saanich Inlet6.

The open reading frames (ORFs) of the UViGs were downloaded from the databases or predicted using Prodigal v2.6.3 (-p meta)45. Three Autographiviridae core genes, encoding RNAP, phage capsid, and terminase large subunit (TerL), were used as baits to retrieve Autographiviridae UViGs. The RNAP, capsid, and TerL genes of the Autographiviridae phages (from RefSeq V215) were aligned using MAFFT v7.50546, and their hidden Markov model (HMM) profiles were constructed using hmmbuild in HMMER v3.3.247. Afterwards, hmmsearch in HMMER47 was used to identify the three genes from the UViGs (e-value ≤10^-3 and score ≥50). A total of 2,839 UViGs containing all three marker genes were retrieved as Autographiviridae UViGs. CheckV v0.9.0 was used to estimate the completeness and quality of the UViGs48. The UViGs with genome end redundancy were self-aligned to identify the redundant regions. Based on the alignment, the redundant regions were manually trimmed to avoid potential biases in subsequent analyses. A total of 1253 UViGs with 100% completeness were utilized for subsequent analyses.
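The marker-gene screen described above can be illustrated with a minimal Python sketch. It assumes that the hmmsearch hits have already been parsed into tuples, and that protein identifiers follow a Prodigal-style `<genome>_<ORF index>` pattern; the helper names and the example hits are hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Thresholds quoted in the text: e-value <= 1e-3 and bit score >= 50.
MAX_EVALUE = 1e-3
MIN_SCORE = 50.0
MARKERS = {"RNAP", "capsid", "TerL"}


def genome_of(protein_id):
    """Assumption: IDs look like <genome>_<ORF index> (Prodigal-style)."""
    return protein_id.rsplit("_", 1)[0]


def genomes_with_all_markers(hits):
    """hits: iterable of (protein_id, marker_name, evalue, score) tuples.

    Returns the sorted genome IDs that carry a passing hit for every marker.
    """
    found = defaultdict(set)
    for protein_id, marker, evalue, score in hits:
        if marker in MARKERS and evalue <= MAX_EVALUE and score >= MIN_SCORE:
            found[genome_of(protein_id)].add(marker)
    return sorted(g for g, markers in found.items() if markers == MARKERS)


# Hypothetical hits for two UViGs; uvig_B lacks a passing capsid hit.
example_hits = [
    ("uvig_A_1", "RNAP", 1e-80, 250.0),
    ("uvig_A_30", "capsid", 1e-40, 140.0),
    ("uvig_A_45", "TerL", 1e-90, 300.0),
    ("uvig_B_2", "RNAP", 1e-60, 200.0),
    ("uvig_B_20", "capsid", 1e-2, 35.0),  # fails both thresholds
    ("uvig_B_41", "TerL", 1e-70, 220.0),
]
selected = genomes_with_all_markers(example_hits)
print(selected)
```

Requiring all three markers in the same genome is what keeps the screen specific: a single RNAP or TerL homolog alone would not qualify a UViG.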

To identify Autographiviridae populations (roughly corresponding to species-level taxonomy), the 1253 Autographiviridae UViGs and 98 known marine Autographiviridae isolates were clustered using the CD-HIT program (nucleotide identity of ≥95%, ≥80% alignment of the shorter genome) (-c 0.95 -aS 0.8)49, resulting in a total of 1143 Autographiviridae populations.
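The population criteria (≥95% nucleotide identity over ≥80% of the shorter genome) can be sketched as a simple greedy clustering in Python. This is a simplified illustration of the thresholds, not CD-HIT's actual algorithm; the function name, the input layout, and the pairwise alignment values are hypothetical.

```python
MIN_IDENTITY = 0.95                # -c 0.95 in the text
MIN_ALIGNED_FRACTION_SHORT = 0.80  # -aS 0.8 in the text


def greedy_cluster(lengths, alignments):
    """Greedy incremental clustering, longest genome first.

    lengths:    {genome: length in bp}
    alignments: {(genome_a, genome_b): (identity, aligned_bp)}
    Returns {representative: [members, including the representative]}.
    """
    def same_population(a, b):
        hit = alignments.get((a, b)) or alignments.get((b, a))
        if hit is None:
            return False
        identity, aligned_bp = hit
        shorter = min(lengths[a], lengths[b])
        return (identity >= MIN_IDENTITY
                and aligned_bp / shorter >= MIN_ALIGNED_FRACTION_SHORT)

    clusters = {}
    # Sort by decreasing length (ties broken by name) so that the longest
    # genome of each population becomes its representative.
    for genome in sorted(lengths, key=lambda g: (-lengths[g], g)):
        for rep in clusters:
            if same_population(genome, rep):
                clusters[rep].append(genome)
                break
        else:
            clusters[genome] = [genome]
    return clusters


# Hypothetical genomes and pairwise alignments.
lengths = {"g1": 40_000, "g2": 38_000, "g3": 39_000}
alignments = {
    ("g1", "g2"): (0.97, 35_000),  # 92% of the shorter genome aligned
    ("g1", "g3"): (0.96, 20_000),  # only 51% aligned, so a separate population
}
populations = greedy_cluster(lengths, alignments)
print(populations)
```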

Genome annotation and comparative genomic analysis

Prodigal45 was used to predict the ORFs of the Autographiviridae genomes. The translated ORFs were annotated using BLASTP against the NCBI non-redundant and NCBI RefSeq v215 databases (BLASTP option: E-value ≤10^-3, amino acid identity ≥25%, and alignment length ≥50%). The ORFs were searched against the Pfam database using hmmsearch to identify conserved Pfam domains (-E 1e-3 -T 50)47,50. OrthoFinder v2.5.4 was used to identify the orthogroups based on sequence similarities (BLASTP option: E-value ≤10^-3, amino acid identity ≥25%, alignment length ≥50%)51. Representative Autographiviridae genomes were compared and visualized using Easyfig v2.2.252.

Autographiviridae prophage and host prediction

Marine bacterial genome databases, including manually curated metagenome-assembled genomes (MAGs) from Tara Oceans, MarDB (MAR databases)53, GORG (Global Ocean Reference Genomes)54, GEM (Genomes from Earth's Microbiomes)55, the Genome Taxonomy Database (GTDB)56, and approximately 25,000 newly reconstructed draft genomes57 were downloaded to detect Autographiviridae prophages. The integration sites of Autographiviridae prophages in the phage genomes are commonly located between genes encoding integrase and RNAP43,58. When phage integration occurs, these two genes are adjacent to the genes of their hosts. To identify the putative integration sites, the nucleotide sequences of integrase and RNAP genes were extracted from marine Autographiviridae genomes. These sequences were used as queries for BLASTN against the marine bacterial genome sequences and MAGs (e-value ≤10^-5, nucleotide identity ≥95%, and coverage ≥50%). The recovered sequences were subjected to BLASTN against the NCBI-nt database (e-value ≤10^-5, nucleotide identity ≥80%, and query match length ≥500 bp). The sequences that also contain bacterial genes were considered bacterial genomes containing Autographiviridae prophages.

The potential hosts of Autographiviridae UViGs were predicted using the RaFAH tool59 with default settings. The training and validating random forest model for RaFAH was built with 4269 host-known phages downloaded from the NCBI RefSeq (v215).

Phylogenetic analyses

A genome-wide proteomic tree was constructed using ViPTree60 based on whole-genome sequence similarities calculated by tBLASTx. Seven core genes from marine Autographiviridae UViGs and isolates were used for the phylogenomic analysis. The amino acid sequences of the seven core genes were aligned using MAFFT v7.505 (--maxiterate 1000 --localpair) and trimmed using trimAl v1.4.rev1561. Thereafter, the trimmed alignments were concatenated and used to construct the phylogenomic tree using IQ-TREE v2.2.0.362 under the LG + F + I + I + R10 substitution model with 1000 bootstrap replicates. The phylogenetic trees were visualized and annotated using the Interactive Tree Of Life (iTOL) v663.
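The concatenation step that precedes tree inference can be sketched in Python as follows. The sketch assumes that each core-gene alignment is held as a dictionary of equal-length aligned sequences, and it pads genomes missing from an alignment with gaps; the gap padding, the function name, and the toy sequences are illustrative assumptions rather than details reported in the text.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix.

    alignments: list of {genome: aligned_sequence}, one dict per core gene,
                in the order in which the genes should be concatenated.
    Returns {genome: concatenated_sequence}.
    """
    genomes = sorted(set().union(*alignments))
    parts = {g: [] for g in genomes}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        width = widths.pop()
        for g in genomes:
            # Assumption: a genome absent from one alignment is gap-padded.
            parts[g].append(aln.get(g, "-" * width))
    return {g: "".join(chunks) for g, chunks in parts.items()}


# Toy example with two genes and two phages (hypothetical sequences).
gene1 = {"phage1": "MK-L", "phage2": "MKAL"}
gene2 = {"phage1": "TTG"}
supermatrix = concatenate_alignments([gene1, gene2])
print(supermatrix)
```

Every row of the resulting supermatrix has the same length, which is the property a tree-building program expects from a concatenated alignment.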

Viromic read recruitment analysis and statistical analysis

The relative abundance of the Autographiviridae phages and UViGs was estimated using the viromic read-mapping analysis. A total of 220 marine viromic datasets, including Global Ocean Viromes9, Pearl River Estuary Virome19, Mariana Trench Virome18, Eastern Tropical North Pacific Virome20, Delaware Bay and Chesapeake Bay Viromes17, Black Sea Virome16, Red Sea Virome15, and South China Sea DNA Virome10, were used for viromic read-mapping analysis.

Viromic reads were mapped to the non-redundant set of analyzed Autographiviridae genomes using CoverM (-p bwa-mem --min-read-percent-identity 95 --min-read-aligned-length 50 --min-read-aligned-percent 80, https://github.com/wwood/CoverM). The relative abundance was normalized by mapped reads per kilobase pair of genomes per million reads (RPKM). A heatmap of the RPKM values was generated using the pheatmap package in R. Linear-regression analysis was performed using R to test the relationship between environmental parameters and the relative abundance of these phages. Statistical significance was set at p < 0.05.
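The RPKM normalization can be written out as a short Python function. This is a sketch of the standard formula; whether "million reads" refers to all reads in a virome or to another read total is not specified in the text, so the total read count used here is an assumption, as are the example numbers.

```python
def rpkm(mapped_reads, genome_length_bp, total_reads):
    """Reads per kilobase of genome per million reads.

    mapped_reads:     reads recruited to one genome at the chosen thresholds
    genome_length_bp: genome length in base pairs
    total_reads:      assumed here to be the total number of reads in the virome
    """
    if genome_length_bp <= 0 or total_reads <= 0:
        raise ValueError("genome length and total reads must be positive")
    return mapped_reads / (genome_length_bp / 1_000) / (total_reads / 1_000_000)


# Hypothetical example: 500 reads mapped to a 40-kb genome in a virome
# of 10 million reads gives 500 / 40 / 10 = 1.25 RPKM.
example = rpkm(500, 40_000, 10_000_000)
print(example)
```

Dividing by genome length prevents larger genomes from appearing more abundant simply because they recruit more reads, and dividing by sequencing depth makes viromes of different sizes comparable.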

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results and discussion

Identification and phylogenomic analyses of marine-derived Autographiviridae UViGs

Metagenomic mining analysis identified 1253 Autographiviridae UViGs with 100% genome completeness. These UViGs originated from diverse marine environments, including coastal waters, open oceans, deep oceans, estuaries, and marine sediments (Supplementary Data 1). The genome size of these UViGs ranged from 35.6 to 86.8 kb, and their G + C content ranged from 22.3% to 70.8% (Supplementary Data 1). Genomic comparison revealed that Autographiviridae UViGs are largely syntenic with Autographiviridae phages, and the taxonomic classification based on the ViPTree showed these UViGs are located among known Autographiviridae phages (Supplementary Fig. S1). In combination with known marine Autographiviridae phages, a total of 1143 species-level Autographiviridae populations were identified by using the ≥95% nucleotide identity threshold.

Marine Autographiviridae are phylogenetically highly diverse

OrthoFinder protein clustering analysis identified 4560 orthologous protein groups (≥2 members) in marine Autographiviridae genomes; however, only 435 proteins have been assigned putative biological functions. The accumulation curves of the pan protein groups reached saturation, indicating that our study captured the majority of the genetic diversity within marine Autographiviridae phages (Supplementary Fig. S2A). Core genome analysis revealed that Autographiviridae phages share a common set of seven core genes. The accumulation curve of core genes leveled off (Supplementary Fig. S2B). These core genes include those involved in phage replication and development, such as genes encoding RNAP, portal protein, scaffolding protein, capsid protein, tail tubular protein A and B, and TerL. Highly conserved core genes and flexible pan genes have also been observed in other phage groups, such as HMO-2011-type and HTVC010P-type phages26,28. In the Autographiviridae family, the core genes are involved in DNA metabolism and replication, morphogenesis, and DNA packaging, suggesting a conserved framework underpinning infection and propagation mechanisms. By contrast, distinct subgroups tend to encode specific accessory genes, highlighting pronounced pan-genomic divergence across subgroups. The conservation of core genes likely arises from their indispensable roles in the phage life cycle, while the high diversity observed in the accessory genome is potentially driven by long-term phage-host interactions and environmental adaptation, facilitated through mechanisms such as horizontal gene transfer and mutation.
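Pan- and core-genome accumulation curves of the kind referred to above are commonly computed by adding genomes in random order and averaging over many permutations. The Python sketch below follows that common approach; the text does not state how the curves in Supplementary Fig. S2 were generated, so the procedure, the function name, and the toy orthogroup sets are all assumptions.

```python
import random


def accumulation_curves(genome_to_groups, permutations=100, seed=0):
    """Average pan and core orthogroup counts as genomes are added.

    genome_to_groups: {genome: set of orthogroup IDs present in that genome}
    Returns (pan_curve, core_curve); position i holds the mean count after
    i + 1 genomes have been sampled.
    """
    rng = random.Random(seed)
    genomes = sorted(genome_to_groups)
    n = len(genomes)
    pan_sum = [0] * n
    core_sum = [0] * n
    for _ in range(permutations):
        order = genomes[:]
        rng.shuffle(order)
        pan, core = set(), None
        for i, genome in enumerate(order):
            groups = genome_to_groups[genome]
            pan |= groups                                        # union grows
            core = set(groups) if core is None else core & groups  # intersection shrinks
            pan_sum[i] += len(pan)
            core_sum[i] += len(core)
    pan_curve = [s / permutations for s in pan_sum]
    core_curve = [s / permutations for s in core_sum]
    return pan_curve, core_curve


# Toy input: three genomes sharing a single orthogroup (ID 1).
toy = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {1, 5}}
pan_curve, core_curve = accumulation_curves(toy)
print(pan_curve[-1], core_curve[-1])
```

A pan curve that flattens suggests that further genomes add few new orthogroups, whereas a core curve that levels off indicates a stable shared gene set.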

To resolve the genetic diversity and evolutionary relationships among marine Autographiviridae phages, a phylogenomic analysis based on the concatenated alignment of seven core genes was performed. Phylogenomic analysis revealed that marine Autographiviridae phages are remarkably diverse (Fig. 1A). Based on the tree topology and genomic features, Autographiviridae phages formed at least 14 well-supported groups, referred to as Autographiviridae groups AG-1 to AG-14 (Fig. 1A). The remaining 59 UViGs did not form well-separated groups and thus remained unclassified. Most marine Autographiviridae groups were distinct from previously defined non-marine Autographiviridae subfamilies, except for AG-9, which clustered with Studiervirinae, and AG-14, which grouped with multiple non-marine subfamilies (Supplementary Fig. S3). We further compared the genomic features of these groups, and found significant variations in their G + C content and genome sizes (Fig. 1B and Supplementary Data 1).

Fig. 1: Phylogenomic analyses of all known marine Autographiviridae phage isolates, as well as Autographiviridae UViGs identified in this study.

(A) A maximum-likelihood phylogenetic tree was constructed from the concatenated sequences of the seven core genes using IQ-TREE. Marine Autographiviridae phages were clustered into 14 groups based on the phylogeny and genome content. Shading indicates the distinct groups. Reference isolates are shown with colored dashed lines. Gray circles on the nodes indicate bootstrap values of >80%. (B) Boxplots showing the genome size and G + C content of each Autographiviridae group along the x-axis.

The number of members in these groups varies from 8 to 350 (Supplementary Data 1). Eight (AG-1, 2, 4, 6, 8, 9, 13, and 14) of the 14 groups comprised cultivated representatives, and the remaining 6 groups comprised only UViGs. AG-1 is the largest Autographiviridae group with 350 members. Their G + C content ranges from 32.4% to 63.2% (Fig. 1B). All previously isolated Autographiviridae cyanophages (Synechococcus and Prochlorococcus phages) were classified into this group and most AG-1 UViGs were predicted to infect Synechococcus or Prochlorococcus (discussed below). Most AG-1 members originated from the upper ocean (<200 m) (Supplementary Data 1), where marine cyanobacteria are highly abundant. AG-1 can be further divided into four closely related subgroups (AG-1.1 to 1.4). AG-1.1 corresponds to the previously defined cyanophage clades MPP-A and MPP-B41, whereas AG-1.3 and 1.4 contained the newly discovered MPP-C cyanophages without the DNAP gene (Fig. 1A)44. These results suggest that Autographiviridae cyanophages are evolutionarily closely related and form a monophyletic group. AG-2 consists of 24 UViGs and 5 cultivated representatives that infect SAR11 bacteria. The G + C content of AG-2 members ranges from 31.3% to 35.0%, similar to the G + C content of SAR11 and previously reported pelagiphages22,26,27,28,43,64,65. AG-4 included nine UViGs and six cultivated representatives that infect Roseobacter strains. AG-6 contained 27 UViGs and 3 cultivated representatives that infect Roseobacter strains. AG-8 consisted of 69 UViGs and 25 cultivated Autographiviridae pelagiphages. The G + C content of AG-8 members ranged from 31.9% to 47.5%; more than 65% of the AG-8 members had a G + C content below 36%, similar to that of pelagiphages22,25,26,27,28,30,43,66. Most AG-2 and AG-8 members were predicted to infect SAR11 (discussed below). AG-9 included 63 UViGs and Citrobacter phage phiCFP-1. AG-13 contained 20 UViGs and roseophage CRP-143. AG-14 contained 54 UViGs and 14 cultivated representatives that infect Alteromonadaceae, Vibrio, Marinomonas, and Stappia.

The six Autographiviridae groups that lack cultured representatives (AG-3, AG-5, AG-7, AG-10, AG-11, and AG-12) comprise 8, 240, 102, 54, 23, and 32 UViGs, respectively. AG-3 is the smallest group with only 8 UViGs. AG-5 is the sister group of AG-6 and is the second-largest group. AG-7 is the third-largest group, containing 102 UViGs. AG-10 contained 54 UViGs, and was further separated into two subgroups (AG-10.1 and AG-10.2), which varied in genome size and G + C content (Fig. 1B).

Genome variation among Autographiviridae groups

The examination of the distribution of the orthologous groups among the Autographiviridae subgroups revealed clear core-genome conservation and pan-genome differences among various groups. In the DNA metabolism and replication modules, approximately half of the Autographiviridae phages contain a set of T7-like DNA replication related genes located downstream of the RNAP gene, including genes encoding single-stranded DNA-binding proteins (SSB), endonuclease, DNA primase/helicase, DNAP, and exonuclease. The remaining Autographiviridae phages lack at least one of these genes (Fig. 2). Phage-encoded DNAP, DNA primase, and SSB are essential components of the DNA replication machinery. These phages may rely on host genes to compensate for the absence of these genes (Fig. 2). These results suggest that Autographiviridae phages have differences in DNA replication machinery. In the structure and packaging module, all Autographiviridae phages possess a set of conserved genes, including genes encoding head-tail connector, scaffolding protein, capsid protein, tail tubular protein A and B, and TerL, suggesting that they all have a conserved T7-like neck-tail module (Fig. 2). Other structural genes were more varied between the groups.

Fig. 2: Distribution of key genes identified across genomic groups.

Each row represents an Autographiviridae group (with parenthetical numbers indicating total group members), while each column corresponds to a specific gene. Color intensity reflects the proportion of gene carriers within each group, with numerical values showing detection counts for each gene in each group. PAPS Reductase: 3'-phosphoadenosine-5'-phosphosulfate reductase.

Genome content and comparative genomic analyses revealed that several groups possess specific genomic features, such as the presence of group-specific genes, absence of DNAP genes, and large genome size, that differentiate them from other groups. These unique features are discussed below.

Autographiviridae subgroups lack DNAP

DNAP was considered the core gene of Autographiviridae until the DNAP-lacking Autographiviridae cyanophages were discovered44. All members of the three AG-1 subgroups (AG-1.2, AG-1.3, and AG-1.4) lack DNAP (Figs. 2 and 3A), with AG-1.2 being a newly identified subgroup without cultured representatives. In addition to these AG-1 subgroups, DNAP is also absent in AG-11 (Figs. 2 and 3A). As Autographiviridae phages evolved, the phage-encoded DNAP gene may have been lost on several occasions. It is possible that these DNAP-lacking Autographiviridae phages exclusively employ the DNAP of their hosts for DNA replication. AG-11 members also lack homologs of Autographiviridae SSB, exonuclease, and endonuclease (Figs. 2 and 3A). In addition, their DNA primase and helicase are located upstream of the RNAP and share limited identity (<30% amino acid identity) with those in other Autographiviridae genomes. These results suggest that the DNA replication modules in AG-11 members are evolutionarily distant.

Fig. 3: Comparison of marine Autographiviridae groups.

(A) Comparison of genetic maps of representative Autographiviridae phages that lack the DNA polymerase (DNAP) gene from AG-1.2, AG-1.3, AG-1.4, and AG-11. Predicted open reading frames are represented by arrows and colored based on their putative functions. The scale color bar indicates amino acid identities between homologous genes. (B) Comparison of genetic maps of representative Autographiviridae phages that possess two RNA polymerase (RNAP) genes from AG-14 and AG-10.1. All RNAP genes are indicated in red and the additional RNAP genes are indicated with red asterisks. (C) Unrooted maximum-likelihood phylogenetic tree of RNAP in Autographiviridae phages. The typical Autographiviridae RNAPs (located upstream of DNA replication genes) are colored according to groups, and the additional RNAP genes in AG-10.1 and AG-14 are indicated with arrows.

Autographiviridae phages encoding two RNAP genes

Apart from the typical Autographiviridae RNAP gene (approximately 800 amino acids) located upstream of the DNA replication and metabolism module, 23 members in AG-10.1 and 17 members in AG-14 were found to encode an additional RNAP of approximately 650 amino acids in length (Fig. 3B). The additional RNAP genes in AG-10.1 genomes are all located upstream of the typical Autographiviridae RNAP and are surrounded by small, functionally unknown genes. Meanwhile, the additional RNAP genes in AG-14 genomes are all located between the DNA metabolism and the structure modules. Sequence analysis revealed that the additional RNAPs share limited sequence identity with other Autographiviridae RNAPs and are classified into two distinct groups in the phylogenetic tree (Fig. 3C). These results suggest that they may have originated from horizontal gene transfer from other phages. The presence of two RNAPs implies that these phages may require different RNAPs for the transcription of different phage genes. Notably, unlike the concatenated core gene phylogeny, the typical RNAP genes of AG-14 are divided into two distinct clusters (Fig. 3C), with the smaller cluster showing closer evolutionary relationships with the AG-1 members. This incongruence suggests that the RNAPs in AG-14 exhibit greater evolutionary diversity and highlights their unique evolutionary dynamics.

A unique Autographiviridae group with large genome sizes

Notably, AG-10.1 members have a significantly larger genome than the other subgroups (Fig. 1B). A closer analysis revealed that AG-10.1 members harbor several interesting features. AG-10.1 members possess a set of genes upstream of RNAP genes. This region is more varied than other regions. Interestingly, all members in AG-10.1 possess a gene that encodes a large protein (3274 to 5566 amino acids) in this region (Fig. 3B). Most of these large proteins exhibit little or no homology, with very limited similarity in some regions. Sequence analysis revealed the presence of conserved domains within some of these large protein sequences. For example, eight genes contain an RNase_H domain (PF13482), seven genes contain the N terminal domain of SMC (structural maintenance of chromosomes) (PF02463), and four genes contain the RNA_pol domain (PF00940). The RNase_H domain was found in many proteins with unknown functions, such as the hypothetical protein YqgF of Escherichia coli67. The SMC N-terminal and C-terminal domains constitute SMC proteins, but the SMC C-terminal has not been identified in these genes. Therefore, although some domains have been identified in these sequences, the specific functions of these large protein genes remain unclear. AG-10.1 members originate from various marine environments (Supplementary Data 1). It is possible that as the AG-10.1 members evolved, they gained this large gene via gene transfer to adapt to specific hosts. Another notable feature of the AG-10.1 members is the absence of the SSB gene. In addition, several AG-10.1 members possess an additional RNAP gene (discussed above), located downstream of the large protein gene (Fig. 3B).

Variation in metabolic potential

Through the functional annotation of the protein groups, we identified various host-derived auxiliary metabolic genes (AMGs) and revealed that different groups varied in their metabolic potentials.

Photosynthesis-related genes, including genes encoding photosystem II D1 protein (psbA), high light-inducible protein (hli), and phycobilisome proteolysis adapter (nblA), are exclusively present in Autographiviridae cyanophages from AG-1.1 (Fig. 2). psbA and hli are widely distributed in marine cyanophage genomes, and their expression contributes to the maintenance of the host's photosynthetic machinery during the latent phase of infection68,69. NblA is a key regulator of phycobilisome degradation in cyanobacteria and is highly expressed during nutrient deprivation70,71. Recent studies have shown that marine Autographiviridae phages commonly encode the nblA gene72. NblA may facilitate the degradation of host phycobilisomes, thereby releasing amino acids that serve as building blocks for virion assembly and accelerating the phage replication cycle72. Genes encoding transaldolase are also exclusively present in AG-1.1. The transaldolase in cyanophages is thought to enhance the pentose phosphate pathway of the host, thereby producing more NADPH and ribose 5-phosphate to support phage replication73,74. In addition to the aforementioned AMGs, within the AG-1 group, AG-1.1 contains more AMGs than AG-1.2, 1.3, and 1.4, including those encoding ribonucleotide reductase (RNR), phosphohydrolase, thioredoxin, thymidylate synthase (ThyX), and transglycosylase. This suggests that AG-1.1 may employ a distinct survival strategy by leveraging these AMGs to maximize the utilization of host metabolic resources, whereas AG-1.2, AG-1.3, and AG-1.4 likely prioritize replication efficiency by minimizing host metabolic burden during virion replication. These divergent evolutionary strategies may allow the Autographiviridae subgroups to exploit different ecological niches.

AG-2 and AG-8 were both predicted to infect SAR11 (as discussed below). Notably, AG-8 contains more AMGs compared to AG-2. AMGs encoding RNR, phosphohydrolase, S-adenosylmethionine decarboxylase (AdoMetDC), Hsp20 heat shock proteins, and DNA methyltransferase (DNMT) were frequently detected in AG-8 but were absent in AG-2. AdoMetDC is crucial for polyamine biosynthesis as it catalyzes the formation of the aminopropyl group donor75,76. Polyamines are crucial for bacterial physiology and are known to be important substrates for the growth of SAR1177,78. We speculate that this enzyme provides advantages for the physiology and survival of the host during phage infection. Previous studies demonstrate that in SAR11 and other bacteria, Hsp20 expression is significantly upregulated under stress conditions79. Hsp20 may help stabilize host protein homeostasis, thereby prolonging the window for efficient viral replication. In addition, studies suggest that Hsp20 might play an important scaffolding role during capsid maturation80.

Lysogenic life cycle and host taxonomic diversity of marine Autographiviridae phages

The integrase gene (int) is prevalent in marine Autographiviridae phages, with 625 out of 1143 populations encoding a tyrosine integrase gene (Fig. 2 and Supplementary Data 2). Phage-encoded integrase catalyzes the site-specific recombination between phage and host genomes81,82. Autographiviridae phages that encode int have been reported previously in cyanophage, pelagiphage, and roseophage genomes39,40,41,43. Autographiviridae pelagiphages have been reported to integrate into their SAR11 host genomes at various tRNA genes43. In addition, it has been reported that Autographiviridae S-TIP37 cyanophage could perform unstable integration with its host58. The int gene was detected in all Autographiviridae groups except for the AG-10 and AG-13 groups, indicating that members of these groups may adopt a strictly lytic lifestyle (Fig. 2). In AG-1, the int gene was present in only 2 of the 50 members in AG-1.3, whereas 35–97% of the members in other subgroups encode this gene. This suggests that AG-1.3 may employ a different lifestyle compared to other AG-1 subgroups. These results suggest that most marine Autographiviridae groups have the capability to perform a lysogenic lifecycle, potentially playing an important role in the evolution of their hosts.

To confirm the lysogenic life cycles of some Autographiviridae phages and identify their potential hosts, we searched for hybrid phage-host integration sites in marine MAGs and identified several sequences containing integration sites (Supplementary Data 3). Among these identified MAGs, two contain RNAP genes that are 95.5% and 97.8% identical to the RNAP sequences of two AG-7 phages (IMGVR_UViG_3300020411_000001 and AFVG_25M217), respectively. These two MAGs also contain sequences highly identical to a Chloroflexota bacterium (93.6 and 99.1% identity), suggesting that these two AG-7 phages infect Chloroflexota bacteria. Chloroflexota is a widespread bacterial phylum abundant in the deep ocean83,84.

The host information for eight Autographiviridae UViGs is provided in IMG/VR 4.0. Five are prophages originating from bacterial genomes, including one AG-1.1 phage identified from the genomes of a Synechococcus strain and four AG-9 members identified from the genomes of three Gammaproteobacteria strains (Spartinivicinus, Vibrio, and Marinobacter). Using the CRISPR spacer match method, it was inferred that one member in AG-5 might infect phylum Bacillota. Additionally, using the k-mer match method, it was speculated that one member each in AG-7 and the unclassified group potentially infect Rhizobiales within Alphaproteobacteria and Desulfobacteraceae within Desulfobacteria, respectively.

The potential hosts of Autographiviridae UViGs were then predicted using the RaFAH tool based on protein content. This analysis revealed the broad taxonomic diversity of the hosts (Supplementary Data 4). Most AG-1 members were predicted to infect cyanobacteria, whereas most AG-2 and AG-8 members were predicted to infect SAR11. We found some members in AG-9 were also predicted to infect SAR11, and these members have significantly lower genomic G + C content (30.4–32.2%) than other AG-9 members. Most AG-4 and AG-6 members were predicted to infect Roseobacter. Several AG-7, AG-10, AG-12, and AG-13 members were also predicted to infect Roseobacter; however, the prediction scores were low. The hosts of most AG-5 members were not determined, whereas some were predicted to infect Burkholderiaceae with low prediction scores. Some AG-7 members were predicted to infect Chloroflexota. Some members in AG-9 were predicted to infect Enterobacteriaceae with high prediction scores. Some AG-14 members were predicted to infect Vibrionaceae, Alteromonadaceae, and Oceanospirillaceae with high prediction scores.

Biogeography of Autographiviridae phages in global oceans

In this study, 220 marine viromic datasets were used to elucidate and compare the distribution and relative abundance of different Autographiviridae populations (Supplementary Data 5).

We first examined the distribution of the Autographiviridae populations in the global ocean. Of the 1143 Autographiviridae populations, 1083 were detected in marine viromes. These 1083 populations were detected in 93.6% (206 of 220) of the analyzed viromes, covering various marine environments (Fig. 4A, Supplementary Fig. S4 and Data 5). Each of these 206 viromes contained at least 20 Autographiviridae populations. Collectively, Autographiviridae phages were detected from tropical to polar stations, and from coastal to open ocean stations (Fig. 4A, B). They were also detected in distinct water layers, from the surface (SRF) (0–10 m) to the bathypelagic (BATHY) zones (1000–4000 m) (Fig. 4B and Supplementary Fig. S4). These results suggest that Autographiviridae phages are widespread in oceans worldwide. More Autographiviridae populations were detected at the trade (0°–30° latitude) and westerlies (30°–60° latitude) stations than at the polar stations (60°–90° latitude) (p < 0.01, Mann–Whitney U tests) (Fig. 4A and Supplementary Fig. S4). The number of detected Autographiviridae populations was highest at open ocean stations (p < 0.05, Mann–Whitney U tests) (Fig. 4B). Vertically, the overall population richness was highest at the deep chlorophyll maximum (DCM) stations, followed by the mesopelagic (MES) and SRF stations (p < 0.01, Mann–Whitney U tests) (Fig. 4B). This pattern could be linked to the richness of their hosts, as some studies have indicated that bacterial richness peaks at the DCM and its adjacent depths85,86. The DCM and SRF zones contain populations from various Autographiviridae groups, with AG-1 members, which infect cyanobacteria, contributing the majority of the population richness (Supplementary Fig. S5).
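The prevalence figures above reduce to simple counting over a table of detections. The following is a minimal sketch, not the pipeline used in this study: the function name `detection_summary`, the input layout (virome ID mapped to a set of detected population IDs), and the toy data are all hypothetical and serve only to illustrate how the number of positive viromes, the number of distinct detected populations, and the detected fraction could be derived.

```python
def detection_summary(detected, min_populations=1):
    """Summarise detections across viromes.

    `detected` is a hypothetical mapping: virome ID -> set of population IDs
    detected in that virome. A virome counts as positive when it contains at
    least `min_populations` populations.
    Returns (number of positive viromes, number of distinct populations seen
    in positive viromes, fraction of viromes that are positive).
    """
    positive = {v: pops for v, pops in detected.items()
                if len(pops) >= min_populations}
    distinct_populations = set().union(*positive.values())
    fraction = len(positive) / len(detected) if detected else 0.0
    return len(positive), len(distinct_populations), fraction


# Toy data (invented for illustration only).
toy = {
    "virome_A": {"pop1", "pop2"},
    "virome_B": set(),
    "virome_C": {"pop2", "pop3"},
    "virome_D": {"pop1"},
}
print(detection_summary(toy))   # (3, 3, 0.75)

# The fraction reported in the text: 206 of 220 viromes.
print(f"{206 / 220:.1%}")       # 93.6%
```

Raising `min_populations` in this sketch would give the number of viromes that contain at least that many populations.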

Fig. 4: The biogeographic distribution of marine Autographiviridae phages across the global ocean.

A Map of the number of Autographiviridae populations detected in each virome. The size of each dot represents the number of Autographiviridae populations detected in that virome. Right: area plot showing the relationship between the population size (x-axis) and latitude (y-axis). Box plot showing the numbers of Autographiviridae populations (y-axis) in different climate zones. B Box plots of the number of Autographiviridae populations (y-axis) detected in viromes from different ecological zones. The significance of pairwise comparisons calculated using a t-test is shown with asterisks corresponding to the p-value (*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001). C Box plots showing the number of stations (y-axis) where each Autographiviridae group was detected. The significance of pairwise comparisons was calculated using a t-test and is indicated by the different letters above the boxes (p < 0.05). SRF surface, DCM deep chlorophyll maximum, MES mesopelagic, BATHY bathypelagic.

Among distinct groups, AG-2 and AG-8, whose members are primarily predicted to infect SAR11, were the most prevalent globally (Figs. 4C, 5A and Supplementary Fig. S6). This wide distribution of AG-2 and AG-8 phages can be readily explained by the ubiquity of their putative host, the SAR11 clade, which is the most abundant bacterioplankton group in the ocean. This pattern also aligns with that of previously analyzed pelagiphages26,27,28. AG-1 members, particularly AG-1.1 and AG-1.4, were widely distributed globally except in the polar regions, and they were mostly distributed in SRF and DCM waters (Fig. 5A, Supplementary Figs. S5 and S6). This biogeographic pattern closely mirrors that of their putative hosts, the cyanobacteria Synechococcus and Prochlorococcus, which dominate the upper, warmer oceans87,88. Prochlorococcus is predominantly distributed between 40°S and 40°N latitude, whereas Synechococcus is distributed from the equator to subpolar regions, with a marked decline in abundance at high-latitude regions87,88. Some AG-1 populations were also detected in the mesopelagic (MES) (200–1000 m) and BATHY zones (Supplementary Figs. S5 and S7), possibly due to the sinking of cyanophages. This is consistent with findings that cyanophages can be exported from the photic to the abyssal ocean by associating with sinking particles89. AG-10.1 members with large genome sizes were also ubiquitous globally (Figs. 4C, 5 and Supplementary Fig. S6). In addition, several other groups, such as AG-7 and AG-5, were also detected worldwide (Figs. 4C, 5A and Supplementary Fig. S6). Some AG-7 members were predicted to infect Chloroflexota and Roseobacter, which are also ubiquitously distributed in the ocean. Although the hosts of most AG-5 members remain unclear, their cosmopolitan distribution suggests that these phages may infect dominant and ubiquitous marine bacterial groups. The other groups displayed relatively narrow distributions (Figs. 4C, 5A and Supplementary Fig. S6). For example, AG-13 members were mostly detected at coastal, estuarine and polar stations but were rarely detected in the open ocean, which may be related to the low abundance of their putative Roseobacter hosts in the open ocean90 (Fig. 5A and Supplementary Fig. S6). AG-3 members, whose hosts are unknown, were mostly detected at trade and westerlies stations and were rarely detected in polar regions (Fig. 5A, Supplementary Figs. S6 and S7).

Fig. 5: The biogeographical distribution of marine Autographiviridae phages across global ocean viromes.

A Heatmap showing the relative abundance of each Autographiviridae phage in different marine viromic datasets. The relative abundance was normalized as reads per kilobase of genome per million mapped reads (RPKM). Environmental metadata associated with each station are shown above the heatmap using color bars. Box plots showing the relative abundance of Autographiviridae phages across different oceanic zones (B), biome types (C), and depth layers (D). The significance of pairwise comparisons calculated using the two-tailed Mann–Whitney U test is indicated with asterisks corresponding to the p-value (***p < 0.001, ****p < 0.0001).
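For readers unfamiliar with the normalization named in the caption, RPKM follows directly from its definition: the number of reads mapped to a genome is divided by the genome length in kilobases and by the number of mapped reads in millions. The sketch below assumes that the denominator is the total number of mapped reads in the virome; the function name `rpkm`, that interpretation, and the example numbers are illustrative assumptions, not the authors' implementation.

```python
def rpkm(mapped_reads, genome_length_bp, total_mapped_reads):
    """Reads per kilobase of genome per million mapped reads.

    mapped_reads        reads recruited to one phage genome
    genome_length_bp    length of that genome in base pairs
    total_mapped_reads  all mapped reads in the virome (assumed denominator)
    """
    if genome_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("genome length and total mapped reads must be positive")
    kilobases = genome_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return mapped_reads / kilobases / millions


# Invented example: 400 reads on a 40 kb genome in a virome with
# 2 million mapped reads.
print(rpkm(400, 40_000, 2_000_000))   # 5.0
```

Dividing by genome length keeps long genomes from appearing more abundant merely because they recruit more reads, and dividing by sequencing depth makes values comparable between viromes.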

Next, we examined the relative abundance of Autographiviridae phages in the global viromes (Fig. 5). Overall, the RPKM values of Autographiviridae phages were highest at the trade stations, followed by the westerlies stations (p < 0.01, Mann–Whitney U tests) (Fig. 5B). Autographiviridae phages had the highest RPKM at the coastal stations, followed by the open ocean stations (p < 0.01, Mann–Whitney U tests) (Fig. 5C). Vertically, the RPKM values of Autographiviridae phages in the DCM and SRF zones were significantly higher than those in the MES and BATHY zones, indicating that Autographiviridae phages were predominant in the upper ocean (Fig. 5D). Among the distinct groups, AG-1.1, AG-5, and AG-8 members showed the highest RPKM, followed by AG-1.4, AG-7 and AG-1.3 (Fig. 6A). AG-1.1 and AG-1.4 members exhibited the highest RPKM at most trade stations (Fig. 6B, C and Supplementary Fig. S7). Linear regression analysis revealed that the relative abundance of AG-1.1 and AG-1.4 members was positively correlated with temperature (AG-1.1: p < 0.001, R² = 0.66; AG-1.4: p < 0.001, R² = 0.56) and negatively correlated with absolute latitude (AG-1.1: p < 0.001, R² = 0.32; AG-1.4: p < 0.001, R² = 0.42) (Supplementary Fig. S8). Similar correlations were observed for AG-1.2 and AG-1.3 (Supplementary Fig. S8). This is consistent with the distribution of their predicted hosts, Synechococcus and Prochlorococcus.
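The regression statistics reported above come from an ordinary least-squares fit of relative abundance against an environmental variable. As a minimal sketch of how the slope, intercept and R² of such a fit are obtained, the code below uses only the Python standard library. The function name `linear_fit` and the toy values are hypothetical, no data transformation is assumed, and the p-value is not computed here.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared). For a single predictor,
    r_squared equals the squared Pearson correlation between x and y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Invented toy values: temperature (°C) and a group's summed RPKM.
temperature = [0.0, 1.0, 2.0, 3.0]
abundance = [1.0, 3.0, 5.0, 7.0]
slope, intercept, r2 = linear_fit(temperature, abundance)
print(slope, intercept, r2)   # 2.0 1.0 1.0
```

A positive slope corresponds to the positive relationship with temperature described in the text, and a negative slope to the relationship with absolute latitude; R² gives the share of variance in abundance accounted for by the predictor.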

Fig. 6: The dominance of Autographiviridae groups.

A Box plot showing the relative abundance of Autographiviridae groups in marine viromes. The relative abundance of each group in each virome was calculated by summing the RPKM values of all phages in that group. Autographiviridae groups were sorted according to their median RPKM value in marine viromes. The significance of pairwise comparisons was calculated using the two-tailed Mann–Whitney U test and is indicated by the different letters above the boxes (p < 0.05). B Map of the most abundant Autographiviridae groups detected in each virome. C Bar chart showing the number of stations where each group displays the highest abundance. The number of stations with the highest abundance in different environments for each group is shown.
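The group-level abundance described in panel A, and the most abundant group per virome shown in panel B, can be illustrated with a short sketch. The function names, the input dictionaries and the phage identifiers below are hypothetical; only the summing rule (group abundance equals the sum of the RPKM values of its phages in a virome) is taken from the caption.

```python
from collections import defaultdict


def group_abundance(phage_rpkm, phage_group):
    """Sum per-phage RPKM values into per-group totals for one virome.

    phage_rpkm   hypothetical mapping: phage ID -> RPKM in this virome
    phage_group  hypothetical mapping: phage ID -> group label (e.g. "AG-8")
    """
    totals = defaultdict(float)
    for phage, value in phage_rpkm.items():
        totals[phage_group[phage]] += value
    return dict(totals)


def dominant_group(group_totals):
    """Return the group with the highest summed RPKM in one virome."""
    return max(group_totals, key=group_totals.get)


# Invented example for a single virome.
phage_rpkm = {"phage_1": 2.0, "phage_2": 3.0, "phage_3": 4.0}
phage_group = {"phage_1": "AG-1.1", "phage_2": "AG-1.1", "phage_3": "AG-8"}

totals = group_abundance(phage_rpkm, phage_group)
print(totals)                  # {'AG-1.1': 5.0, 'AG-8': 4.0}
print(dominant_group(totals))  # AG-1.1
```

Counting, across viromes, how often each group is returned by `dominant_group` would yield the kind of tally plotted in panel C.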

Further, we found that the top 10 Autographiviridae cyanophages with the highest RPKM were exclusively UViGs, suggesting that the most abundant Autographiviridae cyanophages have not yet been isolated (Supplementary Data 5). In the polar regions, AG-8, AG-7 and AG-5 were among the most abundant groups, exhibiting the highest relative abundance at most polar stations (38/42) (Fig. 6B, C). Host prediction analysis indicated that the potential hosts of AG-8 members were SAR11, and some AG-7 members might infect Roseobacter. Cold-adapted ecotypes of SAR11 and Roseobacter are metabolically active and constitute a significant portion of the microbial community in polar waters91,92,93,94,95, which may explain the dominance of AG-8 and AG-7 clusters in polar environments. The hosts of AG-5 members remain unidentified, but the prevalence of these phages in polar regions suggests that their hosts are highly abundant there. In AG-14, eight phages showed remarkably high relative abundance at Delaware Bay stations (Fig. 5A and Supplementary Data 5). They were predicted to infect members of the family Moraxellaceae (predicted host score > 0.99). Comparative genomic analysis revealed that these eight phages exhibited high similarity to Autographiviridae phage vB_AbaP_Indie (46.35–47.42% average amino acid identity), which infects Acinetobacter within Moraxellaceae. Phage vB_AbaP_Indie was originally isolated from influent wastewater at a treatment plant108. This suggests that these eight phages may also originate from terrestrial environments.

Conclusions

Autographiviridae phages are important and dominant components of the marine virosphere and are notable for their wide distribution, high abundance, and diverse potential hosts. Previous studies addressing the genomic diversity of marine Autographiviridae phages were mostly limited to cyanophages, pelagiphages, and roseophages. In this study, we performed metagenomics-based analyses to assess the diversity and biogeography of marine Autographiviridae phages. Our analysis revealed an unprecedented diversity of marine phages in this family, as it comprises at least 14 subgroups that possess substantial genomic variation. The significant difference in the number of AMGs carried by phages may reflect their divergent survival strategies. The Autographiviridae UViGs discovered in this study substantially increase the known phylogenetic diversity of marine Autographiviridae phages and highlight how their infectivity and metabolic capabilities have influenced marine ecosystems. Furthermore, read-mapping analysis revealed that marine Autographiviridae groups were enriched in the upper ocean and that several groups infecting cosmopolitan marine bacteria were more prevalent. These distribution patterns mirror host ecology, emphasizing top-down control on microbial communities and biogeochemical cycles. Taken together, our findings reveal that marine Autographiviridae can infect various bacterial hosts and have a wide geographical distribution. Their ecological impacts warrant further investigation.

Limitations of the study

In this study, we present a comprehensive analysis of the genetic diversity and biogeography of marine Autographiviridae phages. Our analyses revealed that marine Autographiviridae phages harbor great genetic diversity and show considerable variation in genetic and ecological features. However, this study has several limitations. First, the analysis relies on Autographiviridae genomes retrieved from currently available marine metagenomic datasets and therefore may not capture the Autographiviridae diversity of unexplored marine ecosystems. Second, reliable phage-host relationships were predicted for only a limited number of UViGs, and the hosts of most marine Autographiviridae phages remain unknown. Finally, owing to the lack of cultured representatives from different Autographiviridae groups, understanding of the biological characteristics and ecological applications of marine Autographiviridae phages is still limited. In conclusion, while our study provides extensive insights into the diversity and ecology of marine Autographiviridae phages and highlights their ecological importance, it also underscores the need for further study to address the above-mentioned limitations.