Figure 4: Inference of PGSR phage host-range.
From: Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences

PGSR sequences were compared with a wide range of bacterial chromosomes and phage genomes, using both tetranucleotide profiles and alignment-based methods (Blast). (a) Phylogram showing relationships between PGSR sequences, human gut-associated chromosomes (n=324) and all large contigs (≥10 kb) from assembled gut viral metagenomes (n=188), based on tetranucleotide profiles. Clusters I–IV indicate regions populated by PGSR phage and driver sequences, and the associated pie charts show, as black segments, the proportion of all PGSR phage sequences that fall in each cluster. NT (nucleotide): genus-level taxonomic assignments for the PGSR phage in each cluster based on Blastn searches; figures in parentheses give the total number of PGSR phage affiliated with each genus (≥75% identity, E-value ≤1e−5, alignment length ≥1 kb). ORF: genus-level taxonomic assignments for the PGSR phage in each cluster based on tBlastn alignments of individual PGSR phage ORFs against 1,700 complete bacterial chromosomes (≥75% identity, E-value ≤1e−5); figures in parentheses give the total number of PGSR phage ORFs affiliated with each genus listed. (b) Phylogram showing relationships between PGSR phage sequences, large fragments from gut viral metagenomes and complete phage genomes (n=647 genomes, each ≥10 kb), based on tetranucleotide profiles. For phage genomes, the assigned phylogeny reflects that of the host species, where known. Scale bars in a and b show distance in arbitrary units, and all phylograms represent the most probable topologies based on 200 bootstrap replicates. (c) Proportion of the PGSR sequences and viral metagenome contigs represented in a that were affiliated with phylum-level taxonomic groups, based on alignments against 1,821 bacterial and archaeal chromosomes. Nucleotide: proportion of sequences affiliated with each phylum based on valid Blastn hits (≥75% identity over ≥1 kb, E-value ≤1e−5). Amino acid: affiliation of all putative protein-encoding genes from each data set based on tBlastn searches (≥75% identity, E-value ≤1e−5). See also Supplementary Data 2. The sources and further details of the sequences used in the analyses presented in a–c are provided in Supplementary Table S1 and Supplementary Data 3–6.
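The phylograms in a and b are built from tetranucleotide profiles, that is, the frequencies of the 256 possible 4-nt words in each sequence. The sketch below shows one way to compute such profiles and a pairwise distance matrix that could be fed to a tree-building tool. It assumes relative frequencies of overlapping 4-mers (windows containing non-ACGT characters are skipped) and Euclidean distance. The caption does not specify the authors' exact normalization, distance measure or tree-building method, and the function names are hypothetical.

```python
from itertools import product
import math

# All 256 tetranucleotides in a fixed lexicographic order (AAAA, AAAC, ...).
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {k: i for i, k in enumerate(TETRAMERS)}


def tetranucleotide_profile(seq: str) -> list[float]:
    """Relative frequencies of overlapping 4-mers (hypothetical helper).

    Windows containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(TETRAMERS)
    total = 0
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotides")
    return [c / total for c in counts]


def profile_distance(p: list[float], q: list[float]) -> float:
    """Euclidean distance between two profiles (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """All pairwise profile distances, keyed by (name_a, name_b)."""
    profiles = {name: tetranucleotide_profile(s) for name, s in seqs.items()}
    names = list(profiles)
    return {
        (a, b): profile_distance(profiles[a], profiles[b])
        for a in names
        for b in names
    }


if __name__ == "__main__":
    contigs = {"contig_1": "ACGTACGTTTGACCA", "contig_2": "ACGTACGTTTGACCG"}
    for pair, d in distance_matrix(contigs).items():
        print(pair, round(d, 4))
```

Because profiles are normalized by the number of valid windows, contigs of different lengths can be compared directly. The bootstrap support reported in the caption would additionally require resampling and rebuilding trees, which this sketch does not attempt.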