Figure 3: Recovery of PGSR phage sequences from metagenomic data sets.
From: Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences

Commonly used alignment-driven approaches to metagenome analysis were evaluated for their ability to identify PGSR phage sequences. The same metagenomic data sets surveyed using the PGSR approach were also subjected to a range of alignment-based searches, including gene-centric searches with unambiguous phage-encoded ORFs (capsid and terminase genes). In addition, the 991 non-redundant phage contigs identified in these data sets by Stern et al. (ref. 8), using the recently developed CRISPR strategy, were compared. Pie charts show the proportion of PGSR phage sequences captured by each strategy, as well as the total proportion of PGSR phage identified by all strategies in combination (percentages shown). Blastn, Megablast, Discontiguous Megablast: proportions of PGSR phage captured in alignments with different BLAST algorithms when metagenomes were queried at the nucleotide level using whole PGSR phage driver sequences (hits with an E-value of 1e−3 or lower were considered significant and retained). tBlastn: proportion of PGSR phage sequences identified using gene-centric surveys of metagenomes with all capsid and terminase genes encoded by driver sequences (E-value of 1e−3 or lower considered significant). CRISPR: proportion of PGSR phage sequences found among the 991 phage-like contigs identified by Stern et al. (ref. 8) in recent surveys of the same metagenomes using CRISPR spacer regions. All searches: total proportion of PGSR phage identified in the combined output of all searches listed above.
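The percentages in the pie charts can be thought of as set overlaps: each strategy yields a set of recovered sequence identifiers, and "All searches" is the union of those sets. The following is a minimal sketch of that calculation, not the authors' pipeline. The function names, the (sequence ID, E-value) input layout, and the toy identifiers are all assumptions made for illustration; only the 1e−3 cutoff comes from the caption.

```python
from typing import Dict, Iterable, Set, Tuple

E_VALUE_CUTOFF = 1e-3  # significance threshold stated in the caption


def significant_hits(hits: Iterable[Tuple[str, float]],
                     cutoff: float = E_VALUE_CUTOFF) -> Set[str]:
    """Return IDs of sequences with at least one hit at or below the cutoff.

    `hits` is a hypothetical layout: (sequence ID, E-value) pairs.
    """
    return {seq_id for seq_id, e_value in hits if e_value <= cutoff}


def capture_percentages(pgsr_ids: Set[str],
                        captured: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of PGSR sequences recovered per strategy and by their union."""
    if not pgsr_ids:
        raise ValueError("pgsr_ids must not be empty")
    percentages: Dict[str, float] = {}
    combined: Set[str] = set()
    for strategy, ids in captured.items():
        found = ids & pgsr_ids          # ignore hits outside the PGSR set
        combined |= found               # accumulate the union for "All searches"
        percentages[strategy] = 100.0 * len(found) / len(pgsr_ids)
    percentages["All searches"] = 100.0 * len(combined) / len(pgsr_ids)
    return percentages


if __name__ == "__main__":
    # Toy data only; these identifiers and E-values are invented.
    pgsr = {"p1", "p2", "p3", "p4"}
    blastn = significant_hits([("p1", 1e-10), ("p2", 0.5)])
    crispr = {"p1", "p3"}
    print(capture_percentages(pgsr, {"Blastn": blastn, "CRISPR": crispr}))
```

Because the combined figure is a union, it can be smaller than the sum of the individual percentages whenever two strategies recover the same sequence.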