Fig. 6 | The ISME Journal


From: Resolution of habitat-associated ecogenomic signatures in bacteriophage genomes and application to microbial source tracking


Simulation and modelling of virome-based source tracking using ɸB124-14 ecogenomic signatures. To evaluate whether the ɸB124-14 ecogenomic signature could be used in microbial source tracking (MST), we ran more extensive Monte Carlo simulations of pollution, using randomly permuted and polluted environmental viromes, and evaluated the specific detection of human pollution from ɸB124-14 ORF relative abundance profiles. a nMDS and ANOSIM analysis of uncontaminated and 'polluted' permutations of environmental viral metagenomes. For polluted data sets (human, bovine or porcine), symbol shape represents the strength of contamination, as indicated by the associated key. ANOSIM shows that groups of data sets with varying ranges of human or animal contamination separate from uncontaminated environmental viromes (**P = 0.001). ENVU: uncontaminated environmental virome permutations; ENVHGV: environmental virome permutations contaminated by the human gut ecogenomic signature; ENVBOV: environmental virome permutations contaminated by the bovine gut ecogenomic signature; ENVPORC: environmental virome permutations contaminated by the porcine gut ecogenomic signature.

b ROC curves were constructed from the randomly permuted and polluted data sets displayed in a, based on relative abundance profiles from all ɸB124-14 ORFs, or from a subset of ORFs whose mean relative abundance in human gut viromes differed significantly from that in other data sets (see Fig. 5c). Subset 1 ORFs = 5, 16, 18, 20, 21, 22, 23, 25, 34, 36, 43, 44, 59, 61 and 67; subset 2 ORFs = 16, 34 and 56. The area under the curve (AUC) of each ROC curve indicates how well the cumulative relative abundance of each ORF combination distinguishes the different groups of data sets; values approaching 0.5 indicate little or no diagnostic power. All AUCs were statistically significant at P ≤ 0.002.
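The ROC analysis in panel b scores each data set by the cumulative relative abundance of a chosen set of ɸB124-14 ORFs. The following is a minimal sketch of that scoring and of the AUC, not the authors' code. It assumes a hypothetical input layout (one dict per data set mapping ORF number to relative abundance) and computes the AUC through its rank-based (Mann-Whitney) equivalence rather than with any particular package. The names `cumulative_abundance`, `auc`, `SUBSET_1` and `SUBSET_2` are illustrative only.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: one dict per data set, {ORF number: relative abundance}.
Profile = Dict[int, float]

# ORF subsets as listed in the caption.
SUBSET_1 = [5, 16, 18, 20, 21, 22, 23, 25, 34, 36, 43, 44, 59, 61, 67]
SUBSET_2 = [16, 34, 56]


def cumulative_abundance(profile: Profile, orfs: Sequence[int]) -> float:
    """Sum the relative abundance of the selected ORFs (absent ORFs count as 0)."""
    return sum(profile.get(orf, 0.0) for orf in orfs)


def auc(positive_scores: List[float], negative_scores: List[float]) -> float:
    """Probability that a random positive data set outscores a random negative one.

    Ties count as one half, so this equals the Mann-Whitney U statistic divided
    by n_pos * n_neg. A value of 0.5 means no diagnostic power.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Toy profiles, invented purely to exercise the functions.
    polluted = [{16: 0.4, 34: 0.3, 56: 0.2}, {16: 0.1, 34: 0.2, 56: 0.05}]
    clean = [{16: 0.01, 34: 0.0}, {16: 0.05, 34: 0.1, 56: 0.1}]
    pos = [cumulative_abundance(p, SUBSET_2) for p in polluted]
    neg = [cumulative_abundance(p, SUBSET_2) for p in clean]
    print(auc(pos, neg))  # 1.0: every polluted score exceeds every clean score
```

A full ROC curve would sweep a cutoff across these scores. The pairwise count above gives the same area without building the curve, which keeps the sketch dependency-free.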
c Histograms show the proportion of data sets of each type (ENVU, ENVHGV, ENVBOV, ENVPORC) correctly identified by a two-step classification approach. This approach uses threshold values indicative of either pollution in general (step 1) or human pollution specifically (step 2), selected from the sensitivity and specificity values generated by the ROC analyses (minimum sensitivity of 0.91). The pipeline was evaluated using binning thresholds derived from subset 1 ORFs, from subset 2 ORFs, or from a combination in which subset 1 values were applied to step 1 and subset 2 values to step 2. ****P < 0.0001. Error bars show the standard error of the mean from 100 iterations, each using 100 new randomly permuted and polluted data sets of each type.
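The two-step binning in panel c can be sketched as follows, with heavy caveats. The caption states only that thresholds came from the ROC sensitivity and specificity values with a minimum sensitivity of 0.91. The following choices are all assumptions for illustration, not the published method:

* the threshold rule (keep the cutoff with the highest specificity among those meeting that floor);
* the ">= cutoff means positive" convention;
* the output labels;
* the hypothetical helper names `pick_threshold`, `classify` and `mean_and_sem`.

```python
import math
import statistics
from typing import Sequence, Tuple


def pick_threshold(
    pos: Sequence[float], neg: Sequence[float], min_sensitivity: float = 0.91
) -> float:
    """Cutoff with the highest specificity among those meeting min_sensitivity.

    Assumption: a data set is called positive when its score >= cutoff.
    Candidate cutoffs are the observed scores; the smallest one always has
    sensitivity 1.0, so at least one candidate qualifies.
    """
    best_spec, best_cutoff = -1.0, min(min(pos), min(neg))
    for cutoff in sorted(set(pos) | set(neg)):
        sensitivity = sum(s >= cutoff for s in pos) / len(pos)
        if sensitivity < min_sensitivity:
            continue
        specificity = sum(s < cutoff for s in neg) / len(neg)
        if specificity > best_spec:  # ties keep the lower, more sensitive cutoff
            best_spec, best_cutoff = specificity, cutoff
    return best_cutoff


def classify(step1_score: float, step2_score: float, t1: float, t2: float) -> str:
    """Step 1 flags pollution in general; step 2 flags human pollution."""
    if step1_score < t1:
        return "uncontaminated"
    return "human" if step2_score >= t2 else "non-human"


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n)) over iterations."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))
```

For the combined pipeline, `step1_score` would be a data set's subset 1 cumulative abundance and `step2_score` its subset 2 cumulative abundance, each compared against the cutoff picked for that step. Picking the step 2 cutoff with ENVHGV as positives and the animal-polluted sets as negatives is a further assumption. `mean_and_sem` would then summarise the per-iteration proportions of correctly identified data sets across the 100 iterations.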
