Fig. 2: Enhancements to the SURPI+ bioinformatics pipeline for pathogen identification.

A Schematic diagram of modifications made to the SURPI+ bioinformatics pipeline to enhance its pathogen detection capabilities. The modifications include (1) calculation of the estimated viral load for each detected virus in the sample using a quantitative internal spiked ERCC control (top row), (2) incorporation of reference-grade databases such as the FDA-ARGOS database by “tagging” of GenBank accession numbers in the SURPI+ database (middle row), and (3) identification of novel, sequence-divergent viruses using de novo viral genome assembly and translated nucleotide (amino acid) alignments to a viral protein database (bottom row). B Pairwise and overall comparisons of median viral loads among groups stratified by severity of respiratory infection: asymptomatic (n = 24), mild (n = 53), moderate (n = 20), or severe (n = 8). In the box-and-whisker plots, the solid line within each box represents the median log viral load, while the dashed line indicates the mean log viral load. The interquartile range (IQR) is shown by the height of the box, with whiskers extending to the minimum and maximum values within 1.5 times the IQR. Each point corresponds to a detected virus, with different colors representing different virus species or genera. Mann-Whitney U and Kruskal-Wallis H tests were used for pairwise and overall significance testing, respectively. All tests were two-sided with Bonferroni correction for multiple comparisons, and the significance level was set at 0.05.
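
The testing scheme described for panel B can be illustrated with a minimal sketch. It is not the authors' analysis code. It uses only the Python standard library and assumes large-sample approximations (a normal approximation for the Mann-Whitney U test, a chi-square approximation for the Kruskal-Wallis H test) and a Bonferroni adjustment that multiplies each pairwise p-value by the number of pairwise comparisons; the caption does not state which variants were used. All function names are hypothetical, and the `toy` values are invented for illustration and are not data from the study. In practice, `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` provide these tests.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def tie_term(values):
    """Sum of t**3 - t over groups of tied values (0 when there are no ties)."""
    return sum(t ** 3 - t for t in Counter(values).values())


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    if variance <= 0:  # every observation identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.sqrt(2 / math.pi) * math.exp(-x / 2) * total
    return math.erfc(chi / math.sqrt(2)) + tail


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H test (chi-square approximation, tie-corrected)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - tie_term(pooled) / (n ** 3 - n)
    if correction == 0:  # every observation identical
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def pairwise_bonferroni(named_groups, alpha=0.05):
    """Mann-Whitney U for every pair; p-values multiplied by the pair count."""
    pairs = list(combinations(named_groups, 2))
    results = {}
    for a, b in pairs:
        _, p = mann_whitney_u(named_groups[a], named_groups[b])
        p_adj = min(1.0, p * len(pairs))
        results[(a, b)] = (p_adj, p_adj < alpha)
    return results


# Invented log10 viral loads, for illustration only (not study data).
toy = {
    "asymptomatic": [2.1, 2.8, 3.0, 3.4, 3.9, 4.2],
    "mild": [3.1, 3.6, 4.4, 4.9, 5.2, 5.8],
    "moderate": [4.0, 4.7, 5.5, 6.1, 6.3, 6.9],
    "severe": [5.9, 6.6, 7.2, 7.8, 8.1, 8.5],
}

if __name__ == "__main__":
    h, p = kruskal_wallis_h(list(toy.values()))
    print(f"Kruskal-Wallis: H = {h:.3f}, p = {p:.4g}")
    for pair, (p_adj, significant) in pairwise_bonferroni(toy).items():
        print(pair, f"adjusted p = {p_adj:.4g}", "significant" if significant else "n.s.")
```

With four severity groups there are six pairwise comparisons, so under this assumed form of the Bonferroni correction each raw p-value is multiplied by six (capped at 1) before being compared with 0.05.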