Fig. 3: Performance of GuaCAMOLE for experimental mock community data.
From: Genomic GC bias correction improves species abundance estimation from metagenomic data

The mock community of Tourlousse et al.16 contains 19 species and was sequenced in triplicate using 28 different library preparation protocols (Table 1). We re-analyzed the reads with GuaCAMOLE, Bracken, MetaPhlAn4, SingleM, Sylph and mOTUs. GuaCAMOLE was set to report taxonomic abundances, and Bracken results were manually adjusted for genome length. The relative estimation error is \(\frac{1}{19}\sum_{j=1}^{19}|a_j-A_j|/A_j\), where \(A_j\) is the true and \(a_j\) the estimated abundance of species \(j = 1, \ldots, 19\). A GC-dependent sequencing efficiencies of the 28 protocols, as estimated by GuaCAMOLE. The highlighted protocols GH, DH, IH, FH and IL were found by Tourlousse et al. to show the strongest dependence of efficiency on GC content. B Relative estimation errors per protocol, averaged over all replicates. Protocols GH, DH, IH, FH and IL are highlighted, see (A). Each boxplot represents 28 datapoints (one per protocol) and shows the median (center line), the 25% and 75% quantiles (hinges), and whiskers extending to the furthest point within 1.5 IQRs (interquartile ranges) of the nearest hinge. C Relative estimation errors for the three replicates of each protocol. D Relative estimation error vs. number of PCR cycles used in each protocol. Lines show quadratic best fits; colors indicate the algorithm as in (C). E Relative abundance estimation error of each taxon, averaged across protocols, vs. genomic GC content. Lines show quadratic best fits; colors indicate the algorithm as in (C).
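The caption defines three quantities: the relative estimation error, the boxplot summary, and the quadratic trend lines. The minimal Python sketch below computes each of them. It assumes that true and estimated abundances are given as equal-length vectors on the same scale (for example, relative abundances) and ordered identically by species. The function names `relative_estimation_error`, `boxplot_summary` and `quadratic_fit` are hypothetical and not part of GuaCAMOLE. The whisker rule follows the caption's description. Quantiles use NumPy's default linear interpolation, which may differ from the software actually used to draw the figure.

```python
import numpy as np


def relative_estimation_error(estimated, true):
    """Mean relative absolute error (1/n) * sum_j |a_j - A_j| / A_j.

    With n = 19 species this reproduces the caption's 1/19 normalization.
    Assumes both vectors use the same scale and species order.
    """
    a = np.asarray(estimated, dtype=float)
    A = np.asarray(true, dtype=float)
    if a.shape != A.shape:
        raise ValueError("estimated and true abundances must have the same shape")
    if np.any(A <= 0):
        raise ValueError("true abundances must be positive")
    return float(np.mean(np.abs(a - A) / A))


def boxplot_summary(values):
    """Median, hinges (25%/75% quantiles) and whisker ends.

    Each whisker reaches the furthest data point lying no more than
    1.5 IQRs beyond its hinge; points outside that range are outliers.
    """
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.quantile(x, [0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower = x[x >= q1 - 1.5 * iqr].min()
    upper = x[x <= q3 + 1.5 * iqr].max()
    return {"q1": q1, "median": median, "q3": q3,
            "lower_whisker": lower, "upper_whisker": upper}


def quadratic_fit(x, y):
    """Least-squares quadratic y ~ c2*x**2 + c1*x + c0; returns [c2, c1, c0]."""
    return np.polyfit(np.asarray(x, dtype=float), np.asarray(y, dtype=float), deg=2)


# Toy example with three species (hypothetical numbers, not data from the figure).
print(relative_estimation_error([0.4, 0.3, 0.3], [0.5, 0.3, 0.2]))
print(boxplot_summary([1, 2, 3, 4, 100]))
```

In the toy example the per-species relative errors are 0.2, 0 and 0.5, so the mean is about 0.233. In the boxplot example the value 100 lies more than 1.5 IQRs above the upper hinge, so the upper whisker stops at 4.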