Extended Data Fig. 8: Distribution of sequencing coverage and contamination used to determine quality control cut-offs.
From: Pneumococcal within-host diversity during colonization, transmission and treatment

Distribution of (a) the depth of sequencing coverage and (b) the fraction of reads assigned to S. pneumoniae by the Kraken2 metagenomics read classification algorithm. The vertical red lines indicate the minimum thresholds that samples had to meet to be included in the main analysis. (c) Boxplots showing the distribution, across the 3761 samples, of the fraction of reads that Kraken2 assigned to each species. Because of the large sequence diversity within S. pneumoniae and S. pseudopneumoniae, and the similarity between them, a large fraction of the reads assigned as ‘unclassified’ or as S. pseudopneumoniae may actually belong to S. pneumoniae genomes. The horizontal lines give the median and interquartile range; the whiskers extend to the largest and smallest values that lie within 1.5 times the interquartile range of the box.
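
The box-and-whisker convention described for panel (c) can be made concrete with a minimal sketch. It assumes the usual Tukey rule (fences at 1.5 times the interquartile range beyond the quartiles) and linear-interpolation quartiles; the function name `box_stats` and the input values are hypothetical and are not taken from the study, whose plotting software may compute quartiles slightly differently.

```python
from statistics import quantiles


def box_stats(values):
    """Return (q1, median, q3, lower_whisker, upper_whisker) for one box.

    Assumption: quartiles use linear interpolation (method="inclusive"),
    and whiskers stop at the most extreme observations that fall within
    1.5 * IQR of the box, as in a standard Tukey boxplot.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Values outside the fences are treated as outliers and excluded
    # when locating the whisker ends.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return q1, median, q3, inside[0], inside[-1]


# Hypothetical per-sample read fractions for a single species (made-up values).
example_fractions = [0.62, 0.70, 0.71, 0.74, 0.78, 0.80, 0.83, 0.05]
print(box_stats(example_fractions))
```

Values beyond the fences do not move the whiskers; most plotting libraries draw them as individual points instead.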