Fig. 4: Squeegee performance on HMP metagenomic datasets.
From: De novo identification of microbial contaminants in low microbial biomass microbiomes with Squeegee

a The left panel shows genus-level precision, recall, and F-score, using previously reported kit contaminants as the ground truth. Unweighted precision is the ratio of the number of predicted contaminant taxa found in the ground truth to the total number of predicted contaminant taxa. Unweighted recall is the ratio of the number of predicted contaminant taxa found in the ground truth to the total number of taxa in the ground truth. For the sample-weighted measurements, each taxon is weighted by the mean proportion of reads assigned to it in the non-control experiment samples. b The right panel shows the correctly predicted genera in orange with stripes and the genera that Squeegee failed to predict in gray. Genera with relative abundance below 1% are combined. Source data are provided as a Source Data file.
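The metric definitions in panel a can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `contaminant_metrics`, the placeholder genus names, and the weight values are all hypothetical. It also assumes that the F-score is the harmonic mean of precision and recall (F1), and that the weighted variants replace each taxon count with that taxon's mean read proportion; the caption does not spell out either detail.

```python
def contaminant_metrics(predicted, ground_truth, weights=None):
    """Return (precision, recall, f_score) for a set of predicted contaminant taxa.

    weights: optional dict mapping taxon -> mean proportion of reads in the
    non-control samples. If omitted, every taxon counts as 1 (unweighted).
    Assumption: taxa missing from `weights` contribute 0.
    """
    predicted, ground_truth = set(predicted), set(ground_truth)

    def w(taxon):
        return 1.0 if weights is None else weights.get(taxon, 0.0)

    # Predicted taxa that are also in the ground truth (true positives).
    tp = sum(w(t) for t in predicted & ground_truth)
    pred_total = sum(w(t) for t in predicted)
    truth_total = sum(w(t) for t in ground_truth)

    precision = tp / pred_total if pred_total else 0.0
    recall = tp / truth_total if truth_total else 0.0
    # Assumed F1: harmonic mean of precision and recall.
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical example: placeholder genera and made-up read proportions.
predicted = {"GenusA", "GenusB", "GenusC"}
ground_truth = {"GenusA", "GenusB", "GenusD", "GenusE"}
mean_read_proportion = {
    "GenusA": 0.05, "GenusB": 0.03, "GenusC": 0.02,
    "GenusD": 0.01, "GenusE": 0.01,
}

unweighted = contaminant_metrics(predicted, ground_truth)
weighted = contaminant_metrics(predicted, ground_truth, mean_read_proportion)
```

In this made-up example, two of the three predicted genera are in the ground truth, so the unweighted precision is 2/3 and the unweighted recall is 2/4. Under the weighting assumption, abundant taxa dominate both scores, so missing a low-abundance genus costs less than missing a high-abundance one.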