Fig. 2: Determination of filtering thresholds using artificial communities of known composition in vitro (mock; n = 9 different types; 21 replicates in total) and in mice (gnotobiotes; n = 4 different communities; 28 mice in total).

a Example of the occurrence of all molecular species detected without filtering in the gut of a gnotobiotic mouse [49]. The arrow indicates the position of the first spurious molecular species; all subsequent taxa are considered to be at high risk of being spurious (light gray bars in the enlarged inset). b Distribution of the relative abundances of the first-occurring spurious molecular species (as shown in panel a) across all mock communities and samples from gnotobiotes. The orange dashes on the y-axis indicate the consensus threshold of 0.25% relative abundance, above which no spurious taxa occurred, with the exception of one outlier in a mock community at a relative abundance of 0.44%. c Comparison of various standard filtering cutoffs (see explanations in the text) in terms of spurious taxa (i.e., molecular species not matching sequences of the known species contained in the artificial communities). d Corresponding percentages of positive hits retained by the different filtering strategies, with positive hits defined as the reference sequences found in the respective amplicon datasets. e Percentage of spurious taxa and positive hits in the same reference communities using the DADA2 pipeline for analysis based on amplicon sequence variants (ASVs) [6]. f Effect of filtering thresholds at increments of 0.05% relative abundance on the detection of spurious taxa and positive hits in all mock and gnotobiotic datasets for OTUs (upper panel) and ASVs (lower panel). Lines correspond to mean values; ribbons represent standard deviations.
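
The threshold sweep summarized in panel f can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the figure. The function names, the input layout (a mapping from molecular species to read counts plus a set of known reference species), and the choice to keep taxa whose relative abundance is greater than or equal to the threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def filter_by_relative_abundance(
    counts: Dict[str, int], threshold_pct: float
) -> Dict[str, int]:
    """Keep molecular species whose relative abundance (in %) reaches the threshold.

    Assumption: a taxon exactly at the threshold is retained (>=).
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        taxon: n
        for taxon, n in counts.items()
        if 100.0 * n / total >= threshold_pct
    }


def evaluate_threshold(
    counts: Dict[str, int], reference: Set[str], threshold_pct: float
) -> Tuple[int, float]:
    """Return (number of spurious taxa kept, % of positive hits retained).

    Spurious taxa: retained species that are not in the known reference set.
    Positive hits: reference species present in the unfiltered dataset.
    """
    kept = filter_by_relative_abundance(counts, threshold_pct)
    n_spurious = sum(1 for taxon in kept if taxon not in reference)
    hits_before = [t for t, n in counts.items() if t in reference and n > 0]
    hits_after = [t for t in kept if t in reference]
    pct_hits = 100.0 * len(hits_after) / len(hits_before) if hits_before else 0.0
    return n_spurious, pct_hits


def sweep(
    counts: Dict[str, int],
    reference: Set[str],
    step_pct: float = 0.05,
    max_pct: float = 0.5,
) -> List[Tuple[float, int, float]]:
    """Evaluate thresholds from 0 to max_pct in increments of step_pct."""
    n_steps = int(round(max_pct / step_pct))
    results = []
    for i in range(n_steps + 1):
        # Rounding avoids floating-point drift such as 0.15000000000000002.
        threshold = round(i * step_pct, 10)
        n_spurious, pct_hits = evaluate_threshold(counts, reference, threshold)
        results.append((threshold, n_spurious, pct_hits))
    return results
```

Applied to one sample, `sweep(counts, reference)` returns one `(threshold, spurious taxa, % positive hits)` tuple per threshold; averaging these tuples across samples would yield curves of the kind plotted in panel f. The read counts used for any such run are the user's own data; none are implied by the figure.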