Fig. 2: Novel sequences and de novo taxa defined in the MiDAS 4 reference database.

The phylogenetic trees are based on a multiple alignment of all MiDAS 4 reference sequences, which were first aligned against the global SILVA 138 alignment using the SINA aligner and subsequently pruned according to the ssuref:bacteria positional variability by parsimony filter in ARB to remove hypervariable regions. The eight phyla with the most FL-ASVs are highlighted in different colours. Sequence novelty was determined from the percent identity between each FL-ASV and its closest relative in the SILVA_138_SSURef_Nr99 database, obtained by Usearch mapping, and the taxonomic thresholds proposed by Yarza et al.22 shown in Table 1. Taxonomic novelty was defined based on the assignment of de novo taxa by AutoTax20.
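
The identity-based novelty call described above can be sketched as a simple threshold lookup. This is a minimal illustration, not the authors' pipeline: the threshold values are assumed from the 16S rRNA identity cut-offs commonly attributed to Yarza et al. (Table 1 of the paper is authoritative), the function name `novelty_rank` is hypothetical, and treating an identity exactly at a threshold as not novel at that rank is an assumption.

```python
from typing import Optional

# Assumed identity thresholds (%) per taxonomic rank, ordered from the
# highest rank to the lowest. Check them against Table 1 before reuse.
ASSUMED_THRESHOLDS = [
    ("phylum", 75.0),
    ("class", 78.5),
    ("order", 82.0),
    ("family", 86.5),
    ("genus", 94.5),
    ("species", 98.7),
]


def novelty_rank(identity_pct: float) -> Optional[str]:
    """Return the highest rank at which a sequence would be called novel.

    identity_pct is the percent identity between a FL-ASV and its closest
    relative in the reference database. Returns None when the identity is
    at or above the species threshold (no novelty at any rank).
    """
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity_pct must be between 0 and 100")
    for rank, threshold in ASSUMED_THRESHOLDS:
        # Strict comparison: identity equal to the threshold is not novel
        # at this rank (an assumption of this sketch).
        if identity_pct < threshold:
            return rank
    return None


if __name__ == "__main__":
    for identity in (99.2, 96.0, 90.0, 70.0):
        print(identity, novelty_rank(identity))
```

Because the thresholds are scanned from phylum down to species, the function reports the highest rank at which the sequence falls below the cut-off; a sequence that is novel at the family level is by implication also novel at the genus and species levels.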