Fig. 2: Scalable indexing with MetaGraph. | Nature

From: Efficient and accurate search in petabase-scale sequence repositories

a, Sizes of the evaluated index data structures representing sets of microbial whole-genome sequencing (WGS; BIGSI dataset) experiments of increasing size. The panel shows the lossy indexing methods COBS and kmindex and the lossless methods Mantis, Bifrost, Themisto, Fulgor and MetaGraph. MetaGraph uses the SuccinctDBG scheme to encode the graph and the RowDiff<Multi-BRWT> scheme to encode the annotation. Dashed lines indicate lossy methods. b, Times for querying human gut metagenome amplicon sequencing reads (SRA: DRR067889) against indexes that MetaGraph and other state-of-the-art tools constructed from sets of microbial WGS experiments of increasing size. All curves show the performance of exact k-mer matching, except the dotted MetaGraph curve, which shows query performance with the more sensitive, alignment-based search strategy. c, Overview of all MetaGraph indexes. For each dataset, the x axis shows the total number of input characters and the y axis shows the total number of unique k-mers in the index. The marker size represents the size of the index. The solid portion of each marker represents the fraction of the total size taken by the graph, and the translucent portion represents the fraction taken by the annotation (Table 1). An asterisk indicates that the inputs of the UniParc dataset are amino acid sequences, so its input characters are amino acids rather than base pairs.
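Panels a and b contrast lossless indexes queried by exact k-mer matching with lossy ones. The toy Python sketch below illustrates that distinction; it is not MetaGraph's (or any listed tool's) implementation. The lossless index is a plain dictionary that maps each k-mer to the set of samples containing it. It stands in for the compressed graph-plus-annotation pair that MetaGraph stores with SuccinctDBG and RowDiff<Multi-BRWT>. The lossy index is one simplified Bloom filter per sample, in the spirit of Bloom-filter-based tools such as COBS. All names (`build_index`, `query_exact`, `BloomFilter`, `query_bloom`), the toy sequences and the choice k = 5 are hypothetical.

```python
import hashlib
from collections import defaultdict

K = 5  # toy k-mer length, chosen only to keep the example small (hypothetical)


def kmers(seq, k=K):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k=K):
    """Lossless toy index: map each k-mer to the set of samples containing it."""
    index = defaultdict(set)
    for sample_id, seqs in samples.items():
        for seq in seqs:
            for km in kmers(seq, k):
                index[km].add(sample_id)
    return dict(index)


def query_exact(index, read, k=K):
    """Exact k-mer matching: fraction of the read's k-mers present in each sample."""
    read_kmers = list(kmers(read, k))
    hits = defaultdict(int)
    for km in read_kmers:
        for sample_id in index.get(km, ()):
            hits[sample_id] += 1
    return {s: n / len(read_kmers) for s, n in hits.items()}


class BloomFilter:
    """Lossy set: membership tests may give false positives, never false negatives."""

    def __init__(self, n_bits=64, n_hashes=2):
        self.n_bits, self.n_hashes, self.bits = n_bits, n_hashes, 0

    def _positions(self, item):
        # Derive n_hashes bit positions from salted SHA-256 digests.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


def build_bloom_index(samples, k=K, n_bits=64):
    """Lossy toy index: one Bloom filter per sample."""
    filters = {}
    for sample_id, seqs in samples.items():
        bf = BloomFilter(n_bits=n_bits)
        for seq in seqs:
            for km in kmers(seq, k):
                bf.add(km)
        filters[sample_id] = bf
    return filters


def query_bloom(filters, read, k=K):
    """Same query against the lossy index; false positives can only raise scores."""
    read_kmers = list(kmers(read, k))
    if not read_kmers:  # read shorter than k: nothing to match
        return {}
    return {s: sum(km in bf for km in read_kmers) / len(read_kmers)
            for s, bf in filters.items()}


if __name__ == "__main__":
    samples = {"s1": ["ACGTACGTGACC"], "s2": ["TTGACCAGTACG"]}
    read = "ACGTACGTG"
    print(query_exact(build_index(samples), read))        # {'s1': 1.0, 's2': 0.2}
    print(query_bloom(build_bloom_index(samples), read))  # each score >= the exact one
```

The lossless dictionary always returns the true per-sample k-mer matches. The Bloom-filter version uses a fixed number of bits per sample, regardless of how many distinct k-mers are inserted, and pays for that compactness with possible false-positive hits. This is the trade-off behind separating the dashed (lossy) from the solid (lossless) curves in panel a. The alignment-based search shown by the dotted curve in panel b is not modelled here.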

Back to article page