Extended Data Fig. 2: Accuracy of sequence search approaches. | Nature

From: Efficient and accurate search in petabase-scale sequence repositories

Accuracy of sequence search approaches for queries of a) Illumina-type, b) PacBio HiFi-type, and c) ONT-type simulated reads. All graphs (indexes) were constructed either from Chromosome 21 of the CHM13 v2.0 Homo sapiens reference genome or from Illumina-type reads simulated from it at different sequencing depths, applying our usual indexing workflow with graph cleaning. Accuracy is measured as the mean RMSE between the logarithm of the edit distance computed by each method (COBS commit 1cd6df2 and GraphAligner v1.0.17b) and gold-standard edit distances computed with edlib (commit 931be2b), measured across 1000 bootstrap samples of 500 simulated Chromosome 21 query reads. The query reads were simulated from all Chromosome 21 assemblies accessible on GenBank as of October 21, 2024. Error bars represent 95% confidence intervals of the mean. Hatched bars indicate methods that use sequence-to-graph alignment instead of exact k-mer matching. Illumina-type reads were simulated using ART v2.5.8. PacBio HiFi-type subreads and ONT-type reads were simulated using pbsim v3.0.0. HiFi reads were generated from the simulated subreads using PacBio CCS v6.4.0.
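The accuracy metric described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions the caption does not specify: the logarithm is applied to both the method's and the gold-standard edit distances, `log1p` is used so that a distance of zero is defined, each bootstrap sample is drawn with replacement from the pool of reads, and the 95% interval is a simple percentile interval over the bootstrap RMSE values. The function names `log_rmse` and `bootstrap_rmse` are hypothetical.

```python
import math
import random
import statistics


def log_rmse(pred, gold):
    """RMSE between log-transformed edit distances.

    Assumption: log1p is applied to both sides so that zero distances are defined.
    """
    squared = [(math.log1p(p) - math.log1p(g)) ** 2 for p, g in zip(pred, gold)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_rmse(pred, gold, n_boot=1000, sample_size=500, seed=0):
    """Mean RMSE over bootstrap samples, with a percentile 95% interval.

    Assumption: reads are resampled with replacement, and the interval is
    taken from the 2.5th and 97.5th percentiles of the bootstrap RMSEs.
    """
    rng = random.Random(seed)
    n = len(pred)
    rmses = []
    for _ in range(n_boot):
        # Draw one bootstrap sample of read indices.
        idx = [rng.randrange(n) for _ in range(sample_size)]
        rmses.append(log_rmse([pred[i] for i in idx], [gold[i] for i in idx]))
    rmses.sort()
    mean = statistics.fmean(rmses)
    lower = rmses[int(0.025 * n_boot)]
    upper = rmses[int(0.975 * n_boot) - 1]
    return mean, (lower, upper)


if __name__ == "__main__":
    # Synthetic, made-up distances purely for demonstration.
    demo_rng = random.Random(1)
    gold = [demo_rng.randint(0, 50) for _ in range(2000)]
    pred = [max(0, g + demo_rng.randint(-3, 3)) for g in gold]
    mean, (lower, upper) = bootstrap_rmse(pred, gold)
    print(f"mean RMSE = {mean:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

In practice the gold-standard values would come from an exact aligner such as edlib and the predicted values from the search method under evaluation; the synthetic lists in the demo stand in for both.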