Fig. 3: MetaGraph completeness and accuracy.
From: Efficient and accurate search in petabase-scale sequence repositories

a, Complementary empirical cumulative distribution function (ECDF) curves, showing the fraction of reads that retrieve the correct label from the index (label recall, y axis) at different sequence identity cut-offs (x axis) for a range of indexed SRA datasets. When calculating recall at a given sequence identity cut-off, a read mapped to a graph sequence (that is, the spelling of a graph walk) with a given label is only considered if the percentage of the read’s nucleotides exactly matching the sequence is above the cut-off. We define the realignability, or average recall, of a graph–query pair as the area under this curve. b, Realignability of each graph for increasing error rates in the query (an evaluation of realignability at higher error rates is shown in Supplementary Fig. 5). c, Coverage plot for the human transcript SFTPB-207 in GTEx sample SRR599154 determined by STAR alignment against the hg38 reference genome (top, blue) and retrieved from the MetaGraph GTEx index with counts (bottom, orange).
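
The definition of realignability in panel a can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names (`recall_curve`, `realignability`), the per-read inputs, and the choice of 101 evenly spaced cut-offs are hypothetical. The sketch further assumes that identities are given as fractions in [0, 1] rather than percentages, that a read counts when its identity is at or above the cut-off, that recall is taken over all query reads, and that the area is approximated with the trapezoidal rule.

```python
import numpy as np


def recall_curve(identities, correct, cutoffs):
    """Label recall at each identity cut-off (hypothetical helper).

    identities: per-read fraction of nucleotides exactly matching the graph
                sequence, in [0, 1] (use 0.0 for reads that did not map).
    correct:    per-read flag, True if the retrieved label is the correct one.
    cutoffs:    identity cut-offs at which to evaluate recall.
    """
    identities = np.asarray(identities, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    cutoffs = np.asarray(cutoffs, dtype=float)
    # Assumption: a read counts at a cut-off if it has the correct label and
    # its identity is at or above that cut-off; the denominator is all reads.
    hits = correct[None, :] & (identities[None, :] >= cutoffs[:, None])
    return hits.mean(axis=1)


def realignability(identities, correct, n_points=101):
    """Approximate area under the recall-versus-cut-off curve."""
    cutoffs = np.linspace(0.0, 1.0, n_points)
    recall = recall_curve(identities, correct, cutoffs)
    # Trapezoidal rule, written out to avoid version-specific NumPy helpers.
    widths = np.diff(cutoffs)
    return float(np.sum((recall[1:] + recall[:-1]) * widths) / 2.0)


# Toy input: four reads, one of which retrieves the wrong label.
toy_identities = [1.0, 0.9, 0.5, 0.8]
toy_correct = [True, True, True, False]
toy_recall = recall_curve(toy_identities, toy_correct, [0.0, 0.6, 0.95])
toy_area = realignability(toy_identities, toy_correct)
```

Under these assumptions, a graph-query pair in which every read retrieves the correct label at full identity has a realignability of 1, and each read with a wrong label or lower identity reduces the area proportionally.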