Figure 2
From: Linked-read sequencing for detecting short tandem repeat expansions

Size estimation of genomic intervals and STR loci using the Jaccard index of barcode sharing in BLRS. (a) Inverse relationship between Jaccard index and genomic interval size observed in NA12878 for each of the three BLRS platforms. The colored bands correspond to the 95% confidence intervals for each platform. (b) Schematic of a hypothetical example illustrating the concept and terminology used in computing the Jaccard index of barcode sharing for a given genomic interval. (c–f) Scatter plots of estimates (y-axis) vs. truths (x-axis) for ~700 arbitrary genomic intervals (black) and the target STR (red) in the simulation (c), NA12878 (d), FXN (e), and FMR1 (f) datasets. Only estimates of the expanded allele at the target loci are shown. Confidence intervals of the estimates at the target loci are shown as red error bars. Dotted red diagonal lines are added to help visualize how far the estimates deviate from the true values. “Truths” (x-axis) for the ~700 genomic intervals in all plots were calculated from hg38 genomic coordinates. “Truth” for the target locus is: the size of the ATXN10 repeat with which we replaced the reference in the modified genome to generate the simulated datasets (c); the size of the CCAT allele we determined from the NA12878 assembly (d); and the sizes of the FXN (e) and FMR1 (f) repeats in the Coriell samples according to online information for the respective cell lines (Table S1).
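As a rough illustration of the quantity referred to in panel (b), the Jaccard index of two barcode sets is the number of shared barcodes divided by the number of distinct barcodes in either set. The sketch below assumes the two sets are the barcodes observed in the regions flanking an interval; the function name, the flank definition, and the barcode labels are hypothetical and are not taken from the paper's pipeline.

```python
def jaccard_index(barcodes_left, barcodes_right):
    """Standard Jaccard index |A ∩ B| / |A ∪ B| of two barcode collections."""
    left, right = set(barcodes_left), set(barcodes_right)
    union = left | right
    if not union:
        # No barcodes observed on either side: report 0 by convention (assumption).
        return 0.0
    return len(left & right) / len(union)


# Hypothetical barcodes seen in the left and right flanks of an interval.
left_flank = ["BC01", "BC02", "BC03", "BC04"]
right_flank = ["BC03", "BC04", "BC05"]

# 2 shared barcodes out of 5 distinct barcodes.
print(jaccard_index(left_flank, right_flank))
```

Under the inverse relationship in panel (a), a lower index of this kind would correspond to a larger interval between the two flanks; how the index is converted to a size estimate is not specified in this caption.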