Figure 1
From: Linked-read sequencing for detecting short tandem repeat expansions

In-repeat read (IRR) extraction using barcodes in BLRS. (a) Steps in using barcodes for IRR extraction and STR size estimation. (b–d) IRR counts (left) and repeat-count estimates (right) for the target loci in three groups of datasets: (b) a heterozygous ATTCT expansion in ATXN10 in 10x and stLFR simulations; (c) a homozygous larger-than-reference CCAT polymorphism in NA12878 from the 10x, stLFR, and TELL-Seq BLRS platforms; and (d) FXN GAA expansions in 10x data from four Coriell cell lines. The methods compared were ExpansionHunter (EH) without OTS (EH_noOTS, blue), EH with OTS (EH_OTS, olive), and barcode-based IRR extraction (barcode, red). For the FXN samples, EH results (with or without OTS) from standard Illumina data (EH_OTS(S), light blue; EH_noOTS(S), light olive) are also included. Only results for the expanded alleles are shown (hence two separate tallies for the homozygous FXN GM15850 sample). “Expected” or “ground-truth” IRR counts (for the simulations) and repeat counts are plotted as orange horizontal bars, labeled with their exact values. Exact IRR or repeat counts are shown on top of each bar. For EH, the error bars show the confidence intervals reported by EH. For the custom barcode-based method on 10x data, the error bars show the range of estimates calculated independently using each of the two read lengths (see the Methods section). “NA” indicates that results were unavailable for certain samples because EH runs ended with a segmentation fault.