Figure 7

Saturation curves and differences in coverage of the 962 miRNAs in the Miltenyi miRXplore miRNA reference set for TGIRT-seq, with or without different bias-correction methods, compared with published datasets for established small RNA-seq methods. For published datasets that contained additional miRNAs, in silico subsamples containing only the 962 reference-set miRNAs were used for the comparisons. (A) RNA-seq saturation curves. The curves show the number of reference-set miRNAs with at least 10 reads, evaluated at intervals of 200 reads. As additional reads were included, the number of miRNAs with at least 10 reads increased. Curves were truncated at 3 million reads. The dotted red line at the top indicates the number of miRNAs in the Miltenyi miRXplore reference set (962). Each curve represents the combined datasets for one method. Curves are color-coded by method, as indicated in the figure, for the best (4N ligation/NEXTflex; n = 24) and worst (NEBNext; n = 12) methods from the comparison by Giraldez et al.36, as well as for TGIRT-seq (n = 3 each for libraries prepared with the NTT, MTT, and NTC adapters), TGIRT-seq with the NTTR adapter (n = 3), TGIRT-seq with the NTT adapter and an R1R adapter containing six randomized 5′-end positions (NTT/6N; n = 1), and the TGIRT-CircLigase method (n = 1; Mohr et al.6). Gray lines represent other library preparation methods, including NEBNext, TruSeq, and CleanTag. (B) Violin plots of miRNA abundance in datasets obtained by different methods. The plots show the distribution of log10(CPM) values for the reference-set miRNAs for each library preparation method (miRNA count = 2,886 for NTTc, 2,885 for NTCc, 23,088 for 4N ligation, 961 for TGIRT-CircLigase, 2,886 for NTTR, 5,522 for NEXTflex, 2,886 for MTT, 2,886 for NTC, 2,886 for NTT, 962 for NTT/6N, 30,757 for TruSeq, 3,815 for CleanTag, and 11,452 for NEBNext).
NTTc and NTCc denote TGIRT-seq datasets obtained with the NTT or NTC adapters and computationally corrected using the random forest regression model trained on the combined NTT datasets (Fig. 5C,D). The black horizontal line indicates the expected CPM for each miRNA (CPM = 1,039.5) if 1,000,000 reads were distributed uniformly across the 962 miRNAs (i.e., unbiased sampling of each miRNA). The library preparation and correction methods are ordered from lowest to highest deviation between the median CPM (white point within the violin) and the expected CPM. The black boxes in the violins indicate the interval between the first and third quartiles, and the vertical lines indicate the 95% confidence interval for each method.
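The saturation curves in panel A can be approximated with a short calculation. This is a minimal sketch, not the authors' pipeline: it assumes each read has already been assigned to a single miRNA name, keeps only reads assigned to reference-set miRNAs, adds the reads in random order (equivalent to nested random subsamples of increasing depth), and records how many reference-set miRNAs have reached at least 10 reads after every 200 reads. The function name `saturation_curve` and its parameters are hypothetical.

```python
import random
from collections import Counter


def saturation_curve(read_labels, reference_set, step=200, min_reads=10,
                     max_reads=3_000_000, seed=0):
    """Hypothetical helper: return (depth, n_detected) pairs, where n_detected
    is the number of reference-set miRNAs with >= min_reads reads after
    `depth` reads have been added."""
    # Restrict to reads assigned to reference-set miRNAs.
    reads = [label for label in read_labels if label in reference_set]
    # Add reads in random order so that each prefix is a random subsample.
    rng = random.Random(seed)
    rng.shuffle(reads)
    reads = reads[:max_reads]  # truncate at the maximum depth plotted

    counts = Counter()
    detected = 0
    curve = []
    for depth, mirna in enumerate(reads, start=1):
        counts[mirna] += 1
        if counts[mirna] == min_reads:  # this miRNA just reached the threshold
            detected += 1
        if depth % step == 0:  # record one point per bin of `step` reads
            curve.append((depth, detected))
    return curve
```

Because the detection count can only grow as reads are added, each curve is non-decreasing and is bounded above by the size of the reference set (the dotted red line).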
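The expected CPM in panel B follows from dividing 1,000,000 reads evenly among the 962 reference miRNAs: 1,000,000 / 962 ≈ 1,039.5 CPM per miRNA. The sketch below reproduces that value and orders methods by the distance between their median and the expected value. The helper names `cpm` and `order_by_deviation` are hypothetical, and measuring the deviation on the log10 scale (the scale of the violin plots) is an assumption, since the caption does not state which scale was used.

```python
import math
from statistics import median

N_REFERENCE = 962  # miRNAs in the Miltenyi miRXplore reference set
# Expected CPM per miRNA if 1,000,000 reads were spread evenly over 962 miRNAs.
EXPECTED_CPM = 1_000_000 / N_REFERENCE


def cpm(counts):
    """Convert raw read counts per miRNA into counts per million (CPM)."""
    total = sum(counts.values())
    return {mirna: n * 1_000_000 / total for mirna, n in counts.items()}


def order_by_deviation(log10_cpm_by_method):
    """Sort method names from lowest to highest absolute difference between
    the median log10(CPM) and log10(EXPECTED_CPM). Using the log10 scale is
    an assumption made for this sketch."""
    target = math.log10(EXPECTED_CPM)
    return sorted(
        log10_cpm_by_method,
        key=lambda method: abs(median(log10_cpm_by_method[method]) - target),
    )
```

Under a linear-scale deviation, methods whose medians lie above and below the expected CPM could be ranked differently, so the scale choice matters when reproducing the ordering.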