Figure 5 | Scientific Reports

Figure 5

From: Improved TGIRT-seq methods for comprehensive transcriptome profiling with decreased adapter dimer formation and bias correction

Effect of 5′- and 3′-end sequences on the representation of miRNAs in TGIRT-seq datasets. (A) Principal component analysis biplot for over- and under-represented miRNAs in TGIRT-seq of the Miltenyi miRXplore miRNA reference set in combined datasets for the three technical replicates obtained using the NTT adapter. The first three bases from the 5′ and 3′ ends of over- and under-represented miRNAs (defined as those whose log₂CPM was at least one standard deviation higher or lower, respectively, than the mean log₂CPM for all miRNAs in the reference set; Supplementary Fig. S6) were subjected to principal component analysis. The first two principal components are shown. Each point indicates a miRNA, with over- and under-represented miRNAs colored as indicated in the figure. (B) Relative importance of features of the first principal component. The fitted values from the first principal component are plotted for each base at each nucleotide position (feature) in ascending order. 5′- and 3′-end nucleotides are color coded as indicated in the figure. (C) Random forest regression modeling of miRNA-seq quantification errors. A random forest regression model (R² = 0.81) based on the first three 5′- and 3′-end positions was trained on the 962 miRNAs in the combined datasets for the three technical replicates obtained using the NTT adapter, and the predicted measurement errors (Δlog₁₀CPM predicted by the model) were plotted against the observed measurement errors (Δlog₁₀CPM obtained directly from sequencing data) for each miRNA. The blue line shows the fitted linear regression between the observed and predicted measurement errors, and the red line indicates hypothetical perfect prediction with slope = 1 and y-intercept = 0. (D) Relative importance of the position-specific preferences in TGIRT-seq. The relative importance of each 5′- and 3′-end position in the random forest regression model was plotted in descending order. Each bar represents the relative importance of the indicated position, color coded as indicated in the figure.
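
The two preprocessing steps described in the caption, labeling miRNAs as over- or under-represented and turning the first three 5′- and 3′-end bases into features, can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The function names, the feature naming scheme, the one-hot encoding, the use of the sample standard deviation, and the toy abundance values are all assumptions made for the example.

```python
from statistics import mean, stdev

BASES = "ACGU"  # RNA alphabet; a DNA-style T is converted to U below


def end_features(seq, k=3):
    """One-hot encode the first k bases from each end of a sequence.

    Feature names such as '5p_pos1_U' or '3p_pos-1_G' are hypothetical;
    3' positions are counted backwards from the last base (-1).
    """
    seq = seq.upper().replace("T", "U")
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    features = {}
    for i in range(k):
        for base in BASES:
            features[f"5p_pos{i + 1}_{base}"] = int(seq[i] == base)
            features[f"3p_pos-{i + 1}_{base}"] = int(seq[-(i + 1)] == base)
    return features


def classify_representation(log2cpm):
    """Label each miRNA 'over', 'under', or 'neither'.

    A miRNA is 'over' ('under') if its log2CPM is at least one standard
    deviation above (below) the mean of all values. Using the sample
    standard deviation here is an assumption.
    """
    values = list(log2cpm.values())
    mu, sd = mean(values), stdev(values)
    labels = {}
    for name, value in log2cpm.items():
        if value >= mu + sd:
            labels[name] = "over"
        elif value <= mu - sd:
            labels[name] = "under"
        else:
            labels[name] = "neither"
    return labels


# Toy values invented for illustration only; they are not data from the study.
toy_log2cpm = {"miR-a": 10.0, "miR-b": 12.0, "miR-c": 14.0,
               "miR-d": 12.0, "miR-e": 12.0}
toy_labels = classify_representation(toy_log2cpm)
toy_features = end_features("UGAGGUAGUAGGUUGUAUAGUU")
print(toy_labels)
print(sorted(name for name, hit in toy_features.items() if hit))
```

A feature table built this way could then be passed to a principal component analysis or a random forest regressor (for example, the implementations in scikit-learn), although the caption does not state which software was used.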
