Figure 2

Predictive power of the identified lncRNAs. (A) Workflow used to assess the predictive power of differentially expressed lncRNAs. (B) Box plots showing the distribution of accuracy, sensitivity, and specificity of logistic regression (LR), random forest (RF), and support vector machine (SVM) multivariate classification models across 100 iterations; the mean ± standard deviation is displayed for each box plot. (C) Bar plots representing the ‘feature stability’ of the 34 lncRNAs identified as differentially expressed in 50 or more iterations. Feature stability is defined as the proportion of runs in which a lncRNA is identified as differentially expressed and thus selected as a feature for the multivariate predictive models. More stable features are less sensitive to data partitioning. Log fold changes and p-values of the lncRNAs in the validation sets were estimated and averaged across the 100 iterations. The dashed line on the p-value bar chart indicates the 0.05 cutoff.
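
The feature stability defined in panel (C) can be illustrated with a minimal sketch. It assumes that each iteration yields the set of lncRNAs selected as features in that run. The function name `feature_stability`, the toy input, and the lncRNA names are hypothetical and are not taken from the study's code or data.

```python
from collections import Counter


def feature_stability(selected_per_run):
    """Return {feature: fraction of runs in which the feature was selected}.

    `selected_per_run` is a list with one collection of feature names per run.
    """
    n_runs = len(selected_per_run)
    if n_runs == 0:
        return {}
    counts = Counter()
    for selected in selected_per_run:
        # Convert to a set so a feature is counted at most once per run.
        counts.update(set(selected))
    return {feature: count / n_runs for feature, count in counts.items()}


# Hypothetical toy input: 4 runs with made-up lncRNA names.
runs = [
    {"lncA", "lncB"},
    {"lncA"},
    {"lncA", "lncC"},
    {"lncB", "lncA"},
]

stability = feature_stability(runs)

# Keep features selected in at least half of the runs, mirroring the
# "50 iterations or more" criterion when 100 runs are used.
stable = sorted(f for f, s in stability.items() if s >= 0.5)
```

In this toy example, a feature selected in every run has a stability of 1, and a feature selected in a single run out of four has a stability of 0.25.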