Figure 4 | Scientific Reports


From: Computational identification of putative lincRNAs in mouse embryonic stem cell

Figure 4

Sequence feature selection for the prediction model.

(a–e) Five representative motifs used by the prediction model: KLF5, SP2, GC-box, SP1 and EGR1. (f) SCAD (smoothly clipped absolute deviation) feature selection: the k-mers that best distinguish lincRNA TSS regions from random regions. (g–k) Sequence features compared between the positive and negative sets, including the CpG observed/expected ratio (CpG o/e) (g), CpG island (CGI) coverage (h) and coverage of repeat elements, excluding simple repeats, low-complexity regions and satellite repeats (k). The red line represents the positive set and the blue line the negative set. Shaded regions represent the 5–95% bootstrap confidence intervals of the statistics.
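The CpG o/e statistic in panel (g) and the shaded bootstrap bands can be illustrated with a minimal Python sketch. It assumes the commonly used Gardiner-Garden and Frommer definition, CpG o/e = (number of CpG × sequence length) / (number of C × number of G), and a percentile bootstrap of the mean. The caption does not state the exact formula, which statistic was bootstrapped, or how many resamples were drawn, so the helpers `cpg_obs_exp`, `percentile` and `bootstrap_ci`, the default `n_boot=1000`, and the toy sequences are hypothetical choices, not the authors' pipeline.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio, assuming the Gardiner-Garden & Frommer
    definition: (#CpG * length) / (#C * #G). Returns 0.0 when C or G is absent."""
    s = seq.upper()
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    n_cpg = s.count("CG")
    return n_cpg * len(s) / (n_c * n_g)


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Percentile q (0-100) of pre-sorted values, using linear interpolation
    between neighbouring ranks (the same rule as NumPy's default method)."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_ci(
    values: Sequence[float],
    stat: Callable[[Sequence[float]], float] = statistics.mean,
    n_boot: int = 1000,  # hypothetical resample count; not given in the caption
    lower: float = 5,
    upper: float = 95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval [lower, upper] for `stat` over `values`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement n_boot times and record the statistic each time.
    boot_stats: List[float] = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    return percentile(boot_stats, lower), percentile(boot_stats, upper)


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the study.
    positive = ["CGCGGCGCATCG", "GGCGCCGTACGA", "ACGCGTCGGCTA"]
    negative = ["ATATGCATTACA", "TTAGCATGCAAT", "GATTACAGGTCA"]
    for label, seqs in (("positive", positive), ("negative", negative)):
        scores = [cpg_obs_exp(s) for s in seqs]
        lo, hi = bootstrap_ci(scores)
        print(f"{label}: mean CpG o/e = {statistics.mean(scores):.2f}, "
              f"5-95% CI = [{lo:.2f}, {hi:.2f}]")
```

Comparing the two intervals corresponds to comparing the red and blue curves; if the panels plot values at positions relative to the TSS, the same bootstrap would presumably be applied separately at each position to trace the shaded band.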
