Fig. 2: DNA sequence and RNA-sequencing (RNA-Seq) features are individually associated with polyadenylation (polyA) sites.

a The percentage of 100-base bins containing each of the three strong polyA signals, stratified by whether the bin does not contain (blue) or does contain (orange) a polyA site. b Distribution of the inter-bin RNA-Seq features for each 100-base bin, stratified by whether the bin does not contain (blue) or does contain (orange) a polyA site (RNA-Seq ratio features were standardized using the training set). c RNA-Seq features and DNA sequence features show little correlation (two-sided Pearson product-moment correlation) across omics types. Combining RNA-Seq information with DNA sequence information improves d average precision and e precision and recall at a specific prediction threshold (probability > 0.50) compared with either data type alone. For both d and e, data are presented as mean values ± standard deviation on the test set (n = 5 random train-validate-test splits). Data shown are from the Human Brain Reference data set.
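The metrics summarized in panels d and e can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code. The function names are hypothetical. It assumes binary per-bin labels (1 = bin contains a polyA site) and predicted probabilities. Ties in predicted probability are not grouped when computing average precision. The ± values are assumed to be the sample standard deviation (`statistics.stdev`) across the five splits.

```python
import statistics


def precision_recall_at(labels, probs, threshold=0.5):
    """Precision and recall when a bin is called positive only if prob > threshold."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p > threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p > threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p <= threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(labels, probs):
    """Step-wise average precision: mean precision at the rank of each positive bin.

    Bins are ranked by decreasing probability; tied scores are NOT grouped
    (a simplifying assumption of this sketch).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(probs, labels), key=lambda t: t[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at this positive's rank
    return total / n_pos


def summarize_splits(values):
    """Mean and sample standard deviation of a metric across train-validate-test splits."""
    return statistics.mean(values), statistics.stdev(values)
```

For example, `average_precision([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])` gives (1/1 + 2/3) / 2 ≈ 0.833, and the same inputs yield precision 0.5 and recall 0.5 at the 0.50 threshold. Computing each metric once per test split and passing the five values to `summarize_splits` produces the mean ± standard deviation shown in the bar plots.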