Fig. 2: RF classification.
From: Mechanistic analysis of enhancer sequences in the estrogen receptor transcriptional program

a Venn diagram of the datasets used in this study. The regions colored orange (3245) and green (489) make up the universe set of our classification examples. The green region contains the positive examples (449 out of 489, see “Methods”), and the orange region contains the negative examples (1669 out of 3245, see “Methods”). b Boxplot of pairwise adjacency scores for a selection of TF pairs whose scores differ significantly between the positive and negative sets (p value < 0.05). Blue and red boxes represent enhancers in the positive and negative sets, respectively. The y-axis shows the z-score-normalized pairwise adjacency score, which reflects how often motifs of the two TFs co-occur within 50 bp of each other. c Boxplot of TF affinity scores for four TFs whose affinity scores differ significantly between the positive (blue) and negative (red) sets of enhancers (p value < 0.05). The y-axis shows the z-score-normalized sum of likelihood ratios under the PWM model versus the background model. d Receiver operating characteristic curve showing the performance of the Random Forest classifier on the test set. e Precision-recall curve showing the performance of the Random Forest classifier on the test set. For all boxplots in this manuscript, the center line indicates the median; box limits are the upper and lower quartiles; whiskers show 1.5× the interquartile range; and outliers are removed to aid visualization.
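
As a rough illustration of the two y-axis quantities in panels b and c, the following minimal Python sketch computes a sum-of-likelihood-ratios affinity score, a pairwise adjacency count, and a z-score normalization. It is not the authors' implementation: the 3-bp `PWM`, the uniform `BACKGROUND`, the forward-strand-only scan, the use of hit start positions for the 50 bp distance, and all function names are assumptions made for illustration. The actual procedure is described in “Methods”.

```python
from statistics import mean, pstdev

# Hypothetical 3-bp PWM: one dict of base probabilities per motif position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Assumed uniform background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def likelihood_ratio(window, pwm, background):
    """P(window | PWM) / P(window | background) for one motif-length window."""
    ratio = 1.0
    for base, column in zip(window, pwm):
        ratio *= column[base] / background[base]
    return ratio


def affinity_score(sequence, pwm, background):
    """Sum of likelihood ratios over all windows (forward strand only, an assumption)."""
    width = len(pwm)
    return sum(
        likelihood_ratio(sequence[i:i + width], pwm, background)
        for i in range(len(sequence) - width + 1)
    )


def adjacency_count(hits_a, hits_b, max_distance=50):
    """Count pairs of motif hit positions that lie within max_distance bp of each other."""
    return sum(1 for a in hits_a for b in hits_b if abs(a - b) <= max_distance)


def z_scores(values):
    """Z-score normalize raw scores using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All values identical: return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

In this sketch, each enhancer would get one raw affinity score per TF and one raw adjacency count per TF pair, and `z_scores` would then be applied across enhancers for each feature before plotting.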