Extended Data Fig. 4: Comparison of different statistical models for predicting all STARR-seq peaks using a 6-feature model.
From: Supervised enhancer prediction with epigenetic pattern recognition and targeted validation

The performance of different statistical models that integrate information from six epigenetic features is shown. a) The numbers within the parentheses are the AUROC and AUPR for predicting the STARR-seq peaks (single core promoter) with histone peaks, while the numbers outside the parentheses are the AUROC and AUPR for predicting STARR-seq peaks identified after combining multiple core promoters. b) The individual ROC and PR curves for each statistical model. c) The contribution of the matched filter score for each epigenetic feature to the different integrated models. The mean value is displayed in the bar plot, while the error bars show the standard deviation of feature weights measured by ten-fold cross-validation. d) The accuracy of the models evaluated using different amounts of training data. The AUPR of the model increases with the amount of training data until it starts to saturate at around 70% of the data. The mean value is displayed in the bar plot, while the error bars show the standard deviation of feature weights measured by ten-fold cross-validation.
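For readers unfamiliar with the two metrics reported in panels a, b and d, the following is a minimal sketch of how AUROC and AUPR can be computed from binary labels and model scores, and how a per-fold quantity can be summarized as a mean with a standard deviation. It is not the authors' code: the function names (`auroc`, `average_precision`, `fold_summary`) and the toy data are hypothetical, AUPR is approximated here by average precision, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, a common estimate of AUPR: the mean of the precision
    values at the rank of each positive, with examples sorted by descending
    score. Tied scores keep their input order (an arbitrary choice)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits


def fold_summary(values):
    """Mean and sample standard deviation of one value per cross-validation
    fold (assumption: sample rather than population standard deviation)."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = region overlaps a STARR-seq peak, 0 = it does not.
    toy_labels = [1, 1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print("AUROC:", auroc(toy_labels, toy_scores))
    print("AUPR (average precision):", average_precision(toy_labels, toy_scores))
```

In a ten-fold setting, `fold_summary` would be called on the ten per-fold values of whichever quantity is plotted, giving the bar height and the error bar.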