Figure 2

TF binding prediction performance and key TF prediction in cell conversion strategies. (a) Boxplot of TF binding prediction performance, evaluated by the Precision-Recall (PR) Area Under the Curve (AUC). Cross-validated performance is shown for 230 TFs in five cell lines and compared to random sampling. All individual models perform significantly better than random sampling: TF motif z-scores (P Wilcoxon < 7.08e-07), average ReMap coverage (P Wilcoxon < 4.82e-18), bidirectional TPM (P Wilcoxon < 7.79e-05), and H3K27ac ChIP-seq (P Wilcoxon < 3.25e-03). Combined models are represented by the black dots. All combined models likewise perform significantly better than random sampling: TF motif scores, average ReMap coverage, and bidirectional TPM (P Wilcoxon < 3.08e-21); TF motif scores, average ReMap coverage, and H3K27ac ChIP-seq (P Wilcoxon < 2.30e-20); TF motif scores, average ReMap coverage, bidirectional TPM, and H3K27ac ChIP-seq (P Wilcoxon < 6.79e-21). The whiskers represent the standard deviation, the box edges depict the inter-quartile range, and the black centre line marks the median. The model with the highest median score is depicted in red. (b) Boxplot of the PR AUC for generalised models used to predict TF binding for all other TFs. Performance is compared to random sampling as in (a). All general models perform significantly better than random sampling: TF motif scores, average ReMap coverage, and bidirectional TPM (P Wilcoxon < 1.02e-20); TF motif scores, average ReMap coverage, and H3K27ac ChIP-seq (P Wilcoxon < 7.98e-21); TF motif scores, average ReMap coverage, bidirectional TPM, and H3K27ac ChIP-seq (P Wilcoxon < 6.14e-21). Visualised as in (a). (c) Summary of experimentally validated TFs in cell conversion strategies from skin fibroblasts to five different target cell types. TFs predicted by ANANSE-CAGE are highlighted in bold.
(d) Boxplot representing the fraction of experimentally validated TFs that are ranked in the top 10 per cell conversion strategy. The y-axis lists the TF prediction methods that are able to rank predicted TFs according to their respective algorithms. Individual conversion types are depicted as dots. ANANSE-CAGE is depicted in red and the other TF prediction methods are shown in grey. The whiskers represent the standard deviation and the box edges depict the inter-quartile range. The plus sign represents the mean and the black centre line the median. (e) Boxplot representing the average inferred TF rank per cell conversion strategy for each of the TF prediction methods, visualised as in (d). (f) Boxplot of the PR AUC for the five cell conversions, visualised as in (d).