Extended Data Fig. 5: Classification of ChIP–seq data by genomic footprinting.
From: Global reference mapping of human transcription factor footprints

a, Precision-recall (PR) curve for predictions of CTCF motif occupancy (that is, overlap with a CTCF ChIP–seq peak) based on footprint posterior probabilities in CD20+ B cells. Black dot indicates precision and recall at a footprint posterior probability threshold of >0.99. Blue, PR curve computed after shuffling ChIP–seq peak labels. b, Area under the precision-recall curve (AUPR) computed for 21 ENCODE cell types and/or replicates (n = 33 total datasets). c, Distribution of MOODS scores stratified by motif overlap with a ChIP–seq peak and/or a genomic footprint in CD20+ B cells. d, ChIP–seq signal intensity at peaks overlapping footprinted versus non-footprinted CTCF motifs in CD20+ B cells. e, Relative ChIP–seq signal at footprinted and non-footprinted CTCF peaks containing a motif across 21 cell types (n = 32 total datasets). f–m, PR curves and relative ChIP–seq intensities for ATF3 (f, g), GATA1 (h, i), GABPA (j, k) and NFE2 (l, m) in K562 cells. For all TFs analysed, only motifs overlapping DHSs were considered. DHS, ChIP–seq and motif models are described in Supplementary Table 3. Boxes indicate median and IQR. Whiskers, 5th and 95th percentiles.
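
As an illustration of the evaluation summarized in panels a and b, the following minimal sketch computes a PR curve, an AUPR value, the operating point at a posterior threshold of >0.99, and a shuffled-label baseline. It is not the authors' code: the function names, the toy scores and labels, and the use of step-wise average precision as the AUPR estimate are all assumptions made for illustration.

```python
import random


def pr_curve(scores, labels):
    """Precision and recall as the score threshold is lowered.

    scores: footprint posterior probability per motif (toy values here).
    labels: 1 if the motif overlaps a ChIP-seq peak, else 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precision, recall = [], []
    tp = 0
    for rank, i in enumerate(order, start=1):
        tp += labels[i]
        # Emit one point per distinct score so ties share a threshold.
        if rank < len(order) and scores[order[rank]] == scores[i]:
            continue
        precision.append(tp / rank)
        recall.append(tp / total_pos)
    return precision, recall


def average_precision(precision, recall):
    """Step-wise area under the PR curve (an assumed AUPR estimator)."""
    area, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        area += p * (r - prev_recall)
        prev_recall = r
    return area


def precision_recall_at(scores, labels, threshold):
    """Operating point for motifs with score strictly above the threshold."""
    called = [lab for s, lab in zip(scores, labels) if s > threshold]
    tp = sum(called)
    precision = tp / len(called) if called else float("nan")
    return precision, tp / sum(labels)


# Hypothetical toy data, not taken from the paper.
scores = [0.999, 0.995, 0.98, 0.7, 0.4, 0.1]
labels = [1, 1, 0, 1, 0, 0]

aupr = average_precision(*pr_curve(scores, labels))
point = precision_recall_at(scores, labels, 0.99)

# Baseline: shuffle the peak labels and recompute the curve.
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
baseline_aupr = average_precision(*pr_curve(scores, shuffled))

print(f"AUPR={aupr:.3f}  point@0.99={point}  shuffled AUPR={baseline_aupr:.3f}")
```

Shuffling the labels keeps the fraction of occupied motifs fixed, so the baseline curve shows the precision expected when the scores carry no information about ChIP–seq overlap.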