Fig. 4: Genomic features of PU.1-induced accessible sites.

a The distribution of log-odds scores for the consensus MAC PU.1 motif, generated earlier, is shown for each cluster. For all clusters, a conventional boxplot inside the bean depicts the median of the respective distribution. b De novo-derived co-associated motifs enriched across ATAC-seq-derived PU.1 peak clusters. The best motifs corresponding to known factor families derived from individual clusters are shown. Significance of motif enrichment (hypergeometric test) and the fraction of motifs in peaks (background values are given in parentheses) correspond to their distribution across all PU.1 peak regions. c Motif co-enrichment networks for individual clusters are shown. The size of each node represents the motif enrichment (fraction of peaks), and co-associated TF motifs (PU.1 masked) are indicated by coloring. The second PU.1 node corresponds to the fraction of peaks containing at least two PU.1 motifs. Edge thickness indicates the frequency of motif co-association within the PU.1 peak. The fraction of PU.1 peaks overlapping with co-associated (PU.1 masked) TF motifs is given below each network. d Evolutionary conservation of the PU.1 motif across the K-means clusters (color coded as in Fig. 3), as illustrated by the PhastCons score. Corresponding sequence-matched random control sets of non-bound motifs are shown in the bottom histogram (dark gray to light gray). e Genome ontology analysis across the K-means clusters is shown in a stacked bar chart. The association with the individual regions is given as the fraction of peaks in each cluster. f Predictability of PU.1 binding from motif score, conservation score, nearby presence of co-associated motifs, and chromatin accessibility before and after PU.1 induction. Shown are ROC curves of logistic models with different sets of predictors (AUC in parentheses), trained and evaluated on separate subsets of the data.
g Representative ATAC-seq footprints across motif-centered, cluster-associated peaks (as indicated by coloring). Corresponding footprints of control cells (mutPU.1) are shown in gray. Smaller histograms in the upper right corner zoom into the central part of the main graph. The position of the PU.1 motif is indicated by two vertical dashed lines. a–g Source data are provided as a Source Data file.
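The evaluation scheme described for panel f (logistic models with different predictor sets, fitted on one subset and scored by AUC on another) can be illustrated with a minimal sketch. This is not the authors' pipeline. The data are simulated, the three predictors (motif score, conservation, accessibility) and their effect sizes are assumptions chosen for illustration, and the plain gradient-descent fit stands in for whatever fitting procedure was actually used.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights (bias stored last) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[-1] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, X):
    # Predicted binding probability for each row of X
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) for xi in X]

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def simulate(n, rng):
    # Hypothetical sites: columns are motif score, conservation, accessibility.
    # The coefficients below are arbitrary assumptions, not estimates from the paper.
    X, y = [], []
    for _ in range(n):
        motif, cons, access = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        p_bound = sigmoid(1.5 * motif + 0.5 * cons + 1.0 * access)
        X.append([motif, cons, access])
        y.append(1 if rng.random() < p_bound else 0)
    return X, y

rng = random.Random(0)
X, y = simulate(600, rng)
# Separate subsets for training and evaluation
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

predictor_sets = {"motif": [0], "motif+conservation": [0, 1], "all": [0, 1, 2]}
aucs = {}
for name, cols in predictor_sets.items():
    train = [[row[c] for c in cols] for row in X_train]
    test = [[row[c] for c in cols] for row in X_test]
    w = fit_logistic(train, y_train)
    aucs[name] = roc_auc(y_test, predict(w, test))
    print(f"{name}: AUC = {aucs[name]:.2f}")
```

Scoring each model on held-out sites only, as in the caption, keeps the reported AUC from rewarding a model for memorizing its training subset.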