Extended Data Fig. 5: Analysis of homotypic motif clusters within the keratinocyte epigenome.
From: The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation

(a) Analysis of the effect of motif counts on chromatin accessibility using synthetic sequences. Synthetic scrambled background sequences were embedded with varying numbers of instances of each predictive motif. The neural network was used to predict chromatin accessibility for each synthetic sequence. Left: Each curve summarizes the predicted accessibility with increasing motif density for each motif, averaged over 100 random synthetic backgrounds. Middle/right: Predicted chromatin accessibility for increasing density of FOSB and CEBPD motifs. Each black curve represents a specific random synthetic background sequence, while the red curve is the average pattern across all backgrounds. (b) Relationship between motif affinity and motif density in CREs containing predictive motif instances. Motif affinity is estimated as the average PWM match log-odds score of all predictive instances in a CRE. Motif density is the number of predictive motif instances in each CRE. We observe a striking tradeoff between motif density and the upper limit of average motif affinity. Right: CEBPD motif instances. Left: GRHL motif instances. (c) Motif PWM match scores as a function of distance from the ATAC-seq summit. Left: motif PWM match scores from all motif instances for CEBPD and GRHL motifs. Right: motif PWM match scores for predictive motif instances for CEBPD and GRHL motifs. (d) Proposed principles of cell-type-specific homotypic motif clusters. As the number of motif instances increases or as motif affinities in a region increase, accessibility increases. The suboptimization of motif sites, particularly when there are more motif instances within a region, acts as an upper limit to prevent ectopic accessibility. Motif affinities are strongest near the accessibility summit.
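The two per-CRE quantities defined in panel (b) can be illustrated with a minimal sketch. It assumes that the predictive motif instances have already been identified and are supplied as start positions, and that log-odds scores are computed in base 2 against a uniform background with a small pseudocount; none of these details are specified in the caption. The function names, the toy three-position matrix, and the example sequence are hypothetical and do not represent the CEBPD or GRHL motifs.

```python
import math

BASES = "ACGT"

def log_odds_pwm(ppm, background=0.25, pseudocount=1e-3):
    """Convert a position probability matrix (one dict per position)
    into log2-odds scores against a uniform background (assumption)."""
    pwm = []
    for col in ppm:
        total = sum(col[b] + pseudocount for b in BASES)
        pwm.append({
            b: math.log2(((col[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def match_score(pwm, site):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(col[base] for col, base in zip(pwm, site))

def cre_affinity_and_density(pwm, sequence, instance_starts):
    """Return (average log-odds score, instance count) for one CRE.

    instance_starts lists the start positions of predictive motif
    instances; how those are chosen is outside this sketch.
    """
    width = len(pwm)
    scores = [match_score(pwm, sequence[s:s + width]) for s in instance_starts]
    density = len(scores)
    affinity = sum(scores) / density if density else float("nan")
    return affinity, density

# Hypothetical toy motif with consensus "ATG" (illustration only).
toy_ppm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
toy_pwm = log_odds_pwm(toy_ppm)

# Hypothetical CRE with one consensus site (ATG) and one weaker site (AAG).
cre = "CCATGCCAAGCC"
affinity, density = cre_affinity_and_density(toy_pwm, cre, [2, 7])
print(f"density={density}, average affinity={affinity:.2f}")
```

Under these assumptions, adding a weaker instance to a CRE raises its density while lowering its average affinity, which is the axis pairing plotted in panel (b).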