Extended Data Fig. 4: Integrative analysis of MARCS with ENCODE ChIP-seq datasets.
From: Decoding chromatin states by proteomic profiling of nucleosome readers

a, Schematic representation of the integrative NGS dataset analysis. Briefly, the peak data for each dataset were binned at 1 kb resolution. For each pair of datasets, a pairwise co-occurrence matrix was recorded, tracking the number of bins in which the peaks of the two datasets overlap. The marginal and joint entropies, together with the mutual information (MI), were computed from the co-occurrence matrices. Note that, as the mutual information measures the entropy shared by the two datasets (Venn diagram), it can be normalized by the entropy of either one of them. Since in MARCS we are interested in the explanatory power of chromatin features on protein binding, by convention we always normalized by the entropy of the protein. The normalized mutual information estimates are therefore interpretable as the fraction of uncertainty in protein localization that can be explained by the feature. For details see Online Methods. b, Summary of the relationships between MARCS feature effect estimates and NGS datasets for the Tier 1 ENCODE K562 cell line. The ChIP-seq, ATAC-seq and DNase-seq experiments from ENCODE (ref. 30) are plotted in columns, together with the chromatin state annotations from the NIH Roadmap (ref. 1). The rows represent MARCS protein groups subdivided by their feature effect estimates; only groups with ≥5 proteins are shown. Each cell of the heatmap summarizes two measurements that compare the normalized MI (see a) of proteins that MARCS predicts to be strongly recruited or excluded by the feature with the normalized MI of other proteins (that is, proteins neither strongly recruited nor strongly excluded by the feature, including proteins with no feature effect estimate at all). The colour indicates the difference between the mean log2 normalized MI estimate of the feature-associated group and the mean log2 estimate of the other proteins.
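The panel a computation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: peaks are assumed to be half-open `(chrom, start, end)` intervals, a bin counts as occupied if any peak touches it, and `n_bins` is assumed to be the total number of 1 kb bins considered, so that bins empty in both datasets also enter the 2 × 2 co-occurrence matrix. All function names are hypothetical; the exact procedure is given in the Online Methods.

```python
import math

BIN_SIZE = 1000  # 1 kb resolution, as in panel a


def peaks_to_bins(peaks, bin_size=BIN_SIZE):
    """Return the set of (chrom, bin_index) keys touched by half-open (chrom, start, end) peaks."""
    bins = set()
    for chrom, start, end in peaks:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            bins.add((chrom, b))
    return bins


def co_occurrence(protein_bins, feature_bins, n_bins):
    """2 x 2 counts; rows = protein absent/present, columns = feature absent/present.

    Assumption: n_bins is the total number of genome bins considered.
    """
    both = len(protein_bins & feature_bins)
    only_protein = len(protein_bins - feature_bins)
    only_feature = len(feature_bins - protein_bins)
    neither = n_bins - both - only_protein - only_feature
    return [[neither, only_feature], [only_protein, both]]


def entropy(probs):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_mi(protein_bins, feature_bins, n_bins):
    """MI(protein; feature) / H(protein): fraction of protein entropy explained by the feature."""
    counts = co_occurrence(protein_bins, feature_bins, n_bins)
    p = [[c / n_bins for c in row] for row in counts]
    h_protein = entropy([p[0][0] + p[0][1], p[1][0] + p[1][1]])  # row marginals
    h_feature = entropy([p[0][0] + p[1][0], p[0][1] + p[1][1]])  # column marginals
    h_joint = entropy([p[0][0], p[0][1], p[1][0], p[1][1]])
    mi = h_protein + h_feature - h_joint
    return mi / h_protein if h_protein > 0 else 0.0
```

Normalizing by the protein's entropy (rather than, say, the joint entropy) mirrors the convention stated above, so a value of 1 means the feature's peaks fully determine the protein's binned peak pattern, and 0 means the two are independent at this resolution.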
The size and border shading of each square indicate the statistical significance of the difference (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted); see the colour bar and legend. Significant red colours indicate that a given chromatin feature ChIP-seq experiment is more predictive of the ChIP-seq profiles of proteins associated with the corresponding MARCS feature than of those of an average protein. Significant blue colours indicate the opposite. The rows and columns were clustered hierarchically to highlight similarities. c, Integrative analysis of ENCODE NGS data for the K562 cell line in relation to H3K4me1 and H3K4me3 ChIP-seq peaks. The fraction of entropy of a protein or feature explained by the information about H3K4me3 and H3K4me1 peaks is plotted on the x and y axes, respectively. Larger values indicate stronger mutual information between the peak distributions. The dotted x = y line indicates where H3K4me1 and H3K4me3 have exactly the same explanatory power. The shaded area corresponds to ±0.2 radians from this line. MARCS feature estimates for H3K4me3 are indicated in red (strong recruitment) or blue (strong exclusion). Proteins without strong recruitment or exclusion are shown in grey, and proteins with no feature effect estimate are marked by “X”. d, Integrative analysis as in c, performed for the NIH Roadmap promoter (x axis) and enhancer (y axis) chromatin states. Note that MARCS H3K4me3 readers again share higher mutual information with the promoter chromatin state than with the enhancer state. Only a few BAF complex subunits (SMARCE1, ARID1B) show a weak preference for enhancers. Data representation is as in c. e, Integrative analysis of ENCODE NGS data for the K562 cell line in relation to one of the H3K4me3 ChIP-seq replicates (highlighted in b). The normalized MI (that is, the fraction of entropy of proteins or chromatin features explained by the H3K4me3 ChIP-seq) is plotted on the x axis, while the Kendall correlation coefficient of the heights of overlapping peaks is plotted on the y axis.
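The y axis of panels e to j can be illustrated with a small Kendall tau-b computation. The sketch below assumes, hypothetically, that each 1 kb bin is assigned the maximum signal value of the peaks touching it and that only bins occupied in both datasets are compared; the caption does not specify these details, and `scipy.stats.kendalltau` would be the usual library routine for the coefficient itself.

```python
import math


def bin_heights(peaks, bin_size=1000):
    """Map (chrom, bin_index) -> maximum height of (chrom, start, end, height) peaks touching the bin.

    Assumption: the per-bin maximum is an illustrative choice, not the published rule.
    """
    heights = {}
    for chrom, start, end, height in peaks:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            key = (chrom, b)
            heights[key] = max(height, heights.get(key, height))
    return heights


def overlapping_pairs(heights_a, heights_b):
    """Paired heights over bins where both datasets have a peak, in a deterministic order."""
    shared = sorted(heights_a.keys() & heights_b.keys())
    return [heights_a[k] for k in shared], [heights_b[k] for k in shared]


def kendall_tau_b(x, y):
    """Kendall's tau-b with tie correction, O(n^2); returns nan when undefined."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom > 0 else float("nan")
```

Because Kendall's coefficient depends only on rank order, it is insensitive to the very different signal scales of ChIP-seq, ATAC-seq and DNase-seq experiments, which makes it a natural companion to the normalized MI on the x axis.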
Protein datasets are plotted in grey, while chromatin feature and accessibility datasets are plotted in green and yellow, respectively. Proteins strongly recruited to H3K4me3 based on their MARCS feature effect estimates are highlighted in red, and strongly excluded proteins are highlighted in blue. Note that proteins strongly recruited to H3K4me3 have, on average, higher normalized MI estimates than other proteins (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.01). f, Data in e plotted with proteins strongly recruited to H3K27me3 based on MARCS feature effect estimates highlighted in red. Note that these proteins have, on average, lower normalized MI estimates than others (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.05). g, Integrative analysis of ENCODE NGS data for the K562 cell line in relation to one of the H3K4me1 ChIP-seq replicates (highlighted in b). Data presented as in e. Proteins strongly recruited to H3K4me3 based on MARCS feature effect estimates are highlighted in red, and strongly excluded proteins are highlighted in blue. Note that there is no statistically significant difference between these proteins and other proteins (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted; not significant at FDR < 0.05). h, Data in g plotted with proteins strongly recruited to H3K27me3 based on MARCS feature effect estimates highlighted in red. Note that these proteins have, on average, lower normalized MI estimates than others (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.01). i, Integrative analysis of ENCODE NGS data for the K562 cell line in relation to the H2A.Z ChIP-seq (highlighted in b). Data presented as in e. Proteins strongly recruited to H2A.Z based on MARCS feature effect estimates are highlighted in red. Note that these proteins have, on average, higher normalized MI estimates than others (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.05).
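The group comparisons reported in panels b and e to j (difference of mean log2 normalized MI, two-sided Mann-Whitney U test, Benjamini/Hochberg adjustment) can be sketched in plain Python. This is an illustrative stand-in, not the authors' code: the p value uses the normal approximation with tie correction and no continuity correction, whereas library routines such as `scipy.stats.mannwhitneyu` may use exact p values for small samples, so numbers can differ slightly. Normalized MI values are assumed to be strictly positive so that log2 is defined, and each group is assumed to be non-empty.

```python
import math
from collections import Counter
from statistics import mean


def log2_mean_difference(group, others):
    """Colour in panel b: mean log2 normalized MI of the group minus that of the other proteins."""
    return mean(math.log2(v) for v in group) - mean(math.log2(v) for v in others)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected). Returns (U for x, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u, 1.0  # all values tied: no evidence of a shift
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini/Hochberg step-up adjusted p values (FDR), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In a heatmap like panel b, one Mann-Whitney test would be run per (protein group, NGS dataset) cell, and the Benjamini/Hochberg adjustment applied across all those p values together.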
j, Data in i plotted with proteins strongly recruited to H4ac based on MARCS feature effect estimates highlighted in red. Note that these proteins have, on average, higher normalized MI estimates than others (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.01).
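For panels c and d, one plausible reading of the shaded ±0.2 radian band is the angle, measured at the origin, between a dataset's point and the x = y line. The sketch below encodes that reading; it is an assumption about how the band was drawn rather than a documented procedure, and the category labels and function names are hypothetical.

```python
import math

BAND_RADIANS = 0.2  # half-width of the shaded area around x = y (panels c and d)


def angle_from_diagonal(x, y):
    """Signed angle (radians) between the ray from the origin to (x, y) and the x = y line.

    Positive values lie above the diagonal, i.e. the y-axis feature (H3K4me1 in c,
    enhancer state in d) explains a larger fraction of the dataset's entropy.
    """
    return math.atan2(y, x) - math.pi / 4


def classify_point(x, y, band=BAND_RADIANS):
    """Label a dataset as closer to the x-axis feature, the y-axis feature, or within the band."""
    if x == 0 and y == 0:
        return "within band"  # no information about either feature; treat as on the diagonal
    theta = angle_from_diagonal(x, y)
    if abs(theta) <= band:
        return "within band"
    return "y-axis feature" if theta > 0 else "x-axis feature"
```

Using an angle rather than a fixed difference y − x keeps the band's width proportional to the overall MI level, so datasets with small MI values are not automatically lumped onto the diagonal.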