Fig. 1

Control ChIP-seq data reveals extensive genetic variation between functionally equivalent ENCODE cell lines. a Pearson's correlation coefficient (PCC) of TF binding and DNase hypersensitivity profiles between pairs of erythroid cell lines (E = erythroblast, G = G1E-ER4, M = MEL) at commonly called peaks. Values are PCC ± 95% CI. b PCC of CTCF binding between the indicated tissues and erythroid tissues (G1E, G1E-ER4, MEL). Values are mean ± SEM; the number of comparisons is listed in the figure. c Precision and recall of using input ChIP-seq data in GM12878 cells to identify homozygous variants relative to the hg19 reference genome. Vertical lines denote the number of input ChIP-seq reads available for the murine erythroid cell lines. d Number of discriminatory SNP (discSNP) variants between each pair of erythroid cell lines. e Median percent signal loss (relative to the stronger binding signal) at TF peaks or DNase hypersensitivity (DHS) peaks between erythroid cell lines, separated by the number of discSNPs located within the TF/DHS peak. The TF binding loss percentage at 0 discSNPs reflects the background level of variation in TF peak intensities between cell lines despite identical underlying DNA sequences. DNase percentages are normalized to the 0 discSNP data point within peaks of identical length. *Wilcoxon's p < 0.05 for comparison to peaks lacking discSNPs. Vertical bars represent 95% confidence intervals estimated by bootstrapping. f Schematic of the overall analysis approach, which uses genetic variants to probe determinants of TF binding and chromatin accessibility.
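
As a rough illustration of the quantity plotted in panel e, the sketch below computes a percent signal loss relative to the stronger of two signals and a bootstrap confidence interval for its median. The caption does not give the exact formula or the bootstrap variant, so the loss definition, the percentile bootstrap, the function names, and the example values are all assumptions made for illustration, not the authors' implementation or data.

```python
import random
import statistics


def percent_signal_loss(signal_a, signal_b):
    """Percent loss of the weaker signal relative to the stronger one.

    Assumed definition: 100 * (stronger - weaker) / stronger.
    """
    stronger = max(signal_a, signal_b)
    weaker = min(signal_a, signal_b)
    if stronger <= 0:
        raise ValueError("at least one signal must be positive")
    return 100.0 * (stronger - weaker) / stronger


def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Median with a percentile bootstrap CI (an assumed bootstrap variant)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = medians[int((alpha / 2) * n_boot)]
    upper = medians[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(values), lower, upper


# Hypothetical peak signals in two cell lines; not data from the study.
pairs = [(120.0, 90.0), (80.0, 80.0), (50.0, 75.0), (200.0, 40.0), (30.0, 33.0)]
losses = [percent_signal_loss(a, b) for a, b in pairs]
median_loss, ci_low, ci_high = bootstrap_median_ci(losses)
print(f"median loss = {median_loss:.1f}% (95% CI {ci_low:.1f} to {ci_high:.1f})")
```

In an analysis like the one in panel e, such a calculation would be repeated separately for each group of peaks, binned by the number of discSNPs they contain.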