Extended Data Figure 1: ChIP-seq uniform processing pipeline and quality controls.
From: Regulatory analysis of the C. elegans genome with spatiotemporal resolution

a, ChIP-seq raw read data were processed using a uniform processing pipeline with identical alignment, filtering criteria, and standardized IDR binding site identification using SPP. b, Comparison of conservative (replicate) and pooled (pseudo-replicate) binding site calls from the cross-replicate and rescue thresholds, respectively. c, Distribution of NSC scores across 323 ChIP-seq experiments. Experiments are classified as high (blue, NHI = 181), medium (green, NMD = 60) and low quality (yellow, NLO = 82), and the relative fractions of each are indicated in the inset. High- and medium-quality experiments were approved for downstream analysis. d, The fraction of binding sites shared between duplicate, approved ChIP-seq experiments with unique factor and stage combinations (NU = 22) is shown. The fraction shared between the best-overlapping pairs of experiments with matched factor and stage combinations is shown in the light blue distribution. The fraction shared among all duplicate experiments (NP = 24) with matched factor, stage and promoter-driven transcription factor expression is shown in dark blue. The range of fractions shared between true biological duplicates (ND = 2) with matched factor, stage, promoter and ChIP protocol is indicated by dashed lines. For comparison, the fraction shared between randomly sampled pairs (NS = 500) of approved experiments from distinct factors is shown in grey. The median fractions for each distribution are shown. e, Binding site histogram for 187 embryo and larval ChIP-seq experiments with unique factor-stage combinations and a common ChIP protocol, selected for analysis in this work. The fraction of high- (blue, NHI = 138) and medium-quality (green, NMD = 49) ChIP-seq experiments selected is indicated (inset). f, Analysis of sequence preferences for 21 C. elegans factors (NO) with human orthologue binding data (ref. 7). The fraction of C. elegans factors for which sequence preferences could be determined (NM = 15, 71.4%) is shown (left). The fraction of factors with conserved sequence preferences (NC = 8, 66.7%, P < 0.05) from NX = 12 human–worm orthologues with determined sequence preferences is shown (right). g, The distribution of the fraction of binding sites with matches to the discovered preferred sequence (motif) is shown for 15 factors. The prevalence of the preferred sequence is evaluated among the top 200, 400, 600, 800 and 1,000 binding sites for each factor (see Methods). h, Discovered sequence preferences for 12 human or worm orthologues. Factors with similar (P < 0.05) and distinct sequence preferences are indicated in dark blue and light blue, respectively. The consensus sequence preference for the ONECUT3 homeobox factor was obtained from ref. 51. i, Saturation analysis of regulatory binding data. Using either binding data from embryonic and larval (L1–L4) stages or L2 larvae only (inset), k ChIP-seq experiments were randomly sampled (50 times each), collapsing overlapping binding sites into binding regions. For each value of k, the number of binding regions from the 50 iterations is plotted (red points, ± 1 s.d.). For each series, an exponential curve (blue, dashed line) was fit to the data and used to estimate the total number of binding regions. The percentage of binding regions (CBP) observed in the acquired data is reported for each series. j, Amongst genes with annotated TSSs, the fraction of genes with binding observed within the specified window upstream of a TSS is shown. Promoter regions examined correspond to the windows (1) 1,000/100 bp, (2) 2,000/200 bp, (3) 3,000/300 bp, (4) 4,000/400 bp and (5) 5,000/500 bp upstream or downstream of the TSS, respectively.
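The resampling step of the saturation analysis in panel i can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `merge_sites` and `saturation_curve`, the `(chrom, start, end)` tuple representation, the half-open interval convention, and the use of the mean as the plotted point are all assumptions made for illustration. The exponential fit is omitted because the legend does not give its functional form.

```python
import random
import statistics


def merge_sites(sites):
    """Collapse overlapping (chrom, start, end) sites into binding regions.

    Assumes half-open intervals, so sites that merely touch are kept separate.
    """
    regions = []
    for chrom, start, end in sorted(sites):
        if regions and regions[-1][0] == chrom and start < regions[-1][2]:
            # Overlaps the previous region on the same chromosome: extend it.
            prev = regions[-1]
            regions[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            regions.append((chrom, start, end))
    return regions


def saturation_curve(experiments, iterations=50, seed=0):
    """Sample k experiments `iterations` times for every k and count regions.

    `experiments` is a list of site lists, one per ChIP-seq experiment.
    Returns a list of (k, mean_region_count, sd_region_count) tuples.
    """
    rng = random.Random(seed)
    curve = []
    for k in range(1, len(experiments) + 1):
        counts = []
        for _ in range(iterations):
            sample = rng.sample(experiments, k)  # sampling without replacement
            pooled = [site for exp in sample for site in exp]
            counts.append(len(merge_sites(pooled)))
        curve.append((k, statistics.mean(counts), statistics.pstdev(counts)))
    return curve


# Toy data (hypothetical coordinates, not taken from the study).
toy_experiments = [
    [("I", 100, 200), ("I", 500, 600)],
    [("I", 150, 250), ("II", 10, 50)],
    [("I", 550, 650)],
]
for k, mean_regions, sd_regions in saturation_curve(toy_experiments):
    print(k, round(mean_regions, 2), round(sd_regions, 2))
```

With real data, the `(k, mean, s.d.)` series would then be passed to a curve-fitting routine, and the fitted asymptote used to estimate the total number of binding regions.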