Figure 2 | Scientific Reports

Figure 2

From: Sandcastle: software for revealing latent information in multiple experimental ChIP-chip datasets via a novel normalisation procedure

Representation of the normalisation procedure.

(a) Raw density profiles of datasets from two experimental conditions (red and blue), each with three replicates. Differences in the shapes of the profiles indicate experimentally induced, biologically relevant changes, but these cannot be compared in their raw state. (b) Quantile normalising all datasets together (i) removes many of the experimentally induced, biologically relevant differences between them. This is not desirable, as these differences cannot then be investigated. Sandcastle quantile normalises the datasets from each experimental condition separately, to maintain these biological differences. Quantile normalisation makes each of the datasets follow the same distribution, meaning all density profiles from each experimental condition overlap each other (ii). This reduces intra-condition – but not inter-condition – technical variations. (c) Each dataset consists of two overlapping sub-populations (dashed lines), background (BG) and enriched (EN). These cannot be fully discerned in the data and only the overall population (solid lines) is known. Sandcastle performs inter-condition normalisation based on estimated background sub-populations. This requires the central (modal) point of the background sub-populations to be identifiable (marked with triangles). If this central point cannot be discerned (for example, if the background sub-population is too small) then the Sandcastle normalisation cannot be applied. (d) Data are first shifted to centre the modal point of the estimated background sub-population on zero (indicated by arrows). (e) To estimate the properties of the whole background sub-population, all negative values (the left-hand side of the estimated background sub-population following the shift step) are mirrored into the positive (indicated by arrow; dashed lines show mirrored data). This allows the standard deviation of the estimated background sub-population to be calculated. (f) Data are scaled to make the calculated standard deviation of the estimated background sub-population 1 (indicated by arrows). (g) The resulting fully normalised datasets have estimated background sub-populations with the same mean (0) and standard deviation (1). Comparisons of data between conditions can now be made relative to this common background. For clarity, axis labels are shown only in (a); all other x- and y-axes are ratio and density values, respectively. Vertical grey lines indicate 0 and are labelled only in (f).
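
The steps in panels (b) and (d) to (f) can be illustrated with a minimal Python sketch. This is not Sandcastle's own code: the function names, the use of NumPy, and in particular the histogram-based estimate of the modal point (with an arbitrary bin count) are assumptions made for illustration only. The sketch also assumes that the background is the largest sub-population, so that the overall mode falls at the background centre.

```python
import numpy as np


def quantile_normalise(data):
    """Panel (b): give every replicate (column) of ONE condition the same
    distribution. Call this separately for each experimental condition."""
    data = np.asarray(data, dtype=float)            # shape: (n_probes, n_replicates)
    order = np.argsort(data, axis=0)                # rank order within each replicate
    reference = np.sort(data, axis=0).mean(axis=1)  # mean value at each rank
    out = np.empty_like(data)
    for j in range(data.shape[1]):
        # The smallest value in column j receives reference[0], and so on.
        out[order[:, j], j] = reference
    return out


def estimate_mode(values, bins=100):
    """Assumed estimator of the modal point: midpoint of the fullest
    histogram bin. The bin count is an arbitrary illustrative choice."""
    counts, edges = np.histogram(values, bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])


def background_normalise(values, bins=100):
    """Panels (d) to (f): shift, mirror, and scale one dataset."""
    values = np.asarray(values, dtype=float)

    # (d) Shift so the estimated background mode sits on zero.
    shifted = values - estimate_mode(values, bins)

    # (e) Mirror the negative values into the positive to estimate the
    # whole background sub-population, then take its standard deviation.
    negatives = shifted[shifted < 0]
    if negatives.size == 0:
        raise ValueError("no values below the estimated mode; cannot estimate background")
    mirrored = np.concatenate([negatives, -negatives])
    background_sd = mirrored.std()

    # (f) Scale so the estimated background standard deviation becomes 1.
    return shifted / background_sd
```

In this sketch, each condition would first be passed through `quantile_normalise` on its own, and each resulting column through `background_normalise`, so that the estimated backgrounds of all conditions end up with mean 0 and standard deviation 1, as in panel (g).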
