Figure 2: Statistical analysis of whole-genome amplification bias and coverage uniformity.
From: Calibrating genomic and allelic coverage bias in single-cell sequencing

(a) Autocorrelation in the genome coverage of a two-cell RPE-1 DNA library (RPE#1) amplified by multiple displacement amplification (MDA). The same library, independently sequenced to 0.1× (open triangles) and to 9× (solid triangles), exhibits a correlation above 1 kb that is invariant at intermediate depths (shaded triangles) obtained by downsampling the 9× sequencing data. The black dashed curve is an exponential fit of the autocorrelation in the 1–100 kb range, 2 + 0.17·exp(−Δ/lc), with a correlation length lc = 33 kb (95% confidence interval: 27–42 kb). This correlation is absent in the bulk library sequenced to different depths. Both the bulk and the MDA-generated libraries show a sequencing-fragment-level correlation (lc = 100 bp) that decays with the sequencing depth. (b) The normalized cumulative coverage at a bin size of lc/2 is identical when evaluated from the 9× (solid) and from the 0.1× (dashed) sequencing data, reflecting the same amplicon-level variation due to MDA. The agreement between the bin-level (dashed and solid lines) and base-level (red dots) depth-of-coverage curves further suggests that bin-level variation is the dominant contribution to amplification bias. See Supplementary Figs 2 and 4–8 for more examples of the correlation (a) and coverage (b) analyses of single-cell sequencing data from different studies. (c) Relationship between genome coverage (% covered at 1× mean sequencing depth) and amplification bias (measured by the amplitude of the amplicon-level correlation) of single-cell libraries from different studies. Coverage is evaluated at Chr. 1 for both haploid sperm and diploid cells, as well as for the SW480 tumour cells (disomic in Chr. 1), and at Chr. 10 (monosomic), Chr. 12 (disomic) and Chr. 13 (disomic) for glioblastoma nuclei. The inverse dependence is fitted with an empirical formula, y = 0.86/(1.2 + √x) (R² = 0.98). (d) Comparison of the cumulative coverage in the most uniform single-cell library from each study. Data were directly evaluated from high-depth sequencing of all samples except the neuron library, for which the curve was interpolated from 0.5× sequencing as in b.
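
The autocorrelation and exponential fit described in (a) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the normalization of the autocorrelation by the squared mean coverage, and the log-linear fitting procedure are all assumptions made for illustration; the caption does not state how the curve or the confidence interval was actually computed. The baseline of the fit is passed in as a parameter, taken here from the fitted form quoted in (a).

```python
import numpy as np

def normalized_autocorrelation(coverage, lags):
    """Mean of c(x) * c(x + lag) divided by mean(c)^2, for each positive integer lag (in bins).

    Assumed normalization: a library with uncorrelated coverage gives values near 1.
    """
    coverage = np.asarray(coverage, dtype=float)
    mean_squared = coverage.mean() ** 2
    return np.array(
        [np.mean(coverage[:-lag] * coverage[lag:]) / mean_squared for lag in lags]
    )

def fit_exponential_decay(lags, acf, baseline):
    """Fit acf ~ baseline + A * exp(-lag / lc) by linear regression on log(acf - baseline).

    Returns (A, lc). Points at or below the baseline are dropped before taking the log.
    """
    lags = np.asarray(lags, dtype=float)
    excess = np.asarray(acf, dtype=float) - baseline
    keep = excess > 0
    slope, intercept = np.polyfit(lags[keep], np.log(excess[keep]), 1)
    return np.exp(intercept), -1.0 / slope

if __name__ == "__main__":
    # Synthetic curve built from the parameters quoted in (a); lags are in kb.
    lags_kb = np.array([1, 2, 5, 10, 20, 50, 100], dtype=float)
    curve = 2.0 + 0.17 * np.exp(-lags_kb / 33.0)
    amplitude, corr_length = fit_exponential_decay(lags_kb, curve, baseline=2.0)
    print(f"amplitude = {amplitude:.3f}, correlation length = {corr_length:.1f} kb")
```

The log-linear fit is chosen only because it needs nothing beyond NumPy. On noisy data a nonlinear least-squares fit of all three parameters would likely be more robust.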