Figure 2: Imputed data are a close match to observed datasets. | Nature Biotechnology

From: Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues

(a) Visualization of one randomly selected 200-kb region illustrates high-resolution concordance between observed (blue) and imputed (red) signal tracks. Imputed tracks are generated at 1-bp resolution for DNA methylation and 25-bp resolution for all other marks, and trained without using the corresponding observed track. For each mark (row), we show a randomly selected sample (EID from Fig. 1a) that also has observed data for comparison (light purple entries in Fig. 1a). This region was chosen among nine randomly selected 200-kb regions (Supplementary Fig. 3) as the one with the most signal across all marks. A larger 1.5-Mb context and an example 5-kb close-up are shown in Supplementary Figure 3c, illustrating concordance at multiple resolutions. (b) Visualization of 2,000 randomly selected 25-bp regions (columns) and their signal (yellow, high; blue, low) across up to 127 samples (rows, colored as in Fig. 1a), for tier 1 marks (yellow sidebar) and for RNA-seq and DNA methylation (green sidebar); tier 2 and tier 3 marks are shown in Supplementary Figure 4. Rows and columns are clustered for each mark independently, based on observed data (top), to highlight structure; imputed data (generated without using the corresponding observed dataset) are shown below in the same order, showing clear similarity. WGBS, whole-genome bisulfite sequencing. (c) Quantitative comparison of correlation with the observed signal for ChromImpute (red), for averaging the mark signal from all other samples (green), and for the best-case selection of a single sample (blue). The last is not a realistic method, because determining the single best sample requires the target mark signal, which is not known in practice. Average correlation is computed over all samples for which both observed and imputed signals are available. ChromImpute shows consistently higher correlation with observed signals than the two alternative methods (including the unrealistic best case) for all marks. For additional comparisons, see Supplementary Figures 5–7. (d) Average AUC for recovering bases covered by a narrow peak call on observed data (ref. 10) when ranking based on predicted signal.
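The comparisons in panels (c) and (d) can be illustrated with a minimal sketch. This is not the ChromImpute evaluation code. It assumes that "correlation" means Pearson correlation between binned signal vectors, that each row of `signals` holds the observed track of one sample for a single mark, and that the AUC is computed from ranks (the Mann-Whitney formulation). All function names are hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between two 1-D signal tracks (assumed metric)."""
    return float(np.corrcoef(x, y)[0, 1])


def average_baseline(signals, target):
    """Panel (c), green: mean signal of the mark over all samples except `target`."""
    others = np.delete(np.asarray(signals, dtype=float), target, axis=0)
    return others.mean(axis=0)


def best_single_sample(signals, target):
    """Panel (c), blue: index of the other sample that best matches the target.

    This needs the observed target track, which is why the caption calls it
    unrealistic: in a real imputation setting that track is unknown.
    """
    signals = np.asarray(signals, dtype=float)
    scores = [
        pearson(signals[i], signals[target]) if i != target else -np.inf
        for i in range(len(signals))
    ]
    return int(np.argmax(scores))


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their rank range."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)      # last rank occupied by each group of ties
    starts = ends - counts + 1    # first rank occupied by each group of ties
    return ((starts + ends) / 2.0)[inverse]


def auc_from_ranking(predicted, in_peak):
    """Panel (d): AUC for recovering peak-covered bins when ranking by predicted signal.

    `in_peak` marks bins covered by a peak call on the observed data.
    """
    predicted = np.asarray(predicted, dtype=float).ravel()
    in_peak = np.asarray(in_peak, dtype=bool).ravel()
    n_pos = int(in_peak.sum())
    n_neg = in_peak.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one peak bin and one non-peak bin")
    ranks = average_ranks(predicted)
    u_stat = ranks[in_peak].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))
```

With these helpers, the panel (c) value for one sample would be `pearson(imputed_track, signals[target])`, compared against `pearson(average_baseline(signals, target), signals[target])` and the correlation of the track picked by `best_single_sample`. Averaging over every sample that has both observed and imputed data gives one bar per mark.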
