Fig. 2: Deconvolution of tissues on Illumina 450 K array. | Nature Communications


From: Benchmarking of methods for DNA methylome deconvolution


a Matrices of marker CpGs (n = 400) used for building the tissue methylation reference (left panel) and in silico mixtures (right panel) of 450 K data. Samples from four tissues were included: blood (reference: n = 49, validation: n = 6), kidney (reference: n = 21, validation: n = 85), liver (reference: n = 25, validation: n = 52), and small intestine (reference: n = 17, validation: n = 4). b Boxplots showing F-statistics for tissues at their respective marker CpGs, between reference (n = 112 biologically independent samples) and validation datasets (n = 147 biologically independent samples). The boxplots present median values and quartiles; whiskers show the minimum and maximum values, and dots the individual data points. c Scatter plots showing true proportions (x-axis) and predicted proportions (y-axis), in percentages, for all tissues for the best-performing (upper left) and worst-performing (lower right) algorithm-normalization combinations on 200 in silico mixtures. R² and p-values were calculated using Spearman's rank correlation test. d Boxplots showing deconvolution accuracy scores for the different deconvolution methods, normalization methods, and cell types on 200 in silico mixtures. Black diamonds represent median values; colors represent tissues. The boxplots present median values and quartiles; whiskers show the minimum and maximum values, and dots the individual data points. P-values were determined using two-tailed FDR-adjusted Dunn's tests. e Performance of deconvolution on 200 in silico mixtures. Algorithm-normalization combinations are visualized as circles; Spearman's R² is represented by color and root mean squared error (RMSE) by size. Rows show deconvolution algorithms, columns show normalization methods. f Boxplots showing Spearman's R² and RMSE values for deconvolution on 200 in silico mixtures using markers selected by either the custom or the IDOL algorithm. The boxplots present median values and quartiles; whiskers show the minimum and maximum values, and dots the individual data points. P-values were calculated using a two-tailed Wilcoxon rank-sum test. P-values for R²: small intestine = 5.2 × 10⁻¹⁴; blood = 2.2 × 10⁻¹⁶; liver = 2.2 × 10⁻¹⁶; kidney = 2.2 × 10⁻¹⁶. P-values for RMSE: small intestine = 4.2 × 10⁻⁶; blood = 1.9 × 10⁻¹⁵; liver = 2.2 × 10⁻¹⁶; kidney = 2.2 × 10⁻¹⁶. An 'X' symbol represents missing values. Source data, including exact p-values, are provided as a Source Data file.
