Fig. 3: Investigation of misclassified samples in the gene expression subtype prediction analysis of the test set, and validation of the gene expression subtype model with NOA-04 and TCGA-LGG.
From: DNA methylation-based epigenetic signatures predict somatic genomic alterations in gliomas

A Confusion matrix of the test set (n = 72) comparing transc-based and methyl-based gene expression subtype predictions (L: left; R: right). B “Transc-based CL and methyl-based CL” samples (light yellow) were compared with “transc-based CL and methyl-based MES” samples (dark yellow) for gene expression level (EGFR, NF1) and copy number (CN) segmentation level (EGFR, CDKN2A). All “transc-based CL and methyl-based MES” samples (dark yellow) show lower EGFR expression (P = 0.08; two-sided Wilcoxon rank sum test, as for all following P values), lower EGFR amplification (P = 0.0056), higher CDKN2A amplification (P = 0.015), and lower NF1 expression (P = 0.029) than “transc-based CL and methyl-based CL” samples (light yellow). Data are available for n = 20 values of methyl-based CL-subtype, for n = 5 values of methyl-based MES-subtype in EGFR and NF1 expression panels, and for n = 4 values of methyl-based MES-subtype in CN seg value panels. In the box plots, the center line represents the median, the lower and upper hinges represent the 25th and 75th percentiles, and the whiskers extend 1.5 interquartile ranges beyond the box limits or to the minimum/maximum, whichever is closer to the median. C “Transc-based MES and methyl-based CL” samples (light green) were compared with “transc-based MES and methyl-based MES” samples (dark green) for gene expression level (EGFR) and CN segmentation level (EGFR, CDKN2A). “Transc-based MES and methyl-based CL” samples (light green) show higher EGFR expression (P = 0.04), higher EGFR amplification (P = 1.9 × 10⁻⁴), and lower CDKN2A amplification (P = 0.02) than “transc-based MES and methyl-based MES” samples (dark green).
Data are available for n = 9 values of methyl-based CL-subtype in EGFR and NF1 expression panels, for n = 8 values of methyl-based CL-subtype in CN seg value panels, for n = 10 values of methyl-based MES-subtype in EGFR and NF1 expression panels, and for n = 12 values of methyl-based MES-subtype in CN seg value panels. Box plots are drawn as in (B). D Binary genetic alteration prediction results in the external validation set (NOA-04). E Confusion matrix of TCGA-LGG samples comparing transc-based and methyl-based gene expression subtype predictions. F Heatmap of TCGA-LGG samples (n = 486) with gene expression subtypes, histology, chr1p19q codel, MGMT promoter methylation, somatic mutations, and CNV. Statistical comparisons between genomic alterations and methyl- and transc-based gene expression subtypes are presented in Supplementary Data 3.
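
The box plot convention stated for panel B (median center line, 25th/75th percentile hinges, whiskers limited to 1.5 interquartile ranges or the data extremes) can be illustrated with a minimal Python sketch. The function name `box_plot_stats`, the choice of the "inclusive" quantile method, and the sample values are assumptions made for illustration; they are not taken from the paper's analysis code or data.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Return the numbers drawn for one box, plus points beyond the whiskers."""
    data = sorted(values)
    # Hinges and center line: 25th, 50th and 75th percentiles.
    # The "inclusive" interpolation method is an assumption; other
    # quantile definitions give slightly different hinges.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Each whisker ends at the most extreme observation inside its fence,
    # so it reaches the minimum/maximum when no point lies beyond the fence.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "lower_whisker": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical values, not data from the figure.
example = box_plot_stats([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
print(example)
```

With these hypothetical values the upper whisker stops at 5.0 rather than at the maximum, because 20.0 lies more than 1.5 interquartile ranges above the upper hinge.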