Extended Data Fig. 4: Association of CGI clusters with genomic and transcriptomic features.
From: Acute lymphoblastic leukemia displays a distinct highly methylated genome

a) Schematic of CGI clustering: CGIs commonly covered across precursor T cells and T-ALL patients were used. CGIs that were stably highly or stably lowly methylated across all samples were excluded from the clustering. The remaining variably methylated CGIs were clustered on their average methylation across the samples using consensus clustering with partitioning around medoids (PAM) as the clustering algorithm and 100 repetitions, yielding four clusters of CGIs.
b) Density of CGI characteristics per cluster: number of CpGs (top), length in base pairs (middle) and GC content (bottom). Consistently lowly methylated CGIs and cluster 1 CGIs have more CpGs, higher GC content and greater length than the other, more highly methylated clusters.
c) Log2-transformed expression of genes associated with a promoter CGI, per CGI cluster, in healthy precursor T cells. Lines denote the median, edges denote the IQR and whiskers denote either 1.5 × IQR or minima/maxima (if no point exceeded 1.5 × IQR; outliers were omitted). The number of genes per boxplot is indicated at the top. Genes associated with the low group and cluster 1 are mostly expressed, whereas genes associated with the remaining clusters are mostly already silenced in precursor T cells, indicating that additional methylation at their promoters will in many cases not change their expression status.
d) Overrepresentation analysis of biological processes for genes associated with promoter CGIs of the low group and cluster 1. Genes with a promoter CGI in the low group are mainly involved in cell maintenance functions, while genes with a promoter CGI in cluster 1 are enriched in the MAPK and JNK cascades.
e) Boxplot of CGI methylation across all CGI clusters for ALL subtypes, other hematopoietic malignancies and solid tumors (low (n = 6,795), cluster 1 (n = 3,304), cluster 2 (n = 2,034), cluster 3 (n = 2,240), cluster 4 (n = 1,285), high (n = 2,827); samples per subtype were averaged per CGI). Lines denote the median, edges denote the IQR and whiskers denote either 1.5 × IQR or minima/maxima (if no point exceeded 1.5 × IQR; outliers were omitted). Although the clusters were defined on T-ALL subtypes, the trend of increasing methylation from low to high can also be observed in B-ALL and other tumors.
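
The boxplot convention described for panels c and e (median line, IQR box, whiskers at 1.5 × IQR or at the minimum/maximum, outliers omitted) can be sketched as follows. This is a minimal illustration and not the authors' plotting code: the function name `boxplot_stats` is hypothetical, the quartiles use Python's "inclusive" interpolation, and the whiskers are assumed to end at the most extreme observation that lies within 1.5 × IQR of the box, which may differ in detail from the software used for the figure.

```python
from statistics import median, quantiles


def boxplot_stats(values):
    """Return the values drawn for one box, with outliers omitted.

    Assumption: whiskers end at the most extreme observation that still
    lies within 1.5 x IQR of the box edges. If no point exceeds that
    range, the whiskers coincide with the minimum and maximum.
    """
    data = sorted(values)
    # Quartiles by linear interpolation; other quartile definitions
    # would shift the box edges slightly.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Points outside the fences are outliers and are not drawn.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "n_outliers": len(data) - len(inside),
    }


# Toy values, not data from the study.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["whisker_low"], stats["whisker_high"], stats["n_outliers"])
```

In the toy example the value 100 lies beyond the upper fence, so the upper whisker stops at 9 and one outlier is omitted; without it, the whiskers would simply span the minimum and maximum.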