Abstract
The mechanistic link between the complex mutational landscape of the de novo methyltransferase DNMT3A and the pathology of acute myeloid leukemia (AML) has not yet been clearly elucidated. Motivated by the recent discovery of the significance of DNMT3A-destabilizing mutations (DNMT3AINS) in AML, we investigate the common characteristics of DNMT3AINS AML methylomes through computational analyses. We show that methylomes of DNMT3AINS AMLs differ considerably from those of DNMT3AR882 AMLs in that they exhibit increased intratumor DNA methylation heterogeneity in bivalent chromatin domains. This epigenetic heterogeneity was associated with the transcriptional variability of developmental and membrane-associated factors that shape the stem cell niche, and it also predicted the response of AML cells to hypomethylating agents, implying that the survival of AML cells depends on stochastic DNA methylation at bivalent domains. Altogether, our work provides a novel mechanistic model that suggests a genomic origin for aberrant epigenomic heterogeneity in disease conditions.
Introduction
Recent sequencing efforts of acute myeloid leukemia (AML) genomes and exomes have identified DNMT3A as one of the most recurrently mutated epigenetic modifiers, and its mutation is associated with adverse patient outcome1. DNMT3A encodes a de novo DNA methyltransferase that establishes DNA methylation patterns during the development of mammalian stem cells2, but the precise molecular mechanism by which mutant DNMT3A drives the initiation and progression of AML has not been clearly elucidated. One characteristic that obscures the mechanistic role of mutant DNMT3A in AML is its intricate mutational landscape. In AML, about 60% of DNMT3A mutations cause substitution of the arginine at position 882 (R882), and the remaining ~40% occur at many different positions throughout the functional domains of DNMT3A3. Because of their prevalence, attention so far has been drawn primarily to DNMT3A R882 mutations in AML. These studies are gradually converging on the consensus that mutant DNMT3AR882 exerts a dominant-negative effect by preventing wildtype DNMT3A from forming catalytically active homotetramers4. In contrast, for DNMT3A mutations other than R882 (non-R882 mutations), much of their clinical implication and mechanistic role in AML pathogenesis remains to be elucidated. Recently, a comprehensive biochemical characterization of 253 variants across the DNMT3A gene suggested that a subset of disease-associated DNMT3A variants trigger destabilization of the protein followed by its proteasomal degradation5. Intriguingly, these instability-inducing variants (DNMT3AINS), which presumably reduce the intracellular concentration of intact DNMT3A, seemed to confer high fitness advantages on cells of the hematopoietic lineage, but the molecular mechanism linking DNMT3AINS to the progression of hematological disorders has not been thoroughly clarified.
Meanwhile, the epigenetic diversity of cancer cells, primarily in terms of the heterogeneity of DNA methylation patterns, is increasingly acknowledged as an important factor that increases the adaptive potential of a tumor, leading to adverse outcome, treatment resistance, or shorter time to relapse in a variety of cancer types6,7,8. In chronic lymphocytic leukemia, locally disordered methylation patterns at promoter regions have been reported to be associated with increased transcriptional variability as well as adverse patient outcomes6, and the implication of such disorder for treatment resistance and disease relapse has been reported in diffuse large B-cell lymphoma9. The role of DNA methylation heterogeneity in AML has also been studied recently10. Given these broad clinical implications, it is widely accepted that the increased fitness conferred on cancer cell populations by epigenetic diversity is pivotal. However, the connection between a specific subset of DNMT3A variants and the extent of disorder of DNA methylation patterns has not been characterized so far.
Here, we investigate the molecular-level impact of DNMT3AINS variants on the DNA methylomes and transcriptomes of AML patients. In particular, we explore the association between DNMT3AINS and the concordance of DNA methylation states, in addition to routinely analyzed DNA methylation features such as promoter methylation levels or differentially methylated regions (DMRs). To examine the methylomes of AML patients with DNMT3AINS directly and robustly, we extensively reanalyzed publicly available methylation profiles of AML patients from two large independent cohorts7,11. Furthermore, we performed reduced-representation bisulfite sequencing (RRBS) on our own cohort for validation. Through these analyses of diverse cohorts, we show that DNMT3AINS AMLs exhibit decreased local methylation concordance as well as increased epigenetic cellular diversity, both of which are associated with the transcriptional heterogeneity of genes involved in shaping the leukemic stem cell niche. Given previous studies showing the oligomerization-dependent shift in the catalytic processivity of DNMT3A and the concentration-dependent oligomerization preference of DNMT3A, this study suggests a model of pathogenesis in which DNMT3AINS variants are the genetic origin of epigenetic instability.
Results
Definition of DNMT3AINS variants
To obtain a predefined set of DNMT3AINS variants, we used previous results of a protein stability assay that measured the stability of mutant DNMT3A proteins as a stability ratio normalized to WT DNMT3A5. From the stability ratios of 253 disease-associated variants affecting 248 unique amino acid residues, we obtained a stability score for each of the 248 residues by averaging the stability ratios of all substitutions at that residue. Of note, stability ratios of variants occurring at the same residue were more similar to each other than those of variants at different residues (Supplementary Fig. 1, p = 0.023, chi-squared test). Since the resulting stability scores displayed a bimodal distribution, we divided the residues into two groups, namely destabilizing (n = 125) and non-destabilizing (n = 123) residues, using a threshold score of 0.75 (Fig. 1a, Supplementary Data 1). To further justify this grouping, we examined the full-length structure of DNMT3A (obtained from the AlphaFold Protein Structure Database12, UniProt ID Q9Y6K1) and found that destabilizing residues are enriched in β-sheets behind the helical tetramer interface compared with non-destabilizing residues (Supplementary Fig. 2a–e). Furthermore, destabilizing residues showed higher predicted local distance difference test (pLDDT) values, which reflect higher prediction confidence and generally indicate structurally ordered residues of greater evolutionary conservation and structural importance (Supplementary Fig. 2f). Given these biochemical, structural and evolutionary grounds, we defined a DNMT3AINS variant as a point mutation at a destabilizing residue, or as a nonsense or frameshift mutation at any position of the protein, to cover a broader spectrum of instability-inducing variants. Point mutations at non-destabilizing positions other than R882 were defined as DNMT3AOther variants.
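The residue-level scoring and variant classification described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the input format (pairs of residue position and stability ratio), the mutation-type labels, the handling of residues without an assay measurement (returned here as a hypothetical 'Unscored' class) and the use of a strict `< 0.75` comparison are all assumptions.

```python
from collections import defaultdict
from statistics import mean

THRESHOLD = 0.75  # residues scoring below this are treated as destabilizing (strict "<" is assumed)


def residue_scores(variant_ratios):
    """Average the WT-normalized stability ratios of all substitutions per residue.

    variant_ratios: iterable of (residue_position, stability_ratio) pairs.
    """
    by_residue = defaultdict(list)
    for position, ratio in variant_ratios:
        by_residue[position].append(ratio)
    return {position: mean(ratios) for position, ratios in by_residue.items()}


def classify(mutation_type, position, scores):
    """Assign a DNMT3A mutation to the INS, R882 or Other subclass.

    mutation_type: 'missense', 'nonsense' or 'frameshift' (hypothetical labels).
    Residues absent from the stability assay are returned as 'Unscored'.
    """
    if mutation_type in ("nonsense", "frameshift"):
        return "INS"  # truncating mutations count as INS at any position
    if position == 882:
        return "R882"
    score = scores.get(position)
    if score is None:
        return "Unscored"
    return "INS" if score < THRESHOLD else "Other"
```

Averaging per residue before thresholding means every substitution at a given residue lands in the same class, which matches the observation that variants at the same residue behave similarly.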
a Distribution of DNMT3A stability scores for 248 residues across the DNMT3A protein. The dotted line denotes the threshold value 0.75 dividing destabilizing and non-destabilizing residues. b Description of the local pairwise methylation discordance calculation. c Comparison of genomewide LPMD between different DNMT3A mutation subclasses in diagnosis and relapse AML samples from the Li2016 cohort. Statistical significances from two-sided Mann–Whitney U tests between DNMT3AWT and DNMT3AINS subclasses are shown. d Schematic diagram illustrating the multiple linear regression analysis predicting LPMD values based on mutation status, age and gender. e Accuracy of LPMD values predicted by multiple linear regression analysis. f Regression coefficients and their significance. g Correlation between DNMT3A stability score and genomewide LPMD in the Li2016 cohort. Pearson’s correlation coefficient and the corresponding p value are shown. h Array-based LPMD of TCGA-LAML samples. Bars denote 95% confidence intervals. P values from two-sided Mann–Whitney U tests are shown. i Correlation between DNMT3A stability score and genomewide LPMD in the TCGA-LAML cohort. Pearson’s correlation coefficient and the corresponding p value are shown. j Genomewide LPMD comparison in the SNUH cohort. In (c, f), ***p < 0.001, **p < 0.01, *p < 0.05, two-sided Mann–Whitney U test; in (f), CEBPA-sm, CEBPA with single mutation; CEBPA-dm, CEBPA with double mutation; in (g, i, j), shades denote 95% confidence intervals; for boxplots, the center line denotes the median, the upper and lower box limits denote upper and lower quartiles, and the whiskers denote 1.5× interquartile range.
DNMT3AINS AMLs show locally disordered DNA methylation patterns
DNMT3A exerts its catalytic activity by forming oligomers. Intriguingly, the mechanism of DNMT3A-mediated de novo methylation has been shown to depend on its oligomeric state13. A homotetrameric complex exhibits processive catalysis, in which methyl groups are added consecutively to CpGs within a local stretch of DNA, whereas a dimeric complex shows distributive catalysis, in which the complex rapidly dissociates from the DNA after a single catalytic event. Since the oligomeric state of DNMT3A was shown to depend on the intracellular concentration of the protein14, we hypothesized that distributive de novo methylation mediated by dimeric DNMT3A would be prevalent in DNMT3AINS AMLs. To quantify the extent of processive or distributive de novo methylation from the traces left on the methylomes of AML patients, we used a computational measure called local pairwise methylation discordance15 (LPMD; Fig. 1b). LPMD is a per-sample measure of the extent to which pairs of nearby CpGs have different methylation states. Since processive methylation tends to leave both CpGs of a closely spaced pair methylated, LPMD inversely reflects the processivity of DNMT3A, although we cannot rule out the effects of other factors, including TET-driven demethylation. We note that the calculation of LPMD does not involve normalization by methylation level, even though average LPMD levels depend on it. This is primarily because we wanted to measure absolute levels of local DNA methylation concordance, not relative levels given average methylation levels. We also note that all LPMD calculations within a cohort used the same set of CpG dinucleotides.
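To make the quantity concrete, the sketch below computes LPMD as the fraction of discordant CpG pairs among all CpG pairs observed on the same read whose spacing falls within a distance window. It assumes reads have already been reduced to a mapping from CpG position to a binary methylation call; the default window (`min_dist`, `max_dist`) is illustrative, and the exact binning and filtering of the published LPMD implementation15 may differ.

```python
from itertools import combinations


def lpmd(reads, min_dist=2, max_dist=16, cpg_set=None):
    """Fraction of discordant CpG pairs co-observed on the same read.

    reads: list of dicts mapping CpG genomic position -> methylation call
           (1 = methylated, 0 = unmethylated), one dict per read.
    min_dist, max_dist: inclusive window (bp) on the spacing of a CpG pair;
           the defaults are illustrative assumptions.
    cpg_set: optional set of CpG positions to restrict the calculation,
           e.g. the CpGs shared by all samples of a cohort.
    """
    concordant = discordant = 0
    for read in reads:
        positions = sorted(p for p in read if cpg_set is None or p in cpg_set)
        for a, b in combinations(positions, 2):
            if min_dist <= b - a <= max_dist:
                if read[a] == read[b]:
                    concordant += 1
                else:
                    discordant += 1
    total = concordant + discordant
    return discordant / total if total else float("nan")


# Two toy reads: three of the four in-window pairs disagree.
print(lpmd([{100: 1, 102: 1, 110: 0}, {100: 0, 102: 1}]))  # 0.75
```

Because only pairs on the same read are compared, the measure uses the phasing information that bisulfite sequencing provides and that array data lack.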
We reanalyzed the enhanced reduced-representation bisulfite sequencing (eRRBS) data provided by Li et al.7 (hereafter the Li2016 cohort) for 94 paired diagnosis and relapse samples from 47 AML patients. We first identified somatic mutations in all 94 AML samples and compared their LPMDs according to their DNMT3A mutation states. As expected, LPMD increased steadily with the distance between CpG pairs, reflecting the local homogeneity of DNA methylation states (Fig. 1c). Surprisingly, DNMT3AINS AMLs showed significantly higher genomewide LPMD than any other DNMT3A subclass (p = 0.001, two-sided Mann–Whitney U test between WT and DNMT3AINS for CpG pairs 2 bp apart; Fig. 1c), suggesting dysregulated local correlation of DNA methylation states in DNMT3AINS. To ensure that the association between DNMT3AINS and locally disordered methylation states remains significant after accounting for co-occurring mutations, age, and gender, we built a multiple linear regression model predicting LPMD (Fig. 1d, e) and found that the association between DNMT3AINS and high LPMD was indeed significant after adjusting for these factors (Fig. 1f). Notably, DNMT3AINS was the only DNMT3A mutation subclass positively associated with LPMD (regression coefficient 0.0095, p = 8.5 × 10−3), in stark contrast to the negative associations of the other DNMT3A mutation subclasses with LPMD (regression coefficients −0.0093 and −0.0083, with p values of 3.8 × 10−3 and 1.7 × 10−3, for DNMT3AR882 and DNMT3AOther, respectively). Since LPMD values depend on overall methylation levels, we examined whether the observed increase in LPMD could be decoupled from DNA methylation levels and found that DNMT3AINS AMLs still showed higher ‘normalized’ LPMD (Supplementary Fig. 3).
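A multiple linear regression of this kind can be sketched with ordinary least squares in NumPy, using DNMT3AWT as the reference level so that each mutant coefficient is a shift relative to WT. For brevity, the sketch omits the co-occurring mutation indicators of the actual model (they would enter as additional 0/1 columns) and returns only point estimates; p values such as those in Fig. 1f would additionally require standard errors, for example from a statistics package.

```python
import numpy as np


def fit_lpmd_model(subclass, age, male, lpmd):
    """OLS fit of LPMD ~ DNMT3A subclass + age + gender (point estimates only).

    subclass: labels in {"WT", "INS", "R882", "Other"}; WT is the reference
    level, so it has no column of its own.
    male: 1 for male, 0 for female.
    """
    subclass = np.asarray(subclass)
    X = np.column_stack([
        np.ones(len(subclass)),               # intercept (WT baseline)
        (subclass == "INS").astype(float),
        (subclass == "R882").astype(float),
        (subclass == "Other").astype(float),
        np.asarray(age, dtype=float),
        np.asarray(male, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(X, np.asarray(lpmd, dtype=float), rcond=None)
    names = ["intercept", "INS", "R882", "Other", "age", "male"]
    return dict(zip(names, coef))
```

Dropping the WT indicator avoids perfect collinearity with the intercept, which is why the signs of the INS, R882 and Other coefficients can be read directly as "higher" or "lower" than WT.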
We verified that bisulfite conversion rates exceeded ~99.7% for all examined eRRBS data (median 99.87%) and were not correlated with LPMD values (Pearson’s r = −0.01, p = 0.94; Supplementary Fig. 4a, b). LPMD also did not correlate with mapping quality (Supplementary Fig. 4c), making it unlikely that the high LPMD arose from experimental artifacts.
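Bisulfite conversion rates are commonly estimated from cytosines outside the CpG context, which are expected to be almost entirely unmethylated in most somatic cells and should therefore read as T after complete conversion. The sketch below counts converted and unconverted CpH cytosines on a single read and pools the counts; it assumes a top-strand, indel-free alignment and is only an illustration, not the QC procedure used here.

```python
def cph_conversion_counts(ref, read, start):
    """Count converted (T) and unconverted (C) reads at reference CpH cytosines.

    ref: top-strand reference sequence; read: aligned read without indels,
    starting at 0-based offset `start` in ref.
    """
    converted = unconverted = 0
    for i, base in enumerate(read):
        pos = start + i
        if ref[pos] != "C":
            continue
        # skip CpG cytosines and cytosines whose next base is unknown
        if pos + 1 >= len(ref) or ref[pos + 1] == "G":
            continue
        if base == "T":
            converted += 1
        elif base == "C":
            unconverted += 1
    return converted, unconverted


def pooled_rate(counts):
    """Pool per-read (converted, unconverted) counts into one conversion rate."""
    converted = sum(c for c, _ in counts)
    total = sum(c + u for c, u in counts)
    return converted / total if total else float("nan")
```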
We next examined whether the extent of DNMT3A destabilization induced by DNMT3AINS mutations correlates with LPMD. Stability scores showed a marked negative correlation with LPMD values (Pearson’s r = −0.71, p = 3.36 × 10−5; Fig. 1g). In other words, more severe instability of DNMT3A was associated with greater local discordance of DNA methylation states. This result supports the putative relationship between the instability-driven reduction of intracellular DNMT3A concentration and decreased DNA methylation concordance.
To verify whether these findings are reproducible in an independent AML cohort, we conducted a similar analysis for the TCGA-LAML cohort (n = 140). Since only methylation BeadChip array profiles were available for this cohort, we could not use the phasing information of methylation states available in the bisulfite sequencing data of the Li2016 cohort. To circumvent this problem, we devised an array-based LPMD as an alternative to the sequencing-based LPMD (Methods) and computed it for the TCGA-LAML cohort. Of note, array-based LPMD serves as a theoretical lower bound of sequencing-based LPMD (Methods). DNMT3AINS AMLs again showed significantly higher local disorder of DNA methylation (p = 9.6 × 10−4, two-sided Mann–Whitney U test between WT and DNMT3AINS; Fig. 1h). Furthermore, array-based LPMD levels were also negatively correlated with the stability scores of the corresponding DNMT3A variants (Pearson’s r = −0.58, p = 0.001; Fig. 1i), reproducing the results from the Li2016 cohort.
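The Methods defining the array-based LPMD are not reproduced in this section. One construction that would have the stated lower-bound property, offered here as an assumption rather than the authors' exact definition, scores a pair of nearby probes by the absolute difference of their β values. For two CpGs with binary states \(X_i\) and \(X_j\) on a randomly drawn molecule, with methylation levels \(\beta_i = P(X_i = 1)\) and \(\beta_j = P(X_j = 1)\):

$$
P(X_i \neq X_j) = \beta_i + \beta_j - 2\,P(X_i = 1, X_j = 1) \;\ge\; \beta_i + \beta_j - 2\min(\beta_i, \beta_j) = \lvert \beta_i - \beta_j \rvert
$$

The inequality holds because \(P(X_i = 1, X_j = 1)\) cannot exceed either marginal. The discordance that read-level phasing would reveal is therefore at least \(\lvert \beta_i - \beta_j \rvert\), with equality only when the two CpGs are maximally co-methylated across molecules.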
Additionally, we performed RRBS on our own cohort of 20 AML patients (SNUH cohort; Supplementary Data 2). Two patients carried DNMT3AINS variants, at positions 754 (stability score 0.386) and 736 (stability score 0.316). Of note, these variants were among the most destabilizing variants (top 17% and 7%, respectively). Again, these two DNMT3AINS AML patients showed markedly high genomewide LPMD values (Fig. 1j). We confirmed that the variant at position 736 decreases tetramerization at the protein level, with prominent dimer formation (Supplementary Fig. 5).
Methylation landscape of DNMT3AINS AMLs, in terms of methylation levels, is similar to that of DNMT3AWT AMLs, but not to that of DNMT3AR882 AMLs
In general, the alteration of DNA methylation in cancer cells is widely known to involve focal hypermethylation of CpG-dense regulatory regions, including CpG islands, as well as a global loss of DNA methylation. AML cells are no exception. Beyond these malignancy-associated alterations, DNMT3AR882 AMLs have been shown to have distinct hypomethylation patterns compared with DNMT3AWT AMLs, which arise from attenuated AML-associated hypermethylation and loss of methylation at regions normally maintained at high methylation levels16. In contrast, the global methylation landscape of DNMT3AINS AMLs has not been clearly characterized so far.
To characterize the methylation landscape of DNMT3AINS AMLs in terms of methylation levels, we first used the Li2016 cohort to examine whether DNMT3AINS AMLs also show the hypomethylation patterns observed in DNMT3AR882 AMLs, thereby identifying similarities and differences between DNMT3AINS and DNMT3AR882 methylomes. To determine the genomic regions subject to DNMT3AR882-associated hypomethylation, we identified DMRs between DNMT3AR882 and DNMT3AWT samples using an established method17. As expected, the identified DMRs predominantly consisted of DMRs hypomethylated in DNMT3AR882 (hypo-DMRs), which accounted for 88% (465 of 527) of them (Fig. 2a). Strikingly, DNMT3AINS AMLs showed DNA methylation levels comparable to those of DNMT3AWT at these DNMT3AR882-associated hypo-DMRs (Fig. 2b and Supplementary Fig. 6a). A direct comparison between DNMT3AINS and DNMT3AWT AML methylomes also highlights the similarity of their methylation levels (Supplementary Fig. 7). These significant differences between DNMT3AINS and DNMT3AR882 were also observed in the TCGA-LAML and SNUH cohorts (Supplementary Fig. 8a, b). These results show that methylomes of DNMT3AINS AMLs lack DNMT3AR882-associated hypomethylation patterns and underscore the clear difference between DNMT3AINS and DNMT3AR882 methylomes.
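Comparisons such as Fig. 2b reduce to averaging each sample's CpG methylation levels over a set of regions and then testing the per-sample averages between subclasses. The sketch below assumes a single chromosome, half-open (start, end) region coordinates and NaN for uncovered CpGs; it is illustrative and does not reproduce the DMR caller17 itself.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def mean_methylation_in_regions(positions, levels, regions):
    """Per-sample mean methylation over CpGs lying inside any region.

    positions: sorted 1-D array of CpG coordinates on one chromosome.
    levels: (n_samples, n_cpgs) array of methylation levels in [0, 1];
            NaN marks CpGs without coverage in a sample.
    regions: list of half-open (start, end) intervals, e.g. hypo-DMRs.
    """
    positions = np.asarray(positions)
    inside = np.zeros(len(positions), dtype=bool)
    for start, end in regions:
        lo, hi = np.searchsorted(positions, [start, end])  # CpGs in [start, end)
        inside[lo:hi] = True
    return np.nanmean(np.asarray(levels, dtype=float)[:, inside], axis=1)


def compare_subclasses(values_a, values_b):
    """Two-sided Mann-Whitney U test on per-sample summaries of two subclasses."""
    return mannwhitneyu(values_a, values_b, alternative="two-sided")
```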
a Distribution of average methylation level difference in DMRs identified between DNMT3AR882 and DNMT3AWT AMLs. b Average methylation levels of different DNMT3A mutation subclasses of AMLs around the hypo-DMRs identified between DNMT3AR882 and DNMT3AWT AMLs. c Distribution of average methylation level difference in DMRs identified between different DNMT3A mutation subclasses and normal bone marrow cells using RRBS. d Average methylation levels surrounding the hyper-DMRs in DNMT3AINS (vs normal bone marrow cells) for each DNMT3A mutation subclass. Hypo-DMRs, Hypomethylated DMRs; Hyper-DMRs, Hypermethylated DMRs. In (b, d), shades denote 95% confidence intervals.
We next asked whether DNMT3AINS AMLs harbor regions with altered DNA methylation levels unique to them, so we identified and compared DMRs between each DNMT3A subclass and normal bone marrow (NBM) samples. DNMT3AWT AMLs had 4703 (82%) hyper-DMRs and 1013 (18%) hypo-DMRs (Fig. 2c). We note that this extreme bias toward hyper-DMRs may be due to the preferential coverage of CpG-dense regions by eRRBS, which would exaggerate cancer-associated hypermethylation events. Nevertheless, DMRs in DNMT3AR882 AMLs were less skewed toward hyper-DMRs: they comprised fewer hyper-DMRs (n = 3067, 72%) and more hypo-DMRs (n = 1186, 28%; Fig. 2c), recapitulating the attenuated hypermethylation in DNMT3AR882. DMRs identified in DNMT3AINS AMLs were even more skewed toward hyper-DMRs (n = 7965, 97%; Fig. 2c). However, these hypermethylation events are not specific to DNMT3AINS, as every DNMT3A subclass showed significant hypermethylation within the hyper-DMRs identified in DNMT3AINS (Fig. 2d and Supplementary Fig. 6b), and even within the hyper-DMRs exclusive to DNMT3AINS (Supplementary Fig. 8c). The hyper-DMRs were also similarly distributed across genomic contexts (Supplementary Fig. 8d). These observations indicate that most of the hypermethylation in DNMT3AINS-associated hyper-DMRs originates from hypermethylation events that are generally observed in AML.
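As a quick arithmetic check, the hyper-DMR fractions quoted above follow directly from the reported counts. The helper name is hypothetical; the counts are those given in the text.

```python
def hyper_fraction(n_hyper, n_hypo):
    """Fraction of DMRs that are hypermethylated relative to the reference."""
    return n_hyper / (n_hyper + n_hypo)


# Counts reported in the text (each subclass vs normal bone marrow)
reported = {
    "WT vs NBM": (4703, 1013),
    "R882 vs NBM": (3067, 1186),
}
for label, (hyper, hypo) in reported.items():
    print(f"{label}: {hyper_fraction(hyper, hypo):.0%} hyper-DMRs")
# WT vs NBM: 82% hyper-DMRs
# R882 vs NBM: 72% hyper-DMRs
```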
Altogether, these results lead to two conclusions about the methylation landscape of DNMT3AINS AML. First, since DNMT3AINS AMLs did not show DNMT3AR882-associated hypomethylation patterns, the current leukemogenic model for DNMT3AR882 may not directly apply to DNMT3AINS AMLs. Second, the comparable DNA methylation levels of DNMT3AINS and DNMT3AWT methylomes imply that DNMT3AINS is associated with molecular aberrations beyond changes in absolute DNA methylation levels. This underscores the importance of increased intratumoral DNA methylation heterogeneity, including the local disorder of DNA methylation, in DNMT3AINS AML.
Local disorder of DNA methylation in DNMT3AINS AML occurs predominantly at bivalent domains
Although the precise molecular mechanism remains obscure, previous experiments demonstrated that DNMT3A-dependent hypermethylation in AML cells occurs mostly at bivalent chromatin domains16. To provide an additional line of evidence that the observed hyper-DMRs in DNMT3AINS truly result from the catalytic activity of DNMT3A, we used a reference epigenome of CD34+ myeloid progenitors from ENCODE18 and analyzed the epigenetic context of the hyper-DMRs. The aggregated signals of several epigenomic marks surrounding the hyper-DMRs in DNMT3AINS are shown in Fig. 3a and Supplementary Fig. 9. These regions colocalized with both active (H3K4me1/3) and repressive (H3K27me3) histone marks, which is indicative of bivalent chromatin domains. We additionally confirmed that the hyper-DMRs in DNMT3AINS were strongly enriched for bivalent chromatin states inferred by ChromHMM19 (Fig. 3b). Of note, the hypermethylation enriched at bivalent domains is not restricted to DNMT3AINS but is also seen in all other DNMT3A subclasses (Fig. 3c, Supplementary Fig. 10a), whereas hypo-DMRs were enriched for enhancer-related genomic contexts (Supplementary Fig. 10b). Collectively, these data indicate that the identified hyper-DMRs, located primarily at bivalent domains, represent the genomic regions where de novo methylation by DNMT3A takes place.
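Enrichment of DMRs in a chromatin state, as in Fig. 3b, is often summarized as a fold enrichment: the fraction of DMR bases lying in the state divided by the fraction of the genome occupied by that state. The sketch below computes this ratio for one chromosome with half-open intervals; the brute-force overlap loop and the interval format are simplifying assumptions, and the exact statistic used in the analysis may differ.

```python
def overlap_bp(intervals_a, intervals_b):
    """Total overlap (bp) between two lists of non-overlapping half-open intervals."""
    total = 0
    for s1, e1 in intervals_a:
        for s2, e2 in intervals_b:
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


def state_fold_enrichment(dmrs, state_segments, genome_size):
    """Observed fraction of DMR bases in a state over the genome-wide fraction of that state."""
    dmr_bp = sum(e - s for s, e in dmrs)
    state_bp = sum(e - s for s, e in state_segments)
    observed = overlap_bp(dmrs, state_segments) / dmr_bp
    expected = state_bp / genome_size
    return observed / expected


# A 100-bp DMR half inside a state covering 10% of a 1-kb toy genome: 5-fold enrichment.
print(state_fold_enrichment([(0, 100)], [(50, 150)], 1000))
```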
a Average histone modification levels around hyper-DMRs identified between DNMT3AINS and normal bone marrow cells. Shades denote standard errors. b ChromHMM chromatin context enrichment of hyper-DMRs identified between DNMT3AINS and normal bone marrow cells. c Average methylation level surrounding bivalent domains. d LPMD distribution across different genomic contexts for the Li2016 cohort. Parenthesized values denote the proportion of the analyzed CpGs within each genomic context, except for the value next to ‘All’, which denotes the total number of analyzed CpGs. P-values from two-sided Mann–Whitney U tests between DNMT3AWT and DNMT3AINS are shown. LPMD distributions in (e) bivalent domains and (f) SINEs. P values from two-sided Mann–Whitney U tests between DNMT3AWT and DNMT3AINS are shown. n.s. p > 0.05. g LPMD comparison in bivalent or non-bivalent promoters. P-values from two-sided Mann–Whitney U tests between DNMT3AWT and DNMT3AINS are shown. Throughout (d–g), the center line denotes the median, the upper and lower box limits denote upper and lower quartiles, and the whiskers denote 1.5× interquartile range.
Given that bivalent domains are the putative hotspots of de novo methylation in DNMT3AINS AMLs, we hypothesized that DNA methylation disorder in these samples would be concentrated in those regions. To address this, we computed LPMD separately for 12 different genomic contexts. Surprisingly, the difference in LPMD between DNMT3AINS and the other DNMT3A subclasses was almost exclusively confined to bivalent domains and regulatory regions, including promoters, CpG islands, shores, and methylation canyons (Fig. 3d). This specificity of DNA methylation disorder for bivalent domains (Fig. 3e) is notable when compared with the LPMD distributions of CpGs located in SINEs (Fig. 3f), given that the two genomic contexts harbor comparable numbers of analyzed CpGs (223,428 and 189,338 CpGs for bivalent domains and SINEs, respectively). The specificity of the LPMD increase at bivalent domains was also observed in a receiver operating characteristic (ROC) analysis (area under the ROC curve = 0.76; Supplementary Fig. 11). Further, dividing promoters into bivalent and non-bivalent promoters revealed that the difference in LPMD was restricted to bivalent promoters, whereas non-bivalent promoters showed only a marginal absolute difference (Fig. 3g). Taken together, we concluded that the disordered methylation in DNMT3AINS AMLs is highly specific to bivalent domains, where DNMT3A-driven de novo methylation potentially takes place. For convenience, we hereafter refer to the LPMD at bivalent domains as bivLPMD.
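Context-stratified LPMD can be sketched by extending the pairwise count so that each CpG pair is credited to a genomic context. Here a pair counts toward a context only if both CpGs carry the same label, and each CpG carries a single label; both rules are assumptions made for illustration, since real annotations overlap (for example, a CpG island inside a bivalent promoter). The distance window is likewise illustrative.

```python
from collections import Counter
from itertools import combinations


def lpmd_by_context(reads, context_of, min_dist=2, max_dist=16):
    """LPMD computed separately for each genomic context label.

    reads: list of dicts mapping CpG position -> methylation call (1/0).
    context_of: dict mapping CpG position -> single context label
                (e.g. "bivalent", "SINE"); unlabeled CpGs are ignored.
    """
    pairs, discordant = Counter(), Counter()
    for read in reads:
        for a, b in combinations(sorted(read), 2):
            ctx = context_of.get(a)
            if ctx is None or ctx != context_of.get(b):
                continue  # pair spans two contexts or is unannotated
            if min_dist <= b - a <= max_dist:
                pairs[ctx] += 1
                discordant[ctx] += int(read[a] != read[b])
    return {ctx: discordant[ctx] / n for ctx, n in pairs.items()}
```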
DNA methylation discordance in DNMT3AINS AMLs leads to increased epigenetic diversity of the leukemic cell population
Our observations so far demonstrate that DNMT3AINS AMLs are associated with the corruption of the local concordance of DNA methylation states. However, this finding should be interpreted with caution, since it does not by itself indicate an increase in population-wise epigenetic diversity. LPMD is an intra-molecule measure20 that evaluates each read, originating from a single cell, individually, so it cannot discern whether the erosion of local correlation of DNA methylation states occurs in a coordinated or stochastic manner across the malignant cells.
To determine whether the local discordance of DNA methylation states observed in DNMT3AINS AMLs accompanies a diversification of population-level epigenetic states, we examined an orthogonal, inter-molecule DNA methylation heterogeneity score named epipolymorphism21. Epipolymorphism was significantly increased in DNMT3AINS AMLs (p = 0.015, two-sided Mann–Whitney U test between WT and DNMT3AINS; Fig. 4a), indicating that the erosion of local DNA methylation concordance in DNMT3AINS AML occurs rather stochastically and thus gives rise to an epigenetically diversified cell population. Of note, neither sample purity (p = 0.23, two-sided Mann–Whitney U test between WT and DNMT3AINS; Supplementary Fig. 12a) nor heterogeneity of cell type composition (p = 0.37, two-sided Mann–Whitney U test between WT and DNMT3AINS; Supplementary Fig. 12b) seemed to confound the observed increase in epigenetic diversity.
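Epipolymorphism21 is defined over loci of four consecutive CpGs: each fully covering read contributes one of 16 possible methylation patterns (epialleles), and the score is one minus the sum of squared epiallele frequencies. It is therefore 0 when all reads share one pattern and reaches 15/16 when all 16 patterns are equally frequent. A minimal sketch, assuming reads are given as position-to-call mappings and omitting the coverage filters a real analysis would apply:

```python
from collections import Counter


def locus_patterns(reads, cpgs):
    """Epiallele strings at a 4-CpG locus from reads covering all four CpGs.

    reads: list of dicts mapping CpG position -> methylation call (1/0).
    cpgs: the four consecutive CpG positions defining the locus.
    """
    return ["".join(str(read[p]) for p in cpgs)
            for read in reads if all(p in read for p in cpgs)]


def epipolymorphism(patterns):
    """1 - sum of squared epiallele frequencies (NaN if no reads)."""
    counts = Counter(patterns)
    n = sum(counts.values())
    if n == 0:
        return float("nan")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

Unlike LPMD, this score compares patterns across reads, which is why it can distinguish a uniformly shifted population from a diversified one.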
a Epipolymorphism distribution across different genomic contexts. P values from two-sided Mann–Whitney U tests between DNMT3AWT and DNMT3AINS are shown. The center line denotes the median, the upper and lower box limits denote upper and lower quartiles, and the whiskers denote 1.5× interquartile range. b Experimental scheme to identify genes with methylation disorder-associated inter-sample expression variation. c Association between promoter bivalency and variable gene expression. Values in the table denote the number of genes in the corresponding condition. Odds ratio and p value from two-sided Fisher’s exact test are shown. Functional enrichment of the top 4000 genes showing the highest inter-sample expression variation in the fourth quartile (Q4) of LPMD values for (d) GO Biological Process, (e) GO Cellular Component, and (f) GO Molecular Function terms. Bars represent Benjamini-Hochberg adjusted p values from two-sided Fisher’s exact tests. In (d), GO terms are grouped by broader biological concepts that are shown on the right side. g Gene expression levels (in TPM) and their inter-sample coefficient of variation (CV) for eight representative genes. Samples were grouped according to LPMD quartiles; Q1 (n = 10), Q2 (n = 9), Q3 (n = 9) and Q4 (n = 10).
High LPMD is associated with increased transcriptional variance of genes involved in remodeling of the leukemic stem cell niche
Given the remarkable specificity of local DNA methylation discordance and epigenetic diversity at bivalent domains, we then sought the functional implications of DNA methylation disorder for leukemia development at the transcriptome level. Importantly, the promoters of developmental genes in stem cells are widely known to be frequently marked by bivalent chromatin22. Thus, the DNA methylation heterogeneity at developmental promoters in DNMT3AINS AMLs raises the possibility that heterogeneous developmental gene regulation within the leukemic cell population facilitates disease progression by conferring a fitness advantage on cells.
To assess whether the epigenetic diversity of bivalent domains is associated with transcriptional diversity of the corresponding genes, we analyzed a subset of Li2016 AML samples (n = 38) profiled by both RRBS and RNA-seq. Based on the additive property of variance, we assumed that cell-level transcriptional variability, if present, would in turn manifest itself as sample-level (i.e., bulk-level) transcriptional variability. We therefore measured and compared inter-sample variances of gene expression levels within each quartile of samples sorted by bivLPMD (Fig. 4b).
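The quartile-based comparison can be sketched as follows: samples are ordered by bivLPMD, split into four groups, and the inter-sample coefficient of variation (CV, as in Fig. 4g) of each gene is computed within each group. Using `np.array_split` for near-equal groups and ranking genes by their CV in the highest quartile are assumptions made for this illustration, so group sizes may differ slightly from those reported in Fig. 4.

```python
import numpy as np


def cv_by_quartile(expr, biv_lpmd):
    """Inter-sample coefficient of variation of each gene within bivLPMD quartiles.

    expr: (n_samples, n_genes) expression matrix (e.g. TPM).
    biv_lpmd: (n_samples,) bivLPMD values.
    Returns a (4, n_genes) array; row 0 is the lowest-bivLPMD group (Q1).
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.array_split(np.argsort(biv_lpmd), 4)  # near-equal groups
    rows = []
    for idx in groups:
        mean = expr[idx].mean(axis=0)
        std = expr[idx].std(axis=0, ddof=1)
        # CV is undefined for genes with zero mean expression; leave NaN there
        rows.append(np.divide(std, mean, out=np.full_like(mean, np.nan), where=mean > 0))
    return np.vstack(rows)


def top_variable_in_q4(cv, k):
    """Indices of the k genes with the highest CV in the top quartile (NaN ranked last)."""
    q4 = np.where(np.isnan(cv[3]), -np.inf, cv[3])
    return np.argsort(q4)[::-1][:k]
```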
We found that the top 4000 genes showing increased transcriptional variability in the high-LPMD group (the highest quartile) were strongly enriched for genes with bivalent domains at their promoters (odds ratio = 4.619, p < 10−308, Fisher’s exact test; Fig. 4c), which supports a link between the epigenetic heterogeneity of bivalent domains and transcriptional heterogeneity. As expected, these genes were enriched for cell differentiation functions (Fig. 4d). Interestingly, they were also enriched for biological processes shaping the hematopoietic stem cell niche in the bone marrow, including cell-cell adhesion, angiogenesis, cell proliferation and survival, cell communication, chemokine-mediated signaling and extracellular matrix organization (Fig. 4d). Moreover, genes with high transcriptional variability were predominantly associated with the cell membrane and extracellular matrix (ECM) (Fig. 4e), suggesting combinatorial diversification of the membrane protein configuration of progenitor cells and, eventually, diversification of the modes of cell-cell and cell-ECM interaction within the hematopoietic stem cell niche. The enrichment of their molecular functions for membrane receptors, cytokines and chemokines also supports this notion (Fig. 4f). Notably, variably expressed genes selected from single-cell gene expression measurements of a clonal hematopoiesis of indeterminate potential (CHIP) sample carrying the R736C DNMT3AINS variant gave similar results (Supplementary Fig. 13), whereas variably expressed genes identified from healthy blood samples (from GTEx) did not (Supplementary Fig. 14). Figure 4g shows representative genes illustrating the heterogeneity of factors that sculpt the stem cell niche in high-bivLPMD AML samples.
The panel highlights the transcriptional variability of a cell adhesion molecule (CDH26), chemokines (CXCL6 and CCL25), secreted signaling factors (WNT9A and SHH), signaling receptors (PTCH1 and FGFR1) and a downstream regulator (AXIN1). Because variable WNT9A and AXIN1 expression implies heterogeneous activity of WNT signaling pathways, whose significance in hematopoietic stem cell maintenance has been underscored23,24, we envision that diverse local concentrations of paracrine factors in bone marrow stem cell niches may increase the fitness of the leukemic stem cells communicating with them.
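The bivalent-promoter enrichment test in Fig. 4c is a 2 × 2 Fisher's exact test. The sketch below builds the contingency table from gene sets; the set-based inputs and the toy gene names used to check it are hypothetical. For a 2 × 2 table, SciPy's `fisher_exact` reports the sample odds ratio ad/bc.

```python
from scipy.stats import fisher_exact


def bivalent_enrichment(variable_genes, bivalent_genes, all_genes):
    """Fisher's exact test for overlap between variable genes and bivalent promoters."""
    universe = set(all_genes)
    variable = set(variable_genes) & universe
    bivalent = set(bivalent_genes) & universe
    a = len(variable & bivalent)              # variable, bivalent promoter
    b = len(variable - bivalent)              # variable, non-bivalent promoter
    c = len(bivalent - variable)              # not variable, bivalent promoter
    d = len(universe - variable - bivalent)   # not variable, non-bivalent promoter
    odds_ratio, p_value = fisher_exact([[a, b], [c, d]], alternative="two-sided")
    return (a, b, c, d), odds_ratio, p_value
```

Restricting every set to a common gene universe keeps the four cells of the table consistent, so genes absent from the expression data cannot inflate the counts.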
Collectively, these results showing an association between increased epigenetic and transcriptional variability point to a leukemogenic model that is worth exploring through functional experiments. In this model, increased transcriptional variability of both cell-intrinsic biological processes governing the balance between self-renewal and differentiation and cell-extrinsic factors surrounding each blast cell25 may confer fitness advantages on leukemic cells. Specifically, the extrinsic factors include direct interactions with other blast cells sharing the niche through cell-cell junctions, and secreted factors, including signaling molecules, cytokines and chemokines, that are produced by nearby cells and trigger intracellular signal transduction. A population of malignant cells experiencing a locally heterogeneous environment may thus give the disease greater adaptive potential.
DNA methylation disorder at bivalent domains, but not absolute level of DNA methylation, robustly predicts the response of AML cells to hypomethylating agents
We then asked whether increased LPMD at bivalent domains, and the associated epigenetic and transcriptomic diversity, actually contribute to the sustained survival of AML cells. To examine whether leukemic cells depend on increased LPMD, we took a functional epigenomic approach and examined the survival of AML cells after discordant DNA methylation was eliminated. To this end, we used DNA methylation profiles of AML cell lines in the Cancer Cell Line Encyclopedia (CCLE) together with their drug response profiles. Specifically, the drug responses of CCLE cell lines were collected from the Cancer Therapeutics Response Portal (CTRP) v2. DNA methylation profiles of the corresponding cell lines were obtained by processing publicly available RRBS data with our own pipeline.
Hypomethylating agents (HMAs), including decitabine and azacitidine, have been an invaluable epigenetic treatment option for AML patients who are not eligible for intensive chemotherapy26. Recent studies have revealed complex and pleiotropic mechanisms of action of HMAs27,28,29, which partly explains why a robust biomarker predicting a patient’s response to HMA treatment remains elusive. We examined the correlation between LPMD and the response of AML cell lines to HMAs for two reasons. First, we aimed to show the importance of sustained methylation discordance for the survival of AML cells. Second, we wanted to assess the potential of LPMD as a biomarker of HMA response.
Strikingly, we observed a significant negative correlation between bivLPMD and the area under the dose-response curve (AUDRC) of AML cell lines treated with decitabine (Pearson’s r = −0.56, p = 0.015; Fig. 5a, b). This association persisted when a sufficient concentration of decitabine was administered in combination with other drugs (Pearson’s r = −0.59 and −0.59, p = 0.016 and 0.0096 for decitabine combined with navitoclax and carboplatin, respectively; Fig. 5a). This suggests that a higher degree of LPMD at bivalent domains predicts a better response to decitabine. High bivLPMD also predicted the response to RG-108, a non-nucleoside DNMT inhibitor that induces demethylation by binding directly to the active site of DNMTs (Pearson’s r = −0.49, p = 0.044; Fig. 5a, b). We could not observe any notable response to azacitidine in these AML cell lines (Supplementary Fig. 15). Because activity of azacitidine against AML cell lines has been reported previously30, this may reflect an experimental artifact of the CTRP v2 experiments. The association gradually diminished as the genomic regions used to calculate LPMD moved away from core regulatory regions, from promoters and CpG islands to CpG shelves (Supplementary Fig. 16). This implies that the functional importance of DNA methylation heterogeneity for the survival of AML cells is mediated by gene regulation. Remarkably, LPMD calculated for non-bivalent, non-regulatory regions did not correlate significantly with responses to HMAs (Fig. 5c, d, Supplementary Fig. 17). This further highlights that discordant DNA methylation states at bivalent regulatory domains are specifically important for the survival of AML cells.
Still, we cannot rule out the potential of LPMD at LINEs and SINEs as a biomarker of HMA response. HMA effects mediated by the immune response to reactivated retrotransposable elements cannot be captured in cell line experiments. We also confirmed that bivLPMD did not correlate with the age at sampling of the cell lines (Pearson’s r = 0.15, p = 0.57; Supplementary Fig. 18a).
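The association between bivLPMD and AUDRC is an ordinary Pearson correlation across cell lines. The sketch below computes Pearson's r with the standard library. The example values are invented, and the parametric p-value, which `scipy.stats.pearsonr` also returns, is omitted. A negative r means that cell lines with higher bivLPMD have a smaller AUDRC, that is, they are more sensitive to the drug.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-cell-line values (bivLPMD, AUDRC); not data from this study.
biv_lpmd = [0.10, 0.14, 0.18, 0.22, 0.25]
audrc = [13.1, 12.4, 12.9, 11.2, 10.8]
print(round(pearson_r(biv_lpmd, audrc), 3))
```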
a Correlation between LPMD at bivalent domains (bivLPMD) and area under dose-response curve (AUDRC) for hypomethylating agents. Pearson’s correlation coefficients and corresponding p-values are shown. b Example dose-response curves for RG-108 and decitabine for two representative cell lines, THP1 and HEL9217, with low and high bivLPMD, respectively. c Correlation between LPMD at SINE and AUDRC for hypomethylating agents. Pearson’s correlation coefficients and corresponding p-values are shown. d Example dose-response curves for RG-108 and decitabine for two representative cell lines, KASUMI1 and SIGM5, with low and high LPMD at SINE, respectively. e Correlation between average methylation level at bivalent domains and AUDRC. Pearson’s correlation coefficients and corresponding p-values are shown. f Schematic diagram showing the retrospective analysis examining the utility of bivLPMD as a biomarker predicting hypomethylating agent (HMA) response. g Comparison of bivLPMD values in AML patient groups with or without complete remission after HMA treatment. The center line denotes the median, the upper and lower box limits denote upper and lower quartiles, and the whiskers denote 1.5× interquartile range. P value from two-sided Mann–Whitney U test is shown. EM-seq, enzymatic methyl-seq.
Importantly, the responses of AML cell lines to decitabine and RG-108 were not associated with their average methylation levels at bivalent domains per se (Pearson’s r = 0.19, p = 0.45; Fig. 5e). These results further support the view that the focal increase in average methylation levels observed in AML is a collateral consequence of myeloproliferation, and that the viability of AML cells generally does not depend on it. Taken together, these results suggest that AML cells are ‘addicted’ to methylation discordance, since erasing discordant methylation states with hypomethylating agents triggered their death.
We also tested whether our results on AML cell lines extend to the clinical setting. We retrospectively measured bivLPMD by targeted enzymatic methyl-seq (EM-seq) in blood samples from 34 AML patients (Supplementary Data 3) who subsequently underwent HMA treatment. We then examined the association of bivLPMD with the response to treatment (Fig. 5f). A custom sequencing panel covering bivalent domains was designed for efficient measurement of bivLPMD by targeted EM-seq (Methods). Reassuringly, bivLPMD was a good predictor of complete remission after HMA treatment (p = 0.0066, two-sided Mann–Whitney U test; Fig. 5g) and did not correlate with patient age (Pearson’s r = −0.19, p = 0.28; Supplementary Fig. 18b). Collectively, these results show the importance of bivLPMD for the survival of AML cells.
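The clinical comparison in Fig. 5g uses a two-sided Mann–Whitney U test between the remission and non-remission groups. A minimal sketch is given below. It assumes a large-sample normal approximation without tie or continuity correction. For cohorts of this size, exact or tie-corrected p-values (as provided, for example, by `scipy.stats.mannwhitneyu`) are preferable, so the p-value here is only illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for x and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical bivLPMD values; not the study's patient data.
remission = [0.31, 0.28, 0.35, 0.33, 0.29]
no_remission = [0.22, 0.25, 0.27, 0.30, 0.21, 0.24]
u, p = mann_whitney_u(remission, no_remission)
print(u, round(p, 4))
```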
Clinical implications of DNMT3AINS in hematological disorders
DNMT3AINS is associated with increased local methylation disorder, which has a functional impact in AML. We therefore examined the clinical outcomes of hematological conditions associated with DNMT3AINS. We first asked whether DNMT3AINS is generally associated with adverse outcomes in AML. To this end, we performed a pooled survival analysis of 668 non-M3 AML patients from three large cohorts (Ley et al. (n = 233), TCGA-LAML (n = 179)11 and BeatAML (n = 256)31). Both DNMT3AINS and DNMT3AR882 were associated with significantly poorer overall survival than DNMT3AWT (log-rank p = 0.0094 and 0.0047, respectively; Fig. 6a), whereas DNMT3AOther was not (p = 0.482). Additionally, multivariate Cox regression showed that DNMT3AINS is an independent risk factor in AML (hazard ratio 1.85, 95% CI 1.28–2.67) after accounting for age and gender (Fig. 6b).
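The log-rank comparison above can be sketched for two groups as follows. This illustrates the standard two-sample log-rank statistic on invented inputs. It is not the study's pooled analysis code. Established packages (for example, lifelines) provide this test as well as Cox regression, which is not shown here.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test.

    times_*:  follow-up times for each patient in group A or B
    events_*: 1 if death was observed at that time, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0  # deaths in group A minus their expectation
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)
        at_risk_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)
        deaths_a = sum(1 for ti, e, g in data if g == 0 and ti == t and e)
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        at_risk = at_risk_a + at_risk_b
        observed_minus_expected += deaths_a - deaths * at_risk_a / at_risk
        if at_risk > 1:  # hypergeometric variance of deaths in group A
            variance += (deaths * (at_risk_a / at_risk) * (at_risk_b / at_risk)
                         * (at_risk - deaths) / (at_risk - 1))
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical overall survival in months (1 = death, 0 = censored); not study data.
chi2, p = logrank_test([5, 8, 12, 20, 30], [1, 1, 1, 0, 1],
                       [10, 25, 40, 60, 60], [1, 0, 1, 0, 0])
print(round(chi2, 3), round(p, 4))
```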
Discussion
AML comprises heterogeneous disease subtypes that can be classified by microscopic inspection of cells or by genetic abnormalities. Such subclassifications are routinely used to stratify patient outcomes and to choose treatment regimens. Understanding the mechanistic relationship between molecular biomarkers and the clinical outcome of AML will allow a more thorough substratification of the disease. Since early studies, the molecular classification of AML has highlighted the remarkable recurrence of mutations in epigenetic modifiers, including DNMT3A, IDH1/2 and TET2. However, the link between these genetic alterations and aberrant epigenetic profiles has only recently been studied for its clinical relevance10,32. The mutational landscape of DNMT3A is complicated: mutations are conspicuously enriched at residue R882 but also dispersed throughout non-R882 residues. This landscape provides an excellent opportunity to investigate the mechanistic connection between genetic and epigenetic alterations.
In this study, we characterized the methylomes of AMLs harboring DNMT3A mutations that reduce protein stability by analyzing methylation profiles from three AML cohorts. These mutations were associated with highly disordered local DNA methylation patterns, specifically at bivalent domains, which in turn lead to epigenetic diversity of the AML cell population. To our knowledge, this is the first study to systematically analyze the effect of DNMT3A destabilization directly on the methylomes of AML patients.
To date, it has been difficult to define a common effect of non-R882 DNMT3A mutations on leukemia, because the consequences of these mutations for the activity of the mutant proteins vary widely33. Against this background, our results suggest a new perspective. The effect of an individual non-R882 mutation on enzymatic activity may not be critical, at least for DNMT3AINS mutations. A mutant DNMT3A harboring such a mutation is prone to degradation and is therefore unlikely to participate actively in de novo methylation. Instead, our results suggest that the common consequence of DNMT3AINS variants, namely a reduced intracellular DNMT3A concentration, is a key factor in the initiation and progression of AML.
Nevertheless, some DNMT3AINS variants, especially those at the tetramer interface, may further strengthen the dimeric preference of the enzyme. They may do so by weakening interactions at the tetramerization interface and thereby hampering tetramerization. Our in vitro results show predominant dimerization of R736S DNMT3A (Supplementary Fig. 5), which supports this possibility for some non-R882 variants. Residues at which destabilization and such interface effects could act synergistically include S714 (stability score 0.688), R729 (0.364), R736 (0.316), R749 (0.339), S770 (0.419) and R771 (0.527). After R882, these are among the most frequently mutated residues in hematological malignancies (Supplementary Fig. 19).
Our observations suggest a potential explanation for the recurrence of DNMT3AINS variants in AML, which has so far been poorly understood. In particular, our results link the biochemical properties of DNMT3AINS variants to the local DNA methylation disorder in DNMT3AINS AML (Fig. 7). Instability-driven degradation of DNMT3AINS reduces the intracellular dosage of DNMT3A, which may favor dimerization of DNMT3A over tetramerization. This is supported by an experimental study showing that DNMT3A oligomerization depends on its concentration14. Thus, de novo DNA methylation in DNMT3AINS AML may be predominantly dimer-driven and distributive, whereas catalysis in DNMT3AWT AML is tetramer-driven and processive. Distributive methylation decreases the concordance of local DNA methylation states. The random dissociation of DNMT3A dimers from DNA in turn increases the epigenetic diversity of the cancer cell population. How epigenetic diversity drives the progression and aggressiveness of AML cells remains to be elucidated. Nevertheless, our results showed an association between the epigenetic and transcriptional heterogeneity of leukemic cells. In particular, the transcriptionally variable genes were enriched for genes contributing to the fitness of leukemic stem cells within the hematopoietic stem cell niche. Furthermore, epigenetic diversity at bivalent regulatory domains correlated with response to HMA. This implies that this diversity, and the transcriptional heterogeneity associated with it, matters for the survival of cancer cells. Still, this study has several limitations. It remains unclear whether DNMT3AINS variants show characteristic patterns of mutational co-occurrence (Supplementary Fig. 20); such patterns would provide more direct clues to the functional role of these variants.
In addition, the prognostic value of the LPMD level itself could not be established, owing to the lack of large-scale methylation sequencing data paired with survival information. The processivity of DNMT3A has been observed independently by different groups of researchers13,33. However, some biochemical studies do not support the processivity of the enzyme34,35,36. These studies instead explain the cooperative catalysis of DNMT3A through multimerization of the protein on DNA. Even in this case, non-cooperative catalysis can readily be linked to the reduced amount of intracellular DNMT3A caused by DNMT3AINS variants.
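A deliberately simplified mass-action model makes the concentration argument above concrete. It is our own illustration, not a model fitted in this study. Assume a single equilibrium between dimers D and tetramers T, \(2\mathrm{D}\rightleftharpoons \mathrm{T}\), with a hypothetical dissociation constant \(K_d=[\mathrm{D}]^{2}/[\mathrm{T}]\). Let \(C=[\mathrm{D}]+2[\mathrm{T}]\) be the total protein in dimer units. Substituting \([\mathrm{T}]=[\mathrm{D}]^{2}/K_d\) gives \(2[\mathrm{D}]^{2}/K_d+[\mathrm{D}]-C=0\), whose positive root is \([\mathrm{D}]=K_d\left(\sqrt{1+8C/K_d}-1\right)/4\). The fraction of protein in the dimeric state, \([\mathrm{D}]/C\), therefore approaches 1 as \(C\) falls and decreases as \(C\) rises. This is the qualitative behavior assumed in the model of Fig. 7.

```python
import math


def dimer_fraction(total, kd):
    """Fraction of protein (in dimer units) present as free dimers under 2D <-> T.

    total: total concentration in dimer units, C = [D] + 2[T]
    kd:    hypothetical dissociation constant, Kd = [D]^2 / [T] (same units as total)
    """
    if total <= 0 or kd <= 0:
        raise ValueError("total and kd must be positive")
    free_dimer = kd * (math.sqrt(1.0 + 8.0 * total / kd) - 1.0) / 4.0
    return free_dimer / total


# Lowering the total dose (e.g. through degradation) shifts protein toward dimers.
for c in (0.1, 1.0, 10.0):
    print(c, round(dimer_fraction(c, kd=1.0), 3))
```

At \(C=K_d\) exactly half of the protein is dimeric, which gives a convenient reference point when choosing illustrative values.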
Proteasomal degradation of destabilized DNMT3A proteins harboring DNMT3AINS mutations decreases the effective concentration of intracellular DNMT3A. Thus, dimerization of DNMT3A is favored over tetramerization. Unlike DNMT3A tetramers, which conduct de novo methylation processively, DNMT3A dimers frequently dissociate from DNA during catalysis. This distributive de novo methylation results in stochastic local disorder of DNA methylation patterns, which in turn confers population-level epigenetic diversity on hematopoietic stem cells. Increased epigenetic diversity translates into increased fitness, or adaptive potential, of the cell population, ultimately leading to poorer patient outcomes.
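Distributive catalysis should raise local discordance even at a similar overall methylation level. The toy simulation below illustrates why; it is our own sketch, not an analysis from this study. It methylates simulated reads either one CpG per binding event (distributive) or in runs of consecutive CpGs per binding event (processive), until at least half of the CpGs are methylated. It then computes the fraction of discordant neighboring CpG pairs. For simplicity, distance is measured in CpG positions rather than base pairs, and the run length of 6 is an arbitrary assumption.

```python
import random


def simulate_reads(n_reads, n_cpg, run_length, target_fraction, seed=0):
    """Toy model of de novo methylation on n_reads molecules with n_cpg CpGs each.

    Each binding event methylates `run_length` consecutive CpGs starting at a random
    CpG (run_length=1 mimics distributive catalysis; larger values mimic processive
    catalysis). Events are added until at least target_fraction of CpGs are methylated.
    """
    rng = random.Random(seed)
    target = target_fraction * n_cpg
    reads = []
    for _ in range(n_reads):
        states = [0] * n_cpg
        while sum(states) < target:
            start = rng.randrange(n_cpg)
            for pos in range(start, min(start + run_length, n_cpg)):
                states[pos] = 1
        reads.append(states)
    return reads


def neighbor_discordance(reads, lag=1):
    """Fraction of CpG pairs `lag` CpG positions apart on the same read with different states."""
    discordant = total = 0
    for states in reads:
        for i in range(len(states) - lag):
            total += 1
            discordant += int(states[i] != states[i + lag])
    return discordant / total


distributive = simulate_reads(1000, 30, run_length=1, target_fraction=0.5, seed=1)
processive = simulate_reads(1000, 30, run_length=6, target_fraction=0.5, seed=1)
print("distributive:", round(neighbor_discordance(distributive), 3))
print("processive:  ", round(neighbor_discordance(processive), 3))
```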
Cancer has long been appreciated as an intrinsically heterogeneous disease. Genetically and epigenetically distinct cells, or subclones, arise from sporadic molecular aberrations. These subclones compete and cooperate with each other while exploiting the limited resources around them. In recent decades, the extent of such intratumor heterogeneity has shown great potential as a clinical biomarker. However, studies so far have focused primarily on its prognostic power, and it remains questionable whether heterogeneity itself can be exploited as an actionable therapeutic target. In this regard, epigenetic modifications are reversible, so epigenetic intratumor heterogeneity may open a novel therapeutic avenue. Such an approach would directly manipulate the heterogeneity of the cancer cell population, i.e., homogenize the epigenetic states of cancer cells. Such an intervention may undermine the fitness of the cancer cell population and ultimately trigger cell death. Indeed, this mechanism may already be operating implicitly behind conventional HMA treatments. It has not been clearly elucidated before, however, as reflected by the lack of DNA methylation-based biomarkers for HMAs. Our functional epigenomic analyses partly support this scenario and further provide an effective way to predict the response of AML cells to HMAs, which could increase the precision of antileukemic therapies in clinical practice.
Methods
RRBS
To construct the MspI- and ApeKI-digested reduced-representation bisulfite sequencing (RRBS) library, 500 ng of input genomic DNA was digested with MspI (NEB) in a 50 μl reaction at 37 °C for 24–26 h. ApeKI (NEB) was then added, and the reaction was incubated at 75 °C for 16–20 h. The digested products were purified with a MinElute PCR Purification Kit (Qiagen). After purification, the digested products were blunt-ended and dA-tailed, followed by ligation of methylated adapters. The 160–420 bp adapter-ligated fraction was excised from a 2% agarose gel. Bisulfite conversion was conducted using a ZYMO EZ DNA Methylation-Gold Kit™ (ZYMO) following the manufacturer’s instructions. The final libraries were generated by PCR amplification using PfuTurbo Cx Hotstart DNA polymerase (Agilent Technologies, Santa Clara, CA, USA). RRBS libraries were analyzed on an Agilent 2100 Bioanalyzer (Agilent Technologies). Methylation data were generated on two platforms: Illumina HiSeq 2500 Standard 100 PE (100 bp paired-end) and NovaSeq 6000 S4 150 PE (150 bp paired-end).
Collecting and processing public DNA methylation data
DNA methylation profiles for the public cohorts analyzed in this study were collected and processed as follows. Raw eRRBS sequencing data for 47 AML patients7 were obtained from dbGaP under accession phs001027.v2.p1. Each patient was sequenced at both diagnosis and relapse, resulting in 94 sequencing runs in total. Bisulfite sequencing reads were adapter-trimmed with Trim Galore!37 v0.6.7 (RRID:SCR_011847) using the --rrbs option. Reads were aligned to the hg38 reference genome with Bismark38 v0.22.3 (RRID:SCR_005604), and CpG methylation levels were extracted using MethylDackel39 v0.4.0. The same RRBS processing pipeline was applied to our own SNUH cohort.
Illumina HumanMethylation450 BeadChip array-based DNA methylation profiles of 140 TCGA-LAML patients were downloaded from Genomic Data Commons (GDC) data portal.
Sample collection for SNUH cohort
Samples were collected in accordance with the guidelines and regulations of Seoul National University Hospital (IRB Number H-1103-004-353). DNMT3A mutations in patients with AML or myelodysplastic syndrome were identified by clinical NGS panel screening.
Definition of DNMT3AINS variants and DNMT3AINS AML
DNMT3AINS variants were identified using the catalog of stability ratios of DNMT3A amino acid substitutions that was experimentally determined in a previous study5. The catalog covers a large number of residues (248/912 amino acids), but some mutations occurring in clinical AML samples are still not covered. We therefore extended the catalog to these uncharted substitutions by assigning a single stability score to each amino acid position rather than to each amino acid substitution. This positional score was computed as the average of all known stability ratios for substitutions at that position. Admittedly, this procedure makes individual stability scores less sensitive to the properties of the substituted amino acid, so some false-positive or false-negative DNMT3AINS classifications may occur. However, we considered it more beneficial to increase the sensitivity of the whole study by incorporating more variants into the analyses.
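The positional extrapolation described above amounts to averaging the measured stability ratios for each residue. A minimal sketch is shown below. The input format (a mapping from position and alternative amino acid to a stability ratio) and all example values are hypothetical and do not reproduce the published catalog.

```python
from collections import defaultdict


def position_stability_scores(substitution_ratios):
    """Average measured stability ratios per residue position.

    substitution_ratios: dict mapping (position, alternative amino acid) -> stability ratio.
    Returns a dict mapping position -> mean ratio over all measured substitutions there,
    which is then used for any substitution at that position, measured or not.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (position, _alt), ratio in substitution_ratios.items():
        sums[position] += ratio
        counts[position] += 1
    return {position: sums[position] / counts[position] for position in sums}


# Hypothetical measurements, not values from the published catalog.
measured = {(736, "C"): 0.30, (736, "H"): 0.50, (882, "H"): 1.00}
scores = position_stability_scores(measured)
print(scores)
```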
All variants with processed stability scores below 0.75 were classified as DNMT3AINS. Nonsense and frameshift variants were also included among DNMT3AINS variants, as truncation of the DNMT3A protein is known to cause protein degradation in AML cells40. An AML sample was classified as DNMT3AINS AML only if it harbored a single DNMT3A mutation and that mutation was a DNMT3AINS variant. A sample harboring a DNMT3AR882 mutation was classified as DNMT3AR882 AML regardless of other mutations, to reflect the dominant-negative effect of the DNMT3AR882 variant. All other samples, with non-destabilizing variants or multiple variants, were classified as DNMT3AOther.
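The classification rules above can be written as a small function. The sketch makes two illustrative assumptions rather than following the study's data model. First, each sample's DNMT3A variants are given as dictionaries with a simplified consequence type and a residue position. Second, samples without any DNMT3A variant are labeled wild type.

```python
STABILITY_THRESHOLD = 0.75
TRUNCATING = {"nonsense", "frameshift"}


def is_ins_variant(variant, position_scores):
    """True if a variant counts as DNMT3A-INS: truncating, or a positional score < 0.75."""
    if variant["type"] in TRUNCATING:
        return True
    score = position_scores.get(variant["position"])
    return score is not None and score < STABILITY_THRESHOLD


def classify_sample(dnmt3a_variants, position_scores):
    """Label a sample 'WT', 'R882', 'INS' or 'Other' from its list of DNMT3A variants."""
    if not dnmt3a_variants:
        return "WT"  # assumption: no DNMT3A variant means wild type
    if any(v["type"] == "missense" and v["position"] == 882 for v in dnmt3a_variants):
        return "R882"  # R882 takes precedence over any co-occurring variant
    if len(dnmt3a_variants) == 1 and is_ins_variant(dnmt3a_variants[0], position_scores):
        return "INS"
    return "Other"  # non-destabilizing or multiple variants


scores = {736: 0.316, 882: 0.9, 500: 0.95}  # hypothetical positional scores
print(classify_sample([{"type": "missense", "position": 736}], scores))
```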
Collecting and processing somatic mutation profiles
Somatic variants for each individual were determined as follows. Whole exome sequencing data for the Li2016 cohort were accessed via dbGaP under accession phs001027.v2.p1. In total, whole exome sequencing runs for 94 cancer samples (diagnosis and relapse) and 47 matched normal samples were obtained. Reads were aligned to the hg38 reference genome with bwa v0.7.17-r118841 (RRID:SCR_010910). To increase the sensitivity of variant calling, we treated somatic variants called by at least one of Strelka242 v2.9.10 (RRID:SCR_005109) and Varscan43 v2.4.4 (RRID:SCR_006849) as valid. The resulting variants were annotated with SnpEff44 v5.0 (RRID:SCR_005191) and SnpSift45 v4.3t (RRID:SCR_015624). Finally, variants were post-filtered to remove false-positive calls using the following criteria: (1) the variant allele frequency must exceed 5% and 1% for the Li2016 and SNUH cohorts, respectively; (2) the variant allele must be supported by at least five sequencing reads; (3) the ExAC population allele frequency must not exceed 1%; and (4) only missense, nonsense, frameshift and splice variants were considered. For the TCGA-LAML cohort, we collected the corresponding mutational profiles from cBioPortal46 (RRID:SCR_014555).
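The post-filtering criteria above can be expressed as a single predicate. In this sketch, the field names, the lowercase caller labels and the simplified effect categories are hypothetical stand-ins for the corresponding SnpEff/SnpSift annotations. A variant absent from ExAC is assumed to have a population frequency of 0.

```python
ALLOWED_EFFECTS = {"missense", "nonsense", "frameshift", "splice"}
MIN_VAF = {"Li2016": 0.05, "SNUH": 0.01}
CALLERS = {"strelka2", "varscan"}


def passes_filters(variant, cohort):
    """Apply the post-filtering criteria to one annotated somatic variant (a dict)."""
    return (
        bool(variant["callers"] & CALLERS)          # called by at least one caller
        and variant["vaf"] > MIN_VAF[cohort]        # (1) cohort-specific VAF cutoff
        and variant["alt_reads"] >= 5               # (2) at least five supporting reads
        and variant.get("exac_af", 0.0) <= 0.01     # (3) population AF not above 1%
        and variant["effect"] in ALLOWED_EFFECTS    # (4) protein-altering classes only
    )


example = {"callers": {"varscan"}, "vaf": 0.03, "alt_reads": 7,
           "exac_af": 0.0, "effect": "missense"}
print(passes_filters(example, "SNUH"), passes_filters(example, "Li2016"))
```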
Computation of local pairwise methylation discordance (LPMD)
To measure the disorder of DNA methylation, we devised a new measure, local pairwise methylation discordance (LPMD). LPMD quantifies the extent to which pairs of nearby CpGs at a fixed distance have conflicting methylation states. It takes advantage of the phased methylation states of nearby CpGs that are captured simultaneously by a single bisulfite sequencing read. By enumerating all sequencing reads, \(\mathrm{LPMD}_d\) is computed as the proportion of CpG pairs at genomic distance \(d\) (in bp) that have different methylation states. LPMD values were computed using Metheor v0.1.215.
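The definition of LPMD can be illustrated with a short sketch that enumerates CpG pairs within each read. It assumes reads have already been reduced to a mapping from CpG genomic position to a methylation call. It is not Metheor's implementation, which should be used for real alignments.

```python
def lpmd(reads, d):
    """Local pairwise methylation discordance at distance d (bp).

    reads: iterable of dicts, one per read, mapping CpG genomic position -> call (0 or 1).
    Only pairs observed on the same read are counted, so the states are phased.
    Returns None when no CpG pair at distance d is covered by any read.
    """
    discordant = 0
    total = 0
    for calls in reads:
        for position, state in calls.items():
            partner = calls.get(position + d)
            if partner is not None:
                total += 1
                discordant += int(state != partner)
    return discordant / total if total else None


reads = [{100: 1, 102: 1, 110: 0}, {100: 1, 102: 0}, {102: 0, 104: 0}]
print(lpmd(reads, 2))
```

Here three CpG pairs are 2 bp apart and one of them (on the second read) is discordant, so the sketch returns one third.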
DNA methylation arrays, in contrast, cannot provide pairs of methylation states originating from a single cell (i.e., phased methylation states). As an array-based alternative to sequencing-based LPMD, we therefore used the difference in DNA methylation levels of a CpG pair at a fixed distance. This measure is justified because the methylation level difference of a CpG pair is a lower bound on its LPMD. Assume a CpG pair with methylation levels \({\beta }_{1}\) and \({\beta }_{2}\), where \({\beta }_{1} < {\beta }_{2}\) without loss of generality. Then the maximum proportion of CpG pairs in which both CpGs are methylated is \({\beta }_{1}\). Similarly, the maximum proportion in which both CpGs are unmethylated is \(1-{\beta }_{2}\). Thus, the lowest possible proportion of CpG pairs with different methylation states is \(1-{\beta }_{1}-\left(1-{\beta }_{2}\right)={\beta }_{2}-{\beta }_{1}\), which is the methylation level difference of the pair. Sample-wise array-based LPMD was computed analogously to sequencing-based LPMD by specifying the distance between CpG pairs.
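The lower-bound argument can be checked numerically. The sketch below scans every joint probability of both CpGs being methylated that is compatible with the marginal levels \(\beta_1\) and \(\beta_2\). It returns the smallest achievable discordant fraction, which equals \(|\beta_2-\beta_1|\). It also computes an array-based LPMD as the mean absolute level difference over CpG pairs at distance \(d\). Averaging over pairs, and the position-to-β input format, are our assumptions about how the sample-wise value is aggregated.

```python
def min_discordance(beta1, beta2, steps=1000):
    """Smallest discordant fraction consistent with marginal methylation levels.

    Scans p11 = P(both methylated). Feasibility needs p10 = beta1 - p11 >= 0,
    p01 = beta2 - p11 >= 0 and p00 = 1 - beta1 - beta2 + p11 >= 0.
    The discordant fraction is p10 + p01 = beta1 + beta2 - 2 * p11.
    """
    lo = max(0.0, beta1 + beta2 - 1.0)
    hi = min(beta1, beta2)
    best = float("inf")
    for k in range(steps + 1):
        p11 = lo + (hi - lo) * k / steps
        best = min(best, beta1 + beta2 - 2.0 * p11)
    return best


def array_lpmd(betas, d):
    """Mean |beta(p) - beta(p + d)| over CpG pairs d bp apart; betas maps position -> level."""
    diffs = [abs(b - betas[p + d]) for p, b in betas.items() if p + d in betas]
    return sum(diffs) / len(diffs) if diffs else None


print(min_discordance(0.2, 0.7), array_lpmd({1: 0.2, 3: 0.7, 5: 0.6}, 2))
```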
Computation of epipolymorphism
Epipolymorphism21 is a cell population-level measure that quantifies the diversity of methylation patterns, or epialleles, of four consecutive CpG sites (CpG quartets). To compute epipolymorphism from bisulfite read alignments of the Li2016 cohort, we only considered CpG quartets supported by more than ten sequencing reads. CpG quartets harboring a CpG site that overlaps with dbSNP 151 SNPs were excluded. For each CpG quartet, epipolymorphism is defined over the 16 possible patterns of DNA methylation states. For convenience, we denote unmethylated and methylated states as ‘0’ and ‘1’, respectively. The 16 possible DNA methylation patterns then range from \({x}_{0}\) = 0000 (fully unmethylated pattern) to \({x}_{15}\) = 1111 (fully methylated pattern), and epipolymorphism is defined as

\[\mathrm{Epipolymorphism}=1-{\sum }_{i=0}^{15}{\left(\frac{{n}_{i}}{N}\right)}^{2},\]

where \({n}_{i}\) denotes the number of reads supporting pattern \({x}_{i}\) and \(N={\sum }_{i=0}^{15}{n}_{i}\). Epipolymorphism values were computed using Metheor v0.1.215.
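Epipolymorphism, as introduced by Landau et al.21, is one minus the sum of the squared frequencies of the observed methylation patterns. The sketch below assumes reads have already been converted to four-character '0'/'1' strings for one CpG quartet. It applies the more-than-ten-reads coverage requirement described above; SNP filtering is not shown.

```python
from collections import Counter


def epipolymorphism(patterns, min_reads=11):
    """Epipolymorphism of one CpG quartet from per-read patterns such as '0110'.

    Returns None unless the quartet is covered by at least min_reads reads
    (i.e. more than ten); otherwise 1 - sum_i (n_i / N)^2 over observed patterns.
    """
    if len(patterns) < min_reads:
        return None
    total = len(patterns)
    counts = Counter(patterns)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


all_patterns = [format(i, "04b") for i in range(16)]  # '0000' ... '1111'
print(epipolymorphism(["1111"] * 12), epipolymorphism(all_patterns))
```

A quartet carrying a single epiallele scores 0, while equal representation of all 16 patterns gives the maximum of 1 − 1/16 = 0.9375.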
Reference epigenome for CD34 hematopoietic stem cells
Reference epigenomes for CD34-positive hematopoietic stem cells (HSCs) were downloaded from ENCODE (RRID:SCR_015482) under accession number ENCSR970ENS. In particular, raw whole-genome bisulfite sequencing data were downloaded under library accession ENCLB590SRF and processed as described above. Processed signal p-values and called peaks for ChIP-seq targeting the H3K4me1, H3K4me3, H3K9me3, H3K27me3, H3K27ac and H3K36me3 histone marks were downloaded under accession numbers ENCSR401CJA, ENCSR136QKZ, ENCSR957WQX, ENCSR355PUX, ENCSR620AZM and ENCSR164ROX, respectively. Similarly, signal p-values and peaks for DNase I hypersensitive sites were downloaded under accession ENCSR468ZXN. For subsequent analyses, signal p-values were normalized with an arcsinh transformation. The core 15-state chromatin states inferred by ChromHMM19 were downloaded from Roadmap Epigenomics (RRID:SCR_008924) for the enrichment analysis of differentially methylated regions. Bivalent domains in CD34-positive HSCs were defined as genomic regions with the chromatin states 10_TssBiv, 11_BivFlnk or 12_EnhBiv.
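Selecting bivalent domains from a ChromHMM segmentation, and the arcsinh normalization of signal p-values, can be sketched as below. The sketch assumes the segmentation has been parsed into (chromosome, start, end, state) tuples. It merges abutting or overlapping bivalent segments into single domains; whether adjacent segments were merged in the study is an assumption here.

```python
import math

BIVALENT_STATES = {"10_TssBiv", "11_BivFlnk", "12_EnhBiv"}


def bivalent_domains(segments):
    """Merge bivalent ChromHMM segments into domains.

    segments: iterable of (chrom, start, end, state) with 0-based, half-open coordinates.
    Returns sorted (chrom, start, end) tuples; abutting or overlapping segments are merged.
    """
    selected = sorted((c, s, e) for c, s, e, state in segments if state in BIVALENT_STATES)
    merged = []
    for chrom, start, end in selected:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def arcsinh_normalize(signal_pvalues):
    """Arcsinh transform, which compresses large signal values roughly logarithmically."""
    return [math.asinh(v) for v in signal_pvalues]


segs = [("chr1", 0, 200, "1_TssA"), ("chr1", 200, 400, "10_TssBiv"),
        ("chr1", 400, 600, "11_BivFlnk"), ("chr1", 800, 1000, "12_EnhBiv"),
        ("chr2", 0, 200, "10_TssBiv")]
print(bivalent_domains(segs))
```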
Selection of the bivalent domains for targeted enzymatic methyl-seq
For targeted enzymatic methyl-seq (EM-seq), we selected representative bivalent domains showing pronounced methylation disorder in DNMT3AINS AMLs compared with HSCs. To obtain sufficient depth over the targeted regions, we aimed for a total panel span of about 500 kbp. This corresponds to about 4% of the bivalent domains in the HSC reference epigenome (~12,526 kbp in total). Bivalent domains were prioritized for the panel as follows. First, bivalent domains were ranked by the average difference in DNA methylation level between the SNUH5763 sample and the HSC reference epigenome. In parallel, they were ranked by CpG density (number of CpGs divided by region length). Of note, the majority (90%) of bivalent domains were hypermethylated, and CpG density was positively correlated with the methylation level difference (Pearson’s r = 0.554, p < 10−308). Final ranks were obtained by taking the geometric mean of methylation level difference and CpG density for each bivalent domain. The top 454 bivalent domains, spanning 499,859 bp, were selected for the panel.
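One reading of the prioritization above is sketched below. Domains are ranked separately by methylation difference and by CpG density. The two ranks are combined by their geometric mean, and domains are added in order of the combined rank until the ~500 kbp budget would be exceeded. Three details are assumptions made for illustration: that the geometric mean is taken over ranks, that selection stops at the first domain exceeding the budget, and the dictionary field names.

```python
import math


def rank(values, descending=True):
    """1-based ranks (1 = highest when descending); ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def select_panel(domains, budget_bp=500_000):
    """Greedy panel selection; domains are dicts with 'start', 'end', 'n_cpg', 'meth_diff'."""
    lengths = [d["end"] - d["start"] for d in domains]
    density = [d["n_cpg"] / length for d, length in zip(domains, lengths)]
    rank_diff = rank([d["meth_diff"] for d in domains])
    rank_density = rank(density)
    combined = [math.sqrt(a * b) for a, b in zip(rank_diff, rank_density)]
    chosen, span = [], 0
    for idx in sorted(range(len(domains)), key=combined.__getitem__):
        if span + lengths[idx] > budget_bp:
            break  # stop at the first domain that would exceed the budget
        chosen.append(idx)
        span += lengths[idx]
    return chosen, span


# Hypothetical domains; meth_diff is sample minus HSC mean methylation.
doms = [{"start": 0, "end": 1000, "n_cpg": 100, "meth_diff": 0.5},
        {"start": 0, "end": 2000, "n_cpg": 40, "meth_diff": 0.6},
        {"start": 0, "end": 1000, "n_cpg": 50, "meth_diff": 0.1}]
print(select_panel(doms, budget_bp=3000))
```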
Targeted enzymatic methyl-seq
We applied EM-seq, an improved methylation detection method, to avoid the DNA loss, GC-biased coverage and poor library complexity associated with BS-seq47. A targeted capture panel was designed to tile the selected bivalent domains (with 10 bp flanks). A total of 4200 hybrid capture probes were synthesized using Twist target enrichment (Twist Bioscience, San Francisco, CA, USA) to capture ~58,000 CpGs within the selected bivalent domain regions. Genomic DNA samples were physically fragmented to 200–300 bp using a Covaris instrument. Cytosine residues in 200 ng of input gDNA were enzymatically converted using Twist Bioscience’s NEBNext Enzymatic Methyl-Seq (EM-Seq), followed by library preparation and pre-capture PCR amplification. Target enrichment was performed by Twist fast hybridization in 8-plex pools, followed by post-capture PCR amplification. Libraries were sequenced on a DNBSEQ-G400 Dx (MGI Tech, Shenzhen, China) with 100 bp paired-end reads and a minimum coverage of 280× (average coverage 380×; 234×–559×).
Genome annotations
All bioinformatics analyses were performed with the hg38 human reference genome. Annotations for human CpG islands were downloaded from the UCSC Table Browser. Based on the CpG island annotations, annotations for CpG shores and CpG shelves were obtained using BEDTools48 v2.26.0 (RRID:SCR_006646). CpG shores were defined as the 2 kb regions up- and downstream flanking CpG islands, and CpG shelves as the next 2 kb regions flanking the outer borders of CpG shores. Gene annotations were obtained from the GENCODE49 v32 release (RRID:SCR_014966). Annotations for CpG methylation canyons were obtained from a previous study50.
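The shore and shelf definitions translate into simple interval arithmetic around each island. This sketch uses 0-based, half-open coordinates (as in BED files) and clips intervals at chromosome ends. It does not resolve overlaps with neighboring islands or their flanks, which would need to be handled separately.

```python
def shores_and_shelves(start, end, chrom_length, width=2000):
    """Shore and shelf intervals around one CpG island [start, end) (0-based, half-open).

    Shores are the `width` bp immediately flanking the island; shelves are the next
    `width` bp beyond each shore. Intervals are clipped to [0, chrom_length).
    """
    def clip(a, b):
        a, b = max(0, a), min(chrom_length, b)
        return (a, b) if a < b else None

    shores = [clip(start - width, start), clip(end, end + width)]
    shelves = [clip(start - 2 * width, start - width),
               clip(end + width, end + 2 * width)]
    return [s for s in shores if s], [s for s in shelves if s]


print(shores_and_shelves(10000, 11000, 100000))
```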
Identification of differentially methylated regions
Differentially methylated regions (DMRs) between DNMT3A subclasses were identified with metilene v0.2-817. We required at least 4 CpGs for a region to be called a DMR and allowed a maximum distance of 500 bp between adjacent CpGs within a DMR. Among these candidate regions, those with a methylation difference greater than 0.2 and a Benjamini–Hochberg adjusted p-value below 0.01 were called DMRs.
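The final DMR calling step can be sketched as a post-filter over candidate regions. The sketch assumes each candidate carries its CpG count, mean methylation difference and raw p-value. It applies Benjamini–Hochberg adjustment across all candidates and treats the 0.2 threshold as an absolute difference. These are illustrative assumptions, not a description of metilene's internals.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_dmrs(candidates, min_cpgs=4, min_diff=0.2, max_q=0.01):
    """Keep candidates with enough CpGs, a large enough difference and a small adjusted p."""
    q = benjamini_hochberg([c["p"] for c in candidates])
    return [c for c, qi in zip(candidates, q)
            if c["n_cpg"] >= min_cpgs and abs(c["mean_diff"]) > min_diff and qi < max_q]


candidates = [
    {"n_cpg": 5, "mean_diff": 0.30, "p": 1e-6},
    {"n_cpg": 3, "mean_diff": 0.50, "p": 1e-8},   # too few CpGs
    {"n_cpg": 10, "mean_diff": -0.25, "p": 0.5},  # not significant after adjustment
]
print(call_dmrs(candidates))
```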
Functional enrichment analysis
Enrichr51, accessed through the gseapy52 v0.14.0 Python package, was used for functional enrichment analysis of gene sets. The GO_Biological_Process_2021, GO_Cellular_Component_2021 and GO_Molecular_Function_2021 libraries were used as reference sets for gene ontology (GO) biological processes, cellular components and molecular functions, respectively.
Drug response analysis
Drug response analyses were conducted by reanalyzing public experimental results for Cancer Cell Line Encyclopedia (CCLE)53 cell lines. Only AML-derived cell lines of hematopoietic lineage with associated raw RRBS data were used. Raw RRBS data were obtained under SRA accession SRP186687 and processed as described above. To avoid spurious methylation calls, we excluded CpGs overlapping SNPs in dbSNP version 151. We also excluded CpGs located in ENCODE blacklisted regions54 and their 1000 bp flanking regions.
The responses of the cell lines to hypomethylating agents were adopted from the Cancer Therapeutics Response Portal (CTRP) v255. The area under the dose-response curve (AUDRC) was used as a measure of drug response. The fitted curve was reconstructed and visualized with the following four-parameter logistic nonlinear regression model56:
where \(x\) is the drug concentration at which the response of cells is computed, \(c\) is the drug concentration at which 50% of cells respond, \(b\) is the baseline response (the response of cells at a sufficiently high drug concentration), and \(s\) is the steepest slope of the logistic curve.
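The sketch below uses one common form of the four-parameter logistic (Hill) model, which may differ from the exact parameterization used by CTRP v2: \(f(x)=b+(t-b)/\left(1+(x/c)^{h}\right)\). Here \(t\) is the response at low dose (our added fourth parameter), \(b\) the baseline at high dose and \(c\) the midpoint concentration. The parameter \(h\) controls steepness and is related to, but not identical with, the slope \(s\) above. AUDRC is then approximated by the trapezoidal rule over log10 concentration; the concentration range and grid size are assumptions. Smaller AUDRC values indicate stronger drug sensitivity.

```python
def four_pl(x, top, bottom, ec50, hill):
    """Response at concentration x (> 0) under a four-parameter logistic (Hill) curve."""
    return bottom + (top - bottom) / (1.0 + (x / ec50) ** hill)


def audrc(params, log10_min, log10_max, n=200):
    """Trapezoidal area under four_pl(10**u, *params) for u in [log10_min, log10_max]."""
    step = (log10_max - log10_min) / n
    ys = [four_pl(10 ** (log10_min + i * step), *params) for i in range(n + 1)]
    return sum((ys[i] + ys[i + 1]) * step / 2 for i in range(n))


# Hypothetical fits: (top, bottom, ec50 in uM, hill); a lower AUDRC means a more sensitive line.
sensitive = (1.0, 0.1, 0.5, 1.5)
resistant = (1.0, 0.6, 20.0, 1.0)
print(audrc(sensitive, -3, 1.5), audrc(resistant, -3, 1.5))
```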
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All RRBS and EM-seq data generated in this study were deposited in the NCBI Sequence Read Archive (SRA) database under project accession PRJNA933381. Reference epigenomes for CD34-positive hematopoietic stem cells (HSCs) were downloaded from ENCODE under accession number ENCSR970ENS. Processed signal p-values and called peaks for ChIP-seq targeting H3K4me1, H3K4me3, H3K9me3, H3K27me3, H3K27ac and H3K36me3 histone marks were downloaded under accession numbers ENCSR401CJA, ENCSR136QKZ [https://www.encodeproject.org/experiments/ENCSR136QKZ], ENCSR957WQX [https://www.encodeproject.org/experiments/ENCSR957WQX], ENCSR355PUX [https://www.encodeproject.org/experiments/ENCSR355PUX], ENCSR620AZM [https://www.encodeproject.org/experiments/ENCSR620AZM] and ENCSR164ROX [https://www.encodeproject.org/experiments/ENCSR164ROX], respectively. The core 15-state chromatin states inferred by ChromHMM were downloaded from Roadmap Epigenomics for the enrichment analysis of differentially methylated regions. Raw RRBS reads for Cancer Cell Line Encyclopedia (CCLE) cell lines were obtained under SRA accession SRP186687. Annotations for human CpG islands were downloaded from the UCSC Table Browser. Gene annotations were obtained from the GENCODE v32 release (RRID:SCR_014966). Source data are provided with this paper.
Code availability
The analysis codes and bioinformatics pipelines used for this study are available at the GitHub repository [https://github.com/dohlee/dnmt3a-instability-and-methylation-disorder]57.
References
Yang, L., Rau, R. & Goodell, M. A. DNMT3A in haematological malignancies. Nat. Rev. Cancer 15, 152–165 (2015).
Chen, T., Ueda, Y., Dodge, J. E., Wang, Z. & Li, E. Establishment and maintenance of genomic methylation patterns in mouse embryonic stem cells by Dnmt3a and Dnmt3b. Mol. Cell Biol. 23, 5594–5605 (2003).
Brunetti L., Gundry M. C., Goodell M. A. DNMT3A in Leukemia. Cold Spring Harb Perspect Med. 7, a030320 (2017).
Russler-Germain, D. A. et al. The R882H DNMT3A mutation associated with AML dominantly inhibits wild-type DNMT3A by blocking its ability to form active tetramers. Cancer Cell 25, 442–454 (2014).
Huang, Y. H. et al. Systematic profiling of DNMT3A variants reveals protein instability mediated by the DCAF8 E3 ubiquitin ligase adaptor. Cancer Discov. 12, 220–235 (2021).
Landau, D. A. et al. Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell 26, 813–825 (2014).
Li, S. et al. Distinct evolution and dynamics of epigenetic and genetic heterogeneity in acute myeloid leukemia. Nat. Med. 22, 792–799 (2016).
Sheffield, N. C. et al. DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma. Nat. Med. 23, 386–395 (2017).
Pan, H. et al. Epigenomic evolution in diffuse large B-cell lymphomas. Nat. Commun. 6, 6921 (2015).
Li, S. et al. Somatic mutations drive specific, but reversible, epigenetic heterogeneity states in AML. Cancer Discov. 10, 1934–1949 (2020).
Cancer Genome Atlas Research Network et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
Varadi, M. et al. AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
Holz-Schietinger, C., Matje, D. M., Harrison, M. F. & Reich, N. O. Oligomerization of DNMT3A controls the mechanism of de novo DNA methylation. J. Biol. Chem. 286, 41479–41488 (2011).
Holz-Schietinger, C., Matje, D. M. & Reich, N. O. Mutations in DNA methyltransferase (DNMT3A) observed in acute myeloid leukemia patients disrupt processive methylation. J. Biol. Chem. 287, 30941–30951 (2012).
Lee, D., Koo, B., Yang, J. & Kim, S. Metheor: ultrafast DNA methylation heterogeneity calculation from bisulfite read alignments. PLOS Comput. Biol. 19, e1010946 (2023).
Spencer, D. H. et al. CpG island hypermethylation mediated by DNMT3A is a consequence of AML progression. Cell 168, 801–816.e13 (2017).
Jühling, F. et al. metilene: fast and sensitive calling of differentially methylated regions from bisulfite sequencing data. Genome Res. 26, 256–262 (2016).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Ernst, J. & Kellis, M. Chromatin-state discovery and genome annotation with ChromHMM. Nat. Protoc. 12, 2478–2492 (2017).
Scherer, M. et al. Quantitative comparison of within-sample heterogeneity scores for DNA methylation data. Nucleic Acids Res. 48, e46 (2020).
Landan, G. et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat. Genet. 44, 1207–1214 (2012).
Bernstein, B. E. et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006).
Fleming, H. E. et al. Wnt signaling in the niche enforces hematopoietic stem cell quiescence and is necessary to preserve self-renewal in vivo. Cell Stem Cell 2, 274–283 (2008).
Huang, J., Nguyen-McCarty, M., Hexner, E. O., Danet-Desnoyers, G. & Klein, P. S. Maintenance of hematopoietic stem cells through regulation of Wnt and mTOR pathways. Nat. Med. 18, 1778–1785 (2012).
Wainwright, E. N. & Scaffidi, P. Epigenetics and cancer stem cells: unleashing, hijacking, and restricting cellular plasticity. Trends Cancer 3, 372–386 (2017).
Santini, V. & Ossenkoppele, G. J. Hypomethylating agents in the treatment of acute myeloid leukemia: a guide to optimal use. Crit. Rev. Oncol. Hematol. 140, 1–7 (2019).
Kordella, C., Lamprianidou, E. & Kotsianidis, I. Mechanisms of action of hypomethylating agents: endogenous retroelements at the epicenter. Front. Oncol. 11, 650473 (2021).
Sigalotti, L. et al. Epigenetic drugs as pleiotropic agents in cancer treatment: biomolecular aspects and clinical applications. J. Cell Physiol. 212, 330–344 (2007).
Agrawal, K., Das, V., Vyas, P. & Hajduch, M. Nucleosidic DNA demethylating epigenetic drugs: a comprehensive review from discovery to clinic. Pharmacol. Ther. 188, 45–79 (2018).
Hollenbach, P. W. et al. A comparison of azacitidine and decitabine activities in acute myeloid leukemia cell lines. PLoS ONE 5, e9001 (2010).
Tyner, J. W. et al. Functional genomic landscape of acute myeloid leukaemia. Nature 562, 526–531 (2018).
Glass, J. L. et al. Epigenetic identity in AML depends on disruption of nonpromoter regulatory elements and is affected by antagonistic effects of mutations in epigenetic modifiers. Cancer Discov. 7, 868–883 (2017).
Sandoval, J. E., Huang, Y. H., Muise, A., Goodell, M. A. & Reich, N. O. Mutations in the DNMT3A DNA methyltransferase in acute myeloid leukemia patients cause both loss and gain of function and differential regulation by protein partners. J. Biol. Chem. 294, 4898–4910 (2019).
Gowher, H. & Jeltsch, A. Enzymatic properties of recombinant Dnmt3a DNA methyltransferase from mouse: the enzyme modifies DNA in a non-processive manner and also methylates non-CpA sites. J. Mol. Biol. 309, 1201–1208 (2001).
Emperle, M., Rajavelu, A., Reinhardt, R., Jurkowska, R. Z. & Jeltsch, A. Cooperative DNA binding and protein/DNA fiber formation increases the activity of the Dnmt3a DNA methyltransferase. J. Biol. Chem. 289, 29602–29613 (2014).
Rajavelu, A., Jurkowska, R. Z., Fritz, J. & Jeltsch, A. Function and disruption of DNA methyltransferase 3a cooperative DNA binding and nucleoprotein filament formation. Nucleic Acids Res. 40, 569–580 (2012).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Ryan, D. MethylDackel. GitHub repository https://github.com/dpryan79/MethylDackel (2022).
Cole, C. B. et al. Haploinsufficiency for DNA methyltransferase 3A predisposes hematopoietic cells to myeloid malignancies. J. Clin. Invest. 127, 3657–3674 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).
Cingolani, P. et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front. Genet. 3, 35 (2012).
Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).
Vaisvila, R. et al. Enzymatic methyl sequencing detects DNA methylation at single-base resolution from picograms of DNA. Genome Res. 31, 1280–1289 (2021).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Frankish, A. et al. Gencode 2021. Nucleic Acids Res. 49, D916–D923 (2021).
Jeong, M. et al. Large conserved domains of low DNA methylation maintained by Dnmt3a. Nat. Genet. 46, 17–23 (2014).
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).
Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE Blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).
Rees, M. G. et al. Correlating chemical sensitivity and basal gene expression reveals mechanism of action. Nat. Chem. Biol. 12, 109–116 (2016).
Gadagkar, S. R. & Call, G. B. Computational tools for fitting the Hill equation to dose-response curves. J. Pharmacol. Toxicol. Methods 71, 68–76 (2015).
Lee et al. Increased local DNA methylation disorder in AMLs with DNMT3A-destabilizing variants and its clinical implication (Zenodo) https://doi.org/10.5281/zenodo.14201865 (2024).
Acknowledgements
This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C1876), and by a grant from the Korea Ministry of Science and ICT (grant number: 2021R1A2C3005360) (to Y.K.). This research was also supported by a grant from the National Research Foundation of Korea (grant number: NRF-2020R1A2B5B03001517) (to J.S.) and by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. RS-2021-II211343, Artificial Intelligence Graduate School Program (Seoul National University)] (to S.K.). Genome Opinion Inc. implemented and optimized the targeted enzymatic methyl-seq panel (LifeEx EM) and provided the sequencing data.
Author information
Contributions
D.L., B.K., S-H.K., C-H.S., J-J.S., J.K., S.K., and Y.K. designed research. D.L., B.K., J-J.S., J.K., and Y.K. performed research. S-H.K., J.B., J.H., D.Y.S., C-H.S., J-J.S., J.K., S-S.Y., and Y.K. contributed vital new reagents or analytical tools. D.L., B.K., S-H.K., J-J.S., J.K., S.K., and Y.K. analyzed data. D.L., B.K., J.B., J.H., D.Y.S., S.J., S-S.Y., S.K., and Y.K. wrote the paper.
Ethics declarations
Competing interests
Y.K. is the founder and CEO of Genome Opinion. C.H.S. is the executive director and owns stock in Genome Opinion. The other authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lee, D., Koo, B., Kim, S. et al. Increased local DNA methylation disorder in AMLs with DNMT3A-destabilizing variants and its clinical implication. Nat Commun 16, 560 (2025). https://doi.org/10.1038/s41467-024-55691-z
DOI: https://doi.org/10.1038/s41467-024-55691-z