Fig. 2: Flowchart illustration of the CPEL method.
From: Detection of haplotype-dependent allele-specific DNA methylation in WGBS data

The CPEL method first requires genome-wide identification of haplotypes, performed via read-based SNP phasing using WGS and SNP data, followed by mapping WGBS methylation reads to the homologous alleles of each haplotype according to their allele of origin. Using the WGBS data assigned to each allele, the CPEL method then computes a maximum-likelihood estimate \(\widehat{U}({\bf{x}})\) of the potential energy landscape (PEL) of the methylation state \({\bf{x}}\), given by Eq. (2), determines a PDM estimate \(\widehat{p}({\bf{x}}) \sim \exp \{-\widehat{U}({\bf{x}})\}\), and summarizes methylation stochasticity in terms of the MML μ(X) and the NME h(X) of the random methylation state X, as well as in terms of the JSD \(D({\widehat{p}}_{1},{\widehat{p}}_{2})\) between the two estimated PDMs \({\widehat{p}}_{1}({\bf{x}})\) and \({\widehat{p}}_{2}({\bf{x}})\). Subsequently, the CPEL method performs hypothesis testing using the two test statistics \(T_{\text{MML}}\) and \(T_{\text{NME}}\) in Eqs. (4) and (5) to identify haplotypes demonstrating significant imbalances in MML and NME (MML-haps and NME-haps), as well as the test statistic \(T_{\text{PDM}}\) in Eq. (6) to identify haplotypes that exhibit significant differences between the two PDMs associated with their homologous paternal and maternal alleles (PDM-haps). To perform this step, the CPEL method uses a one-sided empirical bootstrap approach that estimates the P value of a test by \(\widehat{p}=(1/L)\mathop{\sum }\nolimits_{l = 1}^{L}I[{t}_{l}\ge {t}_{* }]\), where \({t}_{1},{t}_{2},\ldots ,{t}_{L}\) are test statistic values appropriately drawn from the null distribution via bootstrapping, \({t}_{* }\) is the observed test statistic value, and I[ ⋅ ] is the Iverson bracket. Following a Benjamini–Hochberg step for multiple hypothesis testing correction, the CPEL method outputs three distinct lists of haplotypes (MML-imbalanced, NME-imbalanced, and PDM-imbalanced haplotypes), together with their corresponding Q-values.
Haplotypes associated with Q-values smaller than 0.05 are considered to be statistically significant.
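
The final two steps of the flowchart (the one-sided empirical P value and the Benjamini–Hochberg correction) can be illustrated with a minimal Python sketch. This is not the CPEL implementation: the function names `empirical_p_value` and `benjamini_hochberg` are hypothetical, the null draws are assumed to have been generated already by some bootstrap procedure, and the standard Benjamini–Hochberg step-up adjustment is assumed to be what produces the reported Q-values.

```python
from typing import List, Sequence


def empirical_p_value(null_stats: Sequence[float], observed: float) -> float:
    """One-sided empirical P value: (1/L) * sum over l of I[t_l >= t_*].

    `null_stats` holds the L bootstrap draws t_1, ..., t_L from the null
    distribution; `observed` is the observed test statistic t_*.
    """
    if not null_stats:
        raise ValueError("at least one bootstrap draw is required")
    exceed = sum(1 for t in null_stats if t >= observed)
    return exceed / len(null_stats)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (Q-values), in input order.

    Assumption: the standard step-up adjustment q_(k) = min_{j >= k} p_(j) * m / j.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that Q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def significant_haplotypes(q_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of haplotypes whose Q-value is smaller than `alpha`."""
    return [i for i, q in enumerate(q_values) if q < alpha]
```

For example, `empirical_p_value` would be called once per haplotype and per test statistic, the resulting P values for one statistic would be passed together to `benjamini_hochberg`, and `significant_haplotypes` would then apply the 0.05 threshold stated above. Note that with this estimator a P value of exactly zero is possible when no bootstrap draw reaches the observed value.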