Fig. 1: Overview of this study for reliable identification of a DNAm biomarker candidate pool.
From: Causality-driven candidate identification for reliable DNA methylation biomarker discovery

a Challenges of DNAm biomarker discovery. Current data-based screening methods may generate a less reliable candidate pool because of confounding factors such as measurement noise and individual characteristics. Compensating for this deficiency requires subsequent multistep, costly experiments, which makes the overall workflow resource-intensive. b Overview of the proposed causality-driven deep regularization (CDReg) framework, which integrates causal thinking, a priori guidance, and deep learning. The spatial-relation regularization aligns weight differences with spatial distances so that sites close to one another receive similar weights, thereby excluding spatially isolated noise sites. The contrastive scheme pushes apart paired diseased and normal samples from the same subject to encourage the selection of disease-specific differential sites rather than subject-specific ones. c Comprehensive performance evaluation in this study. The experiments comprise simulations and applications; the applications involve microarray data from lung adenocarcinoma (LUAD) tissue samples and Alzheimer’s disease (AD) blood samples, as well as whole-genome bisulfite sequencing (WGBS) data from prostate cancer (PC) tissue samples. The results demonstrate the advantages of CDReg in accurate identification, biological significance, and inter-class discrimination.
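
The two regularization ideas in panel b can be illustrated with a minimal sketch. This is not the CDReg implementation: the function names, the exponential closeness kernel, the `scale` and `margin` parameters, and the toy data are all assumptions chosen for illustration. The first function penalizes weight differences between neighboring sites more strongly when the sites are close together; the second penalizes same-subject diseased-normal pairs that remain close after feature weighting.

```python
import math


def spatial_penalty(weights, positions, scale=1000.0):
    """Hypothetical spatial-relation term (illustrative only).

    Sums squared weight differences between adjacent sites, each scaled
    by exp(-distance / scale), so nearby sites are pushed toward similar
    weights while distant sites are left almost unconstrained.
    """
    total = 0.0
    for i in range(len(weights) - 1):
        distance = abs(positions[i + 1] - positions[i])
        closeness = math.exp(-distance / scale)  # assumed kernel
        total += closeness * (weights[i] - weights[i + 1]) ** 2
    return total


def contrastive_penalty(weights, diseased, normal, margin=1.0):
    """Hypothetical contrastive term (illustrative only).

    For each subject, measures the weighted distance between the
    diseased and normal sample and applies a hinge: pairs closer than
    `margin` are penalized, which favors sites that differ by disease
    status within a subject.
    """
    total = 0.0
    for x_d, x_n in zip(diseased, normal):
        dist = math.sqrt(
            sum((w * (a - b)) ** 2 for w, a, b in zip(weights, x_d, x_n))
        )
        total += max(0.0, margin - dist)
    return total / len(diseased)


# Toy genomic positions: three sites close together, one far away.
positions = [100, 150, 200, 50000]
clustered = [0.5, 0.5, 0.5, 0.0]  # neighboring sites share similar weights
spiky = [0.0, 0.5, 0.0, 0.0]      # one site stands alone among its neighbors

# Toy paired data for one subject: site 0 differs by disease status,
# site 1 is a subject-specific value shared by both samples.
diseased = [[0.9, 0.3]]
normal = [[0.1, 0.3]]
disease_site_weights = [1.0, 0.0]
subject_site_weights = [0.0, 1.0]

spatial_clustered = spatial_penalty(clustered, positions)
spatial_spiky = spatial_penalty(spiky, positions)
contrast_disease = contrastive_penalty(disease_site_weights, diseased, normal)
contrast_subject = contrastive_penalty(subject_site_weights, diseased, normal)
```

In this toy setting the spiky weight vector incurs a larger spatial penalty than the clustered one, and weighting the subject-specific site incurs a larger contrastive penalty than weighting the disease-specific site. In a full model such terms would presumably be added to a prediction loss with tunable coefficients; how they are combined is not specified in this caption.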