Fig. 1: The definition and validation of the DRIAD framework. | Nature Communications

Fig. 1: The definition and validation of the DRIAD framework.

From: Machine learning identifies candidates for drug repurposing in Alzheimer’s disease

a Overview of the machine learning framework used to establish potential associations between gene lists and Alzheimer’s disease. (i) The framework accepts as input gene lists derived from experimental data or extracted from database resources or the literature. (ii) Given a gene expression matrix, the framework restricts it to the genes in a particular list of interest, and (iii) subsequently trains a predictor of Braak stage of disease and evaluates it through cross-validation. (iv) The process is repeated for randomly selected gene lists of equal length to determine whether the predictor performance associated with the gene list of interest is significantly higher than expected by chance.

b AMP-AD datasets used by the machine learning framework. The three datasets used to evaluate the predictive power of gene lists are provided by The Religious Orders Study and Memory and Aging Project (ROSMAP), The Mayo Clinic Brain Bank (MAYO), and The Mount Sinai/JJ Peters VA Medical Center Brain Bank (MSBB). The schematic highlights the regions of the brain that are represented in each dataset. The MSBB dataset spans four distinct regions, which are designated using Brodmann (BM) area codes.

c Performance of predictors trained on gene lists reported in previous studies of AMP-AD datasets. The predictors are evaluated for their ability to distinguish early-vs-late disease stages, with performance reported as area under the ROC curve (AUC). The vertical line on each row denotes the predictor performance associated with a gene list reported in the literature, while the background distribution is constructed over randomly selected lists of matching length. Each row is annotated with the PubMed ID of the study, the supplemental resource that contained the gene list, and a short key phrase providing functional context. The unadjusted p-values shown were computed with a one-sided empirical test, as the fraction of randomly selected lists in the background distribution that outperformed the corresponding literature list.
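The comparison against a background of random gene lists (panel a, step iv) and the one-sided empirical test (panel c) can be illustrated with a minimal Python sketch. This is not the DRIAD implementation. All function names are hypothetical, the strict "greater than" treatment of ties is an assumption, and `score_gene_list` is a deliberately simple stand-in (it sums expression values and computes AUC directly) for the cross-validated Braak-stage predictor described in the caption.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def score_gene_list(expression, gene_list, labels):
    """Toy stand-in for the predictor (assumption): restrict the
    expression data to the genes in the list, sum them per sample,
    and report the AUC of that sum for early (0) vs. late (1) stage."""
    n_samples = len(labels)
    scores = [sum(expression[g][i] for g in gene_list) for i in range(n_samples)]
    return auc(scores, labels)


def background_gene_lists(all_genes, list_length, n_lists, seed=0):
    """Draw random gene lists whose length matches the list of interest."""
    rng = random.Random(seed)
    return [rng.sample(all_genes, list_length) for _ in range(n_lists)]


def empirical_p_value(observed_auc, background_aucs):
    """One-sided empirical p-value: the fraction of background lists
    that outperformed the list of interest (ties not counted; assumption)."""
    n_better = sum(1 for b in background_aucs if b > observed_auc)
    return n_better / len(background_aucs)
```

In use, one would score the gene list of interest, score each list returned by `background_gene_lists` in the same way, and pass both to `empirical_p_value`. With a finite background, the smallest nonzero p-value this estimator can report is one over the number of random lists.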
