Table 1 Properties of predictive models for six tools

From: Biological relevance of computationally predicted pathogenicity of noncoding variants

| Method | Assumption of pathogenicity | Predictors | Modeling approaches | Performance (AUROC)^a |
|---|---|---|---|---|
| CADD | Evolutionary fitness | Evolutionary parameters, ENCODE summaries, functional annotations, population frequencies | Support vector machines | 0.92^b |
| CATO | Molecular functions | Cell type- and tissue-specific assays, evolutionary parameters, functional annotations | Logistic regression | NA^c |
| DeepSEA | Molecular functions | Local sequences, evolutionary parameters | Deep learning, logistic regression | 0.85 |
| EIGEN | None^d | Evolutionary parameters, ENCODE summaries, population frequencies | Unsupervised learning | 0.79 |
| GWAVA | DAVs vs. CPPs | Evolutionary parameters, ENCODE summaries, population frequencies | Random forests | 0.97 |
| LINSIGHT | Evolutionary fitness | Evolutionary parameters, ENCODE summaries, functional annotations | Generalized linear model | 0.96 |

AUROC = area under the receiver operating characteristic curve; DAV = disease-associated variant; CPP = common population polymorphism
^a Highest AUROC values for classifying DAVs versus CPPs, as reported in the original publications
^b CADD reported AUROC values that mixed coding and noncoding variants
^c CATO predicts transcription factor occupancy rather than pathogenicity
^d EIGEN uses an unsupervised learning approach and thus makes no assumption of pathogenicity during training
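
For readers unfamiliar with the performance column, the AUROC for a DAV-versus-CPP classifier can be read as the probability that a randomly chosen DAV receives a higher score than a randomly chosen CPP, with ties counted as one half. The sketch below illustrates that definition only. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from any of the six tools or their publications.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the AUROC.

    labels: 1 for a disease-associated variant (DAV), 0 for a common
            population polymorphism (CPP).
    scores: predicted pathogenicity scores, higher meaning more pathogenic.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one DAV and one CPP")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # DAV correctly ranked above CPP
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auroc:.3f}")
```

On this toy input, eight of the nine DAV-CPP pairs are ranked correctly, so the estimate is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the values in the table should be read.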