Table 1 Properties of predictive models for six tools

Method	Assumption of pathogenicity	Predictors	Modeling approaches	Performance (AUROC)^a
CADD	Evolutionary fitness	Evolutionary parameters, ENCODE summaries, functional annotations, population frequencies	Support vector machines	0.92^b
CATO	Molecular functions	Cell type- and tissue-specific assays, evolutionary parameters, functional annotations	Logistic regression	NA^c
DeepSEA	Molecular functions	Local sequences, evolutionary parameters	Deep learning, logistic regression	0.85
EIGEN	None^d	Evolutionary parameters, ENCODE summaries, population frequencies	Unsupervised learning	0.79
GWAVA	DAVs vs. CPPs	Evolutionary parameters, ENCODE summaries, population frequencies	Random forests	0.97
LINSIGHT	Evolutionary fitness	Evolutionary parameters, ENCODE summaries, functional annotations	Generalized linear model	0.96

AUROC = area under the receiver operating characteristic curve, DAV = disease-associated variant, CPP = common population polymorphism
^aHighest AUROC values in classifying DAVs and CPPs reported in the original publications
^bCADD reported AUROC values that mixed coding and noncoding variants
^cCATO predicts transcription factor occupancy instead of pathogenicity
^dEIGEN uses an unsupervised learning approach and thus makes no assumption of pathogenicity during training
