Supplementary Figure 9: Performance evaluation of the deep learning network and other classifiers.
From: Predicting the clinical impact of human mutation with deep neural networks

a, Accuracy of the deep learning network PrimateAI at predicting a benign consequence for a test set of 10,000 primate variants that were withheld from training, compared with 20 other classifiers: SIFT, PolyPhen-2, CADD, REVEL, M-CAP, LRT, MutationTaster, MutationAssessor, FATHMM, PROVEAN, VEST3, MetaSVM, MetaLR, MutPred, DANN, FATHMM-MKL_coding, Eigen, GenoCanyon, integrated_fitCons, and GERP. The y axis represents the percentage of primate variants classified as benign after normalizing the threshold for each classifier to its 50th-percentile score on a set of 10,000 randomly selected variants that were matched to the primate variants for trinucleotide context, to control for mutational rate and gene conversion. b, Performance of the PrimateAI network in separating de novo missense variants in DDD cases versus controls, compared with the 20 existing methods listed above. The y axis shows the P values of the Wilcoxon rank-sum test for each classifier. c, Performance of the PrimateAI network in separating de novo missense variants in DDD cases versus unaffected controls within 605 disease-associated genes, compared with the 20 methods listed above. The y axis shows the P values of the Wilcoxon rank-sum test for each classifier.
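The two quantities plotted in this figure can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `benign_fraction` and `rank_sum_p_value` are hypothetical, and the scores are assumed to be oriented so that higher values mean "more likely pathogenic" (classifiers with the opposite orientation would need their scores flipped first). The rank-sum P value uses the standard normal approximation with a tie correction and no continuity correction, which may differ slightly from the procedure used for the figure.

```python
import math
from statistics import median


def benign_fraction(primate_scores, matched_scores):
    """Fraction of primate variants scoring below the matched-set median.

    Assumption: higher score = more likely pathogenic, so a score below the
    50th-percentile threshold is called benign.
    """
    threshold = median(matched_scores)  # 50th-percentile score of matched variants
    benign = sum(1 for s in primate_scores if s < threshold)
    return benign / len(primate_scores)


def rank_sum_p_value(cases, controls):
    """Two-sided Wilcoxon rank-sum P value (normal approximation).

    Tied scores receive average ranks and the variance is tie-corrected.
    """
    # Label 0 marks case scores, label 1 marks control scores.
    pooled = sorted([(s, 0) for s in cases] + [(s, 1) for s in controls])
    n1, n2 = len(cases), len(controls)
    n = n1 + n2
    rank_sum_cases = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores pooled[i:j].
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        n_cases_in_run = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_cases += avg_rank * n_cases_in_run
        i = j
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0  # all scores identical: no evidence of separation
    z = (u - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    # Made-up toy scores, for illustration only.
    primate = [0.1, 0.2, 0.9, 0.4]
    matched = [0.3, 0.5, 0.7]
    print(benign_fraction(primate, matched))
    print(rank_sum_p_value([0.8, 0.9, 0.7, 0.95], [0.2, 0.4, 0.75, 0.1]))
```

Fixing every classifier's threshold at the median of the same matched variant set puts the methods on a common scale in panel a, so the percentages are comparable even though the raw score ranges differ. In panels b and c, a smaller P value indicates better separation of case and control score distributions.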