Fig. 2: MAVERICK effectively classifies the pathogenicity of a wide range of protein-altering variants.
From: Deep structured learning for variant prioritization in Mendelian diseases

a Areas under the receiver operating characteristic curve for the known genes and novel genes test sets. b Areas under the precision-recall curve for the known genes and novel genes test sets. c, d Box plots of MAVERICK classification performance on each type of protein-altering variant that it can assess. The y-axis shows the distribution of MAVERICK predictions for each variant type, where the value plotted for any given variant is the predicted probability of its true class label (e.g., benign variants are plotted by their benign score). The boxes show the median and the 25th and 75th percentiles; the whiskers show the 5th and 95th percentiles, and points beyond the whiskers are shown as blue dots. The number of variants used for each box plot is shown at the bottom of the figure. c The performance for the known genes test set. d The performance for the novel genes test set. e MAVERICK predictions for every possible missense variant in the known dominant spastic paraplegia gene SPAST. For each variant, MAVERICK’s predicted benign score is plotted in the bottom subplot, the recessive score in the middle subplot, and the dominant score in the top subplot. A diagram of domains in the gene is given at the top. ClinVar pathogenic variants are plotted in red, benign variants in green, and variants of uncertain significance in blue.
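
The quantities plotted in panels c and d can be sketched as follows. This is a minimal illustration, not the authors' code: the class order in `CLASSES`, the function names, and the linear-interpolation percentile rule are all assumptions made for the example.

```python
import math

# Hypothetical class order; the caption does not say how scores are stored.
CLASSES = ("benign", "dominant", "recessive")


def true_class_scores(probs, labels):
    """For each variant, pick the predicted probability of its true class."""
    return [p[CLASSES.index(y)] for p, y in zip(probs, labels)]


def percentile(sorted_vals, q):
    """q-th percentile of an ascending list, using linear interpolation
    (an assumed convention; the caption does not name one)."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def box_summary(scores):
    """Box at the 25th/50th/75th percentiles, whiskers at the 5th/95th,
    and every point outside the whiskers reported separately."""
    if not scores:
        raise ValueError("box_summary needs at least one score")
    vals = sorted(scores)
    low, high = percentile(vals, 5), percentile(vals, 95)
    return {
        "median": percentile(vals, 50),
        "q25": percentile(vals, 25),
        "q75": percentile(vals, 75),
        "whisker_low": low,
        "whisker_high": high,
        "outliers": [v for v in vals if v < low or v > high],
        "n": len(vals),
    }
```

Calling `box_summary(true_class_scores(probs, labels))` once per variant type would give the five summary values, the outlying points, and the variant count shown for each box.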