Fig. 1: VEPs have variable performance in annotating known oncogenic mutations.
From: AI cancer driver mutation predictions are valid in real-world data

A Frequency of known oncogenic mutations and variants of unknown significance (VUS) in commonly altered oncogenes and tumor suppressor genes in GENIE as annotated by OncoKB. B Distributions of prediction scores from AlphaMissense and FATHMM for non-pathogenic dbSNPs (N = 7474) and missense mutations in GENIE v.14-public, broken down by their occurrence in oncogenes (OG, N = 408,771), tumor suppressor genes (TSG, N = 506,068) or genes that act as both (OG/TSG, N = 57,592) at the population level, in which all occurrences of missense mutations are included. Points higher on the y-axis correspond to higher predicted pathogenicity. Boxplots represent mean scores (center line) ± interquartile range (IQR); whiskers span 1.5 × IQR from each quartile, with outliers shown individually. See Fig. S3B for population-level distributions from all VEPs and Fig. S3A for mutation-level distributions. Brackets denote significance from two-sided Tukey’s tests with FDR correction (*: q ≤ 0.05, ****: q ≤ 1e-04). C AUROCs (± 95% CI) of 12 variant annotation methods in classifying known oncogenic mutations (N = 180,540) and non-oncogenic SNPs (N = 180,540 upsampled from 7474) at the population level. DeLong’s test was used to compare AUROCs with FDR correction. Within each methodological class, pairwise comparisons were performed between the top-performing method and the others (black asterisks, *: q ≤ 0.05). Red asterisks denote significant differences (q ≤ 0.05) between each class’s top performer and the overall best method (bolded AUROC). Tracks at left indicate how each method was trained: “Supervised” denotes use of labeled training data; “Human-curated” specifies whether labels originated from manually curated resources (e.g., ClinVar); and “Cancer-trained” denotes use of cancer-specific datasets (e.g., Cancer Genome Census). D Density plots showing true positive rates (TPR) of AlphaMissense and FATHMM across all genes. TPRs and the number of known oncogenic mutations (N) in select commonly mutated oncogenes and tumor suppressor genes are shown. See Supplemental Appendix for a complete list of TPRs. Source data are provided as a Source Data file.
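
For readers who want to reproduce the flavor of panels C and D, the following is a minimal sketch, not the authors' pipeline. It assumes that upsampling means drawing the non-oncogenic SNP scores with replacement until they match the number of oncogenic mutations, computes AUROC from the rank-sum (Mann-Whitney) statistic, and computes a TPR at a caller-supplied score cutoff. The function names, the random seed, and the cutoff are hypothetical; the caption does not state which thresholds or resampling procedure were used.

```python
import random


def auroc(pos_scores, neg_scores):
    """AUROC as P(random positive scores above random negative); ties count 0.5."""
    labeled = [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores]
    labeled.sort(key=lambda t: t[0])
    n = len(labeled)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and labeled[j][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in labeled[i:j])
        i = j
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def upsample(scores, target_n, seed=0):
    """Assumed upsampling scheme: draw with replacement until target_n items."""
    rng = random.Random(seed)
    return [rng.choice(scores) for _ in range(target_n)]


def true_positive_rate(pos_scores, cutoff):
    """Fraction of known oncogenic mutations scored at or above a cutoff."""
    return sum(s >= cutoff for s in pos_scores) / len(pos_scores)


if __name__ == "__main__":
    # Toy scores only; these are not values from the study.
    oncogenic = [0.91, 0.85, 0.77, 0.60, 0.95, 0.40]
    benign_snps = [0.10, 0.35, 0.50]
    balanced_negatives = upsample(benign_snps, len(oncogenic))
    print("AUROC:", auroc(oncogenic, balanced_negatives))
    print("TPR at hypothetical cutoff 0.5:", true_positive_rate(oncogenic, 0.5))
```

Balancing the classes this way duplicates existing negatives rather than adding new information, so confidence intervals computed on the upsampled set should be interpreted with that in mind.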