Fig. 1: Evaluation of AlphaMissense compared to expert-curated genotype–phenotype association databases within a rare disease cohort.

a Gene coverage comparison across missense variant effect prediction methods (VEPs). A total of 17,530 genes are covered by the four VEPs, and variants in these genes serve as targets for further analysis. b Counts of likely pathogenic variants predicted by AlphaMissense (> 0.567), REVEL (> 0.75), ESM1b (< −7.5), and BayesDel (> 0.0692655) among ClinVar pathogenic (ClinVar_P) and likely pathogenic (ClinVar_LP) variants in the cohort. The y-axis represents the proportion of total ClinVar_P (N = 475) or ClinVar_LP (N = 903) variants found in the cohort. Numeric labels indicate variant counts for each category and method. c Counts and proportions of variants reported in ClinVar among all variants classified as likely pathogenic by each method (top panel). Stratification of variants classified as likely pathogenic by each method according to their ClinVar classifications (bottom panel). d Precision and recall metrics for each prediction method (AM AlphaMissense). e Discrepancies between AlphaMissense predictions and the classifications provided by HGMD Professional (DM and DM? classes) and ClinVar, for variants discovered in our cohort that are annotated in both databases.
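The score cutoffs in panel b and the metrics in panel d can be made concrete with a small sketch. It applies each method's threshold exactly as written in the caption (strict `>` or `<`; note that ESM1b calls a variant likely pathogenic when its score falls *below* the cutoff) and computes precision and recall from boolean calls. The names `THRESHOLDS`, `is_likely_pathogenic`, and `precision_recall` are hypothetical, and treating ClinVar P/LP variants as positives and all others as negatives is an assumption: the caption does not state how positives and negatives were defined for panel d.

```python
# Hypothetical helper: per-method cutoffs copied from the caption (panel b).
# Each entry is (comparison direction, cutoff); ESM1b is "lower is more damaging".
THRESHOLDS = {
    "AlphaMissense": (">", 0.567),
    "REVEL": (">", 0.75),
    "ESM1b": ("<", -7.5),
    "BayesDel": (">", 0.0692655),
}


def is_likely_pathogenic(method: str, score: float) -> bool:
    """Return True if `score` passes the method's likely-pathogenic cutoff."""
    op, cutoff = THRESHOLDS[method]
    return score > cutoff if op == ">" else score < cutoff


def precision_recall(predicted: list[bool], truth: list[bool]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    Assumption: `truth` marks ClinVar P/LP variants as True (positives).
    Returns 0.0 for a metric whose denominator is zero.
    """
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum(t and not p for p, t in zip(predicted, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy, made-up scores for four hypothetical variants (not cohort data).
    am_scores = [0.91, 0.70, 0.30, 0.10]
    clinvar_plp = [True, False, True, False]
    calls = [is_likely_pathogenic("AlphaMissense", s) for s in am_scores]
    print(precision_recall(calls, clinvar_plp))  # prints (0.5, 0.5)
```

Keeping the comparison direction next to each cutoff avoids the easy mistake of applying ESM1b's negative threshold with the same `>` test used for the other three methods.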