Fig. 2: Statistics of variants from gnomAD, ClinVar and HGMD databases aggregated in the G2P portal.

a–c, Distribution of variant types (single nucleotide variation (SNV) versus non-SNV, comprising insertion, deletion and inversion) and associated protein consequences (missense, synonymous, nonsense, frameshift, in-frame indel, and 'others' for all remaining protein consequences) among 20 million protein-coding variants in the gnomAD (a), ClinVar (b) and HGMD (c) databases. In all three databases, the majority of human protein-coding variants are SNVs that result in missense changes. d, Distribution of gnomAD variants categorized by allele frequency (AF): very rare (AF < 0.1%), rare (0.1% ≤ AF < 0.5%), low frequency (0.5% ≤ AF < 5%) and common (AF ≥ 5%). The distribution of each AF group is shown across protein consequences (missense, synonymous, nonsense, frameshift and in-frame indel). e, Distribution of the clinical significance of ClinVar variants (pathogenic/likely pathogenic (PLP), benign/likely benign (BLB), variant of uncertain significance or conflicting interpretations (VUS/CI), and others) across protein consequences. f, Distribution of confidence levels (high or low) for HGMD variants across protein consequences.
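The AF categories in panel d amount to a simple binning rule with lower-inclusive, upper-exclusive thresholds. The following is a minimal sketch that assumes AF is supplied as a fraction (so 0.001 corresponds to 0.1%). The helper name `af_category` is hypothetical and is not part of the G2P portal or gnomAD tooling.

```python
def af_category(af: float) -> str:
    """Assign an allele frequency (fraction in [0, 1]) to one of the four
    panel-d bins. Hypothetical helper for illustration only."""
    if not 0.0 <= af <= 1.0:
        raise ValueError(f"allele frequency must be in [0, 1], got {af}")
    if af < 0.001:   # AF < 0.1%
        return "very rare"
    if af < 0.005:   # 0.1% <= AF < 0.5%
        return "rare"
    if af < 0.05:    # 0.5% <= AF < 5%
        return "low frequency"
    return "common"  # AF >= 5%


# Example: values at and around each threshold
for af in (0.0002, 0.001, 0.0049, 0.005, 0.05, 0.3):
    print(af, af_category(af))
```

Because each threshold belongs to the higher bin (for example, AF = 0.5% is "low frequency", not "rare"), the bins cover the full AF range without gaps or overlaps, matching the inequalities in the caption.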