Table 2 Percentage of rare variant types in AVADA, HGMD, and ClinVar

From: AVADA: toward automated pathogenic variant evidence retrieval directly from the full-text literature

Variant type           AVADA     HGMD      ClinVar
Stoploss               0.08%     0.14%     0.10%
Nonframeshift indel    1.87%     3.12%     2.62%
Splicing               4.05%     7.35%     3.82%
Stopgain               12.37%    13.87%    8.58%
Frameshift             14.60%    22.16%    11.22%
Missense               67.03%    53.36%    73.67%

  1. The table shows the fraction of each variant type in roughly time-synchronized versions of AVADA, HGMD, and ClinVar, each restricted to rare variants (≤3% allele frequency in a large healthy control cohort [2]) of the listed variant types. Although AVADA relies purely on automatic natural language processing, its (unvalidated) variant type fractions always fall within ±1% of the range spanned by the corresponding fractions in manually curated HGMD and ClinVar.
  2. AVADA Automatic Variant Evidence Database, HGVS Human Genome Variation Society, HGMD Human Gene Mutation Database.
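
The ±1% statement in the first footnote can be checked directly against the table values. The following is a minimal sketch, not part of the original article: it assumes that "±1%" means one percentage point beyond the interval spanned by the HGMD and ClinVar fractions, and the names `TABLE_2`, `distance_outside_range`, and `check_table` are hypothetical helpers introduced only for illustration.

```python
# Percentages transcribed from Table 2, in the order (AVADA, HGMD, ClinVar).
TABLE_2 = {
    "Stoploss": (0.08, 0.14, 0.10),
    "Nonframeshift indel": (1.87, 3.12, 2.62),
    "Splicing": (4.05, 7.35, 3.82),
    "Stopgain": (12.37, 13.87, 8.58),
    "Frameshift": (14.60, 22.16, 11.22),
    "Missense": (67.03, 53.36, 73.67),
}

# Assumption: "±1%" is read as one percentage point, not a relative 1%.
TOLERANCE_PP = 1.0


def distance_outside_range(avada, hgmd, clinvar):
    """Return how far (in percentage points) the AVADA value lies outside
    the interval spanned by the HGMD and ClinVar values; 0.0 if inside."""
    low, high = min(hgmd, clinvar), max(hgmd, clinvar)
    if avada < low:
        return low - avada
    if avada > high:
        return avada - high
    return 0.0


def check_table(table, tolerance=TOLERANCE_PP):
    """Map each variant type to True if AVADA is within the tolerance."""
    return {
        name: distance_outside_range(*row) <= tolerance
        for name, row in table.items()
    }


for name, row in TABLE_2.items():
    gap = distance_outside_range(*row)
    print(f"{name}: {gap:.2f} percentage points outside the HGMD/ClinVar range")
```

Under this reading, four of the six AVADA fractions lie inside the HGMD/ClinVar interval, and the two that do not (Stoploss and Nonframeshift indel) fall below it by less than one percentage point.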