Fig. 3
From: AVADA: toward automated pathogenic variant evidence retrieval directly from the full-text literature

Automatic variant retrieval results. (a) Top ten journals in AVADA. AVADA retrieved variants from 3159 articles in Human Mutation, 2330 articles in American Journal of Human Genetics, and 2042 articles in Human Molecular Genetics, among others. (b) Top ten journals in all of HGMD. As in AVADA, the top three journals are Human Mutation, American Journal of Human Genetics, and Human Molecular Genetics. Reassuringly, the two lists share nine of the top ten journals, even though HGMD is manually curated whereas AVADA retrieves variant evidence automatically and does not validate it. (c) Unvalidated AVADA variants intersected with all curated disease-causing variants in HGMD (“DM” variants only) and ClinVar (“likely/pathogenic” variants only). AVADA retrieves 85,888 variants that are also in the HGMD set (subset to disease-causing variants) and 26,033 variants that are also in the ClinVar set (subset to pathogenic and likely pathogenic variants). (d) AVADA’s potential value in patient diagnosis. We count the patient diagnostic variants found in each of four databases for 245 diagnosed patients from the Deciphering Developmental Disorders (DDD) study. Curated HGMD and ClinVar (both predating the DDD publication) are subset to disease-causing (“DM”) and “likely/pathogenic” variants, respectively. For tmVar and AVADA, we manually validated all diagnostic evidence shown. AVADA completely subsumes abstract-based tmVar and recovers almost three times as many diagnostic variants. While ClinVar alone implicates 21 diagnostic variants, AVADA offers unvalidated evidence for an additional 27 variants, of which 18 are valid, virtually doubling ClinVar’s reach.
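The "virtually doubling" statement in panel (d) can be checked with a few lines of arithmetic. The sketch below uses only the three counts quoted in the caption. The variable names are illustrative, and reading "reach" as the ratio of ClinVar plus validated AVADA-only variants to ClinVar alone is an assumption, not a definition given in the caption. The validated fraction is likewise a derived quantity that the caption does not state.

```python
# Counts quoted in panel (d) of the caption.
clinvar_diagnostic = 21       # diagnostic variants implicated by ClinVar alone
avada_additional = 27         # additional variants with unvalidated AVADA evidence
avada_additional_valid = 18   # of those additional variants, the ones found valid

# Assumed reading of "reach": ClinVar plus validated AVADA-only variants,
# relative to ClinVar alone.
combined = clinvar_diagnostic + avada_additional_valid
reach_ratio = combined / clinvar_diagnostic

# Derived here, not stated in the caption: share of the additional
# AVADA variants that held up under manual validation.
validated_fraction = avada_additional_valid / avada_additional

print(f"combined diagnostic variants: {combined}")
print(f"reach relative to ClinVar alone: {reach_ratio:.2f}x")
print(f"validated fraction of additional variants: {validated_fraction:.0%}")
```

Under this reading the combined set holds 39 variants, about 1.86 times ClinVar alone, which is consistent with the caption's "virtually doubling".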