Fig. 1
From: AVADA: toward automated pathogenic variant evidence retrieval directly from the full-text literature

Construction of the Automated Variant Evidence Database (AVADA). Identification of relevant literature: AVADA discovers potentially relevant articles (those about the genetic causes of Mendelian diseases) in PubMed, downloads their full text, and then filters the articles a second time based on that full text. Variant mapping: Variant descriptions are detected in articles using 47 manually built regular expressions and are then linked to mentioned genes to form gene–variant candidate mappings. These candidate mappings are filtered using a gene–variant candidate classifier and converted to genomic coordinates. In total, AVADA retrieves (unvalidated) evidence about 203,536 distinct genetic variants in 5827 genes from 61,116 articles.
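The Python sketch below illustrates the first half of the variant-mapping step: regex-based detection of variant descriptions followed by formation of gene–variant candidate pairs. It makes several assumptions and is not AVADA's implementation. The two patterns are hypothetical stand-ins for AVADA's 47 regular expressions. `GENE_SYMBOLS` is a toy gene list. Pairing every gene with every variant in the same sentence is an assumed linking rule. The gene–variant candidate classifier and the conversion to genomic coordinates are omitted.

```python
import re
from typing import List, Set, Tuple

# ASSUMPTION: two illustrative patterns only; AVADA's 47 manually built
# regular expressions are not reproduced here.
VARIANT_PATTERNS = [
    re.compile(r"\bc\.\d+[ACGT]>[ACGT]\b"),               # cDNA substitution, e.g. c.350G>A
    re.compile(r"\bp\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}\b"),  # protein change, e.g. p.Arg117His
]

# ASSUMPTION: toy gene-symbol list for illustration only.
GENE_SYMBOLS = {"CFTR", "MYH7", "BRCA1"}


def find_variants(text: str) -> List[str]:
    """Return variant descriptions matched by any pattern, in order of appearance."""
    hits = []
    for pattern in VARIANT_PATTERNS:
        hits.extend((m.start(), m.group(0)) for m in pattern.finditer(text))
    return [mention for _, mention in sorted(hits)]


def find_genes(text: str, gene_symbols: Set[str]) -> List[str]:
    """Return known gene symbols found as uppercase/digit tokens, without duplicates."""
    genes: List[str] = []
    for token in re.findall(r"\b[A-Z0-9]+\b", text):
        if token in gene_symbols and token not in genes:
            genes.append(token)
    return genes


def candidate_mappings(article: str, gene_symbols: Set[str]) -> List[Tuple[str, str]]:
    """Pair each gene with each variant that co-occurs in the same sentence.

    ASSUMPTION: sentence-level co-occurrence is a hypothetical linking rule;
    a downstream classifier (not shown) would filter these candidates.
    """
    pairs: List[Tuple[str, str]] = []
    # Split only at sentence punctuation followed by whitespace, so the
    # periods inside "c.350G>A" or "p.Arg117His" do not cause splits.
    for sentence in re.split(r"(?<=[.!?])\s+", article):
        variants = find_variants(sentence)
        genes = find_genes(sentence, gene_symbols)
        pairs.extend((gene, variant) for gene in genes for variant in variants)
    return pairs


if __name__ == "__main__":
    example = ("The CFTR variant c.350G>A (p.Arg117His) was detected. "
               "No MYH7 changes were found.")
    print(candidate_mappings(example, GENE_SYMBOLS))
```

On the example text, the second sentence mentions MYH7 but no variant, so it contributes no pair. Only the two CFTR candidates are returned. In a fuller pipeline, each surviving pair would next be scored by a classifier and mapped to genomic coordinates.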