Table 3 Human genetic variation data sets and derived tools.

From: Commonalities across computational workflows for uncovering explanatory variants in undiagnosed cases

 

Columns (clinical sites): BaylorSeq, BCM, Duke/Columbia, Harvard, Miami, NIH, PacificNW, Stanford, UCLA, Utah, Vanderbilt, WUSTL

Known disease gene databases

 ClinVar
 OMIM
 HGMD: Human Gene Mutation Database
 dbSNP
 CGD: Clinical Genomic Database
 Orphanet

Healthy human population single-nucleotide variant (SNV)/indel databases

 gnomAD: Genome Aggregation Database
 ExAC: Exome Aggregation Consortium
 1000 Genomes Project
 Institution-internal controls (a)
 EVS: Exome Variant Server
 TOPMed: Trans-Omics for Precision Medicine
 UK10K
 Greater Middle East (GME) Variome Project
 xKJPN: 1000+ Japanese
 GenomeAsia 100K Project
 Iranome

Human structural variant (SV) databases

 gnomAD-SV: Genome Aggregation Database SVs
 DGV: Database of Genomic Variants
 dbVar: Database of Genomic Structural Variation
 ClinGen: Clinical Genome Resource
 DECIPHER
 Institution-internal controls (a)

Within-human selective constraint scores

 pLI: probability of loss-of-function (LoF) intolerance
 Missense (constraint) Z score
 pREC: probability of homozygote LoF intolerance
 (sub)RVIS: Residual Variation Intolerance Score
 L-o/e-UF: LoF observed/expected upper-bound fraction
 CCR: constrained coding regions
 LIMBR: Localized Intolerance Model w/ Bayesian Regression
 MTR: missense tolerance ratio
 s_het: selective effect of heterozygous LoF
 M-o/e-UF: missense observed/expected upper-bound fraction
 LoFtool

Legend: tool used by default; tool used in specific cases or contexts only (b).

  1. Knowledge of variation within human populations with and without disease can be effectively used to assess the likelihood of a variant to cause the genetic condition under investigation. Tool and data set citations are listed in Extended Data Table 1.
  2. (a) Human sequence variation data sets that are internal to particular institutions and used by clinical sites surveyed here include variants present in patients from Baylor College of Medicine (BCM), the Institute for Genomic Medicine (Duke/Columbia), Brigham Genomic Medicine (Harvard), the NIH Undiagnosed Diseases Program (NIH), Centers for Mendelian Genomics (PacificNW), University of California–Los Angeles (UCLA), the Centre d’Etude du Polymorphisme Humain (Utah), and BioVu (Vanderbilt), and a curated set of copy-number variants (CNVs) detected via genome sequencing (GS) and confirmed via chromosomal microarray analysis (Washington University School of Medicine [WUSTL]).
  3. (b) Specific human population variant data sets are used in the following contexts: for historical reasons (ExAC); when a variant’s gnomAD-derived minor allele frequency (MAF) is 0 or close to 0 (TOPMed); when a patient’s inferred ancestry is non-European (TOPMed), Middle Eastern (GME), Japanese (xKJPN), Asian (GenomeAsia), and/or Iranian (Iranome); and when a predicted structural variant impacts a clinically relevant gene (gnomAD-SV, DGV, ClinGen, DECIPHER).
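
The context-dependent use of data sets described in the last note can be sketched as a small rule set. This is a minimal illustration, not any site's actual pipeline: the `VariantContext` fields, the ancestry labels, the `supplementary_datasets` function, and the `near_zero_maf` cutoff are all hypothetical, and the "historical reasons" case for ExAC is not modeled because it does not depend on the variant or the patient.

```python
from dataclasses import dataclass

# Ancestry labels are illustrative assumptions; the data set names come from the table.
ANCESTRY_DATASETS = {
    "middle_eastern": "GME",
    "japanese": "xKJPN",
    "asian": "GenomeAsia",
    "iranian": "Iranome",
}
SV_DATASETS = ["gnomAD-SV", "DGV", "ClinGen", "DECIPHER"]


@dataclass
class VariantContext:
    gnomad_maf: float                 # gnomAD-derived minor allele frequency
    inferred_ancestries: frozenset    # e.g. frozenset({"japanese", "asian"})
    is_structural: bool = False
    hits_clinically_relevant_gene: bool = False


def supplementary_datasets(ctx, near_zero_maf=1e-5):
    """Return the context-specific data sets to consult beyond the defaults.

    near_zero_maf is an assumed cutoff for "0 or close to 0"; the source
    does not give a numeric threshold.
    """
    selected = []
    non_european = bool(ctx.inferred_ancestries - {"european"})
    # TOPMed: gnomAD MAF is 0 or near 0, or inferred ancestry is non-European.
    if ctx.gnomad_maf <= near_zero_maf or non_european:
        selected.append("TOPMed")
    # Ancestry-specific resources; a patient may match more than one ("and/or").
    for label, dataset in ANCESTRY_DATASETS.items():
        if label in ctx.inferred_ancestries:
            selected.append(dataset)
    # SV resources: only when a predicted SV impacts a clinically relevant gene.
    if ctx.is_structural and ctx.hits_clinically_relevant_gene:
        selected.extend(SV_DATASETS)
    return selected
```

For example, a variant absent from gnomAD in a patient of inferred European ancestry would trigger only the TOPMed lookup under these assumptions, while a common variant in the same patient would trigger none.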