Table 2 Quality control (QC) checks of variants for rare disease diagnosis.
- QC checks of variant data fall into three main categories, listed in bold above. Although some tools can perform many of these steps, we show here the QC steps for which each tool is used in practice. Clarifications for some of the QC tools and steps are given in footnotes a–e. Tool citations are listed in Extended Data Table 1.
- ES, exome sequencing; GS, genome sequencing; SNV, single-nucleotide variant.
- (a) BCFtools refers to the Wellcome Trust Sanger Institute’s suite of tools: BCFtools, VCFtools, SAMtools, and HTSlib.
- (b) These tools either call de novo variants from sequencing reads to reduce false-positive calls or provide de novo frequencies, where a high frequency indicates a likely false positive.
- (c) The expected transition (Ts) to transversion (Tv) ratios assume variants are called with respect to the human reference sequence; if variants are called with respect to computed ancestral alleles, the expected Ts/Tv ratio for ES should be ~1.
- (d) Expected relatedness between family members is estimated using a “kinship coefficient”; unexpectedly low kinship implies a family member is not as related as was originally assumed, unexpectedly high kinship suggests consanguinity, and maximal kinship implies an accidental sample duplication.
- (e) Mosaicism (where an individual contains a mix of genetically distinct cells) may be relevant for disease rather than only indicative of sequencing errors.
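
As an illustration of the Ts/Tv and kinship checks described in the footnotes, the following is a minimal Python sketch, not the method of any tool in the table. The function names `ts_tv_ratio` and `flag_kinship`, the `tolerance` value, and the `duplicate_cutoff` value are assumptions chosen for the example; real pipelines would take thresholds from the relevant tool's documentation. The only fixed facts used are that transitions are A<->G and C<->T substitutions, and that the kinship coefficient is about 0.25 for first-degree relatives and reaches its maximum of 0.5 for identical genomes.

```python
# Illustrative sketch only: names and thresholds are assumptions, not taken from Table 2.

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """Return the Ts/Tv ratio for an iterable of (ref, alt) allele pairs."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, multi-base records, and non-variant records.
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        # Skip ambiguous bases such as N.
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; Ts/Tv is undefined")
    return ts / tv


def flag_kinship(observed, expected, tolerance=0.05, duplicate_cutoff=0.45):
    """Compare an observed kinship coefficient with the pedigree expectation.

    `tolerance` and `duplicate_cutoff` are hypothetical values for illustration.
    """
    if observed >= duplicate_cutoff:
        # Near the maximum of 0.5: same genome sampled twice.
        return "possible sample duplication"
    if observed < expected - tolerance:
        return "less related than expected"
    if observed > expected + tolerance:
        return "more related than expected (possible consanguinity)"
    return "consistent with pedigree"


if __name__ == "__main__":
    calls = [("A", "G"), ("C", "T"), ("A", "C"), ("AT", "A")]
    print(ts_tv_ratio(calls))
    print(flag_kinship(observed=0.24, expected=0.25))
```

Note that a near-maximal kinship coefficient cannot by itself distinguish a duplicated sample from monozygotic twins, so a flag from a check like this would still need to be reviewed against the pedigree.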
