Table 1 Outcome of the hard filters used in the QC pipeline at the variant, genotype, and sample levels, for ClinVar-indexed biallelic sites only.

From: Empirical design of a variant quality control pipeline for whole genome sequencing data using replicate discordance

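Variant Level

| # | Site Removal Criterion | Sequential Filtering, # Pass (% Pass), Variants | Independent Filtering, # Pass (% Pass), Variants |
|---|---|---|---|
|   | Monomorphic | 38,402 (100) | 38,402 (100) |
| 1 | Missingness ≥ 5% | 38,359 (99.89) | 38,776 (99.79) |
| 2 | Blacklisted region or low-complexity region (LCR) | 38,359 (100) | 38,402 (100) |
| 3 | DP < 25,000 | 37,771 (98.47) | 38,098 (98.05) |
| 4 | MQ < 58.75 or MQ > 61.25 | 37,025 (98.02) | 37,696 (97.01) |
| 5 | VQSLOD < 7.81 | 36,415 (98.35) | 37,080 (95.43) |
| 6 | InbreedingCoeff < -0.8 | 35,751 (98.18) | 38,102 (98.06) |

Genotype Level

| # | Genotype Removal Criterion | Sequential Filtering, # Pass (% Pass), Genotypes | Independent Filtering, # Pass (% Pass), Genotypes |
|---|---|---|---|
| 7 | DP < 10 | 9,253,660 (99.94) | 10,037,482 (99.74) |
| 8 | GQ < 20 | 8,722,641 (94.26) | 9,435,150 (93.75) |

Sample Level

| # | Sample Removal Criterion | Sequential Filtering, # Pass (% Pass), Samples | Independent Filtering, # Pass (% Pass), Samples |
|---|---|---|---|
| 9 | Missingness ≥ 10% | 259 (100) | 259 (100) |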

1. The Sequential Filtering column gives the number and percentage of variants, genotypes, and samples remaining after each filter is applied in turn, following all preceding filters; each percentage is relative to the count remaining after the previous filter. The Independent Filtering column gives the outcome of applying each filter on its own to the full ClinVar-indexed dataset (38,402 biallelic variants), indicating each filter's absolute removal rate (100% minus the % pass). Of 17,585,919 biallelic sites genome-wide, 38,402 matched ClinVar version 2019-01-02, which contains 416,908 variants.
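The difference between the two columns can be made concrete with a small sketch. It encodes only the variant-level thresholds from the table as removal predicates and applies them either in sequence (each percentage relative to the sites entering that step) or one at a time to the full set (each percentage relative to all sites). The site record layout (a dict with keys `alt_count`, `missing_rate`, `in_blacklist_or_lcr`, `DP`, `MQ`, `VQSLOD`, `InbreedingCoeff`), the reading of "monomorphic" as `alt_count == 0`, and the helper names `sequential`, `independent`, `step_percentages`, and `make_site` are all hypothetical illustrations, not the authors' pipeline code.

```python
# Hypothetical sketch: sequential vs independent hard filtering of sites.
# Thresholds come from the variant-level rows of Table 1; the site record
# layout (a dict with the keys below) is an assumption for illustration only.
from typing import Any, Callable, Dict, List, Tuple

Site = Dict[str, Any]
Row = Tuple[str, int, float]

# Each predicate returns True when a site should be REMOVED.
FILTERS: List[Tuple[str, Callable[[Site], bool]]] = [
    ("Monomorphic", lambda s: s["alt_count"] == 0),  # assumed definition
    ("Missingness >= 5%", lambda s: s["missing_rate"] >= 0.05),
    ("Blacklisted region or LCR", lambda s: s["in_blacklist_or_lcr"]),
    ("DP < 25,000", lambda s: s["DP"] < 25_000),
    ("MQ < 58.75 or MQ > 61.25", lambda s: s["MQ"] < 58.75 or s["MQ"] > 61.25),
    ("VQSLOD < 7.81", lambda s: s["VQSLOD"] < 7.81),
    ("InbreedingCoeff < -0.8", lambda s: s["InbreedingCoeff"] < -0.8),
]


def sequential(sites: List[Site]) -> List[Row]:
    """Apply filters in order; % pass is relative to the sites entering each step."""
    rows: List[Row] = []
    remaining = list(sites)
    for name, fails in FILTERS:
        entering = len(remaining)
        remaining = [s for s in remaining if not fails(s)]
        pct = 100.0 * len(remaining) / entering if entering else 0.0
        rows.append((name, len(remaining), round(pct, 2)))
    return rows


def independent(sites: List[Site]) -> List[Row]:
    """Apply each filter alone to the full set; % pass is relative to all sites."""
    total = len(sites)
    rows: List[Row] = []
    for name, fails in FILTERS:
        kept = sum(1 for s in sites if not fails(s))
        pct = 100.0 * kept / total if total else 0.0
        rows.append((name, kept, round(pct, 2)))
    return rows


def step_percentages(counts: List[int]) -> List[float]:
    """% pass at each sequential step, given the count entering step 1
    followed by the count remaining after each step."""
    return [round(100.0 * after / before, 2) for before, after in zip(counts, counts[1:])]


# Toy data: sites that differ from a passing default only in the overridden fields.
def make_site(**overrides: Any) -> Site:
    site: Site = {"alt_count": 5, "missing_rate": 0.0, "in_blacklist_or_lcr": False,
                  "DP": 30_000, "MQ": 60.0, "VQSLOD": 10.0, "InbreedingCoeff": 0.0}
    site.update(overrides)
    return site


toy = [make_site(), make_site(DP=20_000), make_site(DP=20_000, MQ=50.0), make_site(VQSLOD=1.0)]
for row in sequential(toy):
    print("sequential ", row)
for row in independent(toy):
    print("independent", row)
```

In the toy data the third site fails both the DP and MQ filters. Sequentially it is already gone when the MQ filter runs, so that step removes nothing, whereas the independent MQ filter still counts it; this is why the two columns can differ even for the same threshold. Applied to the table's sequential variant counts (38,402 entering, then 38,402, 38,359, 38,359, 37,771, 37,025, 36,415, 35,751), `step_percentages` returns `[100.0, 99.89, 100.0, 98.47, 98.02, 98.35, 98.18]`, matching the Sequential Filtering column.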