Table 5. Non-reference concordance rate after applying each variant-level hard filter in the QC pipeline in succession, for biallelic and triallelic variants.

From: Empirical design of a variant quality control pipeline for whole genome sequencing data using replicate discordance

Concordance Rate of Passing Sites (%) is given in the four rightmost columns.

| Variant Filter | Site Removal Criterion | All Biallelic | Biallelic SNVs | Biallelic Indels | All Triallelic |
|---|---|---|---|---|---|
| | Monomorphic | 98.532 | 98.690 | 96.887 | 84.155 |
| 1 | Missingness ≥ 5% | 98.533 | 98.690 | 96.887 | 84.155 |
| 2 | Within blacklisted region or LCR | 98.533 | 98.690 | 96.887 | 84.155 |
| 3 | DP < 25,000 | 98.798 | 98.904 | 97.673 | 87.570 |
| 4 | MQ < 58.75 or MQ > 61.25 | 99.401 | 99.482 | 98.536 | 92.704 |
| 5 | InbreedingCoeff < −0.8 | 99.404 | 99.486 | 98.529 | 92.671 |
| 6 | VQSLOD < 7.81 | 99.694 | 99.810 | 98.529 | 94.358 |

  1. These values were calculated following removal of non-‘PASS’ sites according to GATK HaplotypeCaller. A pair of genotypes is concordant when the genotypes of a duplicate pair are identical. The change in concordance rate was always positive. Prior to QC, 98.532% of the 30,137,375 replicate non-reference genotypes at genome-wide biallelic sites were concordant; following QC, 99.694% of the 25,180,411 remaining non-reference genotypes were concordant. Prior to QC, 84.155% of the 2,604,018 replicate genotypes at genome-wide triallelic sites were concordant; following QC, 94.358% of the 1,522,106 remaining genotypes were concordant.
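The concordance definition in the note above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the table does not confirm: genotypes are represented as unphased pairs of allele indices (0 is the reference allele), a duplicate pair counts as non-reference when at least one of its two genotypes carries a non-reference allele, and pairs with a missing call are skipped. The function names and the toy data are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# A genotype is a pair of allele indices (0 = reference); None marks a missing call.
Genotype = Optional[Tuple[int, int]]


def is_non_reference(gt: Genotype) -> bool:
    """Return True when the genotype carries at least one non-reference allele."""
    return gt is not None and any(allele != 0 for allele in gt)


def non_reference_concordance(pairs: Iterable[Tuple[Genotype, Genotype]]) -> float:
    """Percentage of non-reference duplicate-pair genotypes that are identical."""
    total = 0
    concordant = 0
    for gt_a, gt_b in pairs:
        # Assumption: pairs with a missing call in either replicate are skipped.
        if gt_a is None or gt_b is None:
            continue
        # Assumption: a pair is counted when either replicate is non-reference.
        if not (is_non_reference(gt_a) or is_non_reference(gt_b)):
            continue
        total += 1
        # Genotypes are treated as unphased, so allele order is ignored.
        if sorted(gt_a) == sorted(gt_b):
            concordant += 1
    if total == 0:
        raise ValueError("no non-reference genotype pairs to compare")
    return 100.0 * concordant / total


# Hypothetical toy data: one entry per site for a single duplicate pair.
toy_pairs = [
    ((0, 1), (0, 1)),  # concordant heterozygous call
    ((1, 1), (1, 1)),  # concordant homozygous alternate call
    ((0, 1), (1, 0)),  # concordant once allele order is ignored
    ((0, 1), (0, 0)),  # discordant
    ((0, 0), (0, 0)),  # excluded: both replicates homozygous reference
    (None, (0, 1)),    # excluded: missing call
]

rate = non_reference_concordance(toy_pairs)
print(f"{rate:.3f}")
```

To reproduce a column of the table under these assumptions, the same function would be applied to the genotype pairs at the sites that remain after each filter is applied in turn.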