Figure 3 | Heredity

Figure 3

From: From next-generation resequencing reads to a high-quality variant data set

Alignment: quality assessment and postprocessing. Quality assessment: Examples: (a) A correctly aligned region (reads are shown as gray vertical bars with SNPs indicated as colored letters). (b) A spurious alignment where reads exhibit many small insertions (indicated as purple Is), deletions (shown as black horizontal lines) and SNPs. Local realignment: Examples: (c) Pre-realignment: suspicious-looking interval, covered by reads exhibiting both mismatching nucleotides and small indels at different positions, that would benefit from a local realignment. (d) Post-realignment: reads were locally realigned such that the number of mismatching bases is minimized across all reads. Duplicate removal: Examples: (e) Pre-removal of duplicates: duplicates (blue boxes) manifest themselves as high-coverage read support. (f) Post-removal of duplicates: no excess coverage from identical duplicates. Base quality score recalibration: Examples: Reported versus empirical quality scores (g) before and (h) after recalibration. Empirical quality scores were calculated by PHRED-scaling the observed rate of mismatches with the reference genome. Bases with a quality value of <5 (indicated in light blue) were ignored during the recalibration. Residual error for each of the 16 genomic dinucleotide contexts (for example, the AC context refers to a site in a read where the current nucleotide, a cytosine (C), is preceded by an adenine (A)) (i) before and (j) after recalibration. Residual error by machine cycle (with positive and negative cycle numbers given for the first and second read in a pair) (k) before and (l) after recalibration. Examples (a–f) were plotted using IGV (Robinson et al., 2011). Examples (g–l) were plotted using GATK (McKenna et al., 2010; DePristo et al., 2011; Van der Auwera et al., 2013).
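
The PHRED-scaling and dinucleotide-context definitions in the caption can be illustrated with a minimal Python sketch. It is not the GATK implementation: the function names are hypothetical, the mismatch rate is used without any smoothing, and the residual error is assumed here to be the empirical score minus the reported score.

```python
import math


def empirical_quality(mismatches: int, observations: int) -> float:
    """PHRED-scale an observed mismatch rate: Q = -10 * log10(rate).

    Hypothetical helper; the raw rate is used with no smoothing.
    """
    if observations <= 0:
        raise ValueError("observations must be positive")
    if mismatches == 0:
        # No observed mismatches: the raw PHRED score is unbounded.
        return math.inf
    return -10.0 * math.log10(mismatches / observations)


def residual_error(reported_q: float, mismatches: int, observations: int) -> float:
    """Assumed definition: empirical quality minus reported quality."""
    return empirical_quality(mismatches, observations) - reported_q


def dinucleotide_context(read: str, i: int) -> str:
    """Return the preceding base plus the current base at 0-based index i."""
    if i < 1 or i >= len(read):
        raise ValueError("position has no preceding base in this read")
    return read[i - 1 : i + 1]


# 1 mismatch in 1000 bases that were all reported as Q25.
q_emp = empirical_quality(1, 1000)
resid = residual_error(25.0, 1, 1000)
# Index 5 of GATTACA is a C preceded by an A.
ctx = dinucleotide_context("GATTACA", 5)
```

Under these assumptions, one mismatch per 1000 bases corresponds to an empirical quality of about 30, so bases reported as Q25 would show a residual error of about +5, and `ctx` is the `AC` context described in the caption.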
