Figure 3 | Modern Pathology

Figure 3

From: Targeted next generation sequencing of clinically significant gene mutations and translocations in leukemia

Data analysis pipeline. FASTQ files containing sequence and quality scores were output from the Illumina HiSeq and aligned to the human reference genome (build 37/hg19) using either BWA or Novoalign on a server cluster. The aligned data were then stored as a sorted BAM file and analyzed for SNVs, indels, and translocations. SNVs were called using the Unified Genotyper function of the GATK package. SNVs were further filtered by flagging known polymorphisms in dbSNP (build 130) and by removing SNVs occurring in non-coding regions. Small and medium-sized indels (<100 bp) were identified using the Pindel and GATK Indel Genotyper V2 software packages with default parameters. Indels occurring outside of coding regions or splice sites were ignored. Translocations were identified by first running Breakdancer to identify paired-end reads in which one end mapped to a gene in the capture region and the other did not. As this methodology is subject to considerable noise, largely because of sequence repeats and areas of homology, we then performed a second level of verification using Slope to find chimeric single-end reads within the regions identified by Breakdancer. Finally, results from all three branches of the analysis pipeline were merged into a single Variant Call Format (VCF) file.
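
The filtering and merging logic described in the caption can be illustrated with a minimal Python sketch. This is not the authors' code and does not call BWA, GATK, Pindel, Breakdancer, or Slope; it assumes each tool's calls have already been parsed into simple records. The `Variant` class, the interval format, and all function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Set, Tuple

# Hypothetical record for one call; not a real VCF parser.
@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int                      # 1-based position
    ref: str
    alt: str
    kind: str                     # "SNV", "INDEL", or "TRANSLOCATION"
    known_polymorphism: bool = False

# Assumed interval format: (chrom, start, end), 1-based and inclusive.
Interval = Tuple[str, int, int]


def in_regions(v: Variant, regions: Iterable[Interval]) -> bool:
    """Return True if the variant position lies in any interval."""
    return any(c == v.chrom and s <= v.pos <= e for c, s, e in regions)


def filter_snvs(snvs: Iterable[Variant],
                dbsnp_sites: Set[Tuple[str, int, str, str]],
                coding: List[Interval]) -> List[Variant]:
    """Drop non-coding SNVs and flag (not remove) known polymorphisms."""
    kept = []
    for v in snvs:
        if not in_regions(v, coding):
            continue
        known = (v.chrom, v.pos, v.ref, v.alt) in dbsnp_sites
        kept.append(replace(v, known_polymorphism=known))
    return kept


def filter_indels(indels: Iterable[Variant],
                  coding_or_splice: List[Interval]) -> List[Variant]:
    """Ignore indels outside coding regions or splice sites."""
    return [v for v in indels if in_regions(v, coding_or_splice)]


def confirm_translocations(candidates: Iterable[Variant],
                           chimeric_regions: List[Interval]) -> List[Variant]:
    """Keep paired-end candidates that a second pass also supports.

    `chimeric_regions` stands in for regions where chimeric single-end
    reads were found during the second level of verification.
    """
    return [v for v in candidates if in_regions(v, chimeric_regions)]


def merge_calls(*branches: Iterable[Variant]) -> List[Variant]:
    """Merge all branches into one list, sorted by chromosome name
    (plain string order) and position."""
    merged = [v for branch in branches for v in branch]
    return sorted(merged, key=lambda v: (v.chrom, v.pos))
```

In this sketch, known dbSNP polymorphisms are flagged rather than discarded, mirroring the caption's wording, while non-coding SNVs and indels are removed outright. A real pipeline would write the merged records to a VCF file with a dedicated library instead of keeping them in memory.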
