Table 1 Computational steps, dependency conditions and their execution time in the NGS workflow.

From: Accelerating next generation sequencing data analysis with system level optimizations

| Step ID | Job step name | Application name | Application module | Input file name | Application parameters | Output file name | Recommended no. of cores | Job dependency condition | % of execution time |
|---|---|---|---|---|---|---|---|---|---|
| S1 | Map to Reference | BWA kit | seqtk, trimadap, SAMtools, bwa mem, samblaster | `*.fastq.gz` | Default | `*.bam` | N/M | None | 6.5% |
| S2 | Build a standard BAM index | sambamba | index | `*.bam` | Default | `*.bam.bai` | 1 | S1 | 0.5% |
| S3 | Realigner Target Creator | GATK | Target creator | `*.aln.bam` | `-T RealignerTargetCreator -R hs37d5.fa -known Mills_and_1000G_gold_standard.indels.vcf.gz` | `*.realigner.intervals` | 4 or 8 | S2 | 3% |
| S4 | Indel Realigner | GATK | INDEL | `*.aln.bam`, `*.realigner.intervals` | `-T IndelRealigner -R hs37d5.fa -known Mills_and_1000G_gold_standard.indels.vcf.gz -knownIntervals` | `*.realigned.bam` | 1 | S3 | 2% |
| S5 | Base Recalibrator | GATK | Base recalibration | `*.realigned.bam` | `-T BaseRecalibrator -R hs37d5.fa -knownSites dbsnp_138.vcf.gz` | `*.recal.table` | N/M | S4 | 13% |
| S6 | Print Reads | GATK | Analyse the reads | `*.realigned.bam`, `*.recal.table` | `-T PrintReads -R hs37d5.fa -BQSR` | `*.realigned.recal.bam` | 2 or 4 | S5 | 25% |
| S7 | Haplotype Caller | GATK | Haplotype | `*.realigned.recal.bam` | `-T HaplotypeCaller -R hs37d5.fa -pairHMM VECTOR_LOGLESS_CACHING --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 --dbsnp Mills_and_1000G_gold_standard.indels.vcf.gz` | `*.raw.snps.indels.g.vcf` | 4 or 8 | S6 | 43% |
| S8 | Variant Recalibrator | GATK | Variant recalibration | `*.realigned.bam`, `*.recal.table` | `-T BaseRecalibrator -R hs37d5.fa -known Mills_and_1000G_gold_standard.indels.vcf.gz -BQSR` | `*.after_recal.table` | N | S5 | 6% |
| S9 | Analyze Covariates | GATK | Analyse the variant | `*.recal.table`, `*.after_recal.table` | `-T AnalyzeCovariates -before -after` | `*.recal_plots.pdf` | 1 | S8 | 1% |

1. N is the total number of cores and M is the number of CPUs.
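The dependency column defines a small directed acyclic graph. S1 through S5 form a single chain, after which the workflow forks into two branches: S6 followed by S7, and S8 followed by S9. The sketch below is a minimal Python illustration (3.9+, standard-library `graphlib`), not the authors' scheduler. It encodes the table, derives a dependency-respecting execution order, and finds the longest chain weighted by the "% of execution time" column. The helper `resolve_cores` is hypothetical. It reads N/M as "cores per CPU", which is an inference from the footnote. Its rule of taking the largest listed option that fits the machine is likewise an assumed policy.

```python
from graphlib import TopologicalSorter

# Step ID -> (job step name, dependency, recommended cores, % of execution time),
# transcribed from Table 1.
STEPS = {
    "S1": ("Map to Reference", None, "N/M", 6.5),
    "S2": ("Build a standard BAM index", "S1", "1", 0.5),
    "S3": ("Realigner Target Creator", "S2", "4 or 8", 3.0),
    "S4": ("Indel Realigner", "S3", "1", 2.0),
    "S5": ("Base Recalibrator", "S4", "N/M", 13.0),
    "S6": ("Print Reads", "S5", "2 or 4", 25.0),
    "S7": ("Haplotype Caller", "S6", "4 or 8", 43.0),
    "S8": ("Variant Recalibrator", "S5", "N", 6.0),
    "S9": ("Analyze Covariates", "S8", "1", 1.0),
}


def execution_order(steps):
    """Return step IDs in an order in which every step follows its dependency."""
    graph = {sid: ({dep} if dep else set()) for sid, (_, dep, _, _) in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def critical_path(steps):
    """Longest dependency chain, weighted by each step's % of execution time."""
    finish = {}
    for sid in execution_order(steps):
        _, dep, _, pct = steps[sid]
        # Cumulative share of serial time at the end of this step's chain.
        finish[sid] = pct + (finish[dep] if dep else 0.0)
    end = max(finish, key=finish.get)
    path, node = [], end
    while node is not None:  # walk back along the dependency links
        path.append(node)
        node = steps[node][1]
    return path[::-1], finish[end]


def resolve_cores(spec, n_cores, n_cpus):
    """Hypothetical helper: map a 'Recommended no. of cores' entry to a thread count."""
    if spec == "N":
        return n_cores
    if spec == "N/M":
        return max(1, n_cores // n_cpus)  # assumed meaning: cores per CPU
    if " or " in spec:
        options = [int(x) for x in spec.split(" or ")]
        # Assumed policy: largest listed option that fits, else all available cores.
        return max((o for o in options if o <= n_cores), default=n_cores)
    return int(spec)


if __name__ == "__main__":
    path, share = critical_path(STEPS)
    print(" -> ".join(path), f"({share}% of serial time)")
    for sid in execution_order(STEPS):
        name, _, spec, _ = STEPS[sid]
        print(sid, name, resolve_cores(spec, n_cores=32, n_cpus=2))
```

With the table's figures, the longest chain is S1 through S7, at 93% of the serial execution time. The S8 and S9 branch accounts for the remaining 7%. Running that branch alongside S6 and S7 could therefore save at most about 7% of the serial runtime, assuming the two branches do not compete for the same cores.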