Table 2 Summary of the tools, computing resources, and run times for each pipeline step.

From: Design and implementation of a hybrid cloud system for large-scale human genomic research

| Pipeline step | Operation | Application | Location | Total job run time (hours) | Mean job run time (min) | Median job run time (min) | Total jobs (job count) | Requested resource per job (# marks the reason a location was not used) |
|---|---|---|---|---|---|---|---|---|
| Step 1-1 | Alignment | bwa ver. 0.7.17; reference hs38DH.fa (hs38, ALT contigs, decoy contigs, and HLA genes) | System A | 24735.4* | 652.9 | 548.9 | 2273 | 20 cores/memory 32 Gb |
| | | | System B | 1255.2 | 753.1 | 740.1 | 100 | 32 cores/memory 120 Gb |
| | | | System C | 95797.4* | 648.4 | 648.4 | 8865 | System C allows job assignment only per compute node; 56 cores/memory 192 Gb |
| Step 1-2 | Variant call | GATK ver. 4.1.4 HaplotypeCaller | System C | 181964.2* | 971.5 | 1122.6 | 11,238 | System C allows job assignment only per compute node; 56 cores/memory 192 Gb |
| Step 2 | Genomic DB import | GATK ver. 4.1.4 GenomicDBImport | System A | 56,202.6 | 1064.1 | 1044.0 | 3169 | Memory 18 Gb |
| Step 3 | Joint-Genotyping | GATK ver. 4.1.4 GenotypeGVCFs | System A | 63638.1* | 18,445.8 | 19,352.0 | 207 | Memory 16 Gb |
| | | | System B | – | – | – | – | # Most jobs could not complete within the maximum running time (2 days). |
| | | | System C | – | – | – | – | # Jobs could not be processed because of I/O access overload. |
| | | | System D | 56242.7* | 5709.9 | 5834.1 | 591 | Memory 16 Gb (chr1); memory 10 Gb (chr3,9) |
| | | | System E | 45630.9* | 1154.7 | 1377.2 | 2371 | Two cores/memory 16 Gb (r5large) |
| Step 4 | Variant quality score recalibration (VQSR) | GATK ver. 4.1.4 VariantRecalibrator and ApplyVQSR | System A | 872.0 | 5813.4 | 82.2 | 28 | VariantRecalibrator (INDEL/SNP) memory 32 Gb/288 Gb; ApplyVQSR memory 128 Gb |
| Step 5-1 | Annotation | SNPEff Ver.4.3i | System A | 10.9 | 25.2 | 25.5 | 26 | Memory 12 Gb |
| Step 5-2 | Annotation | VEP API Ver.106; DB Ver.105 | System A | 797.6 | 1840.6 | 1791.7 | 26 | Memory 16 Gb |

  1. The job count, run time (mean/median/total), and allocated computing resources are summarized for each analysis step.
  2. *Total job run time was estimated from the mean job run time in the logged jobs.
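
The estimate described in the second footnote can be illustrated with a short sketch. It assumes the simplest reading of that note, namely that the starred total (hours) is the mean job run time (minutes) multiplied by the job count and divided by 60; the table does not state the exact formula, so this is an interpretation rather than the authors' documented method. The function and variable names are hypothetical, and the figures are copied from the starred rows of the table.

```python
import math


def estimate_total_hours(mean_minutes: float, job_count: int) -> float:
    """Assumed formula: total hours = mean minutes per job * job count / 60."""
    return mean_minutes * job_count / 60.0


# (step, location) -> (reported total in hours, mean in minutes, job count),
# copied from the rows of Table 2 whose total is marked with an asterisk.
STARRED_ROWS = {
    ("Step 1-1", "System A"): (24735.4, 652.9, 2273),
    ("Step 1-1", "System C"): (95797.4, 648.4, 8865),
    ("Step 1-2", "System C"): (181964.2, 971.5, 11238),
    ("Step 3", "System A"): (63638.1, 18445.8, 207),
    ("Step 3", "System D"): (56242.7, 5709.9, 591),
    ("Step 3", "System E"): (45630.9, 1154.7, 2371),
}


def check_rows(rows, rel_tol=1e-4):
    """Return, per row, whether the assumed estimate matches the reported total."""
    return {
        key: math.isclose(estimate_total_hours(mean, count), total, rel_tol=rel_tol)
        for key, (total, mean, count) in rows.items()
    }


if __name__ == "__main__":
    for (step, system), (total, mean, count) in STARRED_ROWS.items():
        estimate = estimate_total_hours(mean, count)
        print(f"{step} {system}: reported {total:.1f} h, estimated {estimate:.1f} h")
```

Under this assumption the recomputed totals agree with the starred values to within a few hours, a gap that is plausibly explained by the published means being rounded to one decimal place.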