Table 2 Performance comparison of assembly on Nanopore and PacBio HiFi datasets

From: De novo diploid genome assembly using long noisy reads

| Dataset | Pipeline | Size (Mb) | NG50 (Mb) | Quality (reference-based) | Quality (k-mer-based) | BUSCO (%) | Hamming error (%) | Phase block NG50 (Mb) | Intra-block switch error (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A. thaliana Col-0 × C24, ONT, 106X | Ref | 133.3/119.2 | 26.2/23.8 | –/– | 44.7/44.0 | 99.3/99.1 | 0.14/0.01 | 12.7/23.8 | 0.15/0.02 |
| | Canu + Purge_dups (pri/alt) | 125.9/68.3 | 8.1/0.0 | 22.0/23.5 | 29.8/28.7 | 96.6/37.4 | 37.95/7.02 | 0.1/0.0 | 8.33/5.71 |
| | Flye + HapDup (dual) | 136.4/136.5 | 2.1/2.1 | 25.6/25.0 | 28.5/28.3 | 97.8/97.8 | 11.85/13.62 | 0.5/0.5 | 0.66/0.67 |
| | Shasta (pri/alt) | 126.3/105.0 | 0.8/0.5 | 22.4/23.8 | 28.9/30.6 | 94.9/92.2 | 11.75/7.96 | 0.3/0.3 | 7.11/5.39 |
| | PECAT (pri/alt) | 131.2/123.9 | 14.3/7.7 | 30.2/30.6 | 31.4/32.1 | 98.5/98.2 | 3.18/0.32 | 9.1/7.1 | 0.35/0.36 |
| | PECAT (dual) | 131.1/125.1 | 14.3/14.0 | 30.4/30.7 | 31.6/32.2 | 98.6/98.4 | 1.07/0.38 | 7.8/7.9 | 0.37/0.37 |
| B. taurus Bison × Simmental, ONT, UL, 200X | Ref | 2651.6/2861.7 | 87.8/104.4 | –/– | 38.7/36.1 | 93.1/95.7 | 0.67/0.48 | 87.8/104.4 | 0.74/0.61 |
| | Flye + HapDup (dual) | 2713.0/2713.2 | 33.4/33.4 | 25.0/25.0 | 28.5/28.4 | 83.4/83.3 | 6.52/6.57 | 8.2/8.1 | 0.44/0.44 |
| | Shasta (pri/alt) | 2815.7/2406.9 | 0.3/0.2 | 15.1/15.1 | 22.9/24.0 | 61.3/59.1 | 22.05/25.65 | 0.1/0.1 | 23.26/25.79 |
| | PECAT (pri/alt) | 2968.2/2770.0 | 94.3/93.8 | 25.6/25.6 | 28.8/28.5 | 83.6/83.1 | 0.39/0.38 | 79.6/86.1 | 0.37/0.38 |
| | PECAT (dual) | 2970.2/2856.3 | 94.3/93.8 | 25.5/25.6 | 28.8/28.8 | 83.3/83.4 | 0.39/0.38 | 79.5/86.1 | 0.37/0.39 |
| HG002, ONT, R9, UL, 59X | Ref | 2959.3/3061.7 | 146.7/154.4 | –/– | 58.6/59.4 | 92.8/95.8 | 0.15/0.08 | 90.4/106.7 | 0.02/0.03 |
| | Flye + HapDup (dual) | 2932.2/2932.4 | 49.9/49.9 | 33.2/33.2 | 40.9/40.8 | 94.2/94.1 | 7.91/8.24 | 20.5/19.6 | 0.08/0.09 |
| | Shasta (pri/alt) | 3007.0/2443.9 | 24.2/2.9 | 28.8/30.1 | 34.7/38.5 | 92.4/79.6 | 18.93/11.42 | 0.8/0.5 | 8.66/5.36 |
| | PECAT (pri/alt) | 3059.3/2857.7 | 92.9/15.0 | 30.8/30.8 | 41.8/41.6 | 94.7/91.0 | 15.72/1.48 | 22.2/13.0 | 0.08/0.11 |
| | PECAT (dual) | 3057.2/2927.2 | 92.7/74.5 | 30.8/30.8 | 41.8/41.8 | 94.6/91.6 | 9.67/10.98 | 30.8/23.6 | 0.08/0.11 |
| HG002, ONT, R10, UL, 116X | Flye + HapDup (dual) | 2954.7/2952.1 | 61.9/59.3 | 36.4/36.2 | 48.9/48.2 | 95.8/95.7 | 3.13/3.02 | 46.5/39.1 | 0.02/0.03 |
| | Shasta (pri/alt) | 3095.4/2729.9 | 45.3/33.8 | 34.3/34.4 | 42.1/48.7 | 95.7/92.2 | 3.05/0.86 | 17.2/15.4 | 0.30/0.44 |
| | PECAT (pri/alt) | 3159.7/2895.6 | 91.4/59.9 | 38.0/38.0 | 49.0/50.1 | 95.8/92.4 | 3.75/0.30 | 59.4/58.0 | 0.04/0.05 |
| | PECAT (dual) | 3153.4/2917.8 | 91.4/80.2 | 38.0/38.0 | 49.2/50.0 | 95.8/92.7 | 2.99/1.49 | 63.8/59.2 | 0.04/0.06 |
| HG002, HiFi, 36X | Hifiasm (pri/alt) * | 3112.2/2910.3 | 89.9/0.4 | 50.0/50.0 | 55.3/56.5 | 95.6/77.0 | 24.38/1.67 | 1.1/0.3 | 0.11/0.01 |
| | Hifiasm (dual) * | 3015.3/3077.5 | 44.8/64.5 | 50.0/50.0 | 54.9/57.8 | 95.5/94.9 | 34.80/24.10 | 1.0/1.0 | 0.16/0.11 |
| | Hifiasm (Hi-C) * | 3075.0/2908.7 | 55.1/55.1 | 50.0/50.0 | 54.9/57.9 | 95.7/92.5 | 0.45/0.25 | 20.6/20.4 | 0.09/0.06 |

  1. ‘Size’ is the total number of base pairs in all contigs generated by the assembler. ‘NG50’ is the length of the shortest contig such that contigs of that length or longer cover at least 50% of the genome size. The genome sizes of A. thaliana, B. taurus, and HG002 used for evaluation are 130 Mb, 2.7 Gb, and 3 Gb, respectively. ‘BUSCO’ is gene completeness as evaluated by BUSCO. ‘Hamming error’ is the fraction of nondominant parental-specific k-mers in a contig. ‘Quality (reference-based)’ is the ‘q50’ metric evaluated by Pomoxis. ‘Quality (k-mer-based)’, ‘Phase block NG50’, and ‘Intra-block switch error’ are evaluated by Merqury. ‘pri/alt’ denotes the primary/alternate assembly format, and ‘dual’ denotes the dual assembly format. ‘ONT’ indicates that the dataset is composed of Nanopore reads. ‘UL’ indicates that the reads are ultra-long. ‘HiFi’ indicates that the dataset is composed of PacBio HiFi reads. ‘Hi-C’ indicates that the assembly additionally uses Hi-C reads. Each cell reports the two sets of contigs separately, divided by a slash. ‘Ref’ is the reference genome. The sources of the reference genomes are listed in Supplementary Table 17. For B. taurus and HG002, Canu did not finish the assembly within 3 weeks, so it is excluded. Asterisks mark previously published assemblies.
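The NG50 and Hamming error definitions in the footnote can be illustrated with a minimal Python sketch. It is not the evaluation code used for the table: the function names `ng50` and `hamming_error` and their parameters are hypothetical, the Hamming error is computed for a single contig from two assumed parental-specific k-mer counts, and how per-contig values are aggregated into one figure is not stated here.

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the length of the shortest contig such that contigs of that
    length or longer cover at least 50% of genome_size.

    Returns None when the contigs together cover less than half of the
    genome size (the metric is then undefined in this sketch).
    """
    target = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


def hamming_error(parent_a_kmers: int, parent_b_kmers: int) -> float:
    """Fraction of nondominant parental-specific k-mers in one contig.

    parent_a_kmers and parent_b_kmers are hypothetical inputs: the counts
    of k-mers specific to each parent that were found in the contig.
    """
    total = parent_a_kmers + parent_b_kmers
    if total == 0:
        raise ValueError("contig contains no parental-specific k-mers")
    # The less frequent parent is the nondominant one.
    return min(parent_a_kmers, parent_b_kmers) / total


if __name__ == "__main__":
    # Toy contig lengths, not values taken from the table.
    print(ng50([50, 40, 30, 20, 10], genome_size=150))
    print(hamming_error(90, 10))
```

Unlike N50, NG50 is normalised by a fixed genome size rather than by the assembly size, so assemblies of different total length for the same genome remain comparable.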