Table 2 Comparison of assembly-based metrics (with % improvement in parentheses) for proovread, LoRDEC, CoLoRMap, and HECIL on five datasets: E. coli with downsampled short reads (D-SR) at 18x coverage (the lowest coverage), E. coli with the original short reads, E. coli (Sequel-sequenced), S. cerevisiae, and A. funestus (merged flowcells).

From: HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning

| Data | Evaluation Metric | Original | proovread | LoRDEC | CoLoRMap | HECIL (Iter 1) | HECIL (Iter 5) |
|---|---|---|---|---|---|---|---|
| E. coli (D-SR) | # Contigs | 182 | 29 (84.0) | 28 (84.6) | 24 (86.8) | 20 (89.0) | |
| | Largest contig | 69,266 | 567,484 (719.2) | 885,819 (1178.8) | 813,262 (1074.1) | 1,204,631 (1639.1) | |
| | Total length | 3,508,197 | 4,235,031 (20.7) | 4,068,085 (15.9) | 4,036,161 (15.0) | 4,596,013 (31.0) | |
| | N50 | 24,663 | 189,712 (669.2) | 179,638 (628.3) | 184,367 (647.5) | 232,826 (844.0) | |
| | NG50 | 17,847 | 212,621 (1091.3) | 190,621 (968.0) | 210,913 (1081.7) | 267,311 (1397.7) | |
| | Aligned base (%) - Ref/Query | 83/84 | 87/89 | 92/93 | 48/92 | 97/100 | |
| | Average Identity (1–1) - Ref/Query | 88/88 | 93/93 | 97/97 | 97/97 | 99/99 | |
| E. coli | # Contigs | 182 | 26 (85.7) | 24 (86.8) | 19 (89.5) | 19 (89.5) | 17 (90.6) |
| | Largest contig | 69,266 | 605,792 (774.5) | 920,903 (1229.5) | 1,089,140 (1472.4) | 1,223,474 (1666.3) | 1,481,824 (2039.3) |
| | Total length | 3,508,197 | 4,629,719 (31.9) | 4,623,137 (31.7) | 4,624,793 (31.8) | 4,838,971 (37.9) | 5,106,276 (45.5) |
| | N50 | 24,663 | 231,774 (839.7) | 226,456 (818.2) | 239,066 (869.3) | 256,830 (941.3) | 288,192 (1068.5) |
| | NG50 | 17,847 | 231,774 (1198.6) | 226,456 (1168.8) | 239,066 (1239.5) | 294,635 (1550.8) | 344,848 (1832.2) |
| | Aligned base (%) - Ref/Query | 82/87 | 92/92 | 98/98 | 54/94 | 99/99 | 99/99 |
| | Average Identity (1–1) - Ref/Query | 91/91 | 95/95 | 96/96 | 97/97 | 98/98 | 99/99 |
| E. coli (Sequel) | # Contigs | 84 | 34 (59.5) | 29 (65.4) | 29 (65.4) | 27 (67.8) | 24 (71.4) |
| | Largest contig | 88,975 | 775,707 (771.8) | 884,469 (894.0) | 1,363,678 (1432.6) | 1,627,011 (1728.6) | 1,865,932 (1997.1) |
| | Total length | 5,389,574 | 6,012,453 (11.5) | 5,821,596 (8.0) | 5,819,632 (7.9) | 6,374,798 (18.2) | 6,773,369 (25.6) |
| | N50 | 18,611 | 119,735 (543.3) | 117,028 (528.8) | 127,892 (587.1) | 141,213 (658.7) | 162,580 (773.5) |
| | NG50 | 13,903 | 116,255 (736.1) | 113,036 (713.0) | 118,087 (749.3) | 122,389 (780.3) | 149,637 (976.2) |
| | Aligned base (%) - Ref/Query | 78/80 | 89/89 | 95/95 | 67/92 | 97/97 | 98/98 |
| | Average Identity (1–1) - Ref/Query | 88/88 | 92/92 | 92/92 | 93/93 | 95/96 | 98/98 |
| S. cerevisiae | # Contigs | 26 | 32 (−23.0) | 28 (−7.6) | 24 (7.6) | 24 (7.6) | 23 (11.5) |
| | Largest contig | 1,543,990 | 1,537,979 (−0.3) | 1,552,711 (0.5) | 1,555,857 (0.7) | 1,558,190 (0.9) | 1,713,201 (10.9) |
| | Total length | 12,341,981 | 12,485,995 (1.1) | 12,497,078 (1.2) | 12,315,869 (−0.2) | 12,435,702 (0.7) | 12,731,203 (3.1) |
| | N50 | 777,602 | 777,713 (0.0) | 818,962 (5.3) | 932,935 (19.9) | 1,018,591 (30.9) | 1,308,313 (68.2) |
| | NG50 | 777,602 | 777,713 (0.0) | 818,962 (5.3) | 932,935 (19.9) | 1,538,190 (97.8) | 2,005,346 (157.8) |
| | Aligned base (%) - Ref/Query | 95/90 | 91/91 | 95/95 | 78/97 | 99/99 | 99/99 |
| | Average Identity (1–1) - Ref/Query | 92/92 | 93/93 | 97/97 | 98/98 | 99/99 | 99/99 |
| A. funestus | # Contigs | 998 | 712 (28.6) | 788 (21.0) | 847 (15.1) | 633 (36.5) | 543 (45.5) |
| | Largest contig | 71,070 | 36,306 (−48.9) | 75,298 (5.9) | 72,306 (1.7) | 84,490 (18.8) | 94,937 (33.5) |
| | Total length | 25,405,949 | 8,371,287 (−67.0) | 26,745,092 (5.2) | 26,802,126 (5.5) | 28,954,268 (13.9) | 32,371,298 (27.4) |
| | N50 | 13,038 | 14,802 (13.5) | 15,118 (15.9) | 14,555 (11.6) | 16,409 (25.8) | 19,014 (45.8) |
| | NG50 | 71,070 | 45,637 (−35.7) | 77,294 (8.7) | 76,306 (7.3) | 84,490 (18.8) | 91,303 (28.4) |
| | Aligned base (%) - Ref/Query | 20/87 | 23/93 | 27/96 | 20/95 | 31/99 | 37/99 |
| | Average Identity (1–1) - Ref/Query | 83/83 | 87/87 | 95/95 | 92/92 | 98/98 | 99/99 |

  1. For the case of HECIL, metrics are reported before and after using the iterative learning algorithm; specifically, iteration 1 (the core algorithm) and iteration 5 (with four rounds of learning) are shown.
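
The parenthesized percentages can be reproduced from the raw values. The sketch below is a reading of the table, not a method stated in the source: it assumes improvement is measured relative to the Original column, that a lower contig count counts as improvement while larger values count as improvement for the length-based metrics, and that the result is truncated (not rounded) to one decimal place. The function names `percent_improvement` and `truncate_one_decimal` are hypothetical.

```python
import math


def percent_improvement(original, corrected, lower_is_better=False):
    """Relative change of `corrected` against `original`, in percent.

    Assumption: for # Contigs a decrease is an improvement; for the
    length-based metrics (largest contig, total length, N50, NG50)
    an increase is an improvement.
    """
    change = (original - corrected) if lower_is_better else (corrected - original)
    return 100.0 * change / original


def truncate_one_decimal(value):
    """Keep one decimal place, truncating toward zero (an assumption)."""
    return math.trunc(value * 10) / 10


# E. coli row, HECIL (Iter 5) compared with the Original column.
contigs = truncate_one_decimal(percent_improvement(182, 17, lower_is_better=True))
largest = truncate_one_decimal(percent_improvement(69_266, 1_481_824))
n50 = truncate_one_decimal(percent_improvement(24_663, 288_192))
print(contigs, largest, n50)  # 90.6 2039.3 1068.5
```

Under these assumptions the negative entries also follow: going from 26 to 32 contigs for S. cerevisiae with proovread gives −23.0, matching the table.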