Table 2 The quality and performance of long-read assembly with NECAT.

From: Efficient assembly of nanopore reads via highly accurate and intact error correction

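| Genome | Pipeline | Assembly size (Mb) | Contigs | NG50 (Kb) | NGA50 (Kb) | MA/local MA | QV (pre-/post-polish) | BUSCO | Correct/contig/total time |
|---|---|---|---|---|---|---|---|---|---|
| E. coli | Ref. | 4.6 | 1 | 4642 | – | –/– | –/– | – | –/–/– |
| E. coli | Canu | 4.6 | 1 | 4601 | 3335 | 2/18 | 18.0/22.1 | 18.4% | 26.1/698.1/724.2 |
| E. coli | Canu + S | 4.6 | 1 | 4630 | 3287 | 3/2 | 18.6/22.2 | 19.8% | 26.1/8.0/34.1 |
| E. coli | Flye | 4.6 | 1 | 4622 | 3071 | 2/2 | 20.2/22.6 | 20.2% | –/–/630.4 |
| E. coli | NECAT | 4.6 | 1 | 4595 | 3984 | 2/3 | 18.5/22.3 | 19.8% | 1.6/1.2/2.8 |
| S. cerevisiae | S228C | 12.2 | 17 | 924 | – | –/– | –/– | – | –/–/– |
| S. cerevisiae | Canu | 12.7 | 26 | 814 | 703 | 38/33 | 22.3/28.5 | 98.5% | 493.3/1029.9/1523.2 |
| S. cerevisiae | Canu + S | 12.4 | 19 | 815 | 705 | 34/29 | 22.7/28.9 | 98.2% | 493.3/38.4/531.7 |
| S. cerevisiae | Flye | 12.3 | 26 | 943 | 706 | 21/26 | 21.8/29.0 | 98.5% | –/–/197.8 |
| S. cerevisiae | NECAT | 12.3 | 19 | 937 | 708 | 26/35 | 23.1/29.0 | 98.3% | 4.4/4.9/9.3 |
| A. thaliana | TAIR10 | 119.7 | 7 | 23,460 | – | –/– | –/– | – | –/–/– |
| A. thaliana | Canu | 113.4 | 288 | 6523 | 445 | 478/1152 | 15.6/19.5 | 98.5% | 193.1/1229.9/1423.0 |
| A. thaliana | Canu + S | 115.6 | 44 | 11,071 | 527 | 576/1170 | 15.9/19.6 | 98.8% | 193.1/125.9/319.0 |
| A. thaliana | Flye | 126.6 | 154 | 12,043 | 627 | 1085/1962 | 16.8/18.5 | 98.7% | –/–/59.4 |
| A. thaliana | NECAT | 122.9 | 136 | 11,157 | 582 | 886/1304 | 16.0/18.9 | 98.8% | 19.8/28.0/47.9 |
| D. melanogaster | dm6 | 143.7 | 1870 | 25,287 | – | –/– | –/– | – | –/–/– |
| D. melanogaster | Canu | 146.8 | 499 | 3509 | 3240 | 1307/678 | 20.2/22.2 | 91.3% | 289.6/1259.2/1548.8 |
| D. melanogaster | Canu + S | 135.8 | 162 | 14,456 | 6473 | 587/333 | 20.8/23.2 | 91.6% | 289.6/294.4/584.0 |
| D. melanogaster | Flye | 139.9 | 593 | 11,925 | 5129 | 558/749 | 21.4/22.5 | 89.9% | –/–/127.9 |
| D. melanogaster | NECAT | 143.0 | 277 | 18,072 | 6323 | 1117/1333 | 20.2/22.3 | 92.0% | 37.7/32.7/70.4 |
| C. reinhardtii | Ref. v5.5 | 111.1 | 53 | 7784 | – | –/– | –/– | – | –/–/– |
| C. reinhardtii | Canu | 116.4 | 93 | 4564 | 739 | 853/2269 | 19.3/22.2 | 97.9% | 950.4/17,369.6/18,320.0 |
| C. reinhardtii | Canu + S | 109.7 | 46 | 4498 | 713 | 655/1629 | 20.1/23.0 | 97.7% | 950.4/816.0/1766.4 |
| C. reinhardtii | Flye | 112.9 | 65 | 6573 | 831 | 764/2029 | 21.6/23.6 | 98.4% | –/–/185.8 |
| C. reinhardtii | NECAT | 113.4 | 54 | 6169 | 732 | 831/2273 | 19.8/22.4 | 98.0% | 54.8/47.0/101.8 |
| O. sativa | Ref. v4.0 | 382.8 | 15 | 30,829 | – | –/– | –/– | – | –/–/– |
| O. sativa | Canu | 383.9 | 385 | 5041 | 2253 | 474/8334 | 15.9/15.9 | 58.6% | 2768.0/16,800.0/19,568.0 |
| O. sativa | Canu + S | 366.4 | 229 | 3586 | 1832 | 394/5116 | 16.3/16.3 | 59.2% | 2768.0/1926.3/4694.3 |
| O. sativa | Flye | 380.7 | 249 | 3552 | 2213 | 573/1742 | 16.4/16.3 | 59.2% | –/–/817.6 |
| O. sativa | NECAT | 373.1 | 120 | 9650 | 3311 | 479/4873 | 16.0/16.3 | 58.4% | 186.9/330.3/517.2 |
| S. pennellii | Ref | 915.6 | 899 | 2522 | – | –/– | –/– | – | –/–/– |
| S. pennellii | Canu | 961.8 | 2010 | 1664 | 797 | 5614/15,301 | –/20.3 | 97.1% | 5733.1/15,398.4/21,131.5 |
| S. pennellii | Canu + S | 915.6 | 899 | 2522 | – | –/– | –/– | 97.2% | 5733.1/2510.2/8243.3 |
| S. pennellii | Flye | 1026.0 | 3180 | 1971 | 651 | 8504/10,726 | 16.0/18.5 | 96.7% | –/–/3590.8 |
| S. pennellii | NECAT | 991.8 | 1344 | 4802 | 992 | 5813/12,592 | 15.2/17.3 | 95.5% | 799.6/2434.1/3233.7 |
| NA12878 (rel3,4) | Ref38 | 3272.1 | 639 | 145,139 | – | –/– | –/– | – | –/–/– |
| NA12878 (rel3,4) | Canu | 2759.0 | 2337 | 5691 | 3368 | 1977/25,179 | 15.4/24.5 | 86.3% | –/–/60,000.0 |
| NA12878 (rel3,4) | NECAT | 2798.4 | 1494 | 14,066 | 9538 | 964/4591 | 16.6/24.6 | 74.9% | 2217.6/5904.0/8121.6 |
| NA12878 (rel6) | Flye | 2867.0 | 3309 | 28,407 | 16,640 | 4054/7258 | 22.9/24.2 | 74.6% | –/–/2500.0 |
| NA12878 (rel6) | NECAT | 2846.9 | 1047 | 20,913 | 13,441 | 948/1467 | 23.1/24.4 | 74.5% | 2518.4/6900.4/9418.8 |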

“Assembly size” is the total number of base pairs in all contigs generated by an assembler. “NG50” is the contig length n such that contigs of length ≥ n together contain 50% of the reference genome size. “NGA50” is the NG50 of the aligned blocks obtained by breaking contigs at misassembly breakpoints. “MA/local MA” are the numbers of misassemblies and local misassemblies evaluated by QUAST. “QV” is defined as \(10 \times \log_{10}\left(\frac{100\ \mathrm{kbp}}{\#\,\mathrm{mismatches\ per\ 100\ kbp} + \#\,\mathrm{indels\ per\ 100\ kbp}}\right)\), where “# mismatches per 100 kbp” and “# indels per 100 kbp” are evaluated by QUAST. “BUSCO” is gene completeness evaluated by BUSCO.

All pipelines were tested on the same computer with a 2.0 GHz CPU and 3 TB of RAM. For the first five data sets, all pipelines were run with 32 threads; for O. sativa, S. pennellii, and the human data sets, all pipelines were run with 64 threads. In both cases, the correction and contig computational times were recorded.

The S. pennellii assemblies of Canu and Canu + Smartdenovo (listed as “Canu + S”) were acquired from https://www.plabipd.de/portal/solanum-pennellii; their NG50 values were longer than those of the assemblies we generated ourselves. The S. pennellii assembly of Canu + Smartdenovo was used as the reference genome, and therefore its NGA50, MA, and QV metrics were not evaluated. The NA12878 (rel3,4) assembly and running time of Canu were taken from a published paper. The NA12878 (rel6) assembly and running time of Flye were acquired from https://github.com/fenderglass/Flye.
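The NG50 and QV definitions in the note above can be illustrated with a minimal Python sketch. This is not the code used by QUAST or by the authors; the function names `ng50` and `qv` and the toy inputs are hypothetical and chosen only for illustration. It assumes that contig lengths and the reference size are given in the same unit, and that the two error rates are counts per 100 kbp, as in the definition.

```python
import math


def ng50(contig_lengths, reference_size):
    """Return the largest length n such that contigs of length >= n
    together cover at least half of reference_size.

    Returns None if the contigs never reach half of the reference size.
    """
    half = reference_size / 2
    running_total = 0
    # Walk from the longest contig to the shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return None


def qv(mismatches_per_100kbp, indels_per_100kbp):
    """Return 10 * log10(100 kbp / (mismatches + indels per 100 kbp))."""
    errors = mismatches_per_100kbp + indels_per_100kbp
    if errors <= 0:
        # No observed errors: the quality value is unbounded.
        return math.inf
    return 10 * math.log10(100_000 / errors)


if __name__ == "__main__":
    # Toy values, not taken from the table.
    print(ng50([50, 30, 20, 10], reference_size=120))
    print(qv(mismatches_per_100kbp=600, indels_per_100kbp=400))
```

With the toy inputs, the two longest contigs (50 + 30 = 80) are the first to reach half of the reference size (60), so the NG50 is 30. A combined 1000 errors per 100 kbp corresponds to one error per 100 bases, that is, a QV of 20.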