Table 3 Error correction benchmarking results for simulated PacBio CLR reads of metagenomic datasets with different complexity

From: VeChat: correcting errors in long reads using variation graphs

| Method | #Reads | Error rate (%) | Mismatch (%) | Indel (%) | Haplotype coverage (%) | N50 (bp) | NGA50 (bp) | #Misassemblies |
|---|---|---|---|---|---|---|---|---|
| Low complexity (20 genomes) | | | | | | | | |
| VeChat | 293466 | 0.036 | 0.020 | 0.015 | 96.9 | 11,866 | 29,555 | 104 |
| Racon | 299053 | 0.200 | 0.122 | 0.078 | 91.7 | 11,811 | 29,514 | 794 |
| CONSENT | 299333 | 0.214 | 0.149 | 0.065 | 98.4 | 11,841 | 29,556 | 515 |
| Canu | 253381 | 0.259 | 0.134 | 0.125 | 97.4 | 12,370 | 29,457 | 139 |
| Daccord | 298284 | 0.259 | 0.243 | 0.016 | 92.8 | 11,862 | 29,595 | 280 |
| High complexity (100 genomes) | | | | | | | | |
| VeChat | 1441190 | 0.088 | 0.061 | 0.026 | 97.5 | 11,886 | 30,129 | 2774 |
| CONSENT | 1497216 | 0.274 | 0.163 | 0.112 | 99.4 | 11,839 | 30,204 | 3263 |
| Canu | 1185152 | 0.354 | 0.192 | 0.162 | 99.0 | 12,706 | 30,016 | 873 |
| Racon | – | – | – | – | – | – | – | – |
| Daccord | – | – | – | – | – | – | – | – |

  1. The average sequencing coverage per strain is about 30x, and the sequencing error rate is 10%. Racon and Daccord failed to run on the high complexity dataset.