Table 1 Assembly statistics for different assemblers using the barcode-removed short-reads or linked-reads on mock communities

From: Exploring high-quality microbial genomes by assembling short-reads with long-range connectivity

 

| Metric | Pangaea | Athena | Supernova | cloudSPAdes | MEGAHIT | metaSPAdes |
|---|---|---|---|---|---|---|
| ATCC-MSA-1003 (stLFR) | | | | | | |
| Total assembly length | 59,484,233 | 52,159,846 | 35,226,545 | - | 55,506,708 | 57,225,487 |
| Genome fraction (%) | 84.43 | 77.12 | 52.21 | - | 81.99 | 83.99 |
| Longest alignment | 2,853,278 | 2,281,647 | 1,105,108 | - | 883,580 | 883,552 |
| Overall N50 | 1,619,916 | 875,747 | 243,194 | - | 127,879 | 132,556 |
| Overall N70 | 614,609 | 615,896 | 132,825 | - | 63,957 | 63,879 |
| Overall N90 | 5,248 | 110,802 | 50,969 | - | 5,222 | 3,688 |
| Overall NA50 | 731,990 | 677,911 | 215,052 | - | 116,995 | 125,586 |
| NA50 per strain | 628,059 | 576,621 | 145,646 | - | 140,463 | 134,477 |
| NGA50 per strain | 677,353 | 575,371 | 137,023 | - | 133,877 | 133,978 |
| ATCC-MSA-1003 (TELL-Seq) | | | | | | |
| Total assembly length | 61,990,266 | 60,847,375 | 56,748,937 | 62,316,993 | 60,291,592 | 60,648,311 |
| Genome fraction (%) | 82.63 | 81.99 | 76.46 | 82.44 | 82.10 | 82.46 |
| Longest alignment | 4,968,123 | 4,968,084 | 1,096,372 | 884,364 | 867,473 | 776,102 |
| Overall N50 | 1,360,322 | 466,498 | 102,757 | 127,419 | 128,069 | 112,342 |
| Overall N70 | 465,633 | 184,646 | 41,121 | 56,614 | 54,222 | 49,466 |
| Overall N90 | 8,045 | 9,929 | 8,084 | 6,051 | 5,893 | 5,429 |
| Overall NA50 | 649,672 | 361,569 | 97,312 | 118,159 | 112,513 | 105,630 |
| NA50 per strain | 838,457 | 483,734 | 123,277 | 129,001 | 122,932 | 119,253 |
| NGA50 per strain | 887,107 | 485,196 | 121,657 | 129,531 | 121,938 | 118,391 |
| ATCC-MSA-1003 (10x) | | | | | | |
| Total assembly length | 58,860,253 | 52,292,807 | 89,828,047 | - | 56,558,134 | - |
| Genome fraction (%) | 83.23 | 77.19 | 75.08 | - | 82.74 | - |
| Largest alignment | 2,277,835 | 2,278,264 | 974,529 | - | 883,602 | - |
| Overall N50 | 1,033,793 | 601,544 | 32,128 | - | 151,002 | - |
| Overall N70 | 564,696 | 356,490 | 12,725 | - | 73,366 | - |
| Overall N90 | 13,052 | 73,456 | 4,075 | - | 6,715 | - |
| Overall NA50 | 483,416 | 453,155 | 30,194 | - | 132,728 | - |
| NA50 per strain | 421,157 | 328,424 | 93,097 | - | 141,574 | - |
| NGA50 per strain | 441,029 | 334,491 | 89,993 | - | 143,179 | - |
| CAMI-high | | | | | | |
| Total assembly length | 799,834,811 | 773,344,531 | 757,315,614 | 752,648,273 | 772,095,404 | 759,108,499 |
| Genome fraction (%) | 27.71 | 27.33 | 26.69 | 26.66 | 27.46 | 27.01 |
| Longest alignment | 3,402,679 | 3,402,661 | 2,700,464 | 2,525,441 | 2,378,494 | 2,518,442 |
| Overall N50 | 254,141 | 211,169 | 135,915 | 117,495 | 114,277 | 133,877 |
| Overall N70 | 124,320 | 108,630 | 61,513 | 54,187 | 52,909 | 62,467 |
| Overall N90 | 38,713 | 38,425 | 15,770 | 10,227 | 10,946 | 12,521 |
| Overall NA50 | 212,494 | 201,566 | 132,676 | 113,425 | 110,658 | 132,603 |
| NA50 per strain | 66,310 | 66,060 | 46,464 | 40,507 | 39,340 | 45,216 |
| NGA50 per strain | 54,412 | 49,896 | 34,201 | 31,266 | 28,301 | 34,547 |
| Non-zero NGA50 | 195 | 180 | 175 | 177 | 180 | 177 |
| ZYMO | | | | | | |
| Total assembly length | 36,094,665 | 35,254,802 | 25,103,660 | 35,843,842 | 35,358,538 | 35,511,020 |
| Genome fraction (%) | 49.24 | 48.14 | 34.27 | 47.99 | 48.38 | 48.46 |
| Longest alignment | 2,839,942 | 2,717,703 | 768,486 | 1,012,282 | 1,079,942 | 847,644 |
| Overall N50 | 1,094,665 | 761,749 | 288,912 | 210,427 | 124,248 | 191,688 |
| Overall N70 | 638,009 | 445,656 | 139,701 | 136,651 | 70,860 | 106,796 |
| Overall N90 | 180,146 | 157,620 | 45,025 | 60,073 | 25,980 | 44,425 |
| Overall NA50 | 1,072,622 | 760,284 | 226,947 | 208,894 | 116,738 | 191,688 |
| NA50 per strain | 1,066,276 | 927,278 | 136,210 | 245,562 | 179,245 | 183,935 |
| NGA50 per strain | 1,083,616 | 938,270 | 132,015 | 246,262 | 177,318 | 179,371 |

  1. Values are missing (-) for cloudSPAdes and metaSPAdes on the datasets where the assembler required more than 2 TB of memory, which exceeded our server limit.
  2. The highest values are in bold.
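
For readers unfamiliar with the N50, N70 and N90 rows, the following is a minimal sketch of the standard Nx definition: contigs are sorted from longest to shortest, and Nx is the length of the contig at which the running total first reaches x% of the total assembly length. The function name `nx` and the contig lengths are hypothetical and chosen only for illustration; this is not the evaluation code used to produce the table. The alignment-based rows (NA50, NGA50) apply the same idea to aligned blocks rather than whole contigs, with NGA50 measured against the reference genome length instead of the assembly length.

```python
def nx(lengths, x):
    """Return Nx for a list of contig lengths (x is a percentage, e.g. 50)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return 0  # only reached for an empty input


# Hypothetical contig lengths (total = 300), not taken from the table.
contigs = [100, 80, 60, 40, 20]
n50 = nx(contigs, 50)  # 100 + 80 = 180 >= 150, so N50 = 80
n70 = nx(contigs, 70)  # 100 + 80 + 60 = 240 >= 210, so N70 = 60
n90 = nx(contigs, 90)  # 100 + 80 + 60 + 40 = 280 >= 270, so N90 = 40
print(n50, n70, n90)
```

Because N90 depends on the shortest contigs needed to reach 90% of the assembly, it can be much smaller than N50 when an assembly contains many short contigs, as in several rows of the table.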