Table 2 Alignment performance benchmarks on GTDB complete dataset

From: Efficient sequence alignment against millions of prokaryotic genomes with LexicMap

| Query (length) | Tool | Hits (total) | Hits (high) | Hits (medium) | Hits (low) | Time | RAM |
|---|---|---:|---:|---:|---:|---:|---:|
| A rare gene (1,299 bp) | LexicMap | 6,255 | 2,311 | 46 | 3,898 | 30 s | 2.1 GB |
| | BLASTn | 7,121 | 2,311 | 47 | 4,763 | 2,171 s | 351.2 GB |
| | BLASTn (ws = 15) | 57,741 | 2,311 | 47 | 55,383 | 3,171 s | 324.1 GB |
| | MMseqs2 | 67,537 | 2,304 | 54 | 65,179 | 26,174 s | 400.7 GB |
| | Minimap2 | 2,312 | 2,312 | 0 | 0 | 17,208 s | 20.2 GB |
| A 16S rRNA gene (1,542 bp) | LexicMap | 306,064 | 60,999 | 69,293 | 175,772 | 303 s | 5.2 GB |
| | BLASTn | 301,197 | 61,878 | 109,477 | 129,842 | 2,760 s | 378.4 GB |
| | BLASTn (ws = 15) | 301,197 | 61,878 | 109,477 | 129,842 | 3,291 s | 378.4 GB |
| | MMseqs2 | 324,364 | 60,915 | 89,874 | 173,575 | 31,140 s | 400.7 GB |
| | Minimap2 | 17,656 | 15,998 | 1,652 | 6 | 17,313 s | 20.2 GB |
| A plasmid (52,830 bp) | LexicMap | 65,029 | 21 | 2,808 | 62,200 | 539 s | 6.8 GB |
| | BLASTn | 69,311 | 21 | 2,865 | 66,425 | 2,262 s | 364.7 GB |
| | BLASTn (ws = 15) | 91,847 | 21 | 2,865 | 88,961 | 3,082 s | 142.8 GB |
| | MMseqs2 | 90,277 | 7 | 1,650 | 88,620 | 44,710 s | 400.7 GB |
| | Minimap2 | 3,033 | 35 | 1,873 | 1,125 | 19,715 s | 20.2 GB |
| 1,033 AMR genes (1 kb, median) | LexicMap | 4,665,317 | 1,123,251 | 776,153 | 2,765,913 | 8,620 s | 10.7 GB |
| | BLASTn | 5,357,772 | 1,150,407 | 772,858 | 3,434,507 | 4,686 s | 442.1 GB |
| | BLASTn (ws = 15) | 10,877,544 | 1,150,410 | 840,464 | 8,886,670 | 4,561 s | 311.9 GB |
| | MMseqs2 | 10,137,345 | 1,148,942 | 808,177 | 8,180,226 | 184,470 s | 406.9 GB |
| | Minimap2 | 2,078,490 | 943,516 | 39,529 | 815,445 | 38,058 s | 20.2 GB |

  1. A high-similarity alignment has query coverage of ≥90% (for genes) or ≥70% (for plasmids) and identity of >90%. A low-similarity alignment has query coverage of <50% (for genes) or <30% (for plasmids) and identity of <80%. All remaining alignments are classified as medium similarity. Hits are counted as genome hits: hits (high), hits (medium) and hits (low) denote the numbers of genomes with high-similarity, medium-similarity and low-similarity matches, respectively.
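
The similarity classes defined in the table footnote can be written as a small decision rule. The following is a minimal sketch, not code from the paper: the function name `classify_alignment`, its parameters, and the `"gene"`/`"plasmid"` labels are hypothetical, and it assumes that query coverage and identity are both given as percentages. It classifies a single alignment only; how several alignments to the same genome are combined into one genome-level hit is not stated in the footnote and is not modelled here.

```python
# Coverage thresholds (percent) taken from the table footnote:
# (minimum coverage for "high", upper bound on coverage for "low").
COVERAGE_THRESHOLDS = {
    "gene": (90.0, 50.0),
    "plasmid": (70.0, 30.0),
}


def classify_alignment(query_coverage: float, identity: float,
                       query_type: str = "gene") -> str:
    """Return "high", "medium" or "low" for one alignment.

    query_coverage and identity are percentages (0-100).
    query_type is "gene" or "plasmid" (hypothetical labels).
    """
    high_cov, low_cov = COVERAGE_THRESHOLDS[query_type]

    # High similarity: coverage >= threshold AND identity > 90%.
    if query_coverage >= high_cov and identity > 90.0:
        return "high"

    # Low similarity: coverage < threshold AND identity < 80%.
    if query_coverage < low_cov and identity < 80.0:
        return "low"

    # Everything else is medium similarity.
    return "medium"


if __name__ == "__main__":
    print(classify_alignment(95.0, 99.0, "gene"))
    print(classify_alignment(75.0, 95.0, "plasmid"))
    print(classify_alignment(40.0, 75.0, "gene"))
    print(classify_alignment(40.0, 95.0, "gene"))
```

Note that both conditions must hold for the high and low classes, so a short but near-identical alignment (for example 40% coverage at 95% identity for a gene) falls into the medium class.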