Extended Data Table 2 Alignment performance of LexicMap on GenBank+RefSeq dataset

From: Efficient sequence alignment against millions of prokaryotic genomes with LexicMap

Query            Query length   Hits (total)  Hits (high)  Hits (medium)  Hits (low)  Time         RAM
A rare gene      1,299 bp       41,718        11,746       115            29,857      3m:06s       4.0 GB
A 16S rRNA gene  1,542 bp       1,955,167     245,884      501,691        1,207,592   32m:59s      11.1 GB
A plasmid        52,830 bp      560,330       96           15,370         544,864     52m:22s      14.5 GB
1,033 AMR genes  1 kb (median)  30,967,882    7,636,386    4,858,063      18,473,433  15h:52m:08s  24.9 GB

Sequence identifiers are available in Methods. Hits are genome hits. Hits (high), Hits (medium) and Hits (low) give the number of genomes with high-, medium- and low-similarity matches, respectively (details in Methods).
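
The figures in the table are consistent with the three similarity classes partitioning the total genome hits for each query. The short Python sketch below checks that arithmetic and converts the reported run times to seconds so they can be compared directly. The row values are copied from the table; the names `ROWS`, `to_seconds` and `classes_sum_to_total` are illustrative assumptions and are not part of LexicMap.

```python
# Rows copied from the table:
# (query, total hits, high, medium, low, time as printed)
ROWS = [
    ("A rare gene", 41_718, 11_746, 115, 29_857, "3m:06s"),
    ("A 16S rRNA gene", 1_955_167, 245_884, 501_691, 1_207_592, "32m:59s"),
    ("A plasmid", 560_330, 96, 15_370, 544_864, "52m:22s"),
    ("AMR genes", 30_967_882, 7_636_386, 4_858_063, 18_473_433, "15h:52m:08s"),
]

# Seconds per unit suffix used in the Time column.
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text):
    """Convert a time such as '15h:52m:08s' into a number of seconds."""
    total = 0
    for part in text.split(":"):
        # Each part is an integer followed by a one-letter unit (h, m or s).
        total += int(part[:-1]) * UNIT_SECONDS[part[-1]]
    return total


def classes_sum_to_total(row):
    """Return True if high + medium + low equals the total hit count."""
    _, total, high, medium, low, _ = row
    return high + medium + low == total


for row in ROWS:
    print(row[0], classes_sum_to_total(row), to_seconds(row[5]))
```

Each row passes the check, and the longest run (the AMR gene set) corresponds to 57,128 seconds.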