Table 1 Comparison of recent gLMs with multi-species and single-species training approaches
From: Genomic language models could transform medicine but not yet
| Model | Parameters | Sequence length (bp) | Genomes trained on | Human genome included | Training type |
|---|---|---|---|---|---|
| GPN-MSA^9 | 86,000,000 | 128 | 100 | Yes | Multi-species |
| GPN^20 | 65,612,800* | 512 | 8 | No | Multi-species |
| Evo^21 | 40,000,000,000 | 1,000,000 | 128,000 | Yes | Multi-species |
| Nucleotide Transformer^6 | 2,500,000,000 | 6,000 | 850 | Yes | Multi-species |
| DNABERT-2^21 | 117,000,000 | 877 (BPE tokens) | 135 | Yes | Multi-species |
| DNABERT^13 (k = 6) | 110,000,000 | 512 | 1 | Yes | Single-genome |
| HyenaDNA^22 | 1,600,000 | 1,000,000 | 1 | Yes | Single-genome |
| GROVER^15 | 86,511,201* | 2,076 (BPE tokens) | 1 | Yes | Single-genome |
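For readers who want to query or extend these comparisons programmatically, the table can be encoded as plain Python records. This is a minimal sketch: the model names and figures come directly from Table 1, while the record layout and the helper functions (`single_genome_models`, `largest_model`) are illustrative choices, not part of the source.

```python
# Table 1 encoded as a list of dicts. Figures are taken verbatim from the
# table; note that the DNABERT-2 and GROVER "context" values are BPE token
# counts rather than raw base pairs.
MODELS = [
    {"name": "GPN-MSA", "params": 86_000_000, "context": 128,
     "genomes": 100, "human": True, "training": "multi-species"},
    {"name": "GPN", "params": 65_612_800, "context": 512,
     "genomes": 8, "human": False, "training": "multi-species"},
    {"name": "Evo", "params": 40_000_000_000, "context": 1_000_000,
     "genomes": 128_000, "human": True, "training": "multi-species"},
    {"name": "Nucleotide Transformer", "params": 2_500_000_000, "context": 6_000,
     "genomes": 850, "human": True, "training": "multi-species"},
    {"name": "DNABERT-2", "params": 117_000_000, "context": 877,  # BPE tokens
     "genomes": 135, "human": True, "training": "multi-species"},
    {"name": "DNABERT", "params": 110_000_000, "context": 512,
     "genomes": 1, "human": True, "training": "single-genome"},
    {"name": "HyenaDNA", "params": 1_600_000, "context": 1_000_000,
     "genomes": 1, "human": True, "training": "single-genome"},
    {"name": "GROVER", "params": 86_511_201, "context": 2_076,  # BPE tokens
     "genomes": 1, "human": True, "training": "single-genome"},
]

def single_genome_models(models):
    """Names of models trained on a single genome, in table order."""
    return [m["name"] for m in models if m["training"] == "single-genome"]

def largest_model(models):
    """The record with the largest parameter count."""
    return max(models, key=lambda m: m["params"])

print(single_genome_models(MODELS))   # ['DNABERT', 'HyenaDNA', 'GROVER']
print(largest_model(MODELS)["name"])  # 'Evo'
```

A flat list of dicts keeps the sketch dependency-free; the same records load directly into a pandas DataFrame if richer filtering or sorting is needed.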