Table 2 AUC results for binary sequence classification tasks spanning multiple species: promoter region prediction (first four rows), human vs worm classification, and mouse transcription factor binding site (TFBS) identification
From: Benchmarking DNA foundation models for genomic and genetic tasks
| Dataset | DNABERT-2 | NT-v2 | HyenaDNA | Caduceus-Ph | GROVER |
|---|---|---|---|---|---|
| Promoter Arabidopsis NonTATA | 0.9457 | 0.9395 | 0.9547 | 0.9437 | 0.949 |
| Promoter Arabidopsis TATA | 0.951 | 0.95 | 0.9609 | 0.9372 | 0.9486 |
| Promoter B.Amyloliquefaciens | 0.8518 | 0.8225 | 0.8643 | 0.8686 | 0.8617 |
| Promoter R.Capsulatus | 0.6855 | 0.6746 | 0.7116 | 0.67 | 0.7154 |
| Human vs worm | 0.9799 | 0.9785 | 0.9502 | 0.9915 | 0.9843 |
| Mouse TFBS 1 | 0.711 | 0.704 | 0.5899 | 0.6841 | 0.6947 |
| Mouse TFBS 2 | 0.9072 | 0.9005 | 0.8996 | 0.9472 | 0.9093 |
| Mouse TFBS 3 | 0.9308 | 0.9269 | 0.8944 | 0.9351 | 0.9327 |
| Mouse TFBS 4 | 0.7622 | 0.6942 | 0.588 | 0.7047 | 0.6815 |
| Mouse TFBS 5 | 0.6783 | 0.7077 | 0.627 | 0.715 | 0.6822 |
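
To compare the models in Table 2 programmatically, the following is a minimal Python sketch that finds the top-scoring model for each task and computes an unweighted mean AUC per model. The AUC values are transcribed from the table. The names `MODELS`, `AUC`, `best_model_per_task`, and `mean_auc_per_model` are hypothetical helpers introduced here for illustration. The unweighted mean is an assumption of this sketch, not a summary statistic reported in the source.

```python
# AUC values transcribed from Table 2; each list follows the order in MODELS.
MODELS = ["DNABERT-2", "NT-v2", "HyenaDNA", "Caduceus-Ph", "GROVER"]

AUC = {
    "Promoter Arabidopsis NonTATA": [0.9457, 0.9395, 0.9547, 0.9437, 0.949],
    "Promoter Arabidopsis TATA": [0.951, 0.95, 0.9609, 0.9372, 0.9486],
    "Promoter B.Amyloliquefaciens": [0.8518, 0.8225, 0.8643, 0.8686, 0.8617],
    "Promoter R.Capsulatus": [0.6855, 0.6746, 0.7116, 0.67, 0.7154],
    "Human vs worm": [0.9799, 0.9785, 0.9502, 0.9915, 0.9843],
    "Mouse TFBS 1": [0.711, 0.704, 0.5899, 0.6841, 0.6947],
    "Mouse TFBS 2": [0.9072, 0.9005, 0.8996, 0.9472, 0.9093],
    "Mouse TFBS 3": [0.9308, 0.9269, 0.8944, 0.9351, 0.9327],
    "Mouse TFBS 4": [0.7622, 0.6942, 0.588, 0.7047, 0.6815],
    "Mouse TFBS 5": [0.6783, 0.7077, 0.627, 0.715, 0.6822],
}


def best_model_per_task(auc, models):
    """Return {task: name of the model with the highest AUC on that task}."""
    best = {}
    for task, scores in auc.items():
        top_index = max(range(len(models)), key=lambda i: scores[i])
        best[task] = models[top_index]
    return best


def mean_auc_per_model(auc, models):
    """Return {model: unweighted mean AUC over all tasks} (illustrative only)."""
    n_tasks = len(auc)
    return {
        model: sum(scores[i] for scores in auc.values()) / n_tasks
        for i, model in enumerate(models)
    }


if __name__ == "__main__":
    for task, model in best_model_per_task(AUC, MODELS).items():
        print(f"{task}: {model}")
    for model, mean in mean_auc_per_model(AUC, MODELS).items():
        print(f"{model}: {mean:.4f}")
```

Because the tasks differ in difficulty and species, an unweighted mean is only a rough summary. The per-task winners are usually the more informative view.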