Table 2 AUC results for binary sequence classification tasks involving multiple species, including promoter region prediction (first four rows), human vs worm classification, and mouse transcription factor binding site (TFBS) identification

From: Benchmarking DNA foundation models for genomic and genetic tasks

| Data | DNABERT-2 | NT-v2 | HyenaDNA | Caduceus-Ph | GROVER |
| --- | --- | --- | --- | --- | --- |
| Promoter Arabidopsis NonTATA | 0.9457 | 0.9395 | 0.9547 | 0.9437 | 0.949 |
| Promoter Arabidopsis TATA | 0.951 | 0.95 | 0.9609 | 0.9372 | 0.9486 |
| Promoter B.Amyloliquefaciens | 0.8518 | 0.8225 | 0.8643 | 0.8686 | 0.8617 |
| Promoter R.Capsulatus | 0.6855 | 0.6746 | 0.7116 | 0.67 | 0.7154 |
| Human vs worm | 0.9799 | 0.9785 | 0.9502 | 0.9915 | 0.9843 |
| Mouse TFBS 1 | 0.711 | 0.704 | 0.5899 | 0.6841 | 0.6947 |
| Mouse TFBS 2 | 0.9072 | 0.9005 | 0.8996 | 0.9472 | 0.9093 |
| Mouse TFBS 3 | 0.9308 | 0.9269 | 0.8944 | 0.9351 | 0.9327 |
| Mouse TFBS 4 | 0.7622 | 0.6942 | 0.588 | 0.7047 | 0.6815 |
| Mouse TFBS 5 | 0.6783 | 0.7077 | 0.627 | 0.715 | 0.6822 |
  1. Using the mean token pooling method. Bolded: AUC significantly higher than at least two other AUCs, p < 0.01. P-values are calculated using the one-sided DeLong's test.
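
The significance criterion in the footnote rests on DeLong's test for comparing two AUCs measured on the same examples. As an illustration only, and not the authors' code, a minimal pure-Python sketch of a one-sided DeLong comparison might look like the following. The function names and the toy scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 on ties, 0 otherwise."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _components(pos, neg):
    """Return (AUC, V10, V01) for one model's positive/negative scores."""
    m, n = len(pos), len(neg)
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]  # one value per positive
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]  # one value per negative
    return sum(v10) / m, v10, v01


def _cov(a, b):
    """Sample covariance with denominator len(a) - 1."""
    k = len(a)
    mean_a, mean_b = sum(a) / k, sum(b) / k
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (k - 1)


def delong_one_sided(labels, scores_a, scores_b):
    """One-sided DeLong test of H1: AUC(a) > AUC(b) on the same examples.

    labels are 0/1; returns (auc_a, auc_b, z, p_value).
    """
    pos_a = [s for s, y in zip(scores_a, labels) if y == 1]
    neg_a = [s for s, y in zip(scores_a, labels) if y == 0]
    pos_b = [s for s, y in zip(scores_b, labels) if y == 1]
    neg_b = [s for s, y in zip(scores_b, labels) if y == 0]
    m, n = len(pos_a), len(neg_a)
    if m < 2 or n < 2:
        raise ValueError("need at least two positives and two negatives")

    auc_a, v10_a, v01_a = _components(pos_a, neg_a)
    auc_b, v10_b, v01_b = _components(pos_b, neg_b)

    # Variance of the AUC difference from the two structural components.
    var = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    )
    if var <= 0:
        raise ValueError("zero variance; the test statistic is undefined")

    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return auc_a, auc_b, z, p_value


if __name__ == "__main__":
    # Toy data (hypothetical): 4 positives followed by 4 negatives.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    model_a = [0.9, 0.8, 0.7, 0.35, 0.4, 0.3, 0.2, 0.1]
    model_b = [0.6, 0.3, 0.8, 0.2, 0.5, 0.4, 0.7, 0.1]
    print(delong_one_sided(labels, model_a, model_b))
```

Under this reading of the footnote, a table entry would be bolded when such a test returns p < 0.01 against at least two of the other four models for the same dataset.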