Table 1 AUC results for binary sequence classification tasks on the human genome

From: Benchmarking DNA foundation models for genomic and genetic tasks

| Data | DNABERT-2 | NT-v2 | HyenaDNA | Caduceus-Ph | GROVER |
| --- | --- | --- | --- | --- | --- |
| DNase I Hypersensitive | 0.8666 | 0.8524 | 0.8295 | 0.8799 | 0.857 |
| Human TFBS 1 | 0.8382 | 0.8315 | 0.8301 | 0.8796 | 0.8618 |
| Human TFBS 2 | 0.821 | 0.809 | 0.8205 | 0.8687 | 0.8495 |
| Human TFBS 3 | 0.7896 | 0.7974 | 0.7875 | 0.8249 | 0.8158 |
| Human TFBS 4 | 0.726 | 0.7103 | 0.7149 | 0.7725 | 0.763 |
| Human TFBS 5 | 0.9204 | 0.9149 | 0.9159 | 0.9294 | 0.931 |
| Promoter GM12878 | 0.9856 | 0.9835 | 0.976 | 0.9865 | 0.9839 |
| Promoter HUVEC | 0.9903 | 0.987 | 0.9817 | 0.9896 | 0.9885 |
| Promoter Hela-S3 | 0.9886 | 0.9838 | 0.981 | 0.9871 | 0.9857 |
| Promoter NHEK | 0.9501 | 0.9323 | 0.9271 | 0.9567 | 0.9507 |
| Acceptor | 0.8969 | 0.7928 | 0.7946 | 0.8449 | 0.8041 |
| Coding | 0.9438 | 0.9289 | 0.9406 | 0.9735 | 0.9594 |
| Donor | 0.9056 | 0.8198 | 0.8128 | 0.8535 | 0.819 |
| Enhancer | 0.8717 | 0.8674 | 0.8339 | 0.8384 | 0.8554 |
| Enhancer Cohn | 0.8223 | 0.7894 | 0.7754 | 0.821 | 0.8161 |
| Enhancer Ensembl | 0.9369 | 0.9389 | 0.9356 | 0.9431 | 0.9382 |
| Open chromatin region | 0.7253 | 0.7183 | 0.7191 | 0.765 | 0.7455 |
| Promoter All 300 bps | 0.9426 | 0.9445 | 0.9394 | 0.9519 | 0.9402 |
| Promoter All 70 bps | 0.8311 | 0.8527 | 0.832 | 0.8748 | 0.8506 |
| Promoter NonTATA 251 bps | 0.9297 | 0.8905 | 0.928 | 0.9426 | 0.9395 |
| Promoter NonTATA 300 bps | 0.9765 | 0.9758 | 0.9662 | 0.9834 | 0.9728 |
| Promoter NonTATA 70 bps | 0.8531 | 0.8729 | 0.8516 | 0.8961 | 0.8704 |
| Promoter TATA 300 bps | 0.7646 | 0.7791 | 0.8077 | 0.76 | 0.78 |
| Promoter TATA 70 bps | 0.7781 | 0.7947 | 0.7827 | 0.8103 | 0.796 |

  1. The tasks include promoter region identification (across multiple datasets), coding region detection, splice site donor and acceptor identification, enhancer identification (across multiple datasets), transcription factor binding site identification (across multiple datasets), and open chromatin region identification (across multiple datasets). All results use the mean token pooling method. Bolded values are higher than at least two other AUCs (p < 0.01). P-values were calculated using a one-sided DeLong's test.
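
As a rough illustration of two terms used in the footnote, the sketch below shows one common way to do mean token pooling (averaging per-token embeddings into a single sequence vector) and to compute AUC from classifier scores. It is not the benchmark's code. The function names, the toy data, and the choice to exclude padding tokens through an attention mask are all assumptions made for this example, and DeLong's test is not implemented here.

```python
from typing import List, Sequence


def mean_token_pooling(token_embeddings: Sequence[Sequence[float]],
                       attention_mask: Sequence[int]) -> List[float]:
    """Average the embeddings of unmasked tokens into one sequence vector.

    Assumption: padding tokens (mask == 0) are left out of the average.
    """
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the benchmark).
embeddings = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last token is padding
mask = [1, 1, 0]
pooled = mean_token_pooling(embeddings, mask)  # averages the first two tokens

toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs ordered correctly
```

The pairwise AUC above is quadratic in the number of examples and is meant only to make the definition concrete. For real datasets a rank-based implementation from an established library would be the more practical choice.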