Table 3. Metrics for binary prediction of co-citation between two input abstracts via cosine similarity, averaged across all evaluation sets and sorted by \(F1_{max}\). Threshold is the optimal decision cutoff computed from the cosine similarities of that dataset. SE models use their domain token for all domains. Models trained in this work are highlighted in bold.

From: Contrastive learning and mixture of experts enables precise vector embeddings in biological databases

| Model | \(F1_{max}\) | Precision | Recall | Threshold | Ratio | ROC-AUC |
|---|---|---|---|---|---|---|
| MoE\(_{\mathbf{all}}\) | 0.8875 | 0.8610 | 0.9166 | 0.7083 | 1.7189 | 0.9426 |
| SE\(_{\mathbf{all}}\) | 0.8770 | 0.8510 | 0.9067 | 0.7475 | 1.5535 | 0.9338 |
| SE\(_{\mathbf{cancer}}\) | 0.8311 | 0.7809 | 0.9055 | 0.6552 | 1.7491 | 0.8606 |
| MPNet | 0.8038 | 0.7535 | 0.8611 | 0.4364 | 1.7541 | 0.8762 |
| Mini | 0.7940 | 0.7351 | 0.8631 | 0.3822 | 1.7706 | 0.8641 |
| E5\(_{base}\) | 0.7910 | 0.7322 | 0.8601 | 0.8082 | 1.0676 | 0.8664 |
| E5\(_{large}\) | 0.7908 | 0.7323 | 0.8594 | 0.8020 | 1.0664 | 0.8671 |
| SE\(_{\mathbf{autoimmune}}\) | 0.7702 | 0.7042 | 0.8626 | 0.7258 | 1.3705 | 0.8151 |
| TF-IDF | 0.7523 | 0.7024 | 0.8097 | 0.0744 | 2.2966 | 0.8209 |
| Llama-3.2-1B | 0.7489 | 0.6894 | 0.8197 | 0.8403 | 1.0769 | 0.8174 |
| SE\(_{\mathbf{cvd}}\) | 0.7458 | 0.6596 | 0.8637 | 0.6669 | 1.3775 | 0.7947 |
| SE\(_{\mathbf{copd}}\) | 0.7347 | 0.6441 | 0.8689 | 0.7264 | 1.1872 | 0.7715 |
| SE\(_{\mathbf{cancer}}\) | 0.7132 | 0.6200 | 0.8526 | 0.5719 | 1.2614 | 0.7416 |
| BioBERT | 0.7123 | 0.6314 | 0.8168 | 0.9384 | 1.0154 | 0.7646 |
| PubmedBERT | 0.7111 | 0.6488 | 0.7867 | 0.9853 | 1.0039 | 0.7614 |
| RoBERTa\(_{large}\) | 0.6999 | 0.5815 | 0.8789 | 0.9949 | 1.0011 | 0.7395 |
| SciBERT | 0.6992 | 0.6010 | 0.8360 | 0.8648 | 1.0311 | 0.7400 |
| ModernBERT\(_{large}\) | 0.6991 | 0.6014 | 0.8347 | 0.9350 | 1.0146 | 0.7378 |
| BERT\(_{large}\) | 0.6987 | 0.6069 | 0.8232 | 0.8857 | 1.0302 | 0.7370 |
| BERT\(_{base}\) | 0.6956 | 0.5816 | 0.8652 | 0.8417 | 1.0411 | 0.7296 |
| ModernBERT\(_{base}\) | 0.6919 | 0.5749 | 0.8687 | 0.9427 | 1.0120 | 0.7236 |
| RoBERTa\(_{base}\) | 0.6800 | 0.5487 | 0.8940 | 0.9834 | 1.0031 | 0.6998 |
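
The caption describes scoring each pair of abstracts by the cosine similarity of their embeddings and reporting the metrics at the cutoff that maximizes F1. The following is a minimal sketch of that idea, not the authors' evaluation code: the function names `cosine_similarity` and `best_f1_threshold`, the exhaustive sweep over observed scores, and the toy data are all assumptions made for illustration. The Ratio and ROC-AUC columns are not reproduced here.

```python
import numpy as np

def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, d) embedding arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dot = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dot / norms

def best_f1_threshold(scores, labels):
    """Try every observed score as a cutoff (predict positive if score >= cutoff).

    Returns (f1, precision, recall, threshold) at the cutoff with the highest F1.
    This sweep is an illustrative assumption, not the paper's stated procedure.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = (0.0, 0.0, 0.0, None)
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        if tp == 0:
            continue  # F1 is zero when there are no true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[0]:
            best = (float(f1), float(precision), float(recall), float(t))
    return best

# Toy data (hypothetical): six abstract pairs, label 1 = co-cited.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
f1, precision, recall, threshold = best_f1_threshold(toy_scores, toy_labels)
print(f"F1_max={f1:.4f} precision={precision:.4f} recall={recall:.4f} threshold={threshold}")
```

In practice `toy_scores` would come from `cosine_similarity` applied to the two embedding matrices of a dataset's abstract pairs. A low optimal threshold, as for TF-IDF in the table, says only that the model's similarity scale is compressed toward zero; it does not by itself indicate how well the model separates the two classes.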