Table 6 Metrics for binary prediction of co-citation between two input abstracts via cosine similarity on the COPD evaluation set, sorted by \(F1_{max}\). Threshold is the cosine-similarity decision cutoff that maximizes F1 on this dataset. Models trained in this work are highlighted in bold.

From: Contrastive learning and mixture of experts enables precise vector embeddings in biological databases

| Model | F1 | Precision | Recall | Threshold | Ratio | ROC-AUC |
| --- | --- | --- | --- | --- | --- | --- |
| **SE\(_{copd}\)** | 0.8270 | 0.8215 | 0.8326 | 0.6844 | 1.4667 | 0.9043 |
| **MoE\(_{all}\)** | 0.7861 | 0.7039 | 0.8901 | 0.7461 | 1.2528 | 0.8548 |
| **SE\(_{all}\)** | 0.7661 | 0.7207 | 0.8176 | 0.8515 | 1.1422 | 0.8293 |
| Llama-3.2-1B | 0.6940 | 0.5965 | 0.8296 | 0.8434 | 1.0463 | 0.7272 |
| **SE\(_{cvd}\)** | 0.6897 | 0.5828 | 0.8445 | 0.6226 | 1.1619 | 0.6922 |
| SciBERT | 0.6870 | 0.5591 | 0.8909 | 0.8502 | 1.0264 | 0.6968 |
| BioBERT | 0.6868 | 0.5397 | 0.9439 | 0.9249 | 1.0112 | 0.7023 |
| BERT\(_{base}\) | 0.6856 | 0.5705 | 0.8587 | 0.8503 | 1.0306 | 0.7004 |
| PubmedBERT | 0.6854 | 0.5750 | 0.8483 | 0.9838 | 1.0029 | 0.7042 |
| RoBERTa\(_{large}\) | 0.6851 | 0.5816 | 0.8333 | 0.9949 | 1.0009 | 0.7044 |
| BERT\(_{large}\) | 0.6849 | 0.5676 | 0.8632 | 0.8822 | 1.0224 | 0.6970 |
| ModernBERT\(_{base}\) | 0.6810 | 0.5358 | 0.9342 | 0.9344 | 1.0100 | 0.6937 |
| ModernBERT\(_{large}\) | 0.6798 | 0.5289 | 0.9514 | 0.9226 | 1.0104 | 0.6850 |
| E5\(_{base}\) | 0.6796 | 0.5601 | 0.8640 | 0.8210 | 1.0273 | 0.7031 |
| RoBERTa\(_{base}\) | 0.6784 | 0.5412 | 0.9088 | 0.9830 | 1.0027 | 0.6749 |
| E5\(_{large}\) | 0.6771 | 0.5595 | 0.8572 | 0.8146 | 1.0275 | 0.7027 |
| **SE\(_{autoimmune}\)** | 0.6742 | 0.5201 | 0.9581 | 0.6473 | 1.0717 | 0.6641 |
| MPNet | 0.6728 | 0.5601 | 0.8423 | 0.4736 | 1.1831 | 0.7003 |
| **SE\(_{cancer}\)** | 0.6716 | 0.5122 | 0.9753 | 0.6528 | 1.0415 | 0.6148 |
| TF-IDF | 0.6696 | 0.5288 | 0.9126 | 0.0737 | 1.4375 | 0.6832 |
| **SE\(_{cancer}\)** | 0.6696 | 0.5166 | 0.9514 | 0.5194 | 1.0755 | 0.6155 |
| Mini | 0.6678 | 0.5023 | 0.9963 | 0.1744 | 1.1585 | 0.6733 |
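The caption's \(F1_{max}\) criterion, selecting the cosine-similarity cutoff that maximizes F1, can be sketched as follows. This is a minimal illustration on toy data: the function name `f1_max` and the example similarities and labels are hypothetical, not taken from the paper's evaluation set.

```python
def f1_max(scores, labels):
    """Sweep every observed score as a candidate threshold and return
    (best_f1, precision, recall, threshold) at the F1-maximizing cutoff."""
    best = (0.0, 0.0, 0.0, 0.0)
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if not p and y == 1)
        if tp == 0:
            continue  # F1 undefined/zero with no true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[0]:
            best = (f1, precision, recall, t)
    return best

# Toy cosine similarities for abstract pairs; label 1 = co-cited, 0 = not.
sims = [0.9, 0.8, 0.75, 0.6, 0.4, 0.3]
labels = [1, 1, 1, 0, 1, 0]
f1, prec, rec, thr = f1_max(sims, labels)  # thr plays the "Threshold" role in the table
```

Sweeping only observed scores suffices because F1 changes only when the cutoff crosses a data point; intermediate thresholds yield identical predictions.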