Table 8 Metrics for binary prediction of co-citation between two input abstracts via cosine similarity on the parasitic evaluation set, sorted by \(F1_{max}\). Threshold is the decision cutoff that maximizes F1 over the cosine similarities of that dataset. Models trained in this work are shown in bold.

From: Contrastive learning and mixture of experts enables precise vector embeddings in biological databases

| Model | F1 | Precision | Recall | Threshold | Ratio | ROC-AUC |
|---|---|---|---|---|---|---|
| **SE\(_{cancer}\)** | 0.9060 | 0.8866 | 0.9262 | 0.6125 | 2.1802 | 0.9668 |
| **MoE\(_{all}\)** | 0.9004 | 0.8783 | 0.9237 | 0.6812 | 1.8100 | 0.9634 |
| **SE\(_{all}\)** | 0.8905 | 0.8694 | 0.9127 | 0.7208 | 1.6434 | 0.9567 |
| MPNet | 0.8105 | 0.7698 | 0.8556 | 0.4600 | 1.7108 | 0.8829 |
| Mini | 0.8011 | 0.7480 | 0.8624 | 0.4044 | 1.7149 | 0.8722 |
| E5\(_{base}\) | 0.7933 | 0.7437 | 0.8499 | 0.8082 | 1.0659 | 0.8709 |
| E5\(_{large}\) | 0.7904 | 0.7381 | 0.8507 | 0.8021 | 1.0637 | 0.8691 |
| Llama-3.2-1B | 0.7460 | 0.6829 | 0.8218 | 0.8425 | 1.0717 | 0.8143 |
| TF-IDF | 0.7442 | 0.7048 | 0.7882 | 0.0744 | 2.1749 | 0.8158 |
| **SE\(_{autoimmune}\)** | 0.7242 | 0.6464 | 0.8233 | 0.7742 | 1.1249 | 0.7769 |
| **SE\(_{cvd}\)** | 0.7202 | 0.6314 | 0.8381 | 0.6968 | 1.2209 | 0.7689 |
| BioBERT | 0.7108 | 0.6159 | 0.8403 | 0.9384 | 1.0143 | 0.7648 |
| PubmedBERT | 0.7103 | 0.6362 | 0.8040 | 0.9855 | 1.0037 | 0.7606 |
| **SE\(_{copd}\)** | 0.7039 | 0.5867 | 0.8797 | 0.7341 | 1.1159 | 0.7376 |
| BERT\(_{large}\) | 0.6987 | 0.6117 | 0.8146 | 0.8858 | 1.0298 | 0.7393 |
| **SE\(_{cancer}\)** | 0.6979 | 0.6002 | 0.8335 | 0.5975 | 1.2195 | 0.7297 |
| BERT\(_{base}\) | 0.6964 | 0.5916 | 0.8462 | 0.8417 | 1.0415 | 0.7337 |
| ModernBERT\(_{large}\) | 0.6964 | 0.5974 | 0.8346 | 0.9350 | 1.0138 | 0.7328 |
| RoBERTa\(_{large}\) | 0.6929 | 0.5900 | 0.8395 | 0.9953 | 1.0010 | 0.7287 |
| SciBERT | 0.6922 | 0.5871 | 0.8430 | 0.8690 | 1.0267 | 0.7326 |
| ModernBERT\(_{base}\) | 0.6880 | 0.5724 | 0.8622 | 0.9446 | 1.0110 | 0.7167 |
| RoBERTa\(_{base}\) | 0.6742 | 0.5418 | 0.8924 | 0.9834 | 1.0028 | 0.6885 |
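As a sketch of how the Threshold and \(F1_{max}\) values reported above could be computed, the following assumes a list of cosine similarities between abstract pairs and binary co-citation labels, and scans every observed similarity as a candidate cutoff. The function name `f1_max_threshold` and the toy data are illustrative, not from the paper.

```python
def f1_max_threshold(similarities, labels):
    """Return (best_f1, threshold, precision, recall) for binary
    co-citation prediction: pairs with cosine similarity >= threshold
    are predicted as co-cited. Scans each unique similarity as a cutoff."""
    best = (0.0, 0.0, 0.0, 0.0)
    for t in sorted(set(similarities)):
        # Confusion-matrix counts at cutoff t
        tp = sum(1 for s, y in zip(similarities, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(similarities, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(similarities, labels) if s < t and y == 1)
        if tp == 0:
            continue  # precision/recall undefined or zero; skip
        p = tp / (tp + fp)
        r = tp / (tp + fn)
        f1 = 2 * p * r / (p + r)
        if f1 > best[0]:
            best = (f1, t, p, r)
    return best

# Toy example: two co-cited pairs score high, two non-co-cited pairs score low
f1, thr, prec, rec = f1_max_threshold([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
```

On the toy data the optimal cutoff separates the classes perfectly (F1 = 1.0 at threshold 0.8); on real similarity distributions the maximizing threshold varies per model and dataset, which is why the table reports a per-model Threshold column.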