Table 3 Metrics for binary prediction of co-citation between two input abstracts via cosine similarity, averaged across all evaluation sets and sorted by \(F1_{max}\). Threshold is the cosine-similarity decision cutoff that maximizes F1 on each dataset. SE models use their own domain token for all domains. Models trained in this work are highlighted in bold.

Model	F1	Precision	Recall	Threshold	Ratio	ROC-AUC
MoE\(_{\boldsymbol{all}}\)	0.8875	0.8610	0.9166	0.7083	1.7189	0.9426
SE\(_{\boldsymbol{all}}\)	0.8770	0.8510	0.9067	0.7475	1.5535	0.9338
SE\(_{\boldsymbol{cancer}}\)	0.8311	0.7809	0.9055	0.6552	1.7491	0.8606
MPNet	0.8038	0.7535	0.8611	0.4364	1.7541	0.8762
Mini	0.7940	0.7351	0.8631	0.3822	1.7706	0.8641
E5\(_{base}\)	0.7910	0.7322	0.8601	0.8082	1.0676	0.8664
E5\(_{large}\)	0.7908	0.7323	0.8594	0.8020	1.0664	0.8671
SE\(_{\boldsymbol{autoimmune}}\)	0.7702	0.7042	0.8626	0.7258	1.3705	0.8151
TF-IDF	0.7523	0.7024	0.8097	0.0744	2.2966	0.8209
Llama-3.2-1B	0.7489	0.6894	0.8197	0.8403	1.0769	0.8174
SE\(_{\boldsymbol{cvd}}\)	0.7458	0.6596	0.8637	0.6669	1.3775	0.7947
SE\(_{\boldsymbol{copd}}\)	0.7347	0.6441	0.8689	0.7264	1.1872	0.7715
SE\(_{\boldsymbol{cancer}}\)	0.7132	0.6200	0.8526	0.5719	1.2614	0.7416
BioBERT	0.7123	0.6314	0.8168	0.9384	1.0154	0.7646
PubmedBERT	0.7111	0.6488	0.7867	0.9853	1.0039	0.7614
RoBERTa\(_{large}\)	0.6999	0.5815	0.8789	0.9949	1.0011	0.7395
SciBERT	0.6992	0.6010	0.8360	0.8648	1.0311	0.7400
ModernBERT\(_{large}\)	0.6991	0.6014	0.8347	0.9350	1.0146	0.7378
BERT\(_{large}\)	0.6987	0.6069	0.8232	0.8857	1.0302	0.7370
BERT\(_{base}\)	0.6956	0.5816	0.8652	0.8417	1.0411	0.7296
ModernBERT\(_{base}\)	0.6919	0.5749	0.8687	0.9427	1.0120	0.7236
RoBERTa\(_{base}\)	0.6800	0.5487	0.8940	0.9834	1.0031	0.6998
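
The F1, Precision, Recall, and Threshold columns can be read as the result of sweeping a cutoff over pairwise cosine similarities and keeping the cutoff with the highest F1. The following is a minimal sketch of that procedure under stated assumptions, not the code used to produce Table 3: the function names `cosine_similarity` and `best_f1_threshold` and the toy scores are hypothetical, a pair is predicted as co-cited when its similarity is at or above the cutoff, and the Ratio and ROC-AUC columns are not covered.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, d) arrays of vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dot = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dot / norms


def best_f1_threshold(scores, labels):
    """Try every observed score as a cutoff and return the F1-maximizing one.

    scores: cosine similarity for each abstract pair.
    labels: 1 if the pair is co-cited, 0 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = {"f1": -1.0, "precision": 0.0, "recall": 0.0, "threshold": None}
    for t in np.unique(scores):
        pred = scores >= t  # assumption: >= cutoff means "co-cited"
        tp = int(np.sum(pred & labels))
        fp = int(np.sum(pred & ~labels))
        fn = int(np.sum(~pred & labels))
        if tp == 0:
            continue  # precision and recall are both zero or undefined
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best["f1"]:
            best = {
                "f1": f1,
                "precision": precision,
                "recall": recall,
                "threshold": float(t),
            }
    return best


if __name__ == "__main__":
    # Toy data: two co-cited pairs with high similarity, two unrelated pairs.
    toy_scores = [0.9, 0.8, 0.4, 0.3]
    toy_labels = [1, 1, 0, 0]
    print(best_f1_threshold(toy_scores, toy_labels))
```

Because the cutoff is tuned per dataset, the Threshold values in the table are not comparable across models in absolute terms; a model whose similarities cluster near 1.0 will need a cutoff close to 1.0 regardless of how well it separates the two classes.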
