Table 5. Metrics for binary prediction of co-citation between two input abstracts via cosine similarity on the skin cancer evaluation set, sorted by \(F1_{max}\) (the F1 column). Threshold is the optimal decision cutoff on that dataset's cosine similarities. Models trained in this work are highlighted in bold.

Model	F1	Precision	Recall	Threshold	Ratio	ROC-AUC
SE\(_{\mathbf{cancer}}\)	0.8509	0.8308	0.8720	0.6680	1.4538	0.9203
MoE\(_{\mathbf{all}}\)	0.7687	0.7130	0.8339	0.8505	1.1447	0.8226
SE\(_{\mathbf{all}}\)	0.7301	0.6070	0.9158	0.7867	1.1066	0.7724
Llama-3.2-1B	0.6856	0.5891	0.8201	0.8249	1.0442	0.6863
SciBERT	0.6793	0.5289	0.9493	0.8387	1.0176	0.6392
E5\(_{large}\)	0.6769	0.5455	0.8916	0.7891	1.0272	0.6890
E5\(_{base}\)	0.6759	0.5404	0.9020	0.7970	1.0270	0.6838
SE\(_{\mathbf{copd}}\)	0.6756	0.5291	0.9343	0.6925	1.0549	0.6214
TF-IDF	0.6747	0.5284	0.9331	0.0437	1.5295	0.6752
RoBERTa\(_{large}\)	0.6747	0.5165	0.9723	0.9938	1.0006	0.6518
ModernBERT\(_{large}\)	0.6735	0.5130	0.9804	0.9086	1.0076	0.6162
BioBERT	0.6733	0.5245	0.9400	0.9268	1.0077	0.6399
BERT\(_{large}\)	0.6723	0.5291	0.9216	0.8729	1.0155	0.6300
PubmedBERT	0.6716	0.5189	0.9516	0.9799	1.0026	0.6289
MPNet	0.6711	0.5086	0.9862	0.2200	1.1740	0.6545
SE\(_{\mathbf{cvd}}\)	0.6701	0.5129	0.9666	0.6468	1.0530	0.6317
BERT\(_{base}\)	0.6693	0.5122	0.9654	0.8139	1.0190	0.6223
Mini	0.6680	0.5147	0.9516	0.2560	1.1917	0.6682
RoBERTa\(_{base}\)	0.6680	0.5032	0.9931	0.9759	1.0012	0.5989
SE\(_{\mathbf{cancer}}\)	0.6677	0.5014	0.9988	0.4490	1.0502	0.6236
ModernBERT\(_{base}\)	0.6677	0.5143	0.9516	0.9273	1.0062	0.6150
SE\(_{\mathbf{autoimmune}}\)	0.6669	0.5003	1.0000	0.4453	1.0659	0.6545
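
The evaluation summarized in the caption (score each abstract pair by cosine similarity, pick the cutoff that maximizes F1, and report ROC-AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy scores are hypothetical, the threshold search simply tries every observed score, and the Ratio column is not reproduced because its definition is not given here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def precision_recall_f1(scores, labels, threshold):
    """Predict 'co-cited' (1) when score >= threshold, then score the predictions."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_f1_threshold(scores, labels):
    """Try every observed score as a cutoff; return (f1, precision, recall, threshold)."""
    best = (0.0, 0.0, 0.0, None)
    for t in sorted(set(scores)):
        p, r, f1 = precision_recall_f1(scores, labels, t)
        if f1 > best[0]:
            best = (f1, p, r, t)
    return best


def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): cosine similarities for five abstract pairs and
# whether each pair is co-cited (1) or not (0).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0]

f1_max, precision, recall, threshold = best_f1_threshold(toy_scores, toy_labels)
auc = roc_auc(toy_scores, toy_labels)
print(f"F1={f1_max:.4f} P={precision:.4f} R={recall:.4f} "
      f"threshold={threshold:.4f} ROC-AUC={auc:.4f}")
```

Because the cutoff is tuned on the same similarities it is evaluated on, the resulting threshold depends on how each model spreads its cosine similarities, which is consistent with the wide range of Threshold values in the table.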