Table 5 Metrics for binary prediction of co-citation between two input abstracts via cosine similarity on the skin cancer evaluation set, sorted by \(F1_{max}\). Threshold is the decision cutoff on the cosine similarities of that dataset at which F1 is maximal. Models trained in this work are highlighted in bold.
| Model | F1 | Precision | Recall | Threshold | Ratio | ROC-AUC |
|---|---|---|---|---|---|---|
| SE\(_{\mathbf{cancer}}\) | 0.8509 | 0.8308 | 0.8720 | 0.6680 | 1.4538 | 0.9203 |
| MoE\(_{\mathbf{all}}\) | 0.7687 | 0.7130 | 0.8339 | 0.8505 | 1.1447 | 0.8226 |
| SE\(_{\mathbf{all}}\) | 0.7301 | 0.6070 | 0.9158 | 0.7867 | 1.1066 | 0.7724 |
| Llama-3.2-1B | 0.6856 | 0.5891 | 0.8201 | 0.8249 | 1.0442 | 0.6863 |
| SciBERT | 0.6793 | 0.5289 | 0.9493 | 0.8387 | 1.0176 | 0.6392 |
| E5\(_{large}\) | 0.6769 | 0.5455 | 0.8916 | 0.7891 | 1.0272 | 0.6890 |
| E5\(_{base}\) | 0.6759 | 0.5404 | 0.9020 | 0.7970 | 1.0270 | 0.6838 |
| SE\(_{\mathbf{copd}}\) | 0.6756 | 0.5291 | 0.9343 | 0.6925 | 1.0549 | 0.6214 |
| TF-IDF | 0.6747 | 0.5284 | 0.9331 | 0.0437 | 1.5295 | 0.6752 |
| RoBERTa\(_{large}\) | 0.6747 | 0.5165 | 0.9723 | 0.9938 | 1.0006 | 0.6518 |
| ModernBERT\(_{large}\) | 0.6735 | 0.5130 | 0.9804 | 0.9086 | 1.0076 | 0.6162 |
| BioBERT | 0.6733 | 0.5245 | 0.9400 | 0.9268 | 1.0077 | 0.6399 |
| BERT\(_{large}\) | 0.6723 | 0.5291 | 0.9216 | 0.8729 | 1.0155 | 0.6300 |
| PubmedBERT | 0.6716 | 0.5189 | 0.9516 | 0.9799 | 1.0026 | 0.6289 |
| MPNet | 0.6711 | 0.5086 | 0.9862 | 0.2200 | 1.1740 | 0.6545 |
| SE\(_{\mathbf{cvd}}\) | 0.6701 | 0.5129 | 0.9666 | 0.6468 | 1.0530 | 0.6317 |
| BERT\(_{base}\) | 0.6693 | 0.5122 | 0.9654 | 0.8139 | 1.0190 | 0.6223 |
| Mini | 0.6680 | 0.5147 | 0.9516 | 0.2560 | 1.1917 | 0.6682 |
| RoBERTa\(_{base}\) | 0.6680 | 0.5032 | 0.9931 | 0.9759 | 1.0012 | 0.5989 |
| SE\(_{\mathbf{cancer}}\) | 0.6677 | 0.5014 | 0.9988 | 0.4490 | 1.0502 | 0.6236 |
| ModernBERT\(_{base}\) | 0.6677 | 0.5143 | 0.9516 | 0.9273 | 1.0062 | 0.6150 |
| SE\(_{\mathbf{autoimmune}}\) | 0.6669 | 0.5003 | 1.0000 | 0.4453 | 1.0659 | 0.6545 |
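
The F1, Precision, Recall, and Threshold columns can be illustrated with a minimal sketch, assuming that a pair of abstracts is predicted as co-cited whenever the cosine similarity of their embeddings is at or above a cutoff, and that the cutoff is chosen by sweeping every observed similarity value. The function names `cosine_similarity` and `f1_max` are hypothetical and are not taken from this work; the exact search procedure used for the table may differ.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n_pairs, dim) embedding arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    numerator = np.sum(a * b, axis=1)
    denominator = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return numerator / denominator


def f1_max(similarities, labels):
    """Sweep candidate cutoffs and return the metrics at the best F1.

    Assumption: a pair is predicted co-cited (label 1) when its
    similarity is greater than or equal to the cutoff.
    """
    sims = np.asarray(similarities, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best = {"f1": 0.0, "precision": 0.0, "recall": 0.0, "threshold": None}
    # Every distinct similarity value is a candidate cutoff.
    for cutoff in np.unique(sims):
        predicted = sims >= cutoff
        tp = int(np.sum(predicted & (labels == 1)))
        fp = int(np.sum(predicted & (labels == 0)))
        fn = int(np.sum(~predicted & (labels == 1)))
        if tp == 0:
            continue  # precision and recall are both zero here
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best["f1"]:
            best = {
                "f1": f1,
                "precision": precision,
                "recall": recall,
                "threshold": float(cutoff),
            }
    return best


# Toy example with four abstract pairs (illustrative values only).
toy_similarities = [0.9, 0.6, 0.7, 0.2]
toy_labels = [1, 1, 0, 0]
result = f1_max(toy_similarities, toy_labels)
```

Under this reading, a high optimal threshold combined with recall near 1 and precision near 0.5 suggests that a model assigns similarly high cosine similarities to co-cited and non-co-cited pairs alike, so that the best available cutoff labels nearly every pair as co-cited.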