Table 8 Metrics for binary prediction of co-citation between two input abstracts via cosine similarity on the parasitic evaluation set, sorted by \(F1_{max}\). Threshold is the decision cutoff that maximizes F1 on the cosine similarities of that dataset. Models trained in this work are highlighted in bold.
| Model | \(F1_{max}\) | Precision | Recall | Threshold | Ratio | ROC-AUC |
|---|---|---|---|---|---|---|
| **SE\(_{cancer}\)** | 0.9060 | 0.8866 | 0.9262 | 0.6125 | 2.1802 | 0.9668 |
| **MoE\(_{all}\)** | 0.9004 | 0.8783 | 0.9237 | 0.6812 | 1.8100 | 0.9634 |
| **SE\(_{all}\)** | 0.8905 | 0.8694 | 0.9127 | 0.7208 | 1.6434 | 0.9567 |
| MPNet | 0.8105 | 0.7698 | 0.8556 | 0.4600 | 1.7108 | 0.8829 |
| Mini | 0.8011 | 0.7480 | 0.8624 | 0.4044 | 1.7149 | 0.8722 |
| E5\(_{base}\) | 0.7933 | 0.7437 | 0.8499 | 0.8082 | 1.0659 | 0.8709 |
| E5\(_{large}\) | 0.7904 | 0.7381 | 0.8507 | 0.8021 | 1.0637 | 0.8691 |
| Llama-3.2-1B | 0.7460 | 0.6829 | 0.8218 | 0.8425 | 1.0717 | 0.8143 |
| TF-IDF | 0.7442 | 0.7048 | 0.7882 | 0.0744 | 2.1749 | 0.8158 |
| **SE\(_{autoimmune}\)** | 0.7242 | 0.6464 | 0.8233 | 0.7742 | 1.1249 | 0.7769 |
| **SE\(_{cvd}\)** | 0.7202 | 0.6314 | 0.8381 | 0.6968 | 1.2209 | 0.7689 |
| BioBERT | 0.7108 | 0.6159 | 0.8403 | 0.9384 | 1.0143 | 0.7648 |
| PubMedBERT | 0.7103 | 0.6362 | 0.8040 | 0.9855 | 1.0037 | 0.7606 |
| **SE\(_{copd}\)** | 0.7039 | 0.5867 | 0.8797 | 0.7341 | 1.1159 | 0.7376 |
| BERT\(_{large}\) | 0.6987 | 0.6117 | 0.8146 | 0.8858 | 1.0298 | 0.7393 |
| **SE\(_{cancer}\)** | 0.6979 | 0.6002 | 0.8335 | 0.5975 | 1.2195 | 0.7297 |
| BERT\(_{base}\) | 0.6964 | 0.5916 | 0.8462 | 0.8417 | 1.0415 | 0.7337 |
| ModernBERT\(_{large}\) | 0.6964 | 0.5974 | 0.8346 | 0.9350 | 1.0138 | 0.7328 |
| RoBERTa\(_{large}\) | 0.6929 | 0.5900 | 0.8395 | 0.9953 | 1.0010 | 0.7287 |
| SciBERT | 0.6922 | 0.5871 | 0.8430 | 0.8690 | 1.0267 | 0.7326 |
| ModernBERT\(_{base}\) | 0.6880 | 0.5724 | 0.8622 | 0.9446 | 1.0110 | 0.7167 |
| RoBERTa\(_{base}\) | 0.6742 | 0.5418 | 0.8924 | 0.9834 | 1.0028 | 0.6885 |
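The \(F1_{max}\) and Threshold columns can be computed by sweeping every candidate decision cutoff over the cosine similarities and keeping the one that maximizes F1. A minimal NumPy sketch of that sweep, using hypothetical toy labels and similarities (not data from this evaluation):

```python
import numpy as np

def f1_max(labels, sims):
    """Sweep every distinct cosine similarity as a decision cutoff and
    return the maximum F1 together with the threshold that attains it."""
    best_f1, best_thr = 0.0, 0.0
    for t in np.unique(sims):
        pred = sims >= t                      # predict "co-cited" at or above cutoff
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_f1, best_thr = f1, t
    return best_f1, best_thr

# Hypothetical toy data: co-cited abstract pairs (label 1) score higher
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
sims   = np.array([0.9, 0.8, 0.75, 0.7, 0.6, 0.4, 0.3, 0.2])

f1, thr = f1_max(labels, sims)  # perfect separation: f1 = 1.0 at threshold 0.7
```

Because the cutoff is tuned per dataset, the Threshold column varies widely across models; only the similarity ranking (reflected in ROC-AUC) is threshold-free.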