Table 6 Metrics for binary prediction of co-citation between two input abstracts via cosine similarity on the COPD evaluation set, sorted by \(F1_{max}\). Threshold denotes the decision cutoff on the cosine similarities of that dataset that maximizes F1. Models trained in this work are highlighted in bold.
Model | F1\(_{max}\) | Precision | Recall | Threshold | Ratio | ROC-AUC |
---|---|---|---|---|---|---|
SE\(_{{{\varvec{copd}}}}\) | 0.8270 | 0.8215 | 0.8326 | 0.6844 | 1.4667 | 0.9043 |
MoE\(_{{{\varvec{all}}}}\) | 0.7861 | 0.7039 | 0.8901 | 0.7461 | 1.2528 | 0.8548 |
SE\(_{{{\varvec{all}}}}\) | 0.7661 | 0.7207 | 0.8176 | 0.8515 | 1.1422 | 0.8293 |
Llama-3.2-1B | 0.6940 | 0.5965 | 0.8296 | 0.8434 | 1.0463 | 0.7272 |
SE\(_{{{\varvec{cvd}}}}\) | 0.6897 | 0.5828 | 0.8445 | 0.6226 | 1.1619 | 0.6922 |
SciBERT | 0.6870 | 0.5591 | 0.8909 | 0.8502 | 1.0264 | 0.6968 |
BioBERT | 0.6868 | 0.5397 | 0.9439 | 0.9249 | 1.0112 | 0.7023 |
BERT\(_{base}\) | 0.6856 | 0.5705 | 0.8587 | 0.8503 | 1.0306 | 0.7004 |
PubmedBERT | 0.6854 | 0.5750 | 0.8483 | 0.9838 | 1.0029 | 0.7042 |
RoBERTa\(_{large}\) | 0.6851 | 0.5816 | 0.8333 | 0.9949 | 1.0009 | 0.7044 |
BERT\(_{large}\) | 0.6849 | 0.5676 | 0.8632 | 0.8822 | 1.0224 | 0.6970 |
ModernBERT\(_{base}\) | 0.6810 | 0.5358 | 0.9342 | 0.9344 | 1.0100 | 0.6937 |
ModernBERT\(_{large}\) | 0.6798 | 0.5289 | 0.9514 | 0.9226 | 1.0104 | 0.6850 |
E5\(_{base}\) | 0.6796 | 0.5601 | 0.8640 | 0.8210 | 1.0273 | 0.7031 |
RoBERTa\(_{base}\) | 0.6784 | 0.5412 | 0.9088 | 0.9830 | 1.0027 | 0.6749 |
E5\(_{large}\) | 0.6771 | 0.5595 | 0.8572 | 0.8146 | 1.0275 | 0.7027 |
SE\(_{{{\varvec{autoimmune}}}}\) | 0.6742 | 0.5201 | 0.9581 | 0.6473 | 1.0717 | 0.6641 |
MPNet | 0.6728 | 0.5601 | 0.8423 | 0.4736 | 1.1831 | 0.7003 |
SE\(_{{{\varvec{cancer}}}}\) | 0.6716 | 0.5122 | 0.9753 | 0.6528 | 1.0415 | 0.6148 |
TF-IDF | 0.6696 | 0.5288 | 0.9126 | 0.0737 | 1.4375 | 0.6832 |
SE\(_{{{\varvec{cancer}}}}\) | 0.6696 | 0.5166 | 0.9514 | 0.5194 | 1.0755 | 0.6155 |
Mini | 0.6678 | 0.5023 | 0.9963 | 0.1744 | 1.1585 | 0.6733 |
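The \(F1_{max}\) metric and its associated threshold in the table can be computed by sweeping all candidate cutoffs on the cosine similarities and keeping the one that maximizes F1. A minimal sketch of that procedure, using scikit-learn's `precision_recall_curve` and a toy set of hypothetical similarity scores (not taken from the paper's data):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_auc_score

def f1_max_threshold(labels, cosine_sims):
    """Return (F1_max, precision, recall, threshold) for the cutoff that
    maximizes F1 over all candidate thresholds on the similarity scores.

    labels: 1 if the two abstracts are co-cited, else 0.
    cosine_sims: cosine similarity of the two abstract embeddings.
    """
    precision, recall, thresholds = precision_recall_curve(labels, cosine_sims)
    # precision/recall have one more entry than thresholds; drop the final
    # (recall=0) point so the arrays align with the threshold list.
    p, r = precision[:-1], recall[:-1]
    f1 = 2 * p * r / np.clip(p + r, 1e-12, None)  # avoid division by zero
    best = int(np.argmax(f1))
    return f1[best], p[best], r[best], thresholds[best]

# Toy example: 8 abstract pairs with made-up similarities and labels.
y = np.array([1, 1, 1, 0, 0, 1, 0, 0])
s = np.array([0.91, 0.85, 0.70, 0.62, 0.40, 0.88, 0.55, 0.30])
f1, p, r, t = f1_max_threshold(y, s)
auc = roc_auc_score(y, s)  # threshold-free ranking metric, as in the table
```

On this perfectly separable toy data the optimal cutoff is 0.70 with \(F1_{max} = 1.0\) and ROC-AUC of 1.0; on real data, as the table shows, the optimal threshold varies widely between models, which is why ROC-AUC is reported alongside the thresholded metrics.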