Table 2. Summary of the models used in the study.
| Model | Parameter count (millions) | Hugging Face path |
|---|---|---|
| Llama-3.2 | 1236 | meta-llama/Llama-3.2-1B |
| ModernBERT\(_{large}\) | 395 | answerdotai/ModernBERT-large |
| BERT\(_{large}\) | 336 | google-bert/bert-large-uncased |
| E5\(_{large}\) | 335 | intfloat/e5-large-v2 |
| RoBERTa\(_{large}\) | 335 | FacebookAI/roberta-large |
| MoE\(_{all}\) (ours) | 150 active, 384 total | GleghornLab/MoE-all-sentence |
| ModernBERT\(_{base}\) | 149 | answerdotai/ModernBERT-base |
| RoBERTa\(_{base}\) | 125 | FacebookAI/roberta-base |
| SciBERT | 110 | allenai/scibert_scivocab_uncased |
| BERT\(_{base}\) | 110 | google-bert/bert-base-uncased |
| E5\(_{base}\) | 109 | intfloat/e5-base-v2 |
| PubMedBERT | 109 | microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext |
| MPNet | 109 | sentence-transformers/all-mpnet-base-v2 |
| BioBERT | 108 | dmis-lab/biobert-v1.1 |
| Mini | 23 | sentence-transformers/all-MiniLM-L6-v2 |
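The paths in the table can be passed directly to the `transformers` auto classes. The sketch below is our own illustration (the `MODEL_PATHS` registry and `load_model` helper are not part of the study's code); it assumes `transformers` is installed and the model weights are accessible (the `meta-llama` checkpoint is gated and requires accepting its license).

```python
# Registry mirroring Table 2: display name -> Hugging Face path.
MODEL_PATHS = {
    "Llama-3.2": "meta-llama/Llama-3.2-1B",
    "ModernBERT-large": "answerdotai/ModernBERT-large",
    "BERT-large": "google-bert/bert-large-uncased",
    "E5-large": "intfloat/e5-large-v2",
    "RoBERTa-large": "FacebookAI/roberta-large",
    "MoE-all (ours)": "GleghornLab/MoE-all-sentence",
    "ModernBERT-base": "answerdotai/ModernBERT-base",
    "RoBERTa-base": "FacebookAI/roberta-base",
    "SciBERT": "allenai/scibert_scivocab_uncased",
    "BERT-base": "google-bert/bert-base-uncased",
    "E5-base": "intfloat/e5-base-v2",
    "PubMedBERT": "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext",
    "MPNet": "sentence-transformers/all-mpnet-base-v2",
    "BioBERT": "dmis-lab/biobert-v1.1",
    "Mini": "sentence-transformers/all-MiniLM-L6-v2",
}

def load_model(name: str):
    """Download and return (tokenizer, model) for a model in Table 2.

    Hypothetical convenience wrapper, shown only to illustrate how the
    listed paths are used; imports are deferred so the registry itself
    has no third-party dependencies.
    """
    from transformers import AutoModel, AutoTokenizer

    path = MODEL_PATHS[name]
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModel.from_pretrained(path)
    # Parameter count in millions, comparable to the table's second column:
    n_params = sum(p.numel() for p in model.parameters()) / 1e6
    print(f"{name}: {n_params:.0f}M parameters")
    return tokenizer, model
```

For example, `load_model("BioBERT")` fetches `dmis-lab/biobert-v1.1` and reports a parameter count close to the 108M listed above.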