Table 4 Comparison of GatorTronS with existing transformer-based LLMs for semantic textual similarity, natural language inference, and question answering.
From: A study of generative large language model for medical research and healthcare
| Transformer | Semantic textual similarity: 2019 n2c2 [23] (Pearson correlation) | Natural language inference: MedNLI [24] (Accuracy) | Question answering: emrQA Medication [25] (F1 score) | Question answering: emrQA Medication [25] (Exact Match) | Question answering: emrQA Relation [25] (F1 score) | Question answering: emrQA Relation [25] (Exact Match) |
|---|---|---|---|---|---|---|
| ClinicalBERT | 0.879 | 0.827 | 0.691 | 0.241 | 0.931 | 0.853 |
| GatorTron, 90B | 0.881 | 0.867 | 0.718 | 0.298 | 0.954 | 0.903 |
| GatorTronS, 1B | 0.853 | 0.851 | 0.702 | 0.288 | 0.965 | 0.924 |
| GatorTronS, 5B | 0.888 | 0.882 | 0.726 | 0.305 | 0.968 | 0.926 |
| GatorTronS, 10B | 0.893 | 0.886 | 0.728 | 0.311 | 0.972 | 0.929 |
| GatorTronS, 20B | 0.898 | 0.885 | 0.726 | 0.307 | 0.973 | 0.927 |
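For readers unfamiliar with the metrics reported above, a minimal sketch of how each is computed follows. The function names are illustrative, not taken from the paper's evaluation code; Exact Match and token-level F1 are shown in their common SQuAD-style form, which is the usual convention for emrQA-style question answering.

```python
from collections import Counter

def pearson(xs, ys):
    # Pearson correlation between predicted and gold similarity scores (STS).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def accuracy(preds, golds):
    # Fraction of correct labels (NLI).
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def exact_match(pred, gold):
    # 1 if the normalized answer strings match exactly, else 0 (QA).
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    # Token-overlap F1 between predicted and gold answer spans (QA).
    p, g = pred.lower().split(), gold.lower().split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)
```

Corpus-level scores are the average of these per-example values, which is how the table entries (e.g. 0.929 Exact Match for GatorTronS 10B on emrQA Relation) should be read.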