Table 3 F1-score comparison of LLMs of different sizes
From: Interactive computer-aided diagnosis on medical image using large language models
Model | Size (parameters) | Cardiomegaly | Edema | Consolidation | Atelectasis | Pleural Effusion | Average |
---|---|---|---|---|---|---|---|
text-babbage-001 | ~1.3B | 0.350 | 0.479 | 0.418 | 0.471 | 0.639 | 0.471 |
text-curie-001 | ~6.7B | 0.529 | 0.451 | 0.369 | 0.515 | 0.674 | 0.508 |
text-davinci-003 | ~175B | 0.587 | 0.593 | 0.447 | 0.578 | 0.749 | 0.591 |
ChatGPT | ~175B | 0.627 | 0.534 | 0.440 | 0.636 | 0.787 | 0.605 |
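
The Average column is numerically consistent with an unweighted (macro) mean of the five per-finding F1 scores in each row. The source does not state how the average was computed, so the short sketch below treats that as an assumption and simply recomputes it from the table values. The names `f1_scores`, `reported_average`, and `macro_average` are illustrative and do not come from the paper.

```python
# Per-finding F1 scores copied from Table 3, in column order:
# Cardiomegaly, Edema, Consolidation, Atelectasis, Pleural Effusion.
f1_scores = {
    "text-babbage-001": [0.350, 0.479, 0.418, 0.471, 0.639],
    "text-curie-001": [0.529, 0.451, 0.369, 0.515, 0.674],
    "text-davinci-003": [0.587, 0.593, 0.447, 0.578, 0.749],
    "ChatGPT": [0.627, 0.534, 0.440, 0.636, 0.787],
}

# Average column as reported in Table 3.
reported_average = {
    "text-babbage-001": 0.471,
    "text-curie-001": 0.508,
    "text-davinci-003": 0.591,
    "ChatGPT": 0.605,
}


def macro_average(scores):
    """Unweighted mean of per-finding F1 scores (assumed averaging scheme)."""
    return sum(scores) / len(scores)


# Recompute each row's average and round to the table's three decimals.
recomputed = {
    model: round(macro_average(scores), 3)
    for model, scores in f1_scores.items()
}

for model, value in recomputed.items():
    print(f"{model}: recomputed {value:.3f}, reported {reported_average[model]:.3f}")
```

Under this assumption the recomputed values agree with the reported Average column to three decimal places for all four models.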