Table 3 F1-score comparison of different-size LLMs

From: Interactive computer-aided diagnosis on medical image using large language models

| Model | Size | Cardiomegaly | Edema | Consolidation | Atelectasis | Pleural Effusion | Average |
|---|---|---|---|---|---|---|---|
| text-babbage-001 | ~1.3B | 0.350 | 0.479 | 0.418 | 0.471 | 0.639 | 0.471 |
| text-curie-001 | ~6.7B | 0.529 | 0.451 | 0.369 | 0.515 | 0.674 | 0.508 |
| text-davinci-003 | ~175B | 0.587 | **0.593** | **0.447** | 0.578 | 0.749 | 0.591 |
| ChatGPT | ~175B | **0.627** | 0.534 | 0.440 | **0.636** | **0.787** | **0.605** |

  1. The best result in each column is indicated in bold.
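
The Average column can be checked against the per-finding scores. The following is a minimal sketch, assuming the average is the unweighted mean of the five per-finding F1-scores (the table does not state this explicitly). The names `f1_scores`, `reported_average`, and `mean_f1` are illustrative and do not come from the source.

```python
# Per-finding F1-scores transcribed from Table 3, in column order:
# Cardiomegaly, Edema, Consolidation, Atelectasis, Pleural Effusion.
f1_scores = {
    "text-babbage-001": [0.350, 0.479, 0.418, 0.471, 0.639],
    "text-curie-001": [0.529, 0.451, 0.369, 0.515, 0.674],
    "text-davinci-003": [0.587, 0.593, 0.447, 0.578, 0.749],
    "ChatGPT": [0.627, 0.534, 0.440, 0.636, 0.787],
}

# Average column as reported in Table 3.
reported_average = {
    "text-babbage-001": 0.471,
    "text-curie-001": 0.508,
    "text-davinci-003": 0.591,
    "ChatGPT": 0.605,
}


def mean_f1(scores):
    """Unweighted mean of per-finding F1-scores (assumed definition of 'Average')."""
    return sum(scores) / len(scores)


for model, scores in f1_scores.items():
    computed = mean_f1(scores)
    print(f"{model}: computed {computed:.3f}, reported {reported_average[model]:.3f}")
```

Under this assumption, each computed mean agrees with the reported value to three decimal places.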