Table 1 Comparison of diagnostic accuracy with state-of-the-art methods

From: Interactive computer-aided diagnosis on medical image using large language models

| Observation | CvT2DistilGPT2 (PR / RC / F1) | R2GenCMN (PR / RC / F1) | PCAM (PR / RC / F1) | Ours (GPT-3) (PR / RC / F1) | Ours (ChatGPT) (PR / RC / F1) |
|---|---|---|---|---|---|
| Cardiomegaly | 0.512 / 0.591 / 0.549 | 0.590 / 0.534 / 0.561 | **0.846** / 0.190 / 0.310 | 0.606 / 0.569 / 0.587 | 0.663 / **0.595** / **0.627** |
| Edema | 0.224 / 0.468 / 0.303 | 0.563 / 0.252 / 0.348 | **0.602** / 0.579 / 0.591 | 0.563 / **0.626** / **0.593** | 0.556 / 0.514 / 0.534 |
| Consolidation | 0.063 / 0.239 / 0.099 | **0.667** / 0.121 / 0.205 | 0.325 / 0.788 / **0.460** | 0.310 / **0.803** / 0.447 | 0.322 / 0.697 / 0.440 |
| Atelectasis | 0.306 / 0.388 / 0.342 | 0.442 / 0.504 / 0.471 | 0.468 / **0.991** / **0.636** | 0.408 / **0.991** / 0.578 | **0.470** / 0.981 / **0.636** |
| Pleural Effusion | 0.454 / 0.692 / 0.548 | **0.819** / 0.500 / 0.618 | 0.728 / **0.916** / **0.811** | 0.634 / **0.916** / 0.749 | 0.736 / 0.845 / 0.787 |
| Average | 0.312 / 0.476 / 0.368 | **0.616** / 0.382 / 0.441 | 0.594 / 0.693 / 0.562 | 0.504 / **0.781** / 0.591 | 0.549 / 0.726 / **0.605** |

  1. PR stands for precision, RC for recall, and F1 for F1-score. The best performance for each metric is indicated in bold.
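As a sanity check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The minimal sketch below (plain Python, no external dependencies; function name and the chosen table entry are illustrative) recomputes F1 for the Ours (ChatGPT) Cardiomegaly entry from its PR and RC values:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1 = 2*PR*RC / (PR + RC))."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Ours (ChatGPT), Cardiomegaly row of Table 1: PR = 0.663, RC = 0.595
f1 = f1_score(0.663, 0.595)
print(round(f1, 3))  # 0.627, matching the table
```

The same check reproduces the other entries to three decimal places, e.g. PCAM on Atelectasis (PR = 0.468, RC = 0.991) yields F1 = 0.636.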