Table 1 Comparison of diagnostic accuracy with state-of-the-art methods

From: Interactive computer-aided diagnosis on medical image using large language models

| Observation | CvT2DistilGPT2 (PR / RC / F1) | R2GenCMN (PR / RC / F1) | PCAM (PR / RC / F1) | Ours (GPT-3) (PR / RC / F1) | Ours (ChatGPT) (PR / RC / F1) |
|---|---|---|---|---|---|
| Cardiomegaly | 0.512 / 0.591 / 0.549 | 0.590 / 0.534 / 0.561 | **0.846** / 0.190 / 0.310 | 0.606 / 0.569 / 0.587 | 0.663 / **0.595** / **0.627** |
| Edema | 0.224 / 0.468 / 0.303 | 0.563 / 0.252 / 0.348 | **0.602** / 0.579 / 0.591 | 0.563 / **0.626** / **0.593** | 0.556 / 0.514 / 0.534 |
| Consolidation | 0.063 / 0.239 / 0.099 | **0.667** / 0.121 / 0.205 | 0.325 / 0.788 / **0.460** | 0.310 / **0.803** / 0.447 | 0.322 / 0.697 / 0.440 |
| Atelectasis | 0.306 / 0.388 / 0.342 | 0.442 / 0.504 / 0.471 | 0.468 / **0.991** / **0.636** | 0.408 / **0.991** / 0.578 | **0.470** / 0.981 / **0.636** |
| Pleural Effusion | 0.454 / 0.692 / 0.548 | **0.819** / 0.500 / 0.618 | 0.728 / **0.916** / **0.811** | 0.634 / **0.916** / 0.749 | 0.736 / 0.845 / 0.787 |
| Average | 0.312 / 0.476 / 0.368 | **0.616** / 0.382 / 0.441 | 0.594 / 0.693 / 0.562 | 0.504 / **0.781** / 0.591 | 0.549 / 0.726 / **0.605** |

  1. PR stands for precision, RC for recall, and F1 for F1-score. The best performance for each metric is indicated in bold.
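As a sanity check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The minimal sketch below (plain Python, no external dependencies; function name and the chosen table entry are illustrative) recomputes F1 for the Ours (ChatGPT) Cardiomegaly entry from its PR and RC values:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1 = 2*PR*RC / (PR + RC))."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Ours (ChatGPT), Cardiomegaly row of Table 1: PR = 0.663, RC = 0.595
f1 = f1_score(0.663, 0.595)
print(round(f1, 3))  # 0.627, matching the table
```

The same check reproduces the other entries to three decimal places, e.g. PCAM on Atelectasis (PR = 0.468, RC = 0.991) yields F1 = 0.636.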