Fig. 2: Quantitative and qualitative evaluation of LLaVA-Rad using existing report generation benchmarks on MIMIC-CXR.

From: A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings

a Comparison between LLaVA-Rad and open-source models according to existing factual correctness (F1-CheXbert-14, F1-RadGraph) and lexical similarity (ROUGE-L) metrics. b Comparison between LLaVA-Rad and closed-source models according to existing factual correctness and lexical similarity metrics. c Comparison of model size and factual correctness, showing that LLaVA-Rad is both smaller and more factually correct than existing approaches. d Illustration of a sample report generated by LLaVA-Rad compared with those of LLaVA and LLaVA-Med; LLaVA-Rad generations that match the reference findings are highlighted. e Comparison of cross-modal retrieval performance for LLaVA-Rad, LLaVA-Med and LLaVA. In a–e, values correspond to the mean statistic on the MIMIC-CXR test set (n = 2461 image-report pairs), with the exception of MAIRA-1 and Med-PaLM M, whose values are derived from their original publications. In a, b, error bars correspond to 95% bootstrap confidence intervals derived from 500 bootstrap samples. Source data are provided as a Source Data file.
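
For readers unfamiliar with the error bars in a, b: a percentile bootstrap confidence interval of the mean can be obtained by resampling the test set with replacement and recomputing the metric each time. The sketch below is purely illustrative and assumes hypothetical per-report metric scores (e.g., per-pair F1-RadGraph values); it is not the paper's exact procedure.

```python
import numpy as np

def bootstrap_ci(per_sample_scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-sample metric scores.

    Illustrative sketch only: `per_sample_scores` is a hypothetical array of
    per-report metric values; the authors' exact resampling details may differ.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_sample_scores, dtype=float)
    n = len(scores)
    # Resample the test set with replacement n_boot times, recording the mean of each resample.
    means = np.array([
        scores[rng.integers(0, n, size=n)].mean() for _ in range(n_boot)
    ])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (lo, hi)

# Example usage with simulated scores for n = 2461 image-report pairs.
mean, (lo, hi) = bootstrap_ci(np.random.default_rng(1).uniform(0.2, 0.6, 2461))
print(f"mean = {mean:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```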