Fig. 5: Analyzing the performance of LLaVA-Rad using ablation studies and attention visualization.

From: A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings

a Comparison of different image encoders (BiomedCLIP-CXR from LLaVA-Rad, BiomedCLIP continually pre-trained on MIMIC-CXR, BiomedCLIP, and OpenAI CLIP) used to initialize the alignment and fine-tuning stages. b Ablation study using only rule-processed or only GPT-4-processed MIMIC-CXR training data in the alignment and fine-tuning stages. c Attention visualization qualitatively demonstrates that LLaVA-Rad grounds appropriately in specific image regions when generating a word (bold text) as part of a specific finding (bottom row). In a, b, values represent mean metric scores and error bars indicate 95% bootstrap confidence intervals derived from 500 resampling iterations. Source data are provided as a Source Data file.
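The error bars in a and b describe a standard percentile bootstrap over per-example metric scores. A minimal sketch of that computation is below; the function name `bootstrap_ci`, its parameters, and the example data are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def bootstrap_ci(scores, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-example metric scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores)
    # Resample with replacement and record the mean of each resample.
    means = np.array([
        rng.choice(scores, size=scores.size, replace=True).mean()
        for _ in range(n_resamples)
    ])
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap means.
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return scores.mean(), (lo, hi)

# Usage with hypothetical per-report scores:
point, (lo, hi) = bootstrap_ci(np.random.default_rng(1).random(200))
print(f"mean={point:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```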