
Fig. 1: LLaVA-Rad overview.

From: A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings


a To train LLaVA-Rad, we assemble a large dataset of over 697,000 chest X-ray image-text pairs; GPT-4 is used to synthesize reports from labels, translate reports from Spanish, and process and structure the corresponding radiology reports. b We adopt a modular three-stage approach to train LLaVA-Rad, comprising pre-training, alignment, and fine-tuning. c A qualitative visualization of the model’s attention during report generation. d For evaluation, we also propose a novel factual error scoring approach using GPT-4 and demonstrate its parity with expert evaluation. e LLaVA-Rad outperforms much larger generalist and specialized models, such as GPT-4V and Med-PaLM M, on established report evaluation metrics. MLP multi-layer perceptron. The example chest X-ray image in b is reproduced from ref. 27 with permission from the authors.
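The modular three-stage recipe in b can be summarized as staged freezing of three components: an image encoder, an MLP projector, and a language model. The sketch below is a minimal, hypothetical PyTorch-style illustration of that idea; all names here (MLPProjector, configure_stage, the stage labels) are our own placeholders, not the authors' implementation.

```python
import torch.nn as nn


class MLPProjector(nn.Module):
    """Two-layer MLP mapping vision features into the language model's
    embedding space (the 'MLP' adapter named in the legend)."""

    def __init__(self, vision_dim: int, text_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, x):
        return self.net(x)


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = trainable


def configure_stage(encoder: nn.Module, projector: nn.Module,
                    lm: nn.Module, stage: str) -> None:
    """Hypothetical mapping of the legend's three stages:
      'pretrain' -> adapt the image encoder to chest X-rays
      'align'    -> train only the projector; encoder and LM stay frozen
      'finetune' -> update the language model for report generation
    """
    set_trainable(encoder, stage == "pretrain")
    set_trainable(projector, stage == "align")
    set_trainable(lm, stage == "finetune")
```

Under this reading, the language model is only updated in the final stage, which is one way a modular pipeline can keep training costs small.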
