Fig. 3: Assessing visual instruction tuning influence on model performance and characterizing intrinsic CT report structure effects on metric scoring. | Nature Communications


From: Towards a holistic framework for multimodal LLM in 3D brain CT radiology report generation

Fig. 3

a Traditional language metrics were calculated for each MLLM experiment. Sentence pairing significantly increased model scores, and two-sided Wilcoxon signed-rank tests also detected significant differences between the fine-tuned BrainGPT models and the baseline Otter models, and between the CVIT (template instruction and keyword instruction) and RVIT (plain instruction and context example instruction) models. b MLLM scores showed a significant positive shift in the two-sided Pearson correlation coefficient analysis after sentence-pairing preprocessing; the pairing-related increase was most pronounced for CIDEr (a TF-IDF-based metric). (* p < 0.05, ** p < 0.01, *** p < 0.001). CVIT: clinical visual instruction tuning; RVIT: regular visual instruction tuning.
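The comparisons in panel a are paired: two models are scored on the same set of reports. Below is a minimal sketch of a two-sided Wilcoxon signed-rank test on such per-report scores. It is not the authors' code. The caption does not state how they handled zero differences or whether they used exact or approximate p-values. This sketch assumes three things: zero differences are discarded, tied ranks are averaged, and the p-value comes from the large-sample normal approximation with a tie correction and no continuity correction. The function name `wilcoxon_signed_rank` and the example scores are hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Hypothetical helper (assumption): zero differences are dropped,
    ties get average ranks, and the p-value uses the normal approximation
    with a tie-corrected variance and no continuity correction.
    Returns (W_plus, z, p).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    # Paired differences, discarding exact zeros.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| in ascending order; tied |d| values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0  # accumulates sum of (t^3 - t) over tie groups
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # 0-based positions i..j correspond to 1-based ranks i+1..j+1.
        avg_rank = (i + j + 2) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # W+ is the sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p


if __name__ == "__main__":
    # Hypothetical per-report scores for two models on the same reports.
    baseline = [0.10, 0.22, 0.15, 0.30, 0.05, 0.18, 0.25, 0.12]
    finetuned = [0.21, 0.30, 0.14, 0.41, 0.16, 0.27, 0.33, 0.23]
    w, z, p = wilcoxon_signed_rank(finetuned, baseline)
    print(f"W+={w}, z={z:.3f}, p={p:.4f}")
```

For real analyses, `scipy.stats.wilcoxon` implements the same test. It also offers exact p-values for small samples, which this normal-approximation sketch does not.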
