Table 3 Comparison of baseline models on test subsets that focus on regions belonging to node anatomy.

Question type	Metric	Baseline-Global	Baseline-Grounded
GRG	BLEU ↑	34.76	38.43
	ROUGE ↑	44.13	46.05
	BERT-Sim ↑	72.91	73.56
	RadGraph F1 ↑	0.41	0.43
	RadCliQ-v0 ↓	1.81	1.76
	RadCliQ-v1 ↓	0.15	0.11
GVQA-Abnormality (all cases)	BLEU ↑	40.91	40.75
	ROUGE ↑	41.94	42.06
	BERT-Sim ↑	66.89	67.78
	Hit Score ↑	45.92	48.90
GVQA-Abnormality (abnormal cases)	BLEU ↑	13.95	15.45
	ROUGE ↑	15.52	17.44
	BERT-Sim ↑	52.20	54.69
	Hit Score ↑	21.59	26.34
GVQA-Presence	Accuracy ↑	95.67	98.05
Size	L1 error (mm) ↓	7.65	7.35
