Table 3 Comparison of baseline models on test subsets that focus on regions belonging to node anatomy.
From: Development of a large-scale grounded vision language dataset for chest CT analysis
| Question type | Metric | Baseline-Global | Baseline-Grounded |
|---|---|---|---|
| GRG | BLEU ↑ | 34.76 | 38.43 |
| | ROUGE ↑ | 44.13 | 46.05 |
| | BERT-Sim ↑ | 72.91 | 73.56 |
| | RadGraph F1 ↑ | 0.41 | 0.43 |
| | RadCliQ-v0 ↑ | 1.81 | 1.76 |
| | RadCliQ-v1 ↑ | 0.15 | 0.11 |
| GVQA-Abnormality (all cases) | BLEU ↑ | 40.91 | 40.75 |
| | ROUGE ↑ | 41.94 | 42.06 |
| | BERT-Sim ↑ | 66.89 | 67.78 |
| | Hit Score ↑ | 45.92 | 48.90 |
| GVQA-Abnormality (abnormal cases) | BLEU ↑ | 13.95 | 15.45 |
| | ROUGE ↑ | 15.52 | 17.44 |
| | BERT-Sim ↑ | 52.20 | 54.69 |
| | Hit Score ↑ | 21.59 | 26.34 |
| GVQA-Presence | Accuracy ↑ | 95.67 | 98.05 |
| Size | L1 (mm) ↑ | 7.65 | 7.35 |