Table 3 Comparison of baseline models on test subsets that focus on regions belonging to node anatomy.

From: Development of a large-scale grounded vision language dataset for chest CT analysis

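| Question type | Metric | Baseline-Global | Baseline-Grounded |
|---|---|---|---|
| GRG | BLEU ↑ | 34.76 | 38.43 |
| GRG | ROUGE ↑ | 44.13 | 46.05 |
| GRG | BERT-Sim ↑ | 72.91 | 73.56 |
| GRG | RadGraph F1 ↑ | 0.41 | 0.43 |
| GRG | RadCliQ-v0 ↓ | 1.81 | 1.76 |
| GRG | RadCliQ-v1 ↓ | 0.15 | 0.11 |
| GVQA-Abnormality (all cases) | BLEU ↑ | 40.91 | 40.75 |
| GVQA-Abnormality (all cases) | ROUGE ↑ | 41.94 | 42.06 |
| GVQA-Abnormality (all cases) | BERT-Sim ↑ | 66.89 | 67.78 |
| GVQA-Abnormality (all cases) | Hit Score ↑ | 45.92 | 48.90 |
| GVQA-Abnormality (abnormal cases) | BLEU ↑ | 13.95 | 15.45 |
| GVQA-Abnormality (abnormal cases) | ROUGE ↑ | 15.52 | 17.44 |
| GVQA-Abnormality (abnormal cases) | BERT-Sim ↑ | 52.20 | 54.69 |
| GVQA-Abnormality (abnormal cases) | Hit Score ↑ | 21.59 | 26.34 |
| GVQA-Presence | Accuracy ↑ | 95.67 | 98.05 |
| Size | L1 (mm) ↓ | 7.65 | 7.35 |

↑ indicates that higher values are better; ↓ indicates that lower values are better.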