Table 10 Ablation study under CIDEr optimization on the COCO Karpathy test split. Metrics include BLEU@N (B@1–B@4), METEOR (M), ROUGE-L (R), CIDEr (C), and SPICE (S).

From: MSSA: memory-driven and simplified scaled attention for enhanced image captioning

| Model variant | B@1 | B@2 | B@3 | B@4 | M | R | C | S |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline (Visual + LSTM) | 78.2 | 63.2 | 49.0 | 37.3 | 27.9 | 57.1 | 122.6 | 21.1 |
| + Extended Multimodal Features | 79.3 | 64.5 | 50.2 | 38.2 | 28.6 | 57.9 | 126.4 | 21.8 |
| + Memory-Driven Attention (MDA) | 80.2 | 65.2 | 51.0 | 39.0 | 29.1 | 58.7 | 129.7 | 22.6 |
| + Simplified Scaled Attention (SSA) (Full MSSA) | 81.1 | 66.1 | 51.9 | 40.0 | 29.5 | 59.3 | 131.4 | 23.2 |

  1. Significant values are in [bold].
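
The contribution of each component can be read off the table by subtracting consecutive rows. The following is a minimal sketch of that arithmetic, assuming each "+" row adds one component on top of the row before it (the "Full MSSA" label on the last row suggests this, but the table itself does not state it). The scores are transcribed from Table 10; the names METRICS, ROWS, stepwise_gains and total_gains are illustrative and do not come from the paper.

```python
# Scores transcribed from Table 10. All identifiers here are illustrative.
METRICS = ["B@1", "B@2", "B@3", "B@4", "M", "R", "C", "S"]

ROWS = [
    ("Baseline (Visual + LSTM)",
     [78.2, 63.2, 49.0, 37.3, 27.9, 57.1, 122.6, 21.1]),
    ("+ Extended Multimodal Features",
     [79.3, 64.5, 50.2, 38.2, 28.6, 57.9, 126.4, 21.8]),
    ("+ Memory-Driven Attention (MDA)",
     [80.2, 65.2, 51.0, 39.0, 29.1, 58.7, 129.7, 22.6]),
    ("+ Simplified Scaled Attention (SSA) (Full MSSA)",
     [81.1, 66.1, 51.9, 40.0, 29.5, 59.3, 131.4, 23.2]),
]


def stepwise_gains(rows):
    """Return (variant, gains) pairs; gains are differences from the previous row.

    Assumption: rows are cumulative, so each difference isolates one component.
    """
    gains = []
    for (_, prev), (name, curr) in zip(rows, rows[1:]):
        gains.append((name, [round(c - p, 1) for p, c in zip(prev, curr)]))
    return gains


def total_gains(rows):
    """Return the per-metric difference between the last and the first row."""
    first, last = rows[0][1], rows[-1][1]
    return [round(b - a, 1) for a, b in zip(first, last)]


if __name__ == "__main__":
    for name, gains in stepwise_gains(ROWS):
        print(name, dict(zip(METRICS, gains)))
    print("Total (full model vs. baseline)", dict(zip(METRICS, total_gains(ROWS))))
```

Under that assumption, the CIDEr gains are 3.8, 3.3 and 1.7 points for the three successive additions, for a total of 8.8 points over the baseline. These differences are descriptive only: the table reports no variance or significance tests for them.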