Table 10. Ablation study under CIDEr optimization on the COCO Karpathy test split. Metrics include BLEU@N (B@1–B@4), METEOR (M), ROUGE-L (R), CIDEr (C), and SPICE (S).
From: MSSA: memory-driven and simplified scaled attention for enhanced image captioning
| Model variant | B@1 | B@2 | B@3 | B@4 | M | R | C | S |
|---|---|---|---|---|---|---|---|---|
| Baseline (Visual + LSTM) | 78.2 | 63.2 | 49.0 | 37.3 | 27.9 | 57.1 | 122.6 | 21.1 |
| + Extended Multimodal Features | 79.3 | 64.5 | 50.2 | 38.2 | 28.6 | 57.9 | 126.4 | 21.8 |
| + Memory-Driven Attention (MDA) | 80.2 | 65.2 | 51.0 | 39.0 | 29.1 | 58.7 | 129.7 | 22.6 |
| + Simplified Scaled Attention (SSA) (Full MSSA) | 81.1 | 66.1 | 51.9 | 40.0 | 29.5 | 59.3 | 131.4 | 23.2 |
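
Because each row adds one component to the row above it, the contribution of each component can be read as the difference between consecutive rows. The sketch below does that arithmetic on the scores copied from the table. The names `METRICS`, `ROWS`, `stepwise_gains`, and `total_gains` are illustrative and do not come from the paper, and the differences are plain subtractions rounded to one decimal, not a statistical significance test.

```python
# Scores copied from Table 10; column order is B@1, B@2, B@3, B@4, M, R, C, S.
METRICS = ["B@1", "B@2", "B@3", "B@4", "M", "R", "C", "S"]
ROWS = [
    ("Baseline (Visual + LSTM)", [78.2, 63.2, 49.0, 37.3, 27.9, 57.1, 122.6, 21.1]),
    ("+ Extended Multimodal Features", [79.3, 64.5, 50.2, 38.2, 28.6, 57.9, 126.4, 21.8]),
    ("+ MDA", [80.2, 65.2, 51.0, 39.0, 29.1, 58.7, 129.7, 22.6]),
    ("+ SSA (Full MSSA)", [81.1, 66.1, 51.9, 40.0, 29.5, 59.3, 131.4, 23.2]),
]


def stepwise_gains(rows):
    """Return (variant, gains) pairs, where gains = this row minus the previous row."""
    result = []
    for (_, prev), (name, cur) in zip(rows, rows[1:]):
        # Round to one decimal to match the table's precision.
        result.append((name, [round(c - p, 1) for p, c in zip(prev, cur)]))
    return result


def total_gains(rows):
    """Return the gain of the last row over the first row, per metric."""
    first, last = rows[0][1], rows[-1][1]
    return [round(b - a, 1) for a, b in zip(first, last)]


if __name__ == "__main__":
    for name, gains in stepwise_gains(ROWS):
        print(name, dict(zip(METRICS, gains)))
    print("Full MSSA vs. baseline", dict(zip(METRICS, total_gains(ROWS))))
```

Read this way, the CIDEr gains per step are 3.8, 3.3, and 1.7 points, for a total of 8.8 over the baseline, while B@4 rises by 2.7 points in total.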