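Table 9 Performance comparison with CIDEr optimization. B@N: BLEU-N; M: METEOR; R: ROUGE-L; C: CIDEr; S: SPICE. A dash (–) marks a score that was not reported.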
From: MSSA: memory-driven and simplified scaled attention for enhanced image captioning
| Model | B@1 | B@2 | B@3 | B@4 | M | R | C | S |
|---|---|---|---|---|---|---|---|---|
| LSTM [1] | – | – | – | 31.9 | 25.5 | 54.3 | 106.3 | – |
| SCST [34] | – | – | – | 34.2 | 26.7 | 55.7 | 114.0 | – |
| Up-Down [35] | 79.8 | – | – | 36.3 | 27.7 | 56.9 | 120.1 | 21.4 |
| AoANet [37] | 80.2 | – | – | 38.9 | 29.2 | 58.8 | 129.8 | 22.4 |
| SGAE [36] | 80.8 | – | – | 38.4 | 28.4 | 58.6 | 127.8 | 22.1 |
| X-Transformer [18] | 80.9 | – | – | 39.7 | 29.5 | 59.1 | 132.8 | 23.4 |
| \(M^2\) Transformer [38] | 80.8 | – | – | 39.1 | 29.1 | 58.4 | 131.2 | 22.6 |
| DLCT [45] | 81.4 | – | – | 39.8 | 29.5 | 59.1 | 133.8 | 23.0 |
| TAAIC [46] | 71.0 | – | – | 27.7 | 23.8 | 51.1 | 93.2 | 18.3 |
| MAN [27] | 80.6 | – | – | 39.3 | 28.4 | 58.5 | 126.5 | – |
| SCD-Net [25] | 81.3 | – | – | 39.4 | 29.2 | 59.1 | 131.6 | 23.0 |
| ADF [43] | 81.0 | 64.3 | 49.3 | 37.4 | 28.3 | 58.1 | 123.3 | 21.6 |
| TCCTN [42] | 81.3 | – | – | 39.4 | 29.2 | 58.9 | 132.8 | – |
| ICEAP [44] | 81.1 | 64.5 | – | 37.4 | 28.5 | 58.2 | 123.8 | 21.7 |
| X-LAN [18] | 80.8 | 65.6 | 51.4 | 39.5 | 29.5 | 59.2 | 132.0 | 23.4 |
| MSSA | 81.1 | 66.1 | 51.9 | 40.0 | 29.5 | 59.3 | 131.4 | 23.2 |