Table 9 Quantitative results with a fixed backbone network, using metric learning and different encoders, on the MSR-VTT and MSVD benchmark datasets.
| Backbones | Dataset | Encoder | Score | | | |
|---|---|---|---|---|---|---|
| | | | B4 | M | R | C |
| SE_ResNet152+ResNeXt-101+I3D | MSR-VTT | LSTM | 41.50 | 28.40 | 61.80 | 52.80 |
| SE_ResNet152+ResNeXt-101+I3D | MSR-VTT | ViT Encoder Block | 42.20 | 28.90 | 62.16 | 54.30 |
| SE_ResNet152+ResNeXt-101+I3D | MSVD | LSTM | 54.80 | 35.80 | 73.30 | 97.50 |
| SE_ResNet152+ResNeXt-101+I3D | MSVD | ViT Encoder Block | 55.30 | 36.10 | 74.20 | 98.40 |