Table 10 Quantitative results of the metric learning module with a fixed backbone network and encoder architecture on the MSR-VTT and MSVD benchmark datasets.
| Backbones | Encoder | Dataset | Metric learning | Score | | | |
|---|---|---|---|---|---|---|---|
| | | | | B4 | M | R | C |
| SE_ResNet152+ResNeXt101+I3D | ViT | MSR-VTT | No | 41.70 | 27.31 | 61.30 | 52.20 |
| SE_ResNet152+ResNeXt101+I3D | ViT | MSR-VTT | Yes | 42.20 | 28.90 | 62.16 | 54.30 |
| SE_ResNet152+ResNeXt101+I3D | ViT | MSVD | No | 54.60 | 35.80 | 72.90 | 97.00 |
| SE_ResNet152+ResNeXt101+I3D | ViT | MSVD | Yes | 55.30 | 36.10 | 74.20 | 98.40 |