Table 1 Results of different models on the test set of the MSR-VTT dataset.

From: Semantic guidance network for video captioning

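| Year | Method | B4 (BLEU-4) | M (METEOR) | R (ROUGE-L) | C (CIDEr) |
|------|--------|-------------|------------|-------------|-----------|
| 2022 | vc-HRNAT(IR+C) [66] | **43.00** | 28.20 | 61.70 | 49.60 |
| 2022 | vc-HRNAT(IR+I) [66] | 42.10 | 28.00 | 61.60 | 48.20 |
| 2021 | SGN [38] | 40.80 | 28.30 | 60.80 | 49.50 |
| 2021 | SCST [67] | 40.30 | 28.80 | 61.20 | 54.10 |
| 2020 | STG [68] | 40.50 | 28.30 | 60.90 | 47.10 |
| 2020 | SAAT [33] | 39.90 | 27.70 | 61.20 | 51.00 |
| 2019 | E2E [69] | 40.40 | 27.00 | 61.00 | 48.30 |
| 2019 | POS-CG(I3D+M) [70] | 41.70 | 27.80 | 61.20 | 48.50 |
| 2019 | POS-CG(IR+M) [70] | 42.00 | 28.20 | 61.60 | 48.70 |
|      | Ours | 42.20 | **28.90** | **62.16** | **54.30** |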

  1. The best experimental results are presented in bold.
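
As a quick check on which entries count as "best", the comparison can be sketched in a few lines of Python. This is only an illustrative sketch: the scores are transcribed from Table 1, while the names `ROWS`, `METRICS`, and `best_per_metric` are hypothetical and not taken from the paper. It assumes that higher is better for all four metrics (B4, M, R, C).

```python
# Scores transcribed from Table 1 (MSR-VTT test set).
# Column order: method, B4, M, R, C. All names here are illustrative.
ROWS = [
    ("vc-HRNAT(IR+C)", 43.00, 28.20, 61.70, 49.60),
    ("vc-HRNAT(IR+I)", 42.10, 28.00, 61.60, 48.20),
    ("SGN", 40.80, 28.30, 60.80, 49.50),
    ("SCST", 40.30, 28.80, 61.20, 54.10),
    ("STG", 40.50, 28.30, 60.90, 47.10),
    ("SAAT", 39.90, 27.70, 61.20, 51.00),
    ("E2E", 40.40, 27.00, 61.00, 48.30),
    ("POS-CG(I3D+M)", 41.70, 27.80, 61.20, 48.50),
    ("POS-CG(IR+M)", 42.00, 28.20, 61.60, 48.70),
    ("Ours", 42.20, 28.90, 62.16, 54.30),
]
METRICS = ("B4", "M", "R", "C")


def best_per_metric(rows):
    """Return {metric: (method, score)} for the highest score in each column."""
    best = {}
    for col, metric in enumerate(METRICS, start=1):
        winner = max(rows, key=lambda row: row[col])  # higher is assumed better
        best[metric] = (winner[0], winner[col])
    return best


if __name__ == "__main__":
    for metric, (method, score) in best_per_metric(ROWS).items():
        print(f"{metric}: {method} ({score:.2f})")
```

Under these assumptions, the proposed model ("Ours") has the highest M, R, and C scores, while vc-HRNAT(IR+C) has the highest B4.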