Table 2. Results of different models on the test set of the MSVD dataset (B4: BLEU-4, M: METEOR, R: ROUGE-L, C: CIDEr).

From: Semantic guidance network for video captioning

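| Year | Method | B4 | M | R | C |
|------|--------|----|---|---|---|
| 2022 | vc-HRNAT (IR+C) [66] | 57.7 | 36.3 | 74.0 | 96.3 |
| 2022 | vc-HRNAT (IR+I) [66] | 55.7 | 36.8 | 74.1 | 98.1 |
| 2021 | SGN [38] | 52.8 | 35.5 | 72.9 | 94.3 |
| 2021 | SCST [67] | 50.9 | 35.1 | 72.4 | 94.5 |
| 2020 | STG [68] | 52.2 | 36.9 | 73.9 | 93.0 |
| 2020 | SAAT [33] | 46.5 | 33.5 | 69.4 | 81.0 |
| 2019 | E2E [69] | 50.3 | 34.0 | 70.8 | 87.5 |
| 2019 | POS-CG (I3D+M) [70] | 53.5 | 34.9 | 72.1 | 91.0 |
| 2019 | POS-CG (IR+M) [70] | 52.5 | 34.1 | 71.3 | 88.7 |
|      | Ours | 55.3 | 36.1 | 74.2 | 98.4 |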

  1. The best experimental results are presented in bold.
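
The per-metric best scores referred to in the note can be recomputed directly from the rows of Table 2. The following is a minimal Python sketch under that assumption: the row values are transcribed from the table, while the names `ROWS`, `METRICS`, and `best_per_metric` are illustrative and do not come from the paper.

```python
# Rows transcribed from Table 2: (year, method, B4, M, R, C).
# The year of the proposed model ("Ours") is not given in the table, so it is None.
ROWS = [
    (2022, "vc-HRNAT(IR+C)", 57.7, 36.3, 74.0, 96.3),
    (2022, "vc-HRNAT(IR+I)", 55.7, 36.8, 74.1, 98.1),
    (2021, "SGN", 52.8, 35.5, 72.9, 94.3),
    (2021, "SCST", 50.9, 35.1, 72.4, 94.5),
    (2020, "STG", 52.2, 36.9, 73.9, 93.0),
    (2020, "SAAT", 46.5, 33.5, 69.4, 81.0),
    (2019, "E2E", 50.3, 34.0, 70.8, 87.5),
    (2019, "POS-CG(I3D+M)", 53.5, 34.9, 72.1, 91.0),
    (2019, "POS-CG(IR+M)", 52.5, 34.1, 71.3, 88.7),
    (None, "Ours", 55.3, 36.1, 74.2, 98.4),
]

# Column abbreviations as used in the table header.
METRICS = ("B4", "M", "R", "C")


def best_per_metric(rows):
    """Return {metric: (method, score)} for the highest score in each column."""
    best = {}
    # Metric columns start at index 2 of each row tuple.
    for col, name in enumerate(METRICS, start=2):
        top = max(rows, key=lambda row: row[col])
        best[name] = (top[1], top[col])
    return best


if __name__ == "__main__":
    for metric, (method, score) in best_per_metric(ROWS).items():
        print(f"{metric}: {method} ({score})")
```

All four metrics are higher-is-better, so a plain maximum per column is sufficient here; ties are not handled specially because none occur in this table.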