Table 8. Experimental results for multimodal sentiment analysis with different task combinations using our method. M, A, T, and V denote the multimodal, audio, text, and vision tasks, respectively.

From: Multimodal sentiment analysis based on multi-layer feature fusion and multi-task learning

| Task Type | SIMS Acc2 | SIMS F1-score | SIMS Corr | SIMS Acc5 | MOSI Acc2 | MOSI F1-score | MOSI Corr | MOSI Acc7 |
|---|---|---|---|---|---|---|---|---|
| M | 71.15 | 72.21 | 0.504 | 36.57 | 79.1/81.8 | 79.0/81.8 | 0.737 | 41.5 |
| M, A | 72.33 | 72.99 | 0.513 | 39.07 | 80.0/82.1 | 80.1/82.1 | 0.743 | 42.4 |
| M, T | 73.09 | 73.67 | 0.527 | 41.14 | 80.7/82.4 | 80.7/82.5 | 0.754 | 43.3 |
| M, V | 75.49 | 75.72 | 0.532 | 40.29 | 81.2/83.5 | 81.1/83.5 | 0.752 | 41.8 |
| M, A, T | 74.18 | 74.28 | 0.529 | 41.17 | 82.2/83.9 | 82.0/83.8 | 0.761 | 44.4 |
| M, A, V | 76.12 | 76.31 | 0.541 | 41.82 | 82.6/84.7 | 82.5/84.6 | 0.769 | 42.6 |
| M, T, V | 78.96 | 79.15 | 0.571 | 43.26 | 84.1/86.0 | 83.9/85.9 | 0.782 | 46.3 |
| M, A, T, V | 80.56 | 80.27 | 0.583 | 45.20 | 85.2/86.6 | 85.2/86.7 | 0.792 | 46.7 |

  1. Significant values are in bold.
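
To make the effect of each auxiliary task easier to read, the gain over the multimodal-only baseline (row M) can be computed directly from the reported scores. The following is a minimal sketch, not part of the original paper: the names `results` and `gain_over_baseline` are hypothetical, and only two columns of Table 8 (SIMS Acc2 and MOSI Acc7) are transcribed to keep it short.

```python
# Scores transcribed from Table 8, keyed by task combination.
# Only SIMS Acc2 and MOSI Acc7 are included; the dictionary layout
# and key names are illustrative assumptions, not the authors' code.
results = {
    "M":          {"sims_acc2": 71.15, "mosi_acc7": 41.5},
    "M, A":       {"sims_acc2": 72.33, "mosi_acc7": 42.4},
    "M, T":       {"sims_acc2": 73.09, "mosi_acc7": 43.3},
    "M, V":       {"sims_acc2": 75.49, "mosi_acc7": 41.8},
    "M, A, T":    {"sims_acc2": 74.18, "mosi_acc7": 44.4},
    "M, A, V":    {"sims_acc2": 76.12, "mosi_acc7": 42.6},
    "M, T, V":    {"sims_acc2": 78.96, "mosi_acc7": 46.3},
    "M, A, T, V": {"sims_acc2": 80.56, "mosi_acc7": 46.7},
}


def gain_over_baseline(results, metric, baseline="M"):
    """Return the absolute gain of each task combination over the baseline row."""
    base = results[baseline][metric]
    return {
        task: round(scores[metric] - base, 2)
        for task, scores in results.items()
        if task != baseline
    }


sims_gain = gain_over_baseline(results, "sims_acc2")
mosi_gain = gain_over_baseline(results, "mosi_acc7")

for task in sims_gain:
    print(f"{task:<11} SIMS Acc2 +{sims_gain[task]:.2f}   MOSI Acc7 +{mosi_gain[task]:.1f}")
```

With these two columns, the full combination (M, A, T, V) shows the largest gain on both datasets: 9.41 points of Acc2 on SIMS and 5.2 points of Acc7 on MOSI.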