Table 4. Experiments on multimodal emotion recognition using the ZuCo dataset.

From: Multi-branch convolutional neural network with cross-attention mechanism for emotion recognition

| Model | Word Acc (%) | Word Pre (%) | Word Rec (%) | Word F1 (%) | Sent. Acc (%) | Sent. Pre (%) | Sent. Rec (%) | Sent. F1 (%) |
|---|---|---|---|---|---|---|---|---|
| RNN-Multimodal [46] | N/A | 71.70 | 72.80 | 71.40 | N/A | N/A | N/A | N/A |
| CNN-Multimodal [46] | N/A | 72.40 | 72.80 | 72.30 | N/A | N/A | N/A | N/A |
| MLP | N/A | N/A | N/A | N/A | 55.05 | 55.20 | 55.01 | 55.30 |
| Transformer | N/A | N/A | N/A | N/A | 65.97 | 62.11 | 63.18 | 77.38 |
| ResNet50 | N/A | N/A | N/A | N/A | 65.69 | 61.73 | 49.18 | 52.88 |
| Ours (TDF+Text) | 93.44 | 93.49 | 93.24 | 93.60 | 95.64 | 95.69 | 95.54 | 95.60 |
| Ours (FDF+Text) | 90.01 | 90.90 | 90.99 | 90.93 | 92.11 | 92.09 | 91.99 | 91.93 |
| Ours (TFDF+Text) | 93.18 | 94.26 | 93.03 | 93.11 | 96.18 | 96.26 | 96.03 | 96.11 |
| Ours (TDF+FDF+Text) | 94.03 | 93.78 | 93.54 | 93.04 | 96.03 | 95.78 | 95.54 | 96.04 |
| Ours (TDF+TFDF+Text) | 94.32 | 94.15 | 94.22 | 94.22 | 96.32 | 96.15 | 96.22 | 96.22 |
| Ours (FDF+TFDF+Text) | 94.24 | 94.26 | 94.23 | 94.20 | 96.24 | 96.26 | 96.23 | 96.20 |
| Ours (TDF+FDF+TFDF+Text) | 94.13 | 94.19 | 94.05 | 94.11 | 96.95 | 96.98 | 97.05 | 97.01 |

"Word" columns report word-level experiments; "Sent." columns report sentence-level experiments.
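As a reference for how the four metric columns are typically derived, the sketch below computes accuracy and macro-averaged precision, recall, and F1 from predicted labels. This is a generic, stdlib-only illustration; the paper does not state its averaging scheme (macro vs. weighted), and the labels used here are toy values, not ZuCo data.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1), each in %.

    Generic multi-class metrics; macro averaging is an assumption here,
    not a detail confirmed by the paper.
    """
    labels = sorted(set(y_true) | set(y_pred))
    # Accuracy: fraction of exactly matching predictions.
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precs, recs, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(f1)
    n = len(labels)
    return (100 * acc,
            100 * sum(precs) / n,
            100 * sum(recs) / n,
            100 * sum(f1s) / n)

# Toy 3-class example (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, pre, rec, f1 = classification_metrics(y_true, y_pred)
```

In practice, `sklearn.metrics.precision_recall_fscore_support` with `average="macro"` gives the same quantities.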