Table 4: Experiments on multimodal emotion recognition using the ZuCo dataset. Acc = accuracy, Pre = precision, Rec = recall; N/A = not available.
From: Multi-branch convolutional neural network with cross-attention mechanism for emotion recognition
| Model | Word-level Acc (%) | Word-level Pre (%) | Word-level Rec (%) | Word-level F1 (%) | Sentence-level Acc (%) | Sentence-level Pre (%) | Sentence-level Rec (%) | Sentence-level F1 (%) |
|---|---|---|---|---|---|---|---|---|
RNN-Multimodal [46] | N/A | 71.70 | 72.80 | 71.40 | N/A | N/A | N/A | N/A |
CNN-Multimodal [46] | N/A | 72.40 | 72.80 | 72.30 | N/A | N/A | N/A | N/A |
MLP | N/A | N/A | N/A | N/A | 55.05 | 55.20 | 55.01 | 55.30 |
Transformer | N/A | N/A | N/A | N/A | 65.97 | 62.11 | 63.18 | 77.38 |
ResNet50 | N/A | N/A | N/A | N/A | 65.69 | 61.73 | 49.18 | 52.88 |
Ours(TDF+Text) | 93.44 | 93.49 | 93.24 | 93.60 | 95.64 | 95.69 | 95.54 | 95.60 |
Ours(FDF+Text) | 90.01 | 90.90 | 90.99 | 90.93 | 92.11 | 92.09 | 91.99 | 91.93 |
Ours(TFDF+Text) | 93.18 | 94.26 | 93.03 | 93.11 | 96.18 | 96.26 | 96.03 | 96.11 |
Ours(TDF+FDF+Text) | 94.03 | 93.78 | 93.54 | 93.04 | 96.03 | 95.78 | 95.54 | 96.04 |
Ours(TDF+TFDF+Text) | 94.32 | 94.15 | 94.22 | 94.22 | 96.32 | 96.15 | 96.22 | 96.22 |
Ours(FDF+TFDF+Text) | 94.24 | 94.26 | 94.23 | 94.20 | 96.24 | 96.26 | 96.23 | 96.20 |
Ours(TDF+FDF+TFDF+Text) | 94.13 | 94.19 | 94.05 | 94.11 | 96.95 | 96.98 | 97.05 | 97.01 |
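The table reports accuracy together with precision, recall, and F1. It does not say how per-class scores are combined into one number. The sketch below assumes macro averaging over classes; the table does not confirm that choice. Under that assumption, it shows how the four metrics could be computed from true and predicted labels. The function name `classification_metrics` and the three sentiment labels in the usage example are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1, in percent.

    Assumption (not stated in the table): per-class scores are macro-averaged,
    i.e. every class contributes equally regardless of its frequency.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # predicted p, but it was wrong
            fn[t] += 1   # class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "acc": 100 * sum(tp.values()) / len(y_true),
        "pre": 100 * sum(precisions) / n,
        "rec": 100 * sum(recalls) / n,
        "f1": 100 * sum(f1s) / n,  # mean of per-class F1 scores
    }


if __name__ == "__main__":
    # Hypothetical three-class sentiment labels, for illustration only.
    y_true = ["pos", "neg", "neu", "pos", "neg", "neu"]
    y_pred = ["pos", "neg", "pos", "pos", "neu", "neu"]
    print({k: round(v, 2) for k, v in classification_metrics(y_true, y_pred).items()})
    # {'acc': 66.67, 'pre': 72.22, 'rec': 66.67, 'f1': 65.56}
```

With macro averaging, F1 is the mean of the per-class F1 scores. It therefore generally differs from the harmonic mean of the macro-averaged precision and recall. In the example, that harmonic mean would be about 69.33% rather than 65.56%. For a cross-check, scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="macro")` computes the same three macro-averaged quantities as fractions.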