Table 8 Experimental results for multimodal sentiment analysis with different task combinations using our method. M, A, T, and V denote the multimodal, audio, text, and vision tasks, respectively.
From: Multimodal sentiment analysis based on multi-layer feature fusion and multi-task learning
Task Type | SIMS Acc2 | SIMS F1-score | SIMS Corr | SIMS Acc5 | MOSI Acc2 | MOSI F1-score | MOSI Corr | MOSI Acc7 |
|---|---|---|---|---|---|---|---|---|
M | 71.15 | 72.21 | 0.504 | 36.57 | 79.1/81.8 | 79.0/81.8 | 0.737 | 41.5 |
M, A | 72.33 | 72.99 | 0.513 | 39.07 | 80.0/82.1 | 80.1/82.1 | 0.743 | 42.4 |
M, T | 73.09 | 73.67 | 0.527 | 41.14 | 80.7/82.4 | 80.7/82.5 | 0.754 | 43.3 |
M, V | 75.49 | 75.72 | 0.532 | 40.29 | 81.2/83.5 | 81.1/83.5 | 0.752 | 41.8 |
M, A, T | 74.18 | 74.28 | 0.529 | 41.17 | 82.2/83.9 | 82.0/83.8 | 0.761 | 44.4 |
M, A, V | 76.12 | 76.31 | 0.541 | 41.82 | 82.6/84.7 | 82.5/84.6 | 0.769 | 42.6 |
M, T, V | 78.96 | 79.15 | 0.571 | 43.26 | 84.1/86.0 | 83.9/85.9 | 0.782 | 46.3 |
M, A, T, V | 80.56 | 80.27 | 0.583 | 45.20 | 85.2/86.6 | 85.2/86.7 | 0.792 | 46.7 |
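
As a reading aid, the following minimal sketch computes the absolute gain of the full task set (M, A, T, V) over the multimodal-only setting (M) on SIMS. The numbers are copied from the first and last rows of the table; the names `sims_m_only`, `sims_all_tasks`, and `absolute_gains` are hypothetical and are not taken from the paper.

```python
# SIMS values copied from the "M" and "M, A, T, V" rows of Table 8.
sims_m_only = {"Acc2": 71.15, "F1": 72.21, "Corr": 0.504, "Acc5": 36.57}
sims_all_tasks = {"Acc2": 80.56, "F1": 80.27, "Corr": 0.583, "Acc5": 45.20}


def absolute_gains(baseline, full):
    """Return full - baseline for each metric, rounded to 3 decimals."""
    return {name: round(full[name] - baseline[name], 3) for name in baseline}


gains = absolute_gains(sims_m_only, sims_all_tasks)
for metric, delta in gains.items():
    print(f"{metric}: +{delta}")
```

Under these values, adding all three unimodal tasks raises SIMS Acc2 by 9.41 points and F1-score by 8.06 points relative to training on the multimodal task alone.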