Table 1. Performance comparison of the models on the text, image, and audio modalities (A: accuracy, P: precision, R: recall, F1: F1 score).
Model name | Text A | Text P | Text R | Text F1 | Image A | Image P | Image R | Image F1 | Audio A | Audio P | Audio R | Audio F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
MISA | 0.50 | 0.25 | 0.50 | 0.33 | 0.86 | 0.86 | 0.86 | 0.86 | 0.57 | 0.67 | 0.57 | 0.51 |
MSFNet | 0.51 | 0.51 | 0.51 | 0.47 | 0.84 | 0.85 | 0.84 | 0.84 | 0.51 | 0.51 | 0.51 | 0.47 |
CLIP | 0.80 | 0.80 | 0.80 | 0.80 | 0.92 | 0.92 | 0.92 | 0.92 | 0.79 | 0.80 | 0.80 | 0.80 |
CLIP+BERT | 0.79 | 0.79 | 0.79 | 0.79 | 0.92 | 0.92 | 0.92 | 0.92 | 0.55 | 0.64 | 0.55 | 0.47 |
Proposed Model | 0.84 | 0.84 | 0.84 | 0.84 | 0.94 | 0.94 | 0.94 | 0.94 | 0.85 | 0.85 | 0.85 | 0.85 |
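
As a reading aid for the four metric columns, the following is a minimal sketch of how accuracy, precision, recall, and F1 can be computed when the per-class scores are averaged with weights proportional to class support. The averaging scheme is an assumption: the table does not state which one was used, although recall matching accuracy in nearly every row is consistent with support-weighted averaging. The function name `weighted_metrics` and the toy labels are hypothetical and are not taken from the experiments.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1).

    Precision, recall, and F1 are computed per class and then averaged,
    weighting each class by its share of the true labels (assumed scheme).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        # A class that is never predicted gets precision 0 by convention here.
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# Hypothetical toy data: balanced binary labels, every prediction is class 1.
y_true = [0, 0, 1, 1]
y_pred = [1, 1, 1, 1]
a, p, r, f = weighted_metrics(y_true, y_pred)
print(round(a, 2), round(p, 2), round(r, 2), round(f, 2))  # prints 0.5 0.25 0.5 0.33
```

Under this assumed scheme, a classifier that assigns every sample to one class of a balanced binary set yields 0.50, 0.25, 0.50, and 0.33, the same pattern as the MISA text row. This is only one possible explanation for that row, not a confirmed one.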