Table 5 Performance comparison of Transformer models with EEG, audio, and visual modalities.

From: EAV: EEG-Audio-Video Dataset for Emotion Recognition in Conversational Contexts

| Modality (method) | Mean Accuracy [%] | Mean F1-score |
|---|---|---|
| EEG (EEGformer) [51] | 53.5 | 0.52 |
| Transformer (AST) [38] | 62.7 | 0.62 |
| Transformer (Vivit) [39] | 74.5 | 0.72 |