Table 2. Conceptual comparison with state-of-the-art (SOTA) frameworks.
| References | Method | Modalities | Core focus |
|---|---|---|---|
| 17 (2025) | FCN-based multimodal fusion (video, text, logs) | Facial, Textual, Behavioral | Single-run evaluation; no imbalance handling; limited generalizability. |
| 27 (2025) | Spatio-Temporal Representation Learning | EEG | Enhanced spatiotemporal fusion for emotion recognition. |
| 28 (2024) | Spatiotemporal EEG Analysis | EEG | High-resolution spatial–temporal modeling for clinical cognitive assessment. |
| 32 (2025) | Few-Shot Transfer Learning | Facial Expressions | Affective sentiment inference using limited annotated samples. |
| This Work | Stability-Centric Multimodal Framework (Ensemble + MCNN) | Facial, Textual, Behavioral | Methodological rigor, stability-driven evaluation, and efficient multimodal fusion for engagement analysis. |