Table 2 Conceptual comparison with SOTA frameworks.

From: Beyond peak accuracy: a stability-centric framework for reliable multimodal student engagement assessment

| References | Method | Modalities | Core focus |
| --- | --- | --- | --- |
| 17 (2025) | FCN-based multimodal fusion (video, text, logs) | Facial, Textual, Behavioral | Single-run evaluation; no imbalance handling; limited generalizability. |
| 27 (2025) | Spatio-Temporal Representation Learning | EEG | Enhanced spatiotemporal fusion for emotion recognition. |
| 28 (2024) | Spatiotemporal EEG Analysis | EEG | High-resolution spatial–temporal modeling for clinical cognitive assessment. |
| 32 (2025) | Few-Shot Transfer Learning | Facial Expressions | Affective sentiment inference using limited annotated samples. |
| This Work | Stability-Centric Multimodal Framework (Ensemble + MCNN) | Facial, Textual, Behavioral | Methodological rigor, stability-driven evaluation, and efficient multimodal fusion for engagement analysis. |