Fig. 1

Schematic diagram of the Kinematic-Narrative Evaluation Learner (KINEVAL). The model consists of three main components: (1) Unified Kinematic Encoding, which uses a Bi-GRU with attention to extract temporal features \(z^{(i)}\); (2) Pedagogical Alignment Modeling, which aligns teacher intent \(\theta_t\) with student motion through delay-aware cosine similarity and attention to obtain \(\pi^{(i)}\); and (3) Context-Aware Decoding, which processes contextual metadata \(\gamma_t\) to generate the final evaluative embeddings \(e^{(i)}\), from which the final score \(E_{i\ell}\) is predicted.
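
As an illustration of the alignment component only, the following is a minimal NumPy sketch of one plausible reading of "delay-aware cosine similarity and attention". It is not the authors' implementation. The function name `delay_aware_alignment`, the `max_delay` window, the softmax over delays, and the reduction of \(\pi^{(i)}\) to one scalar per time step are all assumptions made for this example; the caption does not specify them.

```python
import numpy as np

def delay_aware_alignment(teacher, student, max_delay=3, eps=1e-8):
    """Hypothetical sketch: align teacher intent with delayed student motion.

    teacher, student: arrays of shape (T, d), one feature vector per time step.
    Returns (pi, weights): pi has shape (T,), weights has shape (T, max_delay + 1).
    """
    T = teacher.shape[0]
    # sims[t, k] = cosine similarity between teacher[t] and student[t + k];
    # pairs that would run past the end of the sequence stay at -inf.
    sims = np.full((T, max_delay + 1), -np.inf)
    for k in range(min(max_delay + 1, T)):
        a = teacher[: T - k]
        b = student[k:]
        num = np.sum(a * b, axis=1)
        den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
        sims[: T - k, k] = num / den

    # Softmax attention over candidate delays (assumed form of the attention step).
    # Delay 0 is always valid, so each row has a finite maximum.
    shifted = sims - sims.max(axis=1, keepdims=True)
    weights = np.exp(shifted)  # exp(-inf) = 0 masks invalid delays
    weights /= weights.sum(axis=1, keepdims=True)

    # Attention-weighted similarity per time step (assumed scalar form of pi).
    finite_sims = np.where(np.isfinite(sims), sims, 0.0)
    pi = np.sum(weights * finite_sims, axis=1)
    return pi, weights
```

Under these assumptions, a student sequence that reproduces the teacher sequence two steps late receives its largest attention weight at delay 2 for every time step where that delay is still inside the sequence.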