Table 5 Consolidated performance metrics across all model architectures.

From: Beyond peak accuracy: a stability-centric framework for reliable multimodal student engagement assessment

| Model | Accuracy (Mean) | Accuracy (Std) | Accuracy (Balanced Mean) | Accuracy (Balanced Std) | Kappa (Mean) | Kappa (Std) | F1 Macro (Mean) | F1 Macro (Std) | Weighted (Precision Mean) | Weighted (Precision Std) | Weighted (Recall Mean) | Weighted (Recall Std) | Weighted (F1 Mean) | Weighted (F1 Std) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ensemble | 0.901 | 0.043 | 0.846 | 0.074 | 0.782 | 0.089 | 0.847 | 0.068 | 0.876 | 0.052 | 0.828 | 0.094 | 0.832 | 0.087 |
| Encoder | 0.880 | 0.050 | 0.786 | 0.098 | 0.724 | 0.113 | 0.786 | 0.115 | 0.874 | 0.069 | 0.880 | 0.050 | 0.873 | 0.056 |
| Inception | 0.867 | 0.050 | 0.820 | 0.064 | 0.719 | 0.086 | 0.799 | 0.063 | 0.887 | 0.044 | 0.862 | 0.052 | 0.865 | 0.051 |
| Snapshot | 0.841 | 0.066 | 0.826 | 0.055 | 0.688 | 0.099 | 0.789 | 0.049 | 0.903 | 0.041 | 0.876 | 0.057 | 0.881 | 0.053 |
| Transformer | 0.870 | 0.057 | 0.815 | 0.080 | 0.722 | 0.105 | 0.801 | 0.081 | 0.889 | 0.031 | 0.867 | 0.050 | 0.871 | 0.044 |
| Fcn | 0.828 | 0.094 | 0.779 | 0.098 | 0.651 | 0.141 | 0.751 | 0.102 | 0.889 | 0.037 | 0.870 | 0.057 | 0.873 | 0.051 |
| Timecnn | 0.862 | 0.052 | 0.806 | 0.093 | 0.705 | 0.103 | 0.792 | 0.084 | 0.918 | 0.033 | 0.901 | 0.043 | 0.902 | 0.039 |
| Mcnn | 0.876 | 0.057 | 0.840 | 0.078 | 0.741 | 0.111 | 0.822 | 0.081 | 0.901 | 0.024 | 0.842 | 0.067 | 0.851 | 0.053 |
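
For readers who want to reproduce mean and standard deviation columns like those in Table 5, the following is a minimal sketch of how per-fold metrics might be computed and then aggregated. It is not the authors' pipeline: the function names (`fold_metrics`, `summarize`), the use of the sample standard deviation, and the toy labels are all assumptions made for illustration, and the definitions used are the standard ones for accuracy, balanced accuracy, Cohen's kappa, macro F1, and support-weighted precision, recall, and F1.

```python
from statistics import mean, stdev


def fold_metrics(y_true, y_pred, labels):
    """Compute the Table 5 style metrics for one fold (hypothetical helper)."""
    k, n = len(labels), len(y_true)
    index = {label: i for i, label in enumerate(labels)}

    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1

    support = [sum(cm[i]) for i in range(k)]                       # true count per class
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    correct = [cm[i][i] for i in range(k)]

    recall = [correct[i] / support[i] if support[i] else 0.0 for i in range(k)]
    precision = [correct[i] / predicted[i] if predicted[i] else 0.0 for i in range(k)]
    f1 = [
        2 * p * r / (p + r) if (p + r) else 0.0
        for p, r in zip(precision, recall)
    ]

    accuracy = sum(correct) / n
    # Balanced accuracy: mean recall over classes that occur in y_true.
    balanced = mean([recall[i] for i in range(k) if support[i]])
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(support[i] * predicted[i] for i in range(k)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    def weighted(values):
        # Average of per-class values, weighted by class support.
        return sum(v * s for v, s in zip(values, support)) / n

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced,
        "kappa": kappa,
        "f1_macro": mean(f1),
        "weighted_precision": weighted(precision),
        "weighted_recall": weighted(recall),
        "weighted_f1": weighted(f1),
    }


def summarize(per_fold):
    """Return {metric: (mean, std)} across folds.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return {
        name: (mean(m[name] for m in per_fold), stdev(m[name] for m in per_fold))
        for name in per_fold[0]
    }


# Toy usage with two made-up folds (illustrative labels only, not study data).
folds = [
    fold_metrics([0, 0, 1, 1], [0, 0, 1, 0], labels=[0, 1]),
    fold_metrics([0, 1], [0, 1], labels=[0, 1]),
]
for name, (mu, sigma) in summarize(folds).items():
    print(f"{name}: {mu:.3f} +/- {sigma:.3f}")
```

Under these definitions, support-weighted recall within a single fold equals that fold's overall accuracy, which gives a quick consistency check when comparing columns computed this way.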