Table 1 Teacher-student performance comparison.

From: Optimised knowledge distillation for efficient social media emotion recognition using DistilBERT and ALBERT

| Dataset | Model | Accuracy (%) | Δ vs. teacher (pp) | Parameters | Remarks |
|---|---|---|---|---|---|
| Twitter | BERT | 97.51 | – | 110 M | Teacher baseline |
| Twitter | DistilBERT | 97.35 | −0.16 | 66 M | 40% smaller, minimal drop |
| Twitter | ALBERT | 96.82 | −0.69 | 12 M | 89% smaller |
| Social media | BERT | 72.91 | – | 110 M | Teacher baseline |
| Social media | DistilBERT | 67.75 | −5.16 | 66 M | Significant drop due to class imbalance |
| Social media | ALBERT | 67.12 | −5.79 | 12 M | Average performance |
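
The parameter counts in the table can be sanity-checked directly with Hugging Face Transformers. A minimal sketch follows, assuming the standard base checkpoints `bert-base-uncased`, `distilbert-base-uncased`, and `albert-base-v2`; the paper's exact (fine-tuned) checkpoints are not specified here, so these names are an assumption.

```python
# Minimal sketch: verify approximate parameter counts for the three models.
# Checkpoint names are assumptions; the paper may use fine-tuned variants.
from transformers import AutoModel

for name in ["bert-base-uncased", "distilbert-base-uncased", "albert-base-v2"]:
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f} M parameters")
# Expected output: roughly 110 M, 66 M, and 12 M, matching the table.
```

ALBERT's much smaller footprint comes from cross-layer parameter sharing and factorised embeddings, which is why it reaches 12 M parameters despite having the same depth as BERT-base.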