Table 1 Teacher-student performance comparison.

| Dataset | Model | Accuracy (%) | Δ vs. teacher (pp) | Parameters | Remarks |
|---|---|---|---|---|---|
| | BERT | 97.51 | – | 110 M | Teacher baseline |
| | DistilBERT | 97.35 | −0.16 | 66 M | 40% smaller, minimal drop |
| | ALBERT | 96.82 | −0.69 | 12 M | 89% smaller |
| Social media | BERT | 72.91 | – | 110 M | Teacher baseline |
| | DistilBERT | 67.75 | −5.16 | 66 M | Significant drop due to class imbalance |
| | ALBERT | 67.12 | −5.79 | 12 M | Average performance |
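
The Δ and size-reduction figures in Table 1 follow directly from the reported accuracies and parameter counts. The following is a minimal sketch of that arithmetic, not the authors' evaluation code. The function names and the `dataset_1` key are hypothetical: the table does not name the first dataset, so a placeholder is used.

```python
# Values copied from Table 1. "dataset_1" is a placeholder key (assumption):
# the first dataset is not named in the table.
PARAMS_M = {"BERT": 110, "DistilBERT": 66, "ALBERT": 12}
ACCURACY = {
    "dataset_1": {"BERT": 97.51, "DistilBERT": 97.35, "ALBERT": 96.82},
    "social_media": {"BERT": 72.91, "DistilBERT": 67.75, "ALBERT": 67.12},
}


def delta_vs_teacher(scores, teacher="BERT"):
    """Accuracy difference (percentage points) of each student vs. the teacher."""
    return {
        model: round(acc - scores[teacher], 2)
        for model, acc in scores.items()
        if model != teacher
    }


def size_reduction_pct(params, teacher="BERT"):
    """Parameter reduction of each student relative to the teacher, in percent."""
    return {
        model: round(100 * (1 - count / params[teacher]))
        for model, count in params.items()
        if model != teacher
    }


if __name__ == "__main__":
    for dataset, scores in ACCURACY.items():
        print(dataset, delta_vs_teacher(scores))
    print(size_reduction_pct(PARAMS_M))
```

Recomputing the columns this way is a quick consistency check on the table: the deltas are plain differences in percentage points, and the "40% smaller" and "89% smaller" remarks are rounded parameter-count ratios.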