Table 1 Summary of notations.
From: Counterclockwise block-by-block knowledge distillation for neural network compression
Term | Definition |
|---|---|
T | Teacher model |
S | Student model |
\(L^{N}_{local}\) | Local loss between the Nth teacher subnet block and the Nth student subnet block |
\(L_{cls}\) | Cross-entropy loss |
\(\lambda\) | A hyper-parameter that balances \(L^{N}_{local}\) and \(L_{cls}\) |
\(f_{N}\) | The function that maps an input image to the activation at block N of the teacher model |
\(s_{N}\) | The replacement block for the Nth teacher block |
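
Read together, the symbols above suggest how the two loss terms interact when the Nth block is trained. As a hedged sketch only, assuming the terms are combined as a simple weighted sum (the table does not state the exact form, and the name \(L^{N}_{total}\) is a hypothetical label introduced here for illustration), the objective might look like:

\[
L^{N}_{total} = L_{cls} + \lambda \, L^{N}_{local}
\]

Under this assumption, a larger \(\lambda\) puts more weight on matching the teacher's Nth block, while a smaller \(\lambda\) puts more weight on the classification objective.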