Table 1 Summary of notations.

From: Counterclockwise block-by-block knowledge distillation for neural network compression

| Term | Definition |
| --- | --- |
| T | Teacher model |
| S | Student model |
| \(L^{N}_{local}\) | Loss between the Nth subnet block of the teacher and the Nth subnet block of the student |
| \(L_{cls}\) | Cross-entropy loss |
| \(\lambda\) | Hyper-parameter that balances \(L^{N}_{local}\) and \(L_{cls}\) |
| \(f_{N}\) | Function that maps an input image to the activation at block N of the teacher model |
| \(s_{N}\) | Replacement block for the Nth teacher block |
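
The table names the loss terms but does not itself state how they are combined. As a minimal sketch, assuming a weighted sum \(L_{cls} + \lambda L^{N}_{local}\) and assuming the local loss is a mean squared error between the teacher activation \(f_{N}(x)\) and the corresponding student activation, the notation could be read as follows. The function names `cross_entropy`, `local_loss`, and `total_loss` are hypothetical and are not taken from the paper.

```python
import math


def cross_entropy(logits, label):
    """L_cls: cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def local_loss(teacher_act, student_act):
    """L_local^N, ASSUMED here to be the mean squared error between the
    teacher activation at block N and the student activation at block N."""
    if len(teacher_act) != len(student_act):
        raise ValueError("activations must have the same length")
    squared = [(t - s) ** 2 for t, s in zip(teacher_act, student_act)]
    return sum(squared) / len(squared)


def total_loss(logits, label, teacher_act, student_act, lam):
    """ASSUMED combination: L_cls + lambda * L_local^N."""
    return cross_entropy(logits, label) + lam * local_loss(teacher_act, student_act)


if __name__ == "__main__":
    # Toy values: two-class logits and two-element activations.
    print(total_loss([0.0, 0.0], 0, [1.0, 2.0], [1.0, 4.0], lam=0.5))
```

With \(\lambda = 0\) the sketch reduces to the classification loss alone, while larger values weight agreement with the teacher's block-level activation more heavily.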