Table 3 Experimental details with ResNet as the teacher network.

| Training config | ResNet-18 on Tiny-ImageNet-200 | ResNet-18 on CIFAR-10 |
| --- | --- | --- |
| Base learning rate | 2e-3 | 2e-3 |
| Weight decay | 0.05 | 0.05 |
| Batch size | 32 | 100 |
| Training epochs (CBKD) | 150, 200, 200, 200 | 5, 20, 50, 50 |
| Learning rate schedule | Cosine decay | Cosine decay |
| Thaw training epochs | 300 | 30 |
| Warmup epochs | max((training epochs) * 0.05, 1) | max((training epochs) * 0.05, 1) |
| Training epochs (Teacher) | 180 | 30 |
| Training epochs (Student) | 300 | 30 |
| Training epochs (KD) | 300 | 30 |
| Training epochs (FitNets) | 100, 300 | 10, 30 |
| Training epochs (RKD) | 300 | 30 |
| Training epochs (DKD) | 300 | 30 |
| Training epochs (L-S-KD) | 300 | 30 |
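
As a minimal sketch of how these settings translate into an optimizer and schedule, the snippet below wires up the base learning rate, weight decay, warmup rule, and cosine decay from the table for the CIFAR-10 column. The choice of AdamW and per-epoch scheduler stepping are assumptions; the table does not name the optimizer or the stepping granularity.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Values from Table 3, CIFAR-10 column. AdamW is an assumption;
# the table specifies only lr, weight decay, and the schedule shape.
base_lr = 2e-3
weight_decay = 0.05
total_epochs = 30
warmup_epochs = max(int(total_epochs * 0.05), 1)  # warmup rule from the table

model = torch.nn.Linear(10, 10)  # stand-in for the ResNet-18 student
optimizer = AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)

def lr_lambda(epoch: int) -> float:
    """Linear warmup for the first warmup_epochs, then cosine decay to zero."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(total_epochs - warmup_epochs, 1)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... one training epoch over the CIFAR-10 loader would run here ...
    optimizer.step()   # placeholder step; real code steps per batch
    scheduler.step()   # advance the warmup/cosine schedule once per epoch
```

The same construction applies to the Tiny-ImageNet-200 column by swapping in its epoch count and batch size; `max(int(total_epochs * 0.05), 1)` reproduces the table's warmup formula for either setting.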