Table 3 Experimental details with ResNet as the teacher network.
From: Counterclockwise block-by-block knowledge distillation for neural network compression
| Training config | ResNet-18 on Tiny-ImageNet-200 | ResNet-18 on CIFAR-10 |
|---|---|---|
| Base learning rate | 2e-3 | 2e-3 |
| Weight decay | 0.05 | 0.05 |
| Batch size | 32 | 100 |
| Training epochs (CBKD) | 150, 200, 200, 200 | 5, 20, 50, 50 |
| Learning rate schedule | Cosine decay | Cosine decay |
| Thaw training epochs | 300 | 30 |
| Warmup epochs | max((training epochs) * 0.05, 1) | max((training epochs) * 0.05, 1) |
| Training epochs (Teacher) | 180 | 30 |
| Training epochs (Student) | 300 | 30 |
| Training epochs (KD) | 300 | 30 |
| Training epochs (FitNets) | 100, 300 | 10, 30 |
| Training epochs (RKD) | 300 | 30 |
| Training epochs (DKD) | 300 | 30 |
| Training epochs (L-S-KD) | 300 | 30 |
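The schedule in the table combines a short warmup of max((training epochs) * 0.05, 1) epochs with cosine decay from the base learning rate. A minimal sketch of such a schedule is below; the linear warmup shape and decay to zero are common defaults and are assumptions here, not specified by the table.

```python
import math

def lr_at_epoch(epoch, total_epochs, base_lr=2e-3):
    """Per-epoch learning rate: linear warmup, then cosine decay to 0.

    Warmup length follows the table: max(total_epochs * 0.05, 1).
    The linear warmup ramp is an assumed (common) choice.
    """
    warmup = max(total_epochs * 0.05, 1)
    if epoch < warmup:
        # Linear ramp from base_lr/warmup up to base_lr (assumption)
        return base_lr * (epoch + 1) / warmup
    # Cosine decay from base_lr at the end of warmup down to 0
    progress = (epoch - warmup) / (total_epochs - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For example, with the Tiny-ImageNet-200 settings (300 epochs, base learning rate 2e-3), the warmup lasts 15 epochs and the rate peaks at 2e-3 immediately afterward, then decays toward zero by epoch 300.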