Table 3 Experimental details with ResNet as the teacher network.
From: Counterclockwise block-by-block knowledge distillation for neural network compression
| Training config | ResNet-18 on Tiny-ImageNet-200 | ResNet-18 on CIFAR-10 |
|---|---|---|
| Base learning rate | 2e-3 | 2e-3 |
| Weight decay | 0.05 | 0.05 |
| Batch size | 32 | 100 |
| Training epochs (CBKD) | 150, 200, 200, 200 | 5, 20, 50, 50 |
| Learning rate schedule | Cosine decay | Cosine decay |
| Thaw training epochs | 300 | 30 |
| Warmup epochs | max((training epochs) * 0.05, 1) | max((training epochs) * 0.05, 1) |
| Training epochs (Teacher) | 180 | 30 |
| Training epochs (Student) | 300 | 30 |
| Training epochs (KD) | 300 | 30 |
| Training epochs (FitNets) | 100, 300 | 10, 30 |
| Training epochs (RKD) | 300 | 30 |
| Training epochs (DKD) | 300 | 30 |
| Training epochs (L-S-KD) | 300 | 30 |
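The schedule in the table combines a short warmup of max((training epochs) * 0.05, 1) epochs with cosine decay from the base learning rate. A minimal sketch of such a schedule is below; the linear warmup shape and decay to zero are common defaults and are assumptions here, not specified by the table.

```python
import math

def lr_at_epoch(epoch, total_epochs, base_lr=2e-3):
    """Per-epoch learning rate: linear warmup, then cosine decay to 0.

    Warmup length follows the table: max(total_epochs * 0.05, 1).
    The linear warmup ramp is an assumed (common) choice.
    """
    warmup = max(total_epochs * 0.05, 1)
    if epoch < warmup:
        # Linear ramp from base_lr/warmup up to base_lr (assumption)
        return base_lr * (epoch + 1) / warmup
    # Cosine decay from base_lr at the end of warmup down to 0
    progress = (epoch - warmup) / (total_epochs - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

For example, with the Tiny-ImageNet-200 settings (300 epochs, base learning rate 2e-3), the warmup lasts 15 epochs and the rate peaks at 2e-3 immediately afterward, then decays toward zero by epoch 300.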