Fig. 1

The overall architecture of our proposed method, Multi-Dimensional KD. Multi-Dimensional KD is applied between two ViT models: a teacher model consisting of 12 transformer blocks and a student model consisting of 3 transformer blocks.
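The caption does not say which teacher blocks each student block is distilled from. The sketch below assumes, for illustration only, one common choice: a uniform mapping in which student block i is aligned with teacher block i × (12 / 3). The helper name `uniform_block_mapping` is hypothetical and is not taken from the method's implementation.

```python
def uniform_block_mapping(num_teacher_blocks: int, num_student_blocks: int) -> dict:
    """Map each student block (1-indexed) to a teacher block (1-indexed).

    Hypothetical helper: assumes the teacher depth is an integer multiple of
    the student depth, and aligns student block i with teacher block i * ratio,
    so the last student block always pairs with the last teacher block.
    """
    if num_student_blocks <= 0 or num_teacher_blocks % num_student_blocks != 0:
        raise ValueError("teacher depth must be a positive multiple of student depth")
    ratio = num_teacher_blocks // num_student_blocks  # 12 // 3 = 4 for Fig. 1
    return {i: i * ratio for i in range(1, num_student_blocks + 1)}


# Teacher: 12 transformer blocks; student: 3 transformer blocks (as in Fig. 1).
print(uniform_block_mapping(12, 3))
```

Under this assumption, student blocks 1, 2 and 3 would be paired with teacher blocks 4, 8 and 12. Other alignments, such as mapping only the final blocks, are equally possible.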