Table 1 Main hyperparameter settings.

From: DMSCA: dynamic multi-scale channel-spatial attention for enhanced feature representation in convolutional neural networks

Hyperparameter            CIFAR-10/CIFAR-100    ImageNet
Optimizer                 SGD                   SGD
Initial Learning Rate     0.1                   0.1
Learning Rate Schedule    Cosine Annealing      Cosine Annealing
Batch Size                64                    128
Weight Decay              5 × 10⁻⁴              1 × 10⁻⁴
Momentum                  0.9                   0.9
Epochs                    200                   100
r                         16                    16
K                         {3, 5, 7}             {3, 5, 7}
τ                         1.0                   1.0

Notes: r denotes the dimensionality-reduction ratio of DMSCA, and K the set of multi-scale convolution kernel sizes. On ImageNet, the learning rate and batch size are usually adjusted to the number of GPUs and the total batch size, for example via a linear scaling rule [31]. The optimal value of the temperature coefficient τ may vary with the dataset and model, and is tuned in the experiments.
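The cosine-annealing schedule and the linear scaling rule referenced above can be sketched in a few lines. This is an illustrative sketch, not code from the paper; the function names and the example target batch size of 512 are assumptions for demonstration.

```python
import math

def cosine_annealed_lr(epoch, total_epochs, base_lr=0.1, min_lr=0.0):
    """Cosine annealing: decay lr from base_lr toward min_lr over total_epochs."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))

def linear_scaled_lr(base_lr, base_batch, actual_batch):
    """Linear scaling rule: scale lr proportionally with the total batch size."""
    return base_lr * actual_batch / base_batch

# CIFAR settings from Table 1: initial lr 0.1 over 200 epochs
print(cosine_annealed_lr(0, 200))    # 0.1 at the start
print(cosine_annealed_lr(100, 200))  # ≈ 0.05 at the midpoint
# Hypothetical example: scaling the ImageNet lr (0.1 at batch 128) to batch 512
print(linear_scaled_lr(0.1, 128, 512))  # ≈ 0.4
```

The same schedule is what `torch.optim.lr_scheduler.CosineAnnealingLR` computes per epoch when paired with an SGD optimizer configured with the momentum and weight decay from the table.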