Table 1 Main hyperparameter settings.

From: DMSCA: dynamic multi-scale channel-spatial attention for enhanced feature representation in convolutional neural networks

Hyperparameter            CIFAR-10/CIFAR-100    ImageNet
Optimizer                 SGD                   SGD
Initial Learning Rate     0.1                   0.1
Learning Rate Schedule    Cosine Annealing      Cosine Annealing
Batch Size                64                    128
Weight Decay              5 × 10⁻⁴              1 × 10⁻⁴
Momentum                  0.9                   0.9
Epochs                    200                   100
r                         16                    16
K                         {3, 5, 7}             {3, 5, 7}
τ                         1.0                   1.0

Notes: r denotes the dimensionality-reduction ratio of DMSCA, and K the set of multi-scale convolution kernel sizes. On ImageNet, the learning rate and batch size are usually adjusted to the number of GPUs and the total batch size, for example via a linear scaling rule [31]. The optimal value of the temperature coefficient τ may vary with the dataset and model, and is tuned in the experiments.
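The cosine-annealing schedule and the linear scaling rule referenced above can be sketched in a few lines. This is an illustrative sketch, not code from the paper; the function names and the example target batch size of 512 are assumptions for demonstration.

```python
import math

def cosine_annealed_lr(epoch, total_epochs, base_lr=0.1, min_lr=0.0):
    """Cosine annealing: decay lr from base_lr toward min_lr over total_epochs."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))

def linear_scaled_lr(base_lr, base_batch, actual_batch):
    """Linear scaling rule: scale lr proportionally with the total batch size."""
    return base_lr * actual_batch / base_batch

# CIFAR settings from Table 1: initial lr 0.1 over 200 epochs
print(cosine_annealed_lr(0, 200))    # 0.1 at the start
print(cosine_annealed_lr(100, 200))  # ≈ 0.05 at the midpoint
# Hypothetical example: scaling the ImageNet lr (0.1 at batch 128) to batch 512
print(linear_scaled_lr(0.1, 128, 512))  # ≈ 0.4
```

The same schedule is what `torch.optim.lr_scheduler.CosineAnnealingLR` computes per epoch when paired with an SGD optimizer configured with the momentum and weight decay from the table.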