Table 4. Mean variations of F1-score for each dataset and architecture.
| Architecture | Dataset | Hyperparameter | | | | |
|---|---|---|---|---|---|---|
| | | Batch size | Learning rate | Weight decay | Momentum | Iteration |
| ViT | RS + RT | 0.17 | 0.23 | 0.09 | 0.11 | 0.15 |
| | AS + RT | 0.39 | 0.35 | 0.87 | 0.05 | 1.96 |
| | RS + AT | 0.15 | 0.57 | 1.07 | 0.05 | 0.37 |
| | AS + AT | 0.12 | 0.30 | 0.06 | 0.36 | 1.83 |
| Swin transformer | RS + RT | 0.43 | 0.74 | 0.02 | 0.22 | 0.86 |
| | AS + RT | 0.20 | 0.52 | 0.13 | 0.82 | 0.92 |
| | RS + AT | 0.52 | 0.18 | 0.30 | 0.22 | 1.91 |
| | AS + AT | 0.12 | 0.62 | 0.26 | 0.28 | 1.64 |
| PVT | RS + RT | 0.57 | 0.17 | 0.18 | 0.26 | 0.98 |
| | AS + RT | 0.14 | 0.13 | 0.25 | 0.32 | 1.46 |
| | RS + AT | 0.47 | 0.02 | 0.24 | 0.13 | 0.53 |
| | AS + AT | 0.23 | 0.41 | 0.27 | 0.35 | 1.92 |
| MobileViT | RS + RT | 0.01 | 0.06 | 0.13 | 0.19 | 0.30 |
| | AS + RT | 0.13 | 0.48 | 0.12 | 0.28 | 1.13 |
| | RS + AT | 0.05 | 0.57 | 0.67 | 0.22 | 1.05 |
| | AS + AT | 0.23 | 0.07 | 0.03 | 0.21 | 2.13 |
| Axial transformer | RS + RT | 0.01 | 0.13 | 0.16 | 0.07 | 0.19 |
| | AS + RT | 0.22 | 0.06 | 0.16 | 0.14 | 2.62 |
| | RS + AT | 0.16 | 0.14 | 0.62 | 0.52 | 1.94 |
| | AS + AT | 0.01 | 0.01 | 0.17 | 0.23 | 2.11 |