Table 8. Swin Transformer hierarchical architecture.
From: Attention-Enhanced CNNs and transformers for accurate monkeypox and skin disease detection
Stage | Transformer Blocks | Feature Dimension | Patch Resolution / Downsampling |
---|---|---|---|
Patch | Patch Embedding (Conv) | 96 | Initial patch size of 4 × 4 pixels |
Stage 1 | Swin Transformer Blocks (W-MSA & SW-MSA) | 96 | No downsampling |
Stage 2 | Swin Transformer Blocks | 192 | Downsampled by 2 |
Stage 3 | Swin Transformer Blocks | 384 | Downsampled by 2 |
Stage 4 | Swin Transformer Blocks | 768 | Downsampled by 2 |
Head | Global Average Pooling/MLP | - | - |
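
The progression in the table can be traced with a short calculation. The sketch below is illustrative only and is not the authors' implementation: the function name `swin_stage_shapes` is hypothetical, and the 224 × 224 input size is an assumption, since the table does not state the input resolution. It applies the table's rules: a 4 × 4 patch embedding to 96 channels, then a halving of spatial resolution and doubling of feature dimension at each of Stages 2 to 4.

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return (stage name, token-grid side length, feature dimension) per stage.

    image_size=224 is an assumed input resolution (not given in the table);
    patch_size, embed_dim and num_stages follow the table.
    """
    resolution = image_size // patch_size  # token grid after patch embedding
    dim = embed_dim
    shapes = []
    for stage in range(1, num_stages + 1):
        if stage > 1:
            # Stages 2-4: spatial resolution halves, feature dimension doubles.
            resolution //= 2
            dim *= 2
        shapes.append((f"Stage {stage}", resolution, dim))
    return shapes


for name, res, dim in swin_stage_shapes():
    print(f"{name}: {res} x {res} tokens, {dim} channels")
```

Under the assumed 224 × 224 input, the token grid shrinks from 56 × 56 in Stage 1 to 7 × 7 in Stage 4, while the feature dimension grows from 96 to 768, matching the Feature Dimension column.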