Table 1 Training hyper-parameters
From: Modeling attention and binding in the brain through bidirectional recurrent gating
| Task | n-Epochs | Batch size | Learning rate η | η scheduler milestones | Scheduler γ | L2 rate λ |
|---|---|---|---|---|---|---|
| MNIST | 96 | 128 | 5 × 10⁻⁴ | [32, 64] | 0.2 | 1 × 10⁻⁶ |
| COCO | 48 | 128 | 5 × 10⁻⁴ | 0.25 | OneCycleLR | 1 × 10⁻⁵ |
| CelebA | 32 | 128 | 2 × 10⁻⁴ | [16] | 0.1 | 5 × 10⁻⁴ |
| Contrast Detect. | 32 | 64 | 1 × 10⁻⁴ | 0.25 | OneCycleLR | 5 × 10⁻⁴ |
| Contrast Discrim. | 32 | 64 | 1 × 10⁻⁴ | 0.25 | OneCycleLR | 5 × 10⁻⁴ |
| Ori. Change Detect. | 64 | 64 | 5 × 10⁻⁴ | 0.25 | OneCycleLR | 1 × 10⁻⁴ |
| Fig-Grnd-Sep | 64 | 64 | 2 × 10⁻⁴ | 0.125 | OneCycleLR | 1 × 10⁻⁴ |
| Curve Tracing | 64 | 128 | 5 × 10⁻⁴ | – | – | 1 × 10⁻⁶ |
| CIFAR-100 | 64 | 64 | 5 × 10⁻⁴ | 0.125 | OneCycleLR | 1 × 10⁻⁴ |
| Multi-Modal Search | 64 | 64 | 1 × 10⁻⁴ | – | – | 5 × 10⁻⁵ |
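For the rows that list explicit milestones (MNIST and CelebA), the following is a minimal sketch of how the learning rate would evolve over training, assuming that the "milestones" and "γ" columns follow the step-decay semantics of PyTorch's `MultiStepLR` (the rate is multiplied by γ each time the epoch counter reaches a milestone, with one scheduler step per epoch). The names `STEP_SCHEDULES` and `multistep_lr` are hypothetical and chosen for illustration only. The OneCycleLR rows are deliberately not modelled, since the table does not state how their numeric column maps onto that scheduler's arguments.

```python
from bisect import bisect_right
from typing import Sequence

# Step-decay rows of Table 1; numeric values are copied from the table.
# Assumption (not stated in the table): milestones/gamma behave like
# PyTorch's MultiStepLR with scheduler.step() called once per epoch.
STEP_SCHEDULES = {
    "MNIST": {"epochs": 96, "lr": 5e-4, "milestones": [32, 64], "gamma": 0.2},
    "CelebA": {"epochs": 32, "lr": 2e-4, "milestones": [16], "gamma": 0.1},
}


def multistep_lr(base_lr: float, milestones: Sequence[int],
                 gamma: float, epoch: int) -> float:
    """Return the learning rate at a 0-indexed epoch under step decay.

    The base rate is multiplied by gamma once for every milestone that is
    less than or equal to the current epoch.
    """
    n_decays = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** n_decays


if __name__ == "__main__":
    for task, cfg in STEP_SCHEDULES.items():
        # Print the rate at the start of training and at each milestone.
        for epoch in [0, *cfg["milestones"]]:
            lr = multistep_lr(cfg["lr"], cfg["milestones"], cfg["gamma"], epoch)
            print(f"{task}: epoch {epoch:3d} -> lr {lr:.1e}")
```

Under this reading, MNIST trains at 5 × 10⁻⁴ until epoch 32, at 1 × 10⁻⁴ until epoch 64, and at 2 × 10⁻⁵ for the remaining epochs; CelebA drops from 2 × 10⁻⁴ to 2 × 10⁻⁵ at epoch 16.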