Table 2 Hyperparameter tuning.
Parameter | Search space | Optimal value |
---|---|---|
Weight decay | [0, 1e-2, 1e-3, 1e-4, 2e-4, 5e-4] | 2e-4 |
Learning rate | [1e-2, 1e-3, 2e-3, 1e-4] | 1e-2 |
Dropout | [3e-1, 4e-1, 5e-1] | 3e-1 |
Optimizer | [SGD, Adam] | Adam |
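The table above defines a search over 6 × 4 × 3 × 2 = 144 hyperparameter combinations. A minimal grid-search sketch over these spaces is shown below; the `evaluate` callable is a hypothetical stand-in for a full train-and-validate run, which the paper does not specify in detail.

```python
from itertools import product

# Search spaces taken directly from Table 2.
search_space = {
    "weight_decay": [0, 1e-2, 1e-3, 1e-4, 2e-4, 5e-4],
    "learning_rate": [1e-2, 1e-3, 2e-3, 1e-4],
    "dropout": [0.3, 0.4, 0.5],
    "optimizer": ["SGD", "Adam"],
}

def grid_search(evaluate):
    """Try every combination; return the best config and its score.

    `evaluate` maps a config dict to a validation score (higher is
    better). In practice it would train the model with that config;
    here it is an assumed interface, not the authors' code.
    """
    keys = list(search_space)
    best_score, best_cfg = float("-inf"), None
    for values in product(*(search_space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = evaluate(cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score

# Toy scoring function that peaks at the optimum reported in Table 2,
# used only to demonstrate that the search recovers it.
optimum = {"weight_decay": 2e-4, "learning_rate": 1e-2,
           "dropout": 0.3, "optimizer": "Adam"}
toy_score = lambda cfg: sum(cfg[k] == optimum[k] for k in cfg)

best_cfg, best_score = grid_search(toy_score)
print(best_cfg)
```

With the toy scoring function, the search returns the configuration reported as optimal in the table (weight decay 2e-4, learning rate 1e-2, dropout 3e-1, Adam).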