Table 3. Hyperparameter configurations used for training.
| Parameter | Value |
| --- | --- |
| Training epochs | 50 |
| Batch size | 128 |
| Optimizer | Adam |
| Learning rate | 0.001 (adaptive) |
| Loss function | Cross-entropy |
| Dropout rate | 0.2 |
| Hidden layers | 3 dilated RNN layers |
| Hidden units per layer | 128 (tuned by MSGO) |
| Attention mechanism | Softmax over the temporal axis |
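As a minimal sketch, the settings in Table 3 can be collected into a single configuration, and the stated attention mechanism (a softmax over the temporal axis) can be written out directly. The dictionary keys and the function name below are illustrative, not taken from the original implementation; only the values come from the table.

```python
import numpy as np

# Hyperparameters from Table 3 (key names are illustrative)
CONFIG = {
    "epochs": 50,
    "batch_size": 128,
    "optimizer": "Adam",
    "learning_rate": 1e-3,   # Adam adapts per-parameter step sizes
    "loss": "cross_entropy",
    "dropout": 0.2,
    "rnn_layers": 3,         # stacked dilated RNN layers
    "hidden_units": 128,     # per layer, tuned by MSGO
}

def temporal_softmax_attention(h, scores):
    """Softmax over the temporal axis: normalize one relevance score
    per timestep, then pool the hidden states into a context vector.

    h:      (T, d) hidden states from the recurrent stack
    scores: (T,)   unnormalized relevance scores, one per timestep
    returns (d,)   attention-weighted context vector
    """
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w = w / w.sum()                    # weights sum to 1 over time
    return w @ h                       # weighted sum over timesteps
```

With uniform scores the weights are equal, so the context vector reduces to the mean of the hidden states over time, which is a quick sanity check for the normalization.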