Table 4 Hyperparameters for TCN + BiLSTM + Attention mechanism training.
| Component | Parameter | Value |
|---|---|---|
| **Network Parameters** | | |
| TCN | Kernel Sizes | [5, 7, 9] |
| | Dilation Rates | [1, 2, 4, 8] |
| | Number of Filters | 64 per branch |
| | Dropout Rate | 0.2 |
| BiLSTM | Hidden Units | 256 (each direction) |
| | Number of Layers | 2 |
| | Dropout Rate | 0.3 |
| Attention | Number of Heads | 8 |
| | Key Dimension | 64 |
| | Dropout Rate | 0.1 |
| **Training Parameters** | | |
| Optimization | Optimizer | Adam |
| | Initial Learning Rate | \(3\times10^{-4}\) |
| | Weight Decay | \(1\times10^{-5}\) |
| | Gradient Clip Norm | 1.0 |
| Training Process | Batch Size | 64 |
| | Gradient Accumulation Steps | 4 |
| | Number of Epochs | 157 |
| | Early Stopping Patience | 15 |
| Loss Function | MSE Weight (\(\alpha_{1}\)) | 0.6 |
| | Temporal Weight (\(\alpha_{2}\)) | 0.3 |
| | Attention Weight (\(\alpha_{3}\)) | 0.1 |
| Data Processing | Sequence Length | 10,000 timesteps |
| | Sampling Rate | 1000 Hz |
| | Window Overlap | 50% |
| Model Settings | Random Seed | 42 |
| | Cross-validation Folds | 5 |
| | Test Set Ratio | 0.2 |
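The settings above map onto a model and optimizer configuration roughly as in the following PyTorch sketch. Only the numeric hyperparameters (kernel sizes, dilation rates, filter counts, hidden units, attention heads, dropout rates, learning rate, weight decay, clip norm) come from Table 4; the class names, branch concatenation, temporal pooling, and output head are assumptions made for illustration. Note that the 2 × 256 BiLSTM output split across 8 heads yields the 64-dimensional key size listed in the table.

```python
import torch
import torch.nn as nn

# Values from Table 4; module layout is an illustrative sketch, not the authors' code.
KERNEL_SIZES = [5, 7, 9]        # one TCN branch per kernel size
DILATIONS = [1, 2, 4, 8]
N_FILTERS = 64                  # filters per branch
LSTM_HIDDEN = 256               # hidden units per direction
N_HEADS = 8
LOSS_WEIGHTS = (0.6, 0.3, 0.1)  # (alpha_1, alpha_2, alpha_3)


class TCNBranch(nn.Module):
    """Stack of dilated 1-D convolutions for a single kernel size (hypothetical layout)."""

    def __init__(self, in_ch, kernel_size):
        super().__init__()
        layers, ch = [], in_ch
        for d in DILATIONS:
            layers += [
                nn.Conv1d(ch, N_FILTERS, kernel_size,
                          padding=(kernel_size - 1) * d // 2, dilation=d),
                nn.ReLU(),
                nn.Dropout(0.2),
            ]
            ch = N_FILTERS
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, channels, time)
        return self.net(x)


class TCNBiLSTMAttention(nn.Module):
    """Hypothetical assembly of the TCN, BiLSTM, and multi-head attention components."""

    def __init__(self, in_ch, out_dim):
        super().__init__()
        self.branches = nn.ModuleList(TCNBranch(in_ch, k) for k in KERNEL_SIZES)
        self.bilstm = nn.LSTM(N_FILTERS * len(KERNEL_SIZES), LSTM_HIDDEN,
                              num_layers=2, batch_first=True,
                              bidirectional=True, dropout=0.3)
        # embed_dim = 2 * 256 over 8 heads -> 64-dim keys, matching the table
        self.attn = nn.MultiheadAttention(embed_dim=2 * LSTM_HIDDEN,
                                          num_heads=N_HEADS, dropout=0.1,
                                          batch_first=True)
        self.head = nn.Linear(2 * LSTM_HIDDEN, out_dim)

    def forward(self, x):  # x: (batch, channels, time)
        feats = torch.cat([b(x) for b in self.branches], dim=1)  # concat branch outputs
        feats = feats.transpose(1, 2)                            # (batch, time, features)
        seq, _ = self.bilstm(feats)
        ctx, _ = self.attn(seq, seq, seq)
        return self.head(ctx.mean(dim=1))                        # pool over time


model = TCNBiLSTMAttention(in_ch=1, out_dim=1)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-5)
# During training, gradients would be clipped to the tabulated norm of 1.0:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

The composite loss implied by the table would combine the three terms as \(\alpha_{1}\mathcal{L}_{\text{MSE}} + \alpha_{2}\mathcal{L}_{\text{temporal}} + \alpha_{3}\mathcal{L}_{\text{attention}}\) with the weights 0.6, 0.3, and 0.1; the exact form of the temporal and attention terms is not specified in this table.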