Table 5. Training parameters and configuration.
From: Unbalanced power anomaly detection model based on improved transformer and countermeasure encoder
| Hyperparameter | Value | Optimization objective / description |
|---|---|---|
| Batch Size | 256 | Balances GPU memory usage and gradient stability |
| Optimizer | AdamW | - |
| Learning Rate | \(1 \times 10^{-4}\) | Initial value with cosine annealing scheduling |
| Weight Decay | 0.01 | - |
| Gradient Clipping Threshold | 1.0 | Prevents gradient explosion |
| Training Epochs | 100 | Early stopping with patience = 15 |
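For reference, the sketch below shows one way to wire these settings together in PyTorch: AdamW at a 1e-4 initial learning rate with weight decay 0.01, cosine annealing, gradient-norm clipping at 1.0, and early stopping with patience 15 over at most 100 epochs. The model and data are synthetic placeholders (the paper's improved-Transformer architecture is not reproduced here), and the annealing period `T_max = EPOCHS` is an assumption, since the table only names the schedule.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Synthetic stand-in data and model; the paper's actual architecture
# (improved Transformer with adversarial encoder) is not reproduced here.
X = torch.randn(2048, 64)
y = (X.sum(dim=1) > 0).long()
train_loader = DataLoader(TensorDataset(X[:1536], y[:1536]),
                          batch_size=256, shuffle=True)   # batch size from Table 5
val_loader = DataLoader(TensorDataset(X[1536:], y[1536:]), batch_size=256)

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

EPOCHS, PATIENCE = 100, 15   # Table 5: 100 epochs, early-stopping patience 15
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
# T_max = EPOCHS is an assumption; Table 5 only states "cosine annealing".
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

best_val, stale = float("inf"), 0
for epoch in range(EPOCHS):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(xb), yb)
        loss.backward()
        # Clip gradient norm at 1.0 (Table 5) to prevent gradient explosion.
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    scheduler.step()

    # Validation loss drives early stopping.
    model.eval()
    with torch.no_grad():
        val_loss = sum(
            nn.functional.cross_entropy(model(xb), yb, reduction="sum").item()
            for xb, yb in val_loader
        ) / len(val_loader.dataset)

    if val_loss < best_val:
        best_val, stale = val_loss, 0
    else:
        stale += 1
        if stale >= PATIENCE:   # stop after 15 epochs without improvement
            break
```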