Table 5 Training parameters and configuration.

From: Unbalanced power anomaly detection model based on improved transformer and countermeasure encoder

| Hyperparameter | Value | Optimization objective / description |
| --- | --- | --- |
| Batch size | 256 | Balances GPU memory usage and gradient stability |
| Optimizer | AdamW | - |
| Learning rate | \(1 \times 10^{-4}\) | Initial value, with cosine annealing scheduling |
| Weight decay | 0.01 | - |
| Gradient clipping threshold | 1.0 | Prevents gradient explosion |
| Training epochs | 100 | Early stopping with patience = 15 |
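To make the schedule-related entries concrete, the sketch below collects the table's values in a config dict and implements the two mechanisms it names: cosine annealing of the learning rate from the initial \(1 \times 10^{-4}\) and early stopping with patience 15. This is an illustrative, framework-free sketch, not the authors' code; the helper names (`cosine_annealed_lr`, `EarlyStopping`) and the assumed minimum learning rate of 0 are choices made here for clarity.

```python
import math

# Values taken from Table 5; everything else is illustrative.
CONFIG = {
    "batch_size": 256,
    "optimizer": "AdamW",
    "lr_init": 1e-4,
    "weight_decay": 0.01,
    "grad_clip": 1.0,
    "epochs": 100,
    "early_stop_patience": 15,
}


def cosine_annealed_lr(epoch, total_epochs=CONFIG["epochs"],
                       lr_max=CONFIG["lr_init"], lr_min=0.0):
    """Cosine annealing: decay lr_max -> lr_min over total_epochs.

    Assumes the schedule ends at lr_min = 0, which the paper does not state.
    """
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs))


class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=CONFIG["early_stop_patience"]):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a PyTorch setup, the same behavior would typically come from `torch.optim.AdamW` with `weight_decay=0.01`, `torch.optim.lr_scheduler.CosineAnnealingLR`, and `torch.nn.utils.clip_grad_norm_` with the 1.0 threshold.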