Table 4. DQN training process configuration.
| Quantity | Value / Description |
|---|---|
| Optimizer | Adam |
| Learning rate | 1 × 10⁻³ |
| Discount factor γ | 0.99 |
| Replay buffer size | 100 transitions |
| Target network update frequency | every 500 environment steps (hard update) |
| ε-greedy exploration: initial ε | 1.0 |
| ε-greedy exploration: decay schedule | linear decay, ε = max(0.1, 1 − episode / total episodes) |
| Episode length | 25 vortex-shedding periods per episode |
| Number of training episodes (baseline runs) | 300 episodes |
| Extended training (sensitivity study in Fig. 11) | 150 / 300 / 500 episodes |
| Reward function | aerodynamic objective combining lift increase and drag reduction (Eq. (4) in main text) |
| Environment solver | CFD solver (compressible formulation) coupled in-the-loop |
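
The exploration schedule and target-update rule in Table 4 can be expressed compactly. The sketch below is a minimal illustration only, assuming a standard DQN training loop; the names `DQNConfig`, `epsilon_for_episode`, and `should_update_target` are hypothetical and do not correspond to the implementation used in this work.

```python
from dataclasses import dataclass


@dataclass
class DQNConfig:
    # Hyperparameter values taken from Table 4; the class itself is illustrative.
    learning_rate: float = 1e-3      # Adam optimizer
    gamma: float = 0.99              # discount factor
    replay_buffer_size: int = 100    # transitions stored
    target_update_every: int = 500   # hard update, in environment steps
    epsilon_start: float = 1.0       # initial exploration rate
    epsilon_min: float = 0.1         # exploration floor
    total_episodes: int = 300        # baseline training runs


def epsilon_for_episode(cfg: DQNConfig, episode: int) -> float:
    """Linear decay: eps = max(0.1, 1 - episode / total_episodes)."""
    return max(cfg.epsilon_min, cfg.epsilon_start - episode / cfg.total_episodes)


def should_update_target(cfg: DQNConfig, env_step: int) -> bool:
    """Hard-copy the online network into the target network every 500 env steps."""
    return env_step > 0 and env_step % cfg.target_update_every == 0


if __name__ == "__main__":
    cfg = DQNConfig()
    # Halfway through the 300-episode baseline, epsilon has decayed to 0.5.
    print(epsilon_for_episode(cfg, 150))
```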