Table 4 DQN training process configuration.

From: Application of deep reinforcement learning for aerodynamic control around an angled airfoil via synthetic jet

Optimizer: Adam
Learning rate: 1 × 10⁻³
Discount factor γ: 0.99
Replay buffer size: 100 transitions
Target network update frequency: every 500 environment steps (hard update)
ε-greedy exploration, initial ε: 1.0
ε-greedy exploration, decay schedule: max(0.1, 1 - episode / total episodes)
Episode length: 25 vortex-shedding periods per episode
Number of training episodes (baseline runs): 300 episodes
Extended training (sensitivity study in Fig. 11): 150 / 300 / 500 episodes
Reward function: aerodynamic objective combining lift increase and drag reduction (Eq. (4) in main text)
Environment solver: CFD solver (compressible formulation) coupled in-the-loop
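As a rough illustration of how the settings in Table 4 fit together, the sketch below encodes the listed hyperparameters and the ε-decay rule in Python. The names (DQNConfig, epsilon_for_episode) are illustrative assumptions, not code from the paper; only the numerical values come from the table.

```python
# Minimal sketch of the Table 4 DQN configuration (names are assumptions, values from the table).
from dataclasses import dataclass


@dataclass
class DQNConfig:
    optimizer: str = "Adam"
    learning_rate: float = 1e-3
    gamma: float = 0.99                 # discount factor
    replay_buffer_size: int = 100       # transitions
    target_update_every: int = 500      # environment steps, hard update
    epsilon_initial: float = 1.0
    epsilon_min: float = 0.1
    episode_length_periods: int = 25    # vortex-shedding periods per episode
    num_episodes: int = 300             # baseline; 150/300/500 in the Fig. 11 study


def epsilon_for_episode(episode: int, total_episodes: int, cfg: DQNConfig) -> float:
    """Linear decay of the exploration rate, clipped from below:
    max(0.1, 1 - episode / total_episodes)."""
    return max(cfg.epsilon_min, cfg.epsilon_initial - episode / total_episodes)


if __name__ == "__main__":
    cfg = DQNConfig()
    # Exploration rate over the baseline 300-episode run.
    for ep in (0, 75, 150, 225, 299):
        print(ep, round(epsilon_for_episode(ep, cfg.num_episodes, cfg), 3))
```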