Table 5 Key hyperparameter values and rationale.

From: Reinforcement learning-driven dynamic optimization strategy for parametric design of 3D models

| Hyperparameter | Symbol | Value(s) | Rationale |
| --- | --- | --- | --- |
| HLA Update Interval | K | 10 | Provides temporal abstraction: the low-level agent executes a sequence of 10 fine-tuning actions for each high-level strategic goal, balancing strategic stability with adaptive control. |
| Reward Weights | λ₁ (weight), λ₂ (stress), λ₃ (manufacturability) | 0.4, 0.4, 0.2 | Prioritizes safety and lightweighting as primary objectives while keeping manufacturability as a critical secondary constraint; the weights are normalized to sum to 1. |
| Encoding Sensitivity | γⱼ | 1.0 (uniform) | A default scaling factor that preserves the natural output range of the tanh function (−1 to 1), ensuring stable gradient flow during backpropagation. Can be tuned per parameter if sensitivity analysis dictates. |
| Discount Factor | γ | 0.99 | Encourages long-horizon planning by making the agent value future rewards highly, which is essential for complex design tasks where the consequences of early actions unfold over time. |
| PPO Clipping Range | ε | 0.2 | A standard value that prevents destructively large policy updates, ensuring stable and monotonic policy improvement throughout training. |
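The reward-weight row can be illustrated with a minimal sketch. Only the weight values (0.4, 0.4, 0.2) come from the table; the sub-reward names and function signature are illustrative assumptions, since the paper's exact reward terms are not reproduced here.

```python
# Illustrative composite reward using the lambda weights from Table 5.
# The sub-reward names (weight_term, stress_term, mfg_term) are assumed,
# not taken from the paper; each is presumed normalized to [0, 1].

LAMBDA_1 = 0.4  # lightweighting objective
LAMBDA_2 = 0.4  # structural-stress (safety) objective
LAMBDA_3 = 0.2  # manufacturability constraint

def composite_reward(weight_term: float, stress_term: float, mfg_term: float) -> float:
    """Weighted sum of the three sub-rewards; weights are normalized to sum to 1."""
    return LAMBDA_1 * weight_term + LAMBDA_2 * stress_term + LAMBDA_3 * mfg_term

# Because the weights sum to 1, a design scoring 1.0 on every
# sub-objective receives the maximum composite reward of 1.0.
assert abs(LAMBDA_1 + LAMBDA_2 + LAMBDA_3 - 1.0) < 1e-9
```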
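The encoding-sensitivity row describes a per-parameter tanh scaling. A minimal sketch of that mapping, assuming γⱼ multiplies the raw parameter before the tanh (the function name `encode_parameter` is hypothetical):

```python
import math

def encode_parameter(x: float, gamma_j: float = 1.0) -> float:
    """Map a raw design parameter to (-1, 1) via tanh.

    With the default gamma_j = 1.0 (as in Table 5), the natural tanh
    range is preserved; larger gamma_j sharpens sensitivity near zero,
    which is why the paper notes it can be tuned per parameter.
    """
    return math.tanh(gamma_j * x)
```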
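The ε = 0.2 clipping range refers to the standard PPO clipped surrogate objective; a per-sample sketch of how that clipping bounds the policy update (probability `ratio` and `advantage` would come from the rollout, and are assumed inputs here):

```python
def ppo_clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A).

    Clipping the probability ratio to [1-eps, 1+eps] removes the incentive
    to move the policy far from the old one in a single update, which is
    the 'stable, monotonic improvement' rationale in Table 5.
    """
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

With ε = 0.2, a ratio of 1.5 on a positive advantage is capped at 1.2, so the update gains nothing from pushing the ratio beyond the clip boundary.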