Table 5 Key hyperparameter values and rationale.
From: Reinforcement learning-driven dynamic optimization strategy for parametric design of 3D models
| Hyperparameter | Symbol | Value(s) | Rationale |
|---|---|---|---|
| HLA Update Interval | K | 10 | Provides temporal abstraction: the low-level agent executes a sequence of 10 fine-tuning actions for each high-level strategic goal, balancing strategic stability with adaptive control. |
| Reward Weights | λ₁ (weight), λ₂ (stress), λ₃ (manufacturability) | 0.4, 0.4, 0.2 | Prioritizes safety and lightweighting as primary objectives while keeping manufacturability as a critical secondary constraint; the weights are normalized to sum to 1. |
| Encoding Sensitivity | γⱼ | 1.0 (uniform) | A default scaling factor that preserves the natural output range of the tanh function (−1 to 1), ensuring stable gradient flow during backpropagation. Can be tuned per parameter if sensitivity analysis dictates. |
| Discount Factor | γ | 0.99 | Encourages long-horizon planning by weighting future rewards heavily, which is essential for complex design tasks where the consequences of early actions unfold over time. |
| PPO Clipping Range | ε | 0.2 | A standard value that prevents destructively large policy updates, supporting stable, near-monotonic policy improvement throughout training. |
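A minimal sketch of how three of these hyperparameters enter the computation: the reward weights λ₁–λ₃ as a normalized weighted sum, the sensitivity γⱼ scaling a tanh parameter encoding, and the clipping range ε in PPO's clipped surrogate objective. The function names and the per-term reward arguments are illustrative placeholders, not the paper's exact formulations.

```python
import math

# Reward weights from the table: λ1 = 0.4 (weight), λ2 = 0.4 (stress),
# λ3 = 0.2 (manufacturability); normalized so they sum to 1.
LAMBDA = (0.4, 0.4, 0.2)

def composite_reward(r_weight, r_stress, r_manuf, weights=LAMBDA):
    """Weighted sum of the three normalized reward terms."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return weights[0] * r_weight + weights[1] * r_stress + weights[2] * r_manuf

def encode_param(x, gamma_j=1.0):
    """Parameter encoding with sensitivity γⱼ; γⱼ = 1.0 keeps tanh's (−1, 1) range."""
    return math.tanh(gamma_j * x)

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO surrogate: min(r·A, clip(r, 1−ε, 1+ε)·A), here for a single sample."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)
```

For example, with per-term rewards (1.0, 0.5, 0.0) the composite reward is 0.4·1.0 + 0.4·0.5 + 0.2·0.0 = 0.6, and a probability ratio of 1.5 with positive advantage is clipped to 1 + ε = 1.2 before contributing to the objective.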