Table 1 Hyperparameters for LSPI (Unknown p(t) Distribution).
From: Reinforcement learning-based optimal control for stochastic opinion dynamics
| Parameter | Value | Meaning | Rationale |
|---|---|---|---|
| Convergence tolerance \(\epsilon\) | \(10^{-3}\) | Feedback gain precision | |
| Sample size \(M\) | 3000 | Data coverage | Sufficient for policy fitting |
| Max iterations \(N_{\max}\) | 30 | LSPI convergence limit | Prevents overfitting |
| Exploration noise \(\eta(t)\) | \(\mathscr{N}(0, 0.0025)\) | Exploration diversity | Std = 0.05, no decay [33] |
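
As a minimal sketch, the settings in Table 1 could be collected in a single configuration object. The class name `LSPIConfig`, its field names, and the helper `sample_exploration_noise` are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch also shows that the variance 0.0025 in the table corresponds to a standard deviation of 0.05, held constant across samples.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LSPIConfig:
    """Hypothetical container for the Table 1 values (names are assumptions)."""
    tol: float = 1e-3        # convergence tolerance epsilon
    n_samples: int = 3000    # sample size M
    max_iters: int = 30      # maximum number of LSPI iterations N_max
    noise_std: float = 0.05  # std of eta(t); variance 0.05**2 = 0.0025


def sample_exploration_noise(cfg: LSPIConfig, n: int, seed: int = 0) -> list:
    """Draw n zero-mean Gaussian noise samples with a fixed std (no decay)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, cfg.noise_std) for _ in range(n)]


cfg = LSPIConfig()
eta = sample_exploration_noise(cfg, cfg.n_samples)
```

Storing the standard deviation rather than the variance matches the argument that most Gaussian samplers expect, which avoids passing 0.0025 where 0.05 is intended.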