Table 1. Hyperparameters for LSPI (unknown \(p(t)\) distribution).

From: Reinforcement learning-based optimal control for stochastic opinion dynamics

| Parameter | Value | Meaning | Rationale |
| --- | --- | --- | --- |
| Convergence tolerance \(\epsilon\) | \(10^{-3}\) | Feedback gain precision | Aligned with previous studies [32, 33] |
| Sample size \(M\) | 3000 | Data coverage | Sufficient for policy fitting |
| Max iterations \(N_{\max}\) | 30 | LSPI convergence limit | Prevents overfitting |
| Exploration noise \(\eta(t)\) | \(\mathscr{N}(0, 0.0025)\) | Exploration diversity | Std = 0.05, no decay [33] |
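
As an illustration of how the values in Table 1 might be wired into an implementation, the following minimal Python sketch collects them in a configuration object, draws the exploration noise, and applies the two stopping criteria. It is not the authors' code: the class `LSPIConfig`, its field names, and the helpers `sample_exploration_noise` and `iterate_until_converged` are hypothetical. The `update` argument is a placeholder for one LSPI policy-improvement step, and the gain is assumed to be scalar for simplicity. The one point the sketch makes explicit is that \(\mathscr{N}(0, 0.0025)\) is parameterized by variance, so the standard deviation passed to the sampler is \(\sqrt{0.0025} = 0.05\).

```python
from dataclasses import dataclass
import math
import random


@dataclass(frozen=True)
class LSPIConfig:
    """Hyperparameters from Table 1 (class and field names are hypothetical)."""
    tol: float = 1e-3          # convergence tolerance epsilon on the feedback gain
    n_samples: int = 3000      # sample size M
    max_iters: int = 30        # iteration cap N_max
    noise_var: float = 0.0025  # variance of the exploration noise eta(t)

    @property
    def noise_std(self) -> float:
        # N(0, 0.0025) is given by its variance, so std = sqrt(0.0025) = 0.05.
        return math.sqrt(self.noise_var)


def sample_exploration_noise(cfg: LSPIConfig, rng: random.Random) -> list[float]:
    """Draw M i.i.d. zero-mean Gaussian samples with a constant (non-decaying) std."""
    return [rng.gauss(0.0, cfg.noise_std) for _ in range(cfg.n_samples)]


def iterate_until_converged(update, gain0: float, cfg: LSPIConfig):
    """Apply `update` until the gain changes by less than tol or N_max is reached.

    `update` stands in for one policy-improvement step (an assumption of this
    sketch); the function returns the final gain and the iterations used.
    """
    gain = gain0
    for k in range(1, cfg.max_iters + 1):
        new_gain = update(gain)
        if abs(new_gain - gain) < cfg.tol:
            return new_gain, k
        gain = new_gain
    return gain, cfg.max_iters
```

For example, `iterate_until_converged(lambda g: 0.5 * g + 1.0, 0.0, LSPIConfig())` stops on the tolerance test well before the cap of 30 iterations, whereas an update that never settles is cut off at \(N_{\max}\).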