Fig. 3 | Scientific Reports

Fig. 3

From: Reinforcement learning-based optimal control for stochastic opinion dynamics


Cumulative discounted cost trajectories of the RL policy and the theoretical optimal controller from a fixed initial condition \(x_0=[1,2]^\top\) (mean ± one standard deviation over 5 rollouts). The vertical axis shows the time-accumulated cost \(\sum _{k=0}^{t} \delta ^k c_k\), which reflects transient cost evolution along representative trajectories. The percentage values indicate the relative difference in the expected total discounted cost, \(\Delta = (J_{\textrm{RL}}-J_{\textrm{opt}})/J_{\textrm{opt}}\times 100\%\), where \(J=\mathbb {E}[\sum _{k=0}^{T-1}\delta ^k c_k]\) is estimated by Monte-Carlo simulation over randomly sampled initial states.
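
The quantities in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the input layout (one list of per-step costs \(c_k\) per rollout), and the use of the population standard deviation for the shaded band are all assumptions made for illustration.

```python
from itertools import accumulate
from statistics import mean, pstdev


def cumulative_discounted_cost(costs, delta):
    """Running sum  sum_{k=0}^{t} delta^k c_k  for t = 0, ..., len(costs) - 1."""
    return list(accumulate(delta**k * c for k, c in enumerate(costs)))


def mean_std_band(rollouts, delta):
    """Per-time-step mean and standard deviation across rollouts.

    `rollouts` is a list of equal-length per-step cost sequences.
    Assumption: population standard deviation (pstdev); the caption
    does not say which estimator was used.
    """
    curves = [cumulative_discounted_cost(r, delta) for r in rollouts]
    per_step = list(zip(*curves))  # regroup values by time index
    return [mean(s) for s in per_step], [pstdev(s) for s in per_step]


def relative_gap_percent(j_rl, j_opt):
    """Delta = (J_RL - J_opt) / J_opt * 100, as defined in the caption."""
    return (j_rl - j_opt) / j_opt * 100.0
```

Here `j_rl` and `j_opt` stand for Monte-Carlo estimates of \(J\), for example the average of the final cumulative value over many sampled initial states. Note that the plotted curves start from the fixed \(x_0\), whereas \(\Delta\) averages over random initial states, so the gap between the two curves at the final step need not equal \(\Delta\).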
