Fig. 5
From: Reinforcement learning-based optimal control for stochastic opinion dynamics

Opinion values of two agents as a function of simulation time step under the converged LSPI policy. The red solid line denotes Agent 1âs opinion; the blue dashed line denotes Agent 2âs opinion; the black dotted line denotes the target opinion \(s = 1.2\).