Figure 13
From: Safe reinforcement learning under temporal logic with reward design and quantum action selection

The generated trajectories from the optimal policy learned via QSQ-learning.