Table 1. The number of times the robot visits unsafe states during learning.
From: Safe reinforcement learning under temporal logic with reward design and quantum action selection
| Method | Maximum times | Minimum times |
|---|---|---|
| Q-learning without safety values | 160,937 | 157,192 |
| QSQ-learning | 2,294 | 962 |