Table 1 The number of times the robot visits unsafe states during learning.

From: Safe reinforcement learning under temporal logic with reward design and quantum action selection

Method                            Maximum visits  Minimum visits
Q-learning without safety values         160,937         157,192
QSQ-learning                               2,294             962