Fig. 6: Schematic for reinforcement learning (RL) with the deep deterministic policy gradient (DDPG) algorithm.
From: Single-atom exploration of optimized nonequilibrium quantum thermodynamics by reinforcement learning

This algorithm includes a replay buffer and a target network with actor \(\mu'_{\theta'}\) and critic \(Q'_{\omega'}\), in addition to the main network with actor \(\mu_{\theta}\) and critic \(Q_{\omega}\). The learning agent (i.e., the network) acts on the environment (i.e., the trapped-ion qubit and/or the vibrational degree of freedom of the ion) and updates its actions based on the feedback obtained from the environment.
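
The roles of the replay buffer and the target network can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class and function names (`ReplayBuffer`, `td_target`, `soft_update`), the plain-list parameter representation, and the values of `gamma` and `tau` are assumptions made for illustration, following the standard DDPG recipe in which the target actor and critic supply the learning target and slowly track the main network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.storage.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive steps.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def td_target(reward, next_state, target_actor, target_critic, gamma):
    """Critic target y = r + gamma * Q'(s', mu'(s')) from the target network."""
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)


def soft_update(target_params, main_params, tau):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * m + (1.0 - tau) * t
            for t, m in zip(target_params, main_params)]
```

In a full training loop, each interaction with the environment would be stored with `add`, mini-batches drawn with `sample` would be used to update \(Q_{\omega}\) toward `td_target` and \(\mu_{\theta}\) along the critic's gradient, and `soft_update` would then be applied to both \(\theta'\) and \(\omega'\). A small `tau` keeps the target network changing slowly, which is the usual reason for maintaining it separately from the main network.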