Fig. 1
From: Universal quantum control through deep reinforcement learning

Overview of the RL implementation. At iteration time step \(n + 1\), the policy NN proposes a control action in the form of the system Hamiltonian \(\hat H_{n + 1}\). The training environment takes the proposed action, evaluates the Schrödinger equation under a noisy implementation \(\hat H_{n + 1} + \delta \hat H_{n + 1}\) for a time duration \(\Delta t\) to obtain a new unitary gate \(U_{n + 1}\), and calculates the associated cost function. Both the gate and the cost are fed into the RL agent. The policy NN and value NN of the RL agent are updated jointly based on the trajectory of the simulated unitary gate, control action and associated control cost.
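The environment step in this loop can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. It assumes ħ = 1 and a Hamiltonian held constant over each interval \(\Delta t\), so that \(U_{n+1} = e^{-i(\hat H_{n+1} + \delta \hat H_{n+1})\Delta t}\, U_n\). It also uses a hypothetical Gaussian random-Hermitian noise model for \(\delta \hat H\) and a simplified gate-infidelity cost. The paper's actual noise model and cost function may differ. The function names (`propagate`, `env_step`, etc.) are illustrative only, and the policy/value NN updates are not shown.

```python
import numpy as np


def propagate(H, dt):
    """Return exp(-i H dt) for a Hermitian H (hbar = 1) via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * dt)) @ V.conj().T


def random_hermitian_noise(dim, scale, rng):
    """Hypothetical noise model: Hermitian matrix with Gaussian-distributed entries."""
    A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return scale * (A + A.conj().T) / 2


def gate_infidelity(U, U_target):
    """Simplified cost (assumption): 1 - |Tr(U_target^dagger U)|^2 / d^2, phase-insensitive."""
    d = U.shape[0]
    overlap = np.trace(U_target.conj().T @ U)
    return 1.0 - abs(overlap) ** 2 / d ** 2


def env_step(U_n, H_next, dt, U_target, noise_scale, rng):
    """One environment step: apply the noisy Hamiltonian H_next + dH for dt, then score the gate."""
    dH = random_hermitian_noise(H_next.shape[0], noise_scale, rng)
    U_next = propagate(H_next + dH, dt) @ U_n
    cost = gate_infidelity(U_next, U_target)
    return U_next, cost


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    U0 = np.eye(2, dtype=complex)
    # Noiseless pi/2 rotation about x reaches X up to a global phase, so cost is ~0.
    _, cost_clean = env_step(U0, (np.pi / 2) * X, 1.0, X, 0.0, rng)
    # With noise, the same control generally yields a nonzero cost.
    _, cost_noisy = env_step(U0, (np.pi / 2) * X, 1.0, X, 0.1, rng)
    print(cost_clean, cost_noisy)
```

In an RL setting, the returned `U_next` (the new state) and `cost` (the negative reward) would be passed back to the agent after each proposed \(\hat H_{n+1}\).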