Fig. 2: The deep reinforcement learning agent learns how to approximate a single-qubit gate.

From: Quantum compiling by deep reinforcement learning

a Best sequences of gates discovered by the agent during training at different epochs. The dashed lines connecting the Bloch spheres to the Episode axis indicate the episode at which each sequence was found for the first time. Each approximating sequence is represented by two trajectories of states (colored points) on the Bloch sphere, obtained by applying the unitary transformation associated with the circuit at time step n to two representative states, namely \(\left|0\right\rangle\) and \(\left|+\right\rangle\). The agent is asked to transform each starting state (green arrows) into the corresponding ending state (red arrows), i.e., \(\left|0\right\rangle\) into \(\mathcal{U}\left|0\right\rangle\) and \(\left|+\right\rangle\) into \(\mathcal{U}\left|+\right\rangle\), where \(\mathcal{U}\) is the target unitary. b Performance of the agent during training. The plot shows the percentage of episodes in which the agent found a solution (blue line) and the average length of the gate sequences (orange line). The agent learns how to approximate the target after about \(10^4\) episodes and then improves the solution over time.
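As a minimal sketch (not the authors' code) of how such Bloch-sphere trajectories can be obtained, the snippet below applies a hypothetical gate sequence step by step to \(\left|0\right\rangle\) and \(\left|+\right\rangle\) and records the Bloch-sphere coordinates after each gate; the gate list and rotation used in the example are placeholders, not the agent's actual output.

```python
import numpy as np

# Single-qubit Pauli matrices, used to read off Bloch-sphere coordinates.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(psi):
    """Return the (x, y, z) Bloch-sphere coordinates of a pure state |psi>."""
    return tuple(np.real(np.conj(psi) @ (P @ psi)) for P in (X, Y, Z))

def trajectory(gates, psi0):
    """Bloch-sphere trajectory traced out by applying the circuit gate by gate.

    `gates` is a (hypothetical) list of 2x2 unitaries chosen by the agent;
    the n-th point is the state after the first n gates, as in panel a.
    """
    points = [bloch_vector(psi0)]
    psi = psi0.copy()
    for g in gates:
        psi = g @ psi
        points.append(bloch_vector(psi))
    return points

# Example: a short sequence of small x-rotations applied to |0> and |+>.
theta = np.pi / 16
rx = np.array([[np.cos(theta / 2), -1j * np.sin(theta / 2)],
               [-1j * np.sin(theta / 2), np.cos(theta / 2)]])
ket0 = np.array([1, 0], dtype=complex)
ket_plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
print(trajectory([rx] * 4, ket0))
print(trajectory([rx] * 4, ket_plus))
```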
