Fig. 12: Time to reach 1 million training steps.

Execution time of training trajectories for 4 parallel agents (on a single GPU), using the same hyperparameters as in Fig. 11, for different numbers of physical qubits n and code distances d (keeping the number of logical qubits fixed at k = 1).