Fig. 11: Example of a training run for [[7, 1, 3]] code discovery.

a Return and circuit size during training, b Details of the data calculation pipeline and complete set of hyperparameters used for this run. Here, 4 parallel agents each interact with batches of 64 circuits processed in parallel. Each agent finds a different encoding circuit, and the training finishes in 20 s on a single GPU. The meaning of every hyperparameter is explained in Methods.