Fig. 2: Illustration of gadgetization in TensorGame. | Nature Machine Intelligence

From: Quantum circuit optimization with AlphaTensor

Nodes in the tree correspond to states, and the number in each node is the immediate reward associated with the action leading to that node. In state (i), the last played action is labelled ‘a’, and state (ii) is reached after playing another action (labelled ‘b’). From state (ii), playing action ‘c’ leads to state (iii), incurring an additional −1 reward. If action ‘ab’ is played instead from state (ii), the reward leading to state (iiib) is 0 because the move completes a CS gadget (blue path). Similarly, the sequence of moves in the green path completes a Toffoli gadget, and thus the immediate reward in state (vii) is +4 so that the last 7 actions jointly receive a reward of −2 (see Methods for details on the gadgetization patterns). With its ability to plan, from state (i) the agent may decide to play actions in the green or blue paths and benefit from the adjusted rewards, or to play other actions that are not part of a gadget (black dashed paths). In this way, AlphaTensor-Quantum can automatically find trade-offs between playing actions that are part of a gadget and actions that are not.
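The reward adjustment described above can be sketched in a few lines of Python. This is a hedged illustration, not the paper's implementation: the function name `immediate_reward`, its boolean flags, and the hard-coded bonus values are assumptions, with the magnitudes (+4 for completing a Toffoli gadget so that its 7 actions jointly sum to −2, and 0 for the move completing a CS gadget) taken from the caption. How gadget completion is actually detected is described in the paper's Methods and is not modelled here.

```python
# Illustrative sketch of gadget-adjusted immediate rewards in a
# TensorGame-like setting. Names and structure are hypothetical; only the
# reward values (-1 default, +4 on Toffoli completion, 0 on CS completion)
# come from the figure caption.

def immediate_reward(completes_toffoli: bool, completes_cs: bool) -> int:
    """Reward for one action: -1 by default, adjusted on gadget completion."""
    if completes_toffoli:
        # The 7 actions of the gadget then jointly sum to 6*(-1) + 4 = -2.
        return 4
    if completes_cs:
        # The completing move incurs no cost (reward 0 instead of -1).
        return 0
    return -1

# Example: a 7-action sequence where only the last action completes a
# Toffoli gadget.
toffoli_rewards = [immediate_reward(False, False)] * 6 + [immediate_reward(True, False)]
print(sum(toffoli_rewards))  # -2, matching the caption's joint reward
```

With planning, an agent can compare the summed adjusted rewards of a gadget-completing action sequence against alternatives that leave the gadget unfinished, which is the trade-off the caption describes.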