Extended Data Fig. 4: The pipeline of the RL training used in our work.
From: Avoiding fusion plasma tearing instability with deep reinforcement learning

First, plasma profiles are randomly selected from experimental data and fed to both the dynamic model and the AI controller. The AI controller observes the plasma profiles and determines an action. Then, the dynamic model predicts the future βN and tearability from the profiles and the action. Lastly, the reward is calculated from the predicted state and used to optimize the AI controller.
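The four steps of the pipeline can be sketched as a toy loop. This is a minimal illustration under loudly stated assumptions, not the method used in the work. The profile and action sizes, the linear dynamic model, the linear policy, the tearability threshold, and the reward shape (βN while predicted tearability stays below the threshold, otherwise a negative penalty) are all hypothetical. Simple random search also stands in for the actual RL optimizer. Only the order of operations mirrors the figure: sample profiles, act, predict βN and tearability, then score the prediction and update the controller.

```python
import random

# Hypothetical sizes and limits, chosen only for illustration.
N_PROFILE = 4          # features per plasma profile (assumed)
N_ACTION = 2           # actuator commands per step (assumed)
TEAR_THRESHOLD = 0.5   # hypothetical tearability limit


def sample_profiles(dataset, rng):
    """Step 1: randomly select a plasma profile from the (toy) experimental data."""
    return rng.choice(dataset)


def controller(params, profiles):
    """Step 2: a linear policy maps observed profiles to actions clipped to [-1, 1]."""
    return [max(-1.0, min(1.0, sum(w * x for w, x in zip(row, profiles))))
            for row in params]


def dynamic_model(profiles, action):
    """Step 3: toy surrogate predicting future (beta_N, tearability); not a trained model."""
    beta_n = 2.0 + 0.5 * action[0] + 0.1 * sum(profiles)
    tearability = 0.3 + 0.4 * action[0] - 0.3 * action[1]
    return beta_n, tearability


def reward(beta_n, tearability):
    """Step 4: hypothetical reward, beta_N while tearability stays below the limit."""
    if tearability < TEAR_THRESHOLD:
        return beta_n
    return TEAR_THRESHOLD - tearability  # negative penalty otherwise


def average_reward(params, batch):
    """Run steps 2-4 on every profile in the batch and average the rewards."""
    total = 0.0
    for profiles in batch:
        action = controller(params, profiles)
        beta_n, tearability = dynamic_model(profiles, action)
        total += reward(beta_n, tearability)
    return total / len(batch)


def train(dataset, iterations=200, batch_size=32, step=0.1, seed=0):
    """Optimize the controller with random search (a stand-in for the real RL algorithm)."""
    rng = random.Random(seed)
    params = [[0.0] * N_PROFILE for _ in range(N_ACTION)]
    for _ in range(iterations):
        batch = [sample_profiles(dataset, rng) for _ in range(batch_size)]
        candidate = [[w + rng.gauss(0.0, step) for w in row] for row in params]
        # Keep the perturbed policy only if it earns more reward on the same batch.
        if average_reward(candidate, batch) > average_reward(params, batch):
            params = candidate
    return params


if __name__ == "__main__":
    data_rng = random.Random(1)
    dataset = [[data_rng.random() for _ in range(N_PROFILE)] for _ in range(64)]
    trained = train(dataset)
    print("mean reward before:", average_reward([[0.0] * N_PROFILE] * N_ACTION, dataset))
    print("mean reward after: ", average_reward(trained, dataset))
```

Evaluating the current and perturbed policies on the same sampled batch keeps the comparison fair despite the random profile selection. In the toy model, raising the second action lowers tearability, so the optimizer can push βN up while staying under the hypothetical threshold. That trade-off is the kind the reward in the figure is meant to encode.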