Fig. 3: Demonstration of application 2: ramp-down feedback control. | Communications Physics

From: Active ramp-down control and trajectory design for tokamaks with neural differential equations and reinforcement learning

A comparison of RAPTOR simulation results from a naive baseline feed-forward trajectory (red) against the PPO-trained policy running in closed loop (blue). The goal and constraint trajectories are shown in (a), while the action trajectories are shown in (b). The PPO policy yields a considerable reduction in constraint violation for βp, li, and Γ. While the nominal SPARC PRD ramp-down has a constant ramp rate of 1 MA/s, our baseline was selected to have the same average ramp rate as the PPO policy to better highlight how more complex time traces can reduce constraint violation while keeping the same ramp-down time. Note that while the nominal action space is the rates of change of Paux and gs, the action trajectories plot (right) shows the time-integrated values for interpretability. Simulations of the stored energy and vertical field rate of change exhibit spikes at sawtooth and confinement-regime transition events, which are due to RAPTOR's approach to handling boundary conditions at these events.
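The caption notes that the policy acts on rates of change of Paux and gs while the figure plots the time-integrated values. A minimal sketch of that post-processing step, with purely illustrative step size, initial value, and rate sequence (none of these numbers come from the paper):

```python
import numpy as np

# Hypothetical values for illustration only: the policy emits rate-of-change
# actions (e.g. dPaux/dt in MW/s), and the plotted trajectory is obtained by
# time-integrating them from an assumed initial condition.
dt = 0.1                                               # control interval, s (assumed)
rate_actions = np.array([2.0, 2.0, -1.0, -1.0, 0.0])   # dPaux/dt per step, MW/s (assumed)
p_aux_0 = 10.0                                         # initial auxiliary power, MW (assumed)

# Forward-Euler time integration: P[k] = P0 + sum_{j<=k} rate[j] * dt
p_aux = p_aux_0 + np.cumsum(rate_actions) * dt
print(p_aux)  # integrated Paux trajectory, the quantity shown in panel (b)
```

The same cumulative sum recovers the gs trajectory from its rate actions; plotting these integrated quantities is what makes the action traces directly comparable to the physical time traces in panel (a).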
