Extended Data Fig. 5: Cursor copilot learns the center-out 8 task during synthetic training.
From: Brain–computer interface control with artificial intelligence copilots

a, PPO policy loss, value function loss, and rewards over training for the cursor copilot. Rewards increase over the course of training, both in the training environment and, b, in an evaluation test environment, where the copilot was frozen every 8192 training steps and evaluated on the center-out and back task. c, Success percentage, d, trial time, e, target hit rate, and f, Fitts ITR on the center-out 8 task over the course of training. Panels c, d, e, and f show aggregate results from 8 copilots trained under identical hyperparameter settings. The dark gray line represents the mean, and the light gray band shows the standard error of the mean (SEM). These results demonstrate that the copilot learns to use the surrogate KF signals to perform the center-out 8 task. Note that these values are generally lower (for example, success percentage does not reach 100%) because the copilot task was more challenging, with a 2-second target hold time to encourage goal acquisition behavior (see Methods).
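
As an illustration of how a mean line and SEM band like those in panels c, d, e, and f could be computed from several independently trained copilots, the following is a minimal sketch and not the authors' analysis code. The function name `mean_and_sem` and the input layout (one list of metric values per copilot, one value per evaluation checkpoint) are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the caption does not specify it.

```python
import math
import statistics


def mean_and_sem(curves):
    """Return per-checkpoint mean and SEM across training runs.

    curves: list of runs (hypothetical layout); each run is a list holding
    one metric value per evaluation checkpoint, for example success percentage.
    """
    n_runs = len(curves)
    if n_runs < 2:
        raise ValueError("SEM needs at least two runs")
    if len({len(run) for run in curves}) != 1:
        raise ValueError("all runs must have the same number of checkpoints")

    means, sems = [], []
    # zip(*curves) groups the values of all runs at the same checkpoint.
    for values in zip(*curves):
        means.append(statistics.fmean(values))
        # Assumption: SEM = sample standard deviation / sqrt(number of runs).
        sems.append(statistics.stdev(values) / math.sqrt(n_runs))
    return means, sems
```

With 8 runs, the shaded band would then span `means[i] - sems[i]` to `means[i] + sems[i]` at each checkpoint `i`.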