Fig. 2: Transfer and Interference in ANNs.

ANNs were trained on participant-matched trial sequences (one network per participant). a–c, Learning curves for networks in the three conditions. Each network is trained sequentially on task A followed by task B (full supervision with a mean-squared-error loss) and then retested on task A. During retest, model weights are not updated after winter trials (analogous to participants receiving feedback for summer but not winter stimuli). a shows networks trained in the same condition (tasks with identical rules), b shows networks trained in the near condition (tasks with similar rules) and c shows networks trained in the far condition (tasks with opposite rules). Dashed lines mark the task change points: the introduction of task B stimuli and the return to task A stimuli, respectively. d, Number of principal components needed to capture 99% of the variance of the activity at the network's hidden layer when exposed to all inputs, split by condition, both after training on task A only (purple) and after additionally training on task B (green). e, Two-dimensional visualization of task-stimulus representations at the network's hidden layer after training on task A stimuli: PCA (two components) was performed on the hidden-layer activity evoked by all inputs. f, Hidden-layer stimulus representations after training on task B in the same condition. g, As in f, after training the network to perform task B in the near condition. h, As in f, after training the network to perform task B in the far condition (see Supplementary Fig. 2 for additional visualizations of subspaces). i, Principal angles between task subspaces in the same, near and far conditions. PCA (n = 2 components) was performed on ANN hidden-layer activity for stimuli from task A versus task B, and the principal angles between the resulting subspaces were computed. Larger angles indicate greater orthogonality between subspaces.
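
The training regimen described in a–c can be illustrated with a minimal sketch. This is not the authors' implementation: the architecture, layer sizes, learning rate, stimulus encoding and the names `run_block`, `task_A_trials` and `task_B_trials` are all placeholder assumptions; only the curriculum (task A, then task B, then a task A retest with no weight updates after winter trials) follows the caption.

```python
import torch
import torch.nn as nn

# Hypothetical architecture and hyperparameters: the caption does not specify
# layer sizes, learning rate or stimulus encoding.
net = nn.Sequential(nn.Linear(10, 100), nn.ReLU(), nn.Linear(100, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

def run_block(trials, update_winter=True):
    """Present (stimulus, target, is_winter) trials in sequence."""
    for x, y, is_winter in trials:
        loss = loss_fn(net(x), y)
        # During the task A retest, weights are frozen after winter trials,
        # mirroring participants who received feedback only on summer stimuli.
        if update_winter or not is_winter:
            opt.zero_grad()
            loss.backward()
            opt.step()

# Curriculum from panels a-c (trial lists are placeholders):
# run_block(task_A_trials)                        # train on task A
# run_block(task_B_trials)                        # then on task B
# run_block(task_A_trials, update_winter=False)   # retest on task A
```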
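
The dimensionality measure in d and the two-dimensional projections in e–h can be sketched as follows, assuming hidden-layer activity has been collected into an (n_stimuli × n_units) array `H`; the variable and function names are illustrative, not taken from the paper's code.

```python
import numpy as np
from sklearn.decomposition import PCA

def n_components_99(H):
    """Number of principal components needed to capture 99% of the variance (panel d)."""
    pca = PCA().fit(H)
    cumvar = np.cumsum(pca.explained_variance_ratio_)
    return int(np.searchsorted(cumvar, 0.99) + 1)

def project_2d(H):
    """Two-component PCA projection used for the visualizations in panels e-h."""
    return PCA(n_components=2).fit_transform(H)
```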
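
The subspace-angle analysis in i can likewise be sketched, assuming `H_A` and `H_B` are (n_stimuli × n_units) hidden-layer activity matrices for task A and task B stimuli; the function name and the use of `scipy.linalg.subspace_angles` are assumptions about one reasonable implementation, not the paper's code.

```python
import numpy as np
from scipy.linalg import subspace_angles
from sklearn.decomposition import PCA

def principal_angles_deg(H_A, H_B, n_components=2):
    """Principal angles (degrees) between the top-n PCA subspaces of the two tasks."""
    # PCA components are orthonormal rows; transpose so columns span each subspace.
    basis_A = PCA(n_components).fit(H_A).components_.T
    basis_B = PCA(n_components).fit(H_B).components_.T
    return np.degrees(subspace_angles(basis_A, basis_B))

# Angles near 90 degrees indicate near-orthogonal task subspaces.
```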