Extended Data Fig. 3: Training connectivity with model mismatch, related to Fig. 4.
From: Prediction of neural activity in connectome-constrained recurrent networks

A Teacher with model mismatch in the activation function, from Fig. 4a-c. B Example traces of one recorded neuron and one unrecorded neuron in the teacher and after training the student with mismatch in the β parameter. The student networks were trained with 20 recorded neurons (left) and with 150 recorded neurons (right). C Teacher-student framework with mismatch. We train the connectivity of the student, using the teacher's connectivity as the initial condition. The single-neuron parameters are the same in teacher and student, while there is a mismatch in the activation function. Same network as in Fig. 4. D The activation function is a smooth rectification with varying degrees of smoothness, parameterized by a parameter β. Teacher RNN from Fig. 2. E Errors in the activity of recorded (left) and unrecorded (right) neurons for different values of model mismatch between teacher and student. We observe a minor decrease in the error in unrecorded neurons when recording from a large number of neurons, M ≈ 150. F Error in the recorded activity (loss function) as a function of training epochs for three different mismatch values (β = 1 means no mismatch). G Error in the unrecorded activity as a function of training epochs for the same three mismatch values. H Removing the mismatch in the activation function by training an additional parameter. We train a student network with the same connectivity as the teacher and different single-neuron parameters; in addition, the student does not know the smoothness parameter β. The trained parameters are therefore the gains and biases of each neuron and the smoothness β. I Error in unrecorded activity after training on a subset of M recorded units, similar to C. Training the smoothness parameter of the nonlinearity provides the student with the same predictive power as students without mismatch (see Fig. 2). J Estimated parameter β during training (average and SEM over 10 different initializations). 
Networks do not retrieve the exact teacher value (β* = 1), although on average they converge to values close to it. Students have a bias towards estimating sharper activation functions (β > 1). Both bias and variance are reduced as the number of recorded neurons increases.
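The caption does not state the exact functional form of the β-parameterized smooth rectification, so the sketch below uses a common softplus parameterization, φ(x) = log(1 + exp(βx))/β, purely as an illustrative assumption: larger β sharpens the nonlinearity toward a hard ReLU, which matches the described bias toward β > 1.

```python
import numpy as np

def smooth_rectification(x, beta=1.0):
    """Softplus-style smooth rectification; beta controls sharpness.

    Assumption: the actual functional form used in the figure is not
    given in the caption; this is one standard choice with the stated
    property that larger beta yields a sharper (more ReLU-like) kink.
    """
    # logaddexp(0, z) = log(1 + exp(z)), computed stably for large z.
    # Dividing by beta makes phi(x) -> max(x, 0) as beta -> infinity.
    return np.logaddexp(0.0, beta * np.asarray(x, dtype=float)) / beta

x = np.linspace(-3.0, 3.0, 7)
smooth = smooth_rectification(x, beta=1.0)    # gently curved around 0
sharp = smooth_rectification(x, beta=10.0)    # nearly a hard rectifier
```

With this form, a student that trains β alongside the per-neuron gains and biases (panel H) can in principle recover the teacher's nonlinearity, since β enters the output smoothly and is therefore amenable to gradient descent.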