Extended Data Fig. 6: Compressing models of V1, V4, and IT neurons as well as DNN units.
From: Compact deep neural network models of the visual cortex

a-c. Given the large amount of compression observed for shared compact models trained on the predicted responses of task-driven DNN models for our V4 response dataset (Fig. 3c), we wondered to what extent we could compress these models on response datasets from other brain areas along the visual stream. To this end, we compressed ResNet50-robust models predicting macaque V1, V4, and IT datasets (Fig. 3d–f); here, we also measured the compressibility of three other task-driven DNNs: ResNet50, CORnet-S, and VGG19 (the data-driven shared compact models, purple traces, are the same as in Fig. 3d–f). We trained shared compact models via distillation, varying the number of filters in the first three “core” layers (see Methods). We then compared the noise-corrected R2 of these shared compact models with the full prediction performance of their task-driven DNN counterparts (rightmost dots). Interestingly, some compact models outperformed the full task-driven DNNs (b, blue, green, mauve traces above corresponding dots), a phenomenon sometimes observed with knowledge distillation16. Overall, the general trends seen for ResNet50-robust hold for all other task-driven DNNs: task-driven DNN models of cortical neurons are highly compressible, with the most compression achieved for V1, followed by V4, and then IT. We verified that these compact models achieve prediction performance similar to that of large task-driven DNNs while being orders of magnitude smaller by submitting the compact models to the public benchmark BrainScore; the compact models are now state-of-the-art in predicting the corresponding V4 and IT datasets (Supplementary Table 2). Traces and dots denote medians; error bars denote bootstrapped 90% confidence intervals; n = 115, 88, 168 neurons for a, b, c, respectively. Arrows denote the smallest number of filters needed to achieve a prediction performance within 5% of that for 200 filters/core layer. d. 
We also evaluated the compressibility of the task-driven DNNs on the V4 response dataset from Bashivan et al.20 (n = 131 neurons), which we analyzed in comparison to our own V4 response dataset (Extended Data Fig. 2). Similar to all other datasets (Fig. 3c–f), we found high compressibility (arrows at 10–15 filters). These results provide further evidence that task-driven DNN models of V4 neurons can be reduced in size by orders of magnitude. e. Given the compressibility of models of V4 neurons, we wondered to what extent we could compress a model of responses from DNN units (that is, internal units from an intermediate layer of a task-driven DNN), treating these units as V4 neurons. To test this, we trained shared compact models to predict the responses of 219 DNN internal units (whose number matched that of our recorded V4 neurons in Fig. 3c). DNN units were chosen as the hidden units centrally located in the activity map (that is, for 14 × 14 pixel activity maps, we chose hidden units with spatial locations [7,7]) with the largest activity variance over 5,000 natural images; layers matched those most predictive of our V4 responses (see Methods). We trained these shared compact models (same architecture as those in a–d) to predict DNN unit responses taken directly from a task-driven DNN to 12 million images and varied the number of filters in each of the “core” layers. We computed the raw R2 between DNN unit responses and the predicted responses of the shared compact models; because the DNN units were deterministic, we did not need a noise-corrected R2 metric. Despite there being no noise in the DNN unit responses, the 200-filter shared compact models failed to achieve good prediction performance for most task-driven DNNs (‘200’, traces below R2 = 0.6; traces indicate median R2s and error bars denote 90% confidence intervals over 219 units). Moreover, the DNN units were not easily compressible (arrows at 75–100 filters). 
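The unit-selection step described in e (the spatially central unit of each channel's activity map, ranked by response variance over images) can be sketched as follows. This is a minimal illustration on random data; the function name and array shapes are assumptions, not the paper's code.

```python
import numpy as np

def select_central_units(activations, n_units=219):
    """Pick 'DNN units' as in panel e: units at the spatial center of the
    activity map with the largest response variance across images.

    activations: array of shape (n_images, n_channels, H, W) from one
    intermediate layer of a task-driven DNN (random data in this sketch).
    Returns the selected channel indices and their responses.
    """
    n_images, n_channels, h, w = activations.shape
    center = activations[:, :, h // 2, w // 2]  # location [7, 7] for 14 x 14 maps
    variance = center.var(axis=0)               # variance over images, per channel
    top = np.argsort(variance)[::-1][:n_units]  # channels with largest variance
    return top, center[:, top]                  # responses: (n_images, n_units)

# Toy usage: random 'responses' to 5,000 images from a 512-channel, 14 x 14 layer.
rng = np.random.default_rng(0)
acts = rng.standard_normal((5000, 512, 14, 14))
idx, responses = select_central_units(acts, n_units=219)
```

In practice the activations would come from the DNN layer most predictive of V4, and the 5,000 natural images from the stimulus set; the ranking and center-indexing logic is unchanged.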
That an individual DNN unit is not compressible likely arises from the fact that these DNN units must carry out complex computations while reading out from only ~1,000 channels of an upstream layer to aid in performing object recognition; V4 comprises hundreds of millions of neurons and may not require each individual V4 neuron to carry out many computations on its own. f. We wondered whether the DNN units were truly less compressible than V4 neurons, or whether they only appeared less compressible because our R2 metric did not include a noise ceiling; in other words, the DNN unit responses may have had spurious nonlinearities that a noise-corrected R2 would otherwise hide. To see if this were the case, we first fit factorized linear mappings from the activity of task-driven DNNs (same layers, and including the same DNN units as in e) to the 219 V4 neurons from the 4 test recording sessions. In other words, this setting differs from e only in that a linear mapping is appended to all DNN units from that layer of the task-driven DNN. We then distilled this model with shared compact models, following the same procedure as in e. In contrast to the DNN units, we observed strong compressibility: the shared compact models with 200 filters achieved good prediction (rightmost end of traces, all traces with R2 > 0.6), performance plateaued strongly at larger numbers of filters (for example, between 50 and 200 filters/core layer), and only a small number of filters was needed to stay within a 5% drop from the performance at 200 filters/core layer (arrows), consistent with the V4 results across datasets (b and d, as well as Fig. 3c). These results rule out the possibility that DNN units fail to compress merely because of small, spurious nonlinearities. Instead, it appears that linear readouts of DNN units that align with V4 neurons are more compressible than the DNN units themselves. 
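A factorized linear mapping of the kind used in f can be sketched as follows, under the common convention that each neuron's readout over a (C, H, W) feature map is factorized into a spatial mask (H, W) and a channel weight vector (C). The specific factorization, shapes, and names here are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def factorized_readout(features, spatial, channel):
    """Factorized linear mapping from DNN features to neural responses.

    features: (n_images, C, H, W) activity from one task-driven DNN layer.
    spatial:  (n_neurons, H, W) per-neuron spatial mask.
    channel:  (n_neurons, C) per-neuron channel weights.
    Returns predicted responses of shape (n_images, n_neurons).
    """
    # Collapse space with each neuron's spatial mask, then weight channels;
    # equivalent to a full (C, H, W) weight constrained to rank-1 in space x channel.
    pooled = np.einsum('icxy,nxy->inc', features, spatial)  # (images, neurons, C)
    return np.einsum('inc,nc->in', pooled, channel)

# Toy usage with random features and weights (8 images, 16 channels, 3 neurons).
rng = np.random.default_rng(1)
feats = rng.standard_normal((8, 16, 14, 14))
sp = rng.standard_normal((3, 14, 14))
ch = rng.standard_normal((3, 16))
pred = factorized_readout(feats, sp, ch)  # shape (8, 3)
```

The factorization is what makes the mapping parameter-efficient: each neuron needs H*W + C weights instead of C*H*W, while remaining linear in the DNN features.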
An exciting possibility is that V4 may comprise expressive yet simple functions to ensure accurate stimulus encoding with robustness to noise, and that task-driven DNNs have only partially arrived at these compressible representations (that is, they are still susceptible to adversarial noise, etc.). By optimizing linear readouts of DNN units to achieve high levels of compression, one may achieve a task-driven representation more similar to that of V4. Future experiments are needed to ensure subtle nonlinearities of V4 neurons are not hidden behind repeat-to-repeat variability.
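The "arrow" criterion used throughout a–f (the smallest number of filters whose prediction performance is within a 5% drop of the 200 filters/core layer model) can be sketched as follows. The function name and the toy performance curve are hypothetical, not data from the figure.

```python
import numpy as np

def smallest_sufficient_width(filter_counts, r2_by_width, tolerance=0.05):
    """Return the smallest filters/core layer whose prediction performance
    is within a fractional `tolerance` drop of the widest model.

    filter_counts: increasing widths tested (e.g. up to 200 filters/core layer).
    r2_by_width: median (noise-corrected) R^2 at each width, same order.
    """
    filter_counts = np.asarray(filter_counts)
    r2_by_width = np.asarray(r2_by_width)
    reference = r2_by_width[-1]                       # performance at the largest width
    sufficient = r2_by_width >= (1.0 - tolerance) * reference
    # np.argmax on a boolean array returns the first True index.
    return int(filter_counts[np.argmax(sufficient)]) if sufficient.any() else None

# Hypothetical curve that saturates early, as the V1/V4 traces do:
widths = [2, 5, 10, 25, 50, 100, 200]
r2s = [0.30, 0.52, 0.58, 0.60, 0.61, 0.61, 0.61]
print(smallest_sufficient_width(widths, r2s))  # -> 10
```

A curve that keeps improving up to the largest width (as for the DNN units in e) yields an arrow at a large width, since only widths near 200 clear the 95%-of-reference threshold.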