Fig. 1: Two-layer models of visual responses in mouse and monkey V1.

a 32,440–52,868 natural images were shown to mice during two-photon calcium imaging recordings in V1. b Architecture of the prediction model, including four convolutional layers and a neuron-specific readout layer parameterized as a rank-1 decomposition of weights across x-pixels (wx), y-pixels (wy), and convolutional channels (wc). c Example neural activity and predictions on held-out test images. d Distribution across all neurons of the fraction of explainable variance explained (FEVE, see “Methods”; N = 14,504). e Performance as a function of the number of training images (N = 6 mice). Error bars represent standard error of the mean (s.e.m.). f Performance as a function of model depth, compared to the Sensorium model (green, ref. 24) and to a linear-nonlinear (LN) model (dashed line) (N = 6 mice). Error bars represent s.e.m. g Example readout weights wx and wy, and their combined spatial map Wxy. h Distribution of pooling diameters, estimated from wx and wy. i Natural and generated stimuli presented during neural recordings in monkey V1; figure from ref. 15. j–n Same as (c–h) for our models fit to the monkey V1 dataset. l includes the baseline model from ref. 15, which has 5 layers. Error bars represent s.e.m.
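The rank-1 readout described in panel b can be sketched as follows. This is an illustrative NumPy version, not the paper's implementation: the function name `rank1_readout` and the feature shape are assumptions, but the core idea matches the caption — for each neuron, the full readout weight tensor over (channels, y, x) is the outer product of the three factor vectors wc, wy, and wx, so the readout is applied by contracting the convolutional feature map against each factor in turn.

```python
import numpy as np

def rank1_readout(features, wy, wx, wc):
    """Predicted response of one neuron under a rank-1 factorized readout.

    features: (C, H, W) activations from the final convolutional layer
    wy: (H,) spatial factor over y-pixels
    wx: (W,) spatial factor over x-pixels
    wc: (C,) factor over convolutional channels
    """
    # Contract the spatial dimensions first, leaving one value per channel,
    # then weight the channels. Equivalent to summing features against the
    # full outer-product tensor wc ⊗ wy ⊗ wx, but never materializes it.
    per_channel = np.einsum('chw,h,w->c', features, wy, wx)
    return float(per_channel @ wc)

# Toy example with random features and factors (hypothetical sizes).
rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8, 8))
wy, wx = rng.standard_normal(8), rng.standard_normal(8)
wc = rng.standard_normal(16)
print(rank1_readout(feats, wy, wx, wc))
```

The factorization reduces the per-neuron parameter count from C·H·W to C + H + W, which is what makes a separate readout per neuron tractable for tens of thousands of neurons.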
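Panels d and k report FEVE, the fraction of explainable variance explained. A common form of this metric (the paper's Methods may differ in how the noise variance is estimated, so treat this as a hedged sketch) discounts trial-to-trial noise from both the model's error and the total response variance:

```python
import numpy as np

def feve(responses, predictions, noise_var):
    """Fraction of explainable variance explained (standard form).

    responses:   (n_images,) trial-averaged neural responses to test images
    predictions: (n_images,) model predictions for the same images
    noise_var:   scalar estimate of trial-to-trial noise variance,
                 typically computed from repeated stimulus presentations
                 (the estimation procedure is assumed, not from the caption)
    """
    mse = np.mean((responses - predictions) ** 2)
    total_var = np.var(responses)
    # Subtract the unexplainable noise variance from numerator and
    # denominator so a perfect model of the stimulus-driven signal scores 1.
    return 1.0 - (mse - noise_var) / (total_var - noise_var)

# Sanity check on toy data: perfect predictions with no noise give FEVE = 1.
r = np.array([1.0, 2.0, 3.0, 4.0])
print(feve(r, r, 0.0))
```

With noise_var = 0 the metric reduces to ordinary variance explained; the noise correction is what lets FEVE approach 1 for a model that captures all of the repeatable, stimulus-driven signal.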