Extended Data Fig. 1: Exploring the image statistics and representational flexibility of the generators. | Nature Neuroscience

From: Neuronal tuning aligns dynamically with object and texture manifolds across the visual hierarchy

A. Left: Representative synthetic images generated by DeePSim and BigGAN, illustrating the visual characteristics produced by each generator. Right: Fréchet Inception Distance (FID) comparing each generator’s image distribution to natural image datasets. BigGAN matched natural image statistics more closely than DeePSim. For reference, the FID between ImageNet and other natural image datasets (EcoSet, THINGS-object, MS-COCO) is also shown. Each FID value reflects the empirical distance computed from a fixed sample of synthetic images (n = 50,000 per generator) and an equal number of natural comparison images. Bottom: Objectness quantification using the YOLOv8 object-detection model. “Object detect rate” shows the fraction of samples in which at least one object was detected, and “detection confidence” reports the maximum confidence score per image. Sample sizes matched those of the FID analysis. BigGAN and ImageNet showed highly similar objectness distributions, whereas DeePSim exhibited markedly reduced object detectability. Violin plots show the full distribution of values. Box plots inside the violins indicate the median (center line), the 25th and 75th percentiles (lower and upper box edges), and whiskers extending to 1.5× the interquartile range or to the most extreme datapoint within that range. Outliers beyond the whiskers are shown as individual points. B. Low-level statistics of images optimized in the DeePSim and BigGAN latent spaces for the same hidden units in AlexNet. Specifically, images were optimized for units in AlexNet conv5 (with receptive fields placed at the center, N = 91 units); a preferred image for each unit was optimized concurrently with the DeePSim and BigGAN generators. Analyses are based on all images in the last evolved generation (N = 3,640 total). Each point represents a sampled image from the latent space of each generator. Values are mean ± SEM. C. Image inversion experiment.
As the next test of representational flexibility, we replaced activation maximization with a different objective and process: an image inversion test. The goal was to determine how well DeePSim and BigGAN could approximate randomly sampled photographs by optimizing their latent codes. By comparing the randomly sampled images with their reconstructions from each GAN, we examined differences in representational flexibility. For this experiment, we sampled 101 photographs at random from the EcoSet dataset across all categories. Synthetic images were optimized using the Adam algorithm over 401 iterations to minimize the mean squared error (MSE) between the original image and the GAN reconstruction. On visual examination, many images were well matched by DeePSim, whereas BigGAN often matched the overall color but, instead of reproducing objects or textured scenes, frequently placed a single humanoid object at the image center (Fig. 1). We measured the similarity between original images and their GAN-generated inversions using the Pearson correlation computed over pixel values. DeePSim achieved a median similarity of 0.89 ± 0.01 (SE), while BigGAN achieved a median similarity of 0.60 ± 0.02. This difference was statistically significant (P < 0.00001, Wilcoxon signed-rank test, N = 101; Fig. 1). Overall, these results suggest that, given a fixed optimization process and reconstruction objective, DeePSim provides substantially more accurate approximations than BigGAN, likely reflecting differences in the flexibility of their respective latent spaces. Right: The first row shows randomly sampled images from the EcoSet dataset; the second and third rows show gradient-based reconstructions by DeePSim (blue) and BigGAN (red). Left: Similarity to original images. Scatterplots show the Pearson correlation between the original randomly sampled images and their GAN-based reconstructions for DeePSim (blue) and BigGAN (red). D. Activation maximization scores.
To compare the generators’ ability to represent different types of images, we performed a series of computational experiments. First, we used gradient-based methods to optimize images for hidden units, which allowed us to isolate the role of optimization type and assess each generator’s inherent capacity for activation maximization. We used two ResNet architectures, ResNet-50 and a ResNet-50 variant trained to withstand adversarial attacks (ResNet-50-robust or ResNet-50linf8), conducting 97,730 experiments using units from multiple layers. In each experiment, we optimized the latent vectors of each generator to maximize the activation of a given individual unit. The analysis covered several optimization strategies (for example, Adam variants and stochastic gradient descent variants), different learning rates (for example, 0.001, 0.01, 0.1) and other settings such as Hessian-based modifications (for example, Adam001Hess, SGD001Hess). We found that DeePSim consistently yielded higher activation scores than BigGAN, regardless of model (vanilla or robust), layer or optimization method. Across all experiments, the median activation score achieved through DeePSim was 5.22 ± 0.04, while the median activation score achieved through BigGAN was 1.115 ± 0.005; this difference was statistically significant (Wilcoxon rank-sum test, P < 0.00001, r = −0.533). Each violin plot shows the maximum activation scores achieved by ResNet-50 (left) and ResNet-50linf8 (robust) hidden units after gradient-based optimization of images using BigGAN (red) and DeePSim (blue). X-axis labels indicate the layer of the hidden units tested.
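The FID comparison in panel A has a closed-form expression over feature statistics: FID = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^(1/2)). The sketch below is a minimal numpy-only implementation operating on precomputed feature matrices; in practice the features come from an Inception-v3 network applied to the 50,000 images per condition, which is assumed to happen upstream and is not shown here.

```python
import numpy as np

def _sqrtm_psd(C):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, 0.0, None)          # clip tiny negative numerical noise
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(feats_a, feats_b):
    """Fréchet Inception Distance between two feature sets of shape (n, dim).
    Uses Tr((C1 C2)^(1/2)) = Tr((C2^(1/2) C1 C2^(1/2))^(1/2)) to stay symmetric."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    s_b = _sqrtm_psd(cov_b)
    cross = np.trace(_sqrtm_psd(s_b @ cov_a @ s_b))
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a) + np.trace(cov_b) - 2.0 * cross)
```

Identical feature sets give an FID of (numerically) zero, and a pure mean shift of δ in each of d dimensions contributes d·δ² to the distance, which is a quick sanity check for any implementation.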
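The inversion procedure in panel C (optimize the latent code to minimize pixel MSE, then score with a pixel-space Pearson correlation) can be illustrated with a toy stand-in generator. Everything below is an illustrative assumption, not the study’s actual setup: the generator is a random linear map `W`, plain gradient descent replaces Adam to keep the sketch dependency-free, and the dimensions are arbitrary.

```python
import numpy as np

def pearson_pixels(a, b):
    """Pearson correlation between two images over flattened pixel values."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def invert(generator, grad_mse, target, z_dim, steps=401, lr=0.05, seed=0):
    """Gradient descent on the latent code z to minimize MSE(G(z), target)."""
    z = np.random.default_rng(seed).normal(size=z_dim)
    for _ in range(steps):
        z -= lr * grad_mse(z, target)
    return generator(z)

# Toy stand-in "generator": a fixed random linear map from latent to pixels.
rng = np.random.default_rng(1)
W = rng.normal(size=(64, 16))
G = lambda z: W @ z
grad = lambda z, t: 2.0 * W.T @ (W @ z - t) / t.size   # d MSE / dz
target = G(rng.normal(size=16))     # a target lying in the generator's range
recon = invert(G, grad, target, z_dim=16)
similarity = pearson_pixels(target, recon)
```

Because this toy target lies exactly in the generator’s range, the correlation converges to ~1; the interesting case in the study is the gap that opens when a target (a natural photograph) is poorly covered by a generator’s latent space.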

