Extended Data Fig. 2: Sign-constrained compression for clustered and distributed representations.

a: Distribution of the excitatory compression weights that maximize \({{{{\rm{SNR}}}}}_{{{{\rm{c}}}}}\propto \dim (c){(1-{\Delta }_{{{{\rm{c}}}}})}^{2}\) in the presence of a distributed input representation. b: Standard deviation of the out-degree of the input for the same compression matrix as in a, averaged across 10 realizations (red dashed line). The gray histogram shows the distribution of the same quantity for a compression matrix with the same sparsity but shuffled entries. c, d: Performance of a network with purely excitatory compression in the presence of a distributed input representation. Solid lines and shaded areas indicate the mean and standard deviation, respectively, of the fraction of errors across network realizations. Parameters are the same as in Fig. 3e. c: Fraction of errors on a random classification task as a function of the redundancy N/D of the input representation. d: For fixed N/D = 10, network performance for different network architectures, as in Fig. 2a. ‘Excitatory’ indicates a network whose compression weights are trained to maximize the Hebbian SNR at the compression layer, that is \({{{{\rm{SNR}}}}}_{{{{\rm{c}}}}}\propto \dim (c){(1-{\Delta }_{{{{\rm{c}}}}})}^{2}\), while ‘unconstrained’ indicates a network trained on the same objective but without sign constraints on the weights. Excitatory and optimal compression are not statistically different (n = 10). The training procedure is the same as that used in Fig. 2a. The box boundary extends from the first to the third quartile of the data. The whiskers extend from the box by 1.5 times the inter-quartile range. The horizontal line indicates the median. e, f: Increasing input redundancy yields a smaller benefit for clustered input representations. All parameters are the same as in c, d, except for the type of input representation. e: Same as c, but for a clustered input representation. f: Same as d, but for a clustered input representation. 
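The sign-constrained ("purely excitatory") training described above can be sketched as projected gradient ascent that clips weights back onto the non-negative orthant after every step. This is a minimal illustration only: the objective below is a generic variance-capture placeholder, not the paper's \({\rm{SNR}}_{\rm{c}}\propto \dim (c)(1-{\Delta }_{\rm{c}})^{2}\), and all sizes and data are hypothetical.

```python
# Minimal sketch of sign-constrained compression training.
# Assumptions (not from the paper): a placeholder objective
# trace(W^T cov W), random Gaussian inputs, and hypothetical sizes.
import numpy as np

rng = np.random.default_rng(1)
P, N, M = 200, 50, 20                   # patterns, input units, compression units
X = rng.normal(size=(P, N))             # hypothetical input patterns
W = np.abs(rng.normal(size=(N, M)))     # excitatory (non-negative) initialization

cov = X.T @ X / P                       # empirical input covariance
lr = 0.05
for _ in range(200):
    grad = 2 * cov @ W                  # gradient of the placeholder objective
    W = W + lr * grad                   # unconstrained ascent step
    W = np.clip(W, 0.0, None)           # projection: enforce excitatory weights
    W /= np.linalg.norm(W, axis=0, keepdims=True)  # keep columns unit norm
```

The projection step is what distinguishes the 'Excitatory' architecture from the 'unconstrained' one in panel d: both ascend the same objective, but only the former clips negative entries after each update.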
Purely excitatory compression does not achieve the performance of whitening (two-sided Welch’s t-test, t-statistic = 10.615, p = 2.54 ⋅ 10⁻¹¹, n = 10), nor that of unconstrained compression trained with the same objective (two-sided Welch’s t-test, t-statistic = 8.563, p = 9.19 ⋅ 10⁻⁸, n = 10). In panels c, e the shaded regions indicate the standard deviation across 10 network realizations.
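The statistical comparison above can be reproduced with SciPy's two-sample t-test, where `equal_var=False` selects Welch's variant (no equal-variance assumption). The error fractions below are synthetic placeholders for n = 10 realizations per architecture, not the paper's data.

```python
# Hedged sketch of the two-sided Welch's t-test used to compare architectures.
# The two samples are hypothetical error fractions, not the paper's results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
errors_whitening = rng.normal(0.05, 0.01, size=10)   # hypothetical, n = 10
errors_excitatory = rng.normal(0.12, 0.02, size=10)  # hypothetical, n = 10

# equal_var=False -> Welch's t-test; the default is two-sided
t_stat, p_value = stats.ttest_ind(errors_excitatory, errors_whitening,
                                  equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}")
```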