Fig. 6: Dissociating the effect of frequency and information in supervised learning tasks.
a We trained 3-layer nonlinear neural networks to classify the orientation of sinusoidal gratings into 3° bins, varying either input frequency or output noise. b We controlled the informativeness of input orientations by injecting orientation-dependent noise into the labels (specifically, the binary labels are multiplied by a Bernoulli dropout whose rate depends on orientation). The sensitivity of the first layer to input orientation is shown over the course of learning. With uniform input statistics, the more informative features are preferentially learned. c The effect of varying input frequency without applying label noise; in this case, the more frequent features are preferentially learned. d We then balanced noise and frequency so that the total information in the input dataset about each output label is uniform (see Methods). Learning with gradient descent still prefers the more common angles.
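
For readers who want a concrete picture of the two dataset manipulations in b and c, below is a minimal sketch. The bin count (60 bins over 180°), image size, spatial frequency, and the particular noise and density profiles are illustrative assumptions, not the paper's exact setup; only the mechanism follows the caption: one-hot 3°-bin labels multiplied by an orientation-dependent Bernoulli mask (panel b), and non-uniform sampling of orientations (panel c).

```python
import numpy as np

rng = np.random.default_rng(0)

N_BINS = 60    # assumed: 180 degrees of orientation / 3-degree bins
IMG = 32       # assumed grating image size, in pixels

def grating(theta, size=IMG, freq=0.2):
    """Render a sinusoidal grating at orientation theta (radians)."""
    y, x = np.mgrid[:size, :size] - size / 2
    return np.sin(2 * np.pi * freq * (x * np.cos(theta) + y * np.sin(theta)))

def sample_batch(n, density=None, drop_rate=None):
    """Draw a batch of gratings with one-hot 3-degree-bin labels.

    density   : unnormalized sampling weight per bin; non-uniform values
                vary input frequency, as in panel c.
    drop_rate : per-bin Bernoulli dropout rate multiplied into the labels,
                i.e. orientation-dependent label noise, as in panel b.
    """
    p = np.ones(N_BINS) if density is None else np.asarray(density, float)
    bins = rng.choice(N_BINS, size=n, p=p / p.sum())
    thetas = (bins + rng.random(n)) * (np.pi / N_BINS)  # orientation within each bin
    X = np.stack([grating(t) for t in thetas])
    Y = np.eye(N_BINS)[bins]                            # one-hot labels
    if drop_rate is not None:
        keep = rng.random(n) >= np.asarray(drop_rate, float)[bins]
        Y = Y * keep[:, None]                           # zero the label where dropped
    return X, Y

theta_bins = np.arange(N_BINS) * np.pi / N_BINS

# Panel b-style run: uniform frequency, noisier labels at oblique orientations.
drop = 0.5 * np.abs(np.sin(2 * theta_bins))             # illustrative noise profile
Xb, Yb = sample_batch(256, drop_rate=drop)

# Panel c-style run: clean labels, cardinal orientations oversampled.
dens = 1.0 + np.abs(np.cos(2 * theta_bins))             # illustrative density profile
Xc, Yc = sample_batch(256, density=dens)
```

Panel d's condition would then correspond to choosing `density` and `drop_rate` jointly so that the information each orientation bin carries about its label is equalized; the exact balancing procedure is specified in the paper's Methods, not reproduced here.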