Fig. 6: Generalization in single-hidden-layer networks on MNIST for linear (top), sigmoidal (middle), and ReLU (bottom) nonlinearity.
From: Coding schemes in neural networks learning classification tasks

a Activations of all neurons on 100 test inputs for a given weight sample; for ReLU, only the ten most active neurons are shown. b Posterior-averaged kernel on 100 test inputs. c Mean predictor for class 0 from sampling (gray) and theory (black dashed). d Generalization error for each class averaged over 1000 test inputs from sampling (gray bars), theory (black circles), and GP theory (black triangles). gen. error, generalization error.
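The quantities in panels b-d can be estimated directly from posterior weight samples. Below is a minimal sketch of such an estimate, not the authors' code: the names `hidden_samples`, `readout_samples`, and `phi`, and the use of a per-class mean-squared error as the generalization error, are illustrative assumptions; the paper's exact definitions may differ.

```python
import numpy as np

def posterior_summaries(X_test, y_test, hidden_samples, readout_samples, phi):
    """Sketch of the figure's summary statistics from posterior samples.

    Assumed (illustrative) inputs:
      X_test          : (P, D) test inputs
      y_test          : (P, C) one-hot test labels
      hidden_samples  : iterable of (D, N) hidden-weight samples
      readout_samples : iterable of (N, C) readout-weight samples, paired
      phi             : elementwise nonlinearity (identity, sigmoid, or ReLU)
    """
    P, C = X_test.shape[0], y_test.shape[1]
    kernel = np.zeros((P, P))      # panel b: posterior-averaged kernel
    mean_pred = np.zeros((P, C))   # panel c: posterior-mean predictor
    n_samples = 0
    for W, a in zip(hidden_samples, readout_samples):
        H = phi(X_test @ W)              # (P, N) hidden activations (panel a)
        kernel += H @ H.T / W.shape[1]   # kernel for this weight sample
        mean_pred += H @ a               # predictor for this weight sample
        n_samples += 1
    kernel /= n_samples
    mean_pred /= n_samples
    # panel d: per-class generalization error; here a mean-squared error
    # of the mean predictor against one-hot targets within each class
    gen_error = np.array([
        np.mean((mean_pred[y_test[:, c] == 1] - y_test[y_test[:, c] == 1]) ** 2)
        for c in range(C)
    ])
    return kernel, mean_pred, gen_error
```

In this sketch the sampling estimates (gray in panels c-d) would come from such posterior averages, while the theory curves would be computed separately from the corresponding analytical kernel.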