Fig. 2: Coding schemes in the feature layer for linear (top), sigmoidal (middle), and ReLU (bottom) nonlinearity.

a Sample of the readout weight vectors for all three classes. For ReLU, only the readout weights of the nine most active neurons are shown. b Activations on all training inputs for a given weight sample. For ReLU, only the nine most active neurons are shown. c Network output for the weight sample shown in (a, b). The output perfectly matches the one-hot coding of the task, in which the first half of the inputs belong to the first class and the remaining inputs are split equally between the other two classes.
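
To make the class layout and the readout concrete, here is a minimal NumPy sketch (not taken from the paper; the number of inputs, the feature-layer width, and all variable names are illustrative assumptions) that builds one-hot targets with the stated class proportions and computes a linear readout from feature-layer activations.

```python
import numpy as np

# Minimal sketch, not the authors' code: one-hot targets with the class
# proportions from the caption, plus a linear readout of feature activations.

P = 120           # hypothetical number of training inputs
n_features = 9    # hypothetical feature-layer width

# First half of inputs -> class 0; remaining inputs split equally
# between classes 1 and 2.
labels = np.concatenate([
    np.zeros(P // 2, dtype=int),
    np.ones(P // 4, dtype=int),
    np.full(P - P // 2 - P // 4, 2, dtype=int),
])
one_hot = np.eye(3)[labels]          # (P, 3) one-hot coding of the task

# Hypothetical feature-layer activations (cf. panel b) and readout
# weight vectors (cf. panel a); here ReLU-like random activations.
rng = np.random.default_rng(0)
activations = np.maximum(rng.standard_normal((P, n_features)), 0.0)
readout_weights = rng.standard_normal((n_features, 3))

# Network output (cf. panel c): a 3-dimensional output per input; after
# training it would match `one_hot` row by row.
outputs = activations @ readout_weights
print(outputs.shape, one_hot.shape)  # (120, 3) (120, 3)
```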