Fig. 6: Comparison of our theory with finite width neural network experiments. | Nature Communications

From: Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

a 2-layer NTK regression and the corresponding neural network training, using the NeuralTangents package (ref. 55) with 50000 hidden units, for D = 25 and varying noise levels chosen according to g(λ). The target function is a single-degree mode \(\bar{f}({\bf{x}})={c}_{k}{Q}_{k}^{(D-1)}({\boldsymbol{\beta }}\cdot {\bf{x}})\), where \({c}_{k}\) is a constant, \({\boldsymbol{\beta }}\) is a random vector, and \({Q}_{k}^{(D-1)}\) is the k-th Gegenbauer polynomial (see Supplementary Notes 5 and 6). Here we picked k = 1 (linear target). Solid lines are the theoretically predicted learning curves (Eq. (4)), dots represent NTK regression, and × represents \({E}_{g}\) after neural network training. The correspondence between NN training and NTK regression breaks down at large sample sizes P, since the network then operates in the under-parameterized regime and finite-size effects come to dominate \({E}_{g}\). Error bars represent the standard deviation over 15 trials for kernel regression and 5 trials for neural network experiments. b Dependence of \({\tilde{\lambda }}_{l}\) on mode l for NTKs with various numbers of layers. The weight and bias variances for the neural network are chosen to be \({\sigma }_{W}^{2}=1\) and \({\sigma }_{b}^{2}=0\), respectively.
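The single-mode target in panel a can be illustrated with a short sketch. It evaluates \({c}_{k}{Q}_{k}^{(D-1)}({\boldsymbol{\beta }}\cdot {\bf{x}})\) using the standard three-term recurrence for Gegenbauer polynomials with index \(\alpha = (D-2)/2\). Several choices here are assumptions made for illustration and are not taken from the paper: the normalization \(Q_k(1) = 1\) (the paper's convention is given in its Supplementary Notes and may differ by a constant factor), unit-norm \({\boldsymbol{\beta }}\) and \({\bf{x}}\), D ≥ 3, and all function names.

```python
import math
import random


def gegenbauer(k, alpha, t):
    """Standard Gegenbauer polynomial C_k^{(alpha)}(t) via the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, curr = 1.0, 2.0 * alpha * t  # C_0 and C_1
    for n in range(2, k + 1):
        # n C_n = 2 t (n + alpha - 1) C_{n-1} - (n + 2 alpha - 2) C_{n-2}
        nxt = (2.0 * t * (n + alpha - 1.0) * curr
               - (n + 2.0 * alpha - 2.0) * prev) / n
        prev, curr = curr, nxt
    return curr


def q_mode(k, dim, t):
    """Degree-k mode in `dim` dimensions, normalized so that q_mode(k, dim, 1) == 1.

    Assumption: this normalization is chosen for convenience; it requires dim >= 3.
    """
    alpha = (dim - 2) / 2.0
    return gegenbauer(k, alpha, t) / gegenbauer(k, alpha, 1.0)


def random_unit_vector(dim, rng):
    """Draw a point uniformly on the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def target(x, beta, k, c_k=1.0):
    """Single-mode target c_k * Q_k(beta . x); x and beta are assumed unit-norm."""
    t = sum(b * xi for b, xi in zip(beta, x))
    return c_k * q_mode(k, len(x), t)


rng = random.Random(0)
D = 25
beta = random_unit_vector(D, rng)
x = random_unit_vector(D, rng)
y = target(x, beta, k=1)  # for k = 1 this equals the overlap beta . x
```

With this normalization the k = 1 mode reduces to \({\boldsymbol{\beta }}\cdot {\bf{x}}\), which is the sense in which the caption calls the target linear.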
