Fig. 3: Learning curves and double-descent phase diagram for kernels with white band-limited spectra.

We simulated \(N = 800\)-dimensional uncorrelated Gaussian features \({\boldsymbol{\phi }}({\bf{x}})={\bf{x}} \sim {\mathcal{N}}(0,{\bf{I}})\) and estimated a linear target function \(\bar{f}({\bf{x}})={{\boldsymbol{\beta }}}^{\top }{\bf{x}}\) with \(\|{\boldsymbol{\beta }}\|^{2} = N\). Error bars show the standard deviation over 15 trials. Solid lines are theory (Eq. (7)); dots are experiments. a When \(\lambda = 0\) and \(\sigma^{2} = 0\), \(E_g\) decreases linearly with \(\alpha\); when \(\sigma^{2} > 0\), it diverges as \(\alpha \to 1\). b When \(\sigma^{2} = 0\), explicit regularization \(\lambda > 0\) always slows the decay of \(E_g\). c For nonzero noise \(\sigma^{2} > 0\), the optimal regularization \(\lambda^{*} = \sigma^{2}\) gives the best generalization performance. d Double-descent phase diagram; the colored squares correspond to the curves of the same color in c. The optimal-regularization curve \(\lambda^{*} = \sigma^{2}\) (yellow dashed line) does not intersect the double-descent region, which lies above the curve defined by \(g(\lambda)\) (Eq. (8)).
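The qualitative behavior in panels a to c can be reproduced with a short Monte Carlo simulation. The sketch below is not the authors' code and does not evaluate the theory of Eq. (7); it only estimates \(E_g\) empirically. It assumes a hypothetical normalization in which the kernel is \({\bf{x}}^{\top}{\bf{x}}'/N\) (features \({\bf{x}}/\sqrt{N}\)), so the target \(\bar{f}\) is rescaled by \(1/\sqrt{N}\) to unit variance and \(\sigma^{2}\) and \(\lambda\) are measured on the same scale as the signal. Under this assumption, averaging over a rotation-invariant \({\boldsymbol{\beta}}\) with \(\|{\boldsymbol{\beta}}\|^{2} = N\) and over the label noise, the expected error for fixed training inputs is minimized exactly at \(\lambda = \sigma^{2}\), consistent with panel c. The function name `learning_curve_point` and its parameters are illustrative.

```python
import numpy as np


def learning_curve_point(N, alpha, lam, sigma2, trials=15, seed=0):
    """Monte Carlo estimate of E_g for ridge regression on white Gaussian features.

    Hypothetical normalization chosen for this sketch (not taken from the paper):
      features   psi(x) = x / sqrt(N), x ~ N(0, I_N)  (kernel x.x'/N, unit trace)
      target     f(x) = beta . psi(x), ||beta||^2 = N (unit target variance)
      labels     y = f(x) + eps, eps ~ N(0, sigma2)
      estimator  w = argmin_w sum_mu (y_mu - w . psi(x_mu))^2 + lam * ||w||^2
    Returns the mean and standard deviation over trials of
      E_g = E_x[(w . psi(x) - f(x))^2] = ||w - beta||^2 / N.
    """
    rng = np.random.default_rng(seed)
    P = int(round(alpha * N))  # number of training samples, alpha = P / N
    errs = np.empty(trials)
    for t in range(trials):
        beta = rng.standard_normal(N)
        beta *= np.sqrt(N) / np.linalg.norm(beta)       # enforce ||beta||^2 = N
        Psi = rng.standard_normal((P, N)) / np.sqrt(N)  # row mu is psi(x_mu)
        y = Psi @ beta + np.sqrt(sigma2) * rng.standard_normal(P)
        if lam > 0:
            # Ridge solution: w = (Psi^T Psi + lam I)^{-1} Psi^T y
            w = np.linalg.solve(Psi.T @ Psi + lam * np.eye(N), Psi.T @ y)
        else:
            # lam = 0: minimum-norm least-squares solution via the pseudoinverse
            w = np.linalg.pinv(Psi) @ y
        # psi(x) has covariance I / N, so the test error is computed exactly
        errs[t] = np.sum((w - beta) ** 2) / N
    return float(errs.mean()), float(errs.std())


if __name__ == "__main__":
    # Small illustrative sweep (N = 200 rather than 800 to keep it fast)
    N, sigma2 = 200, 0.5
    for alpha in [0.25, 0.5, 0.9, 1.5, 3.0]:
        row = []
        for lam in [0.0, sigma2, 10 * sigma2]:
            mean, _ = learning_curve_point(N, alpha, lam, sigma2)
            row.append(f"lam={lam:<4}: {mean:7.3f}")
        print(f"alpha={alpha:<4}", "  ".join(row))
```

Because the feature covariance is known, the test error is computed in closed form as \(\|{\bf{w}} - {\boldsymbol{\beta}}\|^{2}/N\) instead of by sampling test points, and the \(\lambda = 0\) case uses the pseudoinverse so that it returns the minimum-norm interpolator when \(P \le N\). With a fixed `seed`, calls that differ only in `lam` reuse the same draws of \({\boldsymbol{\beta}}\), inputs, and noise, so different regularization strengths are compared on identical data.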