Fig. 2: Ab initio sufficient training avoids overfitting and yields prediction uncertainty distributions. | Nature Communications


From: Sufficient is better than optimal for training neural networks


Ensembles of models sampled at finite temperature yield smooth decision boundaries (white lines in panel (a)) and average predictions (dark magenta curve in panel (b)) that are not skewed by noisy training data (star, triangle and square black markers in panel (a), and round black markers in panel (b)). Test data (star, triangle and square gray markers in panel (a), and round gray markers in panel (b)) are overlaid to show how the ensemble predictions (the decision boundaries in panel (a) and the average curve in panel (b)) generalize to unseen data. The background in panel (a) is shaded using a weighted average of the ensemble votes for each class at each point in the feature space, showing regions of confident ensemble prediction (bright purple, teal, or orange regions) vs. uncertain prediction (intermediate colored regions). Analogously, panel (b) shows the density of predicted curves (transparent magenta curves) around the ensemble average (dark magenta curve). For classification problems, panels (c, d) show the ensemble's decision-making confidence at different points in the data feature space via the proportion of ensemble votes for each class (panels (c, d) correspond to the pink markers labeled (c, d) in panel (a)). For regression problems, we can compare the distributions of sampled predictions with the ensemble average at different input values (cf. the pink solution distribution and dark magenta point in panels (e, f), sampled at the two inputs indicated in panel (b)) and assess how the data noise distribution affects predictions throughout the feature space. Ab initio sufficient training thus produces sufficiently descriptive predictions, together with insight into the ensemble prediction process that is inaccessible with a single, optimized model.
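The two ensemble summaries described in the caption, per-class vote proportions for classification and an average with spread for regression, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `vote_proportions` and `ensemble_summary`, the array layouts, and the toy numbers are all hypothetical, and the votes are averaged with equal weights, whereas the caption mentions a weighted average whose weights are not specified here.

```python
import numpy as np


def vote_proportions(class_votes, n_classes):
    """Fraction of ensemble members voting for each class at each point.

    class_votes: integer array of shape (n_models, n_points) holding the
    class index predicted by each model (hypothetical layout).
    Returns an array of shape (n_points, n_classes); each row sums to 1.
    Assumption: every model's vote carries equal weight.
    """
    class_votes = np.asarray(class_votes)
    # One-hot encode the votes: shape (n_models, n_points, n_classes).
    one_hot = np.eye(n_classes)[class_votes]
    # Averaging over the model axis gives the vote proportions.
    return one_hot.mean(axis=0)


def ensemble_summary(predictions):
    """Ensemble average and spread of sampled regression predictions.

    predictions: float array of shape (n_models, n_inputs).
    Returns (mean, standard deviation), each of shape (n_inputs,).
    """
    predictions = np.asarray(predictions, dtype=float)
    return predictions.mean(axis=0), predictions.std(axis=0)


# Toy classification ensemble: 4 models voting at 2 query points.
votes = np.array([[0, 1],
                  [0, 2],
                  [0, 1],
                  [1, 1]])
proportions = vote_proportions(votes, n_classes=3)

# Toy regression ensemble: 2 sampled curves evaluated at 2 inputs.
curves = np.array([[1.0, 2.0],
                   [3.0, 2.0]])
mean_curve, spread = ensemble_summary(curves)
```

A row of `proportions` close to one-hot corresponds to a confidently colored region in panel (a), while a flatter row corresponds to an intermediate color. Likewise, a larger `spread` at an input corresponds to a wider distribution of sampled predictions around the ensemble average, as in panels (e, f).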
