Table 7. Training algorithms evaluated for the MLP [57,58].
| Algorithm | MATLAB function (abbreviation) | Optimization principle | Advantages | Disadvantages |
|---|---|---|---|---|
| Levenberg–Marquardt | trainlm (LM) | Damped Gauss–Newton step that blends fast second-order moves with gradient descent for stability. | Very fast convergence; typically achieves very low training error on small/medium models. | High memory demand; can overfit with limited data; not ideal for large networks. |
| Scaled conjugate gradient | trainscg (SCG) | Conjugate-gradient directions with an internal scaling step; avoids explicit line searches. | Low memory footprint; scales to larger networks; generally stable. | Usually requires more epochs; final error can be higher than LM/BFGS; sensitive to input scaling. |
| Bayesian regularization | trainbr (BR) | Trains on a penalized objective that automatically balances data fit and weight size to improve generalization. | Strong resistance to overfitting; little need for early-stopping heuristics; smooth, stable training on modest datasets. | More computation per epoch than LM; heavier memory use than SCG; slower on very large datasets. |
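
To make the comparison concrete, the sketch below shows how the three training functions in Table 7 could be swapped on the same feedforward network in MATLAB (Deep Learning Toolbox). It is a minimal illustration only: the synthetic data `X`/`T`, the 10-neuron hidden layer, and the display settings are assumptions for this example, not settings from the study.

```matlab
% Minimal sketch: compare the Table 7 training functions on one network.
% X and T are placeholder data, not the study's dataset.
X = rand(5, 200);                    % 5 input features, 200 samples
T = sum(X, 1) + 0.1*randn(1, 200);   % synthetic regression target

algs = {'trainlm', 'trainscg', 'trainbr'};
for k = 1:numel(algs)
    net = fitnet(10, algs{k});           % one hidden layer, 10 neurons (assumed)
    net.trainParam.showWindow = false;   % suppress the training GUI
    if strcmp(algs{k}, 'trainbr')
        % Bayesian regularization controls overfitting via its penalized
        % objective, so the validation split is commonly disabled.
        net.divideFcn = 'dividetrain';
    end
    [net, tr] = train(net, X, T);
    fprintf('%s: best training MSE = %.4g (%d epochs)\n', ...
            algs{k}, tr.best_perf, tr.num_epochs);
end
```

Which function performs best depends on network size and data volume, consistent with the trade-offs listed in Table 7: trainlm tends to converge fastest on small models, trainscg uses the least memory, and trainbr trades extra computation for better generalization on modest datasets.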