Table 7 Training algorithms evaluated for the MLP [57, 58].

From: Machine learning and response surface analysis of mean drop size and dispersed phase holdup in an L-shaped pulsed sieve plate column

| Algorithm | MATLAB function (abbreviation) | Optimization principle | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Levenberg–Marquardt | Trainlm (LM) | Damped Gauss–Newton step that blends fast second-order moves with gradient descent for stability. | Very fast convergence; typically achieves very low training error on small/medium models. | High memory demand; can overfit with limited data; not ideal for large networks. |
| Scaled conjugate gradient | Trainscg (SCG) | Conjugate-gradient directions with an internal scaling step; avoids explicit line searches. | Low memory footprint; scales to larger networks; generally stable. | Usually requires more epochs; final error can be higher than LM/BFGS; sensitive to input scaling. |
| Bayesian regularization | Trainbr (BR) | Trains on a penalized objective that automatically balances data fit and weight size to improve generalization. | Strong resistance to overfitting; little need for early-stopping heuristics; smooth, stable training on modest datasets. | More computation per epoch than LM; more memory-intensive than SCG; slower on very large datasets. |
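The "damped Gauss–Newton step" attributed to LM in the table is the standard Levenberg–Marquardt weight update (Hagan–Menhaj form),

$$\Delta \mathbf{w} = -\left(\mathbf{J}^{\mathsf{T}}\mathbf{J} + \mu \mathbf{I}\right)^{-1}\mathbf{J}^{\mathsf{T}}\mathbf{e},$$

where $\mathbf{J}$ is the Jacobian of the network errors $\mathbf{e}$ with respect to the weights and $\mu$ is the damping factor: small $\mu$ gives near Gauss–Newton steps, large $\mu$ gives gradient-descent-like steps, which is the blend of speed and stability described in the table. Forming the $n_w \times n_w$ matrix $\mathbf{J}^{\mathsf{T}}\mathbf{J}$ is also the source of LM's high memory demand.

Trainlm, Trainscg, and Trainbr are training functions from MATLAB's Deep Learning (formerly Neural Network) Toolbox. The following is a minimal sketch of how the three could be compared on a fitting problem; the hidden-layer size and the synthetic data are placeholders, not the column dataset or architecture used in the study.

```matlab
% Minimal sketch: comparing the three Table 7 training functions on
% synthetic data (placeholder inputs/targets, not the study's dataset).
rng(1);                                   % reproducible weights and data
X = rand(4, 200);                         % 4 inputs x 200 samples
Y = sum(X, 1) + 0.05*randn(1, 200);       % noisy scalar target

trainFcns = {'trainlm', 'trainscg', 'trainbr'};   % LM, SCG, BR
for k = 1:numel(trainFcns)
    net = fitnet(10, trainFcns{k});       % one hidden layer of 10 neurons (assumed size)
    net.trainParam.showWindow = false;    % suppress the training GUI
    net = train(net, X, Y);               % train with the selected algorithm
    fprintf('%s: MSE = %.4g\n', trainFcns{k}, perform(net, Y, net(X)));
end
```

In such a comparison, trainbr typically takes more epochs than trainlm because it re-estimates its regularization hyperparameters during training, consistent with the disadvantage listed for it above.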