Table 7 Training algorithms evaluated for the MLP [57, 58].

From: Machine learning and response surface analysis of mean drop size and dispersed phase holdup in an L-shaped pulsed sieve plate column

| Algorithm | MATLAB function (abbreviation) | Optimization principle | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Levenberg–Marquardt | Trainlm (LM) | Damped Gauss–Newton step that blends fast second-order moves with gradient descent for stability. | Very fast convergence; typically achieves very low training error on small/medium models. | High memory demand; can overfit with limited data; not ideal for large networks. |
| Scaled conjugate gradient | Trainscg (SCG) | Conjugate-gradient directions with an internal scaling step; avoids explicit line searches. | Low memory footprint; scales to larger networks; generally stable. | Usually requires more epochs; final error can be higher than LM/BFGS; sensitive to input scaling. |
| Bayesian regularization | Trainbr (BR) | Trains on a penalized objective that automatically balances data fit and weight size to improve generalization. | Strong resistance to overfitting; little need for early-stopping heuristics; smooth, stable training on modest datasets. | More computation per epoch than LM; more memory-intensive than SCG; slower on very large datasets. |
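The "damped Gauss–Newton step" attributed to LM in the table is the standard Levenberg–Marquardt weight update (Hagan–Menhaj form),

$$\Delta \mathbf{w} = -\left(\mathbf{J}^{\mathsf{T}}\mathbf{J} + \mu \mathbf{I}\right)^{-1}\mathbf{J}^{\mathsf{T}}\mathbf{e},$$

where $\mathbf{J}$ is the Jacobian of the network errors $\mathbf{e}$ with respect to the weights and $\mu$ is the damping factor: small $\mu$ gives near Gauss–Newton steps, large $\mu$ gives gradient-descent-like steps, which is the blend of speed and stability described in the table. Forming the $n_w \times n_w$ matrix $\mathbf{J}^{\mathsf{T}}\mathbf{J}$ is also the source of LM's high memory demand.

Trainlm, Trainscg, and Trainbr are training functions from MATLAB's Deep Learning (formerly Neural Network) Toolbox. The following is a minimal sketch of how the three could be compared on a fitting problem; the hidden-layer size and the synthetic data are placeholders, not the column dataset or architecture used in the study.

```matlab
% Minimal sketch: comparing the three Table 7 training functions on
% synthetic data (placeholder inputs/targets, not the study's dataset).
rng(1);                                   % reproducible weights and data
X = rand(4, 200);                         % 4 inputs x 200 samples
Y = sum(X, 1) + 0.05*randn(1, 200);       % noisy scalar target

trainFcns = {'trainlm', 'trainscg', 'trainbr'};   % LM, SCG, BR
for k = 1:numel(trainFcns)
    net = fitnet(10, trainFcns{k});       % one hidden layer of 10 neurons (assumed size)
    net.trainParam.showWindow = false;    % suppress the training GUI
    net = train(net, X, Y);               % train with the selected algorithm
    fprintf('%s: MSE = %.4g\n', trainFcns{k}, perform(net, Y, net(X)));
end
```

In such a comparison, trainbr typically takes more epochs than trainlm because it re-estimates its regularization hyperparameters during training, consistent with the disadvantage listed for it above.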