Table 5 Performance comparison of DRN under different gradient-based optimizers on the MyPJ dataset.

From: A comparative evaluation of gradient-based optimization algorithms for short-term load forecasting using deep residual networks

| Optimizer algorithm | MAPE | RMSE | MAE | MSE | NMSE | R | R² |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SGD | 0.059559 | 0.049746 | 0.031838 | 0.002475 | 0.086920 | 0.959048 | 0.913080 |
| SGD-momentum | 0.054527 | 0.047531 | 0.027916 | 0.002259 | 0.079353 | 0.960828 | 0.920647 |
| NAG | 0.059803 | 0.049361 | 0.032160 | 0.002437 | 0.085581 | 0.961268 | 0.914419 |
| AdaGrad | 0.059418 | 0.050279 | 0.031233 | 0.002528 | 0.088792 | 0.957359 | 0.911208 |
| AdaDelta | 0.070161 | 0.058211 | 0.036986 | 0.003389 | 0.119019 | 0.941602 | 0.880981 |
| RMSProp | 0.057384 | 0.049013 | 0.029796 | 0.002402 | 0.084378 | 0.958517 | 0.915622 |
| Adam | 0.053519 | 0.047223 | 0.027487 | 0.002230 | 0.078326 | 0.960838 | 0.921674 |
| Adamax | 0.054368 | 0.045021 | 0.027147 | 0.002027 | 0.071191 | 0.964814 | 0.928809 |
| Nadam | 0.057242 | 0.051119 | 0.029541 | 0.002613 | 0.091783 | 0.953467 | 0.908217 |
| AMSGrad | 0.051152 | 0.046104 | 0.026392 | 0.002126 | 0.074657 | 0.962117 | 0.925343 |
| AdamW | 0.057474 | 0.050824 | 0.028926 | 0.002583 | 0.090730 | 0.954530 | 0.909270 |
| RAdam | 0.057834 | 0.049964 | 0.029953 | 0.002496 | 0.087682 | 0.957563 | 0.912318 |
| AdaBelief | 0.054198 | 0.045700 | 0.029288 | 0.002088 | 0.073356 | 0.965968 | 0.926644 |
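
The reported columns of Table 5 appear to be linked: in every row, RMSE squared matches MSE and NMSE plus R² equals 1, both to within rounding. The Python sketch below checks those two relations and picks the optimizer with the lowest value of a chosen error metric. The metric definitions are not given in this section, so the relations are an assumption inferred from the numbers, and the names `ROWS`, `is_consistent`, and `best_by` are hypothetical helpers introduced only for illustration.

```python
# Values copied from Table 5 (MyPJ dataset); column order matches FIELDS.
FIELDS = ("optimizer", "mape", "rmse", "mse", "nmse", "r2")
ROWS = [
    ("SGD",          0.059559, 0.049746, 0.002475, 0.086920, 0.913080),
    ("SGD-momentum", 0.054527, 0.047531, 0.002259, 0.079353, 0.920647),
    ("NAG",          0.059803, 0.049361, 0.002437, 0.085581, 0.914419),
    ("AdaGrad",      0.059418, 0.050279, 0.002528, 0.088792, 0.911208),
    ("AdaDelta",     0.070161, 0.058211, 0.003389, 0.119019, 0.880981),
    ("RMSProp",      0.057384, 0.049013, 0.002402, 0.084378, 0.915622),
    ("Adam",         0.053519, 0.047223, 0.002230, 0.078326, 0.921674),
    ("Adamax",       0.054368, 0.045021, 0.002027, 0.071191, 0.928809),
    ("Nadam",        0.057242, 0.051119, 0.002613, 0.091783, 0.908217),
    ("AMSGrad",      0.051152, 0.046104, 0.002126, 0.074657, 0.925343),
    ("AdamW",        0.057474, 0.050824, 0.002583, 0.090730, 0.909270),
    ("RAdam",        0.057834, 0.049964, 0.002496, 0.087682, 0.912318),
    ("AdaBelief",    0.054198, 0.045700, 0.002088, 0.073356, 0.926644),
]


def is_consistent(row, tol=1e-6):
    """Check the assumed relations RMSE^2 ~ MSE and NMSE + R2 ~ 1.

    The tolerance allows for the six-decimal rounding used in the table.
    """
    rec = dict(zip(FIELDS, row))
    rmse_matches_mse = abs(rec["rmse"] ** 2 - rec["mse"]) < tol
    nmse_matches_r2 = abs(rec["nmse"] + rec["r2"] - 1.0) < tol
    return rmse_matches_mse and nmse_matches_r2


def best_by(metric):
    """Return the optimizer with the lowest value of an error metric."""
    idx = FIELDS.index(metric)
    return min(ROWS, key=lambda row: row[idx])[0]


if __name__ == "__main__":
    print("all rows consistent:", all(is_consistent(r) for r in ROWS))
    for metric in ("mape", "rmse"):
        print(f"lowest {metric.upper()}:", best_by(metric))
```

`best_by` only suits metrics where lower is better (MAPE, RMSE, MSE, NMSE); for R² the comparison would need `max` instead.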