Table 2 Adam optimization algorithm.
From: Logistics demand prediction using fuzzy support vector regression machine based on Adam optimization
Input: sample set \(\left({x}_{i},{y}_{i}\right),\,\left({x}_{i},{y}_{i}\right)\in {R}^{n}\times R\) \(\left(i=1,2,\ldots,m\right)\), penalty parameter \(C=100\), maximum number of iterations \(T=50000\).
Initialization: let \(t=0\), \(b=0\), \({m}_{0}={v}_{0}=0\), and initialize \(w\) as the \(m\times m\) dimensional matrix of all ones.
Iteration:
For \(t=1,2,\ldots,T\) do
① Calculate the gradient of the loss function with respect to the weight vector \(w\) and the bias \(b\):
\({g}_{w,t}={\nabla }_{w}f\left({w}_{t-1},{b}_{t-1}\right),\quad {g}_{b,t}={\nabla }_{b}f\left({w}_{t-1},{b}_{t-1}\right)\)
② Calculate the first-order moment of the gradient:
\({m}_{w,t}={\beta }_{1}\cdot {m}_{w,t-1}+\left(1-{\beta }_{1}\right)\cdot {g}_{w,t},\quad {m}_{b,t}={\beta }_{1}\cdot {m}_{b,t-1}+\left(1-{\beta }_{1}\right)\cdot {g}_{b,t}\)
③ Calculate the second-order moment of the gradient:
\({v}_{w,t}={\beta }_{2}\cdot {v}_{w,t-1}+\left(1-{\beta }_{2}\right)\cdot {g}_{w,t}^{2},\quad {v}_{b,t}={\beta }_{2}\cdot {v}_{b,t-1}+\left(1-{\beta }_{2}\right)\cdot {g}_{b,t}^{2}\)
④ Correct the first-order moments \({m}_{w,t},{m}_{b,t}\): \({\hat{m}}_{w,t}=\frac{{m}_{w,t}}{1-{\beta }_{1}^{t}},\quad {\hat{m}}_{b,t}=\frac{{m}_{b,t}}{1-{\beta }_{1}^{t}}\)
⑤ Correct the second-order moments \({v}_{w,t},{v}_{b,t}\): \({\hat{v}}_{w,t}=\frac{{v}_{w,t}}{1-{\beta }_{2}^{t}},\quad {\hat{v}}_{b,t}=\frac{{v}_{b,t}}{1-{\beta }_{2}^{t}}\)
⑥ Update the weight vector \(w\) and the bias \(b\) with the following formula:
\({w}_{t}={w}_{t-1}-\rho \cdot \frac{{\hat{m}}_{w,t}}{\sqrt{{\hat{v}}_{w,t}}+\xi },\quad {b}_{t}={b}_{t-1}-\rho \cdot \frac{{\hat{m}}_{b,t}}{\sqrt{{\hat{v}}_{b,t}}+\xi }\)
⑦ Calculate \({f}_{t-1}=f\left({w}_{t-1},{b}_{t-1}\right)\) and \({f}_{t}=f\left({w}_{t},{b}_{t}\right)\) according to formula (2.12)
If \(\Vert {f}_{t}-{f}_{t-1}\Vert < \varepsilon\), stop the iteration;
else set \(t=t+1\) and return to ①
end for
Return \({w}^{* }={w}_{t}\), \({b}^{* }={b}_{t}\)
Output: the optimal weight vector \({w}^{* }\), the optimal bias \({b}^{* }\), and the decision function \(f\left(x\right)={w}^{* }x+{b}^{* }\).
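As a concrete illustration of steps ① to ⑦, a minimal NumPy sketch of the update loop is given below. It is a sketch, not the paper's implementation: the fuzzy SVR objective of formula (2.12) is not reproduced here, so the `loss_grad` callback, the ridge-style placeholder loss in `make_loss`, and the hyperparameter defaults (\(\rho\), \({\beta }_{1}\), \({\beta }_{2}\), \(\xi\), \(\varepsilon\)) are illustrative assumptions; the weight vector is likewise initialized as a plain vector of ones rather than the \(m\times m\) matrix used above.

```python
import numpy as np

def adam_train(loss_grad, n_features, rho=0.001, beta1=0.9, beta2=0.999,
               xi=1e-8, T=50000, eps=1e-6):
    """Adam update loop of steps (1)-(7); ``loss_grad(w, b)`` must return
    the loss value and its gradients with respect to w and b."""
    w = np.ones(n_features)                              # initial weight vector (all ones)
    b = 0.0
    m_w, v_w = np.zeros(n_features), np.zeros(n_features)
    m_b = v_b = 0.0
    f_prev, _, _ = loss_grad(w, b)

    for t in range(1, T + 1):
        _, g_w, g_b = loss_grad(w, b)                    # (1) gradients of the loss
        m_w = beta1 * m_w + (1 - beta1) * g_w            # (2) first-order moments
        m_b = beta1 * m_b + (1 - beta1) * g_b
        v_w = beta2 * v_w + (1 - beta2) * g_w**2         # (3) second-order moments
        v_b = beta2 * v_b + (1 - beta2) * g_b**2
        mhat_w = m_w / (1 - beta1**t)                    # (4) bias-corrected first moments
        mhat_b = m_b / (1 - beta1**t)
        vhat_w = v_w / (1 - beta2**t)                    # (5) bias-corrected second moments
        vhat_b = v_b / (1 - beta2**t)
        w = w - rho * mhat_w / (np.sqrt(vhat_w) + xi)    # (6) parameter update
        b = b - rho * mhat_b / (np.sqrt(vhat_b) + xi)
        f_t, _, _ = loss_grad(w, b)                      # (7) convergence check
        if abs(f_t - f_prev) < eps:
            break
        f_prev = f_t
    return w, b

# Placeholder objective (NOT formula (2.12)): ridge-regularized squared error,
# used only to make the sketch runnable end to end.
def make_loss(X, y, C=100.0):
    def loss_grad(w, b):
        r = X @ w + b - y
        f = 0.5 * w @ w + 0.5 * C * np.mean(r**2)
        return f, w + C * X.T @ r / len(y), C * np.mean(r)
    return loss_grad

# Usage on synthetic data: recover a linear decision function f(x) = w*x + b*.
X = np.random.rand(200, 5)
y = X @ np.array([1.0, -2.0, 0.5, 3.0, -1.0]) + 0.1
w_star, b_star = adam_train(make_loss(X, y), n_features=X.shape[1])
```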