Table 1 RL hyperparameters used in the SAC agent.

From: Coordinated operation and multi-layered optimization of hybrid photovoltaic-small modular reactor microgrids

| Hyperparameter | Symbol | Value |
| --- | --- | --- |
| Algorithm | | SAC |
| Learning rate | \(\eta\) | \(3\times 10^{-4}\) |
| Discount factor | \(\gamma\) | 0.99 |
| Replay buffer size | \(|\mathscr {D}|\) | \(1.0\times 10^{6}\) |
| Batch size | \(B\) | 256 |
| Target smoothing coefficient | \(\tau\) | \(5\times 10^{-3}\) |
| Entropy temperature | \(\alpha\) | Auto-tuned |
| Target entropy | \(H_{\text {tgt}}\) | \(-\dim (\mathscr {A})\) |
| Learning starts | | \(10^4\) steps |
| Training steps | \(N_{\text {train}}\) | \(2\times 10^{5}\) |
| Network architecture | | (256, 256) MLP |
| Optimizer | | Adam |
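
For readers who want to reuse these settings, the values in Table 1 can be collected into a single configuration object. The following is a minimal sketch, not the authors' implementation: the class name `SACConfig`, its field names, and the helper `soft_update` are hypothetical, and the action dimension in the usage lines is a made-up example, since the table does not state it. The sketch also assumes the standard SAC conventions that \(\tau\) is the Polyak averaging coefficient for the target networks and that \(H_{\text{tgt}} = -\dim(\mathscr{A})\) is the target used when auto-tuning \(\alpha\).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SACConfig:
    """Values copied from Table 1; the field names are illustrative assumptions."""
    algorithm: str = "SAC"
    learning_rate: float = 3e-4                  # eta
    gamma: float = 0.99                          # discount factor
    buffer_size: int = 1_000_000                 # |D|, replay buffer capacity
    batch_size: int = 256                        # B
    tau: float = 5e-3                            # target smoothing coefficient
    learning_starts: int = 10_000                # steps before learning begins
    total_steps: int = 200_000                   # N_train
    hidden_sizes: Tuple[int, ...] = (256, 256)   # MLP hidden layer widths
    optimizer: str = "Adam"

    def target_entropy(self, action_dim: int) -> float:
        """H_tgt = -dim(A), the target used when alpha is auto-tuned."""
        return -float(action_dim)


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging (assumed convention): target <- tau*online + (1-tau)*target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


cfg = SACConfig()

# action_dim=3 is a made-up example; Table 1 does not give the action dimension.
h_tgt = cfg.target_entropy(action_dim=3)

# Toy weight vectors, only to show how slowly the target tracks the online network.
new_target = soft_update(target=[0.0, 1.0], online=[1.0, 1.0], tau=cfg.tau)
```

With \(\tau = 5\times 10^{-3}\), each update moves a target weight only 0.5% of the way toward its online counterpart, which is why the target networks change slowly relative to the learned critics.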