Table 5 Key experimental parameters used in SAC-based power management.

From: Adaptive power-saving mode control in NB-IoT networks using soft actor-critic reinforcement learning for optimal power management

| Parameter | Purpose | Value |
|---|---|---|
| DRX timeout | Inactivity time before entering DRX mode | 2 s |
| PSM entry delay | Minimum idle duration before entering PSM | 10 s |
| Learning rate | Actor/critic optimizer learning rate | 0.0003 |
| Discount factor (\(\gamma\)) | Decay applied to future rewards | 0.99 |
| Batch size | Number of transitions per training update | 256 |
| Max episodes | Training duration | 1000 episodes |
| Sleep mode preference | Probability bias toward eDRX versus PSM | eDRX: 0.7, PSM: 0.3 |
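The parameters in Table 5 can be gathered into a single configuration object. The following is a minimal Python sketch, not the authors' implementation. The field names, the helper names `select_mode` and `biased_sleep_mode`, and the way the timers and the sleep-mode preference are combined are all hypothetical. It assumes that a device stays active below the DRX timeout, uses DRX between the DRX timeout and the PSM entry delay, and, once past the PSM entry delay, samples eDRX or PSM with the 0.7/0.3 preference. It also computes the common rule-of-thumb effective horizon \(1/(1-\gamma)\), which is 100 steps for \(\gamma = 0.99\).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SACPowerConfig:
    # Values from Table 5; the field names are hypothetical.
    drx_timeout_s: float = 2.0       # inactivity time before entering DRX
    psm_entry_delay_s: float = 10.0  # minimum idle duration before PSM
    learning_rate: float = 3e-4      # actor/critic optimizer learning rate
    gamma: float = 0.99              # discount factor for future rewards
    batch_size: int = 256            # transitions per training update
    max_episodes: int = 1000         # training duration
    p_edrx: float = 0.7              # sleep-mode preference toward eDRX
    p_psm: float = 0.3               # sleep-mode preference toward PSM

    def effective_horizon(self) -> float:
        # Rule of thumb: rewards more than ~1/(1 - gamma) steps ahead contribute little.
        return 1.0 / (1.0 - self.gamma)


def biased_sleep_mode(rng: random.Random, cfg: SACPowerConfig) -> str:
    # ASSUMPTION: the preference acts as a fixed sampling prior over deep-sleep modes.
    return rng.choices(["eDRX", "PSM"], weights=[cfg.p_edrx, cfg.p_psm], k=1)[0]


def select_mode(idle_s: float, cfg: SACPowerConfig, rng: random.Random) -> str:
    # ASSUMPTION: the two timers gate which mode is allowed for a given idle time.
    if idle_s < cfg.drx_timeout_s:
        return "ACTIVE"
    if idle_s < cfg.psm_entry_delay_s:
        return "DRX"
    return biased_sleep_mode(rng, cfg)


if __name__ == "__main__":
    cfg = SACPowerConfig()
    rng = random.Random(0)
    print(select_mode(1.0, cfg, rng))  # ACTIVE
    print(select_mode(5.0, cfg, rng))  # DRX
    counts = {"eDRX": 0, "PSM": 0}
    for _ in range(10_000):
        counts[select_mode(30.0, cfg, rng)] += 1
    print(counts)  # roughly 70% eDRX, 30% PSM
    print(round(cfg.effective_horizon()))  # 100
```

Treating the preference as a fixed sampling prior is only one reading of the table. A SAC agent could instead use it to bias exploration, so the exact mechanism should be checked against the paper's method description.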