Fig. 3
From: Learning uplinks and downlinks transmissions in RF-charging IoT networks

Multi-Q comprises a downlink layer, an uplink layer, and a stateless layer. In the downlink layer, the HAP employs Algorithm 1, denoted \(A_{1}\) in the figure, to learn the downlink power allocation. In the uplink layer, each user independently employs Algorithm 2, denoted \(A_{2}\), to learn its own slot selection and transmission probability. The stateless layer collects the uplink and downlink rewards over one epoch and then employs Algorithm 3 to determine the frame size and power split ratio.
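
The layering described in this caption can be illustrated with a minimal Python sketch. It is not the paper's Algorithms 1 to 3: every learner here is a generic epsilon-greedy value learner used as a hypothetical stand-in, and the names `StatelessLearner`, `run_epoch`, and `toy_reward`, the action sets, and the choice to fix the frame size and power split at the start of an epoch are all assumptions made for illustration. The sketch only shows the control flow: per-frame learning by the HAP and the users, then one epoch-level update on the aggregated reward.

```python
import random


class StatelessLearner:
    """Hypothetical stand-in for Algorithms 1-3 (epsilon-greedy, no state)."""

    def __init__(self, actions, epsilon=0.1, alpha=0.5, seed=0):
        self.actions = list(actions)
        self.q = {a: 0.0 for a in self.actions}  # one value estimate per action
        self.epsilon = epsilon                    # exploration probability
        self.alpha = alpha                        # learning rate
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise pick the best-valued action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[a])

    def update(self, action, reward):
        # Move the estimate for the chosen action toward the observed reward.
        self.q[action] += self.alpha * (reward - self.q[action])


def run_epoch(top, hap, users, frames_per_epoch, reward_fn):
    """One epoch: the top layer picks a configuration, the lower layers learn
    frame by frame, and the top layer is updated on the summed reward."""
    frame_size, split = top.select()  # assumed fixed for the whole epoch
    total = 0.0
    for _ in range(frames_per_epoch):
        power = hap.select()                    # downlink layer (A_1 role)
        choices = [u.select() for u in users]   # uplink layer (A_2 role)
        dl_reward, ul_rewards = reward_fn(frame_size, split, power, choices)
        hap.update(power, dl_reward)
        for user, choice, reward in zip(users, choices, ul_rewards):
            user.update(choice, reward)
        total += dl_reward + sum(ul_rewards)
    top.update((frame_size, split), total)      # stateless layer update
    return (frame_size, split), total


def toy_reward(frame_size, split, power, choices):
    # Placeholder reward model (assumption): constant rewards, no channel model.
    return 1.0, [1.0 for _ in choices]


# Hypothetical action sets: (frame size, power split), power levels,
# and (slot index, transmission probability) pairs.
top = StatelessLearner([(f, s) for f in (4, 8) for s in (0.25, 0.5)], seed=1)
hap = StatelessLearner([0.5, 1.0, 2.0], seed=2)
users = [
    StatelessLearner([(slot, p) for slot in range(4) for p in (0.5, 1.0)], seed=3 + i)
    for i in range(2)
]
config, total = run_epoch(top, hap, users, frames_per_epoch=4, reward_fn=toy_reward)
```

In a real setting `toy_reward` would be replaced by a model of harvested energy and delivered data; the constant rewards here exist only to make the control flow easy to follow.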