Table 2 Commonly used notations and variables.

From: Deep reinforcement learning framework for joint optimization of multi-RAT UAV location and user association in heterogeneous networks

Notation

Description

\(\mathcal{K}, \mathcal{L}, \mathcal{W}, \mathcal{U}, \mathcal{G}\)

The sets of BSs, FBSs, WAPs, UBSs, and GDs.

\(X, Y, H\)

The UAVs’ x-axis coordinate, y-axis coordinate, and altitude.

\(K, L, U, N\)

The numbers of BSs, LTE BSs, UBSs, and GDs, respectively.

\(P^w,P_{i,k}^u\)

The power consumption of the Wi-Fi card, and the average uplink power consumption.

\(N^K, N_k^L, N_k^W\)

The number of GDs associated with MBS, FBS, and WAP k, respectively.

\(R_i^u,\Gamma ^u\)

The GD’s average uplink traffic generation rate and target SNR.

\(P_k\left( V_k\right) ,E_k\)

UAV’s power consumption and energy consumption.

\(\mathcal{S}, \mathbb{A}, \mathbb{O}\)

The state, joint action, and joint observation spaces.

\(s(t), a(t), r(t)\)

The state, action, and reward at time t.

\(\gamma ,\pi ,\pi ^*\)

Discount factor, UAV’s policy, and optimal policy.

\(\theta , \theta ^-\)

\(\mathcal{Q}\)-network and target-network weights.

\(\mathcal{Q}^\pi(s,a)\)

The UAV’s state-action value function.

\(M^{ep}\)

The number of episodes of the Q-learning algorithm.

\(\mathcal{J}\)

The regret-matching game.

\(R_{ik}^d,R_{ik}^{WPHY},S_i\)

The average downlink data rate, the downlink WLAN physical-layer data rate, and the GD’s satisfaction.

\(D_i^t\left( m_i,m_i^\prime \right)\)

The payoff difference (regret) for GD i had it played action \(m_i\) instead of \(m_i^\prime\).

\(T_{SC}^{LTE}\)

The duration of an LTE subframe.

\(\psi _i^{t+1}\left( m_i\right)\)

The probability that GD i chooses action \(m_i\) at time t+1.

\(C_{i,k}^{LMCS}, C_{i,k}^{WMCS}\)

The coding rates of LTE BSs and WAPs, respectively.

\({\bar{z}}_t\)

The empirical distribution of the joint actions \(\Sigma\) of all GDs up to time t.

\(A\)

Association matrix between GDs and base stations.

\(\Sigma ^*\)

Optimal joint strategies.
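For the pair \(P_k(V_k), E_k\) in the table above, the two quantities are typically linked as follows. This is a sketch under stated assumptions, not the paper's model: it assumes, as is common in rotary-wing UAV energy models, that \(V_k\) is UAV k's flight speed and that \(E_k\) accumulates \(P_k\) over a flight of duration \(T\). The symbols \(T\) and the time step \(\delta\) are hypothetical and do not appear in the table.

```latex
E_k = \int_{0}^{T} P_k\bigl(V_k(t)\bigr)\,\mathrm{d}t
    \approx \sum_{n=0}^{T/\delta - 1} P_k\bigl(V_k(n\delta)\bigr)\,\delta
```

The discrete sum is what a slotted simulation would accumulate when the speed is held constant within each slot of length \(\delta\).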
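For \(\gamma\), \(\theta\), and \(\theta^-\) in the table above: in the textbook deep Q-network (DQN) recipe, the target network with weights \(\theta^-\) supplies the bootstrapped target \(y = r(t) + \gamma \max_{a'} \mathcal{Q}(s(t+1), a'; \theta^-)\), and the \(\mathcal{Q}\)-network weights \(\theta\) are trained to reduce \((y - \mathcal{Q}(s(t), a(t); \theta))^2\). The Python sketch below assumes that standard variant, with flat lists of floats standing in for network weights. The function names and the Polyak coefficient `tau` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def dqn_target(
    reward: float, next_q_target: Sequence[float], gamma: float, done: bool
) -> float:
    """Bootstrapped target y = r + gamma * max_a' Q(s', a'; theta^-).

    next_q_target holds the target network's Q-values for every action in
    the next state s'. On a terminal transition the bootstrap term is dropped.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def soft_update(
    theta: Sequence[float], theta_minus: Sequence[float], tau: float
) -> List[float]:
    """Polyak update theta^- <- tau * theta + (1 - tau) * theta^-.

    tau = 1.0 reduces to the periodic hard copy used in the original DQN.
    """
    return [tau * w + (1.0 - tau) * w_m for w, w_m in zip(theta, theta_minus)]


# Usage with toy numbers (not taken from the paper):
y = dqn_target(reward=1.0, next_q_target=[0.5, 2.0, 1.0], gamma=0.5, done=False)
new_theta_minus = soft_update([1.0, 2.0], [0.0, 0.0], tau=0.5)
```

Keeping \(\theta^-\) fixed (or slowly moving) between updates is what stabilizes the target; with `tau=0.5` the target weights move halfway toward the online weights, giving `[0.5, 1.0]` here.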
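For \(D_i^t(m_i, m_i^\prime)\) in the table above, the following sketch reads it as the regret used in Hart and Mas-Colell's regret matching. Under that reading, it is the average payoff GD i would have gained by switching to \(m_i\) in every past round where it actually played \(m_i^\prime\), with the other GDs' actions held fixed. The normalization by \(t\), the helper names, and the toy payoff are assumptions for illustration only.

```python
from typing import Callable, Hashable, Sequence, Tuple

# One past round from GD i's point of view: (own action, other GDs' action).
# In a real game the second entry would be the others' joint action tuple.
Round = Tuple[Hashable, Hashable]


def regret(
    history: Sequence[Round],
    payoff: Callable[[Hashable, Hashable], float],
    m_alt: Hashable,
    m_played: Hashable,
) -> float:
    """Average payoff GD i would have gained by playing m_alt in every past
    round in which it actually played m_played, others' actions held fixed."""
    t = len(history)
    if t == 0:
        return 0.0
    gain = sum(
        payoff(m_alt, others) - payoff(own, others)
        for own, others in history
        if own == m_played
    )
    return gain / t


# Hypothetical toy payoff: a GD earns 2.0 when it avoids the station the others chose.
def toy_payoff(own: Hashable, others: Hashable) -> float:
    return 2.0 if own != others else 0.0


history = [("mbs", "mbs"), ("mbs", "mbs"), ("wap", "mbs"), ("wap", "wap")]
d_wap_mbs = regret(history, toy_payoff, m_alt="wap", m_played="mbs")
```

Here the GD twice chose the MBS when the others also did; switching to the WAP would have gained 2.0 each time, so the regret is 4.0 / 4 = 1.0.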
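For \(\psi_i^{t+1}(m_i)\) in the table above, the sketch below follows Hart and Mas-Colell's conditional regret-matching rule, which is assumed here rather than confirmed by the table. Under that rule, a GD that last played \(m^\prime\) switches to each other action \(m\) with probability \(\max(D_i^t(m, m^\prime), 0)/\mu\) and keeps \(m^\prime\) with the remaining probability mass. The constant `mu` and the function name are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def regret_matching_probs(
    actions: Sequence[Hashable],
    last_action: Hashable,
    regrets: Dict[Hashable, float],
    mu: float,
) -> Dict[Hashable, float]:
    """Next-step distribution psi^{t+1} over GD i's actions.

    regrets[m] is the regret D(m, last_action) for switching to m. Switching
    probabilities are the positive parts of these regrets scaled by 1/mu; the
    leftover probability stays on last_action.
    """
    probs: Dict[Hashable, float] = {}
    for m in actions:
        if m != last_action:
            probs[m] = max(regrets.get(m, 0.0), 0.0) / mu
    stay = 1.0 - sum(probs.values())
    if stay < 0.0:
        raise ValueError("mu is too small: switching probabilities exceed 1")
    probs[last_action] = stay
    return probs


# Usage: the GD last associated with the MBS and regrets not having used the WAP.
psi = regret_matching_probs(
    ["mbs", "fbs", "wap"], "mbs", {"fbs": -0.5, "wap": 1.0}, mu=4.0
)
```

Negative regrets (here for the FBS) yield zero switching probability, so the GD only moves toward actions that would have paid off; `mu` must be large enough that the switching probabilities never sum above 1.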
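For \(\bar{z}_t\) in the table above: the empirical distribution is the fraction of the first t rounds in which each joint action was played. In regret matching, it is this time-averaged distribution, not the per-round play, that converges to the set of correlated equilibria. The sketch assumes joint actions are recorded as tuples with one entry per GD; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

JointAction = Tuple[Hashable, ...]  # one action (chosen station) per GD


def empirical_distribution(history: Sequence[JointAction]) -> Dict[JointAction, float]:
    """z_bar_t: fraction of rounds 1..t in which each joint action was played."""
    t = len(history)
    if t == 0:
        return {}
    return {sigma: count / t for sigma, count in Counter(history).items()}


# Usage: two GDs over four rounds.
z_bar = empirical_distribution(
    [("mbs", "wap"), ("mbs", "wap"), ("fbs", "wap"), ("mbs", "wap")]
)
```

Joint actions that never occurred are simply absent from the result, which keeps the representation sparse when the joint action space is large.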
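For \(A\) and the counts \(N^K, N_k^L, N_k^W\) in the table above, the per-station counts follow directly from the association matrix. This sketch assumes that \(A\) is binary, with one row per GD and one column per base station, and that each GD is associated with exactly one station at a time. The table does not state this, and the function name is hypothetical. Under those assumptions, the counts are the column sums of \(A\).

```python
from typing import List, Sequence


def associated_counts(A: Sequence[Sequence[int]]) -> List[int]:
    """Number of GDs associated with each base station (column sums of A).

    Rows of A index GDs and columns index base stations; A[i][k] == 1 means
    GD i is associated with station k. Each row must contain exactly one 1
    (single-association assumption).
    """
    for i, row in enumerate(A):
        if any(x not in (0, 1) for x in row) or sum(row) != 1:
            raise ValueError(f"GD {i} must be associated with exactly one station")
    n_stations = len(A[0]) if A else 0
    return [sum(row[k] for row in A) for k in range(n_stations)]


# Usage: four GDs; hypothetical columns are one MBS, one FBS, and one WAP.
A = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
counts = associated_counts(A)
```

With this toy matrix, the MBS, FBS, and WAP serve 1, 2, and 1 GDs, respectively. The row check enforces the single-association assumption, so a GD accidentally assigned to two stations is rejected rather than double-counted.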