Scientific Reports

Table 2 Commonly used notations and variables.

From: Deep reinforcement learning framework for joint optimization of multi-RAT UAV location and user association in heterogeneous networks

Notation	Description	Notation	Description
\(\mathcal{K}, \mathcal{L}, \mathcal{W}, \mathcal{U}, \mathcal{G}\)	The sets of BSs, FBSs, WAPs, UBSs, and GDs.	X, Y, H	The UAVs’ X-axis, Y-axis, and altitude.
K, L, U, N	The number of BSs, LTE-BSs, UBSs, and GDs.	\(P^w,P_{i,k}^u\)	Wi-Fi card power consumption and average uplink power consumption.
\(N^K, N_k^L, N_k^W\)	The number of GDs associated with MBS, FBS, and WAP k.	\(R_i^u,\Gamma ^u\)	GD’s average uplink traffic generation rate and target SNR.
\(P_k\left( V_k\right) ,E_k\)	UAV’s power consumption and energy consumption.	\(\mathcal{S},\mathbb {A},\mathbb {O}\)	The state, joint action, and joint observation spaces.
s(t), a(t), r(t)	The state, action, and reward at time t.	\(\gamma ,\pi ,\pi ^*\)	Discount factor, UAV’s policy, and optimal policy.
\(\theta , \theta ^-\)	\(\mathcal{Q}\)-network and target network weights.	\(\mathcal{Q}^\pi (s,a)\)	The UAV’s state-action value function.
\(M^{ep}\)	The number of episodes of the Q-learning algorithm.	\(\mathcal{J}\)	The regret-matching game.
\(R_{ik}^d,R_{ik}^{WPHY},S_i\)	The average downlink data rate, the downlink WLAN physical data rate, and the GD’s satisfaction.	\(D_i^t\left( m_i,m_i^\prime \right)\)	The payoff for GD i if it had played action \(m_i\) instead of \(m_i^\prime\).
\(T_{SC}^{LTE}\)	The duration of an LTE subframe.	\(\psi _i^{t+1}\left( m_i\right)\)	The probability distribution of GD i choosing an action at time t.
\(C_{i,k}^{LMCS}, C_{i,k}^{WMCS}\)	The coding rate of LTE BSs and WAPs.	\({\bar{z}}_t\)	The empirical distribution of joint actions \(\Sigma\) of all GDs until t.
A	Association matrix between GDs and base stations.	\(\Sigma ^*\)	Optimal joint strategies.
