Introduction

With the continuous development of intelligent transportation, artificial intelligence technology is increasingly applied in the automotive field. The promotion of 5G technology enables vehicles to communicate and transmit data more quickly, and the Internet of Vehicles (IoV) connects vehicles to the Internet, enabling information sharing between vehicles and between vehicles and infrastructure1. As the core communication technology of the IoV, Vehicle-to-Everything (V2X) further breaks the boundaries of information interaction. Here, “X” covers all key entities in the IoV environment, including vehicles, roadside units, infrastructure, and pedestrians, corresponding to typical communication types such as Vehicle-to-Vehicle (V2V), Vehicle-to-Roadside (V2R), Vehicle-to-Infrastructure (V2I), and Vehicle-to-Pedestrian (V2P)2,3. It also supports specific interactions between roadside units and infrastructure. In this study, “Infrastructure” specifically refers to computing-capable environmental carriers such as edge, fog, and cloud computing; in addition, Infrastructure-to-Infrastructure (I2I) communication, as an important supplement to the V2X communication system, can provide key communication support for cross-node resource collaborative scheduling4,5. The development of IoV technology has further driven the surge in in-vehicle applications such as autonomous driving, in-vehicle entertainment, and collision warning. However, for resource-constrained vehicles, meeting the requirements of these computationally intensive tasks is a major challenge6.

The emergence of Multi-Access Edge Computing (MEC) offers an effective solution to the problem of insufficient resources in the IoV7,8. MEC transfers data processing and storage functions from the centralized cloud to the edge of the network, allowing computing tasks to be executed on edge devices9 instead of being processed centrally in the cloud. This approach can reduce latency and energy consumption, lighten the cloud load, prevent network congestion caused by uploading large amounts of data to cloud servers, and help improve data privacy protection. It is therefore of great research value to study MEC-based computation offloading in the IoV, design reasonable and effective task offloading and resource management schemes, and minimize the total system latency and energy consumption.

The main contributions of this article are as follows: (1) This article uses V2X communication technology and the Analytic Hierarchy Process (AHP) to prioritize tasks, and designs a task priority model, a delay model, and an energy consumption model suitable for IoV scenarios, so that the urgency and importance of individual tasks can be assessed and prioritized more accurately. (2) This paper proposes a task offloading decision-making method based on an improved deep deterministic policy gradient (IDDPG) algorithm, which can dynamically adjust and optimize offloading decisions to achieve efficient task processing and resource allocation. (3) This paper allocates computing resources according to the designed task priority model, ensuring that high-priority tasks are processed first, reducing task delays, and improving resource utilization. (4) This paper verifies the proposed method through a series of experiments. Experimental results show that the designed task priority model and the IDDPG-based task offloading decision-making method have significant advantages in reducing task delays, reducing energy consumption, and improving system resource utilization. These results further demonstrate the effectiveness and practicability of the method and provide a new scheme for task offloading in the 5G IoV environment.

Related works

In recent years, more and more researchers have begun to pay attention to the problem of task offloading in the IoV. With the diversification and growing complexity of IoV applications, vehicles need to handle a large number of computing tasks, such as autonomous driving, in-vehicle entertainment, collision warning, and real-time navigation. In the face of these computing-intensive tasks, how to reasonably allocate computing resources while ensuring the real-time performance and effectiveness of tasks has become a key issue in IoV systems10. In the IoV environment, the importance and urgency of tasks vary. For example, autonomous driving and collision warning tasks demand high real-time performance and reliability, while in-vehicle entertainment and information service tasks are relatively less critical. To optimize resource utilization and system performance, it is necessary to prioritize IoV tasks and schedule them reasonably according to their urgency and resource requirements. In addition, by caching content at the edge of the network, not only can the latency of obtaining user-requested content be shortened11, but the network burden and link resource usage can also be reduced. Reference12 prioritizes safety messages based on the analytic hierarchy process, thereby establishing an optimal task offloading model with priority-dependent latency and energy loss. Reference13 defines an importance model and designs a task priority mechanism algorithm that divides tasks by priority according to their differing requirements for latency, energy consumption, and resources during offloading.

With the continuous expansion of IoV tasks, vehicles face growing computing demands under tight resource constraints. How to effectively allocate and schedule computing tasks under limited computing resources, storage resources, and communication bandwidth to maximize system performance and user experience has become an important open issue in IoV systems14. Reference15 proposed an adaptive joint resource allocation scheme that decouples the multi-resource collaborative allocation strategy into three sub-strategies: uplink, computing, and downlink resource allocation. By adjusting these sub-strategies in real time, system performance in terms of capacity, service delay, and energy consumption is optimized. Reference16 proposed a system optimization model for joint computation offloading and power allocation to minimize the average energy consumption of the system and the task data processing delay. Reference17 proposed an IoV resource allocation strategy based on an evolutionary strategy and the Hungarian algorithm; this strategy constructs slices, optimizes power and bandwidth allocation, and achieves efficient channel matching through the Hungarian algorithm. Reference18 proposed a V2X collaborative caching and resource allocation mechanism, which uses a graph coloring model to allocate channels and achieves effective allocation of computing, caching, and communication resources within the network.

Deep reinforcement learning (DRL) combines the advantages of deep learning and reinforcement learning and demonstrates powerful autonomous decision-making capabilities. Through continuous interaction with and learning from the environment, DRL can gradually optimize strategies and adapt to complex, dynamic system environments. Applying DRL to edge computing task offloading in the IoV makes it possible to intelligently decide task offloading strategies, optimize resource allocation, and improve system performance in an uncertain and dynamically changing network environment. Reference19 proposed a reinforcement learning-based computation offloading strategy to achieve task offloading prediction and computing resource allocation; based on an offloading node discovery mechanism, a Q-learning method was used to build an intelligent node-selection offloading algorithm that solves the optimization problem and achieves task offloading. Reference20 used a long short-term memory network with an attention mechanism together with a deep deterministic policy gradient algorithm to solve the problem. Reference21 considered factors such as task size and priority delay and introduced the multi-agent MADDPG algorithm to determine the vehicle task offloading location. Reference21 proposed a computation offloading algorithm based on the deep deterministic policy gradient algorithm to optimize offloading under dynamic environments and random task requests.

In summary, existing studies have extensively explored IoV task offloading, with deep reinforcement learning (DRL) widely used to optimize resource utilization and system performance. These works have made progress in task prioritization, joint resource allocation, and DRL-based dynamic adaptation, contributing to reduced latency, lower energy consumption, and improved resource efficiency. However, critical gaps remain: most methods assume full task offloading—either local execution or complete offloading to edge or cloud servers—and fail to address partial task offloading. This is a practical need for real-world IoV tasks, such as segmented data processing in autonomous driving, leaving the key issue of determining optimal offloading rates unresolved. Additionally, under limited computing, storage, and communication resources, existing approaches struggle to balance efficient offloading and rational resource allocation. To fill these gaps, this paper proposes two core schemes: (1) An AHP-based task priority model that evaluates tasks by computing demand, storage demand, and maximum tolerable latency to prioritize resource allocation, ensuring rational edge resource utilization; (2) An improved DDPG (IDDPG) algorithm integrated with LSTM and experience replay, which dynamically optimizes partial offloading rates to adapt to the dynamic IoV environment. Together, these innovations address the above limitations, reducing latency and energy consumption while improving resource utilization.

System model and problem definition

System model

This paper considers the problem of Internet of Vehicles task offloading in the scenario of Internet of Vehicles cloud-edge-end collaborative computing. First, the various components of the architecture and their functions will be described in detail, including the collaborative relationship between edge servers and vehicle terminal devices. Subsequently, the task priority model, delay model, and energy consumption model will be introduced respectively. The task offloading system model in the Internet of Vehicles environment is shown in Fig. 1.

Fig. 1

System model.

This paper proposes a cloud-edge-end collaborative task offloading architecture. In this architecture, I2I communication serves as a critical inter-infrastructure connection: when an RSU-equipped MEC server is overloaded or fails, it can transmit task offloading requests and resource status data to adjacent MEC servers, cloud servers, or core infrastructure nodes via I2I communication. With this cross-node collaboration, dynamic resource scheduling is realized. For example, an overloaded edge server can not only offload some low-priority tasks to adjacent edge servers with lighter loads, but also respond to computationally intensive tasks that exceed the edge’s processing capacity by requesting cloud computing resources. Such I2I-enabled collaboration not only avoids task blocking caused by single-node resource constraints but also optimizes the overall resource utilization of the IoV system. In this architecture, the vehicle’s tasks can be offloaded to the edge server for processing or calculated locally. Assume that there are M vehicles in the system and each roadside unit (RSU) is equipped with a MEC server. RSU only provides task offloading computing services to vehicles within its coverage area.

At the beginning of each time slot, vehicle tasks arrive randomly, and each vehicle sends a task offloading request to the RSU. The RSU decides whether to accept the request and the specific amount of the task to offload based on the received task information. The random arrival of vehicle tasks follows a Poisson process with average arrival rate \(\lambda\). The probability mass function (PMF) describing the number of task arrivals k within a single time slot is defined in Eq. (1)22:

$$P(k)=\frac{{{{(\lambda T)}^k}{e^{ - \lambda T}}}}{{k!}},k=0,1,2,...$$
(1)
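As a quick sanity check of the arrival model, the PMF in Eq. (1) can be evaluated directly. The sketch below is illustrative only; the arrival rate and slot duration are hypothetical example values, not parameters from the paper.

```python
import math

def poisson_pmf(k: int, lam: float, T: float) -> float:
    """Eq. (1): probability of exactly k task arrivals in one slot of duration T."""
    mu = lam * T  # expected number of arrivals per slot
    return (mu ** k) * math.exp(-mu) / math.factorial(k)

# Hypothetical example: lambda = 2 tasks/s, slot length T = 1 s
probs = [poisson_pmf(k, lam=2.0, T=1.0) for k in range(10)]
```

The probabilities sum to one over all k, which is a useful check when simulating the random task arrival process.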

where T is the duration of a single time slot. Assume that the task characteristic information is represented as \(Tas{k_i}=\{ {c_i},{o_i},t_{i}^{{max}}\}\), where \({c_i}\) denotes the computational resources required for task i, \({o_i}\) denotes the storage resources required for task i, and \(t_{i}^{{max}}\) denotes the maximum tolerable latency. The state information of the edge server is represented as \(\{ {f^{mec}},{o^{mec}}\}\), where \({f^{mec}}\) denotes the computational resources of the edge server and \({o^{mec}}\) denotes its storage resources. Let the position of the RSU (roadside unit) be the coordinate origin, let the vehicle’s position be represented as \(({x_i},{y_i})\), and let the height of the RSU be denoted as h. The distance between the vehicle and the RSU is given by Eq. (2).

$${d_i}=\sqrt {x_{i}^{2}+y_{i}^{2}+{h^2}}$$
(2)

The communication between the vehicles and the edge server is based on wireless communication technology. The data transmission rate for task i is given by Eq. (3)24.

$${R_i}=Blo{g_2}(1+\frac{{{P_2}G}}{{{\sigma ^2}}})$$
(3)

where B is the channel bandwidth, \({P_2}\) is the transmission power of the vehicle, \({\sigma ^2}\) is the noise power, and \(G\) is the channel gain, which is related to the distance between the vehicle and the RSU.
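The geometry and link model of Eqs. (2)–(3) can be sketched as follows. This is an illustrative sketch in which the channel gain G is passed in directly, since the paper does not specify a particular path-loss model for it.

```python
import math

def distance_to_rsu(x: float, y: float, h: float) -> float:
    """Eq. (2): straight-line distance between a vehicle at (x, y) and an RSU of height h."""
    return math.sqrt(x ** 2 + y ** 2 + h ** 2)

def transmission_rate(B: float, P2: float, G: float, sigma2: float) -> float:
    """Eq. (3): Shannon rate with bandwidth B, vehicle transmit power P2,
    channel gain G (distance-dependent) and noise power sigma2."""
    return B * math.log2(1.0 + P2 * G / sigma2)
```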

In the IoV environment, it is assumed that \({\rm X}\) represents the offloading decision vector of the tasks generated by the vehicle terminal devices, where \({x_i}\) is the offloading decision variable for computationally intensive task i, as shown in Eq. (4).

$${\rm X}=\{ {x_1},{x_2},...,{x_m}\}$$
(4)

For each task, the definition of the offloading decision variable \({x_i}\) is shown in Eq. (5).

$${x_i}=\left\{ \begin{gathered} 0{\text{ }}local \hfill \\ 1{\text{ }}offload \hfill \\ {x_i} \in (0,1){\text{ }}partial{\text{ }} \hfill \\ \end{gathered} \right.$$
(5)

If \({x_i}=0\), the task is computed entirely locally in the vehicle; if \({x_i}=1\), the task is offloaded entirely to the edge server for execution; if \({x_i} \in (0,1)\), the task is partially offloaded to the edge server and partially executed locally in the vehicle.

Priority modeling

Vehicle tasks differ in their required computing resources, storage resources, and maximum tolerable latency. To prioritize these tasks, this paper employs the Analytic Hierarchy Process (AHP), considering three primary factors: the computing resources required by the task, the storage resources required, and the maximum tolerable latency. In order of importance, these factors are ranked as follows: maximum tolerable latency, computing resource demand, and storage resource demand. First, the hierarchical analysis (pairwise comparison) matrix is constructed as shown in Eq. (6).

$${a_{ij}}=\left\{ \begin{gathered} 1{\text{ }}i=j \hfill \\ \frac{1}{{{a_{ji}}}}=n{\text{ }}i \ne j \hfill \\ \end{gathered} \right.$$
(6)

where \(n \in \{ 1,2,...,9\}\). The weight matrix corresponding to the three factors of the M vehicles is shown in Eq. (7).

$$\Delta =\left( {\begin{array}{*{20}{c}} {{u_{11}}}&{{u_{12}}}&{{u_{13}}} \\ \begin{gathered} {u_{21}} \hfill \\ \vdots \hfill \\ \end{gathered} &\begin{gathered} {u_{22}} \hfill \\ \vdots \hfill \\ \end{gathered} &\begin{gathered} {u_{23}} \hfill \\ \vdots \hfill \\ \end{gathered} \\ {{u_{M1}}}&{{u_{M2}}}&{{u_{M3}}} \end{array}} \right)$$
(7)

\({u_{rk}}=\frac{{\sum\limits_{{j=1}}^{n} {{a_{rj}}} }}{{\sum\limits_{{i=1}}^{n} {\sum\limits_{{j=1}}^{n} {{a_{ij}}} } }}\)

where k represents the number of influencing factors considered in the decision-making process and r indexes the tasks generated by the vehicles. According to the hierarchical analysis matrix, the eigenvalue corresponding to each weight is calculated as shown in Eq. (8).

$${\lambda _k}=\frac{1}{k}\sum\limits_{{i=1}}^{k} {\frac{{\sum\limits_{{i=1}}^{k} {{a_{rj}}} }}{{\sum\limits_{{j=1}}^{k} {{a_{ij}}{u_j}} }}} {\text{ }}k=1,2,3$$
(8)

Then the priority of the tasks is given by Eq. (9), where \(\Lambda =({\lambda _1},{\lambda _2},{\lambda _3})^{T}\) is the vector of eigenvalues obtained from Eq. (8).

$${\rm P}=\Delta \times \Lambda$$
(9)

According to the priority of each task, the edge server allocates computing resources to the task as shown in Eq. (10).

$$f_{i}^{{mec}}=\frac{{{p_i}}}{{\sum\limits_{{j=1}}^{m} {{p_j}} }}{f^{mec}}$$
(10)
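The AHP-based priority pipeline of Eqs. (6)–(10) can be sketched as follows. The comparison matrix values are hypothetical (chosen to rank latency above computing demand above storage demand), and the weights are computed with the standard column-normalization approximation of the principal eigenvector rather than an exact eigendecomposition.

```python
def ahp_weights(A):
    """Approximate AHP weights: normalize each column of the pairwise
    comparison matrix and average across each row (cf. Eq. (7))."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [sum(A[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]

def allocate_mec(priorities, f_mec):
    """Eq. (10): share the edge server's computing resources in
    proportion to each task's priority."""
    total = sum(priorities)
    return [p / total * f_mec for p in priorities]

# Hypothetical comparison matrix over (latency, computing, storage):
# latency is judged 3x as important as computing and 5x as important as storage.
A = [[1.0, 3.0, 5.0],
     [1.0 / 3.0, 1.0, 2.0],
     [1.0 / 5.0, 1.0 / 2.0, 1.0]]
w = ahp_weights(A)  # factor weights, summing to 1
```

Each task's priority is then a weighted sum of its normalized factor values, and Eq. (10) splits \(f^{mec}\) in proportion to these priorities.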

Time consumption and energy consumption models

When a task is computed locally in the vehicle, assuming that the computing power of the local vehicle is \(f_{i}^{{local}}\), the local computing delay \(T_{i}^{{local}}\) of task i is shown in Eq. (11)25.

$$T_{i}^{{local}}=\frac{{(1 - {x_i}){c_i}}}{{f_{i}^{{local}}}}$$
(11)

The energy consumption of computing task i locally is shown in Eq. (12)26.

$$E_{i}^{{local}}=T_{i}^{{local}}{P_1}$$
(12)

When a task is offloaded to the edge server for computation, the delay includes the transmission delay and the computing delay on the edge server. The transmission delay of task i is shown in Eq. (13).

$$T_{i}^{{tran}}=\frac{{{o_i}}}{{{R_i}}}$$
(13)

Assuming that the computing power allocated by the edge server to task i is \(f_{i}^{{mec}}\), the computing delay of the task offloaded to the edge server is shown in Eq. (14).

$$T_{i}^{{mec}}=\frac{{{x_i}{c_i}}}{{f_{i}^{{mec}}}}$$
(14)

The total delay of the task offloaded to the edge server is shown in Eq. (15).

$$T_{i}^{{off}}=T_{i}^{{mec}}+T_{i}^{{tran}}$$
(15)

The energy consumption of offloading the task to the edge server equals the product of the task’s transmission delay and the vehicle’s transmission power. The total energy consumption of offloading is shown in Eq. (16).

$$E_{i}^{{off}}=T_{i}^{{tran}}{P_2}$$
(16)

Considering both task delay and energy consumption, the total delay \({T_i}\) of task i is the maximum of the local execution delay and the delay of offloading to the edge server, as shown in Eq. (17).

$${T_i}=max\{ T_{i}^{{local}},T_{i}^{{off}}\}$$
(17)

The total energy consumption of the task is the sum of the energy consumed by local execution and the energy consumed by offloading to the edge server, as shown in Eq. (18).

$${E_i}=E_{i}^{{local}}+E_{i}^{{off}}$$
(18)

The total cost function of the task is defined as the weighted sum of the total delay and the total energy consumption, where \(\alpha\) and \(\beta\) are the weight factors of task delay and energy consumption respectively, with \(0<\alpha ,\beta <1\) and \(\alpha +\beta =1\). The cost function of task i is shown in Eq. (19).

$${\Phi _i}=\alpha {T_i}+\beta {E_i}$$
(19)
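Eqs. (11)–(19) compose into a single cost evaluation for a given offloading fraction. The sketch below is illustrative, with all inputs as hypothetical example parameters, and the offloading energy taken as transmission delay times transmit power as stated in the text.

```python
def task_cost(x, c, o, f_local, f_mec, R, P1, P2, alpha=0.5, beta=0.5):
    """Total cost (Eq. (19)) of processing one task with offloading fraction x."""
    t_local = (1.0 - x) * c / f_local   # Eq. (11): local computing delay
    e_local = t_local * P1              # Eq. (12): local computing energy
    t_tran = o / R                      # Eq. (13): transmission delay
    t_mec = x * c / f_mec               # Eq. (14): edge computing delay
    t_off = t_mec + t_tran              # Eq. (15): total offloading delay
    e_off = t_tran * P2                 # Eq. (16): transmission energy
    total_delay = max(t_local, t_off)   # Eq. (17): local and edge parts run in parallel
    total_energy = e_local + e_off      # Eq. (18)
    return alpha * total_delay + beta * total_energy  # Eq. (19)
```

Sweeping x over [0, 1] with such a function shows how the optimal offloading rate trades local computing delay against transmission and edge computing delay.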

Definition of the problem

Considering that the vehicle task can be calculated locally or offloaded to the edge server, the goal of the problem is to minimize the total cost function, and the problem modeling can be expressed as:

$$min{\text{ }}\sum\limits_{{i \in M}} {{\Phi _i}}$$
(20)
$$s.t.{\text{ }}C1:\sum\limits_{{i \in M}} {f_{i}^{{mec}} \leqslant {f^{mec}}}$$
(21)
$$C2:f_{i}^{{mec}} \leqslant {f^{mec}}$$
(22)
$$C3:\sum\limits_{{i \in M}} {{o_i}} \leqslant {o^{mec}}$$
(23)
$$C4:{o_i} \leqslant {o^{mec}}$$
(24)
$$C5:{T_i} \leqslant t_{i}^{{max}}$$
(25)
$$C6:0 \leqslant {x_i} \leqslant 1$$
(26)

Here, constraints C1 and C2 ensure that neither the total computing resources allocated to all tasks nor the resources allocated to any single task exceed the computing resources of the edge server; constraints C3 and C4 ensure that neither the total storage demand of the tasks nor the storage demand of any single task exceeds the storage resources of the edge server; constraint C5 ensures that the processing delay of a task does not exceed its maximum tolerable delay; and constraint C6 specifies the value range of the task offloading rate.
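A candidate decision can be screened against constraints C1–C6 before its cost is evaluated. The function and argument names below are illustrative, not from the paper.

```python
def feasible(f_alloc, storage, x, delay, t_max, f_mec, o_mec):
    """Return True iff an offloading decision satisfies constraints C1-C6."""
    return (
        sum(f_alloc) <= f_mec                              # C1: total CPU budget
        and all(f <= f_mec for f in f_alloc)               # C2: per-task CPU budget
        and sum(storage) <= o_mec                          # C3: total storage budget
        and all(o <= o_mec for o in storage)               # C4: per-task storage budget
        and all(T <= tm for T, tm in zip(delay, t_max))    # C5: latency bound
        and all(0.0 <= xi <= 1.0 for xi in x)              # C6: offloading rate range
    )
```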

Deep reinforcement learning-based decision scheme for task offloading

This paper adopts deep reinforcement learning algorithm to solve the problem of task offloading in the Internet of Vehicles. According to the described problem model, an improved deep deterministic policy gradient algorithm is proposed to offload vehicle tasks with minimal delay and energy consumption. The algorithm combines value-based and policy-based methods and can effectively learn and select optimal actions in continuous state and action space.

DRL

State space: Following the optimization problem above, this paper defines the state space over several key parameters of the system. The state includes the vehicle location information, the task’s computing resource demand, its required storage resources, and the edge server’s computing resources27. The state of the system in time slot t is shown in Eq. (27).

$$S(t)=\{ d(t),c(t),o(t),{f^{mec}}(t)\}$$
(27)

Action space: The system selects an action based on the current state of the environment. The action space in this paper is the offloading rate of each vehicle task, as shown in Eq. (28)28.

$$A(t)=\{ {x_1}(t),{x_2}(t),...,{x_m}(t)\}$$
(28)

Reward: The immediate reward yielded by each state-action pair. Since our goal is to minimize delay and energy consumption, the reward function is designed as the negative of the total cost function, as shown in Eq. (29): the smaller the delay and energy consumption, the higher the reward.

$$r(t)= - \Phi$$
(29)

Under the deep reinforcement learning framework, the vehicle obtains immediate rewards by continuously interacting with the environment and accumulates them into a return. This cumulative return accounts not only for the reward of the current time slot but also for rewards that may be obtained in the future, thereby enabling long-term optimization. The long-term return of the system is shown in Eq. (30), where \(\gamma\) represents the discount factor29.

$$R(t)=\sum\limits_{{l=t}}^{T} {{\gamma ^{l - t}}r(l)}$$
(30)
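For example, the discounted return of Eq. (30) can be computed from a recorded reward sequence; the sketch below evaluates it for t = 0, with the reward indexed by the summation variable.

```python
def discounted_return(rewards, gamma=0.99):
    """Eq. (30) evaluated at t = 0: sum of gamma^l * r(l) over the episode."""
    return sum(gamma ** l * r for l, r in enumerate(rewards))
```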

Improved DDPG algorithm

This paper proposes an IDDPG algorithm to solve the task offloading problem of the Internet of Vehicles. The IDDPG algorithm consists of an Actor-Critic network. Specifically, the Actor part accepts the current state as input through a neural network and outputs a certain action. This action represents the best behavior that the agent should take in the current state. The Critic part evaluates and criticizes the current strategy. It processes the rewards obtained from the environment and calculates a value function (Q value) in combination with the current state and action to evaluate the pros and cons of the current strategy. The feedback from the Critic part is used to guide the strategy update of the Actor part so that it can continuously optimize the decision.

Due to the complexity and dynamic changes of the Internet of Vehicles environment, vehicles often cannot obtain complete environmental information when performing tasks. This partial observability will affect the performance of deep reinforcement learning algorithms. To this end, the long short-term memory network (LSTM) is introduced into DDPG to improve the effectiveness of decision-making. LSTM can capture and memorize historical information, enabling the Actor-Critic network to make more accurate decisions in complex dynamic environments, optimize task offloading and resource allocation.

In traditional reinforcement learning, the agent immediately updates the strategy using the current state transfer after each step of interaction with the environment. This method has problems such as strong data correlation and large variance, which leads to unstable training process. Introducing the experience replay cache mechanism into the DDPG algorithm can solve the above problems. By storing and randomly sampling state transition data, data correlation can be broken, data utilization can be improved, and the training process can be smoothed, thereby significantly improving the stability and efficiency of training. The pseudo code of the IDDPG algorithm is shown in Algorithm 1.
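The experience replay mechanism described above can be sketched in a few lines. The class below is a minimal illustration of the idea, not the paper's exact implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay: stores (state, action, reward,
    next_state) transitions and samples uncorrelated mini-batches."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Uniform random sampling breaks the temporal correlation of the data
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```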

Algorithm 1

IDDPG Algorithm

The IDDPG algorithm is centered on the Actor-Critic framework, integrating LSTM network and experience replay mechanism to address the dynamic characteristics of IoV, thereby achieving adaptive optimization of task offloading decisions. The core design logic of the algorithm, the corresponding relationship between key pseudocode steps and the significance of each operation are as follows: The core advantage of this algorithm lies in solving the problems of “decision bias” and “unstable training” of traditional reinforcement learning in IoV scenarios through dual-network collaboration and temporal information processing. The Actor network is responsible for generating continuous offloading rate actions defined in Eq. (28), the Critic network evaluates the quality of actions through Q-values, and the LSTM and experience replay mechanisms respectively ensure adaptability to dynamic environments and training stability. These three components work together to achieve the goal of “minimizing the total system cost (Eq. (19))”.

From the perspective of the pseudocode, the parameter initialization phase (Steps 1–2) lays the foundation for training: initializing the parameters of the Actor/Critic networks and their target networks avoids bias in the initial policy; the experience replay buffer size D and batch size N are set according to the experimental parameters in Table 2, which ensures that the sample size drawn in Step 10 meets the requirements of gradient descent while batch training balances efficiency and sample diversity to reduce the risk of overfitting. The environment interaction and experience storage phase (Steps 3–8) is tightly coupled with actual IoV scenarios: each training episode covers a multi-time-slot task processing process, synchronized with the mechanism of “vehicle tasks arriving randomly by time slot” described in “System model”; in each time slot, the agent (vehicle) selects an offloading rate action based on the system state (Eq. (27)), converts the resulting latency and energy consumption into a reward value through Eq. (29) after execution, and stores the state transition in the experience buffer. This accumulates diverse samples covering different vehicle positions and different edge loads, providing generalization support for subsequent training.

The network training and update phase (Steps 9–15) is the core of the optimization, and each step is closely tied to the mathematical models of the previous sections: once the number of samples in the experience buffer reaches the threshold, the LSTM processing in Step 11 mines the temporal characteristics of the state (Eq. (27)); the Q-value calculation in Step 12 substitutes the total cost function (Eq. (19)), converting the quantified latency and energy consumption into an optimizable value signal so that the Critic network can accurately feed back the quality of decisions; the network updates in Steps 13–14 aim to minimize the Q-value error and maximize the long-term return (Eq. (30)), respectively, with the gradient direction of the Actor network guided by the Critic’s Q-value so that policy updates always converge toward the low-cost direction; finally, the soft update of the target networks in Step 15 limits the update amplitude through a small coefficient, avoiding interference from current-network fluctuations in the target Q-value calculation and ensuring training stability.
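The soft target-network update in Step 15 blends the online parameters into the target parameters with a small coefficient. A minimal sketch over flat parameter lists follows; the coefficient value is illustrative, not the paper's setting.

```python
def soft_update(target_params, online_params, tau=0.005):
    """DDPG-style Polyak update: theta_target <- tau*theta + (1 - tau)*theta_target."""
    return [tau * w + (1.0 - tau) * wt
            for wt, w in zip(target_params, online_params)]
```

Because tau is small, the target network tracks the online network slowly, which keeps the target Q-values stable during training.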

Experimentation and analysis

Simulation experiment setting

In order to verify the effectiveness of the scheme proposed in this paper, this paper designed a comparative experiment to compare the scheme in this paper with the scheme based on the DDPG algorithm and the scheme based on the DQN algorithm. Through the comparative experiment, the performance of the scheme in this paper is evaluated under different indicators, including latency, energy consumption, and total cost. The specific design of the experimental simulation parameters is shown in Table 1, and the algorithm simulation parameters are shown in Table 2.

Table 1 Simulation parameters.
Table 2 Algorithm parameters.

Experimental results analysis

The impact of learning rate on IDDPG algorithm

The choice of learning rate is crucial for optimizing the energy consumption of the algorithm in the Internet of Vehicles environment. A suitable learning rate can speed up the convergence of the algorithm and improve the adaptability of the model, while an inappropriate learning rate may lead to poor algorithm performance. This paper sets the learning rates to 1e-5, 1e-6, 1e-7, and 1e-8 respectively. The reward changes of the IDDPG algorithm under different learning rates are shown in Fig. 2.

Fig. 2

Different learning rates.

As can be seen from Fig. 2, when the learning rate is 1e-8, the reward value shows almost no significant increase, indicating very slow learning; when the learning rate is 1e-7, the reward value rises rapidly and stabilizes at a high level after about 300 episodes, giving the best performance; the learning rate of 1e-6 also performs well but is less stable; the learning rate of 1e-5 converges quickly but fluctuates strongly. Therefore, a moderate learning rate better optimizes IoV task offloading decisions, improving computing efficiency and resource utilization.

Comparison of reward values for three schemes

This article conducts comparative experiments on the IDDPG-based scheme, the DQN-based scheme, and the DDPG-based scheme. The changes in reward values of the three schemes over different training rounds are shown in Fig. 3.

Fig. 3

Comparison of reward values for three schemes.

The higher the reward value, the smaller the delay and energy consumption; since the reward value is negative, a smaller absolute value indicates better performance. It can be clearly seen from Fig. 3 that the absolute value of the reward obtained by the IDDPG algorithm during training is the smallest, showing that among the three schemes, the IDDPG-based scheme performs best in optimizing delay and energy consumption for vehicle task offloading. Specifically, the IDDPG scheme shows a significant performance improvement after 300 rounds, with the reward value rising markedly and eventually stabilizing, indicating that it effectively reduces system delay and energy consumption. In contrast, although the reward values of the DDPG and DQN schemes improve during training, they remain consistently lower than those of the IDDPG scheme, indicating weaker task offloading optimization.

Comparison of latency for three schemes

In the Internet of Vehicles, latency is one of the key indicators for evaluating system performance. The local computing power of the vehicle is limited, and it may take a long time to process complex tasks, resulting in high latency. Edge servers have powerful computing capabilities and can process tasks faster, thereby reducing task processing delays. The latency comparison of the three algorithm schemes of IDDPG, DDPG and DQN in different training rounds is shown in Fig. 4. In the early stages of training, the delays of the three algorithms are relatively high and stable. As training progresses, the delay of the IDDPG algorithm scheme is significantly reduced after about 300 rounds, and eventually stabilizes and remains at a low level. The IDDPG algorithm scheme shows significant delay optimization effects.

Fig. 4

Comparison of latency for three schemes.

In contrast, the DDPG scheme also reduces latency during training, but less markedly than IDDPG: its latency decreases after 300 rounds, then fluctuates, and its final level remains higher than that of the IDDPG scheme. The delay of the DQN scheme changes little throughout training and stays high, making it the worst at delay optimization; the traditional DQN algorithm cannot handle continuous state and action spaces effectively, which limits its latency optimization. Although DDPG can reduce latency in some cases, its overall performance is slightly inferior because it adapts less well to complex environments than IDDPG. After convergence, the average latency of the IDDPG, DDPG, and DQN schemes is 1500, 4600, and 3700, respectively; compared with the DQN-based and DDPG-based schemes, the IDDPG-based scheme reduces latency by 59.46% and 67.39%, respectively.

Comparison of energy consumption for three schemes

The changes in energy consumption of the three algorithm schemes under different training rounds are shown in Fig. 5.

Fig. 5

Comparison of energy consumption for three schemes.

As can be seen from Fig. 5, the IDDPG scheme significantly reduces energy consumption in the later stages of training, showing higher efficiency and lower energy consumption when optimizing vehicle task offloading. The energy consumption of the DDPG scheme is relatively stable during training but remains higher than that of IDDPG, indicating that DDPG has some advantage in task-offloading optimization yet is still inferior to IDDPG. The energy consumption of the DQN scheme stays high throughout training: DQN fails to fully optimize the offloading strategy, resulting in high energy consumption. After convergence, the average energy consumption of the IDDPG, DDPG, and DQN schemes is 1200, 1360, and 1470, respectively; compared with the DQN-based and DDPG-based schemes, the IDDPG-based scheme reduces energy consumption by 18.37% and 11.76%, respectively.
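The reported percentage reductions follow directly from the converged averages. A quick arithmetic check (the helper function is ours, not from the paper):

```python
def reduction(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline * 100

# Latency, converged averages: IDDPG 1500, DDPG 4600, DQN 3700
assert round(reduction(3700, 1500), 2) == 59.46   # vs. DQN
assert round(reduction(4600, 1500), 2) == 67.39   # vs. DDPG

# Energy, converged averages: IDDPG 1200, DDPG 1360, DQN 1470
assert round(reduction(1470, 1200), 2) == 18.37   # vs. DQN
assert round(reduction(1360, 1200), 2) == 11.76   # vs. DDPG
```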

Comparison of latency and energy consumption of different tasks

This section compares in detail the cost, latency, and energy consumption of the three algorithm schemes under different task numbers. The total cost of the three schemes under different task numbers is shown in Fig. 6.

Fig. 6

Total cost of three algorithm schemes under different task numbers.

As can be seen from Fig. 6, the costs of the three algorithms rise as the number of tasks increases. However, the cost of the IDDPG scheme is lower than that of DQN and DDPG for all task numbers, and its advantage becomes more obvious as the load grows. In particular, when the number of tasks reaches 250, the cost of IDDPG is significantly lower than that of DQN and DDPG, indicating that under high load IDDPG allocates resources more effectively and reduces cost. The latency comparison of the three schemes under different task numbers is shown in Fig. 7.
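The paper does not restate the exact cost definition used in Fig. 6 here; a plausible reading, sketched below purely as an assumption, is that total cost is the weighted delay-energy objective summed over all tasks, which naturally grows with the task count:

```python
def total_cost(tasks, w_t=0.5, w_e=0.5):
    """Weighted delay+energy cost summed over (delay, energy) pairs.

    `tasks` is a list of per-task (delay, energy) tuples; w_t and w_e
    are assumed trade-off weights.
    """
    return sum(w_t * d + w_e * e for d, e in tasks)

light_load = [(10.0, 8.0)] * 50    # 50 identical tasks
heavy_load = [(10.0, 8.0)] * 250   # 250 identical tasks

# With identical per-task costs, total cost scales linearly with task count.
assert total_cost(heavy_load) == 5 * total_cost(light_load)
```

Under this reading, the gap between the curves in Fig. 6 reflects the per-task cost achieved by each scheme, which is why the advantage of IDDPG widens as the task count grows.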

Fig. 7

Latency of three algorithm schemes under different task numbers.

As can be seen from Fig. 7, the delays of the three algorithms increase to varying degrees as the number of tasks grows. However, the latency of the IDDPG scheme is lower than that of the DQN and DDPG schemes for all task numbers, and when the number of tasks is large, its latency is significantly lower than that of the other two algorithms. The energy consumption comparison of the three schemes under different task numbers is shown in Fig. 8.

Fig. 8

Energy of three algorithm schemes under different task numbers.

As can be seen from Fig. 8, the energy consumption of the three algorithms rises as the number of tasks increases. The energy consumption of the IDDPG scheme is significantly lower than that of the DQN and DDPG schemes for most task numbers. This demonstrates the superiority of the IDDPG algorithm in Internet of Vehicles resource management: it not only allocates resources effectively but also significantly reduces system energy consumption and improves overall energy efficiency.

Conclusion

This paper studies the task offloading problem in the Internet of Vehicles environment and proposes an improved deep deterministic policy gradient (IDDPG) algorithm that incorporates a long short-term memory (LSTM) network and accounts for task priorities to optimize resource allocation. Comparative experiments show that the improved DDPG algorithm outperforms the traditional DQN and the original DDPG algorithms in latency, resource utilization, and energy consumption, especially under high load. In summary, the proposed method offers significant advantages in offloading efficiency and resource utilization in the Internet of Vehicles, demonstrating its potential and value in practical applications. Future work will focus on adaptability to heterogeneous network environments and on the impact of dynamic vehicle movement and changing network conditions on task offloading and resource allocation, to further improve the algorithm's versatility and stability in practical application scenarios.