Abstract
Abstract
Water injection is a key technology for maintaining oil reservoir pressure and ensuring high, stable oilfield production. In this paper, we propose an intelligent control method based on reinforcement learning for layered water injection columns, which achieves accurate regulation by constructing a layered water injection model and algorithm. A water injection column flow model incorporating the structure of the water injection device, column pressure loss, and flow characteristics is established, and an intelligent control system is designed around the SAC algorithm. The training environment and simulation platform are built with PyTorch, and the performance of the SAC, PPO, and DDPG algorithms is compared and verified on different water injection sections. The experimental results show that the average regulation error of the SAC algorithm under the 5% error threshold is only 5%, significantly better than PPO’s 37%; its qualified regulation rate reaches 81%, much higher than PPO’s 45% and DDPG’s 60%; and its average number of adjustment steps is 42% fewer than PPO’s and 28% fewer than DDPG’s. The SAC algorithm also exhibits greater stability and adaptability under complex working conditions, and its regulation accuracy, qualified rate, and response speed are all better than those of PPO and DDPG. This study provides theoretical support for intelligent layered water injection technology and has significant reference value for improving oilfield development efficiency. Our code is available online at: https://github.com/HJZ-hub/SACIntelligentLayeredWaterInjection.
Introduction
A decrease in reservoir pressure is one of the main causes of production decline during oilfield development. To maintain reservoir pressure and improve recovery, waterflooding has been the main development method in Chinese oilfields, accounting for 74% of total production1,2,3. As water-drive development has deepened, China’s onshore oilfields have successively entered the high-water-cut stage, with highly dispersed residual oil and great difficulty in controlling water4. Layered water injection technology is an effective method for improving oilfield development efficiency. By setting up multiple water injection segments in a well, differentiated water injection can be carried out in different layers to achieve optimal injection efficiency and reservoir pressure balance, and to improve the balanced utilization of each oil layer5. With the promotion and application of the fourth-generation layered water injection device, layered water injection technology has been digitalized: the status of injection wells can be monitored in real time and the flow rate of each layer segment can be regulated6, and water injection efficiency and conformance have been significantly improved. However, flow scheduling in layered water injection systems is a complex optimization problem. The downhole environment is complex, and the dynamic adjustment of water nozzle openings across multiple layer segments is affected by factors such as wellhead pressure, formation pressure, and the structure of the water injection device. Traditional flow scheduling methods usually rely on manual operation or empirical formulas, which lack adaptivity and intelligence; their inability to cope effectively with reservoir variations and injection non-uniformity leads to unsatisfactory flow scheduling. With the deepening of intelligent oilfield construction and the increasing number of water injection wells, the original approach to injection well management can no longer meet the needs of oilfield production. Achieving the target flow allocation across multiple layer segments with incomplete well information is therefore an important topic for oilfield production.
In recent years, many scholars have conducted in-depth studies on the flow dynamics of water injection devices. An Runze et al.7 used a fluid-structure coupling method to reveal the dynamic characteristics of the valve spool; Zhao Guangyuan et al.8 established a water injection volume calculation model to improve field applicability; Zhou Lizhi et al.9 proposed an injection allocation algorithm based on physical property parameters; and Jiang Xiufang et al.10 verified the power-law relationship between flow rate and differential pressure. These studies provide an important reference for this paper, but they exhibit significant limitations when dealing with the complex nonlinear behavior of the actual water injection process. Traditional methods are inadequate for handling the strong nonlinearity caused by time-varying downhole pressure, fail to establish accurate mathematical models of complex downhole environments, and cannot adapt to uncertain, dynamically changing reservoir conditions. In contrast, deep reinforcement learning (DRL) shows unique advantages: it can learn the optimal control strategy directly from interaction with the system without relying on an accurate physical model, and the inherent nonlinear representation capability of deep neural networks enables online adaptive adjustment in a dynamic environment. This ability to overcome the limitations of traditional methods and solve the nonlinear adaptive control problem of water injection systems is the main motivation for our research.
With the rapid development of artificial intelligence and machine learning, deep learning algorithms are increasingly being applied to intelligent control. Among them, reinforcement learning studies intelligent control systems that learn by continuously interacting with the environment, enabling the system to adjust its behavior according to environmental feedback and ultimately achieve an optimal control strategy11. In recent years, deep reinforcement learning has gained significant attention in robotics12, where the agent model serves as the controller of the robot and the robot together with its surroundings constitutes the deep reinforcement learning environment. A robotic agent operates in a stochastic environment by sequentially selecting actions over a series of time steps. This framework allows robotic agents to learn skills and solve tasks autonomously through trial and error13. The approach has shown potential in areas such as multi-agent collaboration14,15, autonomous vehicles16,17,18, and robot control19,20,21.
Duy Quang Tran et al.17 proposed a deep reinforcement learning model that integrates the Flow framework with the PPO algorithm and demonstrated that fully automated driving significantly improves efficiency compared with purely manual driving while effectively suppressing stop-and-go waves at unsignalized intersections. Chengyi Zhao et al.19 proposed an inverse kinematics solving method for a robotic arm based on the MAPPO algorithm, which significantly improves generalization and computational efficiency compared with traditional algorithms, supports real-time unique-solution generation, and enables path planning and intelligent obstacle avoidance in dynamic environments. Jiao Huanyan et al.22 proposed a reinforcement learning-based air conditioning control strategy for metro stations, which uses neural networks to simulate the environment and achieves temperature control and energy saving by training agents; simulation experiments show that the strategy can effectively reduce energy consumption.
Compared with other reinforcement learning methods, the Soft Actor-Critic (SAC) algorithm has remarkable advantages in layered water injection control23. Its entropy maximization mechanism enables more efficient exploration in the continuous action space, making it suitable for fine-tuned water injection24. The twin Q-network solves the over-estimation problem of traditional value functions. The automatic temperature adjustment mechanism can dynamically balance exploration and exploitation. These advantages make the SAC algorithm perform better in terms of stability and control when dealing with water injection control problems that feature sparse rewards and high-dimensional action spaces.
In order to solve the flow scheduling problem in layered water injection systems, this paper proposes a reinforcement learning-based adaptive control algorithm as shown in Fig. 1. The proposed method models the flow scheduling problem as a Markov decision process, uses a deep neural network as a value function approximator, and learns the optimal water nozzle opening strategy by interacting with the environment.
Adaptive control of the water injection device. The water injection device was modeled using SolidWorks 2024 (Dassault Systèmes, https://www.solidworks.com); the deep RL model was created with Visio 2019 (Microsoft Corporation, https://www.microsoft.com/visio); and the water injection well and other elements were created in PowerPoint 2019 (Microsoft Corporation, https://www.microsoft.com/zh-cn/microsoft-365/powerpoint).
Hydrodynamic model of layered water injection string
Structure and working principle of water injection device
Reservoirs in an injection well differ in permeability, pressure, and other allocation conditions, and each reservoir requires a different allocated injection volume. To realize layered water injection, packers are used to separate the formation into several layer segments, and a water injection device is placed in each segment so that the water-drive front advances smoothly by adjusting the injection volume of each segment. The structure of a layered water injection well is shown in Fig. 2.
Layered water injection is an effective method for improving the structure of injection and extraction during oilfield exploitation and increasing the oilfield recovery rate. This technique is widely used in the development of high-water-cut oilfields. The water injection device is located between two packers, and the injection volume for each layer segment is adjusted by changing the water nozzle opening. The injection well parameters are shown in Table 1.
Water injection string pressure loss
Treating the water injection process as incompressible flow, when the water nozzle opening is no longer changed, the velocity and pressure of the fluid do not change with time. According to Bernoulli’s principle, the total mechanical energy per unit mass of fluid is conserved between any two points along a streamline. Thus, the hydraulic equilibrium relationship between the wellhead and the nozzle outlet of each layer segment can be derived from Bernoulli’s equation25:

\(\frac{{{p_0}}}{{\rho g}}+{h_i}=\frac{{{p_{ai}}}}{{\rho g}}+\frac{{v_i^2}}{{2g}}+{h_{wi}}\)  (1)
Where \({p_0}\) is the wellhead pressure, Pa; \(\rho\) is the density of the injected fluid, kg/m³; \({\text{g}}\) is the gravitational acceleration, m/s²; \({p_{ai}}\) is the formation pressure of the i-th layer, Pa; \({v_i}\) is the average flow velocity at the nozzle outlet of the i-th layer, m/s; \({h_i}\) is the depth from the wellhead to layer i, m; and \({h_{wi}}\) is the head loss from the wellhead to layer i, m. The head loss consists of frictional and local losses and is expressed as:

\({h_{wi}}={h_{fi}}+{h_{ji}}\)  (2)

Where \({h_{fi}}\) is the frictional head loss from the wellhead to layer i, m, and \({h_{ji}}\) is the local head loss at the outlet of layer i, m.
The local head loss at the nozzle outlet of the water injection device, caused by the change in pipe diameter, is expressed as26,27:

\({h_{ji}}={\zeta _i}\frac{{v_i^2}}{{2g}}\)  (3)

Where \({\zeta _i}\) is the local loss coefficient of the i-th layer nozzle, and \({v_i}\) is the average flow velocity at the nozzle outlet of the i-th layer, m/s.
The along-path frictional loss is the energy loss due to relative motion within the liquid and viscous friction between the liquid and the pipe wall28, and is expressed as:

\({h_{fi}}={\lambda _i}\frac{{{h_i}}}{{{d_0}}}\frac{{v_{oi}^2}}{{2g}}\)  (4)

Where \({\lambda _i}\) is the along-path loss coefficient of the i-th section, \({d_0}\) is the diameter of the water injection column, m, and \({v_{oi}}\) is the average velocity in the water injection column of the i-th layer, m/s. The flow regime of the fluid, which has a large influence on the along-path loss, is usually characterized by the Reynolds number; for the i-th section it is expressed as:

\(R{e_i}=\frac{{\rho {v_{oi}}{d_0}}}{\mu }=\frac{{{v_{oi}}{d_0}}}{\nu }\)  (5)

Where \(\rho\) is the fluid density, kg/m³; \({v_{oi}}\) is the average velocity in the injection column of the i-th layer, m/s; \(\mu\) is the dynamic viscosity, \({\text{Pa}} \cdot {\text{s}}\); \(\nu\) is the kinematic viscosity, m²/s; and \({d_0}\) is the diameter of the injection column, m.
As the pipe flow changes from laminar to transitional and turbulent flow, the along-path loss coefficient \({\lambda _i}\) depends on the Reynolds number \(R{e_i}\) and on the relative roughness of the pipe, which for the water injection column is expressed as:

\(\overline {\varepsilon } =\frac{\varepsilon }{{{d_0}}}\)  (6)

Where \(\overline {\varepsilon }\) is the dimensionless relative roughness, \(\varepsilon\) is the roughness of the inner surface of the pipe, m, and \({d_0}\) is the diameter of the water injection column, m.
When the Reynolds number \(Re<2300\), the flow in the water injection column is laminar; when \(Re \geqslant 2300\), the flow develops into turbulent flow, and the along-path resistance coefficient is calculated using the Haaland formula10. For the flow of the i-th layer, the along-path loss coefficient \({\lambda _i}\) can be expressed as:

\({\lambda _i}=\begin{cases}\dfrac{{64}}{{R{e_i}}},&R{e_i}<2300\\\left[ { - 1.8\lg \left( {{{\left( {\dfrac{{\overline {\varepsilon } }}{{3.7}}} \right)}^{1.11}}+\dfrac{{6.9}}{{R{e_i}}}} \right)} \right]^{ - 2},&R{e_i} \geqslant 2300\end{cases}\)  (7)
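For reference, the piecewise friction-factor calculation above can be written as a short Python helper; the function name and input check are ours, and the sketch assumes SI units throughout:

```python
import math

def friction_factor(re: float, rel_rough: float) -> float:
    """Along-path (Darcy) friction factor for one column section.

    Laminar flow (Re < 2300): lambda = 64 / Re.
    Turbulent flow (Re >= 2300): Haaland correlation,
        1 / sqrt(lambda) = -1.8 * log10((rel_rough / 3.7)**1.11 + 6.9 / Re).
    """
    if re <= 0:
        raise ValueError("Reynolds number must be positive")
    if re < 2300:
        return 64.0 / re
    return (-1.8 * math.log10((rel_rough / 3.7) ** 1.11 + 6.9 / re)) ** -2
```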
Flow model of water injection device
The fluid velocity \({v_{oi}}\) in the injection column is related to the nozzle outlet velocity by the fluid continuity equation, expressed as:

Where \({v_i}\) is the average flow velocity at the outlet of the i-th nozzle, m/s, and \({A_i}\) indicates the overflow area of the nozzle, m².
Substituting Eq. (8) into Eq. (4) yields a functional relationship between the outlet velocity \({v_i}\) at the nozzle and the along-path loss, and substituting Eqs. (3) and (4) into Eq. (1) yields an expression for the outlet velocity:
Substituting the nozzle outlet velocity \({v_i}\) into Eq. (1), the injection volume of the i-th layer can be calculated and expressed as:
The along-path loss coefficient of the injection column \({\lambda _i}\) is a function of the velocity \({v_{oi}}\) within the column and therefore cannot be evaluated directly when the flow rate at the nozzle outlet is unknown. An iterative method is used to solve for the along-path loss coefficient of the water injection column. The calculation process is shown in Algorithm 1:
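As a minimal Python sketch of this iteration (Algorithm 1 itself is not reproduced here), the loop below alternates between the pressure-balance velocity and the friction factor until the outlet velocity converges. It reuses the friction_factor helper sketched above, and the velocity update assumes a simplified single-nozzle continuity relation (column velocity = outlet velocity × nozzle area / column cross-section), which may differ from the paper's exact Eqs. (8)–(9):

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def solve_layer_velocity(p0, p_a, h, d0, a_nozzle, zeta, rho, nu, eps,
                         tol=1e-6, max_iter=100):
    """Fixed-point iteration for the along-path loss coefficient and the
    nozzle outlet velocity of one layer segment (illustrative sketch)."""
    a_col = math.pi * d0 ** 2 / 4.0      # column cross-section, m^2
    rel_rough = eps / d0                 # relative roughness, Eq. (6)
    lam = 0.02                           # initial guess for the loss coefficient
    v_out = 0.0
    for _ in range(max_iter):
        # outlet velocity from the wellhead/layer pressure balance, Eqs. (1)-(4)
        head = h + (p0 - p_a) / (rho * G)
        denom = 1.0 + zeta + lam * (h / d0) * (a_nozzle / a_col) ** 2
        v_new = math.sqrt(max(2.0 * G * head / denom, 0.0))
        v_col = v_new * a_nozzle / a_col          # assumed continuity relation
        re = v_col * d0 / nu                      # Reynolds number, Eq. (5)
        lam_new = friction_factor(re, rel_rough) if re > 0 else lam
        if abs(v_new - v_out) < tol:
            return v_new, lam_new                 # converged
        v_out, lam = v_new, lam_new
    return v_out, lam
```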
Reinforcement learning control algorithm
Water injection device state space
The environmental state information includes surface information, information about the structure of the layered water injection device, real-time sensor information, formation information, and the target injection volume. The state space is defined as:

\(S=\left\{ {{p_0},{x_i},{h_i},{p_{ai}},{p_{bi}},{q_{tai}}} \right\},\quad i=1,2, \ldots ,n\)

Where \({p_0}\) is the wellhead injection pressure, Pa; \({x_i}\) is the nozzle opening percentage of the i-th layer; \({h_i}\) is the depth of the water distributor of the i-th layer, m; \({p_{ai}}\) is the formation pressure of the i-th layer, Pa; \({p_{bi}}\) is the pressure in the injection column at the i-th layer, Pa; and \({q_{tai}}\) is the target injection volume of the i-th layer, m³/d.
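A minimal sketch of how this 1 + 5n observation vector can be assembled in Python; the ordering and the absence of normalization here are illustrative choices, not taken from the paper:

```python
import numpy as np

def build_state(p0, openings, depths, p_form, p_col, q_target):
    """Assemble the 1 + 5n state vector: wellhead pressure followed by, for
    each of the n layer segments, nozzle opening (%), distributor depth (m),
    formation pressure (Pa), column pressure (Pa) and target rate (m^3/d)."""
    per_layer = np.column_stack([openings, depths, p_form, p_col, q_target])
    return np.concatenate([[p0], per_layer.ravel()]).astype(np.float32)

# example for a two-segment well (values are illustrative)
s = build_state(11.0e6, [45.0, 60.0], [890.25, 921.1],
                [8.9e6, 9.2e6], [10.5e6, 10.6e6], [30.0, 45.0])
print(s.shape)  # (11,) = 1 + 2 * 5
```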
Water injection device action space
The layered water injection device changes the pressure difference across the nozzle by controlling the nozzle opening, thereby meeting the injection flow requirements of the different layer segments. The nozzle movement time \(r{o_i}\) is the parameter of the agent's continuous action space, which is defined as:

\(A=\left\{ {r{o_1},r{o_2}, \ldots ,r{o_n}} \right\}\)

Where \(r{o_i} \in [ - 5,5]\): when \(r{o_i}>0\), the spool in the water distributor moves in the direction of increasing the overflow area for \(r{o_i}\) seconds; when \(r{o_i}=0\), the spool does not move; and when \(r{o_i}<0\), the spool moves in the direction of decreasing the overflow area for \(\left| {r{o_i}} \right|\) seconds.
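A hedged sketch of how such an action could be applied to the nozzle openings in a simulation step; the travel rate converting seconds of spool movement into a change of opening percentage is a hypothetical value, not a parameter given in the paper:

```python
import numpy as np

OPEN_RATE = 2.0  # % of full opening per second of spool travel -- hypothetical

def apply_action(openings, ro, rate=OPEN_RATE):
    """Move each spool for |ro_i| seconds in the signed direction and clip
    the resulting opening to its physical range of 0-100 %."""
    ro = np.clip(np.asarray(ro, dtype=float), -5.0, 5.0)       # action bounds
    new_open = np.asarray(openings, dtype=float) + rate * ro   # signed travel
    return np.clip(new_open, 0.0, 100.0)

print(apply_action([10.0, 98.0], [3.0, 4.0]))  # -> [16. 100.]
```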
Water injection device reward function
In the design of the water injection algorithm, the agent (the water injection device) has two objectives: the smallest possible injection error in each layer segment and the smallest possible number of adjustments.
The motion reward \({R_r}\) guides the direction of nozzle movement. Since the nozzle opening is proportional to the flow rate, a reward for the direction of movement is given based on the target injection volume \({q_{ta}}\) and the actual injection volume \({q_{re}}\). The motion reward is expressed as:
Where \({q_{ta}} \in \{ {q_{ta1}},{q_{ta2}}, \cdots ,{q_{tan}}\}\) denotes the set of target injection volumes of the layer segments, m³/d; \({q_{re}} \in \{ {q_{re1}},{q_{re2}}, \ldots ,{q_{ren}}\}\) denotes the set of actual injection volumes of the layer segments computed by the water injection model during regulation, m³/d; and \(r{o_i}\) denotes the movement time and direction of the i-th layer: when \(r{o_i}>0\), the spool rotates for \(r{o_i}\) seconds in the direction of increasing the nozzle opening; otherwise it rotates for \(\left| {r{o_i}} \right|\) seconds in the opposite direction.
The position reward \({R_l}\) prevents the nozzle from moving to its limits (fully closed or fully open): at zero opening the injection volume of the layer segment is zero, and operating at the limits seriously shortens the life of the water injection device. To prevent this, a penalty is given to the agent when \({x_i}=0\) or \({x_i}=100\). The position reward is expressed as:
The time reward \({R_t}\) encourages the agent to reach the goal in as few adjustment steps as possible; environments with different numbers of layer segments have different time rewards. With n denoting the number of layer segments, the reward is expressed as:
The error reward \({R_e}\) reflects the distance between the actual value and the target value. As a continuous signal it gives the agent informative feedback throughout regulation; ten times the error, expressed as a percentage of the target value, is used as the penalty. The error reward is expressed as:
The target reward \({R_{ta}}\) is given when the agent reaches the error range of the target value, measured in terms of the mean absolute error. The mean absolute error of the injection well is expressed as:

\(MAE=\frac{1}{n}\sum\limits_{i=1}^{n} {\left| {{q_{rei}} - {q_{tai}}} \right|}\)

When the agent reaches the target value, a reward is given; the target reward is listed in the following table:
Based on the motion reward, position reward, time reward, error reward, and target reward, the total reward function is expressed as:

\(R={R_r}+{R_l}+{R_t}+{R_e}+{R_{ta}}\)
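To make the composition concrete, the sketch below combines the five terms in Python; only the error term (ten times the relative error as a penalty) and the triggering conditions follow the text, while the remaining magnitudes are illustrative placeholders rather than the paper's values:

```python
import numpy as np

def total_reward(q_target, q_actual, openings, ro, tol=0.05):
    """Hypothetical combination of the five reward terms described above."""
    q_target = np.asarray(q_target, dtype=float)
    q_actual = np.asarray(q_actual, dtype=float)
    openings = np.asarray(openings, dtype=float)
    ro = np.asarray(ro, dtype=float)
    rel_err = np.abs(q_actual - q_target) / q_target

    # motion reward: +1 for each nozzle moving toward its target rate
    r_motion = float(np.sum(np.sign(q_target - q_actual) == np.sign(ro)))
    # position reward: penalize spools pinned at either end of travel
    r_position = -10.0 * float(np.sum((openings <= 0.0) | (openings >= 100.0)))
    # time reward: per-step penalty scaled by the number of layer segments
    r_time = -0.1 * q_target.size
    # error reward: ten times the relative error (fraction of target) as a penalty
    r_error = -10.0 * float(np.sum(rel_err))
    # target reward: bonus once the mean relative error falls inside the band
    r_target = 10.0 if float(np.mean(rel_err)) < tol else 0.0
    return r_motion + r_position + r_time + r_error + r_target
```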
Combined with SAC algorithm
The Soft Actor-Critic (SAC) algorithm is a maximum-entropy, model-free, off-policy deep reinforcement learning algorithm; it learns from replayed experience and handles reinforcement learning problems in both discrete and continuous action spaces well29,30,31,32.
The network structure of the SAC algorithm enlarges the exploration space and avoids falling into local optima. It maximizes a trade-off between expected return and entropy and has achieved leading results in a number of standard environments33,34,35. The structure of the SAC algorithm is shown in Fig. 3.
The policy network receives the state s from the environment as input. For a water injection environment with n layer segments, the number of input parameters is \(1+n \times 5\). The hidden part consists of n fully connected layers, with 64 neurons for one- and two-layer-segment environments and 128 neurons for three layer segments; LeakyReLU and Tanh are used as the activation functions of the hidden and output layers, respectively.
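A minimal PyTorch sketch of such a policy network; the layer widths and activations follow the description above, while the Gaussian mean/log-std head and the tanh squashing to the spool-travel range [-5, 5] s are the usual SAC construction and are assumed here:

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Gaussian policy for an n-segment well: 1 + 5n inputs, n hidden
    LeakyReLU layers (64 units for 1-2 segments, 128 for 3), tanh-squashed
    actions rescaled to [-5, 5] seconds of spool travel."""

    def __init__(self, n_segments: int, action_scale: float = 5.0):
        super().__init__()
        state_dim = 1 + 5 * n_segments
        hidden = 128 if n_segments >= 3 else 64
        layers, in_dim = [], state_dim
        for _ in range(n_segments):                  # n hidden layers
            layers += [nn.Linear(in_dim, hidden), nn.LeakyReLU()]
            in_dim = hidden
        self.body = nn.Sequential(*layers)
        self.mu = nn.Linear(hidden, n_segments)      # one action per segment
        self.log_std = nn.Linear(hidden, n_segments)
        self.action_scale = action_scale

    def forward(self, state):
        h = self.body(state)
        return self.mu(h), self.log_std(h).clamp(-20, 2)

    def sample(self, state):
        mu, log_std = self(state)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()                           # reparameterized sample
        a = torch.tanh(u)                            # squash to (-1, 1)
        log_prob = (dist.log_prob(u)
                    - torch.log(1 - a.pow(2) + 1e-6)).sum(-1, keepdim=True)
        return self.action_scale * a, log_prob
```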
Entropy measures the degree of randomness of a random variable and is defined as36,37:

\(\mathcal{H}(X)={\mathbb{E}_{x\sim p}}\left[ { - \log p(x)} \right]\)
The SAC algorithm maximizes the cumulative expected reward while keeping the strategy as stochastic as possible; the optimization objective of the strategy is defined as38:

\({\pi ^*}=\arg \mathop {\max }\limits_\pi \;{\mathbb{E}_{({s_t},{a_t})\sim{\rho _\pi }}}\left[ {\sum\limits_t {{\gamma ^t}\left( {r({s_t},{a_t})+\alpha \mathcal{H}(\pi ( \cdot |{s_t}))} \right)} } \right]\)

Where \(\gamma \in (0,1)\) is the discount factor, which reflects the effect of future rewards on the current return; \(\alpha \in (0,1)\) is the temperature coefficient, which controls the importance of entropy; and \(\mathcal{H}(\pi ( \cdot |{s_t}))\) denotes the degree of stochasticity of strategy \(\pi\) in state \({s_t}\).
SAC uses two action-value functions \({Q_{{\omega _i}}}({s_t},{a_t})\) and takes the smaller of the two Q values whenever a Q estimate is needed, thus mitigating Q-value overestimation. The loss function of either Q function is39,40:

\({L_Q}({\omega _i})={\mathbb{E}_{({s_t},{a_t},{r_t},{s_{t+1}})\sim\mathcal{D}}}\left[ {\frac{1}{2}{{\left( {{Q_{{\omega _i}}}({s_t},{a_t}) - \left( {{r_t}+\gamma \left( {\mathop {\min }\limits_{j=1,2} {Q_{\omega _j^ - }}({s_{t+1}},{a_{t+1}}) - \alpha \log \pi ({a_{t+1}}|{s_{t+1}})} \right)} \right)} \right)}^2}} \right],\quad {a_{t+1}}\sim\pi ( \cdot |{s_{t+1}})\)
Where \(\mathcal{D}\) is the replay data collected by the strategy in the past, and \({Q_{\omega _{j}^{ - }}}({s_{t+1}},{a_{t+1}})\) is the target Q network with parameters \({\omega ^ - }\), used to approximate the target value. The target network is updated as:

\(\omega _j^ - \leftarrow \tau {\omega _j}+(1 - \tau )\omega _j^ -\)

Where \(\tau\) is the soft-update coefficient: the previous target network parameters and the corresponding Q network parameters after iterative learning are combined as a weighted average to update the target network. The Q network itself is updated by gradient descent, and its gradient expression is:
Where \(\mathcal{B}\) denotes a fixed-size batch of samples drawn from buffer \(\mathcal{D}\); for each sample, the target networks are used to compute \({y_i}\) as:

\({y_i}={r_i}+\gamma \left( {\mathop {\min }\limits_{j=1,2} {Q_{\omega _j^ - }}({s_{i+1}},{a_{i+1}}) - \alpha \log \pi ({a_{i+1}}|{s_{i+1}})} \right)\)
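A compact PyTorch sketch of this twin-Q (critic) update and the soft target update; it assumes the Q networks are modules whose forward takes (state, action), that `batch` holds tensors (s, a, r, s_next, done), and that `policy.sample` returns a reparameterized action with its log-probability:

```python
import torch
import torch.nn.functional as F

def critic_update(batch, policy, q1, q2, q1_targ, q2_targ, q_optim,
                  gamma=0.9, alpha=0.2):
    """One twin-Q update step (sketch); gamma follows the paper's setting,
    alpha here is a fixed placeholder for the learned temperature."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        a_next, logp_next = policy.sample(s_next)
        q_next = torch.min(q1_targ(s_next, a_next), q2_targ(s_next, a_next))
        y = r + gamma * (1.0 - done) * (q_next - alpha * logp_next)
    loss = F.mse_loss(q1(s, a), y) + F.mse_loss(q2(s, a), y)
    q_optim.zero_grad()
    loss.backward()
    q_optim.step()
    return loss.item()

def soft_update(net, target, tau=0.005):
    """Polyak averaging of the target-network parameters."""
    for p, p_t in zip(net.parameters(), target.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```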
The policy network is a state-to-action mapping; the policy \(\pi\) is updated by minimizing the Kullback-Leibler (KL) divergence, and its loss function is expressed as:
Since sampling from a Gaussian distribution is not differentiable, the SAC algorithm uses the reparameterization trick to make the sampling process differentiable for the policy function, with the action written as:

\({a_t}={f_\theta }({\epsilon _t};{s_t})\)

where \({\epsilon _t}\) is a noise random variable. Considering both Q functions, the rewritten policy loss function is:

\({L_\pi }(\theta )={\mathbb{E}_{{s_t}\sim\mathcal{D},\;{\epsilon _t}\sim\mathcal{N}}}\left[ {\alpha \log {\pi _\theta }\left( {{f_\theta }({\epsilon _t};{s_t})|{s_t}} \right) - \mathop {\min }\limits_{j=1,2} {Q_{{\omega _j}}}\left( {{s_t},{f_\theta }({\epsilon _t};{s_t})} \right)} \right]\)
The expression for the gradient \({\nabla _\theta }{L_\pi }\) of the policy network at time slot t is:

\({\nabla _\theta }{L_\pi }={\nabla _\theta }\alpha \log \left( {{\pi _\theta }({a_t}|{s_t})} \right)+\left( {{\nabla _{{a_t}}}\alpha \log \left( {{\pi _\theta }({a_t}|{s_t})} \right) - {\nabla _{{a_t}}}Q({s_t},{a_t})} \right){\nabla _\theta }{f_\theta }({\epsilon _t};{s_t})\)
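In practice this gradient is obtained automatically by backpropagating through the reparameterized sample; a short PyTorch sketch of the corresponding actor update, reusing the q1/q2 conventions of the critic sketch above (alpha again stands in for the learned temperature):

```python
import torch

def actor_update(batch, policy, q1, q2, pi_optim, alpha=0.2):
    """One policy update: maximize the minimum Q value minus the entropy
    penalty, using differentiable (reparameterized) action samples."""
    s = batch[0]
    a_new, logp = policy.sample(s)            # rsample inside -> gradients flow
    q_min = torch.min(q1(s, a_new), q2(s, a_new))
    loss = (alpha * logp - q_min).mean()      # L_pi = E[alpha * log pi - Q]
    pi_optim.zero_grad()
    loss.backward()
    pi_optim.step()
    return loss.item()
```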
The temperature coefficient of entropy is important in the SAC algorithm, and different sizes of temperature coefficients are chosen for different states. In order to automatically adjust the temperature coefficient of entropy, the SAC algorithm constructs an optimization problem with constraints as:
\(\mathop {\max }\limits_\pi \;{\mathbb{E}_{{\rho _\pi }}}\left[ {\sum\limits_t {r({s_t},{a_t})} } \right]\quad {\text{s.t.}}\quad {\mathbb{E}_{({s_t},{a_t})\sim{\rho _\pi }}}\left[ { - \log {\pi _t}({a_t}|{s_t})} \right] \geqslant {\mathcal{H}_0}\)  (30)
Transforming Eq. (30) into its dual problem through the Lagrangian dual method leads to the loss function41,42 at time slot t:
\(L(\alpha )={\mathbb{E}_{{a_t}\sim{\pi _t}}}\left[ { - \alpha \log {\pi _t}({a_t}|{s_t}) - \alpha {\mathcal{H}_0}} \right]\)
Where \({\mathcal{H}_0}\) denotes the minimum policy entropy threshold.
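A minimal PyTorch sketch of this automatic temperature adjustment; the optimizer learning rate and the common heuristic of setting the target entropy \({\mathcal{H}_0}\) to the negative action dimension are assumptions, not values given in the paper:

```python
import torch

log_alpha = torch.zeros(1, requires_grad=True)        # alpha = exp(log_alpha)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)   # illustrative learning rate

def alpha_update(logp, target_entropy):
    """Minimize L(alpha) = E[-alpha * (log pi + H_0)], pulling the policy
    entropy toward the threshold H_0."""
    loss = -(log_alpha.exp() * (logp.detach() + target_entropy)).mean()
    alpha_optim.zero_grad()
    loss.backward()
    alpha_optim.step()
    return log_alpha.exp().item()

# e.g. for a three-segment well: target_entropy = -3 (heuristic -|A|)
```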
Simulation results and analysis
Example verification of iterative calculation method for layer segment flow rate
In the intelligent regulation system for layered water injection, accurate calculation of segment flow is the core foundation for constructing the reinforcement learning training environment. The flow in multiple downhole segments is affected by strongly coupled factors such as pressure, pipe diameter, and nozzle opening, making it difficult for traditional analytical methods to characterize the nonlinear flow behavior. Therefore, an iterative calculation model is constructed based on Bernoulli’s equation and fluid mechanics theory. By iteratively solving the coupling relationship between the along-path loss coefficient and the flow rate, it provides flow-state feedback of water injection wells for reinforcement learning algorithms and supports the training of intelligent regulation strategies, such as SAC, in dynamic environments. The real well data of a two-segment well in an oilfield in Daqing are shown in Table 2. The correctness of the iterative calculation method for segment flow is verified by comparison with the real well data.
To verify the accuracy of the iterative calculation method for layer-segment flow, a calculation model is constructed based on the real-well parameters in Table 2. This well includes Wellbore 1 (well depth 890.25 m) and Wellbore 2 (well depth 921.1 m). The inner diameter of the tubing is 0.062 m and the roughness is 0.2 m. Two sets of water nozzles are installed; the opening range of each set is 0–0.002 m, divided into 100 scale divisions. The flow coefficient is \({C_d}=0.31 - 0.01d+1.59 \times {10^{ - 4}}{d^2} - 7.43 \times {10^{ - 7}}{d^3}\), and the corresponding layer-segment flow range is 0–80 m³/d. The injection flow rate at the wellhead is 50.2–77.29 m³/d, the pressure is 11.01–11.09 MPa, the fluid density is 980 kg/m³, and the dynamic viscosity is 0.001 \({\text{Pa}} \cdot {\text{s}}\). The simulated layer-segment flows output by the model are compared with the actually measured flow data at the wellhead. The comparison between the actual and simulated values is shown in Table 3.
The deviations between the simulated and measured values arise mainly from the following factors:

1. The water quality of the injection well affects the fluid viscosity, which changes the flow regime.

2. The roughness of the pipe wall causes deviations in the calculation of the along-path frictional resistance.

3. The flow-coefficient fit is not a perfect match, leading to inaccuracies in the calculated nozzle flow.
These factors collectively cause errors in the iterative calculation method for layer-segment flow. The absolute errors under the various working conditions are between 1% and 6%, so the model can reflect the flow variation trend of each layer segment in the real environment.
During the actual water injection process, sensor data often exhibits fluctuating characteristics due to factors such as equipment vibration and sensor accuracy. To simulate this working condition, a randomly generated coefficient error of -0.05 to 0.05 is added to the flow coefficient. The flow-pressure difference relationship with fluctuating characteristics is shown in Fig. 4.
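For illustration, the flow-coefficient polynomial quoted above and the ±0.05 random perturbation can be expressed as the following Python helpers; treating the argument d as the nozzle opening in scale divisions is our assumption:

```python
import random

def flow_coefficient(d: float) -> float:
    """Flow-coefficient fit from the text:
    C_d = 0.31 - 0.01*d + 1.59e-4*d**2 - 7.43e-7*d**3
    (d is taken here as the opening in scale divisions -- an assumption)."""
    return 0.31 - 0.01 * d + 1.59e-4 * d ** 2 - 7.43e-7 * d ** 3

def noisy_flow_coefficient(d: float) -> float:
    """Add the random coefficient error of -0.05 to 0.05 used to emulate
    sensor and equipment fluctuations."""
    return flow_coefficient(d) + random.uniform(-0.05, 0.05)

print(flow_coefficient(50), noisy_flow_coefficient(50))
```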
Training environment and parameters
To determine the optimal hyperparameters of the SAC algorithm, we first identify learning rates at which the model can learn. Through extensive testing, the model converges only when the learning rate is 3e-5, 3e-10, or 3e-15, depending on the layer-segment environment. With the learning rate fixed, we then analyze the neural network parameters by training for 500 steps with 32–512 neurons and 1–4 hidden layers and compute the average reward of the last 50 steps to determine the optimal network configuration. The statistical results are shown in Table 4.
In a single-layer environment, the average reward reaches its maximum when the number of neurons is 64 and the number of hidden layers is 3. When the number of neurons increases to 128, the reward value shows little change. In two-layer and three-layer environments, the reward value fluctuates as the number of neurons increases. When the number of neurons reaches 256 and 1024, the average reward values in both two-layer and three-layer environments decrease significantly, indicating overfitting of the model. Based on the comparison results, a hierarchical water injection environment is constructed using Python for agent training. The optimal neural network parameters are shown in Table 5.
The size of the experience replay buffer increases with the number of layer segments (50,000 for three layer segments) to store high-dimensional data and enhance generalization. The batch size is increased from 64 to 512 to reduce gradient variance. A soft-update parameter of 0.005 balances the retention of historical experience with the absorption of new policy information. With 500 training episodes and a maximum of 200 steps per episode, a discount factor of 0.9 emphasizes short-term rewards. These hyperparameters, optimized for layered water injection well regulation with the SAC algorithm, were determined through extensive trial training.
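Collected as a single configuration dictionary for reference (the critic learning rate equal to the actor's is an assumption; the remaining values are those stated above for the three-segment environment):

```python
# SAC training configuration gathered from the text (three-segment case);
# single- and two-segment runs use smaller buffers and batch sizes.
SAC_CONFIG = {
    "actor_lr": 3e-5,              # learning rate at which training converged
    "critic_lr": 3e-5,             # assumed equal to the actor learning rate
    "replay_buffer_size": 50_000,  # experience replay capacity
    "batch_size": 512,             # samples per gradient step
    "tau": 0.005,                  # soft (Polyak) target-network update
    "gamma": 0.9,                  # discount factor emphasizing short-term rewards
    "episodes": 500,               # training cycles
    "max_steps_per_episode": 200,  # regulation steps per episode
}
```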
Training result analysis
Each agent was placed in a water injection environment for 500 training episodes, with a stopping condition of reaching an error of less than 5%. To verify algorithm performance, 100 groups of water injection environments were then randomly generated under the same conditions after training was completed, and the algorithms were compared in terms of regulation error, number of adjustment steps, and other performance measures. Part of the initial data of the injection environments is shown in Table 6.
To analyze the performance of different continuous-action-space reinforcement learning algorithms for flow regulation in layered water injection wells, the rewards of the PPO, DDPG, and SAC algorithms during training are compared, as shown in Fig. 5, which presents the reward values over 500 iterations of training in environments with 1–3 layer segments.
The PPO, DDPG, and SAC algorithms all reached their reward ceilings and converged after 500 training iterations. As the number of layer segments increases, the reward fluctuations of the algorithms increase significantly. As shown in Fig. 5a, the maximum reward value of each algorithm is close to 0, but PPO fluctuates the most drastically; Fig. 5b shows that the SAC rewards are concentrated in the range of -500 to 0 (with the slightest fluctuation), DDPG in the range of -2500 to 0, and PPO extends to -5000 to -500; in Fig. 5c, SAC rapidly rises to a stable value near 0 within the first 50 steps, DDPG stays in the -2500 to -500 range with small fluctuations, while PPO oscillates violently between -8000 and -1500. As the water injection environment becomes more complex with more layer segments, the magnitude of the (negative) rewards of each algorithm grows, but stability differs significantly: SAC maintains the highest reward and the lowest fluctuation, DDPG ranks second, and PPO has the lowest reward and the largest fluctuation.
Regulation error analysis
To analyze the absolute error performance of each algorithm under different numbers of water injection layer segments, the same water injection environment was used with a 0% regulation error target and 50 consecutive control steps. The error evolution of the PPO, DDPG, and SAC algorithms was analyzed for different numbers of layer segments, and the error curves are shown in Fig. 6. The horizontal axis represents the number of adjustment steps, and the vertical axis represents the relative absolute error (RAE).
As shown in Fig. 6a–i, the convergence process of the PPO algorithm is accompanied by significant oscillations in the single-layer-segment scenario, while both the DDPG and SAC algorithms converge to zero relative absolute error within 20 steps. In the two-layer-segment environment, the minimum error of PPO is as high as 0.2, while DDPG and SAC achieve simultaneous convergence on both channels, with SAC converging faster. In the three-layer-segment scenario, PPO shows a linear decrease in error but a large inter-layer error spread (0.1–0.4); in contrast, both DDPG and SAC exhibit exponential convergence, with SAC achieving near-zero error control multiple times in layer segment 2, demonstrating better multi-objective coordination capability.
To compare the robustness of the algorithms’ regulation performance, the relative absolute error distributions under 100 sets of random initial conditions are analyzed, with a maximum of 100 adjustment steps and a stopping error of 5%; the regulation errors of the different algorithms are shown in Fig. 7. The SAC algorithm performs best in terms of convergence stability (average relative absolute error 0.05 ± 0.005) and adaptability to complex environments, while the PPO algorithm presents the largest error fluctuation (0.37 ± 0.1). The error difference between DDPG and SAC in the single-layer-segment scenario is only 0.01, but as the number of layer segments increases to three, their error increases are 0.07 and 0.04, respectively, both significantly lower than the 0.26 increase for PPO.
The results of the pass-rate comparison are shown in Table 7, with SAC leading in the single-layer-segment scenario with a 98% pass rate (DDPG 95%, PPO 79%). As the layer-segment complexity increases to three layers, SAC still maintains an 81% pass rate (mean 88%), significantly higher than DDPG (32%/60%) and PPO (14%/45%). The data indicate that the SAC algorithm has a stronger robustness advantage in multi-objective cooperative control scenarios.
Step analysis
The regulation performance analysis is shown in Fig. 8. With the nozzle opening as the core control variable, its dynamic response characteristics, adjustment frequency, and convergence trajectory directly determine the service life of the water injection device and the measurement and adjustment efficiency. This experiment compares the regulation performance of the algorithms under the zero-error constraint and reveals how deep reinforcement learning shapes the dynamic behavior of the actuator.
As shown in Fig. 8a, the opening produced by the PPO algorithm fluctuates significantly but approaches the optimal value several times, whereas both DDPG and SAC achieve accurate tracking. Figure 8b shows that for PPO, layer segment 1 exhibits a descending trend while the deviation of layer segment 2 is prominent; DDPG converges to the optimal opening in about 60 steps, and SAC converges within about 20 steps. Figure 8c indicates that the deviation of PPO in layer segments 1 and 3 is about 30% and that of layer segment 2 reaches 60%; by contrast, DDPG maintains a stable deviation of about 2%, while SAC nearly coincides with the optimal opening, demonstrating the best regulation performance.
Taking a relative absolute error of 5% as the adjustment target, the distribution of adjustment steps for the different algorithms is analyzed; it is shown in Fig. 9.
The average number of adjustment steps of the SAC algorithm is 16.04 in the single-layer-segment environment, 50.40 in the two-layer-segment environment, and 55.32 in the three-layer-segment environment, an increase of 39.26 steps. The DDPG algorithm requires an average of 23.85 steps in the one-layer-segment environment and 85.19 steps in the three-layer-segment environment, an increase of 61.34 steps. The PPO algorithm requires 36.17 steps in the one-layer-segment environment and 95.02 steps in the three-layer-segment environment, an increase of 58.85 steps. The average number of adjustment steps of the SAC algorithm across the different environments is 40.59, much smaller than the 68.86 steps of the PPO algorithm and the 61.34 steps of the DDPG algorithm.
As shown in Table 8, the SAC algorithm completes regulation within 1–20 steps with a probability of 31% and within 1–50 steps with a probability of 69%, much higher than PPO (33%) and DDPG (47%). The probability that SAC falls in the 90–100-step range is 15%, much lower than PPO (55%) and DDPG (41%). Thus, the SAC algorithm adjusts faster than both the PPO and DDPG algorithms.
Conclusion
This study compares and analyzes the performance of three reinforcement learning algorithms, PPO, DDPG, and SAC, in layered water injection regulation. Experiments show that all three algorithms can converge to a stable state in basic training, but SAC demonstrates outstanding advantages in complex scenarios: its regulation error converges fastest, with an average error of 5% (standard deviation 0.005), and its average qualification rate is 88% when an injection error of less than 5% is taken as the qualification criterion, significantly higher than PPO (45%) and DDPG (60%). Meanwhile, SAC reaches the target with fewer adjustment steps (69% completion probability within 1–50 steps under the 5% error threshold) and performs best in control accuracy, stability, and environmental adaptability, providing an efficient solution for water injection regulation.
This study is currently limited to the theoretical simulation stage of intelligent regulation for layered water injection wells. However, the real-world application of intelligent regulation remains essential. To advance its practical implementation, the intelligent regulation algorithm will be deployed locally on near-wellbore surface hosts or embedded water distribution systems via serial communication, thereby avoiding IoT-related latency. Meanwhile, we will explore model optimization techniques to enhance edge deployment feasibility and further develop a hardware testing platform and field trials for intelligent control of layered water injection wells.
Data availability
The datasets generated and/or analysed during the current study are available in the GitHub repository, and the data is publicly accessible. The persistent web link to the datasets is as follows: https://github.com/HJZ-hub/SACIntelligentLayeredWaterInjection.
References
Liu He, X. et al. Current status and trend of separated layer water flooding in China. Pet. Explor. Dev. 40 (6), 785–790 (2013).
Liu He, Z. et al. Development and prospect of downhole monitoring and data transmission technology for separated zone water injection. Pet. Explor. Dev. 50 (1), 191–201 (2023).
Dong Lifei, Z. et al. Experimental investigation on layer subdivision water injection in multilayer heterogeneous reservoirs. ACS Omega. 8 (46), 43546–43555 (2023).
Yunchang Zhao. Study on Mechanism and Characterization Method of Interlayer Physical Property Interference (China University of Petroleum (East China), 2020).
Liu He, Z. et al. Development and prospect of separated zone oil production technology. Pet. Explor. Dev. 47 (5), 1103–1116 (2020).
Liu He, P. et al. Connotation, application and prospect of the fourth-generation separated layer water injection technology. Pet. Explor. Dev. 44 (4), 644–651 (2017).
Runze An. Flow Field Analysis and Structure Optimization of Eccentric Injection Mandrel (Beijing Jiaotong University, 2022).
Zhao, G. et al. Allocation mothed and calculation of layered injection rate of liquid control intelligent layered water injection process. Fault-Block Oil Gas Field. 28 (02), 258–261 (2021).
Lizhi, Z. et al. Determining method of nozzle size used for concentric layered water injection. Oil Drilling Prod. Technol. 37(05), 95–99 (2015).
Xiufang Jiang. How to solve the choke loss formula of water regulator nozzles by experiments. J. Jianghan Petroleum Univ. Staff Workers. 24(03), 49–52 (2011).
Arulkumaran, K. et al. Deep reinforcement learning: A brief survey. IEEE. Signal. Process. Mag. 34 (6), 26–38 (2017).
Xiang, G. & Su, J. Task-oriented deep reinforcement learning for robotic skill acquisition and control. IEEE Trans. Cybernetics. 51 (2), 1056–1069 (2019).
Beltran-Hernandez, C. C. et al. Variable compliance control for robotic peg-in-hole assembly: A deep-reinforcement-learning approach. Appl. Sci. 10 (19), 6923 (2020).
Phan, B. C. & Lai, Y. C. Control strategy of a hybrid renewable energy system based on reinforcement learning approach for an isolated microgrid. Appl. Sci. 9 (19), 4001 (2019).
Liu, D. & Li, L. A traffic light control method based on multi-agent deep reinforcement learning algorithm. Sci. Rep. 13 (1), 9396 (2023).
Tampuu, A. et al. Multiagent Cooperation and competition with deep reinforcement learning. PloS One. 12 (4), e0172395 (2017).
Quang Tran, D. & Bae, S. H. Proximal policy optimization through a deep reinforcement learning framework for multiple autonomous vehicles at a non-signalized intersection. Appl. Sci. 10 (16), 5722 (2020).
Kiran, B. R. et al. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 23 (6), 4909–4926 (2021).
Zhao, C. et al. Inverse kinematics solution and control method of 6-degree-of-freedom manipulator based on deep reinforcement learning. Sci. Rep. 14 (1), 12467 (2024).
Lindner, T., Milecki, A. & Wyrwał, D. Positioning of the robotic arm using different reinforcement learning algorithms. Int. J. Control Autom. Syst. 19 (4), 1661–1676 (2021).
Lindner, T. & Milecki, A. Reinforcement learning-based algorithm to avoid Obstacles by the anthropomorphic robotic arm. Appl. Sci. 12 (13), 6629 (2022).
Huanyan, J. et al. Energy saving control for subway station air conditioning systems based on reinforcement learning. Control Decis. 37 (12), 3139–3148 (2022).
AlMahamid, F. & Grolinger, K. Reinforcement learning algorithms: An overview and classification. In 2021 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE). IEEE 1–7 (2021).
Duan, J. et al. Distributional soft actor-critic: Off-policy reinforcement learning for addressing value Estimation errors. IEEE Trans. Neural Networks Learn. Syst. 33 (11), 6584–6598 (2021).
Tiancheng, F. et al. Multi-scale mechanics of submerged particle impact drilling. Int. J. Mech. Sci. 285, 109838 (2025).
Zhang, L. et al. Production optimization for alternated separate-layer water injection in complex fault reservoirs. J. Petrol. Sci. Eng. 193, 107409 (2020).
Houben, G. J. Hydraulics of water wells–flow laws and influence of geometry. Hydrogeol. J. 23 (8), 1633 (2015).
Tiancheng, F. et al. Carbon neutrality perspective: enhanced failure performance and efficiency of Circulating particle jet impact drilling in deep-hard rock. Geoenergy Sci. Eng. 247, 213666 (2025).
Mnih, V. et al. Playing atari with deep reinforcement learning. arXiv:1312.5602 (2013).
Wang, H. & Wang, J. Enhancing multi-UAV air combat decision making via hierarchical reinforcement learning. Sci. Rep. 14 (1), 4458 (2024).
Liang, P. P., Zadeh, A. & Morency, L. P. Foundations and trends in multimodal machine learning: Principles, challenges, and open questions. arXiv:2209.03430 (2022).
Haarnoja, T. et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning. PMLR 1861–1870 (2018).
Wang, M. et al. Spectrum-efficient user grouping and resource allocation based on deep reinforcement learning for MmWave massive MIMO-NOMA systems. Sci. Rep. 14 (1), 8884 (2024).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518 (7540), 529–533 (2015).
Silver, D. et al. Mastering the game of go without human knowledge. Nature 550 (7676), 354–359 (2017).
Cai, C. & Wei, M. Adaptive urban traffic signal control based on enhanced deep reinforcement learning. Sci. Rep. 14 (1), 14116 (2024).
Raj, R. & Kos, A. Intelligent mobile robot navigation in unknown and complex environment using reinforcement learning technique. Sci. Rep. 14 (1), 22852 (2024).
Gao, J. et al. An improved sac-based deep reinforcement learning framework for collaborative pushing and grasping in underwater environments. IEEE Trans. Instrum. Meas. 73, 1–14 (2024).
Xu, Y. et al. Action decoupled SAC reinforcement learning with discrete-continuous hybrid action spaces. Neurocomputing 537, 141–151 (2023).
Yang, L., Bi, J. & Yuan, H. Intelligent path planning for mobile robots based on SAC algorithm. J. Syst. Simul. 35 (8), 1726–1736 (2023).
Haarnoja, T. et al. Reinforcement learning with deep energy-based policies. In International conference on machine learning. PMLR 1352–1361 (2017).
Haarnoja, T. et al. Soft actor-critic algorithms and applications. arXiv:1812.05905 (2018).
Acknowledgements
This work was supported by the National Major Science and Technology Project of China (Grant No. 2024ZD1406502). The authors would like to thank all contributors for their valuable efforts in this research.
Author information
Authors and Affiliations
Contributions
J. Hu. developed the methodology, conducted experiments, and drafted the manuscript. D. J. secured funding, contributed to software implementation and data validation. S. L. performed formal analysis, created visualizations, and assisted in manuscript revision. W. W. provided conceptual guidance, resources, and research supervision. F. R. conceived the study, managed the project, and supervised the work while revising the manuscript. All authors reviewed and approved the final version.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Hu, J., Jia, D., Liu, S. et al. Research on intelligent regulation of layered water injection based on reinforcement learning SAC algorithm. Sci Rep 15, 33044 (2025). https://doi.org/10.1038/s41598-025-11521-w
DOI: https://doi.org/10.1038/s41598-025-11521-w