Abstract
In the modern petroleum industry, fault diagnosis and classification of pumping units are among the most important research topics. The conventional diagnostic method installs sensors on the pumping unit and analyzes the resulting dynamometer cards (DCs). However, because these sensors cannot work stably for long periods and are easily damaged, a more stable and effective data source is required for diagnosis. Like the DCs, the parameters of the pumping unit's driving motor effectively reflect the running state of the unit, and these electrical parameters can be collected easily over long periods. However, because sufficient fault samples of electrical parameters are difficult to obtain, intelligent methods struggle to learn the characteristics and patterns of electrical parameter data. This paper therefore proposes a new method for fault diagnosis. First, a mechanism model is established to convert existing faulty DCs into electrical parameter data. Subsequently, an improved Neural Hierarchical Interpolation for Time Series Forecasting (N-HiTS) method is used to make multiple predictions of the fault state to expand the dataset. Finally, an Omni-Scale Convolutional Neural Network (OS-CNN) is used for fault diagnosis. Experimental results show that the proposed method effectively addresses the shortage of electrical parameter data and achieves superior classification performance.
Introduction
Oil is one of the most important energy sources and a necessity for social production, with many oil fields being exploited around the world1,2,3. At present, most oil fields in China use rod pumping units for oil production. Harsh oil field environments, complex underground conditions, and prolonged operation cause various faults in pumping units, and maintenance often requires halting production. Because oil fields are remote, this process incurs significant economic and time costs. Therefore, to improve production efficiency and reduce maintenance costs, a reliable and stable way is needed to monitor the running condition of the rod pumping unit in real time. Such monitoring should also determine the fault type promptly when the pumping unit fails and help technicians arrange maintenance plans.
At early oil production sites, the fault types of pumping units were diagnosed mainly through manual inspection and expert experience. Experienced technicians could determine the fault type by analyzing the sound emitted during operation and evaluating the operating cycle, but it is difficult to judge the fault type with high accuracy in this way.
With the development of electronic technology, the dynamometer cards (DCs)4,5 collected by sensors installed on the pumping unit have become the main basis for diagnosis. At the same time, intelligent methods are increasingly applied to analyze DCs, which accurately reflect subsurface conditions and enable effective fault classification in rod pumping units. For example, Wang6 extracted characteristic parameters from the DCs, combined them with relevant data to construct a database, and compared them with typical working conditions to obtain judgment rules for fault diagnosis of pumping units. However, DCs rely on sensors installed on the pumping units, and these sensors have short lifespans because harsh field conditions cause erosion and damage. Frequent sensor replacement requires production halts and raises costs, so a more stable and cost-effective data source is needed for fault diagnosis.
The motor is the power source of the beam rod pumping unit, and the operation of the motor is coupled with the working state of the pumping unit7,8. A malfunction in the underground part of the pumping unit interferes with the rotation of the motor, and abnormal motor speed in turn affects operation. Therefore, motor parameters can serve as indicators for fault analysis and diagnosis. For example, Wilamowski et al.9 proposed fault diagnosis by measuring the terminal parameters of the drive motor and using a neural network model to learn the characteristics of the pumping unit. By analyzing the internal relationship between the motor and the pumping unit, Chen et al.10 proposed a new method for fault diagnosis that combines motor parameters and DCs. Srinivas et al.11 proposed using the discrete wavelet transform to extract features from current data and artificial neural networks to diagnose motor faults.
However, for most AI-based, data-driven fault detection methods that use the electrical parameters of pumping units, a significant issue is the lack of information caused by the limited number of fault data samples. The failure frequency of oil pumping units is low, and there is neither the equipment nor the requirement for collecting electrical parameters under faulty conditions. As a result, only limited fault data have been collected.
A typical solution to this problem is the generative adversarial network (GAN)12,13. However, because there are very few electrical parameter samples of oil pumping unit failures, the trained models perform rather unsatisfactorily.
To address this issue, this paper introduces a fault sample generation method. First, a mechanistic model is established to map dynamometer card data to the electrical parameters of the pumping unit. Using dynamometer cards corresponding to known fault conditions, the electrical parameters are simulated to generate realistic fault samples. This approach effectively alleviates the problem of limited electrical parameter samples. The generated electrical data closely match the actual measurements from oil pumping units, enabling the effective training of a deep learning-based fault detection model. Furthermore, this paper proposes a novel fault detection method for pumping unit electrical parameters based on data augmented with the N-HiTS framework. N-HiTS14 is a neural network-based time series prediction model that incorporates hierarchical interpolation and multi-rate data sampling. The framework automatically produces step-by-step forecasts. By applying interpolation after hierarchical sampling and prediction, the number of samples in the dataset is expanded. This not only enhances long-term prediction accuracy but also significantly reduces computational complexity.
The contributions of this paper are as follows.
1) In response to the difficulty in obtaining sufficient fault electrical parameter samples and the situation that intelligent methods have trouble learning the characteristics and patterns of such data, a mechanism model is first established to convert the existing faulty dynamometer cards into electrical parameter data, thus providing a data foundation for subsequent diagnosis.
2) The improved Neural Hierarchical Interpolation (N-HiTS) method is adopted to make multiple predictions on the fault state so as to expand the data set and solve the problem of insufficient data.
3) The OS-CNN15 is used for fault diagnosis. Eventually, it is verified through experiments that this method effectively solves the problem of insufficient electrical parameter data and achieves better classification performance.
Methods
Electric parameter transformation model
At present, most domestic oil fields use motor-driven rod pumping units as the main surface equipment of the pumping system. As shown in Figure 1, this type of rod pumping unit is mainly composed of four parts: the motor, the gearbox, the crank-link-beam mechanism, and auxiliary devices. The motor is connected through a belt to the gearbox, which in turn is connected to the crank. When the motor starts, it rotates at high speed, and the gearbox drives the crank at low speed. Because the crank is connected to the beam, the low-speed rotation of the crank makes the rod reciprocate in the vertical direction, so that the rod drives the pump to lift underground crude oil to the surface.
Each reciprocating movement of the rod is divided into an upstroke and a downstroke. The downstroke mainly relies on the power of the motor to drive the crank counterclockwise and lower the rod. The upstroke mainly relies on the inertia of the crank to lift the rod, and the speed of the motor is reduced during this phase. Under normal conditions, the upstroke and the downstroke should take the same time, so that the pumping unit maintains overall mechanical balance while working. Although the rotation of the crank is non-uniform, the crank speed changes with an obvious periodicity, so the motor parameters also change periodically. When the downhole conditions change, the rod load changes, the mechanical balance of the pumping unit is affected, and the motor speed finally fluctuates abnormally. Therefore, the fault type of the pumping unit during oil production can be analyzed and judged from the change of the motor parameters.
As shown in Figure 2, the DC reflects the relationship between the suspension point load and the displacement of the pumping unit. The oil extraction process of a pumping unit consists of an upstroke and a downstroke. In an ideal state, the upstroke starts from the bottom dead point A. At this point, both the fixed valve and the traveling valve are closed, and the pumping rod is moving upward. The liquid column load therefore acts on the plunger, and the suspension point load increases. When the suspension point load increases to point C, the fixed valve opens. Due to the pressure difference between the inside and outside of the pump, liquid begins to be drawn into the pump barrel. The suspension point load no longer increases until the top dead point D, at which point the upstroke ends and the fixed valve closes again. The downstroke then begins, with the pumping rod moving downward. Due to the buoyancy of the fluid column, the suspension point load begins to decrease. After reaching point B, the traveling valve opens, the fluid column load transfers to the tubing, and the pump barrel begins to discharge fluid. The suspension point load remains unchanged until the stroke reaches the bottom dead point A again. Therefore, the ideal dynamometer card for normal operating conditions is a parallelogram. In real-world scenarios, however, factors such as the inertia of the pumping rod and the vibration of the rod string give the edges of the quadrilateral a wavy shape.
In this paper, a mechanism model from the driving motor to the suspension point of the pumping unit is established, so that changes in the suspension point load are reflected as changes in the electric power of the driving motor and the DCs can be converted into electric power curves.
First, the power \(P_{motor}\) of the motor driving the rod pumping unit can be expressed as:
where \(n_m\) is the speed of the motor in r/min; \(M_{cr}\) is the net torque of the crank shaft; i is the transmission ratio from the motor to the gearbox; and \(\eta\) is the transmission efficiency. i and \(\eta\) are calculated as follows:
where \(i_1\) is the transmission ratio of the gearbox; D is the diameter of the gearbox pulley; d is the diameter of the motor pulley; \(\eta _1\) is the transmission efficiency of the belt; and \(\eta _2\) is the transmission efficiency of the gearbox. The net torque of the crank shaft \(M_{cr}\) is equal to the torque \(M_p\) on the crank shaft caused by the suspension load P minus the torque \(M_c\) caused by the crank balance weight, that is, \(M_{cr} = M_p - M_c\).
where \(M_p\) is defined as:
where B is the structural unbalance value of the pumping unit and \({\bar{TF}}\) is the torque factor, which differs between types of pumping units. The geometry of the rod pumping unit is shown in Figure 3. \({\bar{TF}}\) is calculated as follows:
where a and b are the lengths of the front arm and the back arm of the beam, respectively; \(\alpha\) is the angle between the crank and the connecting rod; \(\beta\) is the angle between the back arm of the beam and the connecting rod; r is the radius of the crank; l is the length of the connecting rod; K is the distance from the center of the crank to the center of the beam; \(\varphi\) is the crank angle; \(\varphi _0\) is the angle between K and the 12 o'clock position; I is the horizontal distance from the center of the beam shaft to the center of the crank shaft; H is the vertical distance from the center of the beam shaft to the base; G is the vertical distance from the center of the crank shaft to the base; and \(\psi\) is the angle between b and K. \(\psi\) can be decomposed into the sum of the angles \(\kappa\) and \(\rho\), which can be calculated from the triangular relationships in Figure 3. J is the distance from the center of the beam to the center of the crank pin.
The connection between the motor and the pulley of the gearbox converts the high-speed rotation of the motor into the low-speed rotation of the crank, so the rotation angle of the crank shaft \(\varphi\) can be defined as:
where \(\omega\) is the angular speed of the crank shaft and t is the time elapsed since the crank shaft left its initial position. The balance torque \(M_c\) is defined as:
\(M_{cmax}\) is the maximum balance torque of the crank, which is calculated from the total weight of the crank balance block \(W_{cb}\), the crank weight \({W_c}\), the crank balance radius R and the crank center of gravity radius \(R_c\). It can be seen from the above formula that the net torque of the crank \(M_{cr}\) can also be expressed as:
The relationship between motor power and suspension load can be obtained from Equations (1) and (8). Therefore, the change in motor power can be obtained from the change in suspension point load in the DCs, and a typical fault DC can be converted into an electric power curve.
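The conversion chain described above can be illustrated with a minimal Python sketch. The displayed equations are not reproduced in this text, so every formula in the sketch is an assumption based on conventional beam pumping unit relations: \(i = i_1 D/d\), \(\eta = \eta_1 \eta_2\), \(M_p = {\bar{TF}}(P - B)\), \(M_c = M_{cmax}\sin \varphi\), and motor power in kW equal to torque (N·m) times crank speed (r/min) divided by 9549 and by the efficiency. The function names and all numeric values are hypothetical, and the torque factor is passed in directly rather than derived from the geometry in Figure 3.

```python
import math

def drive_train(i1, gearbox_pulley_d, motor_pulley_d, eta_belt, eta_gearbox):
    """Total motor-to-crank ratio and efficiency (assumed product forms)."""
    ratio = i1 * gearbox_pulley_d / motor_pulley_d
    efficiency = eta_belt * eta_gearbox
    return ratio, efficiency

def motor_power_kw(load_n, torque_factor_m, crank_angle_rad,
                   unbalance_n, m_cmax_nm, n_motor_rpm, ratio, efficiency):
    """Motor power for one DC point, under the assumptions in the lead-in."""
    m_p = torque_factor_m * (load_n - unbalance_n)   # torque from suspension load
    m_c = m_cmax_nm * math.sin(crank_angle_rad)      # crank counterbalance torque
    m_cr = m_p - m_c                                 # net crank shaft torque
    crank_rpm = n_motor_rpm / ratio                  # crank turns slower than motor
    # kW = N*m * r/min / 9549; losses are modeled as a single efficiency factor.
    return m_cr * crank_rpm / (9549.0 * efficiency)

def power_curve(points, **params):
    """Map (load, torque factor, crank angle) samples to a power curve."""
    return [motor_power_kw(load, tf, phi, **params) for load, tf, phi in points]

# Hypothetical drive train and three hypothetical DC samples.
ratio, eff = drive_train(30.0, 0.9, 0.3, 0.95, 0.90)
samples = [(60000.0, 1.0, 0.0), (55000.0, 0.8, math.pi / 4), (20000.0, -0.9, math.pi)]
curve = power_curve(samples, unbalance_n=2000.0, m_cmax_nm=30000.0,
                    n_motor_rpm=980.0, ratio=ratio, efficiency=eff)
```

Dividing by the efficiency is only meaningful while the motor is driving the crank; a fuller model would treat the regenerative part of the cycle (negative net torque) separately.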
Electrical parameter data expansion model
A DC records the relationship between suspension point displacement and load over one stroke cycle, and the card itself carries no explicit time axis. Under normal conditions, the motor power changes periodically over time. Because DCs are generally acquired at a fixed time interval, the converted motor power curves can be assembled into a time series. By repeatedly predicting the fault-state part of this time series with a neural network prediction model, the electric power dataset can be extended.
The N-HiTS neural network prediction model consists of M connected stacks, each containing L connected blocks. How N-HiTS processes a time series depends mainly on the structure of the blocks and the connections between them. Each block of the original N-HiTS consists of a 4-layer MLP together with a MaxPool layer of kernel size k. For time series prediction, however, an MLP is not sensitive to local patterns in the series and captures temporal dependence poorly. Because the motor power shows obvious periodicity and trend changes, this paper proposes using a convolutional neural network (CNN) as the main prediction part of the block. A CNN can effectively capture local features in a time series and improve prediction accuracy. In addition, an AdaptiveMaxPool layer replaces the original max pooling layer, so that the block adapts more conveniently and flexibly to different input samples. The N-HiTS network with the improved block structure is shown in Figure 4.
Block function
For the improved block structure, the input first passes through a convolutional layer with a kernel size of \(k_l\), where l refers to the l-th block. After the convolution operation, an AdaptiveMaxpool layer is used to replace the function of the Maxpool layer in the original structure. The expression is as follows:
where \(x_{t-H:t}\) is the input of the first block; H is the length of the time series to be predicted.
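To make the role of the adaptive pooling step concrete, the following pure-Python sketch pools a sequence of any length down to a fixed number of outputs. It assumes the floor/ceil bin rule commonly used for adaptive pooling in deep learning frameworks; the function name is hypothetical and the sketch is not the paper's implementation.

```python
def adaptive_max_pool1d(x, output_size):
    """Max-pool a 1-D sequence into exactly `output_size` values.

    Bin j spans indices [floor(j*n/out), ceil((j+1)*n/out)), so the bins
    cover the whole input and adjust automatically to its length.
    """
    n = len(x)
    pooled = []
    for j in range(output_size):
        start = (j * n) // output_size
        end = ((j + 1) * n + output_size - 1) // output_size  # integer ceiling
        pooled.append(max(x[start:end]))
    return pooled

# Inputs of different lengths map to the same output length.
short = adaptive_max_pool1d([1, 5, 2, 8, 3, 7], 3)
longer = adaptive_max_pool1d([1, 5, 2, 8, 3, 7, 4, 6, 9], 3)
```

Unlike a fixed-kernel MaxPool layer, the output length here is set directly, which is what lets the same block accept samples of different lengths.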
The signal \(x_{t-H:t,l}^{(p)}\) processed by the AdaptiveMaxPool layer is fed into the fully connected layer, which learns the hidden vector \(h_{l} \in {\mathbb {R}}^{n \times h}\) by nonlinear aggregation. After a linear projection, the fully connected layer outputs the forward prediction coefficient \(\theta _{l}^{f}\) and the backward prediction coefficient \(\theta _{l}^{b}\), as follows:
For most neural network prediction models, the number of values output by the prediction network equals the horizon length H that needs to be predicted. To improve computational efficiency, avoid degrading the expressive power of the model, and still predict the desired length of the time series accurately, N-HiTS redefines the dimension of the prediction coefficients through the expressiveness ratio \(r_l\), the number of parameters per unit of output time, so that \(\left| \theta _{l}^{f}\right| =\lceil r_{l}H \rceil\). To predict all points in the horizon H and recover the original sampling rate of the signal, temporal interpolation is adopted:
Where g is a linear interpolation function, defined as:
Where \(\Gamma\) is a time partition, and \(\Gamma = \left\{ t+1, t+1+1/r_l,\cdots , t+H-1/r_l, t+H\right\}\).
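The interpolation step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's code: the \(\lceil r_l H \rceil\) coefficients are treated as knots spread evenly from the first to the last horizon step, which simplifies the partition \(\Gamma\), and the function names are hypothetical.

```python
import math

def n_coefficients(r_l, horizon):
    """Number of forecast coefficients for expressiveness ratio r_l."""
    return math.ceil(r_l * horizon)

def interpolate_forecast(theta, horizon):
    """Linearly interpolate a short coefficient vector to `horizon` points."""
    k = len(theta)
    if k == 1 or horizon == 1:
        return [theta[0]] * horizon
    step = (horizon - 1) / (k - 1)        # horizon steps between adjacent knots
    forecast = []
    for tau in range(horizon):            # tau = 0 corresponds to time t+1
        pos = tau / step                  # fractional knot index
        lo = min(int(pos), k - 2)         # left knot, clamped at the last segment
        w = pos - lo                      # weight of the right knot
        forecast.append((1 - w) * theta[lo] + w * theta[lo + 1])
    return forecast

# Three coefficients expanded to a five-step horizon.
dense = interpolate_forecast([1.0, 3.0, 5.0], 5)
```

A block with a small \(r_l\) therefore predicts only a few values and leaves the fine time resolution to interpolation, which is where the reduction in parameters comes from.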
Block to block connection
Different basis functions can be set up in each stack to learn multiple features of the input data. Each stack makes a local prediction of the input time series, and the local predictions are finally combined into a global prediction, as shown in Figure 5. Each block processes only a portion of the input signal, so the next block must focus on the portion left unprocessed by the previous block to complete the analysis of the entire time series. N-HiTS connects its blocks with a double residual topology: the input of each block is the input of the previous block minus that block's backward prediction, and the forward predictions of all blocks are summed to form the output of the stack, as follows:
Each block in the N-HiTS structure can thus focus on processing its own input signal, which keeps the network structure clear and improves computational efficiency.
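The double residual connection can be illustrated with toy blocks. In this sketch each block is any callable that returns a backward prediction (backcast) and a forward prediction (forecast); the two example blocks are hypothetical stand-ins for the convolutional blocks, chosen only so the arithmetic is easy to follow.

```python
def run_stack(x, blocks, horizon):
    """Apply blocks in sequence with a double residual topology."""
    residual = list(x)
    forecast = [0.0] * horizon
    for block in blocks:
        backcast, block_forecast = block(residual, horizon)
        # The next block only sees what this block failed to explain.
        residual = [r - b for r, b in zip(residual, backcast)]
        # Forward predictions of all blocks are summed.
        forecast = [f + bf for f, bf in zip(forecast, block_forecast)]
    return residual, forecast

def mean_block(x, horizon):
    """Toy block: explains the mean level of its input."""
    m = sum(x) / len(x)
    return [m] * len(x), [m] * horizon

def half_block(x, horizon):
    """Toy block: explains half of its input and extends half the last value."""
    return [0.5 * v for v in x], [0.5 * x[-1]] * horizon

residual, forecast = run_stack([2.0, 4.0, 6.0], [mean_block, half_block], horizon=2)
```

Here the first block removes the level 4.0, so the second block works on the residual [-2.0, 0.0, 2.0] rather than on the raw input.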
Electrical parameter data classification model
In this paper, the OS-CNN model is used to classify the extended electric power dataset. In one-dimensional convolutional models for time series classification, the size of the receptive field (RF) is an important factor affecting classification accuracy. The OS-CNN model can cover RFs of all scales and performs comparably to a model tuned to the best RF. Different oil fields use different types of pumping units and different sensors to collect signals, so the format of the obtained electrical parameter data may differ. OS-CNN can adapt to time series data in a variety of formats and obtain the best RF for accurate classification.
The structure and principle of OS-CNN are shown in Figure 6. An OS-block with a three-layer structure is used as the convolution layer of a 1D-CNN. The layers of the OS-block share a similar structure: the first and second layers use kernels whose sizes are 1 and the prime numbers up to \(p_k\), and the third layer uses two kernels of sizes 1 and 2. The set of kernel sizes of layer i is \(P^{(i)}\), as follows:
In theory, a signal entering the OS-block can pass through any combination of kernels across the layers. After the first two layers, the paths cover all even RF sizes; after the third layer, they cover all integer RF sizes. With \(\mathbb {S}\) denoting the set of all RF sizes, this is expressed as follows:
where \(\mathbb {E}\) is the set of all even numbers and \(\mathbb {N^{+}}\) is the set of all positive integers. Instead of searching for the best RF size for a dataset, OS-CNN covers all RF sizes within a specified range, and a model with full-size RF coverage can achieve performance similar to that of a model using only the best RF. In addition, OS-CNN can select the range of prime numbers according to the characteristics of the target time series, avoiding wasted computing resources.
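The coverage claim can be checked numerically for a small case. The sketch below assumes kernel size sets of the form {1, 2, 3, 5, ..., \(p_k\)} for the first two layers and {1, 2} for the third, and uses the fact that stacking stride-1 convolutions with kernel sizes a, b, and c gives a receptive field of a + b + c - 2. The function names are hypothetical.

```python
def primes_up_to(n):
    """Prime numbers <= n by trial division (fine for small n)."""
    return [p for p in range(2, n + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def os_kernel_sizes(p_k):
    """Assumed kernel size sets for the three layers of an OS-block."""
    prime_layer = [1] + primes_up_to(p_k)
    return [prime_layer, prime_layer, [1, 2]]

def receptive_fields(p_k):
    """All receptive field sizes reachable through the three layers."""
    layer1, layer2, layer3 = os_kernel_sizes(p_k)
    return sorted({a + b + c - 2 for a in layer1 for b in layer2 for c in layer3})

# With p_k = 7 the paths reach every integer receptive field from 1 to 14.
rf_sizes = receptive_fields(7)
```

For larger \(p_k\) the largest few sizes may be skipped, which is consistent with choosing the prime range to match the length scales expected in the target series.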
Results
Data conversion
To verify the effectiveness of the proposed method, part of the DCs collected from a domestic oil field is converted into electrical parameter data. As shown in Figure 7, the pumping unit used in this oil field is a beam-type rod pumping unit, model CJY10-3-53HB, with crank balance as its balance mode. The electric power of the driving motor is taken as the electrical parameter. The drive motor is a three-phase induction motor, and its rated parameters are shown in Table 1. The selected DC dataset covers five fault types (parting rod, insufficient liquid supply, underbalance, pump leakage, and gas influence) and one normal type.
Based on the models and parameters of the pumping unit and motor, a mechanism model that converts the dynamometer card into a motor power curve was built in MATLAB. The electrical parameter data augmentation and fault diagnosis models were built in the PyTorch framework. The six types of electric power curves obtained and their corresponding DCs are shown in Figure 8.
For normal operating conditions, the DC is converted into a motor power curve through an electric parameter transformation model. The electric power curve presents a double hump shape, and the peak value during the upstroke stage is close to that during the downstroke stage.
For fault conditions:
1) Insufficient liquid supply: The pump barrel cannot be filled with liquid from underground in time. As a result, during the downstroke the pressure inside the pump is insufficient, the traveling valve opens slowly, and the suspension point load cannot decrease in time. Only when the plunger meets the liquid level does the suspension point load decrease rapidly. Because underground reservoir conditions are complex and variable, the severity of insufficient liquid supply varies over time.
2) Gas influence: At the end of the downstroke, gas interference prevents the pressure inside the pump from dropping rapidly, which delays the opening of the fixed valve. The rise of the suspension point load is therefore prolonged. The main factor causing this fault condition is related to the size of the gap between the plunger and the pump barrel. The change during the downstroke is similar to that under insufficient liquid supply, and the suspension point load decreases relatively slowly under different gas-affected fault states. The changes in gas influence are also closely related to time. If the gas inside the pump is repeatedly compressed and expanded, gas lock occurs: the suspension point load rises rapidly and then falls rapidly, seriously affecting oil recovery.
3) Pump leakage: In the initial stage of the upstroke, the traveling valve does not close tightly, so the liquid above the plunger leaks below it. The liquid column load therefore decreases, and the suspension point load cannot increase in time.
4) Unbalanced state: Owing to its structure, the pumping unit normally operates in a relatively balanced state. In general, the unit is considered well balanced if the ratio of the peak motor power during the downstroke to the peak motor power during the upstroke is not less than 0.8. Imbalance is related to the downhole working conditions, the mechanical structure of the pumping unit, and the state of the motor. Changes in the balance state are also reflected in the time series, and severe imbalance can cause significant damage to the motor and the pumping unit.
5) Parting rod: Because of the depth of the oil reservoir, the pumping rod is designed as a slender straight rod, and some oil fields even use multi-stage rods. The slender shape makes the sucker rod highly susceptible to deformation and vibration. In addition, impurities such as wax and gravel in the crude oil make the sucker rod prone to wear and corrosion, which leads to fracture. After the rod parts, the load at the suspension point changes little and the well produces no oil.
In practical oilfield production, the aforementioned faults—except for the sudden fracture of the sucker rod—are closely related to the duration of operation. As a large and expensive piece of production equipment, the pumping unit incurs significant economic losses and resource consumption when shut down for maintenance. Therefore, identifying and classifying faults of varying severity is critical for the effective diagnosis of pumping unit failures.
In summary, this study employs a mechanism model to augment fault-state samples by integrating the mechanical motion characteristics of the pumping unit with variations in electrical parameters during operation. This approach enables the generation of electrical parameter data under various fault conditions over a defined period, thereby effectively addressing the problem of sample scarcity. The dataset provides robust support for subsequent model training. Moreover, maintenance plans can be formulated based on the severity of the faults, minimizing economic losses to the greatest extent possible.
Data expansion
Although the collected DC samples contain various typical fault conditions, the sample size is still small, and it is difficult to train the OS-CNN model on so few converted motor power curves. Therefore, the number of electric power samples must be expanded. The DCs used in the experiment are collected at a fixed time interval, so the obtained motor power curves are also arranged in time order. The converted electrical power is not a continuous signal in the strict sense. To meet production efficiency and cost control requirements, rod pumping units are usually designed for long-term stable operation and a low failure rate. Combined with the working mechanism of the pumping unit, this means the motor power is generally periodic in time. Therefore, the converted motor power curves can be concatenated directly into a complete time series, as shown in Figure 9.
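A minimal sketch of this assembly step, assuming each converted power curve is a list with the same number of points per period. The helper names and the toy numbers are hypothetical; the second helper shows one simple way to hold out the final period for validation.

```python
def build_series(cycles):
    """Concatenate per-period power curves, in time order, into one series."""
    series = []
    for cycle in cycles:
        series.extend(cycle)
    return series

def split_last_periods(series, period_len, n_val_periods=1):
    """Hold out the final period(s) of the series for validation."""
    cut = len(series) - n_val_periods * period_len
    return series[:cut], series[cut:]

# Thirteen toy periods of four points each: 12 for training, 1 for validation.
cycles = [[10.0 + k, 14.0 + k, 9.0 + k, 13.0 + k] for k in range(13)]
series = build_series(cycles)
train, val = split_last_periods(series, period_len=4)
```

Keeping the split aligned to whole periods preserves the cycle boundaries that the prediction model relies on.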
To verify that the improved N-HiTS model predicts electric power better, 12 different time series were set up for the different operating conditions, two for each type, for training and validation. Each time series contains 12 periods as training data for the prediction model, and one period of faulty motor power data is used as the validation set. To improve the accuracy and reliability of the expanded dataset, prediction experiments on the power of faulty motors must be conducted first. LSTM16, RNN17, NBEATS18 and N-HiTS were used in comparative experiments, and the number of iterations of the prediction model was set to 1000. Partial prediction results are shown in Figure 10. Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)19 are used as indicators of the experimental results, and the prediction results for eight different time series are shown in Table 2.
The results in Table 2 show that the improved N-HiTS model has the lowest MAE, MSE, and RMSE values for the electric power time series, indicating the best prediction performance among the compared models. In addition, to verify the performance of the improved N-HiTS model, RMAE was used as the metric to compare it with the original N-HiTS model. RMAE is defined as follows:
where \(y_{\tau }\) is the observed value; \({\hat{y}}_{\tau }\) is the value predicted by the first model; \({\hat{y}}_{\tau }^{base}\) is the value predicted by the baseline model; and H is the length of the predicted results. An RMAE value less than 1 indicates that the improved model is more accurate than the baseline. Eight experiments were conducted using the same electrical parameter dataset, and the RMAE results showed that the improved N-HiTS model outperformed the original model. The experimental results are shown in Figure 11.
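For reference, the four metrics can be computed as in the sketch below. RMAE is taken here as the MAE of the evaluated model divided by the MAE of the baseline over the same horizon, which is an assumption consistent with the symbol definitions above; the sample values are hypothetical.

```python
import math

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(mse(y, y_hat))

def rmae(y, y_hat, y_hat_base):
    """Relative MAE: below 1 means the model beats the baseline."""
    return mae(y, y_hat) / mae(y, y_hat_base)

observed = [1.0, 2.0, 3.0, 4.0]
improved = [1.0, 2.0, 3.0, 5.0]   # hypothetical forecasts
baseline = [2.0, 3.0, 4.0, 5.0]   # hypothetical baseline forecasts
score = rmae(observed, improved, baseline)
```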
To expand the number of fault samples, the improved N-HiTS model is used to make multiple predictions on the motor power sequences, and the prediction results are retained as the diagnostic dataset. Time series of different lengths contain different operating condition information. Moreover, fault types such as gas influence, insufficient liquid supply, and imbalance vary in severity over time. The prediction model can therefore capture the time information in the motor power and predict how the faults change. This paper uses three sets of time series with different lengths, namely 8, 12, and 16 periods, and each set contains 8 sequences. For each fault type, 15 and 30 samples are predicted as the training sets of the two diagnostic datasets, and 15 and 36 samples are predicted as the corresponding test sets. The extended motor power dataset is examined, and unreasonable predictions are removed.
In this way, the improved N-HiTS model addresses the challenge of training with small datasets by performing multiple predictions on the generated electrical parameter sequences, ultimately constructing a diagnostic dataset that covers six different operating conditions. Analyzing the prediction results across time series of varying lengths allows the fault characteristics of the equipment to be captured more thoroughly and accurately. This provides a solid data foundation for the subsequent development of an efficient and precise fault diagnosis model.
Classification experiment
The motor power data obtained from multiple predictions are divided into two datasets with different sample sizes. The training set of the first dataset includes six types of operating conditions, each sample having a sequence length of one period, with 90 samples in total. The test set contains 15 samples for each operating condition type, for a total of 90 samples. For the second dataset, the training set includes six types of operating conditions with 30 samples for each type, and the test set contains 36 samples for each type, for a total of 216 samples. The OS-CNN model is used for classification experiments on the expanded electrical parameter datasets, with CNN, FCN, MLP, ResNet, and Voice2Series20 as comparative models.
All six models were trained with the Adaptive Moment Estimation (Adam) optimizer at an initial learning rate of 0.001, for 1000 epochs with a batch size of 32. Each dataset is run 8 times on every model, and the best diagnostic result on the test set is recorded. Figures 12 and 13 present the training loss curves of OS-CNN, CNN, FCN, MLP, ResNet, and Voice2Series. Among them, OS-CNN, FCN, MLP, and ResNet converge relatively smoothly, with minimal oscillations in accuracy during training. Their loss functions converge rapidly, reaching a stable state at approximately 180, 220, 360, and 370 epochs, respectively. The corresponding model accuracies are 96%, 94%, 92%, and 91%.
For the model trained on the dataset of 216 samples, 85% of the samples were used as the training set, while the remaining 15% (32 samples) served as the validation set. Figure 14 presents the confusion matrices of six models evaluated on the second group of the pumping unit fault dataset. The results indicate that the proposed model can effectively identify fault characteristics based on electrical parameters.
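As a reminder of how a confusion matrix such as those in Figure 14 is built and read, the following sketch counts predictions per true class and derives overall accuracy. The labels and predictions are hypothetical toy values, not experimental results.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy example with three classes (indices stand for operating conditions).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy(cm)
```

Off-diagonal entries show which operating conditions are confused with each other, which a single accuracy figure hides.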
As can be seen from the comparison of experimental results in Figure 13, the accuracy of OS-CNN on the first dataset reached 95%, which was significantly higher than that of the other classification models. With the larger number of samples in the second dataset, OS-CNN reached an accuracy of 96%. OS-CNN thus maintains high accuracy across different sample sizes, and the model is relatively stable and reliable. Compared with a traditional CNN and MLP, OS-CNN demonstrates superior capability in capturing multi-scale fault features. Conventional CNNs rely on single-scale convolutions, which struggle with diverse spatiotemporal patterns and often miss critical information. MLPs, while offering global modeling through fully connected layers, tend to lose hidden signal patterns because they depend on manual feature engineering. OS-CNN overcomes these limitations by using omni-scale convolutions to extract layered features, which enhances fault decoupling and noise robustness and makes it more adaptable to variations in electrical parameters across different oil wells. Therefore, OS-CNN and the improved N-HiTS can be combined to predict the long-term state of the pumping unit and determine its operating condition, supporting health inspection and maintenance planning.
Discussion
Intelligent diagnosis technology is receiving increasing attention in the modern oilfield industry. In oil fields, pumping units are usually sparsely distributed and the geographical environment is harsh, so remote and timely fault diagnosis helps to schedule maintenance reasonably and save costs. Compared with DCs, motor parameters are a data source that can be collected stably over the long term for fault diagnosis of pumping units. This paper presents a method that transforms typical DCs into motor power curves, uses an improved N-HiTS model to expand the power data arranged as time series, and uses the OS-CNN classification model for fault diagnosis. Experimental analysis and verification show that the improved N-HiTS model outperforms the original model in prediction accuracy and can effectively expand fault state data. Combined with OS-CNN, the improved N-HiTS effectively addresses the problem of insufficient fault samples for pumping units.
The proposed method can remotely monitor changes in the motor power curve of the pumping unit and promptly diagnose the fault type when the motor power changes abnormally, so that technicians can schedule maintenance for the pumping unit, which reduces time cost and economic loss. In addition, some fault types develop slowly over a long period and have no significant impact on production in their early stage, while the spare parts that need to be purchased have a limited storage life. The method can therefore forecast electric power over the long term, determine the eventual fault type from the predicted results, and support reasonable scheduling of maintenance and spare-part purchases, avoiding wasted resources and saving costs.
The method described in this paper still has limitations. Because measured electrical parameter data are lacking, the model cannot yet be applied directly at actual sites. Future work will therefore collect more measured electrical parameter data to improve the generalization ability of the model.
Data availability
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
References
Pinto, L. F. R., Tucci, H. N. P., Mummolo, G., Neto, G. C. d. O. & Facchini, F. Circular Economy Approach on Energy Cogeneration in Petroleum Refining. Energies 15, 1713, https://doi.org/10.3390/en15051713 (2022).
Algozhina, A. Monetary policy rule, exchange rate regime, and fiscal policy cyclicality in a developing oil economy. Energy Economics 112, 106126. https://doi.org/10.1016/j.eneco.2022.106126 (2022).
Hasanov, F., Aliyev, R., Tashkin, D. & Suleymanov, E. Oil Rents and Non-Oil Economic Growth in CIS Oil exporters. the role of financial development. SSRN Electronic Journal 82, 4161492, https://doi.org/10.2139/ssrn.4161492 (2022).
Ren, T., Sun, C., Sun, W. & Kang, X. The Research of Metering Well Yield Technology Based on Dynamometer Card. In Proceedings of the 5th International Conference on Mechanical Engineering, Materials and Energy (5th ICMEME2016), https://doi.org/10.2991/icmeme-16.2016.42 (2016).
Kun Li, W. Y., Xianwen Gao et al. Multiple fault diagnosis of down-hole conditions of sucker-rod pumping wells based on freeman chain code and DCA. Petroleum Science 10, 347–360, https://doi.org/10.1007/s12182-013-0283-4 (2013).
Wang, K. Fault diagnosis of rod-pumping unit based on production rules system. Petroleum Exploration and Development 37, 116–120 (2010).
Li, W., Vaziri, V., Aphale, S. S., Dong, S. & Wiercigroch, M. Energy saving by reducing motor rating of sucker-rod pump systems. Energy 228, 120618. https://doi.org/10.1016/j.energy.2021.120618 (2021).
Li, W., Vaziri, V., Aphale, S. S., Dong, S. & Wiercigroch, M. Dynamics and frequency and voltage control of downhole oil pumping system. Mechanical Systems and Signal Processing 139, 106562. https://doi.org/10.1016/j.ymssp.2019.106562 (2020).
Wilamowski, B. & Kaynak, O. Oil well diagnosis by sensing terminal characteristics of the induction motor. IEEE Transactions on Industrial Electronics 47, 1100–1107. https://doi.org/10.1109/41.873219 (2000).
Chen, D. et al. Fault diagnosis model of the variable torque pumping unit well based on the power-displacement diagram. IOP Conference Series: Earth and Environmental Science 300, 022030. https://doi.org/10.1088/1755-1315/300/2/022030 (2019).
Chikkam, S. & Singh, S. Condition Monitoring and Fault Diagnosis of Induction Motor using DWT and ANN. Arabian Journal for Science and Engineering 48, 6237–6252, https://doi.org/10.1007/s13369-022-07294-3 (2023).
Kumar, A. et al. Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification (2023). arXiv:2304.01992.
Saad, M. M., Rehmani, M. H. & O’Reilly, R. A Self-attention Guided Multi-scale Gradient GAN for Diversified X-ray Image Synthesis (2022). arXiv:2210.06334.
Challu, C. et al. N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting (2022). arXiv:2201.12886.
Tang, W. et al. Omni-Scale CNNs: a simple and effective kernel size configuration for time series classification. https://doi.org/10.48550/arXiv.2002.10061 (2020).
Sak, H., Senior, A. & Beaufays, F. Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition. https://doi.org/10.48550/arXiv.1402.1128 (2014).
Cho, K. et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), https://doi.org/10.3115/v1/d14-1179 (2014).
Oreshkin, B., Carpov, D., Chapados, N. & Bengio, Y. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. https://doi.org/10.48550/arXiv.1905.10437 (2019).
Zhou, H. et al. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proceedings of the AAAI Conference on Artificial Intelligence. 11106–11115, https://doi.org/10.1609/aaai.v35i12.17325 (2022).
Yang, H., Tsai, Y. & Chen, P. Voice2Series: Reprogramming Acoustic Models for Time Series Classification. https://doi.org/10.48550/arXiv.2106.09296 (2021).
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62173073), the Undergraduate Teaching Reform Project of Liaoning Province (Grant No. SBKJGYZ-2021-06), the Basic Scientific Research Project of Liaoning Provincial Education Department (Grant No. LJKMZ20220618), the Scientific Research Fund of Liaoning Provincial Education Department (Grant No. JYTMS20230215), and Liaoning Provincial Education Scientific Planning Projects (Grant No. JG22DB602).
Author information
Contributions
Chunhua Yuan conceived the experiments, Zhupei Liao and Xiangyu Li conducted the experiment(s), Xiangyu Li, Zhupei Liao and Chunhua Yuan analyzed the results. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yuan, C., Liao, Z. & Li, X. A fault diagnosis model for pumping units based on small sample electric parameters. Sci Rep 15, 22997 (2025). https://doi.org/10.1038/s41598-025-07913-7
DOI: https://doi.org/10.1038/s41598-025-07913-7
















