Introduction

As important manufacturing equipment for key metal components, large forging presses are a crucial part of high-end industrial setups and play a pivotal role in various fields such as machinery, automotive, shipping, aerospace, and more1,2,3,4. According to incomplete statistics, there are currently over 10,000 screw presses and several thousand crank presses in use within the forging industry, including hot large forging presses, trimming presses, and correction presses, which constitute the mainstay equipment of heavy forging production lines. The large forging presses mentioned in this study specifically refer to hot large forging presses, encompassing large crank presses and large screw presses. Large forging presses typically consist of numerous components and possess multiple structural levels, making their internal mechanisms intricate to ascertain5. The complex and strongly coupled relationships between their diverse components often exhibit characteristics like concealment, nonlinearity, and randomness, rendering traditional fault diagnosis and prediction techniques based on mechanistic models ineffective6,7,8. However, with the advancement of big data and the industrial internet, there is an ever-increasing variety and volume of monitoring data for large forging presses. The value of this data has garnered widespread attention and research in the academic community, further propelling the progression of Prognostics and Health Management (PHM) technology. PHM has emerged as a mainstream technology in the field of reliability today and has been successfully implemented in large forging presses in aviation, new energy vehicles, and other industries9,10,11.

Fault diagnosis focuses on the classification and analysis of fault alarms, fault types, and their underlying causes, whereas fault prediction places greater emphasis on the early warning of potential faults in high-end industrial equipment and anticipates the degradation trajectory of the equipment’s future state12,13,14. At present, the main approaches to fault prediction are categorized into three groups: prediction methods rooted in reliability theory15,16,17, prediction methods based on data-driven methodologies18,19,20, and prediction methods founded on physical models21,22,23,24. The reliability-based prediction method primarily relies on historical data of system failures and the mechanisms of failure occurrence to forecast the timing and likelihood of future failures25,26,27,28,29. The data-driven prediction method involves constructing a prediction model through the analysis of system historical data, thereby enabling the prediction of the system’s future state, encompassing time series prediction, artificial neural network prediction, filter prediction, and grey model prediction30,31,32. The prediction method based on the physical model pertains to establishing a physical or mathematical model of system failure, utilizing information such as system operating status and environmental conditions to anticipate possible future failures of the systems33,34,35.

Currently, with the escalating demands for equipment performance stability and product quality consistency, after-the-fact diagnosis alone can no longer meet the needs of industrial sites36,37. Enterprise managers aspire to receive early warnings when there are merely subtle abnormal symptoms in large forging presses, enabling them to precisely anticipate the degradation trajectory of the equipment in the future, thereby procuring spare parts in advance, devising maintenance and production scheduling plans, and minimizing production downtime. In existing research on fault prediction technology for large forging presses, a single model is frequently employed to predict fault outcomes. However, a single model often encounters issues like overfitting and underfitting when processing intricate large forging press signals, resulting in inaccurate prediction results. Consequently, a multi-model integrated prediction method is imperative to furnish more precise and reliable outcomes for fault prediction of large forging presses.

This study endeavours to explore fault prediction methodologies for large forging presses, aiming to enhance production safety, equipment lifespan, and product quality. In fault diagnosis, the spotlight is on comparing signal data during normal operation and during fault occurrences. This involves scrutinizing the statistical characteristics, patterns, and trends of signal data to discern between normal conditions and faults. In the realm of fault prediction and health management for large forging presses, the emphasis shifts to comprehending the signal data that precede failures. This necessitates contemplating the temporal patterns, trends, and correlations within the signal data to identify early warning signals or precursors of faults. Hence, when making state predictions, it is imperative to account for both linear trends and nonlinear relationships within the signal data, while also exhibiting robust generalization capabilities. Additionally, it is crucial to grasp the interrelationships within the time series signals. Based on the aforementioned rationale, this paper utilizes the Autoregressive (AR) model, Support Vector Regression (SVR) model, and Long Short-Term Memory (LSTM) neural network model for state prediction of key components in large forging presses. The AR model, being a classic linear prediction model, adeptly captures linear trends in signals. The SVR model tackles nonlinear relationships through the employment of kernel tricks, demonstrating robust generalization capabilities. The LSTM model, as a recurrent neural network architecture, is proficient in grasping long-term dependencies within time series signals. Embracing the concept of ensemble learning, this paper fuses the strengths of these three models and harnesses the Dwarf Mongoose Optimization (DMO) algorithm to perform weighted integration of the prediction results, thereby augmenting prediction performance. 
Finally, to validate the effectiveness of the proposed multi-model ensemble prediction technique in forecasting signals from critical components of large forging presses, this study utilized oil pressure signals from the production line of an 80MN electric screw press at a certain enterprise as a case example. The research results demonstrate that the proposed multi-scale AR-SVR-LSTM ensemble learning prediction model achieves Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics of less than 5% and 1%, respectively, meeting industrial standards and underscoring the model’s effectiveness in industrial applications. By comparing the accuracy of multiple different models, the study further verifies the effectiveness and superiority of the proposed model in fault prediction for large forging presses.

The organization of the remaining content in this study is as follows: In Sect. "Prediction models and optimization", a brief introduction to different machine learning models is provided, along with an explanation of the principles underlying the DMO algorithm employed. Section "A multi-scale AR-SVR-LSTM ensemble learning prediction model" elaborates on the proposed model ensemble strategy in detail. Section "Case study on fault prediction model for large forging presses" presents the validation of the ensemble model using real-world engineering data, demonstrating the advantages of the ensemble model by comparing its predictive performance with that of individual models. Section "Discussion" offers a comparative analysis and discussion of the experimental results obtained in this study. Finally, Sect. "Conclusion" summarizes the research conducted in this paper.

Prediction models and optimization

The time series model

The autoregressive model

Currently, time series models such as AR, Autoregressive Moving Average (ARMA), and Autoregressive Integrated Moving Average (ARIMA) are widely used to predict failures in high-end industrial equipment. Compared to ARMA and ARIMA models, the AR model offers simplicity, convenience, and parameter-free modelling. To determine the order of the AR model, three commonly used methods are the Discrimination Regression Transfer Function Criterion, the Final Prediction Error Criterion, and the Akaike Information Criterion (AIC). When these three methods are used to estimate the order of the AR model, there is no significant difference in the results38.

The support vector regression model

The core idea of SVR is to maximize the model’s margin within an acceptable range of errors, thereby enhancing its generalization ability while ensuring a certain level of prediction accuracy. The \(\:\epsilon\:\)-insensitive loss function is a core component, as illustrated in Fig. 1. The basic idea of the \(\:\epsilon\:\)-insensitive loss function is that there exists a threshold \(\:\epsilon\:\), and if the absolute difference between the predicted value and the true value does not exceed this threshold, the prediction is considered to have no loss.

Fig. 1
figure 1

\(\:\epsilon\:\)-insensitive loss function.
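As a concrete illustration, the \(\:\epsilon\:\)-insensitive loss can be written in a few lines of NumPy; the threshold value `eps=0.1` below is an arbitrary illustrative choice, not a parameter taken from this study:

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, eps=0.1):
    """epsilon-insensitive loss: zero inside the eps-tube, linear outside it."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - eps)
```

A prediction that deviates from the true value by less than `eps` therefore incurs no loss, which is what gives SVR its tolerance margin.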

Long short-term memory neural network model

LSTM is a special type of recurrent neural network primarily used for processing and predicting time series data. The purpose of LSTM is to address the issue of gradient vanishing or gradient explosion that traditional RNNs encounter when dealing with long sequence data39,40. In LSTM, there are two crucial concepts: gates and cell states. Gates provide a means to transmit information on demand. LSTM is composed of structures called gates, which can add or remove information from the cell states.
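The gate mechanism can be illustrated with a minimal NumPy sketch of a single LSTM step; the stacked-parameter layout (`W`, `U`, `b` holding all four gates together) is one common convention and not necessarily that of any particular framework:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the
    forget (f), input (i), candidate (g) and output (o) gates."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b           # pre-activations, shape (4n,)
    f = sigmoid(z[0:n])                  # forget gate: what to erase from the cell
    i = sigmoid(z[n:2 * n])              # input gate: what to write to the cell
    g = np.tanh(z[2 * n:3 * n])          # candidate cell content
    o = sigmoid(z[3 * n:4 * n])          # output gate: what to expose as h
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c
```

The additive update of the cell state `c` is what lets gradients flow over long horizons, mitigating the vanishing-gradient problem of plain RNNs.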

Multi-model integration and fusion

Ensemble learning

Ensemble learning integrates multiple models using certain strategies to improve decision accuracy through group decision-making41,42. By combining multiple learners, the generalization ability of an ensemble is often much stronger than that of a single learner contained in it. As shown in Fig. 2, the basic workflow of ensemble learning is to first train a group of individual learners and then combine them through some strategies43,44.

Fig. 2
figure 2

The basic process of ensemble learning.

Dwarf mongoose optimization

DMO is a heuristic algorithm inspired by the survival and hunting behavior of dwarf mongooses45. The entire dwarf mongoose colony is divided into the alpha group, the babysitter group, and the scout group. After foraging, the alpha group generates candidate food locations according to:

$$\:{X}_{i+1}={X}_{i}+\phi\:\times\:peep$$
(1)

where \(\:\phi\:\) is a uniformly distributed random value in the range [0, 1]; \(\:{X}_{i}\) is the current position and \(\:{X}_{i+1}\) denotes the candidate food location; \(\:peep\) is set to 2 in this study.

The new position of an individual after babysitter exchange is given by the following formula:

$$\:{x}_{i,j}={l}_{j}+rand\times\:\left({u}_{j}-{l}_{j}\right)$$
(2)

where \(\:rand\) is a random value, ranging from 0 to 1; \(\:{u}_{j}\) and \(\:{l}_{j}\) are the limitations of the search domain.

Additionally, during the scouting phase, the new positions of the population are described by the following formula:

$$\:{X}_{i+1}=\left\{\begin{array}{ll}{X}_{i}-CF\times\:\phi\:\times\:rand\times\:\left[{X}_{i}-\overrightarrow{M}\right]&\text{if}\;{\phi\:}_{i+1}>{\phi\:}_{i}\\\:{X}_{i}+CF\times\:\phi\:\times\:rand\times\:\left[{X}_{i}-\overrightarrow{M}\right]&\text{otherwise}\end{array}\right.$$
(3)

where \(\:rand\) is a random value, ranging from 0 to 1; \(\:CF\) is the parameter that adjusts the collective movement willingness of the mongoose group, calculated as \(\:CF={\left(1-\frac{iter}{Ma{x}_{iter}}\right)}^{\left(2\frac{iter}{Ma{x}_{iter}}\right)}\); \(\:\overrightarrow{M}\) is the parameter that determines the movement of the mongooses to the new sleeping mound, calculated as \(\:\overrightarrow{M}={\sum}_{i=1}^{n}\frac{{X}_{i}\times\:s{m}_{i}}{{X}_{i}}\).
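The three update rules above can be sketched as plain NumPy functions; the population management, fitness evaluation, and babysitter-exchange schedule of the full DMO algorithm are omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha_forage(X, peep=2.0):
    """Eq. (1): candidate food location generated by the alpha group."""
    phi = rng.uniform(0.0, 1.0, size=X.shape)
    return X + phi * peep

def babysitter_exchange(lower, upper):
    """Eq. (2): re-initialise an individual uniformly inside the search bounds."""
    return lower + rng.uniform(0.0, 1.0, size=lower.shape) * (upper - lower)

def scout_move(X, M, CF, improved):
    """Eq. (3): scouting move away from (if improved) or toward the
    sleeping mound M, scaled by the movement-willingness parameter CF."""
    phi = rng.uniform(0.0, 1.0, size=X.shape)
    step = CF * phi * rng.uniform(0.0, 1.0, size=X.shape) * (X - M)
    return X - step if improved else X + step
```

In a full run, `CF` would decay with the iteration counter as given above, pushing the colony from exploration toward exploitation.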

A multi-scale AR-SVR-LSTM ensemble learning prediction model

Multi-model ensemble strategy

The displacement, air pressure, and hydraulic signals generated by large forging presses during actual production are impact-type signals. These key parameters exhibit sudden changes, continuous fluctuations, or oscillations in the historical production process curves and show pronounced nonlinear characteristics. At the same time, this study holds that, over long-term repeated forging operations, the initial working conditions and the operating status of the various components exert a long-term influence on subsequent forging operations; this long-term dependence is reflected in the collected signals. These characteristics expose the limitations of single-model prediction.

Therefore, this study proposes a three-stage framework termed “Multi-scale Decomposition - Heterogeneous Model Collaboration - Dynamic Weight Fusion” for constructing an ensemble model. The key parameters of large forging presses in actual production processes are closely related to their health monitoring, such as signals measured by sensors including oil pressure information, vibration information, or temperature signals. These signals reflect the operational status of the equipment from different dimensions and contain critical information for system health monitoring, which is of great significance for enhancing the accuracy, earliness, and interpretability of fault prediction. Predicting the key parameters of large forging presses during actual production processes using the ensemble model and further comparing them with the equipment’s operational parameters under normal conditions represent an important approach for fault prediction.

In the process of constructing the ensemble model, multi-scale decomposition technology is first employed to perform scale-differentiated processing on the original signals, thereby further enhancing the model’s noise resistance capability. Subsequently, models are matched with signal components based on their respective characteristics, avoiding a one-size-fits-all modeling approach. Finally, the DMO is utilized to dynamically adjust the weights of each model based on historical prediction errors, enabling the redistribution of weights among different models. In engineering applications, if manual parameter tuning or grid search is adopted, it necessitates traversing a vast number of weight combinations, resulting in high computational costs and difficulties in identifying the global optimal solution. Conversely, employing gradient descent methods may encounter issues such as inappropriate iteration directions, step sizes, and starting points. Therefore, this study selects the DMO algorithm for weight optimization, aiming to improve the model’s generalization ability and enable it to adaptively respond to changes in operating conditions.

In order to make full use of the advantages of different prediction models, the idea of ensemble learning is adopted to integrate these models to capture more key features in the data. The AR model excels in linear modeling, whereas the SVR model demonstrates an advantage in nonlinear modeling. LSTM is capable of effectively capturing long-term dependencies in time series data and addressing the gradient decay issue present in traditional RNNs. Therefore, in this section, the three methods are integrated into an AR-SVR-LSTM model, which is expected to improve the prediction accuracy and efficiency. The basic process of the hybrid prediction model of AR-SVR-LSTM is shown in Fig. 3.

Fig. 3
figure 3

The AR-SVR-LSTM ensemble learning prediction model.

The steps are as follows:

(1) Collect monitoring signals of key components of the forging press, and construct a signal prediction training dataset and a testing dataset.

(2) First, wavelet transform is applied to transform the original signal into sub-signals at different frequency scales, comprising one approximation component and multiple detail components. The approximation component is used to describe the low-frequency trend, while the detail components capture high-frequency fluctuations. Subsequently, Savitzky-Golay (S-G) filtering is employed to process these multi-scale components, smoothing out noise through local polynomial regression while preserving the edge features of the signal.

(3) Calculate the AR model coefficients of the monitoring signal data of the key components of the forging press, and construct an AR model. Use the test set data as the model input, and obtain the predicted value. After wavelet reconstruction, the reconstructed value \(\:{y}_{1}\) is used as the AR model output.

(4) For the parameters of the SVR model, we use grid search to select and optimize them. After constructing the SVR prediction model, we predict the wavelet components decomposed from the signal after S-G filtering, and then perform inverse wavelet transform on the predicted values to complete the wavelet reconstruction and obtain the final prediction result \(\:{y}_{2}\) of the SVR model.

(5) Construct an LSTM network model, use each scale component as the model input to predict, and then perform wavelet reconstruction on the predicted values, and output the value \(\:{y}_{3}\).

(6) Establish a linear combination of \(\:{w}_{1}{y}_{1}+{w}_{2}{y}_{2}+{w}_{3}{y}_{3}\), and then use the DMO algorithm to determine the weights of different models in the hybrid model.
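The decomposition, reconstruction, and fusion steps (2) and (6) above can be sketched as follows; for brevity, a one-level Haar transform stands in for the wavelet family used in the study (which is not specified here), and the S-G filtering stage is omitted:

```python
import numpy as np

def haar_decompose(x):
    """One-level Haar wavelet split into an approximation component
    (low-frequency trend) and a detail component (high-frequency fluctuation)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail
    return a, d

def haar_reconstruct(a, d):
    """Inverse of haar_decompose (perfect reconstruction)."""
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def fuse(y1, y2, y3, w):
    """Step (6): weighted linear combination of the three model outputs."""
    w = np.asarray(w, dtype=float)
    assert np.isclose(w.sum(), 1.0) and np.all(w >= 0.0)
    return w[0] * y1 + w[1] * y2 + w[2] * y3
```

Each model predicts on the decomposed components, the predictions are reconstructed back to the signal domain, and the three reconstructed outputs are fused with the DMO-optimized weights.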

Fitness function of learning prediction model

The preceding section presented the establishment process of the proposed AR-SVR-LSTM model, which is intended for predicting critical information of large forging presses. As can be observed from the construction process of the ensemble model, the individual models are assigned distinct non-negative weights within the fitness function; consequently, the value of the fitness function is non-negative. The deviation between the predicted value and the true value is expressed as Eq. (4):

$$\:{z}_{i}={w}_{1}{y}_{1}\left(i\right)+{w}_{2}{y}_{2}\left(i\right)+{w}_{3}{y}_{3}\left(i\right)-{x}_{i},\quad i=\text{1,2},\dots\:,N$$
(4)

Based on this, a fitness function is proposed for this optimization model, expressed as the sum of the squared deviations between the predicted sequence and the actual sequence, as shown in Eq. (5).

$$\:f={min}\sum_{i=1}^{N}{\left({w}_{1}{y}_{1}\left(i\right)+{w}_{2}{y}_{2}\left(i\right)+{w}_{3}{y}_{3}\left(i\right)-{x}_{i}\right)}^{2} \\ \quad s.t.{w}_{1}+{w}_{2}+{w}_{3}=1, \quad {w}_{1},{w}_{2},{w}_{3}\ge\:0$$
(5)

The weight corresponding to each model’s predicted value, obtained by solving the above formula, is substituted into the multi-scale AR-SVR-LSTM model for analysis. Additionally, this study employs MAE and RMSE to evaluate the performance of the prediction model.
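As a sanity check on Eq. (5), the constrained weight-fitting problem can be solved with a generic SLSQP optimizer; this is only a baseline for comparison, not the DMO optimizer employed in this study:

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(Y, x):
    """Solve Eq. (5): min ||Y @ w - x||^2  s.t.  sum(w) = 1, w >= 0,
    where the columns of Y are the three models' predictions y1, y2, y3."""
    def loss(w):
        return np.sum((Y @ w - x) ** 2)
    w0 = np.full(3, 1.0 / 3.0)  # start from equal weights
    res = minimize(loss, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * 3,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
    return res.x
```

Because the objective is a convex quadratic over the probability simplex, any reasonable solver should reach the same optimum; the DMO is preferred in this study for its robustness to poor starting points and step sizes.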

The formulas for MAE and RMSE are shown in Eqs. (6) and (7).

$$\:\text{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|\frac{{y}_{i}-{\widehat{y}}_{i}}{{y}_{i}}\right|$$
(6)
$$\:\text{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left(\frac{{y}_{i}-{\widehat{y}}_{i}}{{y}_{i}}\right)}^{2}}$$
(7)

where \(\:{y}_{i}\) is the true value, which represents the sampling point of the monitoring signal; \(\:{\widehat{y}}_{i}\) is the predicted value output by the regression model.
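Since the errors in this study are reported as percentages, Eqs. (6) and (7) correspond to relative (normalized) error metrics, which can be computed as:

```python
import numpy as np

def mae_rel(y, y_hat):
    """Mean absolute relative error, Eq. (6), reported as a percentage."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs((y - y_hat) / y)) * 100.0

def rmse_rel(y, y_hat):
    """Root-mean-square relative error, Eq. (7), reported as a percentage."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean(((y - y_hat) / y) ** 2)) * 100.0
```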

Case study on fault prediction model for large forging presses

To validate the effectiveness and superiority of the proposed prediction model, this experiment was conducted based on data obtained from the 80MN electric screw press production line in an enterprise. The research encompasses the analysis and processing of historical data, the establishment of an ensemble prediction model, and finally, validation through integration with real-time operational data. Prior to conducting experiments, time-domain and frequency-domain analyses are performed on the collected oil pressure signal data from large forging presses. Different data characteristics correspond to three types of fault features: a drop in oil pressure may indicate a leakage fault, high-frequency pressure vibrations may suggest cavitation faults, and low-frequency pressure vibrations could point to blockage faults. Therefore, the established signal prediction model needs to effectively capture features such as temporal trends, high-frequency fluctuations, and low-frequency vibrations in order to achieve accurate and early fault prediction.

Data preprocessing

Considering the complexity and uncertainty of fault data of large forging presses, methods for data preprocessing and feature extraction are studied to extract feature information useful for fault prediction. We then discuss how to choose appropriate prediction models and parameter optimization methods to improve the accuracy and generalization ability of the prediction models.

The experiment takes the oil pressure signal of the brake of a large forging press as an example to train and validate the proposed multi-scale AR-SVR-LSTM model and to analyze its prediction performance. The monitoring signals of the large forging press contain substantial noise; modeling and predicting the raw signal directly therefore yields poor prediction performance. It is thus necessary to perform multi-scale smoothing on the brake oil pressure signal. This study proposes two processing strategies:

(1) First, the signal is subjected to wavelet decomposition and multi-scale S-G filtering, as shown in Fig. 4; Then, the signals at each scale are subjected to multi-step prediction using a time series model. Finally, the predicted values are subjected to wavelet reconstruction to obtain the overall predicted sequence of the signal.

(2) After multi-scale smoothing preprocessing, the reconstructed signal is directly subjected to multi-step prediction to obtain the final prediction sequence, as shown in Fig. 5. This study will compare the prediction performance of these two operations and then select the optimal data preprocessing strategy.

Fig. 4
figure 4

Wavelet decomposition and multi-scale S-G filtering of the original signal.

Fig. 5
figure 5

The reconstructed wavelet signal after multi-scale filtering.

This study combines the signals from multiple strikes into a longer time series to increase the amount of training data and improve prediction performance. In this study, the monitoring signal data of the brake oil pressure from a large forging press, which was continuously struck 10 times, was combined to form a time series of 80,000 data points. The dataset was subsequently divided into a training set (80%) and a test set (20%), without performing any shuffling operation. The length of the input time series for all models in this study is set to 2000 for training and validation. All models in this study use recursive methods for multi-step prediction, with a prediction sequence length of 200 time steps. This sequence length is used for comparative analysis of prediction performance.
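The recursive multi-step scheme described above can be sketched generically; `model_predict` is a placeholder for any of the trained one-step predictors:

```python
import numpy as np

def recursive_forecast(model_predict, history, horizon=200, window=2000):
    """Recursive multi-step prediction: each one-step prediction is
    appended to the input window and fed back for the next step."""
    buf = list(history[-window:])
    out = []
    for _ in range(horizon):
        y = model_predict(np.asarray(buf[-window:]))  # one-step-ahead forecast
        out.append(y)
        buf.append(y)                                  # feed prediction back in
    return np.asarray(out)
```

The drawback of the recursive strategy is error accumulation: each prediction error is fed back into subsequent inputs, which is one motivation for combining several heterogeneous models.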

Predictions of individual models

Prediction results of the AR model

For the AR model, the order must be determined first. The AIC is an indicator of the goodness of fit of a statistical model; it seeks a balance between goodness of fit and model complexity to address the model selection problem. The AIC avoids overfitting by penalizing the number of model parameters, thereby selecting a model with better predictive performance. Its calculation formula is as follows:

$$\:AIC=-2\ln\left(L\right)+2k$$
(8)

where \(\:L\) is the maximum likelihood value of the model; \(\:k\) is the number of parameters in the model. In time series analysis, AIC is often used to determine the order of an AR model. By calculating the AIC values for different model orders, the model with the smallest AIC value is selected as the optimal model. Figure 6 shows that when the order is 33, the AIC value is the smallest, so the AR model order is determined to be 33. In addition to selecting the model order, it is also necessary to solve for the AR model coefficients \(\:{\alpha\:}_{j}\:(j=\text{1,2},\dots\:,p)\).
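Order selection by AIC minimization can be sketched with a least-squares AR fit; for Gaussian residuals, \(-2\ln(L)\) reduces (up to an additive constant) to \(n\ln(RSS/n)\), a standard simplification used below:

```python
import numpy as np

def ar_aic(x, p):
    """Fit an AR(p) model by least squares and return its AIC,
    using AIC = n * ln(RSS / n) + 2p for Gaussian residuals."""
    x = np.asarray(x, dtype=float)
    # Design matrix: column j holds the lag-(j+1) values of the series.
    X = np.column_stack([x[p - j - 1 : len(x) - j - 1] for j in range(p)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    n = y.size
    return n * np.log(rss / n) + 2 * p

def select_order(x, max_p=40):
    """Pick the order with the smallest AIC, as done for Fig. 6."""
    return min(range(1, max_p + 1), key=lambda p: ar_aic(x, p))
```

Scanning `select_order` over candidate orders reproduces the kind of AIC-versus-order curve shown in Fig. 6.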

Fig. 6
figure 6

The variation of the AIC values with the order.

Subsequently, using the constructed AR prediction model, we can predict the scale components obtained from the wavelet decomposition. As shown in Fig. 7, the AR model does not perform well on the fine-scale detail components, where it shows large deviations. However, it has strong predictive ability for the detail and approximation components at scales 3 and 4, almost perfectly approximating the true values.

Fig. 7
figure 7

The prediction results with the AR model at different scales.

The prediction values at each scale are subjected to inverse wavelet transform, that is, wavelet reconstruction, to obtain the final prediction results with the AR model, as shown in Fig. 8.

Fig. 8
figure 8

The prediction results of wavelet reconstruction signal at different scales.

Two cases are compared for predicting the brake oil pressure signal: in case 1, wavelet decomposition is performed first and each scale component is predicted separately; in case 2, an AR model predicts the preprocessed signal directly. Here, we use the AR model to predict the multi-scale preprocessed signal and, to distinguish it from the previous method, label it "the prediction results of wavelet reconstruction + AR model". The results are shown in Fig. 9. Further calculations yield the accuracy metrics of the two approaches: the multi-scale AR model (Fig. 8) exhibits MAE and RMSE values of 5.32% and 0.6322%, respectively, whereas the AR model applied directly (Fig. 9) yields 10.45% and 1.2624%. Comparing the deviations between the predicted and true values in Figs. 8 and 9 confirms that the prediction error of case 1 is smaller.

Fig. 9
figure 9

The preprocessed signal prediction results with using an AR model directly.

Prediction results of the SVR model

The SVR model is applied to predict the wavelet decomposition and S-G filtered components at different scales, as shown in Fig. 10. The comparison between the predicted values and the true values shows that the SVR model is superior to the AR model in predicting the wavelet detail components at the first two scales, indicating that there are differences in the linear and nonlinear prediction capabilities of the AR model and the SVR model in processing signal data.

Fig. 10
figure 10

The prediction results with SVR model at different scales.

The predicted values at different scales are reconstructed, as shown in Fig. 11. It shows that the SVR model has a good prediction effect.

Fig. 11
figure 11

The prediction results of wavelet reconstruction signal at different scales.

When the SVR model is instead applied directly to the wavelet-reconstructed signal, as shown in Fig. 12, the multi-scale prediction achieves higher accuracy at the peak and trough positions: the predicted values in Fig. 12 exhibit significant oscillations at these extremes, whereas in Fig. 11 the predicted values oscillate less and fit the actual values more closely. The accuracy metrics were also calculated: the multi-scale SVR model (Fig. 11) exhibits MAE and RMSE values of 7.43% and 0.7135%, respectively, while the SVR model applied directly (Fig. 12) yields 4.99% and 0.5016%, indicating a lower overall error despite its larger oscillations at the peaks and troughs.

Fig. 12
figure 12

The preprocessed signal prediction results with using the SVR model directly.

Prediction results of the LSTM model

Similarly, time-domain signals of a large forging press were predicted using LSTM networks. The LSTM parameters were configured as follows: the input dimension was set to 1; the time step length was defined as 2000; the number of hidden units in each layer was set to 64, with a two-layer LSTM network architecture employed; the dropout rate was fixed at 0.5; the batch size was set to 64; the learning rate was 0.001; the number of iterations was 100; and the squared loss function was selected as the loss criterion. Upon completion of model training, the validation set was utilized for model prediction. Preprocessing of the brake oil pressure signal from the large forging press was conducted using wavelet decomposition and S-G filtering to obtain the preprocessed scale components. The trained LSTM network model was then applied for multi-step prediction. The comparative results between the predicted sequence and the actual sampled sequence are illustrated in Fig. 13. As depicted in the figure, the LSTM model demonstrates superior prediction performance for both detailed and approximate components across different scales compared to the two preceding models.
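The stated hyperparameters can be collected in a configuration dictionary; the small helper below counts LSTM weights under the usual four-gate parameterization (one bias vector per gate; some frameworks use two), as a framework-free sanity check on the chosen architecture:

```python
# Hyperparameters as stated in the text for the LSTM model.
CONFIG = {
    "input_dim": 1, "seq_len": 2000, "hidden": 64, "layers": 2,
    "dropout": 0.5, "batch_size": 64, "lr": 1e-3, "epochs": 100,
    "loss": "mse",
}

def lstm_param_count(input_dim, hidden, layers):
    """Parameters of a stacked LSTM: each layer has 4 gates, each with
    an input matrix, a recurrent matrix, and a bias vector."""
    total, in_dim = 0, input_dim
    for _ in range(layers):
        total += 4 * (hidden * (in_dim + hidden) + hidden)
        in_dim = hidden  # subsequent layers take the hidden state as input
    return total
```

For the configuration above (input dimension 1, 64 hidden units, two layers), the network has roughly fifty thousand weights, small enough to train comfortably on the 80,000-point dataset.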

Fig. 13
figure 13

The prediction results with the LSTM model at different scales.

Similarly, the predicted values at different scales have been reconstructed, as depicted in Fig. 14. Further calculations were carried out to evaluate the accuracy metrics of different models: The Multi-scale LSTM model (Fig. 14) exhibits MAE and RMSE values of 3.15% and 0.3265%, respectively. Correspondingly, the LSTM model (Fig. 15) demonstrates MAE and RMSE values of 5.22% and 0.6143%, respectively. When comparing the prediction results of the wavelet-reconstructed signal using the LSTM network model directly, as shown in Fig. 15, it is evident that the multi-scale LSTM model has higher signal prediction accuracy.

Fig. 14
figure 14

The prediction results of wavelet reconstruction signal at different scales.

Fig. 15
figure 15

The preprocessed signal prediction results with using the LSTM model directly.

To show the differences in prediction performance between the AR, SVR, and LSTM models, a detailed comparison between the actual values and the predicted values of the three models is presented in Fig. 16. After removing noise from the oil pressure signal of the large forging press brake, important features within the signal become apparent, which facilitates predicting signal changes. All prediction models track the signal well, although the peak portions of the signal exhibit relatively large errors. In addition, the individual LSTM and SVR models outperform the AR model, indicating that although the AR model can effectively capture the linear trend in the signal, it also has limitations.

Fig. 16
figure 16

Comparison of the predictive performance of multiple models.

Prediction of the ensemble model

Results of the ensemble model

To fully capitalize on the strengths of each prediction model, it is essential to integrate the prediction results obtained from the AR model, SVR model, and LSTM model. Subsequently, the combined weights of these models should be optimized using the DMO algorithm. Notably, the parameter settings for the LSTM model in this combined approach remain identical to those employed when the LSTM model is utilized independently for prediction.

The parameter settings for the DMO algorithm are as follows: the population size is 50; the initial inertia weight is \(\:w=0.9\) and the termination weight is \(\:w=0.15\), meaning that the inertia weight gradually decreases from 0.9 to 0.15 over the iterations; the maximum number of iterations is set to 100, i.e., the algorithm performs 100 updates. Both the acceleration coefficient for the individual optimal solution and that for the global optimal solution are set to 1.49445, so that during the velocity update the individual and global optimal solutions exert equal attraction.

To verify the superiority of the DMO algorithm, its performance in optimizing the objective function is compared with that of the Particle Swarm Optimization (PSO) algorithm. The parameter settings of the PSO algorithm are as follows: the acceleration coefficients are \(\:{c}_{1}={c}_{2}=1.49445\); the number of particles is 50; the initial inertia weight is \(\:w=0.9\) and the termination weight is \(\:w=0.15\); and the number of iterations is 100. The fitness curves are shown in Fig. 17.

By comparing Figs. 17(a) and (b), it is evident that the DMO algorithm converges faster than the PSO algorithm while achieving the same result: the DMO algorithm converges in only 13 iterations, whereas the PSO algorithm requires 26. This result further demonstrates the superiority of the DMO algorithm in weight optimization and provides a strong guarantee for more accurate prediction results.

Fig. 17
figure 17

The comparison of the fitness curves.

According to the optimization results, the weight of the AR model in the ensemble is \(\:{w}_{1}=0.2245\), the weight of the SVR model is \(\:{w}_{2}=0.3624\), and the weight of the LSTM model is \(\:{w}_{3}=0.4131\). The prediction results of the multi-scale AR-SVR-LSTM model are shown in Fig. 18. The figure shows that the proposed model performs well, particularly around the signal peaks, where the prediction deviation is reduced, thereby overcoming some of the limitations of the single prediction models. These results further confirm the superiority of the multi-model integration strategy and provide a solution for complex signal prediction problems.
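The weighted combination itself reduces to a convex sum of the three models' outputs. The sketch below applies the optimized weights reported above to hypothetical per-model predictions (the prediction vectors are invented for illustration):

```python
import numpy as np

# Optimized weights reported above for the AR, SVR, and LSTM models;
# note that they sum to 1, so the ensemble is a convex combination.
weights = np.array([0.2245, 0.3624, 0.4131])

# Hypothetical per-model predictions for one window of the signal.
pred_ar   = np.array([1.02, 1.10, 1.25])
pred_svr  = np.array([1.00, 1.12, 1.30])
pred_lstm = np.array([0.99, 1.11, 1.33])

ensemble = weights @ np.vstack([pred_ar, pred_svr, pred_lstm])
print(ensemble)
```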

Fig. 18
figure 18

Prediction results of the multi-scale AR-SVR-LSTM ensemble learning model.

Evaluation of prediction results of the ensemble model

To better evaluate and quantify the prediction performance of each model, this study compares two indicators, MAE and RMSE. According to Eqs. (6) and (7), the mean absolute error and root mean square error of each model are calculated, reflecting the performance and accuracy of the prediction models; the results are shown in Table 1. Table 1 also documents the computation time required to establish each model. The computer used in this study has an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz running a 64-bit operating system. Comparing these indicators clarifies the strengths and weaknesses of each model's prediction performance and provides a basis for selecting an appropriate model.
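Assuming Eqs. (6) and (7) are the standard definitions of MAE and RMSE, the two indicators can be computed as follows (the sample values are illustrative):

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error, assumed form of Eq. (6).
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    # Root mean square error, assumed form of Eq. (7).
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative actual and predicted values.
y_true = np.array([1.0, 1.2, 0.9, 1.5])
y_pred = np.array([1.1, 1.1, 1.0, 1.4])
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

MAE weighs all errors equally, whereas RMSE penalizes large deviations more heavily, which is why both are reported when peak errors matter.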

Table 1 The evaluation of prediction errors.

The data in Table 1 show that: (1) among the single models, the LSTM network model performs best; (2) multi-scale prediction effectively improves prediction performance; and (3) a comprehensive comparison of the prediction errors of the various models shows that the multi-scale AR-SVR-LSTM model proposed in this paper performs well.

Discussion

The experiments in Sect. "Case study on fault prediction model for large forging presses", organized around model selection, data preprocessing, comparative validation, and parameter optimization, support the following conclusions:

(1) From the comparison of model prediction results under two data processing strategies in Sect. "Prediction results of the AR model", it can be observed that decomposing complex problems into multiple sub-problems through multi-scale decomposition effectively enhances the prediction accuracy of the model for peaks and troughs, thereby better achieving the goals of improving prediction signal quality and facilitating fault diagnosis.

(2) By comparing the prediction results of single-model and ensemble models for oil pressure signals, it is evident that single models have certain limitations in predicting key system information: the AR model fails to capture nonlinear characteristics, which may lead to inaccurate predictions of cavitation faults in large forging presses; the SVR model lacks sufficient modeling of long-term dependencies, potentially preventing it from detecting slow leaks; although the LSTM model can learn long-term dependencies, it exhibits lower sensitivity to high-frequency details compared to the SVR model. In contrast, the proposed AR-SVR-LSTM model can integrate multi-scale features, achieving collaborative optimization of “high-frequency details + long-term dependencies + trend prediction.”

(3) After DMO optimization, the weights of different individual surrogate models in the proposed AR-SVR-LSTM model are adjusted, enhancing the model’s prediction accuracy for key information. Additionally, comparing the number of iterations between PSO and DMO optimizations reveals that DMO optimization offers higher efficiency.

(4) Since signals at different scales correspond to distinct fault characteristics (high-frequency scales relate to cavitation faults, medium-frequency scales correspond to normal operating pressures, and low-frequency scales indicate leakage faults), models capable of comprehensively predicting signals across different scales possess greater advantages. Therefore, based on the model accuracy metrics and fitting results, the proposed AR-SVR-LSTM model successfully integrates the strengths of individual models, enabling effective prediction of information across different scales and meeting industrial standards.

Conclusion

This study addresses the significant nonlinearity and poor stability of signals generated by key components of large forging presses by proposing a DMO-assisted AR-SVR-LSTM prediction model for critical signals. Using this model to predict key signals, conducting feature analysis on the predictions, and comparing these features with those under normal operating conditions enables fault prediction for large forging presses. Taking the 80 MN electric screw press of a certain enterprise adopted in this study as an example, important fault predictions can be made based on the monitored oil pressure signals.

Analysis of the results reveals that the proposed prediction model combines the linear modeling advantages of the AR model, the nonlinear modeling strengths of the SVR model, and the LSTM model's capability to mitigate gradient decay in recurrent neural networks. In the case study, the AR-SVR-LSTM model accurately predicts signals at different scales, enhancing the reliability and accuracy of fault prediction. Furthermore, the model employs the DMO algorithm to allocate weights among the individual models within the ensemble, further improving its adaptability to complex operating conditions and enabling precise prediction of signals with complex characteristics.

In summary, the proposed multi-scale AR-SVR-LSTM ensemble learning prediction model represents a significant advancement in fault prediction technology for large forging presses. By providing timely and accurate warnings, the model supports proactive maintenance strategies, reduces downtime, and enhances overall operational efficiency. Future work will focus on refining the model's parameters, exploring additional data sources, and extending its applicability to other complex industrial systems.
In addition, future work will explore more effective model parameter optimization strategies to enhance adaptability to a variety of complex engineering structures. Developing more efficient, less time-consuming model establishment methods also remains a priority.