Introduction

As important manufacturing equipment for key metal components, large forging presses are a crucial part of high-end industrial setups and play a pivotal role in various fields such as machinery, automotive, shipping, aerospace, and more1,2,3,4. According to incomplete statistics, there are currently over 10,000 screw presses and several thousand crank presses in use within the forging industry, including hot large forging presses, trimming presses, and correction presses, which constitute the mainstay equipment of heavy forging production lines. The large forging presses mentioned in this study specifically refer to hot large forging presses, encompassing large crank presses and large screw presses. Large forging presses typically consist of numerous components and possess multiple structural levels, making their internal mechanisms intricate to ascertain5. The complex and strongly coupled relationships between their diverse components often exhibit characteristics like concealment, nonlinearity, and randomness, rendering traditional fault diagnosis and prediction techniques based on mechanistic models ineffective6,7,8. However, with the advancement of big data and the industrial internet, there is an ever-increasing variety and volume of monitoring data for large forging presses. The value of this data has garnered widespread attention and research in the academic community, further propelling the progression of Prognostics and Health Management (PHM) technology. PHM has emerged as a mainstream technology in the field of reliability today and has been successfully implemented in large forging presses in aviation, new energy vehicles, and other industries9,10,11.

Fault diagnosis focuses on the classification and analysis of fault alarms, fault types, and their underlying causes, whereas fault prediction places greater emphasis on the early warning of potential faults in high-end industrial equipment and anticipates the degradation trajectory of the equipment’s future state12,13,14. At present, the main approaches to fault prediction are categorized into three groups: prediction methods rooted in reliability theory15,16,17, prediction methods based on data-driven methodologies18,19,20, and prediction methods founded on physical models21,22,23,24. The reliability-based prediction method primarily relies on historical data of system failures and the mechanisms of failure occurrence to forecast the timing and likelihood of future failures25,26,27,28,29. The data-driven prediction method involves constructing a prediction model through the analysis of system historical data, thereby enabling the prediction of the system’s future state, encompassing time series prediction, artificial neural network prediction, filter prediction, and grey model prediction30,31,32. The prediction method based on the physical model pertains to establishing a physical or mathematical model of system failure, utilizing information such as system operating status and environmental conditions to anticipate possible future failures of the systems33,34,35.

Currently, with the escalating demands for equipment performance stability and product quality consistency, after-the-fact diagnosis alone can no longer meet the needs of industrial sites36,37. Enterprise managers aspire to receive early warnings when there are merely subtle abnormal symptoms in large forging presses, enabling them to precisely anticipate the degradation trajectory of the equipment in the future, thereby procuring spare parts in advance, devising maintenance and production scheduling plans, and minimizing production downtime. In existing research on fault prediction technology for large forging presses, a single model is frequently employed to predict fault outcomes. However, a single model often encounters issues like overfitting and underfitting when processing intricate large forging press signals, resulting in inaccurate prediction results. Consequently, a multi-model integrated prediction method is imperative to furnish more precise and reliable outcomes for fault prediction of large forging presses.

This study endeavours to explore fault prediction methodologies for large forging presses, aiming to enhance production safety, equipment lifespan, and product quality. In fault diagnosis, the spotlight is on comparing signal data during normal operation and during fault occurrences. This involves scrutinizing the statistical characteristics, patterns, and trends of signal data to discern between normal conditions and faults. In the realm of fault prediction and health management for large forging presses, the emphasis shifts to comprehending the signal data that precede failures. This necessitates contemplating the temporal patterns, trends, and correlations within the signal data to identify early warning signals or precursors of faults. Hence, when making state predictions, it is imperative to account for both linear trends and nonlinear relationships within the signal data, while also exhibiting robust generalization capabilities. Additionally, it is crucial to grasp the interrelationships within the time series signals. Based on the aforementioned rationale, this paper utilizes the Autoregressive (AR) model, Support Vector Regression (SVR) model, and Long Short-Term Memory (LSTM) neural network model for state prediction of key components in large forging presses. The AR model, being a classic linear prediction model, adeptly captures linear trends in signals. The SVR model tackles nonlinear relationships through the employment of kernel tricks, demonstrating robust generalization capabilities. The LSTM model, as a recurrent neural network architecture, is proficient in grasping long-term dependencies within time series signals. Embracing the concept of ensemble learning, this paper fuses the strengths of these three models and harnesses the Dwarf Mongoose Optimization (DMO) algorithm to perform weighted integration of the prediction results, thereby augmenting prediction performance. 
Finally, to validate the effectiveness of the proposed multi-model ensemble prediction technique in forecasting signals from critical components of large forging presses, this study utilized oil pressure signals from the production line of an 80MN electric screw press at a certain enterprise as a case example. The research results demonstrate that the proposed multi-scale AR-SVR-LSTM ensemble learning prediction model achieves Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics of less than 5% and 1%, respectively, meeting industrial standards and underscoring the model’s effectiveness in industrial applications. By comparing the accuracy of multiple different models, the study further verifies the effectiveness and superiority of the proposed model in fault prediction for large forging presses.

The organization of the remaining content in this study is as follows: In Sect. "Prediction models and optimization", a brief introduction to different machine learning models is provided, along with an explanation of the principles underlying the DMO algorithm employed. Section "A multi-scale AR-SVR-LSTM ensemble learning prediction model" elaborates on the proposed model ensemble strategy in detail. Section "Case study on fault prediction model for large forging presses" presents the validation of the ensemble model using real-world engineering data, demonstrating the advantages of the ensemble model by comparing its predictive performance with that of individual models. Section "Discussion" offers a comparative analysis and discussion of the experimental results obtained in this study. Finally, Sect. "Conclusion" summarizes the research conducted in this paper.

Prediction models and optimization

The time series model

The autoregressive model

Currently, time series models such as AR, Autoregressive Moving Average (ARMA), and Autoregressive Integrated Moving Average (ARIMA) are widely used to predict failures in high-end industrial equipment. Compared to ARMA and ARIMA models, the AR model offers simplicity, convenience, and parameter-free modelling. To determine the order of the AR model, three commonly used methods are the Discrimination Regression Transfer Function Criterion, the Final Prediction Error Criterion, and the Akaike Information Criterion (AIC). When these three methods are used to estimate the order of the AR model, there is no significant difference in the results38.

The support vector regression model

The core idea of SVR is to maximize the model’s margin within an acceptable range of errors, thereby enhancing its generalization ability while ensuring a certain level of prediction accuracy. The \(\:\epsilon\:\)-insensitive loss function is a core component, as illustrated in Fig. 1. The basic idea of the \(\:\epsilon\:\)-insensitive loss function is that there exists a threshold \(\:\epsilon\:\), and if the absolute difference between the predicted value and the true value does not exceed this threshold, the prediction is considered to have no loss.

Fig. 1
figure 1

\(\:\epsilon\:\)-insensitive loss function.
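As a concrete illustration, the \(\:\epsilon\:\)-insensitive loss can be written in a few lines of NumPy; the threshold value `eps=0.1` below is an arbitrary illustrative choice, not a parameter taken from this study:

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, eps=0.1):
    """epsilon-insensitive loss: zero inside the eps-tube, linear outside it."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - eps)
```

A prediction that deviates from the true value by less than `eps` therefore incurs no loss, which is what gives SVR its tolerance margin.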

Long short-term memory neural network model

LSTM is a special type of recurrent neural network primarily used for processing and predicting time series data. The purpose of LSTM is to address the issue of gradient vanishing or gradient explosion that traditional RNNs encounter when dealing with long sequence data39,40. In LSTM, there are two crucial concepts: gates and cell states. Gates provide a means to transmit information on demand. LSTM is composed of structures called gates, which can add or remove information from the cell states.
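The gate mechanism can be illustrated with a minimal NumPy sketch of a single LSTM step; the stacked-parameter layout (`W`, `U`, `b` holding all four gates together) is one common convention and not necessarily that of any particular framework:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the
    forget (f), input (i), candidate (g) and output (o) gates."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b           # pre-activations, shape (4n,)
    f = sigmoid(z[0:n])                  # forget gate: what to erase from the cell
    i = sigmoid(z[n:2 * n])              # input gate: what to write to the cell
    g = np.tanh(z[2 * n:3 * n])          # candidate cell content
    o = sigmoid(z[3 * n:4 * n])          # output gate: what to expose as h
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c
```

The additive update of the cell state `c` is what lets gradients flow over long horizons, mitigating the vanishing-gradient problem of plain RNNs.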

Multi-model integration and fusion

Ensemble learning

Ensemble learning integrates multiple models using certain strategies to improve decision accuracy through group decision-making41,42. By combining multiple learners, the generalization ability of an ensemble is often much stronger than that of a single learner contained in it. As shown in Fig. 2, the basic workflow of ensemble learning is to first train a group of individual learners and then combine them through some strategies43,44.

Fig. 2
figure 2

The basic process of ensemble learning.

Dwarf mongoose optimization

DMO is a heuristic algorithm inspired by the survival and hunting behavior of dwarf mongooses45. The entire dwarf mongoose colony is divided into the alpha group, the babysitter group, and the scout group. After foraging, the alpha group generates candidate food locations according to:

$$\:{X}_{i+1}={X}_{i}+\phi\:\times\:peep$$
(1)

where \(\:\phi\:\) is a uniformly distributed random value in the range [0, 1]; \(\:{X}_{i}\) is the current position and \(\:{X}_{i+1}\) denotes the candidate food location; \(\:peep\) is set to 2 in this study.

The new position of an individual after babysitter exchange is given by the following formula:

$$\:{x}_{i,j}={l}_{j}+rand\times\:\left({u}_{j}-{l}_{j}\right)$$
(2)

where \(\:rand\) is a random value, ranging from 0 to 1; \(\:{u}_{j}\) and \(\:{l}_{j}\) are the limitations of the search domain.

Additionally, during the scouting phase, the new positions of the population are described by the following formula:

$$\:{X}_{i+1}=\left\{\begin{array}{ll}{X}_{i}-CF\times\:\phi\:\times\:rand\times\:\left[{X}_{i}-\overrightarrow{M}\right]&\text{if}\;{\phi\:}_{i+1}>{\phi\:}_{i}\\\:{X}_{i}+CF\times\:\phi\:\times\:rand\times\:\left[{X}_{i}-\overrightarrow{M}\right]&\text{otherwise}\end{array}\right.$$
(3)

where \(\:rand\) is a random value, ranging from 0 to 1; \(\:CF\) is the parameter that adjusts the collective movement willingness of the mongoose group, calculated as \(\:CF={\left(1-\frac{iter}{Ma{x}_{iter}}\right)}^{\left(2\frac{iter}{Ma{x}_{iter}}\right)}\); \(\:\overrightarrow{M}\) is the parameter that determines the movement of the mongooses to the new sleeping mound, calculated as \(\:\overrightarrow{M}={\sum}_{i=1}^{n}\frac{{X}_{i}\times\:s{m}_{i}}{{X}_{i}}\).
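The three update rules above can be sketched as plain NumPy functions; the population management, fitness evaluation, and babysitter-exchange schedule of the full DMO algorithm are omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha_forage(X, peep=2.0):
    """Eq. (1): candidate food location generated by the alpha group."""
    phi = rng.uniform(0.0, 1.0, size=X.shape)
    return X + phi * peep

def babysitter_exchange(lower, upper):
    """Eq. (2): re-initialise an individual uniformly inside the search bounds."""
    return lower + rng.uniform(0.0, 1.0, size=lower.shape) * (upper - lower)

def scout_move(X, M, CF, improved):
    """Eq. (3): scouting move away from (if improved) or toward the
    sleeping mound M, scaled by the movement-willingness parameter CF."""
    phi = rng.uniform(0.0, 1.0, size=X.shape)
    step = CF * phi * rng.uniform(0.0, 1.0, size=X.shape) * (X - M)
    return X - step if improved else X + step
```

In a full run, `CF` would decay with the iteration counter as given above, pushing the colony from exploration toward exploitation.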

A multi-scale AR-SVR-LSTM ensemble learning prediction model

Multi-model ensemble strategy

The displacement, air pressure, and hydraulic signals generated by large forging presses during actual production are impact-type signals. These key parameters exhibit sudden changes, continuous fluctuations, or oscillations in the historical production process curves and show pronounced nonlinear characteristics. At the same time, this study holds that, over long-term repeated forging operations, the initial working conditions and the operating status of the various components exert a long-term influence on subsequent forging operations; this long-term dependence is reflected in the collected signals. These characteristics expose the limitations of single-model prediction.

Therefore, this study proposes a three-stage framework termed “Multi-scale Decomposition - Heterogeneous Model Collaboration - Dynamic Weight Fusion” for constructing an ensemble model. The key parameters of large forging presses in actual production processes are closely related to their health monitoring, such as signals measured by sensors including oil pressure information, vibration information, or temperature signals. These signals reflect the operational status of the equipment from different dimensions and contain critical information for system health monitoring, which is of great significance for enhancing the accuracy, earliness, and interpretability of fault prediction. Predicting the key parameters of large forging presses during actual production processes using the ensemble model and further comparing them with the equipment’s operational parameters under normal conditions represent an important approach for fault prediction.

In the process of constructing the ensemble model, multi-scale decomposition technology is first employed to perform scale-differentiated processing on the original signals, thereby further enhancing the model’s noise resistance capability. Subsequently, models are matched with signal components based on their respective characteristics, avoiding a one-size-fits-all modeling approach. Finally, the DMO is utilized to dynamically adjust the weights of each model based on historical prediction errors, enabling the redistribution of weights among different models. In engineering applications, if manual parameter tuning or grid search is adopted, it necessitates traversing a vast number of weight combinations, resulting in high computational costs and difficulties in identifying the global optimal solution. Conversely, employing gradient descent methods may encounter issues such as inappropriate iteration directions, step sizes, and starting points. Therefore, this study selects the DMO algorithm for weight optimization, aiming to improve the model’s generalization ability and enable it to adaptively respond to changes in operating conditions.

In order to make full use of the advantages of different prediction models, the idea of ensemble learning is adopted to integrate these models to capture more key features in the data. The AR model excels in linear modeling, whereas the SVR model demonstrates an advantage in nonlinear modeling. LSTM is capable of effectively capturing long-term dependencies in time series data and addressing the gradient decay issue present in traditional RNNs. Therefore, in this section, the three methods are integrated into an AR-SVR-LSTM model, which is expected to improve the prediction accuracy and efficiency. The basic process of the hybrid prediction model of AR-SVR-LSTM is shown in Fig. 3.

Fig. 3
figure 3

The AR-SVR-LSTM ensemble learning prediction model.

The steps are as follows:

(1) Collect monitoring signals of key components of the forging press, and construct a signal prediction training dataset and a testing dataset.

(2) First, wavelet transform is applied to transform the original signal into sub-signals at different frequency scales, comprising one approximation component and multiple detail components. The approximation component is used to describe the low-frequency trend, while the detail components capture high-frequency fluctuations. Subsequently, Savitzky-Golay (S-G) filtering is employed to process these multi-scale components, smoothing out noise through local polynomial regression while preserving the edge features of the signal.

(3) Calculate the AR model coefficients of the monitoring signal data of the key components of the forging press, and construct an AR model. Use the test set data as the model input, and obtain the predicted value. After wavelet reconstruction, the reconstructed value \(\:{y}_{1}\) is used as the AR model output.

(4) For the parameters of the SVR model, we use grid search to select and optimize them. After constructing the SVR prediction model, we predict the wavelet components decomposed from the signal after S-G filtering, and then perform inverse wavelet transform on the predicted values to complete the wavelet reconstruction and obtain the final prediction result \(\:{y}_{2}\) of the SVR model.

(5) Construct an LSTM network model, use each scale component as the model input to predict, and then perform wavelet reconstruction on the predicted values, and output the value \(\:{y}_{3}\).

(6) Establish a linear combination of \(\:{w}_{1}{y}_{1}+{w}_{2}{y}_{2}+{w}_{3}{y}_{3}\), and then use the DMO algorithm to determine the weights of different models in the hybrid model.
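The decomposition, reconstruction, and fusion steps (2) and (6) above can be sketched as follows; for brevity, a one-level Haar transform stands in for the wavelet family used in the study (which is not specified here), and the S-G filtering stage is omitted:

```python
import numpy as np

def haar_decompose(x):
    """One-level Haar wavelet split into an approximation component
    (low-frequency trend) and a detail component (high-frequency fluctuation)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail
    return a, d

def haar_reconstruct(a, d):
    """Inverse of haar_decompose (perfect reconstruction)."""
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def fuse(y1, y2, y3, w):
    """Step (6): weighted linear combination of the three model outputs."""
    w = np.asarray(w, dtype=float)
    assert np.isclose(w.sum(), 1.0) and np.all(w >= 0.0)
    return w[0] * y1 + w[1] * y2 + w[2] * y3
```

Each model predicts on the decomposed components, the predictions are reconstructed back to the signal domain, and the three reconstructed outputs are fused with the DMO-optimized weights.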

Fitness function of learning prediction model

The preceding section presented the establishment process of the proposed AR-SVR-LSTM model, which is intended for predicting critical information of large forging presses. As can be observed from the construction process of the ensemble model, the individual models are assigned distinct non-negative weights within the fitness function; consequently, the value of the fitness function is non-negative. The deviation between the predicted value and the true value is expressed as Eq. (4):

$$\:{z}_{i}={w}_{1}{y}_{1}\left(i\right)+{w}_{2}{y}_{2}\left(i\right)+{w}_{3}{y}_{3}\left(i\right)-{x}_{i},\quad i=\text{1,2},\dots\:,N$$
(4)

Based on this, a fitness function is proposed for this optimization model, expressed as the sum of the squared deviations between the predicted sequence and the actual sequence, as shown in Eq. (5).

$$\:f={min}\sum_{i=1}^{N}{\left({w}_{1}{y}_{1}\left(i\right)+{w}_{2}{y}_{2}\left(i\right)+{w}_{3}{y}_{3}\left(i\right)-{x}_{i}\right)}^{2} \\ \quad s.t.{w}_{1}+{w}_{2}+{w}_{3}=1, \quad {w}_{1},{w}_{2},{w}_{3}\ge\:0$$
(5)

The weight corresponding to each model’s predicted value, obtained by solving the above formula, is substituted into the multi-scale AR-SVR-LSTM model for analysis. Additionally, this study employs MAE and RMSE to evaluate the performance of the prediction model.
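As a sanity check on Eq. (5), the constrained weight-fitting problem can be solved with a generic SLSQP optimizer; this is only a baseline for comparison, not the DMO optimizer employed in this study:

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(Y, x):
    """Solve Eq. (5): min ||Y @ w - x||^2  s.t.  sum(w) = 1, w >= 0,
    where the columns of Y are the three models' predictions y1, y2, y3."""
    def loss(w):
        return np.sum((Y @ w - x) ** 2)
    w0 = np.full(3, 1.0 / 3.0)  # start from equal weights
    res = minimize(loss, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * 3,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
    return res.x
```

Because the objective is a convex quadratic over the probability simplex, any reasonable solver should reach the same optimum; the DMO is preferred in this study for its robustness to poor starting points and step sizes.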

The formulas for MAE and RMSE are shown in Eqs. (6) and (7).

$$\:\text{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|\frac{{y}_{i}-{\widehat{y}}_{i}}{{y}_{i}}\right|$$
(6)
$$\:\text{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left(\frac{{y}_{i}-{\widehat{y}}_{i}}{{y}_{i}}\right)}^{2}}$$
(7)

where \(\:{y}_{i}\) is the true value, which represents the sampling point of the monitoring signal; \(\:{\widehat{y}}_{i}\) is the predicted value output by the regression model.
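Since the errors in this study are reported as percentages, Eqs. (6) and (7) correspond to relative (normalized) error metrics, which can be computed as:

```python
import numpy as np

def mae_rel(y, y_hat):
    """Mean absolute relative error, Eq. (6), reported as a percentage."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs((y - y_hat) / y)) * 100.0

def rmse_rel(y, y_hat):
    """Root-mean-square relative error, Eq. (7), reported as a percentage."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean(((y - y_hat) / y) ** 2)) * 100.0
```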

Case study on fault prediction model for large forging presses

To validate the effectiveness and superiority of the proposed prediction model, this experiment was conducted based on data obtained from the 80MN electric screw press production line in an enterprise. The research encompasses the analysis and processing of historical data, the establishment of an ensemble prediction model, and finally, validation through integration with real-time operational data. Prior to conducting experiments, time-domain and frequency-domain analyses are performed on the collected oil pressure signal data from large forging presses. Different data characteristics correspond to three types of fault features: a drop in oil pressure may indicate a leakage fault, high-frequency pressure vibrations may suggest cavitation faults, and low-frequency pressure vibrations could point to blockage faults. Therefore, the established signal prediction model needs to effectively capture features such as temporal trends, high-frequency fluctuations, and low-frequency vibrations in order to achieve accurate and early fault prediction.

Data preprocessing

Considering the complexity and uncertainty of fault data of large forging presses, methods for data preprocessing and feature extraction are studied to extract feature information useful for fault prediction. We then discuss how to choose appropriate prediction models and parameter optimization methods to improve the accuracy and generalization ability of the prediction models.

The experiment takes the oil pressure signal of the brake of a large forging press as an example to train and validate the proposed multi-scale AR-SVR-LSTM model and to analyze its prediction performance. The monitoring signals of the large forging press contain substantial noise; modeling and predicting the raw signal directly therefore yields poor prediction performance. It is thus necessary to perform multi-scale smoothing on the brake oil pressure signal. This study proposes two processing strategies:

(1) First, the signal is subjected to wavelet decomposition and multi-scale S-G filtering, as shown in Fig. 4; Then, the signals at each scale are subjected to multi-step prediction using a time series model. Finally, the predicted values are subjected to wavelet reconstruction to obtain the overall predicted sequence of the signal.

(2) After multi-scale smoothing preprocessing, the reconstructed signal is directly subjected to multi-step prediction to obtain the final prediction sequence, as shown in Fig. 5. This study will compare the prediction performance of these two operations and then select the optimal data preprocessing strategy.

Fig. 4
figure 4

Wavelet decomposition and multi-scale S-G filtering of the original signal.

Fig. 5
figure 5

The reconstructed wavelet signal after multi-scale filtering.

This study combines the signals from multiple strikes into a longer time series to increase the amount of training data and improve prediction performance. In this study, the monitoring signal data of the brake oil pressure from a large forging press, which was continuously struck 10 times, was combined to form a time series of 80,000 data points. The dataset was subsequently divided into a training set (80%) and a test set (20%), without performing any shuffling operation. The length of the input time series for all models in this study is set to 2000 for training and validation. All models in this study use recursive methods for multi-step prediction, with a prediction sequence length of 200 time steps. This sequence length is used for comparative analysis of prediction performance.
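The recursive multi-step scheme described above can be sketched generically; `model_predict` is a placeholder for any of the trained one-step predictors:

```python
import numpy as np

def recursive_forecast(model_predict, history, horizon=200, window=2000):
    """Recursive multi-step prediction: each one-step prediction is
    appended to the input window and fed back for the next step."""
    buf = list(history[-window:])
    out = []
    for _ in range(horizon):
        y = model_predict(np.asarray(buf[-window:]))  # one-step-ahead forecast
        out.append(y)
        buf.append(y)                                  # feed prediction back in
    return np.asarray(out)
```

The drawback of the recursive strategy is error accumulation: each prediction error is fed back into subsequent inputs, which is one motivation for combining several heterogeneous models.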

Predictions of individual models

Prediction results of the AR model

For the AR model, the order must be determined first. The AIC is an indicator of the goodness of fit of a statistical model; it seeks a balance between goodness of fit and model complexity to address the model selection problem. The AIC avoids overfitting by penalizing the number of model parameters, thereby selecting a model with better predictive performance. Its calculation formula is as follows:

$$\:AIC=-2\ln\left(L\right)+2k$$
(8)

where \(\:L\) is the maximum likelihood value of the model; \(\:k\) is the number of parameters in the model. In time series analysis, AIC is often used to determine the order of an AR model. By calculating the AIC values for different model orders, the model with the smallest AIC value is selected as the optimal model. Figure 6 shows that when the order is 33, the AIC value is the smallest, so the AR model order is determined to be 33. In addition to selecting the model order, it is also necessary to solve for the AR model coefficients \(\:{\alpha\:}_{j}\:(j=\text{1,2},\dots\:,p)\).
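Order selection by AIC minimization can be sketched with a least-squares AR fit; for Gaussian residuals, \(-2\ln(L)\) reduces (up to an additive constant) to \(n\ln(RSS/n)\), a standard simplification used below:

```python
import numpy as np

def ar_aic(x, p):
    """Fit an AR(p) model by least squares and return its AIC,
    using AIC = n * ln(RSS / n) + 2p for Gaussian residuals."""
    x = np.asarray(x, dtype=float)
    # Design matrix: column j holds the lag-(j+1) values of the series.
    X = np.column_stack([x[p - j - 1 : len(x) - j - 1] for j in range(p)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    n = y.size
    return n * np.log(rss / n) + 2 * p

def select_order(x, max_p=40):
    """Pick the order with the smallest AIC, as done for Fig. 6."""
    return min(range(1, max_p + 1), key=lambda p: ar_aic(x, p))
```

Scanning `select_order` over candidate orders reproduces the kind of AIC-versus-order curve shown in Fig. 6.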

Fig. 6
figure 6

The variation of the AIC values with the order.

Subsequently, using the constructed AR prediction model, we can predict the scale components obtained from the wavelet decomposition. As shown in Fig. 7, the AR model does not perform well on the fine-scale detail components, where it shows large deviations. However, it has strong predictive ability for the detail and approximation components at scales 3 and 4, almost perfectly approximating the true values.

Fig. 7
figure 7

The prediction results with the AR model at different scales.

The prediction values at each scale are subjected to inverse wavelet transform, that is, wavelet reconstruction, to obtain the final prediction results with the AR model, as shown in Fig. 8.

Fig. 8
figure 8

The prediction results of wavelet reconstruction signal at different scales.

Two cases are compared for predicting the brake oil pressure signal: in case 1, wavelet decomposition is performed first and each scale component is predicted separately; in case 2, an AR model predicts the preprocessed signal directly. Here, we use the AR model to predict the multi-scale preprocessed signal and, to distinguish it from the previous method, label it "the prediction results of wavelet reconstruction + AR model". The results are shown in Fig. 9. Further calculations yield the accuracy metrics of the two approaches: the multi-scale AR model (Fig. 8) exhibits MAE and RMSE values of 5.32% and 0.6322%, respectively, whereas the AR model applied directly (Fig. 9) yields 10.45% and 1.2624%. Comparing the deviations between the predicted and true values in Figs. 8 and 9 confirms that the prediction error of case 1 is smaller.

Fig. 9
figure 9

The preprocessed signal prediction results with using an AR model directly.

Prediction results of the SVR model

The SVR model is applied to predict the wavelet decomposition and S-G filtered components at different scales, as shown in Fig. 10. The comparison between the predicted values and the true values shows that the SVR model is superior to the AR model in predicting the wavelet detail components at the first two scales, indicating that there are differences in the linear and nonlinear prediction capabilities of the AR model and the SVR model in processing signal data.

Fig. 10
figure 10

The prediction results with SVR model at different scales.

The predicted values at different scales are reconstructed, as shown in Fig. 11. It shows that the SVR model has a good prediction effect.

Fig. 11
figure 11

The prediction results of wavelet reconstruction signal at different scales.

When the SVR model is instead applied directly to the wavelet-reconstructed signal, as shown in Fig. 12, the multi-scale prediction achieves higher accuracy at the peak and trough positions: the predicted values in Fig. 12 exhibit significant oscillations at these extremes, whereas in Fig. 11 the predicted values oscillate less and fit the actual values more closely. The accuracy metrics were also calculated: the multi-scale SVR model (Fig. 11) exhibits MAE and RMSE values of 7.43% and 0.7135%, respectively, while the SVR model applied directly (Fig. 12) yields 4.99% and 0.5016%, indicating a lower overall error despite its larger oscillations at the peaks and troughs.

Fig. 12
figure 12

The preprocessed signal prediction results with using the SVR model directly.

Prediction results of the LSTM model

Similarly, time-domain signals of a large forging press were predicted using LSTM networks. The LSTM parameters were configured as follows: the input dimension was set to 1; the time step length was defined as 2000; the number of hidden units in each layer was set to 64, with a two-layer LSTM network architecture employed; the dropout rate was fixed at 0.5; the batch size was set to 64; the learning rate was 0.001; the number of iterations was 100; and the squared loss function was selected as the loss criterion. Upon completion of model training, the validation set was utilized for model prediction. Preprocessing of the brake oil pressure signal from the large forging press was conducted using wavelet decomposition and S-G filtering to obtain the preprocessed scale components. The trained LSTM network model was then applied for multi-step prediction. The comparative results between the predicted sequence and the actual sampled sequence are illustrated in Fig. 13. As depicted in the figure, the LSTM model demonstrates superior prediction performance for both detailed and approximate components across different scales compared to the two preceding models.
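The stated hyperparameters can be collected in a configuration dictionary; the small helper below counts LSTM weights under the usual four-gate parameterization (one bias vector per gate; some frameworks use two), as a framework-free sanity check on the chosen architecture:

```python
# Hyperparameters as stated in the text for the LSTM model.
CONFIG = {
    "input_dim": 1, "seq_len": 2000, "hidden": 64, "layers": 2,
    "dropout": 0.5, "batch_size": 64, "lr": 1e-3, "epochs": 100,
    "loss": "mse",
}

def lstm_param_count(input_dim, hidden, layers):
    """Parameters of a stacked LSTM: each layer has 4 gates, each with
    an input matrix, a recurrent matrix, and a bias vector."""
    total, in_dim = 0, input_dim
    for _ in range(layers):
        total += 4 * (hidden * (in_dim + hidden) + hidden)
        in_dim = hidden  # subsequent layers take the hidden state as input
    return total
```

For the configuration above (input dimension 1, 64 hidden units, two layers), the network has roughly fifty thousand weights, small enough to train comfortably on the 80,000-point dataset.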

Fig. 13
figure 13

The prediction results with the LSTM model at different scales.

Similarly, the predicted values at different scales have been reconstructed, as depicted in Fig. 14. Further calculations were carried out to evaluate the accuracy metrics of different models: The Multi-scale LSTM model (Fig. 14) exhibits MAE and RMSE values of 3.15% and 0.3265%, respectively. Correspondingly, the LSTM model (Fig. 15) demonstrates MAE and RMSE values of 5.22% and 0.6143%, respectively. When comparing the prediction results of the wavelet-reconstructed signal using the LSTM network model directly, as shown in Fig. 15, it is evident that the multi-scale LSTM model has higher signal prediction accuracy.

Fig. 14
figure 14

The prediction results of wavelet reconstruction signal at different scales.

Fig. 15
figure 15

The preprocessed signal prediction results with using the LSTM model directly.

To show the differences in prediction performance between the AR, SVR, and LSTM models, a detailed comparison between the actual values and the predicted values of the three models is presented in Fig. 16. After removing noise from the oil pressure signal of the large forging press brake, important features within the signal become apparent, which facilitates predicting signal changes. All prediction models track the signal well, although the peak portions of the signal exhibit relatively large errors. In addition, the individual LSTM and SVR models outperform the AR model, indicating that although the AR model can effectively capture the linear trend in the signal, it also has limitations.

Fig. 16
figure 16

Comparison of the predictive performance of multiple models.

Prediction of the ensemble model

Results of the ensemble model

To fully capitalize on the strengths of each prediction model, it is essential to integrate the prediction results obtained from the AR model, SVR model, and LSTM model. Subsequently, the combined weights of these models should be optimized using the DMO algorithm. Notably, the parameter settings for the LSTM model in this combined approach remain identical to those employed when the LSTM model is utilized independently for prediction.

The parameter settings for the DMO algorithm are as follows: the population size is 50; the initial inertia weight is \(\:w=0.9\) and the termination weight is \(\:w=0.15\), meaning that the inertia weight gradually decreases from 0.9 to 0.15 over the iterations; the maximum number of iterations is set to 100, i.e., the algorithm performs 100 updates. Both the acceleration coefficient for the individual optimal solution and that for the global optimal solution are set to 1.49445, so that during the velocity update the individual and global optimal solutions exert equal attraction.

To verify the superiority of the DMO algorithm, its performance in optimizing the objective function is compared with that of the Particle Swarm Optimization (PSO) algorithm. The parameter settings of the PSO algorithm are as follows: the acceleration coefficients are \(\:{c}_{1}={c}_{2}=1.49445\); the number of particles is 50; the initial inertia weight is \(\:w=0.9\) and the termination weight is \(\:w=0.15\); and the number of iterations is 100. The fitness curves are shown in Fig. 17.

By comparing Figs. 17(a) and (b), it is evident that the DMO algorithm converges faster than the PSO algorithm while achieving the same result: the DMO algorithm converges in only 13 iterations, whereas the PSO algorithm requires 26. This result further demonstrates the superiority of the DMO algorithm in weight optimization and provides a strong guarantee for more accurate prediction results.

Fig. 17
figure 17

The comparison of the fitness curves.

According to the optimization results, the weight of the AR model in the ensemble is \(\:{w}_{1}=0.2245\), the weight of the SVR model is \(\:{w}_{2}=0.3624\), and the weight of the LSTM model is \(\:{w}_{3}=0.4131\). The prediction results of the multi-scale AR-SVR-LSTM model are shown in Fig. 18. The figure shows that the proposed model performs well, particularly around the signal peaks, where the prediction deviation is reduced, thereby overcoming some of the limitations of the single prediction models. These results further confirm the superiority of the multi-model integration strategy and provide a solution for complex signal prediction problems.
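The weighted combination itself reduces to a convex sum of the three models' outputs. The sketch below applies the optimized weights reported above to hypothetical per-model predictions (the prediction vectors are invented for illustration):

```python
import numpy as np

# Optimized weights reported above for the AR, SVR, and LSTM models;
# note that they sum to 1, so the ensemble is a convex combination.
weights = np.array([0.2245, 0.3624, 0.4131])

# Hypothetical per-model predictions for one window of the signal.
pred_ar   = np.array([1.02, 1.10, 1.25])
pred_svr  = np.array([1.00, 1.12, 1.30])
pred_lstm = np.array([0.99, 1.11, 1.33])

ensemble = weights @ np.vstack([pred_ar, pred_svr, pred_lstm])
print(ensemble)
```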

Fig. 18
figure 18

Prediction results of the multi-scale AR-SVR-LSTM ensemble learning model.

Evaluation of prediction results of the ensemble model

To better evaluate and quantify the prediction performance of each model, this study compares two indicators, MAE and RMSE. According to Eqs. (6) and (7), the mean absolute error and root mean square error of each model are calculated, reflecting the performance and accuracy of the prediction models; the results are shown in Table 1. Table 1 also documents the computation time required to establish each model. The computer used in this study has an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz running a 64-bit operating system. Comparing these indicators clarifies the strengths and weaknesses of each model's prediction performance and provides a basis for selecting an appropriate model.
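Assuming Eqs. (6) and (7) are the standard definitions of MAE and RMSE, the two indicators can be computed as follows (the sample values are illustrative):

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error, assumed form of Eq. (6).
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    # Root mean square error, assumed form of Eq. (7).
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative actual and predicted values.
y_true = np.array([1.0, 1.2, 0.9, 1.5])
y_pred = np.array([1.1, 1.1, 1.0, 1.4])
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

MAE weighs all errors equally, whereas RMSE penalizes large deviations more heavily, which is why both are reported when peak errors matter.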

Table 1 The evaluation of prediction errors.

The data in Table 1 show that: (1) among the single models, the LSTM network model performs best; (2) multi-scale prediction effectively improves prediction performance; and (3) a comprehensive comparison of the prediction errors of the various models shows that the multi-scale AR-SVR-LSTM model proposed in this paper performs well.

Discussion

The experiments in Sect. "Case study on fault prediction model for large forging presses", organized around model selection, data preprocessing, comparative validation, and parameter optimization, support the following conclusions:

(1) From the comparison of model prediction results under two data processing strategies in Sect. "Prediction results of the AR model", it can be observed that decomposing complex problems into multiple sub-problems through multi-scale decomposition effectively enhances the prediction accuracy of the model for peaks and troughs, thereby better achieving the goals of improving prediction signal quality and facilitating fault diagnosis.

(2) By comparing the prediction results of single-model and ensemble models for oil pressure signals, it is evident that single models have certain limitations in predicting key system information: the AR model fails to capture nonlinear characteristics, which may lead to inaccurate predictions of cavitation faults in large forging presses; the SVR model lacks sufficient modeling of long-term dependencies, potentially preventing it from detecting slow leaks; although the LSTM model can learn long-term dependencies, it exhibits lower sensitivity to high-frequency details compared to the SVR model. In contrast, the proposed AR-SVR-LSTM model can integrate multi-scale features, achieving collaborative optimization of “high-frequency details + long-term dependencies + trend prediction.”

(3) After DMO optimization, the weights of different individual surrogate models in the proposed AR-SVR-LSTM model are adjusted, enhancing the model’s prediction accuracy for key information. Additionally, comparing the number of iterations between PSO and DMO optimizations reveals that DMO optimization offers higher efficiency.

(4) Since signals at different scales correspond to distinct fault characteristics (high-frequency scales relate to cavitation faults, medium-frequency scales correspond to normal operating pressures, and low-frequency scales indicate leakage faults), models capable of comprehensively predicting signals across different scales possess greater advantages. Therefore, based on the model accuracy metrics and fitting results, the proposed AR-SVR-LSTM model successfully integrates the strengths of individual models, enabling effective prediction of information across different scales and meeting industrial standards.

Conclusion

This study addresses the significant nonlinearity and poor stability of signals generated by key components of large forging presses by proposing a DMO-assisted AR-SVR-LSTM prediction model for critical signals. Using this model to predict key signals, conducting feature analysis on the predictions, and comparing these features with those under normal operating conditions enables fault prediction for large forging presses. Taking the 80 MN electric screw press of a certain enterprise adopted in this study as an example, important fault predictions can be made based on the monitored oil pressure signals.

Analysis of the results reveals that the proposed prediction model combines the linear modeling advantages of the AR model, the nonlinear modeling strengths of the SVR model, and the LSTM model's capability to mitigate gradient decay in recurrent neural networks. In the case study, the AR-SVR-LSTM model accurately predicts signals at different scales, enhancing the reliability and accuracy of fault prediction. Furthermore, the model employs the DMO algorithm to allocate weights among the individual models within the ensemble, further improving its adaptability to complex operating conditions and enabling precise prediction of signals with complex characteristics.

In summary, the proposed multi-scale AR-SVR-LSTM ensemble learning prediction model represents a significant advancement in fault prediction technology for large forging presses. By providing timely and accurate warnings, the model supports proactive maintenance strategies, reduces downtime, and enhances overall operational efficiency. Future work will focus on refining the model's parameters, exploring additional data sources, and extending its applicability to other complex industrial systems.
In addition, future work will explore more effective model parameter optimization strategies to enhance adaptability to a variety of complex engineering structures. Developing more efficient, less time-consuming model establishment methods also remains a priority.