A novel combination model for ultra-short-term wind speed prediction

Xu, Shunqing; Niu, Jincai

doi:10.1038/s41598-025-20497-6


Article
Open access
Published: 21 October 2025


Shunqing Xu¹ &
Jincai Niu^2,3

Scientific Reports volume 15, Article number: 36666 (2025)


Abstract

Accurate and rapid ultra-short-term wind speed prediction is crucial for wind farm operation control. This paper proposes a wind speed prediction model that integrates Variational Mode Decomposition (VMD), the Sparrow Search Algorithm (SSA), and Long Short-Term Memory (LSTM) to address the insufficient precision of current ultra-short-term forecasts. To handle the complexity of wind speed datasets and the difficulty of selecting VMD parameters, the Archimedean Optimization Algorithm is employed to optimize the number of modal components and the penalty factor, enabling adaptive selection of these parameters. In addition, the Sparrow Search Algorithm is used to optimize the LSTM parameters, including the number of hidden layer units, the iteration count, and the initial learning rate. The combined VMD-SSA-LSTM model accurately forecasts ultra-short-term wind speed based on historical data. The prediction model is validated with wind speed data from the National Data Buoy Center (NDBC) in the United States. The experimental results demonstrate that the model achieves a prediction accuracy of approximately 97%, providing a novel approach to ultra-short-term wind speed forecasting in wind farms.


Introduction

Recently, against the backdrop of emission peak and carbon neutrality goals, China’s clean and renewable energy has entered a new phase of development, with wind power growing at an accelerating pace^1,2,3. According to the most recent figures from China’s National Energy Administration (NEA), by the end of April 2024 the installed capacity of wind power in China was approximately 460 million kW, and new energy accounted for approximately 37.5% of the national installed power generation capacity. The Global Wind Energy Association (GWEA) anticipates that the installed capacity of wind power will reach 157 GW by 2027, with a compound annual growth rate of around 15%. The unpredictability, variability, and randomness of wind energy make it difficult to predict wind speed accurately⁴. The accuracy of ultra-short-term wind speed forecasting is a critical factor in forecasting wind power. Such forecasting is a vital means of enhancing the stability of wind power grid connection and is of considerable significance for the operation of wind farms and power systems⁵. Although ultra-short-term wind speed prediction has been used within the power system for many years, achieving accurate forecasts remains a formidable challenge. Consequently, high-precision ultra-short-term wind speed prediction is urgently needed.

Since wind speed is a non-stationary and fluctuating random signal, wind power generation exhibits complex nonlinear characteristics. As a result, the accuracy of a single prediction model is rather low and its prediction error is relatively large, making it difficult to meet the actual requirements of the power system. In recent years, some scholars have employed combined prediction models in research on wind speed and power prediction, improving the prediction accuracy to a certain degree^6,7,8. Wind speed forecasting can be divided into three categories according to the temporal scale: ultra-short-term forecasting, short-term forecasting, and medium- to long-term forecasting⁷. Ultra-short-term wind speed forecasting estimates the wind speed within a wind farm for the upcoming 0 to 4 h, with a time interval not exceeding 15 min. Typically, historical data from wind farms is used for ultra-short-term wind speed forecasting. Given that meteorological changes are predominantly governed by the persistence of atmospheric conditions, this research investigates ways to improve the precision of ultra-short-term wind speed forecasting based on historical wind speed data.

Wind speed forecasting methodologies are generally categorized into physical methods, statistical methods, and intelligent approaches⁹. Each category presents distinct advantages and limitations, addressing varying application scenarios to achieve accurate and context-specific predictions. By selecting an appropriate methodology aligned with the forecast horizon and application requirements, energy planners and system operators can effectively manage the intrinsic variability of wind energy, thereby facilitating its reliable and efficient integration into power systems.

In recent years, with the advancement of artificial intelligence technology, the number of artificial intelligence prediction algorithms has grown rapidly, and they have been applied extensively in fields such as wind power forecasting, photovoltaic power prediction, and power load prediction^10,11,12,13. Several studies have shown that both linear and nonlinear models such as the extreme learning machine (ELM)^14,15, random forest (RF)¹⁶, LSTM¹⁷ and transformer¹⁸ are commonly used for wind power prediction. Another nonlinear prediction model, extreme gradient boosting (XGBoost)¹⁹, has also been put forward for the prediction of wind energy. Zhao et al. proposed a unified model for forecasting short-term wind power based on Variational Mode Decomposition (VMD), a Convolutional Neural Network (CNN), and a Gated Recurrent Unit (GRU). By extracting features with VMD, this model reduces the instability of wind speed time series²⁰. Taking into account the impact of other factors on wind speed, Zhang et al. proposed a two-layer Long Short-Term Memory (LSTM) neural network for multivariate wind speed prediction. This approach reduces prediction errors and exhibits a certain degree of generalization capacity²¹. Shahid et al. applied LSTM to wind power prediction and achieved certain results²². Drawing on these insights, Tuerxun et al. applied LSTM to ultra-short-term wind speed prediction, thereby improving the prediction accuracy²³. Hua et al. decomposed the original wind speed data by means of VMD, carried out preliminary forecasts using ELM models, and then established a residual prediction model based on LSTM. Finally, the prediction outcomes were integrated. However, they did not present a method for obtaining the optimal parameters of VMD and LSTM^24,25,26. Gao et al. integrated VMD with LSTM to obtain a more precise prediction of ultra-short-term wind speeds, using the center frequency observation method to select the VMD parameters²⁷. Zhang et al. put forward an enhanced combined prediction model that merges VMD with the Genetic Neural Network (GNN). This model achieved ultra-short-term wind speed prediction within wind farms. Nevertheless, the algorithm did not offer a specific approach for obtaining the optimal parameters of VMD²⁸. Liu et al. proposed a combined wind speed prediction model that integrates VMD, the multi-headed self-attention (MSA) mechanism, and LSTM. The rime optimization algorithm (RIME) was used to improve the performance of the prediction model, which provides a useful reference for wind speed prediction²⁹.

When performing VMD, a reasonable number of modes k and penalty factor α must be set. If k is too large, over-decomposition results; if it is too small, under-decomposition results. If α is too large, frequency band information of the signal is lost; if it is too small, the modes contain redundant information. At present, many studies use the center frequency observation method to set the number of modal components, determining k by analyzing the center frequencies obtained under different values of k. This method is somewhat arbitrary and can determine only the number of modes k; the penalty factor cannot be determined^20,30.

In this research, a new combined prediction model named VMD-SSA-LSTM is put forward for the prediction of ultra-short-term wind speed. The prediction accuracy of this model has been validated. The outcomes demonstrated that the proposed model showed higher precision in comparison with other models.

Methodology

The historical wind speed data of a wind farm is fundamentally time series data. LSTM can model such data effectively, handling its long-term dependencies and enhancing the stability and robustness of the model. VMD is used to extract effective features from the historical wind speed data. Meanwhile, the relevant parameters of VMD and LSTM are optimized by the Archimedes Optimization Algorithm (AOA) and the Sparrow Search Algorithm (SSA), respectively. The proposed model is briefly introduced as follows.

Optimized variational mode decomposition

Given the intricate nature of wind speed discussed above, Variational Mode Decomposition (VMD) was chosen to decompose the wind speed signal. VMD is an adaptive and entirely non-recursive approach to signal decomposition and noise reduction. It can decompose complex signals that are both non-stationary and nonlinear, and it suppresses modal aliasing more effectively³¹. It is predominantly used to extract features from complex signals in order to obtain their constituent modes. Through an iterative search for the optimal solution of the variational model, the time series is decomposed into a sequence of Intrinsic Mode Functions (IMFs), each with its own bandwidth, while the center frequency and bandwidth of each IMF are adaptively updated. The original input signal can thus be decomposed into a series of discrete modal components, and adding these components directly recovers an approximation of the input signal, which gives the method strong robustness to sampling and noise. Wind speed exhibits typical volatility, randomness, and uncontrollability, making its features difficult to extract. Through VMD, however, the historical wind speed data is broken down into a finite number of IMF components. These components reflect the regularity or periodicity of the time series within different frequency bands, thereby reducing the nonlinear characteristics of the wind speed signal. It is assumed that the wind speed signal $f\left( t \right)$ is composed of K modal components. The analytic signal of each mode is obtained by the Hilbert transform, and its spectrum is shifted to baseband by mixing with a complex exponential tuned to the estimated center frequency $\omega_{k}$. Finally, the bandwidth is estimated from the Gaussian smoothness of the demodulated signal, so that the variational mode decomposition problem is transformed into the constrained variational model:

$$\left\{ {\begin{array}{*{20}c} {\mathop {\min }\limits_{{\left\{ {u_{k} } \right\},\left\{ {\omega_{k} } \right\}}} \mathop \sum \limits_{k = 1}^{K} \left\| {\partial_{t} \left[ {\left( {\delta \left( t \right) + \frac{j}{\pi t}} \right) * u_{k} \left( t \right)} \right]e^{{ - j\omega_{k} t}} } \right\|_{2}^{2} } \\ {{\text{s}} .t.\mathop \sum \limits_{k = 1}^{K} u_{k} \left( t \right) = f\left( t \right)} \\ \end{array} } \right.$$

(1)

In the formula, $u_{k}$ is the k-th modal component of the wind speed and $\omega_{k}$ is the center frequency of that component; $*$ denotes the convolution operator; $\delta \left( t \right)$ is the unit impulse function; and $\partial_{t}$ is the partial derivative with respect to t. To obtain the optimal solution of the constrained variational model, a quadratic penalty term and a Lagrange multiplier are introduced, converting the constrained problem in formula (1) into the unconstrained optimization problem in formula (2).

$$L\left( {\left\{ {u_{k} } \right\},\left\{ {\omega_{k} } \right\},\lambda } \right) = \alpha \mathop \sum \limits_{k = 1}^{K} \left\| {\partial_{t} \left[ {\left( {\delta \left( t \right) + \frac{j}{\pi t}} \right) * u_{k} \left( t \right)} \right]e^{{ - j\omega_{k} t}} } \right\|_{2}^{2} + \left\| {f\left( t \right) - \mathop \sum \limits_{k = 1}^{K} u_{k} \left( t \right)} \right\|_{2}^{2} + \left\langle {\lambda \left( t \right),f\left( t \right) - \mathop \sum \limits_{k = 1}^{K} u_{k} \left( t \right)} \right\rangle$$

(2)

where $\alpha$ denotes the quadratic penalty factor, $\lambda$ represents the Lagrangian multiplier, and $*$ indicates the convolution operator. The wind speed modal components $u_{k}$ and their center frequencies $\omega_{k}$ are updated using the Alternating Direction Method of Multipliers (ADMM); the update of the modal components is shown in Eq. (3).

$$\hat{u}_{k}^{n + 1} \left( \omega \right) = \frac{{\hat{f}\left( \omega \right) - \sum\limits_{i \ne k} {\hat{u}_{i}^{n} \left( \omega \right)} + \frac{{\hat{\lambda }^{n} \left( \omega \right)}}{2}}}{{1 + 2\alpha \left( {\omega - \omega_{k}^{n} } \right)^{2} }}$$

(3)

In the formula, i indexes the modal components other than the k-th, and n is the iteration number; $\omega$ is the frequency variable after the signal is transformed from the time domain to the frequency domain; $\hat{u}\left( \omega \right)$, $\hat{f}\left( \omega \right)$ and $\hat{\lambda }\left( \omega \right)$ are the Fourier transforms of $u\left( t \right)$, $f\left( t \right)$ and $\lambda \left( t \right)$, respectively. The center frequency $\omega_{k}^{n + 1}$ and the multiplier $\hat{\lambda }^{n + 1}$ are then updated. The iteration terminates when the convergence criterion in formula (4) is satisfied.

$$\sum\limits_{k} {\frac{{\left\| {\hat{u}_{k}^{n + 1} - \hat{u}_{k}^{n} } \right\|_{2}^{2} }}{{\left\| {\hat{u}_{k}^{n} } \right\|_{2}^{2} }}} < \varepsilon$$

(4)

In the formula, ε denotes the convergence precision, which is the termination condition of the iteration. Finally, the wind speed sequence is decomposed into K narrow-band IMF components. The essence of VMD decomposition is to minimize the sum of the center frequency bandwidth of each modal component, and the sum of all modal components should be consistent with the original signal. Therefore, the penalty factor $\alpha$ and the modal decomposition number k are the key parameters of VMD decomposition performance. The selection of the appropriate modal number parameter k is particularly important for VMD, which determines how many modes are included in the decomposition results. If the k value of the modal component is too small, it may lead to incomplete signal decomposition, and multiple real modes are mixed together, resulting in modal aliasing. If the modal component k value is too large, it may lead to the emergence of false modes. The penalty factor α determines the size of the bandwidth of the modal component. The greater its magnitude, the narrower the bandwidth of the modal component will be, and some signal information is liable to be lost within the decomposed signal. Conversely, the smaller its value, the wider the bandwidth of the modal component becomes, which is inclined to incorporate signals from other components and thereby give rise to noise interference.
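The relationship between the penalty factor and the bandwidth can be read directly from the denominator of Eq. (3), which acts as a filter centered on $\omega_{k}$. The following is a minimal sketch of that relationship, not part of the authors' implementation; the function names and the sample center frequency of 0.1 (in normalized frequency units) are illustrative assumptions.

```python
import math


def mode_filter_gain(omega: float, omega_k: float, alpha: float) -> float:
    """Gain implied by the denominator of Eq. (3): 1 / (1 + 2*alpha*(omega - omega_k)^2)."""
    return 1.0 / (1.0 + 2.0 * alpha * (omega - omega_k) ** 2)


def half_gain_offset(alpha: float) -> float:
    """Distance from omega_k at which the gain has dropped to 0.5.

    Solving 1 / (1 + 2*alpha*d^2) = 0.5 for d gives d = 1 / sqrt(2*alpha).
    """
    return 1.0 / math.sqrt(2.0 * alpha)


# Illustrative values only; 1361 is the penalty factor reported later in the case study.
for alpha in (100.0, 1361.0, 5000.0):
    d = half_gain_offset(alpha)
    gain = mode_filter_gain(0.1 + d, 0.1, alpha)
    print(f"alpha={alpha:7.0f}  half-gain offset={d:.5f}  gain at offset={gain:.3f}")
```

The half-gain offset shrinks in proportion to $1/\sqrt{\alpha}$, which matches the statement that a larger penalty factor yields a narrower modal bandwidth.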

These two parameters are usually selected from empirical values, and it is difficult to choose them well by manual experience. To this end, the AOA is used to adaptively optimize the two key parameters of VMD (the penalty factor α and the modal decomposition number k). The minimum envelope entropy is taken as the objective function of the optimization: the larger the entropy value, the more noise and the fewer features a component contains, so a lower entropy is preferred.
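The text does not spell out how the envelope entropy is computed. A commonly used definition, assumed here, takes the Hilbert envelope of a component, normalizes it to a probability distribution, and computes its Shannon entropy. The sketch below uses only the Python standard library with a direct DFT, so it is suitable for short illustrative signals only; the function names and the two test signals are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal via a direct DFT (O(N^2), adequate for a short illustration)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies, zero negative ones.
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            weight = 1.0
        elif k < (n + 1) // 2:
            weight = 2.0
        else:
            weight = 0.0
        spectrum[k] *= weight
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def envelope_entropy(x):
    """Shannon entropy of the normalized Hilbert envelope (assumed definition)."""
    envelope = [abs(z) for z in analytic_signal(x)]
    total = sum(envelope)
    probs = [a / total for a in envelope]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A steady tone has a flat envelope (maximum entropy); an impulse has a concentrated one.
tone = [math.cos(2.0 * math.pi * 4 * t / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63
print("tone entropy:   ", envelope_entropy(tone))
print("impulse entropy:", envelope_entropy(impulse))
```

Under this assumed definition, an optimizer would evaluate each candidate pair (k, α) by decomposing the signal and scoring the resulting components, preferring the candidate with the lowest entropy.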

The Archimedes Optimization Algorithm (AOA) is a heuristic optimization algorithm based on population, which was put forward by Hashim and his colleagues in 2020³². The Archimedes Optimization Algorithm (AOA) takes all the individuals submerged in the liquid as a group, and mimics the buoyancy relationship that occurs when objects immersed in the liquid collide. During the iterative process, the density, volume, and acceleration of each individual are continuously adjusted to enable the individual to reach an equilibrium state. Meanwhile, the individuals with strong fitness guide the entire group to converge towards the optimal position, thereby fulfilling the purpose of optimization. In comparison to traditional optimization algorithms, this algorithm possesses characteristics such as fewer control parameters, a certain degree of randomness, and strong adaptability. Simultaneously, it has the merits of a robust local search ability and high optimization accuracy. However, it also suffers from drawbacks including a weak global search ability, low solution accuracy, and a tendency to get trapped in local optima.

AOA divides the optimization process into two stages: global exploration and local exploitation. The transfer operator (TF), which depends on the current iteration number t and the maximum number of iterations $t_{\max }$, switches between the two stages. If TF ≤ 0.5, objects collide and the algorithm is in the global exploration stage; otherwise no collision occurs and the algorithm enters the local exploitation stage.
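The text names the transfer operator but does not give its formula. In the original AOA publication it is defined as $TF = \exp \left( {\left( {t - t_{\max } } \right)/t_{\max } } \right)$; the sketch below assumes that definition, and the function names are illustrative.

```python
import math


def transfer_operator(t: int, t_max: int) -> float:
    """TF = exp((t - t_max) / t_max), as defined in the original AOA paper (assumed here)."""
    return math.exp((t - t_max) / t_max)


def aoa_phase(t: int, t_max: int) -> str:
    """Return the stage selected by the TF <= 0.5 rule described in the text."""
    return "exploration" if transfer_operator(t, t_max) <= 0.5 else "exploitation"


t_max = 100
switch_iteration = next(t for t in range(1, t_max + 1) if aoa_phase(t, t_max) == "exploitation")
print("first exploitation iteration:", switch_iteration)
```

With this definition TF grows monotonically towards 1, so roughly the first 30% of the iterations are spent on global exploration and the remainder on local search.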

Sparrow search algorithm

Xue and his colleagues proposed the Sparrow Search Algorithm (SSA) in 2020³³, a swarm intelligence optimization algorithm, inspired by the behavior of sparrows while searching for food. The algorithm possesses a powerful global search capability, being capable of promptly escaping from local extrema. It also exhibits the merits of strong optimization proficiency, rapid convergence speed, and high convergence accuracy. During the process of foraging for food, the sparrow population adheres to the discoverer-follower model and is equipped with an investigation and early warning mechanism. In the sparrow population, the individual that locates the food source is regarded as the discoverer, while the remaining individuals are considered as the followers. Simultaneously, a certain proportion of individuals within the population are chosen for risk investigation. Should any risk factors be detected, the food will be abandoned. The algorithm steps are as follows:

(1) Population Initialization. We assume that the dimension of the sparrow population position matrix is Dim, and the initial solution X of N sparrows is randomly initialized. The population position is as follows (5):

$$X = \left[ {\begin{array}{*{20}c} {x_{1,1} } & {x_{1,2} } & \cdots & \ldots & {x_{1,d} } \\ {x_{2,1} } & {x_{2,2} } & \cdots & \cdots & {x_{2,d} } \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ {x_{n,1} } & {x_{n,2} } & { \cdot \cdot \cdot } & { \cdot \cdot \cdot } & {x_{n,d} } \\ \end{array} } \right]$$

(5)

where $x_{n,d}$ is the position of the n-th sparrow in the d-th dimension; n = 1, 2, …, N indexes the sparrows in the population; and d = 1, 2, …, Dim indexes the variables to be optimized. The fitness vector of the sparrows is given in (6).

$$F_{X} = \left[ {\begin{array}{*{20}c} {f\left( {\left[ {\begin{array}{*{20}c} {x_{1,1} } & {x_{1,2} } & \cdots & \cdots & {x_{1,d} } \\ \end{array} } \right]} \right)} \\ {f\left( {\left[ {\begin{array}{*{20}c} {x_{2,1} } & {x_{2,2} } & \cdots & \cdots & {x_{2,d} } \\ \end{array} } \right]} \right)} \\ \vdots \\ {f\left( {\left[ {\begin{array}{*{20}c} {x_{n,1} } & {x_{n,2} } & \cdots & \ldots & {x_{n,d} } \\ \end{array} } \right]} \right)} \\ \end{array} } \right]$$

(6)

Here, $F_{X}$ is the population fitness vector, and each row gives the fitness of one sparrow.

(2) Discoverer position update. The task of the discoverers is to provide the location and direction of food for the entire sparrow population, so they should be given a greater global search capability. During each iteration, the discoverer position $X_{i,j}^{n + 1}$ is updated as shown in (7):

$$X_{i,j}^{n + 1} = \left\{ {\begin{array}{*{20}c} {X_{i,j}^{n} \exp \left( {\frac{ - i}{{\alpha iter_{{{\text{max}}}} }}} \right),} & {R < ST} \\ {X_{i,j}^{n} + QL,} & {R \ge ST} \\ \end{array} } \right.$$

(7)

where $X_{i,j}^{n}$ denotes the position of the i-th sparrow in the j-th dimension at the n-th iteration; $iter_{\max }$ is the maximum iteration count; $\alpha \in \left( {0,1} \right]$ is a random number; Q is a random number that follows a normal distribution; L represents a $1 \times Dim$ matrix in which every element is 1; $R \in \left[ {0,1} \right]$ indicates the warning value; and $ST \in \left[ {0.5,1} \right]$ signifies the safety threshold.

When $R < ST$, it indicates a safe state where the surrounding search environment is free of threats, allowing discoverers to conduct extensive search operations. If $R \ge ST$, it signifies that certain sparrows within the population have detected danger and issued alerts to others. In this scenario, the entire sparrow population must promptly relocate to alternative secure areas for foraging.
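The two branches of Eq. (7) can be sketched as follows. This is an illustrative reading rather than the authors' code: it assumes that α is drawn uniformly from (0, 1], that Q is standard normal, and that L is a row of ones, so that the alarm branch shifts every dimension by the same amount. The function name and arguments are hypothetical.

```python
import math
import random


def update_discoverer(x, i, iter_max, st, rng):
    """Apply Eq. (7) once to the position vector x of the i-th discoverer (i >= 1)."""
    r = rng.random()                      # warning value R in [0, 1)
    if r < st:
        # Safe state: contract the position by exp(-i / (alpha * iter_max)).
        alpha = 1.0 - rng.random()        # uniform in (0, 1], avoids division by zero
        scale = math.exp(-i / (alpha * iter_max))
        return [xj * scale for xj in x]
    # Alarm state: move by a normally distributed step Q in every dimension (L = ones).
    q = rng.gauss(0.0, 1.0)
    return [xj + q for xj in x]


rng = random.Random(0)
position = [2.0, -4.0, 1.0]
print("safe state :", update_discoverer(position, 1, 5, 1.0, rng))
print("alarm state:", update_discoverer(position, 1, 5, 0.0, rng))
```

Setting the safety threshold to 1.0 or 0.0 in the usage lines simply forces one branch or the other for demonstration; in practice ST lies in [0.5, 1].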

(3) Follower position update. Followers monitor the discoverers at all times during the search. Once a discoverer is observed to have found better food, followers immediately leave their current positions to compete for it. The follower position update formula is shown in (8):

$$X_{i,j}^{n + 1} = \left\{ {\begin{array}{*{20}c} {Q \cdot \exp \left( {\frac{{X_{{{\text{worst}}}}^{n} - X_{i,j}^{n} }}{{i^{2} }}} \right),} & {i > \frac{N}{2}} \\ {X_{B}^{n + 1} + \left| {X_{i,j}^{n} - X_{B}^{n + 1} } \right| \cdot A^{ + } \cdot L,} & {{\text{otherwise}}} \\ \end{array} } \right.$$

(8)

Here, $X_{B}^{n + 1}$ represents the optimal position occupied by the discoverers in generation n + 1; $X_{{{\text{worst}}}}^{n}$ denotes the position of the global worst individual in generation n; N is the population size; $A$ is a $1 \times Dim$ random matrix, and $A^{ + } = A^{T} \left( {AA^{T} } \right)^{ - 1}$. When $i > \frac{N}{2}$, the i-th follower, which has lower fitness, has not obtained food and is in a state of hunger. It needs to search for food elsewhere to obtain more energy.
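A sketch of the follower update may help to clarify the role of $A^{+}$. Because $A$ is a row vector, $AA^{T}$ is a scalar and the pseudo-inverse reduces to $A^{T}$ divided by the sum of squared entries. The code assumes, following the original SSA formulation, that the entries of $A$ are randomly +1 or −1, that the hunger threshold is half the population size, and that the distance term uses the follower's current position; all names are illustrative.

```python
import math
import random


def pseudo_inverse_row(a):
    """A+ = A^T (A A^T)^-1 for a 1 x Dim row vector A; A A^T is the scalar sum of squares."""
    norm_sq = sum(v * v for v in a)
    return [v / norm_sq for v in a]


def update_follower(x, i, n_pop, x_best, x_worst, rng):
    """Apply the follower rule of Eq. (8) to the i-th sparrow (i >= 1)."""
    dim = len(x)
    if i > n_pop / 2:
        # Hungry follower with poor fitness: jump to a new region.
        q = rng.gauss(0.0, 1.0)
        return [q * math.exp((x_worst[j] - x[j]) / i ** 2) for j in range(dim)]
    # Otherwise forage around the best discoverer position.
    a = [rng.choice((-1.0, 1.0)) for _ in range(dim)]   # assumed +/-1 entries
    a_plus = pseudo_inverse_row(a)
    step = sum(abs(x[j] - x_best[j]) * a_plus[j] for j in range(dim))
    return [x_best[j] + step for j in range(dim)]       # L taken as a row of ones


rng = random.Random(1)
best, worst = [1.0, 2.0], [5.0, 5.0]
print(update_follower([1.5, 2.5], 2, 10, best, worst, rng))
print(update_follower([4.0, 4.5], 9, 10, best, worst, rng))
```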

(4) Vigilante position update. The sparrows that are aware of danger are assumed to account for 10% to 20% of the population, and their initial positions are generated randomly. Their position update formula is shown in (9).

$$X_{i,j}^{n + 1} = \left\{ {\begin{array}{*{20}c} {X_{{{\text{best}}}}^{n} + \beta \left| {X_{i,j}^{n} - X_{{{\text{best}}}}^{n} } \right|,} & {f_{i} > f_{g} } \\ {X_{i,j}^{n} + k\frac{{\left| {X_{i,j}^{n} - X_{{{\text{worst}}}}^{n} } \right|}}{{\left( {f_{i} - f_{{{\text{worst}}}} } \right) + \varepsilon }},} & {f_{i} = f_{g} } \\ \end{array} } \right.$$

(9)

where $X_{{{\text{best}}}}^{n}$ denotes the global optimal position in generation n, which is the safest location; $\beta$ is the step length control coefficient, which follows a normal distribution; $k \in \left[ { - 1,1} \right]$ is a random number whose sign indicates the sparrow’s direction of movement; $f_{{{\text{worst}}}}$, $f_{i}$ and $f_{g}$ represent the global worst fitness value, the current fitness value, and the global optimal fitness value, respectively; and $\varepsilon$ is a small constant that prevents division by zero. When $f_{i} > f_{g}$, the sparrow is at the edge of the population and is vulnerable to predator attack. When $f_{i} = f_{g}$, the sparrow is in the middle of the population and is aware of the danger, so it moves closer to other individuals to reduce the risk of being preyed upon.
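The vigilante rule in (9) can be sketched in the same style. The code assumes a minimization problem, so the global optimal fitness $f_{g}$ is the smallest value, and it compares the current fitness against $f_{g}$ as defined above. The value chosen for the small constant and all names are illustrative assumptions.

```python
import random


def update_vigilante(x, f_i, f_g, f_worst, x_best, x_worst, rng, eps=1e-8):
    """Apply the danger-awareness rule of Eq. (9) to one sparrow."""
    dim = len(x)
    if f_i > f_g:
        # Edge of the population: move towards the safest (best) position.
        beta = rng.gauss(0.0, 1.0)
        return [x_best[j] + beta * abs(x[j] - x_best[j]) for j in range(dim)]
    if f_i == f_g:
        # Middle of the population: take a step whose sign is set by k.
        k = rng.uniform(-1.0, 1.0)
        denom = (f_i - f_worst) + eps
        return [x[j] + k * abs(x[j] - x_worst[j]) / denom for j in range(dim)]
    return list(x)  # not reachable when f_g is the true minimum; kept for safety


rng = random.Random(2)
print(update_vigilante([3.0, 0.5], 4.2, 1.0, 9.0, [1.0, 1.0], [6.0, -2.0], rng))
```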

Long short-term memory

LSTM is a special recurrent neural network that is especially suitable for processing and predicting time series data. It was first proposed by Hochreiter and Schmidhuber in 1997³⁴. Composed mainly of three parts (input, hidden and output layers), the network can effectively counter the gradient explosion or gradient vanishing that arises as the number of network layers grows³⁵. Unlike traditional recurrent neural networks, LSTM introduces a controllable self-loop by adding memory cell modules to the hidden layer structure, so that information can be stored and carried forward over long periods. A cell state is maintained in the hidden layer, and the input, forget, and output gates are used to manage and protect it.

The input gate decides how much of the current input is written to the current cell state: the sigmoid layer determines which values are updated, and the tanh layer generates the candidate information to be added to the cell state. The forget gate decides how much of the cell state from the previous moment is retained at the current moment, that is, what information is discarded from the cell state. For the output, a sigmoid layer in the output gate determines which parts of the cell state are output; the cell state is passed through a tanh layer and multiplied by the output of this sigmoid layer to give the output at the present moment.

The basic calculation steps of the LSTM algorithm are as follows:

Step 1: The forget gate filters out the information that is to be discarded, and the forget gate output is updated. The update formula is shown in (10):

$$f_{t} = \sigma \left( {W_{f} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{f} } \right)$$

(10)

The forget gate receives the previous output $h_{t - 1}$ and the current input $x_{t}$, then outputs the forget matrix $f_{t}$, which determines how much of the previous cell state $C_{t - 1}$ is retained. Here, $h_{t - 1}$ denotes the hidden state at the previous time step, $x_{t}$ represents the new input at the current time step, $W_{f}$ is the weight matrix of the forget gate, $b_{f}$ is the bias vector of the forget gate, and $\sigma \left( \cdot \right)$ denotes the sigmoid activation function.

Step 2: The input gate determines which new information is stored in the cell state, and the candidate values for the update are generated, as shown in (11) and (12).

$$i_{t} = \sigma \left( {W_{i} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{i} } \right)$$

(11)

$$\tilde{C}_{t} = \tanh \left( {W_{c} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{c} } \right)$$

(12)

$W_{i} ,W_{c}$ and $b_{i} ,b_{c}$ are the weights and biases of the input gate (sigmoid layer) and of the tanh layer, respectively, and $\tanh \left( \cdot \right)$ is the hyperbolic tangent function.

Step 3: The cell state is updated by combining the cell state $C_{t - 1}$ at the previous moment with the new candidate cell state $\tilde{C}_{t}$, as shown in (13).

$$C_{t} = f_{t} * C_{t - 1} + i_{t} * \tilde{C}_{t}$$

(13)

Here, $f_{t}$ represents the forget matrix, $C_{t - 1}$ is the cell state at the previous moment, $i_{t}$ is the input gate output that determines which values are updated, $\tilde{C}_{t}$ is the candidate state, and $*$ denotes element-wise multiplication.

Step 4: The output gate and the hidden state output are updated based on the cell state, as shown in (14) and (15).

$$o_{t} = \sigma \left( {W_{o} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + b_{o} } \right)$$

(14)

$$h_{t} = o_{t} * \tanh \left( {C_{t} } \right)$$

(15)

The output matrix $o_{t}$ is generated by the output gate (sigmoid layer) and determines which parts of the current cell state $C_{t}$ are output. The state $C_{t}$ is passed through the tanh layer and multiplied by $o_{t}$ to obtain the output $h_{t}$.
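Equations (10) to (15) together define one forward step of an LSTM cell. The following minimal sketch implements that step with plain Python lists so that each line maps onto one equation. The parameter dictionary layout and the helper names are illustrative assumptions, and a practical model would rely on a deep learning framework instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(w, v, b):
    """Return W . v + b for a weight matrix W (list of rows), vector v and bias b."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of an LSTM cell following Eqs. (10)-(15)."""
    z = h_prev + x_t                                                    # [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]           # Eq. (10)
    i_t = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]           # Eq. (11)
    c_hat = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]       # Eq. (12)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]  # Eq. (13)
    o_t = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]           # Eq. (14)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                  # Eq. (15)
    return h_t, c_t


# Two hidden units, one input feature, all-zero parameters for a transparent check.
hidden, features = 2, 1
zeros = {name: [[0.0] * (hidden + features) for _ in range(hidden)]
         for name in ("W_f", "W_i", "W_c", "W_o")}
zeros.update({name: [0.0] * hidden for name in ("b_f", "b_i", "b_c", "b_o")})
h_t, c_t = lstm_step([3.0], [0.0, 0.0], [1.0, -2.0], zeros)
print(h_t, c_t)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate state to 0, so the new cell state is simply half of the previous one, which makes the gating arithmetic easy to verify by hand.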

The LSTM schematic diagram shown in Fig. 1 clearly illustrates the interactions between the gating mechanisms and data flows, which helps to intuitively understand its internal computational logic.

The parameter training of LSTM follows the back propagation algorithm. The main steps are as follows:

The first step: calculate the output value of each neuron forward;

The second step: determine the optimization objective function, calculate the error of neurons, and construct the loss function.

The third step: update the network weights according to the gradient of the loss function.

The VMD-SSA-LSTM model

Based on the above analysis, a VMD-SSA-LSTM ultra-short-term wind speed prediction model is constructed from the VMD, SSA and LSTM models. The prediction process is shown in Fig. 2, and the specific steps are as follows:

Step 1: The wind speed sample data set is obtained, and the wind speed data of the first n months is chosen as the model input.

Step 2: The relevant parameters of VMD are optimized by the AOA, and the optimized k and α values are selected. The optimized VMD decomposes the wind speed signal into a series of stable single-component signals IMF(1), IMF(2), …, IMF(k) with different characteristics.

Step 3: The SSA-LSTM model is established, with the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) selected as the optimization objective function. The sparrow population size N and the maximum number of iterations M are set, and the number of hidden layer neural units H, the number of training iterations E, and the initial learning rate η are the parameters to be optimized.

Step 4: Each modal component is input into the SSA-LSTM prediction model, giving k prediction models.

Step 5: The predicted values of the k prediction models are combined to obtain the predicted ultra-short-term wind speed, which is then compared with the results of the single LSTM model and the VMD-LSTM model.
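The final aggregation can be sketched in a few lines. Since VMD represents the signal as the sum of its modal components, the sketch assumes that the component forecasts are combined by elementwise summation; the text does not state the combination rule explicitly, and the function name and sample numbers are illustrative.

```python
def combine_component_forecasts(component_forecasts):
    """Sum k per-component forecast series elementwise into one wind speed forecast."""
    lengths = {len(series) for series in component_forecasts}
    if len(lengths) != 1:
        raise ValueError("all component forecasts must have the same length")
    return [sum(values) for values in zip(*component_forecasts)]


# Hypothetical forecasts for three components over two time steps.
forecasts = [[1.0, 2.0], [0.5, 0.5], [-0.25, 0.25]]
print(combine_component_forecasts(forecasts))
```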

Case study

Dataset description

The wind speed data set provided by the National Data Buoy Center (NDBC) is taken as an example for analysis. The data were collected at station 41008, 40 nautical miles southeast of Savannah, Georgia, USA. The wind speed data for the week from 00:00 on January 1, 2022 to 23:50 on January 7, 2022 is selected. The data is sampled every 10 min, giving a total of 1008 samples.

Parameter setting

To facilitate analysis and comparison, this paper compares three schemes experimentally. First, a single LSTM is used for wind speed prediction, without variational mode decomposition of the wind speed dataset or optimization of the LSTM parameters. Second, the VMD-LSTM model is used: the wind speed data is decomposed by VMD, and the wind speed is then predicted directly by LSTM. Finally, the VMD-SSA-LSTM model is used: the AOA optimizes the VMD parameters, the optimized VMD decomposes the data, SSA then optimizes the LSTM parameters, and the LSTM with the optimal parameters forecasts the wind speed.

The wind speed data of the preceding 4 h (24 samples at the 10-min sampling interval) is used to predict the wind speed at the next moment. The first 70% of the dataset is used as the training set, and the last 30% is used as the test set. After optimization by AOA, the number of VMD modal components k is 4 and the penalty factor $\alpha$ is 1361. The decomposed modal components are shown in Fig. 3.
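The input construction can be sketched in Python as follows. Two points are assumptions, since the text does not specify them: the 70/30 split is applied to the windowed samples in chronological order, and the synthetic `series` merely stands in for the 1008 measured values. The function name `make_windows` is hypothetical.

```python
from typing import List, Tuple


def make_windows(series: List[float], lags: int) -> Tuple[List[List[float]], List[float]]:
    """Pair each block of `lags` consecutive values with the value that follows it."""
    n = len(series) - lags
    X = [series[i:i + lags] for i in range(n)]
    y = [series[i + lags] for i in range(n)]
    return X, y


LAGS = 4 * 60 // 10                           # 4 h of 10-min samples = 24 inputs
series = [float(i) for i in range(1008)]      # placeholder for the measured wind speeds

X, y = make_windows(series, LAGS)
split = len(X) * 7 // 10                      # chronological 70/30 split, no shuffling
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

print(len(X), len(X_train), len(X_test))      # 984 688 296
```

Keeping the split chronological avoids leaking future samples into the training set.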

For the VMD-SSA-LSTM model, the SSA uses a sparrow population of 20, a maximum of 5 iterations, a discoverer proportion of 20%, and a warning value of 0.6. The number of hidden layer neurons, the number of training iterations, and the initial learning rate are initially set to 35, 35, and 0.005, respectively. With the root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) as evaluation criteria, SSA optimization yielded 91 hidden layer neurons, 190 training iterations, an initial learning rate of 0.016706 and a regularization coefficient of 0.4.

The root mean square error is the square root of the mean of the squared differences between the predicted and actual values. The lower the RMSE, the better the model fits and the better the prediction. Its expression is given in formula (16).

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_{i} - y_{i}\right)^{2}}$$

(16)

where $y_{i}$ represents the actual value of wind speed, $\hat{y}_{i}$ represents the predicted value of wind speed, and n represents the number of test samples.

The mean absolute error (MAE) is the average of the absolute differences between the predicted and actual values, as shown in formula (17). The lower the MAE, the better the prediction.

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_{i} - y_{i}\right|$$

(17)

The mean absolute percentage error (MAPE) is the average of the absolute differences between the predicted and actual values, each expressed as a percentage of the actual value, as shown in formula (18). It is used to assess the precision of the prediction: the closer the value is to 0, the better the prediction. It is usually reported together with MAE and RMSE.

$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_{i} - y_{i}}{y_{i}}\right| \times 100\%$$

(18)
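Formulas (16) to (18) translate directly into code. The following Python sketch uses only the standard library; the function names and the three-point toy data are illustrative and are not taken from the experiments.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Formula (16): square root of the mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Formula (17): mean of the absolute errors."""
    n = len(y_true)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / n


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Formula (18): mean absolute relative error, in percent."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(y_true)
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / n


# Toy data: errors of +1, 0 and -2 m/s.
actual = [4.0, 5.0, 8.0]
predicted = [5.0, 5.0, 6.0]
print(rmse(actual, predicted))  # sqrt(5/3), about 1.291
print(mae(actual, predicted))   # 1.0
print(mape(actual, predicted))  # about 16.67 (percent)
```

Because formula (18) divides by the actual value, MAPE is undefined for samples with zero wind speed, which is why the sketch rejects such inputs rather than silently skipping them.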

Results and discussion

Ablation experiment

This section analyzes experimentally the effectiveness of the VMD-SSA-LSTM model for wind speed forecasting. Three models, LSTM, VMD-LSTM and VMD-SSA-LSTM, are used for prediction. The prediction results are shown in Table 1, Fig. 4 and Fig. 5.

Table 1 Result comparison between 3 model prediction methods.

Table 1 shows that the prediction accuracy of the single LSTM is about 62%, which indicates that prediction is poor when the LSTM parameters are not optimized. The VMD-LSTM model has a relative error of around 26% and a prediction accuracy of about 73%, which is 11 percentage points higher than that of the single LSTM model, suggesting that VMD decomposition improves accuracy. The prediction accuracy of the VMD-SSA-LSTM model with AOA-optimized VMD is about 97%, which is 24 percentage points higher than that of the VMD-LSTM model.
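The gains quoted above are differences between accuracy percentages, that is, percentage points rather than relative improvements. A short Python check, using only the approximate accuracies quoted in the text, shows both readings side by side; the function names are illustrative.

```python
# Approximate prediction accuracies quoted in the text, in percent.
accuracy = {"LSTM": 62.0, "VMD-LSTM": 73.0, "VMD-SSA-LSTM": 97.0}


def point_gain(new: float, old: float) -> float:
    """Absolute gain in percentage points."""
    return new - old


def relative_gain(new: float, old: float) -> float:
    """Gain relative to the baseline accuracy, in percent."""
    return 100.0 * (new - old) / old


for new, old in [("VMD-LSTM", "LSTM"), ("VMD-SSA-LSTM", "VMD-LSTM")]:
    pts = point_gain(accuracy[new], accuracy[old])
    rel = relative_gain(accuracy[new], accuracy[old])
    print(f"{new} vs {old}: +{pts:.0f} points, +{rel:.2f}% relative")
# VMD-LSTM vs LSTM: +11 points, +17.74% relative
# VMD-SSA-LSTM vs VMD-LSTM: +24 points, +32.88% relative
```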

The analysis shows that the VMD-SSA-LSTM combined prediction model proposed in this paper handles both the complexity of the data and the difficulty of manual parameter tuning when forecasting from historical wind speed data. The combined ultra-short-term wind speed prediction model is significantly more accurate than the single prediction model and provides a basis for further improving wind power prediction.

Comparison of prediction models

This section presents a detailed analysis of the VMD-SSA-LSTM prediction model. To evaluate predictive performance and adaptability, three hybrid models—VMD-SSA-LSTM, VMD-SSA-GRU, and VMD-SSA-BP—were employed for comparative forecasting. The corresponding prediction results are illustrated in Table 2, Fig. 6, and Fig. 7.

Table 2 Result comparison between 3 model prediction methods.

As shown in Table 2, the VMD-SSA-BP model exhibits relatively low prediction accuracy, with a MAPE of 9.4374% on the test set. This indicates that the traditional BP neural network is insufficient for handling complex, nonlinear wind speed time series data. In contrast, the VMD-SSA-GRU model reduces the test set MAPE to 3.3831%, suggesting that GRU has a stronger capability for modeling temporal dependencies.

Further analysis reveals that the VMD-SSA-LSTM model achieves the lowest MAPE on the test set, only 3.05349%, and performs best among the three models. Its prediction error is lower than that of the other two models, indicating that combining VMD decomposition and SSA-based parameter optimization with LSTM yields the most accurate and robust of the three predictors for wind speed forecasting.
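The text does not define "prediction accuracy". One plausible reading, stated here as an assumption and not as the authors' definition, is accuracy = 100% − MAPE. Under that reading, the test-set MAPE values reported above convert as follows in Python:

```python
# Test-set MAPE values reported in the text, in percent.
test_mape = {"VMD-SSA-BP": 9.4374, "VMD-SSA-GRU": 3.3831, "VMD-SSA-LSTM": 3.05349}


def accuracy_from_mape(mape_percent: float) -> float:
    """Assumed definition: accuracy (%) = 100 - MAPE (%)."""
    return 100.0 - mape_percent


for name, value in test_mape.items():
    print(f"{name}: MAPE {value}% -> accuracy {accuracy_from_mape(value):.2f}%")
# VMD-SSA-BP: MAPE 9.4374% -> accuracy 90.56%
# VMD-SSA-GRU: MAPE 3.3831% -> accuracy 96.62%
# VMD-SSA-LSTM: MAPE 3.05349% -> accuracy 96.95%

best = min(test_mape, key=test_mape.get)
print(best)  # VMD-SSA-LSTM
```

Under this assumption the VMD-SSA-LSTM figure is consistent with the accuracy of approximately 97% stated in the abstract.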

An analysis of the VMD-SSA-LSTM hybrid prediction model proposed in this paper shows that it can effectively address the challenges posed by the nonlinear and non-stationary characteristics of wind speed series, while also overcoming the limitations of traditional models that rely heavily on manual parameter tuning. The results demonstrate that the proposed hybrid model exhibits superior prediction performance and generalization ability in ultra-short-term wind speed forecasting, providing technical support for high-precision wind power prediction.

Comparison of optimization algorithms

To validate the superiority of the VMD-SSA-LSTM model, other optimization algorithms were substituted for the Sparrow Search Algorithm (SSA) and compared with the original model. Experiments were conducted using the same dataset and computational environment, with the results shown in Table 3.

Table 3 Comparison of results from the three models.

To isolate the contribution of the Sparrow Search Algorithm (SSA), a controlled-variable experiment was designed in which the VMD-LSTM structure was kept fixed and only the optimizer was replaced. The results show that SSA outperforms the alternatives, with an RMSE of 0.2127 and a MAPE of 3.34%. Compared with Particle Swarm Optimization (PSO), SSA achieves a 50.3% reduction in RMSE, which is attributed to its hierarchical optimization mechanism: discoverers broaden the solution space via global exploration (rapid error decline in the first 2 iterations), while followers refine parameters through local exploitation (convergence within the 3 subsequent iterations). In contrast, PSO remains trapped in local optima after 18 iterations because of its fixed inertia weight. Although the Grey Wolf Optimizer (GWO) enhances search efficiency through its α/β/δ leadership hierarchy, premature leadership solidification results in an MAE of 0.291 (81.6% higher than SSA). With its bio-inspired vigilance mechanism and hierarchical collaboration strategy, SSA outperforms the other optimizers tested, PSO and GWO, in both efficiency and accuracy for LSTM parameter optimization.
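The roles described above (discoverers, followers, and vigilant sparrows) can be illustrated with a simplified Python sketch. It loosely follows the update rules of the original SSA formulation and is not the authors' implementation. The function `sphere` is a toy objective standing in for the LSTM validation error. The defaults mirror the settings reported in the Parameter setting section (population 20, 5 iterations, 20% discoverers, warning value 0.6). The `aware_frac` value and the single move-toward-best rule for vigilant sparrows are assumptions.

```python
import math
import random
from typing import Callable, List, Tuple


def sphere(x: List[float]) -> float:
    """Toy objective (assumption): stands in for the LSTM validation error."""
    return sum(v * v for v in x)


def ssa_minimize(f: Callable[[List[float]], float], dim: int, lo: float, hi: float,
                 pop: int = 20, iters: int = 5, producer_frac: float = 0.2,
                 safety: float = 0.6, aware_frac: float = 0.2,
                 seed: int = 0) -> Tuple[List[float], float, List[float]]:
    rng = random.Random(seed)

    def clip(v: float) -> float:
        return min(hi, max(lo, v))

    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    n_prod = max(1, int(producer_frac * pop))
    best_x = min(X, key=f)[:]
    best_f = f(best_x)
    history = [best_f]                          # best-so-far value per iteration
    for _ in range(iters):
        X.sort(key=f)                           # best sparrows first
        worst = X[-1][:]
        alarm = rng.random()                    # shared alarm value for this iteration
        for i in range(n_prod):                 # discoverers: global exploration
            if alarm < safety:                  # no predator detected
                a = 1.0 - rng.random()          # uniform in (0, 1]
                X[i] = [clip(v * math.exp(-(i + 1) / (a * iters))) for v in X[i]]
            else:                               # predator detected: random move
                X[i] = [clip(v + rng.gauss(0, 1)) for v in X[i]]
        leader = min(X[:n_prod], key=f)[:]
        for i in range(n_prod, pop):            # followers: local exploitation
            if i + 1 > pop / 2:                 # worst-ranked followers fly elsewhere
                X[i] = [clip(rng.gauss(0, 1) * math.exp((w - v) / (i + 1) ** 2))
                        for v, w in zip(X[i], worst)]
            else:                               # others forage around the best discoverer
                X[i] = [clip(p + abs(v - p) * rng.choice((-1, 1)) / dim)
                        for v, p in zip(X[i], leader)]
        for i in rng.sample(range(pop), int(aware_frac * pop)):   # vigilant sparrows
            X[i] = [clip(b + rng.gauss(0, 1) * abs(v - b))
                    for v, b in zip(X[i], best_x)]
        cand = min(X, key=f)
        if f(cand) < best_f:                    # keep the best solution seen so far
            best_x, best_f = cand[:], f(cand)
        history.append(best_f)
    return best_x, best_f, history


best_x, best_f, history = ssa_minimize(sphere, dim=3, lo=-5.0, hi=5.0)
print(best_f, history)
```

For LSTM tuning, each position vector would encode the number of hidden units, the number of training iterations, and the initial learning rate, and `f` would train and score a network, which is far more expensive than this toy objective.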

Conclusion

This article proposes a combined wind speed prediction model incorporating AOA, VMD, SSA, and LSTM to address the low accuracy of wind speed prediction caused by the variability, fluctuation, and randomness of wind speeds in wind farms. The model is then applied to actual ultra-short-term wind speed prediction. The innovations and contributions of this work are as follows.

1.
The SSA-LSTM ultra-short-term wind speed prediction model is constructed using the sparrow search algorithm (SSA), which addresses the difficulty of tuning the LSTM parameters.
2.
The Archimedean optimization algorithm (AOA) is employed to optimize the two key parameters of VMD, the penalty factor and the number of modal components, thereby enhancing the precision of ultra-short-term wind speed forecasting.
3.
A combination model for ultra-short-term wind speed prediction based on VMD-SSA-LSTM is proposed. The case study shows that the combination model significantly enhances the precision of ultra-short-term wind speed forecasting.

This paper does not consider the influence of other factors on wind speed. Future work can incorporate season, weather changes, temperature, geographical location and similar factors into ultra-short-term wind speed prediction, so as to improve the schedulability of wind power.

Data availability

The wind speed data used in this paper are publicly available measured data from the National Data Buoy Center (NDBC) in the United States: https://www.ndbc.noaa.gov/historical_data.shtml.

Author information

Authors and Affiliations

School of Electrical Engineering, Yancheng Institute of Technology, Yancheng, 224051, Jiangsu, China
Shunqing Xu
State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, 300401, China
Jincai Niu
School of Electronic Engineering, Chaohu University, Hefei, 238024, Anhui, China
Jincai Niu

Contributions

Prof. Jincai Niu participated in the overall paper structure design, manuscript writing, and program development throughout the entire research process. Prof. Shunqing Xu contributed to the writing of Chapters 1 and 2, and critically reviewed, revised, and provided intellectual input into the design of the entire manuscript.

Corresponding author

Correspondence to Jincai Niu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xu, S., Niu, J. A novel combination model for ultra-short-term wind speed prediction. Sci Rep 15, 36666 (2025). https://doi.org/10.1038/s41598-025-20497-6

Received: 12 May 2025
Accepted: 15 September 2025
Published: 21 October 2025
Version of record: 21 October 2025
DOI: https://doi.org/10.1038/s41598-025-20497-6