Abstract
Accurate prediction of meteorological factors is critical across various domains such as agriculture, disaster management, and climate research. Traditional models, such as Numerical Weather Prediction (NWP), often face limitations in capturing highly non-linear and chaotic weather patterns, particularly at finer temporal and spatial scales, and they require substantial computational resources. This study introduces a deep learning model, the Hybrid LSTM Global-Local Encoder (H-LSTM-GLE), designed to enhance the accuracy of meteorological factor prediction. The H-LSTM-GLE model leverages a local encoder with a sliding window mechanism, a global encoder for secondary encoding, and a state vector calculation module to improve forecasting precision. When benchmarked against eleven baseline models across two datasets, relative humidity (SML2010-Hum) and outdoor temperature (SML2010-outTem), the H-LSTM-GLE model consistently outperforms its counterparts. Ablation studies further validate the model’s enhanced performance, attributing the improvements to the synergistic integration of the local and global encoders. This study advances the theoretical framework of sequence-to-sequence models and demonstrates the practical value of hybrid architectures in achieving high-accuracy forecasts of meteorological factors.
Introduction
Accurate forecasting of meteorological factors is paramount in various fields, including agriculture, disaster management, and climate research1,2. Reliable weather forecasts can help mitigate the impacts of extreme weather events, optimize agricultural planning, and improve our understanding of climate change. Traditional Numerical Weather Prediction (NWP) models, which solve complex physical equations describing atmospheric processes, have long been the cornerstone of weather forecasting3,4. However, these models are often limited in their ability to capture highly non-linear and chaotic weather patterns, particularly at finer temporal and spatial scales, and they require significant computational resources5.
Recently, deep learning has emerged as a promising approach to address these challenges6. In training and optimizing deep learning models for meteorological prediction, the role of big data is crucial7. Extensive meteorological datasets from satellites, ground stations, and other platforms provide a rich foundation for developing sophisticated models8. Among various deep learning techniques, Recurrent Neural Networks (RNNs) have attracted considerable attention and rapid development due to their powerful capabilities in modelling sequential data. However, traditional RNNs suffer from inherent limitations, such as short-term memory and the vanishing gradient problem, which hinder their ability to capture long-term dependencies7,8. These long-term dependencies are crucial for modelling historical patterns over long periods of time, a key factor in accurate weather forecasting. In this context, Long Short-Term Memory (LSTM) networks, a variant of RNNs, effectively address these challenges by enabling the model to capture long-term dependencies within the data9. Additionally, LSTMs are particularly well-suited to modelling temporal dependencies, a feature essential for sequential data such as meteorological variables, where future values are strongly influenced by past observations. The ability of LSTMs to mitigate the vanishing gradient problem and retain critical long-term information makes them particularly effective for time-series forecasting tasks, such as predicting meteorological factors.
Presently, LSTMs have been successfully applied to numerous time series prediction tasks, including language modelling10, speech recognition11, and financial forecasting12. Their capacity to handle temporal dependencies makes them particularly suitable for meteorological forecasting applications. Several studies have highlighted the effectiveness of LSTM networks in forecasting meteorological factors. For example, Luo et al.13 proposed the Reconstitute Spatiotemporal LSTM (RST-LSTM) based on Convolutional Recurrent Neural Network (ConvRNN) for precipitation nowcasting. Similarly, Suleman et al.14 introduced a Spatial Feature Attention Long Short-Term Memory (SFA-LSTM) model to accurately capture spatial and temporal relationships of multiple meteorological features for temperature forecasting. Additionally, Subbiah et al.15 developed a Bi-directional LSTM with Boruta Feature Selection (BFS-Bi-LSTM) model to improve wind speed forecasting.
In the Sequence to Sequence (Seq2Seq) framework, LSTMs are commonly used as encoders and decoders, converting input sequences into output sequences via an encoder, a decoder, and a state vector16. While LSTM-based Seq2Seq architectures mitigate the vanishing/exploding gradient problem, they still face limitations in analyzing high-dimensional and complex data structures: (1) In meteorological forecasting, long-range dependencies, such as the influence of past weather conditions on current forecasts, are essential. However, the encoder struggles to capture these dependencies due to gradient problems, resulting in poor retention of early sequence information and excessive sensitivity to late sequence data17,18. This reduces accuracy, as early weather data (e.g., initial temperature or humidity) significantly influences future forecasts19. (2) Meteorological data is dynamic, with sudden changes in atmospheric pressure or wind patterns that can alter forecasts. However, the encoder in traditional Seq2Seq models encodes the sequence once and reuses this fixed representation throughout decoding, limiting flexibility and the ability to adapt to evolving weather patterns20,21.
Considering these challenges, we propose a novel hybrid coding model for accurate forecasting of meteorological factors. Our approach aims to address the limitations of traditional LSTM-based Seq2Seq models. A local encoder with a sliding window, combined with an attention mechanism, addresses the model’s uneven sensitivity to information at the beginning and end of a sequence, while the two-layer hybrid coding structure formed by the global and local encoders resolves the inflexibility caused by the one-time encoding of the traditional Seq2Seq architecture. Detailed descriptions of the numerous acronyms and abbreviations used in this study are provided in Table S1. The main research contributions of our work are as follows:
- We propose a novel Seq2Seq architecture based on a Hybrid LSTM Global-Local Encoder (H-LSTM-GLE) for accurate meteorological forecasting. To our knowledge, this is the first meteorological model that integrates both global and local encoders with attention-based selection of medium- and short-term dependencies, marking a significant advance in meteorological forecasting.
- The H-LSTM-GLE model uses a local encoder with a sliding window to capture medium- and short-term dependencies, while the global encoder incorporates broader sequence patterns. This dual-level encoding improves the model’s ability to learn complex temporal and spatial relationships in meteorological data.
- A key innovation is the state vector computation module, which uses an attention mechanism to dynamically select relevant medium- and short-term dependencies, allowing the decoder to focus on critical information and capture both long- and short-term relationships in meteorological time series.
- The model demonstrates superior performance on real meteorological datasets, including relative humidity (SML2010-Hum) and outdoor temperature (SML2010-outTem), validating its robustness and practical applicability in forecasting complex meteorological variables.
Methodology
Construction of a deep learning model based on a hybrid global-local encoder
As shown in Fig. 1, the Seq2Seq architecture optimization model (H-LSTM-GLE) was developed to improve the prediction accuracy of meteorological factors. The key innovation of the H-LSTM-GLE model is its two-tier hybrid encoding structure, which integrates both local and global encoding mechanisms to effectively capture short-term and long-term dependencies in meteorological time series data. The local encoder focuses on modelling the short- and medium-term temporal dependencies within the input sequence. To achieve this, we use a sliding window approach, where a fixed-length window is moved across the input sequence to capture local patterns in the sequence data. This encoder processes a localized portion of the sequence at a time, emphasizing the immediate, dynamic relationships between the predictor and target variables. In this way, the local encoder effectively captures rapid fluctuations or shorter time trends that are critical for accurate short-term meteorological forecasts.
In contrast, the global encoder captures long-term dependencies by performing secondary encoding on the entire input sequence. This encoder processes the entire input sequence without the constraints of a sliding window, enabling it to integrate a broader historical context. While the local encoder focuses on recent patterns, the global encoder identifies overarching trends and long-term temporal relationships. By synthesizing this global perspective, the model can account for the influence of broader meteorological patterns that persist over longer periods, improving the model’s ability to forecast over longer time horizons. The outputs of the local encoder, representing the captured short- and medium-term dependencies, are passed as features to the global encoder. This step allows the model to synthesize both local dynamic information and global historical context. The global encoder then integrates these features, producing a comprehensive representation of both local and long-term relationships. By combining these two levels of encoding, the model enhances its ability to recognize complex, multi-scale patterns in meteorological data, leading to improved forecast accuracy for both short- and long-term forecasts.
A critical component of the H-LSTM-GLE model is the state vector computation module, which uses an attention mechanism to dynamically select the most relevant dependency information from the input sequence. Specifically, the attention mechanism focuses on the most strongly correlated medium- and short-term dependencies identified by the local encoder, as well as the most important long-term dependencies identified by the global encoder. This dynamic selection ensures that the decoder focuses on the most relevant information and prevents the model from being overwhelmed by less relevant data. Consequently, the attention mechanism ensures that both critical short-term fluctuations and long-term trends are effectively captured in the final predictions. We deliberately chose a fixed window length for the local encoder to capture a consistent set of medium- and short-term temporal dependencies across all sequences. The global encoder, on the other hand, processes the entire input sequence, which inherently covers a wider temporal range. Thus, the two encoders work in complementary ways: the local encoder focuses on fine-grained, recent patterns, while the global encoder synthesizes information over longer periods. The sliding window approach in the local encoder ensures that the model can efficiently handle the temporal scale of short-term fluctuations, while the global encoder provides a holistic view of the broader historical context.
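To make the two-tier data flow concrete, the following minimal PyTorch sketch illustrates one possible realization of the local and global encoders described above. It is an illustrative simplification rather than the authors' implementation: the class names, the choice of the window's last hidden state as the summary \(\:{w}_{t}^{l}\), the toy window size, and the concatenation of each input step with its aligned window summary in the global LSTM are our assumptions; the attention-based state vector module and the decoder are omitted here.

```python
import torch
import torch.nn as nn


class LocalEncoder(nn.Module):
    """Encodes each sliding window of length 2m+1 with an LSTM and returns one
    summary vector (the last hidden state) per window, i.e. a stand-in for w_t^l."""

    def __init__(self, input_dim: int, hidden_dim: int, window_size: int):
        super().__init__()
        self.window_size = window_size
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim) -> overlapping windows with stride 1
        windows = x.unfold(1, self.window_size, 1)      # (batch, n_win, input_dim, window)
        windows = windows.permute(0, 1, 3, 2)           # (batch, n_win, window, input_dim)
        b, n, w, d = windows.shape
        _, (h_n, _) = self.lstm(windows.reshape(b * n, w, d))
        return h_n[-1].reshape(b, n, -1)                # (batch, n_win, hidden_dim)


class GlobalEncoder(nn.Module):
    """Second-stage LSTM that re-encodes the sequence, consuming each input step
    together with the aligned local-window summary."""

    def __init__(self, input_dim: int, local_dim: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(input_dim + local_dim, hidden_dim, batch_first=True)

    def forward(self, x_center: torch.Tensor, w_local: torch.Tensor):
        out, (h_n, _) = self.lstm(torch.cat([x_center, w_local], dim=-1))
        return out, h_n[-1]                             # per-step states, final hidden state


# Toy usage: a univariate 24 h history (96 steps at 15-min sampling).
m = 3                                                   # half-window; 2m+1 = 7 (illustrative only)
x = torch.randn(8, 96, 1)
local = LocalEncoder(input_dim=1, hidden_dim=128, window_size=2 * m + 1)
glob = GlobalEncoder(input_dim=1, local_dim=128, hidden_dim=128)
w_l = local(x)                                          # (8, 90, 128) medium/short-term summaries
g_out, h_g = glob(x[:, m:96 - m, :], w_l)               # window centers aligned with their summaries
```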
Local encoder operation procedure
Figure 2 illustrates the structure of the local encoder in the H-LSTM-GLE model. Initially, the sliding window mechanism22 is employed to capture the medium- and short-term dependencies \(\:{w}_{t}^{l}\) of each window sequence. For each time step \(\:t\), a local window \(\:{W}_{t}\) of size \(\:2m+1\) is defined. This window contains the sequence \(\:{X}_{t}=[{x}_{t-m},\dots\:,{x}_{t},\dots\:,{x}_{t+m}]\), encompassing the input \(\:{x}_{t}\) at time step \(\:t\) along with its preceding and succeeding \(\:m\) time steps’ inputs. The computation of the forget gate \(\:{f}_{t}^{l}\) within the local encoder units in \(\:{W}_{t}\) is depicted in Eq. (1):
where the forget gate \(\:{f}_{t}^{l}\) determines how much information from the previous unit is retained within the local window \(\:{W}_{t}\). \(\:{W}_{f}\) is the forget gate weight matrix, \(\:{b}_{f}\) is the bias term, \(\:\sigma\:\) is the sigmoid activation function, and \(\:{h}_{t-1}\) is the hidden state of the previous LSTM unit within \(\:{W}_{t}\). \(\:{f}_{t}^{l}\) is a vector between 0 and 1, which, when multiplied element-wise with the previous cell state \(\:{c}_{t-1}\), determines the amount of information to retain. The computation of the input gate \(\:{i}_{t}^{l}\) is shown in Eq. (2):
where the input gate \(\:{i}_{t}^{l}\) controls how much information from \(\:{x}_{t}\) is added to the cell state. \(\:{W}_{i}\) is the input gate weight matrix, and \(\:{b}_{i}\) is the bias term. The computation of the candidate cell state \(\:{\stackrel{\sim}{c}}_{t}^{l}\) is illustrated in Eq. (3):
where \(\:{W}_{c}\) is the weight matrix for the candidate cell state, \(\:{b}_{c}\) is the bias term, and tanh is the hyperbolic tangent activation function, producing values in the range of − 1 to 1. \(\:{\stackrel{\sim}{c}}_{t}^{l}\) generates new candidate values for updating the cell state. Subsequently, combining the forget gate \(\:{f}_{t}^{l}\) and the input gate \(\:{i}_{t}^{l}\) updates the cell state, described by Eq. (4):
The output gate \(\:{o}_{t}^{l}\) determines which parts of the cell state within the local window \(\:{W}_{t}\) can be passed to the hidden state, as described by Eq. (5):
where \(\:{W}_{o}\) is the output gate weight matrix, \(\:{b}_{o}\) is the bias term.
Finally, the output gate and the cell state jointly determine the update process of the hidden state, as shown in Eq. (6):
where the hidden state \(\:{h}_{t}^{l}\) encompasses the output information at the current time step \(\:t\), which will be transmitted to the next LSTM unit at the subsequent time step and can be utilized for the final output of the window \(\:{W}_{t}\) sequence. Through the transformation of sequence \(\:{X}_{t}\), the resulting hidden state \(\:{h}_{t}^{l}\) captures the medium- and short-term dependencies for the local window \(\:{W}_{t}\). On one hand, \(\:{h}_{t}^{l}\) can be used for subsequent sequence processing, and on the other hand, it can be transmitted to the global encoder, thereby providing the global encoder with the medium- and short-term dependencies of the sequence, \(\:{w}_{t}^{l}={h}_{t}^{l}\). Through this calculation method, as the window \(\:{W}_{t}\) continually slides, the local encoder can maintain the long-term flow of information and gradients, simultaneously capturing the local dependencies of the sequence within the window.
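Assuming the standard LSTM gate formulation implied by the definitions above, with \([\cdot]\) denoting concatenation and \(\odot\) element-wise multiplication, a reconstruction of Eqs. (1)–(6) reads:

$$f_t^{l}=\sigma\left(W_f\left[h_{t-1},x_t\right]+b_f\right) \tag{1}$$
$$i_t^{l}=\sigma\left(W_i\left[h_{t-1},x_t\right]+b_i\right) \tag{2}$$
$$\tilde{c}_t^{l}=\tanh\left(W_c\left[h_{t-1},x_t\right]+b_c\right) \tag{3}$$
$$c_t^{l}=f_t^{l}\odot c_{t-1}+i_t^{l}\odot\tilde{c}_t^{l} \tag{4}$$
$$o_t^{l}=\sigma\left(W_o\left[h_{t-1},x_t\right]+b_o\right) \tag{5}$$
$$h_t^{l}=o_t^{l}\odot\tanh\left(c_t^{l}\right) \tag{6}$$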
Global encoder structure in the H-LSTM-GLE model
The structure of the global encoder within the H-LSTM-GLE model is illustrated in Fig. 3, which consists of two types of LSTM units: a standard LSTM unit and an LSTM unit that integrates the short-term dependency \(\:{w}_{t}^{l}\) from the local encoder19. The integration of \(\:{w}_{t}^{l}\) into the global encoder enhances its ability to synthesize local dynamic information with global historical context.
The design of the forget gate \(\:{f}_{t}^{g}\) for the global encoder unit \(\:{h}_{t}^{g}\) is shown in Eq. (7):
where \(\:{W}_{f}\) is the weight matrix, and \(\:{b}_{f}\) is the bias term. By incorporating \(\:{w}_{t}^{l}\), the forget gate can make more informed decisions based on long-term state, current input, and local encoder information, effectively filtering historical information.
Similarly, the input gate \(\:{i}_{t}^{g}\) utilizes \(\:{w}_{t}^{l}\) to determine how the current input features and local context should update the cell state, as described in Eq. (8):
where \(\:{W}_{i}\) is the weight matrix for the input gate, and \(\:{b}_{i}\) is the bias term. Additionally, the calculation process for the candidate cell state is shown in Eq. (9):
where \(\:{W}_{c}\) is the weight matrix for the candidate cell state, and \(\:{b}_{c}\) is the bias term. By incorporating local information, the candidate cell state generates more accurate cell state updates, reflecting both local and global contexts, as described in Eq. (10):
This updated cell state simultaneously reflects the forgotten old information and the newly added local information, maintaining long-term memory and enhancing the model’s ability to capture internal dependencies within the time series22. The output gate \(\:{o}_{t}^{g}\) is influenced by local information \(\:{w}_{t}^{l}\), as shown in Eq. (11):
where \(\:{W}_{o}\) is the weight matrix for the output gate, and \(\:{b}_{o}\) is the bias term.
Finally, the hidden state \(\:{h}_{t}^{g}\) is updated using the output gate and cell state, as described in Eq. (12):
The final hidden state \(\:{h}_{t}^{g}\) blends information from both the local and global encoders. This mixed information is passed to the next time step, influencing the final output sequence. By integrating \(\:{w}_{t}^{l}\) into the LSTM unit, the global encoder can synthesize local dynamic information with global historical information, providing a comprehensive sequence representation. This integration allows the H-LSTM-GLE model to reflect the combined effects of local and long-term dependencies more accurately, enhancing prediction accuracy for meteorological factors such as relative humidity and temperature.
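Assuming that the local summary \(\:{w}_{t}^{l}\) enters each gate through concatenation with the previous global hidden state and the current input, which is our reading of the description above, Eqs. (7)–(12) take the form:

$$f_t^{g}=\sigma\left(W_f\left[h_{t-1}^{g},x_t,w_t^{l}\right]+b_f\right) \tag{7}$$
$$i_t^{g}=\sigma\left(W_i\left[h_{t-1}^{g},x_t,w_t^{l}\right]+b_i\right) \tag{8}$$
$$\tilde{c}_t^{g}=\tanh\left(W_c\left[h_{t-1}^{g},x_t,w_t^{l}\right]+b_c\right) \tag{9}$$
$$c_t^{g}=f_t^{g}\odot c_{t-1}^{g}+i_t^{g}\odot\tilde{c}_t^{g} \tag{10}$$
$$o_t^{g}=\sigma\left(W_o\left[h_{t-1}^{g},x_t,w_t^{l}\right]+b_o\right) \tag{11}$$
$$h_t^{g}=o_t^{g}\odot\tanh\left(c_t^{g}\right) \tag{12}$$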
State vector calculation method
In the Seq2Seq architecture, the state vector bridges the encoder and decoder, transmitting information between them23,24. The state vector \(\:{s}_{t}\) is derived using an attention mechanism from the global encoder’s output \(\:{h}_{t}^{g}\), the local encoder’s output \(\:{h}_{t}^{l}\), and the short-term dependencies \(\:{W}^{l}={w}_{1}^{l},{w}_{2}^{l},\dots\:,{w}_{j}^{l}\) generated by each sliding window in the local encoder. For each short-term dependency \(\:{w}_{i}^{l}\), the attention weight \(\:{\alpha\:}_{ti}\) is calculated as Eq. (13):
where \(\:{\alpha\:}_{ti}\) represents the attention weight of the decoder state at time step \(\:t\) for the \(\:i\)-th short-term dependency, \(\:{h}_{t-1}^{d}\) is the decoder hidden state from the previous time step, \(\:j\) is the number of sliding windows in the local encoder, and \(\:score(\bullet\:)\) is the scoring function. This study employs the additive attention scoring function, as shown in Eq. (14):
where \(\:{v}_{a}\), \(\:{W}_{a}\), and \(\:{U}_{a}\) are learnable weight matrices, and \(\:{v}_{a}^{{\top\:}}\) denotes the transpose of \(\:{v}_{a}\). The attention weights \(\:{\alpha\:}_{ti}\) dynamically select the most relevant short-term dependency information, enabling the decoder to focus on the parts of the input sequence that are crucial for the current output, thereby improving the accuracy of the output sequence.
Using the attention weights, the weighted sum of the short-term dependencies forms the context vector \(\:{c}_{t}^{a}\), as described in Eq. (15):
where the context vector \(\:{c}_{t}^{a}\) represents a weighted summary of the local encoder information at different time steps, providing the decoder with a dynamic representation of the current portion of the input sequence and helping to capture local features and dependencies.
Next, the global encoder’s output \(\:{h}_{t}^{g}\), the local encoder’s output \(\:{h}_{t}^{l}\), and the context vector \(\:{c}_{t}^{a}\) are combined to form the final state vector \(\:{s}_{t}\), as shown in Eq. (16):
where \(\:{W}_{s}\) is a learnable weight matrix and \(\:{b}_{s}\) is a learnable bias vector. The combination enables the model to capture both long-term and short-term dependencies, enhancing the accuracy of the decoder output.
Finally, the state vector \(\:{s}_{t}\) is fed into each time step of the decoder, along with the ground truth values from the training data, to generate the output for the next time step, as described in Eq. (17):
where \(\:{h}_{t}^{d}\) is the decoder hidden state at the current time step, and \(\:{y}_{t-1}\) is the ground truth value from the training data.
The state vector \(\:{s}_{t}\) allows the decoder to focus on different parts of the input sequence as needed by incorporating global, local, and short-term dependency information, thereby improving the model’s prediction flexibility. Additionally, \(\:{s}_{t}\) provides rich contextual information to the decoder at each time step, aiding in generating more accurate and coherent sequences.
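Consistent with the definitions above, and assuming a softmax normalization of the additive scores, a \(\tanh\) nonlinearity in the state-vector combination, and a generic LSTM step for the decoder update (the activation and decoder form are our assumptions), Eqs. (13)–(17) can be written as:

$$\alpha_{ti}=\frac{\exp\left(score\left(h_{t-1}^{d},w_i^{l}\right)\right)}{\sum_{k=1}^{j}\exp\left(score\left(h_{t-1}^{d},w_k^{l}\right)\right)} \tag{13}$$
$$score\left(h_{t-1}^{d},w_i^{l}\right)=v_a^{\top}\tanh\left(W_a h_{t-1}^{d}+U_a w_i^{l}\right) \tag{14}$$
$$c_t^{a}=\sum_{i=1}^{j}\alpha_{ti}\,w_i^{l} \tag{15}$$
$$s_t=\tanh\left(W_s\left[h_t^{g},h_t^{l},c_t^{a}\right]+b_s\right) \tag{16}$$
$$h_t^{d}=\mathrm{LSTM}\left(\left[y_{t-1},\,s_t\right],\,h_{t-1}^{d}\right) \tag{17}$$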
Datasets and setup
Experimental setup
To validate the effectiveness of the H-LSTM-GLE in sequence prediction tasks, both the global encoder and the local encoder modules were constructed using LSTM layers. Each encoder utilized LSTM layers with 128 hidden units, a parameter configuration determined to be optimal by cross-validation on the dataset. Experiments were conducted on a computer equipped with an NVIDIA RTX 2060 GPU and 64 GB of RAM. All models were implemented using Python 3.8 and the PyTorch 1.9.0 framework. Parameter optimization was performed using Stochastic Gradient Descent (SGD) with momentum set to 0.9. The initial learning rate was set to 0.001. To account for learning rate decay, a learning rate adjustment strategy was used in which the learning rate was halved if the validation loss did not decrease for two consecutive iterations, with a maximum of five reductions. To prevent overfitting, an early stopping mechanism was introduced, which stopped training if the validation loss did not improve for 10 consecutive epochs. For our experiments, the input sequence length was set to 24 h (96 time steps) and the output sequence length to 1 h (4 time steps). This setup allows the model to use a comprehensive temporal context to make accurate predictions and is consistent with research on multi-step prediction with the Seq2Seq architecture25. The Root Mean Square Error (RMSE) was used as the loss function, defined as Eq. (18):
where \(\:{y}_{i}\) and \(\:{\widehat{y}}_{i}\) represent the actual and H-LSTM-GLE predicted values, respectively.
The experimental results are based on the average of five independent runs, and to ensure reproducibility, the same random seed was set for all experiments.
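The optimizer, learning-rate schedule, loss, and early-stopping rules described above map directly onto standard PyTorch components. The sketch below is illustrative only: the model is a stand-in, the epoch loop is abbreviated, rmse_loss implements the RMSE of Eq. (18), and the patience and min_lr settings are our encoding of the "two consecutive iterations" and "maximum of five reductions" rules.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ReduceLROnPlateau


def rmse_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Root Mean Square Error used as the training loss (Eq. 18)."""
    return torch.sqrt(nn.functional.mse_loss(pred, target))


model = nn.LSTM(1, 128, batch_first=True)   # stand-in for the H-LSTM-GLE network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# Halve the learning rate when the validation loss stalls for two epochs;
# min_lr caps the schedule at five halvings of the initial 1e-3 rate.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5,
                              patience=2, min_lr=1e-3 * 0.5 ** 5)

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(200):
    # ... one training pass over the training set would go here ...
    val_loss = 1.0                          # placeholder for the measured validation loss
    scheduler.step(val_loss)
    if val_loss < best_val - 1e-8:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:          # early stopping after 10 stagnant epochs
            break
```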
Analysis and pre-processing of datasets
Analysis of the datasets
In this study, comparative experiments were performed on two publicly available real-world datasets: SML2010-Hum (relative humidity) and SML2010-outTem (outdoor temperature). The SML2010 dataset comprises climate monitoring data collected over 40 days, containing 21 types of time series indicators, including relative humidity, outdoor temperature, and indoor temperature, with 4137 samples for each indicator. The data were sampled every minute and smoothed with a 15-min mean, following the methodology of Qin et al.26.
In the feature selection process, we carefully considered monotonic and non-monotonic effects, redundancy, and relevance to ensure that the selected features contributed meaningfully to the predictive accuracy of weather forecasts. The Pearson correlation coefficient between SML2010-Hum and SML2010-outTem is -0.57, indicating a moderate negative linear relationship between the two features, which is important for reliable time series forecasting. Although there is some redundancy between the features, the relatively low correlation is insufficient to justify the removal of either feature on the basis of redundancy alone. Furthermore, the fact that the model performs well on both weakly correlated features suggests strong generalizability, demonstrating the model’s ability to adapt to different data distributions and feature patterns. Additionally, this correlation reflects the complementary nature of the meteorological factors: temperature influences the moisture-holding capacity of the air, while humidity represents the moisture content of the air. Both features are highly relevant to the forecasting task and are essential for modelling phenomena such as dew point, precipitation potential, and temperature variations.
In addition, the distributions of the target variables, SML2010-Hum and SML2010-outTem, are shown in Fig. 4A and B. The histograms and kernel density estimates (KDEs) show that both target variables have approximately normal distributions, but exhibit some skewness and kurtosis, indicating potential challenges for the prediction models. All predictor variables correspond to the same spatial location as the target variable, ensuring consistency in spatial correlation. The degree of imbalance in the datasets was also assessed by examining the frequency distributions of the target variables. As shown in Fig. 4C and D, the relative humidity data and outdoor temperature data exhibit slight imbalances with higher frequencies around certain humidity and temperature levels. These imbalances must be carefully managed to ensure model robustness and generalization.
Data pre-processing
Data pre-processing included Min-Max normalization, dataset splitting, and time warping. As relative humidity and outdoor temperature were measured by different sensors, their original scales differed significantly. For example, relative humidity typically ranges from 0 to 100, while temperature can range from 0 to 40 degrees Celsius. To prevent the larger range of one type of data from dominating model training and to ensure stable numerical behavior, we applied Min-Max normalization to scale the data to the range [0, 1]. Before splitting the dataset, we did not perform any data shuffling, in order to preserve the temporal structure that is essential for time series forecasting models. The dataset was split into training, validation, and test sets in a ratio of 8:1:1, resulting in 3339 samples for training, 399 for validation, and 399 for testing. Additionally, global time warping and local time warping with a 12-h cycle were applied to the training set to allow the model to capture diverse patterns of variation in meteorological factors during training. To ensure the reliability of the evaluation, the test set remained in its original state, unaffected by any data augmentation. Fig. 4E displays a time series plot of the first 30 days of data, which visually illustrates the persistence of the target variables under a 15-min sampling interval, as well as the distribution and degree of imbalance in the datasets.
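A minimal NumPy sketch of the splitting, normalization, and windowing steps is given below; it is illustrative only. Fitting the Min-Max scaler on the training split and the make_windows helper are our assumptions, and the global/local time-warping augmentation is not reproduced here.

```python
import numpy as np

series = np.random.rand(4137)                     # stand-in for one SML2010 indicator

# Chronological 8:1:1 split with no shuffling (the paper reports 3339/399/399 samples).
n_train, n_val = 3339, 399
train = series[:n_train]
val = series[n_train:n_train + n_val]
test = series[n_train + n_val:]

# Min-Max normalization to [0, 1]; fitting the scaler on the training split only
# is our assumption, the paper specifies only the target range.
lo, hi = train.min(), train.max()
scale = lambda a: (a - lo) / (hi - lo)
train, val, test = scale(train), scale(val), scale(test)


def make_windows(a: np.ndarray, in_len: int = 96, out_len: int = 4):
    """Build (input, target) pairs: 24 h of history (96 steps) -> next 1 h (4 steps)."""
    X, Y = [], []
    for i in range(len(a) - in_len - out_len + 1):
        X.append(a[i:i + in_len])
        Y.append(a[i + in_len:i + in_len + out_len])
    return np.stack(X), np.stack(Y)


X_train, Y_train = make_windows(train)            # time-warping augmentation omitted here
```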
Model evaluation metrics
The performance of the H-LSTM-GLE model on the two public datasets is evaluated using three metrics: Mean Squared Error (MSE)27, Mean Absolute Error (MAE)28, and Mean Absolute Percentage Error (MAPE)29,30.
MSE is a commonly used performance evaluation metric in sequence prediction tasks, quantifying the discrepancy between predicted and actual values27. A smaller MSE indicates higher prediction accuracy27. Due to the squared term, MSE is highly sensitive to outliers. The calculation of MSE is given by Eq. (19):
where \(\:n\) represents the number of samples in the test set; \(\:{y}_{test}^{\left(i\right)}\) denotes the \(\:i\)-th actual observation in the test set, and \(\:{\widehat{y}}_{test}^{\left(i\right)}\) denotes the \(\:i\)-th predicted value. The term \(\:({y}_{test}^{\left(i\right)}-{\widehat{y}}_{test}^{\left(i\right)}{)}^{2}\:\)represents the squared difference between the actual and predicted values, which eliminates the sign of the error and assigns greater weight to larger errors.
MAE is another widely used metric to assess the quality of prediction results, directly reflecting the average deviation of predicted values from actual values28. Lower MAE values indicate higher prediction accuracy of the model28. Unlike MSE, MAE does not involve squared terms, making it less sensitive to outliers. The calculation of MAE is given by Eq. (20):
MAPE uses absolute values to avoid the cancellation of positive and negative errors, making it suitable for comparing the accuracy of various time series models31,32. Theoretically, a smaller MAPE value indicates a better fit of the prediction model, demonstrating higher accuracy32. The calculation of MAPE is shown in Eq. (21):
where \(\:{y}_{i}\) and \(\:{\widehat{y}}_{i}\) represent the actual value and the predicted value by H-LSTM-GLE at time \(\:i\), respectively.
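Using the notation defined above, the standard definitions corresponding to Eqs. (19)–(21) are:

$$MSE=\frac{1}{n}\sum_{i=1}^{n}\left(y_{test}^{(i)}-\hat{y}_{test}^{(i)}\right)^{2} \tag{19}$$
$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|y_{test}^{(i)}-\hat{y}_{test}^{(i)}\right| \tag{20}$$
$$MAPE=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \tag{21}$$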
Hyperparameter tuning procedure of models
To validate the predictive capability of the H-LSTM-GLE model on meteorological factors such as humidity and temperature, we compared its performance with that of eleven baseline models on the test sets of the SML2010-Hum and SML2010-outTem datasets. The baseline models for univariate data include: Cross-dimension Dependency Transformer (Crossformer)33, Frequency Enhanced Decomposed Transformer (FEDformer)34, Pointer-Generator Networks (PGNET)35, Copying Neural Network (COPYNET)36, Convolutional Sequence to Sequence Network (ConvS2S)37, Neural Attentive Recommendation Machine (NARM)38, Neural Basis Expansion Analysis with Exogenous Variables (N-BEATS)39, and Deep Autoregressive RNN (DeepAR)40. For multivariate data, the baseline models include: Wind Resource Quality Assisted Spatial Attention (WRQASA)25, Spatiotemporal Convolutional Sequence to Sequence Network (STConvS2S)41, and SFA-LSTM14.
For each baseline model and our H-LSTM-GLE model, we performed a rigorous hyperparameter tuning procedure to ensure a fair comparison. The tuning process was conducted by grid search or random search, depending on the hyperparameter space of the model, using a validation set for optimization. We systematically explored combinations of key hyperparameters such as learning rate, batch size, number of layers, hidden state size, and dropout rate. Specifically, for models such as Crossformer, FEDformer, and PGNET, we tested different configurations for the number of attention heads and the size of the attention window, while for models such as N-BEATS and DeepAR, we varied the number of units in the fully connected layers. In all cases, the objective was to maximize the performance of the model on the validation set in terms of forecasting accuracy, measured by standard metrics such as MAE and MSE.
Hyperparameter optimization was conducted within a reasonable computational budget, ensuring that each baseline model was trained with optimal settings. The tuning methods, parameter ranges, and selected hyperparameter values are shown in Table 1. After the tuning process, the final selected models were evaluated on the test sets of the SML2010-Hum and SML2010-outTem datasets. The results were compared with those of the H-LSTM-GLE model to assess their relative performance.
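For reference, a grid search of the kind described above can be written compactly. The sketch below is a generic illustration: the parameter grid and the train_and_validate helper are hypothetical placeholders rather than the settings listed in Table 1.

```python
from itertools import product

# Hypothetical search space; the actual tuning methods and ranges are given in Table 1.
grid = {
    "lr": [1e-3, 5e-4, 1e-4],
    "batch_size": [32, 64],
    "hidden_size": [64, 128, 256],
    "dropout": [0.0, 0.2],
}


def train_and_validate(**params) -> float:
    """Placeholder: train a model with `params` and return its validation MAE."""
    return 0.0


best_params, best_mae = None, float("inf")
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    val_mae = train_and_validate(**params)
    if val_mae < best_mae:                 # keep the configuration with the lowest validation error
        best_mae, best_params = val_mae, params

print(best_params, best_mae)
```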
Results and discussion
Optimization of parameters in H-LSTM-GLE
We optimized two key parameters of the H-LSTM-GLE model on the validation sets of the SML2010-Hum and SML2010-outTem datasets, namely the sliding window step size and the hidden state size, to ensure the predictive accuracy of the model.
Optimization of the sliding window step size
The local encoder of the H-LSTM-GLE model extracts short-term dependency features using a sliding window. To optimize the sliding window step size for capturing these dependencies, we conducted an experiment on the SML2010-Hum and SML2010-outTem datasets, testing step sizes of 1, 2, 4, 6, 8, 10 and 12.
As shown in Table S2 and Fig. 5A and B, since the fluctuations in both datasets are strongly correlated with diurnal cycles, the model achieved the best performance in terms of MSE, MAE, and MAPE metrics when the sliding step size was set to 6. This step size effectively captures short-term dependencies, minimizing prediction errors. Conversely, a step size of 1, which removes the sliding window effect, resulted in suboptimal performance, highlighting the importance of the sliding window in feature extraction. Additionally, step sizes larger than 6 resulted in a gradual increase in MSE, MAE, and MAPE, indicating a decrease in predictive accuracy. This is due to the model’s reduced ability to capture short-term dependencies, which blur into longer-term dependencies, reducing the effectiveness of the local encoder. In summary, a sliding window step size of 6 is optimal for the H-LSTM-GLE model and provides the best predictive performance. Therefore, this step size will be used in subsequent comparative experiments.
Optimization of the hidden state size
Truncating the training and test data disrupts the hidden states and prevents gradients from fully propagating; therefore, we chose to input the entire training sequence to predict the subsequent test sequence. Thus, the selection of the optimal hidden state size is crucial for the predictive performance of the H-LSTM-GLE model. We also conducted optimization experiments on the SML2010-Hum and SML2010-outTem datasets. As shown in Table S3 and Fig. 5C and D, the performance of the H-LSTM-GLE model deteriorates when the hidden state size is either too large or too small. The hidden state size determines the dimensionality of the internal state representation of the H-LSTM-GLE model. A hidden state size that is too small fails to adequately capture the complex patterns in time series data, while an excessively large hidden state size leads to overfitting problems and increased computational cost. Experimental results on the SML2010-Hum and SML2010-outTem datasets indicate that a hidden state size of 128 achieves the best performance in terms of MSE, MAE, and MAPE metrics. The H-LSTM-GLE model uses a local encoder and a global encoder to capture short- and long-term dependencies within the time series, respectively. When the hidden state size is 128, the model excels in capturing these dependencies, particularly in the selection and representation of medium- and short-term dependencies, by effectively utilizing the attention mechanism to dynamically select the most relevant information. Therefore, we adopted 128 as the optimal hidden state size for subsequent experiments.
Comparison of models in humidity and temperature datasets
Comparison of evaluation metrics results
We performed a comparison between our H-LSTM-GLE model and several traditional univariate time series forecasting models, including NARM, COPYNET, PGNET, and ConvS2S. As shown in Table 2 and Fig. 6, the H-LSTM-GLE model outperformed these models on both test sets. This can be attributed to its improved capability to capture short- and medium-term dependencies, as well as its optimized approach to computing state vectors. On the SML2010-Hum dataset, the H-LSTM-GLE exhibited a 54.10% (P < 0.001), 53.28% (P < 0.001), and 47.37% (P < 0.01) reduction in MSE, MAE, and MAPE, respectively, compared to NARM (Fig. 6A). On the SML2010-outTem dataset, H-LSTM-GLE showed a significant improvement, with reductions in MSE, MAE, and MAPE of 18.32% (P < 0.001), 19.25% (P < 0.01), and 12.07% (P < 0.001), respectively, compared to NARM (Fig. 6B). This demonstrates H-LSTM-GLE’s excellent capability in capturing short- and medium-term dependencies within sequences through a sliding window approach. Moreover, models such as COPYNET and PGNET, which use copying mechanisms to improve the capture of features, performed worse than H-LSTM-GLE on the SML2010-Hum and SML2010-outTem datasets (Fig. 6). Compared to PGNET, H-LSTM-GLE reduced the MSE, MAE and MAPE by 58.47% (P < 0.001), 58.44% (P < 0.01) and 60.72% (P < 0.001) on the SML2010-Hum dataset. Compared to COPYNET, H-LSTM-GLE reduced the MSE, MAE and MAPE by 61.42% (P < 0.001), 61.21% (P < 0.001) and 56.07% (P < 0.001) on the SML2010-Hum dataset, indicating that H-LSTM-GLE outperforms copying mechanism-based models on these datasets.
In addition, we also compared the H-LSTM-GLE model with several advanced univariate time series forecasting models, including Crossformer, FEDformer, N-BEATS, and DeepAR. As shown in Figs. 6 and 7, the H-LSTM-GLE model demonstrated superior performance in these comparative experiments. Compared to Crossformer, H-LSTM-GLE reduced the MSE, MAE, and MAPE by 50.75% (P < 0.001), 49.74% (P < 0.001) and 43.24% (P < 0.001) on the SML2010-Hum dataset, and by 4.29% (P > 0.05), 3.82% (P > 0.05) and 7.97% (P > 0.05) on the SML2010-outTem dataset (Fig. 6). Compared to FEDformer, H-LSTM-GLE also reduced the MSE, MAE, and MAPE by 0.223 (P < 0.001), 0.209 (P < 0.001), and 0.882 (P < 0.001) on the SML2010-Hum dataset (Fig. 6). These comparisons indicate that despite the significant performance of the Transformer models in time series forecasting, the H-LSTM-GLE model, with further optimization, achieves even higher predictive accuracy. In comparison with the autoregressive recurrent network-based DeepAR model, the H-LSTM-GLE model achieved reductions in MSE, MAE, and MAPE of up to 54.73% (P < 0.01), 54.39% (P < 0.05), and 43.81% (P < 0.05) on the SML2010-Hum dataset (Fig. 7). When compared to N-BEATS, the H-LSTM-GLE model showed a performance improvement of 61.57% (MSE, P < 0.001) on the SML2010-Hum dataset and 38.62% (MAE, P < 0.001) on the SML2010-outTem dataset. Collectively, these results highlight the significant predictive accuracy advantage of the H-LSTM-GLE model over eight baseline univariate models.
Comparison of H-LSTM-GLE and baseline models on SML2010-Hum (A) and SML2010-outTem (B) datasets. Significance between the H-LSTM-GLE model and 11 baseline models was analyzed using the t-test. The symbol ‘○’ denotes control group, ‘☆’ denotes P > 0.05, ‘★’ denotes P < 0.05, ‘★★’ denotes P < 0.01, ‘★★★’ denotes P < 0.001.
The preceding comparisons showed that the H-LSTM-GLE model outperforms several traditional and advanced univariate models in forecasting meteorological factors. Recently, advanced models based on multivariate data, such as WRQASA, STConvS2S, and SFA-LSTM, have also been used to predict meteorological factors (Fig. 7). Accordingly, we extended our comparison to include these baseline models. As these three baseline models use multivariate input data, we added auxiliary data columns to their input. For outdoor humidity prediction, we added indoor humidity and light intensity as additional input features alongside outdoor humidity; similarly, for outdoor temperature prediction, indoor temperature and light intensity data were added. During the experiments, we found that these three multivariate models generally outperformed the univariate baseline models. For the WRQASA model, the inclusion of auxiliary data significantly improved its performance (Table 2; Fig. 7). Specifically, its MSE slightly outperformed that of H-LSTM-GLE by 4.26% (P > 0.05) on the SML2010-Hum dataset and 3.65% (P > 0.05) on the SML2010-outTem dataset. However, H-LSTM-GLE achieved lower MAE and MAPE, with reductions of 6.34% (P > 0.05) and 6.29% (P < 0.05) on the SML2010-Hum dataset, and 1.95% (P > 0.05) and 6.47% (P > 0.05) on the SML2010-outTem dataset (Fig. 7). This suggests that the WRQASA model, with the aid of additional auxiliary data, can effectively exploit its N-encoder-1-decoder multi-channel fusion group forecasting Seq2Seq architecture, which processes multiple input channels in parallel. This architecture can reveal correlations across the dimensions of multivariate data, thereby improving MSE. However, as MAE and MAPE are more sensitive to small errors, the WRQASA model was inferior to H-LSTM-GLE in minimizing small errors. In the comparisons with STConvS2S and SFA-LSTM, we found that although these models benefit from auxiliary data, their forecasting capabilities were still weaker than those of H-LSTM-GLE, by 41.15% (MAPE, P < 0.01) and 27.67% (MAPE, P < 0.001) on the SML2010-Hum dataset, respectively (Fig. 7). The STConvS2S model, which relies on CNNs for feature extraction from time series data, was less effective than the hybrid encoder of H-LSTM-GLE in extracting local features in the complex context of meteorological forecasting (Fig. 7). Additionally, SFA-LSTM, although effective in handling multivariate data, lacks the multi-layer encoding mechanism of H-LSTM-GLE (Fig. 7). Overall, the H-LSTM-GLE model, even without auxiliary data, effectively captures critical dependencies within meteorological data through its hybrid LSTM global-local encoder structure and attention mechanism, reducing both computational overhead and data pre-processing costs.
The above research has demonstrated the performance superiority of the H-LSTM-GLE model on meteorological factor datasets. To analyze the stability performance of the model, we extended the comparative experiments between H-LSTM-GLE and six baseline models to larger-scale datasets with complex fluctuations (Nasdaq100 and Electricity Transformer Temperature (ETT), Supplementary Material). As shown in Fig. S1, H-LSTM-GLE exhibits significant performance improvements on both Nasdaq100 and ETT datasets compared to six baseline models (Crossformer, FEDformer, PGNET, COPYNET, ConvS2S, NARM), demonstrating that the H-LSTM-GLE model has good stability and generalization in different types of sequence data prediction tasks.
Comparison of H-LSTM-GLE and baseline models on SML2010-Hum (A) and SML2010-outTem (B) datasets. Significance between the H-LSTM-GLE model and 11 baseline models was analyzed using the t-test. The symbol ‘○’ denotes control group, ‘☆’ denotes P > 0.05, ‘★’ denotes P < 0.05, ‘★★’ denotes P < 0.01, ‘★★★’ denotes P < 0.001.
Comparison of time series prediction
We also present the time series plots of the predictions made by the H-LSTM-GLE model and eleven baseline models over five consecutive days on the test set. The time series plots provide a clear visual comparison of the prediction errors between each model and the actual values at each time step. As shown in Fig. 8A, the predictions of the H-LSTM-GLE model closely match the actual values, indicating that the hybrid encoding structure of H-LSTM-GLE, together with the incorporation of attention mechanisms and sliding windows, enables it to handle more complex meteorological factors. In contrast, the predictions of traditional univariate models, such as NARM (Fig. 8B), COPYNET (Fig. 8C), PGNET (Fig. S2A), and ConvS2S (Fig. S2B), show significantly larger deviations from the actual values in the test set. This suggests that univariate models based on copy mechanisms or traditional convolutional methods face challenges in capturing long-term dependencies, especially for tasks such as predicting meteorological factors, where long-term trends are more pronounced. Although the predictions of Transformer-based models, such as Crossformer (Fig. 9A) and FEDformer (Fig. S2C), have slightly larger discrepancies than those of H-LSTM-GLE, their errors are still significantly smaller than those of the traditional models, and they also outperform N-BEATS (Fig. 9B) and the autoregressive recurrent network-based DeepAR (Fig. 9C). This indicates that Transformer-based models have stronger capabilities in capturing long-range dependencies in time series data than N-BEATS and DeepAR, with N-BEATS struggling particularly with non-linear variations in the data.
Finally, we also present the time series plots for multivariate prediction models (Fig. 10). It is evident that the predictions from WRQASA (Fig. 10A), STConvS2S (Fig. 10B), and SFA-LSTM (Fig. 10C) exhibit minimal discrepancies from the actual values, with WRQASA’s predictions being almost identical to the actual values, showing a very small difference compared to H-LSTM-GLE. This indicates that the H-LSTM-GLE model, despite not using any additional auxiliary data, achieves similar prediction performance to the optimal multivariate prediction model, WRQASA, which does incorporate such data. The similarity in prediction ability between the two models for meteorological factor data suggests that H-LSTM-GLE, relying solely on its hybrid LSTM global-local encoder structure and attention mechanisms, can deliver comparable results. However, since H-LSTM-GLE does not require auxiliary data, it incurs lower computational costs and data preprocessing overhead.
Ablation experiments on the H-LSTM-GLE model
To further validate the impact of the local and global encoder modules on the predictive performance of the H-LSTM-GLE model, we performed ablation experiments on the SML2010-Hum and SML2010-outTem datasets. As shown in Table 3, the full H-LSTM-GLE model, which includes both the global and local encoders, significantly outperforms models with only the local encoder or only the global encoder. On the SML2010-Hum dataset, the H-LSTM-GLE model reduced the MSE by factors of 6.14 and 10.86 compared to the variants without the local encoder and without the global encoder, respectively. Similarly, on the SML2010-outTem dataset, the MSE was reduced by factors of 6.90 and 7.71, respectively. These results demonstrate the importance of the hybrid encoder approach. Additionally, the model with only the local encoder outperformed the model with only the global encoder, highlighting the effectiveness of the sliding window in capturing short- and medium-term dependencies.
Conclusion
In this study, we proposed a novel deep learning model based on a global-local hybrid encoder (H-LSTM-GLE) for efficient prediction of meteorological factors, such as relative humidity and temperature. The local encoder with a sliding window captures short- to medium-term dependencies, while the global encoder re-encodes the input sequence and integrates these dependencies. The state vector computation module uses an attention mechanism to dynamically select relevant information. Compared to eleven baseline models, the H-LSTM-GLE performs exceptionally well on the SML2010-Hum and SML2010-outTem datasets. Ablation experiments also confirmed the importance of both encoder modules, further demonstrating that the constructed deep learning model can efficiently achieve accurate predictions of meteorological factors. Notably, the H-LSTM-GLE model addresses the problem of varying sensitivity to the beginning and end of long meteorological data sequences in traditional forecasting models, as well as the problem of inflexibility due to the single encoding process that cannot adapt to dynamic weather patterns. Future research will build on these findings to incorporate external factors such as atmospheric pressure, wind speed, and solar radiation to further improve the model’s prediction accuracy.
Data availability
All data generated or analyzed during this study are included in this published article.
References
Wang, L. et al. Meteorological sequence prediction based on multivariate space-time auto regression model and fractional calculus grey model. Chaos Soliton Fract. 128, 203–209. https://doi.org/10.1016/j.chaos.2019.07.056 (2019).
Jia, S. P. et al. Effectiveness of cascading time series models based on meteorological factors in improving health risk prediction. Environ. Sci. Pollut R. 29, 9944–9956. https://doi.org/10.1007/s11356-021-16372-2 (2022).
Gibson, J. K. A production multi-tasking numerical weather prediction model. Comput. Phys. Commun. 37, 317–327. https://doi.org/10.1016/0010-4655(85)90168-7 (1985).
Buizza, R. The value of a variable resolution approach to numerical weather prediction. Mon Weather Rev. 138, 1026–1042. https://doi.org/10.1175/2009MWR3077.1 (2010).
Thiruvengadam, P., Indu, J. & Ghosh, S. Significance of 4DVAR radar data assimilation in weather research and forecast model-based nowcasting system. J. Geophys. Res-Atmos. 125, e2019JD031369. https://doi.org/10.1029/2019JD031369 (2020).
Khoei, T. T., Slimane, H. O. & Kaabouch, N. Deep learning: systematic review, models, challenges, and research directions. Neural Comput. Appl. 35, 23103–23124. https://doi.org/10.1007/s00521-023-08957-4 (2023).
Peng, L. et al. The advances and challenges of deep learning application in biological big data processing. Curr. Bioinform. 13, 352–359. https://doi.org/10.2174/1574893612666170707095707 (2018).
Kow, P. Y., Lee, M. H., Sun, W., Yao, M. H. & Chang, F. J. Integrate deep learning and physically-based models for multi-step-ahead microclimate forecasting. Expert Syst. Appl. 210, 118481. https://doi.org/10.1016/j.eswa.2022.118481 (2022).
Nguyen, N. A., Dang, T. D., Verdu, E. & Solanki, V. K. Short-term forecasting electricity load by long short-term memory and reinforcement learning for optimization of hyper-parameters. Evol. Intell. 16, 1729–1746. https://doi.org/10.1007/s12065-023-00869-5 (2023).
Bormann, T. On the interaction of verbal short-term memory and language processing: evidence from cognitive neuropsychology. Psychol. Rundsch. 61, 18–24. https://doi.org/10.1026/0033-3042/a000006 (2010).
Chen, Y. L., Wang, N. C., Ciou, J. F. & Lin, R. Q. Combined bidirectional long short-term memory with mel-frequency cepstral coefficients using autoencoder for speaker recognition. Appl. Sci-Basel. 13, 7008. https://doi.org/10.3390/app13127008 (2023).
Huang, Y. S., Gao, Y. L., Gan, Y. & Ye, M. A new financial data forecasting model using genetic algorithm and long short-term memory network. Neurocomputing 425, 207–218. https://doi.org/10.1016/j.neucom.2020.04.086 (2021).
Luo, C. Y., Xu, G. N., Li, X. T. & Ye, Y. M. The reconstitution predictive network for precipitation nowcasting. Neurocomputing 507, 1–15. https://doi.org/10.1016/j.neucom.2022.07.061 (2022).
Suleman, M. A. R. & Shridevi, S. Short-term weather forecasting using Spatial feature attention based LSTM model. IEEE Access. 10, 82456–82468. https://doi.org/10.1109/ACCESS.2022.3196381 (2022).
Subbiah, S. S., Paramasivan, S. K., Arockiasamy, K., Senthivel, S. & Thangavel, M. Deep learning for wind speed forecasting using Bi-LSTM with selected features. Intell. Autom. Soft Co. 35, 3829–3844. https://doi.org/10.32604/iasc.2023.030480 (2023).
Jang, M., Seo, S. & Kang, P. Recurrent neural network-based semantic variational autoencoder for sequence-to-sequence learning. Inf. Sci. 490, 59–73. https://doi.org/10.1016/j.ins.2019.03.066 (2019).
Fan, C. & Li, Y. Research on Chinese word segmentation method of sequence labeling based on recurrent neural networks. Intell. Comput. Appl. 12837, 316–326 (2021).
Ghimire, S. et al. Stacked LSTM sequence-to-sequence autoencoder with feature selection for daily solar radiation prediction: a review and new modeling results. Energies 15, 1061. https://doi.org/10.3390/en15031061 (2022).
Malik, M. H., Ghous, H., Ahsan, I. & Ismail, M. Saraiki language hybrid stemmer using rule-based and LSTM-based sequence-to-sequence model approach. Innovative Comput. Rev. 2, 18–40. https://doi.org/10.32350/icr.0202.02 (2022).
Xiang, Z., Yan, J. & Demir, I. A rainfall-runoff model with LSTM-based sequence-to-sequence learning. Water Resour. Res. 56, 1–17. https://doi.org/10.1029/2019WR025326 (2020).
Mahjoub, A. B. & Atri, M. A flexible high-level fusion for an accurate human action recognition system. J. Circ. Syst. Comp. 29, 2050190. https://doi.org/10.1142/S021812662050190X (2020).
Anshuman, A. & Eldho, T. I. Entity aware sequence to sequence learning using LSTMs for estimation of groundwater contamination release history and transport parameters. J. Hydrol. 608, 127662. https://doi.org/10.1016/j.jhydrol.2022.127662 (2022).
Yin, H., Zhang, X., Wang, F., Zhang, Y. & Jin, J. Rainfall-runoff modeling using LSTM-based multi-state-vector sequence-to-sequence model. J. Hydrol. 598, 126378. https://doi.org/10.1016/j.jhydrol.2021.126378 (2021).
Han, H., Choi, C., Jung, J. & Kim, H. S. Deep learning with long short term memory based sequence-to-sequence model for rainfall-runoff simulation. Water 13, 437. https://doi.org/10.3390/w13040437 (2021).
Xu, S. et al. A multi-step wind power group forecasting seq2seq architecture with spatial–temporal feature fusion and numerical weather prediction correction. Energy 291, 130352. https://doi.org/10.1016/j.energy.2024.130352 (2024).
Qin, Y. et al. A dual-stage attention-based recurrent neural network for time series prediction. Conf. Artif. Intell. 7, 2627–2633. https://doi.org/10.5555/3172077.3172254 (2017).
Zhou, W. & Bovik, A. C. Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE Signal. Proc. Mag. 26, 98–117. https://doi.org/10.1109/MSP.2008.930649 (2009).
Tomás, L. M. Estimation of a probability with guaranteed normalized mean absolute error. IEEE Commun. Lett. 13, 817–819. https://doi.org/10.1109/LCOMM.2009.091128 (2009).
McKenzie, J. Mean absolute percentage error and bias in economic forecasting. Econ. Lett. 113, 259–262. https://doi.org/10.1016/j.econlet.2011.08.010 (2011).
Ahmar, A. S. Forecast error calculation with mean squared error (MSE) and mean absolute percentage error (MAPE). J. Infor Vis. 1, 94–96. https://doi.org/10.35877/454RI.jinav303 (2020).
Nourbakhsh, Z. & Habibi, N. Combining LSTM and CNN methods and fundamental analysis for stock price trend prediction. Multimed Tools Appl. 82, 17769–17799. https://doi.org/10.1007/s11042-022-13963-0 (2022).
Feda, A. K., Adegboye, O. R., Agyekum, E. B., Hassan, A. S. & Kamel, S. Carbon emission prediction through the harmonization of extreme learning machine and INFO algorithm. IEEE Access. 12, 2169–3536. https://doi.org/10.1109/ACCESS.2024.3390408 (2024).
Zhang, Y. & Yan, J. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. Int. Conf. Learn. Represent. 1, 1–21 (2023).
Zhou, T., Ma, Z., Wen, Q., Wang, X., Sun, L. & Jin, R. Frequency enhanced decomposed transformer for long-term series forecasting. Int. Conf. Mach. Learn. 162, 27268-27286. https://doi.org/10.48550/arXiv.2201.12740 (2022).
See, A., Liu, P. J. & Manning, C. D. Get to the point: summarization with pointer-generator networks. Annu. Meet Assoc. Comput. Linguist. 1, 1073–1083. https://doi.org/10.18653/v1/P17-1099 (2017).
Gu, J., Lu, Z. & Li, V. O. K. Incorporating copying mechanism in sequence-to-sequence learning. Annu. Meet Assoc. Comput. Linguist. 1, 1631–1640. https://doi.org/10.18653/v1/P16-1154 (2016).
Gehring, J., Auli, M., Grangier, D., Yarats, D. & Dauphin, Y. N. Convolutional sequence to sequence learning. Int. Conf. Mach. Learn. 70, 1243–1252 (2017).
Ren, P. J., Chen, Z., Ren, Z., Lian, J. & Ma, T. Neural attentive session-based recommendation. Conf. Inf. Knowl. Manage. 1419–1428. https://doi.org/10.1145/3132847.3132926 (2017).
Souto, H. G. & Moradi, A. Introducing NBEATSx to realized volatility forecasting. Expert Syst. Appl. 242, 122802. https://doi.org/10.1016/j.eswa.2023.122802 (2024).
Salinas, D., Flunkert, V., Gasthaus, J. & Januschowski, T. DeepAR: probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 36, 1181–1191. https://doi.org/10.1016/j.ijforecast.2019.07.001 (2020).
Castro, R., Souto, Y.M., Ogasawara, E., Porto, F., Bezerra, E. STConvS2S: spatiotemporal convolutional sequence to sequence network for weather forecasting. Neurocomputing 426, 285–298. https://doi.org/10.1016/j.neucom.2020.09.060 (2021).
Acknowledgements
We would like to express sincere gratitude to all the investigators who supported this study.
Author information
Contributions
Guoqiang Sun and Xiaoyan Qi wrote the main manuscript text. Guoqiang Sun and Yang Zhao prepared Figs. 1–10 and Tables 1, 2 and 3. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sun, G., Zhao, Y. & Qi, X. Sequence to sequence architecture based on hybrid LSTM global and local encoders approach for meteorological factors forecasting. Sci Rep 15, 22753 (2025). https://doi.org/10.1038/s41598-025-08331-5