Introduction

To achieve the dual-carbon goals of carbon peaking and carbon neutrality, China is accelerating the construction of a new power system with renewable energy as its core. Among renewable sources, solar energy plays a particularly important role due to its wide availability, environmental friendliness, and scalability. As a clean and sustainable energy source, photovoltaic (PV) power generation not only reduces environmental pollution but also serves a broad range of economic and public service needs, especially in a geographically vast country like China. Driven by supportive national policies, PV installations across China have experienced rapid growth.

The increasing penetration of PV systems has made them a vital component of the electricity supply structure, significantly affecting grid operations and reserve planning. Accurate and reliable PV power forecasting has thus become a key enabling technology for enhancing system efficiency and improving renewable energy integration. However, the chaotic nature of atmospheric systems introduces considerable uncertainty and randomness into solar power output, complicating prediction tasks. Moreover, as more influencing factors are incorporated into PV prediction models, both internal time-series dependencies and complex, fluctuating meteorological inputs must be accounted for. These data are characterized by strong nonlinearity, non-stationarity, and heteroscedasticity, making the modeling of input–output mappings particularly challenging. The increasing heterogeneity and volume of multi-source data further complicate analysis and modeling tasks.

With the rapid advancement of renewable energy integration and the construction of new-type power systems, both domestic and international scholars have conducted extensive studies on short-term forecasting of renewable energy output. Forecasting methodologies are primarily classified into point prediction, interval prediction, and random scenario (probabilistic distribution) prediction. Point prediction aims to provide a deterministic estimate for a future time step and commonly utilizes methods such as numerical weather prediction, wavelet neural networks, and least squares support vector machines1,2,3. In contrast, interval prediction outputs a range of possible values for uncertain variables, partially reflecting their stochastic nature. Methods including Gaussian processes and deep learning frameworks have been widely employed for this purpose4,5,6, offering improved robustness yet remaining susceptible to deviations caused by random fluctuations.

Random scenario prediction, also known as probabilistic distribution forecasting, models the possible realizations of uncertain variables by estimating their probability distributions over time. This approach is particularly vital for handling the high variability and intermittency of renewable sources such as photovoltaic (PV) and wind power. It addresses the limitations of point and interval forecasts in capturing the true distributional characteristics of uncertainty. For example, Zhang7 uses Quantile Regression (QR) to generate short-term stochastic scenarios of PV output, while Huang8 further improves this with a quantile convolutional neural network. Similarly, He9 applies a Gaussian quantile-kernel density estimation model to capture wind power distributions, and Fan10 utilizes the swing door algorithm along with fuzzy c-means clustering to classify patterns of wind power fluctuation and predict short-term stochastic scenarios. Despite these advances, QR and swing door-based methods are heavily reliant on large-scale historical datasets, limiting their generalizability in data-scarce environments.

To mitigate data dependency, generative adversarial networks (GANs) have been introduced for renewable energy prediction tasks. References11,12 demonstrate that GAN-based models can generate wind power distributions resembling historical data, facilitating reliable scenario generation even with limited samples. However, standard GAN architectures fall short in modeling temporal dependencies intrinsic to PV outputs. To address this, Time-series Generative Adversarial Networks (TimeGAN) have been proposed as a novel solution. Leveraging the inherent time correlations in PV data, TimeGAN effectively captures both static and temporal features to generate synthetic data that align with the real probabilistic distribution. The application of TimeGAN in PV forecasting has shown promising results, with visualization and empirical evaluation confirming its ability to generate realistic output scenarios.

In parallel, deep recurrent models such as LSTM and GRU have shown strong capabilities in learning temporal dependencies within time-series data. LSTM models have been successfully applied in various domains: for instance, Ehsan et al.13 used LSTM to model water absorption behavior in composites, while Zhang et al.14 proposed a multilevel LSTM integrated with ultrasonic detection for defect identification. Jian et al.15 combined BiLSTM with attention and transfer learning to predict composite fatigue life with high accuracy. To further enhance long sequence learning, an extended LSTM (xLSTM) was proposed16,17, incorporating deeper network structures and advanced gating mechanisms. While LSTM and GRU models are adept at capturing long-range dependencies, they often underperform in extracting localized features or modeling non-stationary transitions across multi-stage processes, such as the evolution of damage in complex systems.

To overcome these limitations, hybrid neural architectures have emerged, combining different model strengths. Recent works have explored combinations such as CNN-LSTM18 and LSTM-Transformer19, achieving improved accuracy in sequence prediction tasks. In this context, integrating TimeGAN for data generation, xLSTM for long-term temporal modeling, and Transformer for global attention-based feature extraction offers a promising composite framework for accurate, probabilistic, and robust photovoltaic power forecasting under complex and uncertain conditions.

Recent studies have increasingly emphasized the importance of hybrid learning frameworks and data-driven integration strategies in renewable energy systems and complex prediction tasks. At the planning and system level, city-scale photovoltaic deployment and spatial optimization have been investigated to support large-scale PV integration under urban constraints21. From the perspective of short-term operation, advanced deep learning architectures incorporating spatiotemporal modeling and attention mechanisms have demonstrated improved forecasting robustness for photovoltaic clusters under highly variable conditions20. In parallel, learning-based approaches have been widely adopted in power-electronics-dominated energy systems to enhance dynamic control performance and reliability22,23. Beyond the energy domain, recent advances in multi-model fusion learning show that combining heterogeneous predictors and leveraging complementary outputs can significantly improve overall prediction accuracy and generalization capability24. These studies collectively indicate that integrating data augmentation, multi-scale temporal modeling, and fusion-based learning is a promising direction, while existing methods still face challenges in handling data scarcity and complex temporal dependencies—issues explicitly addressed by the proposed TimeGAN–xLSTM–Transformer framework.

(1) This study introduces a TimeGAN-driven data augmentation strategy to address the data scarcity and imbalance issues commonly encountered in photovoltaic (PV) forecasting. By leveraging the temporal modeling capability of TimeGAN, we synthetically generate diverse and realistic PV time series while preserving the intrinsic correlation patterns between key environmental variables such as irradiance and temperature. This enhanced dataset improves the generalization capacity of downstream prediction models under varied operating conditions.

(2) This study proposes a novel hybrid architecture that integrates extended Long Short-Term Memory (xLSTM) networks with Transformer modules. The xLSTM component incorporates a matrix-based memory structure (mLSTM), enabling efficient parallel processing and enhanced local feature extraction across multi-stage temporal patterns. Meanwhile, the Transformer submodule contributes global contextual awareness through self-attention mechanisms, thereby effectively capturing long-range dependencies and complex interactions within the PV generation sequences.

(3) This study validates the proposed TimeGAN-xLSTM-Transformer framework using real-world operation data obtained from the State Grid of China. Experimental results demonstrate that our method significantly outperforms traditional machine learning and deep learning baselines in terms of prediction accuracy and robustness. The empirical evaluation confirms the framework’s applicability and reliability in practical PV power forecasting scenarios under complex and uncertain environmental conditions.

Methodology

To comprehensively capture the intrinsic temporal dynamics and uncertainty in photovoltaic (PV) power generation data, this study proposes a multi-component framework that processes historical PV time series from three key perspectives: data augmentation, temporal pattern learning, and dependency modeling, as shown in Fig. 1.

Fig. 1

The multi-component framework of TimeGAN-xLSTM-Transformer.

First, a TimeGAN-based data augmentation module is employed to address the data sparsity and imbalance issues, especially under extreme fluctuation scenarios. By integrating autoregressive modeling, generative adversarial learning, and temporal dynamics embedding, TimeGAN generates synthetic PV time series that preserve both statistical properties and temporal dependencies of real-world data, thereby enriching the training dataset. Second, an xLSTM network is utilized to extract temporal features across multiple timescales. The model integrates convolutional feature extraction and memory mechanisms, enabling it to simultaneously capture localized fluctuations and long-term trends in PV output. This ensures a more accurate reflection of dynamic variations caused by environmental and operational factors. Finally, a Transformer module is introduced to model complex interdependencies and potential nonlinear evolution patterns within the PV power sequences. Its self-attention mechanism allows for flexible weighting of relevant time steps, facilitating a deeper understanding of how contextual changes affect future outputs.

TimeGAN-based synthetic data generation for photovoltaic time series

During the model training process, the scarcity of extreme fluctuation scenarios in photovoltaic (PV) data poses a significant challenge. Training prediction models directly on such limited data may lead to underfitting and insufficient generalization. To address this issue, a time-series data augmentation model based on Generative Adversarial Networks (GANs) is proposed, aiming to generate synthetic PV power sequences that share similar distributions with the original extreme fluctuation scenarios. TimeGAN extends the traditional GAN framework by incorporating the inherent temporal dynamics of sequential data. It consists of four key components: an embedding network, a recovery network, a sequence generator, and a sequence discriminator, as illustrated in Fig. 2. The embedding and recovery networks form an autoencoding component, while the generator and discriminator constitute the adversarial component. These two parts are trained jointly, enabling TimeGAN to simultaneously learn meaningful representations, generate realistic sequences, and capture temporal dependencies.
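The four-network layout described above can be sketched as a toy data flow. Simple linear maps stand in for the recurrent networks, and all sizes (96 steps per day, 4 input features, latent width 8) are illustrative assumptions rather than the settings used in this study:

```python
import numpy as np

rng = np.random.default_rng(6)
T, d_x, d_h = 96, 4, 8                          # assumed: steps/day, features, latent width

# Toy linear stand-ins for the four TimeGAN networks (the real model uses RNNs)
E = rng.normal(scale=0.3, size=(d_x, d_h))      # embedding network
R = rng.normal(scale=0.3, size=(d_h, d_x))      # recovery network
G = rng.normal(scale=0.3, size=(d_h, d_h))      # sequence generator dynamics
D = rng.normal(scale=0.3, size=d_h)             # sequence discriminator head

x = rng.normal(size=(T, d_x))                   # real PV feature sequence
h = np.tanh(x @ E)                              # embed real data into latent space
x_hat = h @ R                                   # recover back to the feature space

z = rng.normal(size=(T, d_h))                   # random noise input
h_tilde = np.zeros((T, d_h))
for t in range(1, T):                           # autoregressive generation, cf. Eq. (3)
    h_tilde[t] = np.tanh(h_tilde[t - 1] @ G + z[t])

scores = 1.0 / (1.0 + np.exp(-(h_tilde @ D)))   # discriminator judges latent codes
print(x_hat.shape, h_tilde.shape, scores.shape)  # → (96, 4) (96, 8) (96,)
```

The key structural point is that the generator and discriminator operate on latent codes, not raw sequences, so the adversarial game happens in the space learned by the autoencoding component.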

Fig. 2

TimeGAN model structure.

Let \(\:{s}_{t}\) denote static features and \(\:{x}_{t}\) denote time-series features. The embedding function \(\:e\) maps them into latent representations \(\:{h}_{t}\):

$$\:{h}_{t}=e({s}_{t},{x}_{t})=\left({e}^{S}\left({s}_{t}\right),{e}^{X}\left({h}_{t-1},{x}_{t}\right)\right)$$
(1)

where \(\:{e}^{S}:S\to\:H\) and \(\:{e}^{X}:H\times\:X\to\:H\) are the static and temporal embedding networks, respectively.

The recovery function \(\:r\) maps the latent representation back to the feature space:

$$\:\left({\widehat{s}}_{t},{\widehat{x}}_{t}\right)=r\left({h}_{t}\right)=\left({r}^{S}\left({h}_{t}\right),{r}^{X}\left({h}_{t}\right)\right)$$
(2)

The generator creates latent codes from randomly sampled inputs \(\:{z}^{S},{z}_{t}\) :

$$\:{\stackrel{\sim}{h}}_{t}=g\left({z}^{S},{z}_{t}\right)=\left({g}^{S}\left({z}^{S}\right),{g}^{X}\left({\stackrel{\sim}{h}}_{t-1},{z}_{t}\right)\right)$$
(3)

The discriminator operates in the latent space using a bidirectional RNN and feedforward classifier to distinguish real from generated data:

$$\:{\stackrel{\sim}{y}}_{t}=d\left({\stackrel{\sim}{h}}_{t}\right)=\left({d}^{S}\left({u}_{t}\right),{d}^{X}\left({u}_{t}\right)\right)$$
(4)

where \(\:{u}_{t}\) encodes contextual information via forward and backward hidden states.

To ensure accurate reconstruction of input data, the reconstruction loss is defined as:

$$\:{L}_{recon}=\mathbb{E}\left[{\parallel{s}_{t}-{\widehat{s}}_{t}\parallel}^{2}+\sum\limits_{t=1}^{T}{\parallel{x}_{t}-{\widehat{x}}_{t}\parallel}^{2}\right]$$
(5)

The unsupervised adversarial loss that guides the generator and discriminator is:

$$\:{L}_{unsup}=\mathbb{E}\left[\text{log}d\left({h}_{t}\right)+\text{log}\left(1-d\left({\stackrel{\sim}{h}}_{t}\right)\right)\right]$$
(6)

A supervised loss is introduced to align the next-step prediction from real and synthetic latent sequences:

$$\:{L}_{sup}=\mathbb{E}\left[{\parallel{h}_{t}-g\left({h}_{t-1},{z}_{t}\right)\parallel}^{2}\right]$$
(7)

TimeGAN jointly optimizes reconstruction, adversarial, and supervised losses to effectively model the dynamics of time-series data and enhance data quality. Its novelty lies in simultaneously learning both global and conditional stepwise distributions, enabling realistic and temporally coherent sequence generation.
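The three loss terms of Eqs. (5)–(7) can be sketched on toy arrays. The linear discriminator and supervisor below are stand-ins for the trained networks, and all dimensions are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_x, d_h = 96, 4, 8                          # assumed sequence length and widths

x       = rng.normal(size=(T, d_x))             # real features
x_hat   = x + 0.05 * rng.normal(size=(T, d_x))  # recovery-network reconstruction
h       = rng.normal(size=(T, d_h))             # latent codes of real data
h_tilde = rng.normal(size=(T, d_h))             # latent codes from the generator

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Eq. (5): reconstruction loss (time-series part; the static term is analogous)
L_recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))

# Eq. (6): unsupervised adversarial loss, with a toy linear discriminator in latent space
w = rng.normal(size=d_h)
y_real, y_fake = sigmoid(h @ w), sigmoid(h_tilde @ w)
L_unsup = np.mean(np.log(y_real) + np.log(1.0 - y_fake))

# Eq. (7): supervised next-step loss on real latents, with a toy linear "supervisor"
W_sup = 0.1 * rng.normal(size=(d_h, d_h))
L_sup = np.mean(np.sum((h[1:] - h[:-1] @ W_sup) ** 2, axis=1))

print(L_recon, L_unsup, L_sup)
```

In training, the autoencoding pair minimizes the reconstruction term, the generator and discriminator play the adversarial game on the unsupervised term, and the supervised term forces the generator's stepwise latent dynamics to match those of real sequences.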

xLSTM-based multi-scale temporal feature extraction

In this study, the Extended Long Short-Term Memory (xLSTM) network is employed to capture both short-term local and long-term global temporal dependencies within PV features. The xLSTM extends the conventional LSTM by incorporating two additional components: the short-term memory module (sLSTM) and the multi-scale memory module (mLSTM), as shown in Fig. 3.

Fig. 3

xLSTM model structure.

Unlike conventional LSTM architectures that rely on a single vector-based cell memory, the extended LSTM (xLSTM) introduces enhanced memory structures and multi-scale temporal modeling mechanisms. In particular, the matrix-based memory LSTM (mLSTM) replaces the traditional scalar memory with a matrix-form representation, enabling richer state transitions and improved parallel processing capability. This design allows xLSTM to more effectively capture complex temporal patterns, non-stationary dynamics, and stage-wise variations in photovoltaic power output, which are difficult to model using standard LSTM networks.
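A minimal sketch of the matrix-memory idea follows, loosely based on the published mLSTM update (a rank-one outer-product write plus a normalized read-out). The fixed gate values and the head dimension are illustrative assumptions, not the model's learned quantities:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 8                                           # head dimension (assumed)
C = np.zeros((d, d))                            # matrix memory replaces the scalar/vector cell
n = np.zeros(d)                                 # normalizer state for the read-out

for _ in range(96):                             # one day of steps
    q, k, v = rng.normal(size=(3, d))           # query, key, value for this step
    k = k / np.sqrt(d)
    f, i = 0.9, 0.1                             # fixed forget/input gates for illustration
    C = f * C + i * np.outer(v, k)              # rank-1 covariance-style memory write
    n = f * n + i * k
    h = (C @ q) / max(abs(n @ q), 1.0)          # normalized read-out from the matrix memory

print(C.shape, h.shape)  # → (8, 8) (8,)
```

Because each step's write is an independent outer product, the memory updates can be computed in parallel across time, which is the source of the parallel-processing advantage mentioned above.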

The standard LSTM update equations are as follows:

$$\:\left\{\begin{array}{l}\begin{array}{l}{f}_{t}=\sigma\:\left({W}_{f}\cdot\:\left[{h}_{t-1},{x}_{t}\right]+{b}_{f}\right)\\\:{i}_{t}=\sigma\:\left({W}_{i}\cdot\:\left[{h}_{t-1},{x}_{t}\right]+{b}_{i}\right)\\\:{\stackrel{\sim}{C}}_{t}=tanh\left({W}_{C}\cdot\:\left[{h}_{t-1},{x}_{t}\right]+{b}_{C}\right)\end{array}\\\:\begin{array}{l}{C}_{t}={f}_{t}\odot\:{C}_{t-1}+{i}_{t}\odot\:{\stackrel{\sim}{C}}_{t}\\\:{o}_{t}=\sigma\:\left({W}_{o}\left[{h}_{t-1},{x}_{t}\right]+{b}_{o}\right)\\\:{h}_{t}={o}_{t}\odot\:tanh\left({C}_{t}\right)\end{array}\end{array}\right.$$
(8)

Where \(\:{f}_{t}\), \(\:{i}_{t}\) and \(\:{o}_{t}\) represent the forget, input, and output gates, \(\:{C}_{t}\) is the cell state, and \(\:{h}_{t}\) is the hidden state. \(\:\sigma\:\) denotes the sigmoid function and \(\:tanh\) is the hyperbolic tangent.
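One update step of Eq. (8) can be sketched directly in NumPy. The input and hidden sizes are arbitrary assumptions; the gate arithmetic follows the equations above:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update per Eq. (8); W/b hold the four gate parameter sets f, i, c, o."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])            # forget gate
    i = sigmoid(W["i"] @ z + b["i"])            # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # candidate cell state
    c = f * c_prev + i * c_tilde                # new cell state
    o = sigmoid(W["o"] @ z + b["o"])            # output gate
    h = o * np.tanh(c)                          # new hidden state
    return h, c

rng = np.random.default_rng(1)
d_x, d_h = 4, 8                                  # assumed feature and hidden sizes
W = {k: rng.normal(scale=0.1, size=(d_h, d_h + d_x)) for k in "fico"}
b = {k: np.zeros(d_h) for k in "fico"}

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(96, d_x)):           # one day at 15-min resolution
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # → (8,)
```

Note that the hidden state is bounded by the output gate and the tanh squashing, which keeps the recurrence numerically stable over long sequences.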

sLSTM Module: The sLSTM module strengthens the model’s responsiveness to transient fluctuations in the PV time series, such as sudden output changes caused by passing clouds. It improves the control over memory updates, enabling precise modeling of short-term signal patterns.

mLSTM Module: The mLSTM module captures long-term dependencies by processing the input at multiple temporal scales \(\:i\). Each scale has a dedicated LSTM unit with the following equations:

$$\:\left\{\begin{array}{l}\begin{array}{l}{f}_{t}^{\left(i\right)}=\sigma\:\left({W}_{f}^{\left(i\right)}\cdot\:\left[{h}_{t-1}^{\left(i\right)},\:{x}_{t}^{\left(i\right)}\right]+{b}_{f}^{\left(i\right)}\right)\\\:{i}_{t}^{\left(i\right)}\:=\sigma\:\left({W}_{i}^{\left(i\right)}\cdot\:\left[{h}_{t-1}^{\left(i\right)},\:{x}_{t}^{\left(i\right)}\right]+{b}_{i}^{\left(i\right)}\right)\\\:{\stackrel{\sim}{C}}_{t}^{\left(i\right)}=\text{tanh}\left({W}_{C}^{\left(i\right)}\cdot\:\left[{h}_{t-1}^{\left(i\right)},\:{x}_{t}^{\left(i\right)}\right]+{b}_{C}^{\left(i\right)}\right)\end{array}\\\:\begin{array}{l}{C}_{t}^{\left(i\right)}={f}_{t}^{\left(i\right)}\odot\:{C}_{t-1}^{\left(i\right)}+{i}_{t}^{\left(i\right)}\odot\:{\stackrel{\sim}{C}}_{t}^{\left(i\right)}\\\:{o}_{t}^{\left(i\right)}=\sigma\:\left({W}_{o}^{\left(i\right)}\cdot\:\left[{h}_{t-1}^{\left(i\right)},\:{x}_{t}^{\left(i\right)}\right]+{b}_{o}^{\left(i\right)}\right)\\\:{h}_{t}^{\left(i\right)}={o}_{t}^{\left(i\right)}\odot\:\text{tanh}\left({C}_{t}^{\left(i\right)}\right)\end{array}\end{array}\right.$$
(9)

The final output of xLSTM is a fusion of hidden states from all scales, providing a unified representation that captures both fine-grained and long-term signal characteristics.
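The per-scale recurrence of Eq. (9) and the final fusion can be sketched as follows. The stride-based subsampling and the scale set (1, 4, 16) are illustrative assumptions for how multiple timescales might be realized; concatenation stands in for the fusion step:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def run_lstm(x, W, b, d_h):
    """Run one per-scale LSTM (Eq. 9) over a sequence; return the last hidden state."""
    h, c = np.zeros(d_h), np.zeros(d_h)
    for x_t in x:
        z = np.concatenate([h, x_t])
        f, i, o = (sigmoid(W[k] @ z + b[k]) for k in ("f", "i", "o"))
        c = f * c + i * np.tanh(W["c"] @ z + b["c"])
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(2)
d_x, d_h, scales = 4, 8, (1, 4, 16)             # scales as subsampling strides (assumed)
x = rng.normal(size=(96, d_x))                  # one day of PV features

fused = []
for s in scales:
    # a dedicated LSTM unit per scale, as in Eq. (9)
    W = {k: rng.normal(scale=0.1, size=(d_h, d_h + d_x)) for k in "fioc"}
    b = {k: np.zeros(d_h) for k in "fioc"}
    fused.append(run_lstm(x[::s], W, b, d_h))   # coarser stride → longer effective horizon
fused = np.concatenate(fused)                   # unified multi-scale representation
print(fused.shape)  # → (24,)
```

Each branch sees the same day at a different temporal resolution, so the fused vector carries both fine-grained fluctuations (stride 1) and slow trends (stride 16).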

Transformer-based temporal dependency modeling

In this study, a hybrid model integrating xLSTM and Transformer is proposed to enhance temporal feature representation and global dependency modeling. The xLSTM network is first employed to extract fine-grained temporal features, which are then linearly projected via a fully connected layer to match the input dimensionality of the Transformer. To compensate for the Transformer’s lack of inherent temporal structure, positional encoding is added to the projected features. The Transformer architecture, composed of an encoder-decoder structure as shown in Fig. 4, utilizes multi-head self-attention, residual connections, and layer normalization to effectively capture long-range dependencies. In the decoder, masked attention ensures causality by preventing access to future positions, and information from the encoder guides the decoding process through cross-attention. The final prediction is generated via a softmax output layer. The attention mechanism is central to the Transformer’s strength, enabling the model to assign relevance-based weights across time steps, while residual connections and layer normalization ensure stable gradient flow and efficient training. The integrated feed-forward network further enhances representational capacity by transforming each sequence element independently.

Fig. 4

Transformer general structure.

Transformer’s self-attention mechanism can be formulated as Eq. (10):

$$\:A(Q,K,V)=S\left(\frac{Q{K}^{T}}{\sqrt{{d}_{k}}}\right)V$$
(10)

Where \(\:A\) represents the attention operation; \(\:S\) represents the Softmax function that computes the attention weights; \(\:Q\), \(\:K\), and \(\:V\) are the query, key, and value matrices of the attention mechanism, respectively; \(\:{d}_{k}\) is the dimension of the keys.

The multi-head attention mechanism can be expressed as Eq. (11):

$$\:\begin{array}{c}{M}_{h}=C({h}_{1},{h}_{2},{h}_{3},\ldots\:,{h}_{i}){W}_{o}\\\:{h}_{i}=A\left(Q{W}_{Qi},K{W}_{Ki},V{W}_{Vi}\right)\end{array}$$
(11)

Where \(\:{M}_{h}\) is the multi-head attention output; \(\:C\) is the concatenation operation across attention heads; \(\:{h}_{i}\) denotes the i-th attention head; \(\:{W}_{o}\) is the linear transformation weight matrix applied after concatenation; \(\:{W}_{Qi}\), \(\:{W}_{Ki}\), \(\:{W}_{Vi}\) are the linear projection matrices of the i-th head.
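Eqs. (10) and (11) can be sketched compactly in NumPy. The head count and dimensions are arbitrary assumptions for illustration:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Eq. (10): scaled dot-product attention."""
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(X, heads, W_o):
    """Eq. (11): per-head projections of X, concatenated and mixed by W_o."""
    outs = [attention(X @ W_q, X @ W_k, X @ W_v) for W_q, W_k, W_v in heads]
    return np.concatenate(outs, axis=-1) @ W_o

rng = np.random.default_rng(3)
T, d, n_heads, d_k = 96, 16, 4, 4               # assumed sizes
X = rng.normal(size=(T, d))                     # projected xLSTM features
heads = [tuple(rng.normal(scale=0.1, size=(d, d_k)) for _ in range(3))
         for _ in range(n_heads)]               # (W_Q, W_K, W_V) per head
W_o = rng.normal(scale=0.1, size=(n_heads * d_k, d))
print(multi_head(X, heads, W_o).shape)  # → (96, 16)
```

Each row of the softmax output is a probability distribution over time steps, which is exactly the "relevance-based weighting" described above: a prediction at one time step can draw on any other step in proportion to its attention weight.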

Experimental results

This study employed historical operation data from a distributed photovoltaic (PV) cluster in the State Grid of China, incorporating temperature, humidity, irradiance, and actual power output as input features for TimeGAN-based time-series data generation. The dataset covers a continuous operating period of approximately 180 days, with a sampling interval of 15 min, resulting in 96 time steps per day and a total of over 17,000 data points. The recorded variables include photovoltaic power output, ambient temperature, humidity, and solar irradiance, providing a comprehensive representation of real-world PV operating conditions. Through t-SNE and PCA visualization techniques, the generated data were evaluated both locally and globally. The t-SNE projection results (Fig. 5) revealed that the synthetic samples closely overlapped with real data in the low-dimensional space, demonstrating TimeGAN’s capability to accurately preserve local data structure. Similarly, the PCA-based global projection (Fig. 6) showed that the main variance directions of generated data aligned well with the original data, confirming the model’s effectiveness in capturing the overall data distribution.
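The PCA-alignment check described above can be sketched with plain NumPy on stand-in data (the real PV dataset is not reproduced here; the arrays below are synthetic placeholders). The idea is that the leading principal axes of generated and real data should nearly coincide:

```python
import numpy as np

rng = np.random.default_rng(4)
real = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # stand-in real features
synth = real + 0.1 * rng.normal(size=real.shape)             # stand-in "generated" data

def top_pc(X):
    """First principal axis via SVD on centred data."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[0]

# Alignment of the main variance directions of real vs. synthetic data
cos = abs(top_pc(real) @ top_pc(synth))        # |cosine| of angle between top PCs
print(round(cos, 3))
```

A cosine close to 1 indicates that the generated samples share the dominant variance structure of the real data, which is what the PCA projection in Fig. 6 shows qualitatively.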

Fig. 5

Graph of t-SNE results.

Fig. 6

Graph of PCA results.

Quantitative evaluations further validated the augmentation quality. The Discriminative Score was 0.154, indicating minimal distributional discrepancy between generated and real data. The Predictive Score reached 0.061, close to that of the original dataset, reflecting the ability of TimeGAN to maintain meaningful temporal dependencies. Using these synthetic sequences, a backpropagation neural network was trained and tested alongside a model trained on the original dataset. The specific hyperparameter settings for the proposed model are listed in Table 1.

Table 1 xLSTM-Transformer parameters.

The xLSTM–Transformer hybrid architecture proposed in this study successfully combined local feature extraction with global temporal pattern recognition. The xLSTM module, enhanced by a matrix memory structure (mLSTM), captured stage-specific features of PV output under fluctuating environmental inputs. Concurrently, the Transformer module’s self-attention mechanism enabled modeling of long-range dependencies across sequences. Under identical hyperparameter settings, the proposed hybrid model achieved a Mean Absolute Percentage Error (MAPE) of 2.726%. These findings underscore the model’s robustness and superiority in capturing both short- and long-term dynamics of PV generation. A comparison was conducted between the xLSTM-Transformer method and the traditional LSTM and Transformer methods. The forecasting curves are visualized in Fig. 7, and the numerical results are presented in Table 2.
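The MAPE metric reported above can be computed as follows. The masking of near-zero true values is a common practical guard (an assumption here, since MAPE is undefined when actual output is zero, as at night):

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-8):
    """Mean Absolute Percentage Error in percent, skipping near-zero true values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.abs(y_true) > eps                 # e.g. exclude nighttime zero-output steps
    return 100.0 * np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask]))

# Toy example: three daylight steps and one nighttime step (excluded by the mask)
y = np.array([10.0, 20.0, 40.0, 0.0])
p = np.array([ 9.5, 21.0, 39.0, 0.0])
print(round(mape(y, p), 3))  # → 4.167
```

The reported 2.726% would therefore mean that, averaged over the evaluated steps, predictions deviate from actual output by under 3% of the true value.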

Fig. 7

Forecasting curve of xLSTM-Transformer.

Table 2 Forecasting performance comparison of xLSTM-Transformer and baseline models.

To further validate the effect of TimeGAN-based data augmentation, we trained two identical backpropagation neural network models using the original dataset and the augmented dataset, respectively, and tested both on the same test set. As illustrated in Fig. 8 and compared in Table 3, the model trained on TimeGAN-augmented data achieved a MAPE of 2.726%, while the model trained on the original data yielded a significantly higher MAPE of 9.423%. This indicates that the synthetic sequences generated by TimeGAN are not only structurally consistent with real data but also substantially enhance the learning effectiveness of downstream models. The results confirm the practical value of TimeGAN in expanding training samples and improving generalization in photovoltaic forecasting tasks.

Fig. 8

Comparison of model predictions before and after data enhancement of the training set.

Table 3 Comparison of forecasting results before and after TimeGAN data augmentation.

Discussion

The results of this study highlight the effectiveness of integrating data augmentation and hybrid deep learning architectures for photovoltaic power forecasting. Compared with traditional sequence models, such as standalone LSTM or Transformer networks, the proposed framework benefits from the complementary strengths of xLSTM in capturing multi-scale temporal patterns and Transformer in modeling long-range dependencies. In addition, the incorporation of TimeGAN significantly enhances model generalization by enriching training data with realistic synthetic samples, particularly under extreme fluctuation scenarios.

Compared with recent studies on photovoltaic forecasting and hybrid learning frameworks, the proposed approach demonstrates competitive or superior performance while maintaining a relatively simple and interpretable architecture. Nevertheless, several limitations remain. First, the current framework is validated on data from a single photovoltaic cluster, and its scalability to multi-site or large-scale systems requires further investigation. Second, external uncertainty sources such as numerical weather prediction errors are not explicitly modeled. Future work will focus on extending the framework to multi-site cooperative forecasting, incorporating probabilistic weather forecasts, and deploying the proposed method in real-time power system operation environments.

Conclusion

This study proposed a novel hybrid TimeGAN–xLSTM–Transformer framework for photovoltaic power forecasting under complex and uncertain environmental conditions. By integrating TimeGAN-based data augmentation with multi-scale temporal feature extraction and attention-based dependency modeling, the proposed approach effectively addresses data scarcity, nonlinear dynamics, and long-range temporal dependencies inherent in photovoltaic power generation. Experimental results based on real-world operational data demonstrate that the proposed framework significantly outperforms conventional LSTM and Transformer baselines in terms of RMSE, MAE, and MAPE. The findings confirm that combining generative modeling with hybrid sequence learning provides a robust and accurate solution for photovoltaic power forecasting, offering practical value for renewable energy integration and power system operation.