Abstract
Accurate photovoltaic (PV) power forecasting serves as a critical foundation for economic dispatch and reliable grid operation. To address the inherent uncertainty in PV power generation, this study proposes a short-term PV power interval prediction method based on Bayesian-optimized CNN-BiLSTM-attention (BO-CNN-BiLSTM-attention) that accounts for conditional dependencies in prediction errors. The methodology comprises three main stages: first, PV output data undergoes preprocessing and feature selection. Second, a Bayesian-optimized CNN-BiLSTM-attention model achieves high-precision point forecasting for target time periods. Finally, the K-shape time series clustering algorithm matches point predictions with temporally similar historical data, while adaptive bandwidth kernel density estimation models the probability distribution of prediction errors from similar patterns, thereby enabling interval prediction. Experimental validation on a photovoltaic plant in Xinjiang, China demonstrates that the proposed method achieves superior prediction accuracy compared to various single and ensemble forecasting models, while outperforming multiple interval construction approaches in terms of prediction effectiveness.
Introduction
With the proposal of the “dual carbon” goals, the construction of new power systems is advancing steadily, and modern power systems are evolving toward a "dual-high" paradigm characterized by high proportions of power electronic equipment applications and renewable energy integration. As one of the most significant forms of renewable energy generation, photovoltaic (PV) power generation has become increasingly critical for grid scheduling, planning, and safe, stable operation1. Given that the high volatility and strong randomness of PV output pose substantial challenges to accurate forecasting, an increasing number of researchers have begun attempting to quantify PV output uncertainty through interval prediction, thereby providing more comprehensive PV output forecasting information2.
PV output prediction methods are primarily categorized into deterministic prediction and uncertainty prediction3. Deterministic prediction methods mainly refer to point prediction, and most studies on PV output prediction have centered on point prediction to date. The main technical approaches can be classified into physical methods, statistical methods, and hybrid prediction methods that combine both approaches.
Physical methods for PV output point prediction are primarily based on numerical weather prediction (NWP) models, satellite imagery, and sky imaging data sources, predicting PV output by simulating atmospheric behavior, solar radiation transmission, and cloud coverage processes. These methods involve detailed modeling of the physical characteristics of PV systems, including solar radiation separation and conversion, electrical characteristics of PV components, and environmental factor influences. For instance, Yuan et al.4 proposed a meteorological forecast-based physical prediction model for PV power, achieving meteorological parameter prediction through hierarchical clustering and implementing PV power point prediction combined with an improved incremental conductance method. János et al.5 conducted a comprehensive comparative analysis of physical models for PV power prediction, demonstrating that model selection significantly impacts prediction accuracy, with average absolute error differences reaching up to 13% between the most and least accurate models. Physical methods offer advantages in interpretability, small-sample effectiveness, and stability, making them suitable for medium- to long-term prediction. However, they require high-quality meteorological data and exhibit unstable accuracy under extreme weather conditions.
Statistical methods for PV output point prediction are primarily based on historical data, establishing prediction models by analyzing statistical relationships between historical meteorological data and PV power data. These methods mainly include traditional statistical methods, machine learning methods, and currently prevalent deep learning methods: (a) Traditional statistical methods: These include autoregressive integrated moving average (ARIMA) models, seasonal autoregressive integrated moving average (SARIMA) models, and various regression analysis methods. ARIMA models, as typical representatives of Box-Jenkins time series modeling methodology6, have been widely applied in PV output point prediction research7,8,9. SARIMA builds upon ARIMA by considering seasonal patterns in data, possessing the capability to capture both short-term and long-term dependencies, enabling more accurate predictions10. Regression analysis methods achieve prediction by establishing regression relationships between PV power and meteorological factors, offering advantages of model simplicity and high computational efficiency11. However, traditional statistical methods6,7,8,9,10,11 have extremely limited capabilities in handling nonlinearity and non-stationarity in data, making it difficult to effectively capture complex patterns in PV power data. (b) Machine learning methods: These methods overcome some limitations of traditional statistical approaches by learning complex nonlinear relationships from historical data to achieve PV power prediction. Mainstream machine learning methods include support vector machines (SVM)12, artificial neural networks (ANN)13, extreme learning machines (ELM)14, and random forests15. These methods can process high-dimensional feature data with strong nonlinear fitting capabilities. 
Raza et al.16 proposed a multivariate neural network (NN) ensemble framework for PV output point prediction by training multiple neural network predictors combined with Bayesian model averaging techniques, providing practical solutions for improving seasonal PV power output prediction accuracy. Liu et al.17 integrated three different model types—SVM, MLP, and MARS—through recursive arithmetic averaging, proposing a data-driven ensemble modeling technique with prediction performance significantly superior to individual models. While machine learning methods12,13,14,15,16,17 can utilize large-sample PV power data to achieve relatively accurate PV output prediction, they still have limitations in handling long-term dependencies in time series data and limited prediction effectiveness when facing large-scale, high-dimensional complex datasets. (c) Deep learning methods: The emergence of recurrent neural networks (RNN), long short-term memory networks (LSTM), gated recurrent units (GRU), and convolutional neural networks (CNN) has effectively addressed the limitations of traditional machine learning methods, particularly excelling in handling complex temporal relationships in time series data. Maria et al.18 achieved precise hourly PV output prediction based on stacked long short-term memory networks. Sun et al.19 proposed a specialized convolutional neural network model called "SUNSET," which effectively addresses uncertainty caused by cloud changes in short-term solar power prediction by fusing sky images and historical PV power data, improving prediction accuracy and reliability. Furthermore, advanced deep learning architectures have demonstrated enhanced capabilities through sophisticated structural innovations. 
Niu et al.20 developed Solar-Mixer, an efficient end-to-end model that integrates anomaly detection with interval-mixing and channel-mixing layers based on multilayer perceptrons, enabling accurate long-sequence time series forecasting while maintaining low computational complexity through pure MLP-based architecture. Deep learning methods can automatically extract features and effectively capture complex nonlinear relationships in data, achieving further accuracy improvements in PV power prediction compared to traditional machine learning methods.
Hybrid prediction methods that integrate multi-domain, multi-disciplinary, or multi-type technical approaches have gained scholars’ attention to fully combine the advantages of different methods and maximize high-precision PV output prediction. These include physics-statistics hybrid methods, deep learning combination methods, and ensemble models. Physics-statistics hybrid methods combine the mechanistic nature of physical models with the flexibility of statistical methods to enhance prediction performance. Mellit et al.10 developed a hybrid SARIMA-SVM model for short-term power prediction of small grid-connected PV plants, achieving a normalized root mean square error of only 9.57%, significantly outperforming individual SARIMA or SVM models. Theocharides et al.21 proposed a day-ahead PV power prediction method based on machine learning and statistical post-processing, combining numerical weather prediction (NWP) outputs with historical observational data to effectively improve prediction accuracy. Deep learning combination methods emphasize fusing advantages of different deep learning models to enhance prediction performance, with CNN-LSTM hybrid architectures being typical representatives. Chai et al.22 combined CNN’s spatial feature extraction capabilities with LSTM’s nonlinear temporal dependency fitting abilities, proposing a CNN-LSTM method for PV output point prediction while introducing correntropy criteria to handle data contamination, improving model robustness for simultaneous PV power prediction across multiple regions and time periods. Li et al.23 proposed a temporal convolutional network (TCN)-based hybrid prediction framework for utility-scale PV forecasting. 
Niu et al.24 proposed a mid-term PV forecasting system employing a "De-Trend First, Attend Next" strategy that separates trend and seasonal components before applying encoder-decoder structures with temporal convolution and attention mechanisms, achieving a 73% improvement in mean squared error through component-specific modeling. The same research team also introduced an "Amplify Seasonality, Prioritize Meteorological" approach that leverages dual-layer hierarchical attention mechanisms to strengthen connections between meteorological features and seasonal components while protecting trend components from short-term meteorological fluctuations, demonstrating over 10% improvement in mean absolute error25. Furthermore, advanced ensemble methodologies have enhanced prediction robustness across diverse operational conditions. Cao et al.26 developed an ultra-short-term PV forecasting framework based on a Stacking ensemble algorithm (StAB) that integrates correlation-guided fast Fourier transform decomposition with multi-model optimization, achieving superior generalization performance across different distributed PV systems. Overall, these hybrid methods can effectively reduce the point prediction errors of individual models and improve prediction stability and reliability, making them well suited to current PV output point prediction.
The above methods are all point prediction approaches that offer the advantage of intuitive prediction results; however, they provide very limited prediction information and cannot reflect global uncertainty. Interval prediction methods represent a viable solution to this challenge. Interval prediction methods obtain upper and lower bounds of PV output at given confidence levels, effectively quantifying PV output uncertainty and providing richer prediction information. Existing interval prediction methods can be specifically categorized into two types:
The first type is direct interval prediction, where prediction methods can directly output upper and lower bounds for interval prediction without relying on deterministic prediction results. Literature27 proposed an ensemble method (ELUBE) based on ELM and Lower upper bound estimation (LUBE) to achieve direct interval prediction of PV power. To improve prediction interval quality, the authors employed three different activation functions—sigmoid, radial basis, and sine functions—to train multiple ELUBE models, combining selected high-performance models through weighted averaging methods. Zhao et al.28 proposed an efficient probabilistic interval prediction method for PV output by improving Bayesian neural networks, introducing probabilistic representation weights, t-distributed stochastic neighbor embedding algorithm for dimensionality reduction, and fully connected and convolutional neural networks, achieving superior accuracy and reliability compared to traditional models.
The second type is indirect interval prediction, which constructs confidence intervals for PV output from point prediction results using an additional method. Han et al.29 analyzed seasonal distribution characteristics of PV power, used seasonal multi-models of extreme learning machines (ELM) for deterministic prediction, and then employed kernel density estimation to fit the deterministic prediction errors, proposing an indirect interval prediction method for PV power that combines multi-models and non-parametric estimation. Literature30 proposed a PV power point prediction method based on hybrid intelligent models, optimized using wavelet transforms and radial basis neural networks, and achieved indirect and direct interval prediction of PV power through Bootstrap and quantile regression methods, respectively, further expanding neural network applications in time series learning. Yang et al.31 employed commonly used recurrent neural networks (LSTM, GRU) for PV output point prediction, corrected the point prediction results by considering temporal and spatial dependencies of PV power, and finally implemented indirect interval prediction of PV output using kernel density estimation based on conditional dependencies of PV output errors. However, because that work relies on a single type of point prediction model, its point prediction accuracy still leaves room for improvement. The conditional dependency of PV output errors refers to the fact that the probability distribution of PV output prediction errors is related to factors such as prediction time scales and prediction models. Research addressing this includes conventional methods32,33, methods considering spatial dependencies34, and methods for statistical analysis of errors within the same time periods35.
In summary, this paper proposes a short-term PV output interval prediction method based on Bayesian-optimized CNN-BiLSTM-Attention (BO-CNN-BiLSTM-Attention) that accounts for conditional dependencies of prediction errors. The methodology comprises the following steps: First, PV power data undergo preprocessing operations including missing value imputation, outlier replacement, and nighttime data removal. Second, the MIC method analyzes correlations between PV power data and multi-dimensional features for feature selection engineering. Third, a Bayesian algorithm-optimized CNN-BiLSTM-Attention method achieves high-precision point prediction of PV output for target time periods. Finally, the K-shape clustering algorithm matches historical optimal similar days for target prediction periods, and adaptive bandwidth kernel density estimation (ABKDE) performs kernel density estimation on error sample sets of similar days, thereby achieving interval prediction of PV output. Experimental results demonstrate that the proposed method achieves higher prediction accuracy compared to various benchmark prediction models and combined prediction models.
The main contributions of the proposed method in this paper are as follows:
(1) Introduced the Adaptive Bandwidth Kernel Density Estimation (ABKDE) method for PV power prediction error modeling that dynamically adjusts bandwidth parameters according to local sample density characteristics. Unlike conventional fixed-bandwidth approaches, ABKDE employs iterative optimization to determine optimal local bandwidth factors, providing enhanced density estimation capabilities for unevenly distributed prediction error datasets.
(2) Integrated the K-shape time series clustering algorithm to establish a conditional error dependency-aware interval prediction architecture. This approach identifies historical similar days based on temporal shape patterns rather than numerical value similarity, enabling weather-specific error distribution modeling that recognizes the relationship between meteorological conditions and prediction error characteristics.
(3) Designed a hierarchical CNN-BiLSTM-Attention framework that combines multi-scale convolutional feature extraction with bidirectional temporal dependency modeling. The architecture employs progressive CNN layers with kernel sizes (1, 3, 5) for hierarchical feature learning, BiLSTM networks for capturing forward and backward temporal relationships, and temporal pattern attention mechanisms for adaptive time-step weighting.
(4) Established a systematic interval prediction methodology that integrates shape-based clustering with adaptive density estimation to address prediction error conditional dependency. This comprehensive approach provides a novel framework for uncertainty quantification in PV forecasting by systematically accounting for the non-uniform distribution of prediction errors across different environmental and temporal conditions.
Theoretical algorithm
Maximal information coefficient
Feature selection is necessary to avoid overfitting of the prediction model and to reduce its computational burden. Given that photovoltaic power exhibits strong nonlinear relationships with multidimensional input features36, and considering the limitations of commonly used linear analysis methods such as Pearson correlation analysis, this study adopts the Maximal Information Coefficient (MIC) to analyze the correlations between photovoltaic power and multidimensional input features and to perform feature selection. MIC is a statistical measure that can capture both linear and nonlinear relationships while being relatively insensitive to noise in the data37. Its core principle is to search over grid partitions of the two-variable scatter plot for the partition that maximizes the normalized mutual information. The calculation consists of three main steps: First, the two-dimensional data space is divided into a and b intervals in the X and Y directions, respectively, forming an a × b grid. Second, the mutual information between the two variables is calculated for each such grid partition. Finally, the maximum normalized mutual information across all admissible grids is taken as the MIC value. The mutual information between two variables x and y is shown in Eq. (1), and the MIC calculation formula is presented in Eq. (2).
where n is the total number of samples and C(n) is a function that sets the upper limit on the number of grid cells a × b, generally taken as \(C(n) = n^{0.6}\)38.
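As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes the mutual information on an a × b grid and maximizes its normalized value over all grids with a·b < n^0.6. It is a simplified stand-in, not the implementation used in this study: it assumes equal-width bins, whereas the full MIC procedure also optimizes the placement of the grid lines. The function names `grid_mutual_information` and `approx_mic` are hypothetical.

```python
import math
from collections import Counter


def grid_mutual_information(xs, ys, a, b):
    """Mutual information (in bits) of xs and ys on an equal-width a-by-b grid."""
    n = len(xs)

    def bin_index(v, lo, hi, bins):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    cells = [(bin_index(x, x_lo, x_hi, a), bin_index(y, y_lo, y_hi, b))
             for x, y in zip(xs, ys)]
    joint = Counter(cells)
    px = Counter(i for i, _ in cells)
    py = Counter(j for _, j in cells)
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi


def approx_mic(xs, ys, exponent=0.6):
    """Maximize normalized mutual information over grids with a * b < n ** exponent."""
    limit = len(xs) ** exponent
    best = 0.0
    a = 2
    while a * 2 < limit:
        b = 2
        while a * b < limit:
            mi = grid_mutual_information(xs, ys, a, b)
            best = max(best, mi / math.log2(min(a, b)))
            b += 1
        a += 1
    return best


# Toy check: a noiseless linear relationship should score close to 1.
xs = [i / 199 for i in range(200)]
linear_score = approx_mic(xs, xs)
```

In this toy check the two variables are identical, so the 2 × 2 grid already carries one full bit of mutual information and the normalized score reaches its upper bound.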
Convolutional neural network
CNNs are widely used deep learning models for processing grid-structured data and extracting time series features39. As shown in Fig. 1, CNNs comprise input layers, convolutional layers, pooling layers, and fully connected layers. Data enter through the input layer and undergo convolution operations via Eq. (4) to capture hierarchical structures and learn local features.
where x represents input data; y denotes convolutional layer output (extracted features); w is the convolution kernel; k represents kernel size; i and j indicate output positions; m and n represent kernel positions; wb denotes bias terms; f is the activation function.
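The convolution described above can be sketched in a few lines of Python. The sketch assumes the common form y(i, j) = f(Σm Σn w(m, n) · x(i + m, j + n) + wb) with no padding and unit stride, and it uses ReLU as the activation f; the exact form of Eq. (4) and the activation used in this study are not restated here, so both are assumptions. The function name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(x, w, bias=0.0, activation=lambda z: max(z, 0.0)):
    """Slide kernel w over input x (no padding, stride 1) and apply the activation."""
    k_h, k_w = len(w), len(w[0])
    out_h = len(x) - k_h + 1
    out_w = len(x[0]) - k_w + 1
    return [[activation(sum(w[m][n] * x[i + m][j + n]
                            for m in range(k_h) for n in range(k_w)) + bias)
             for j in range(out_w)]
            for i in range(out_h)]


# Toy input: a 3 x 3 matrix filtered by a 2 x 2 kernel that adds the
# top-left and bottom-right entries of each window.
features = conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
```

As in most deep learning frameworks, the kernel is applied without flipping, so the operation is strictly a cross-correlation.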
Pooling layers, positioned between consecutive convolutional layers, reduce spatial dimensions of extracted features while preserving essential information. This study employs max pooling, which selects maximum values from each region as representatives via Eq. (5).
where maxpooling represents the pooling result; x denotes the input matrix; sh and sw represent vertical and horizontal strides respectively; h and w represent pooling window dimensions.
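A minimal sketch of the max pooling operation follows, using the window dimensions h and w and the strides sh and sw defined above. The function name `max_pool2d` is hypothetical, and windows that would extend past the input border are simply dropped, which is an assumption.

```python
def max_pool2d(x, h, w, s_h, s_w):
    """Keep the maximum of each h-by-w window, moving s_h rows and s_w columns per step."""
    out_h = (len(x) - h) // s_h + 1
    out_w = (len(x[0]) - w) // s_w + 1
    return [[max(x[i * s_h + p][j * s_w + q] for p in range(h) for q in range(w))
             for j in range(out_w)]
            for i in range(out_h)]


# Toy input: a 4 x 4 matrix reduced to 2 x 2 by non-overlapping 2 x 2 windows.
grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = max_pool2d(grid, 2, 2, 2, 2)
```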
Fully connected layers implement forward propagation via Eq. (6), flattening convolutional and pooling outputs into vectors for classification or regression tasks.
where yk represents the k-th neuron output; xi denotes the i-th element of the flattened feature vector; Wk,i represents connection weights from the i-th input to k-th output neuron; bk denotes the k-th output neuron bias; t represents input feature vector dimensionality.
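The flattening and forward propagation of a fully connected layer can be sketched as below, assuming the usual form yk = f(Σi Wk,i · xi + bk) with an identity activation for a regression output. The function names `flatten` and `dense` are hypothetical.

```python
def flatten(feature_map):
    """Turn a 2-D feature map into a single feature vector, row by row."""
    return [value for row in feature_map for value in row]


def dense(x, W, b, activation=lambda z: z):
    """Compute one output per row of W: activation(W[k] . x + b[k])."""
    return [activation(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(W, b)]


# Toy usage: a 2 x 2 feature map mapped to two output neurons.
vector = flatten([[1, 2], [3, 4]])
outputs = dense(vector, [[1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5]], [0, 1])
```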
Bidirectional long short-term memory network
BiLSTM extends LSTM, whose architecture is shown in Fig. 2. LSTM’s core concept involves memory cells and gating mechanisms that adaptively read, write, and forget information to address long-term dependencies in sequence processing. As illustrated in Fig. 2, the forget gate \(f_{t}\) controls how much of the previous cell state \(c_{t-1}\) is retained through sigmoid activation, determining which historical information to discard. The input gate \(i_{t}\) regulates the importance of the current input and, together with the candidate values \(\widetilde{c}_{t}\), determines which new information is stored. The candidate values \(\widetilde{c}_{t}\) are generated via tanh activation. The cell state \(c_{t}\) maintains long-term memory by combining the previous state \(c_{t-1}\), scaled by the forget gate \(f_{t}\), with the new short-term information \(i_{t} \cdot \widetilde{c}_{t}\). The output gate \(o_{t}\) controls which cell state information is passed to the hidden state \(h_{t}\), enabling selective information transmission. This coordinated gating mechanism enables LSTM to maintain and transmit critical information across long sequences, mitigating the vanishing gradient problem of traditional RNNs. LSTM computations follow Eq. (7).
where \(x_{t}\) represents the input at time t; \(h_{t-1}\) denotes the hidden state at time t − 1; \(i_{t}\) represents the input gate state; \(c_{t}\) denotes the cell state; σ represents the sigmoid activation; tanh denotes the hyperbolic tangent activation; \(W_{f}, W_{i}, W_{c}, W_{o}\) and \(U_{f}, U_{i}, U_{c}, U_{o}\) are the weight matrices of the forget, input, candidate, and output gates that multiply \(x_{t}\) and \(h_{t-1}\), respectively; \(b_{f}, b_{i}, b_{c}, b_{o}\) are the corresponding bias terms.
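The gating logic described above can be made concrete with a single-step sketch in plain Python. It assumes the standard LSTM update, in which each gate is an affine function of \(x_{t}\) (through W) and \(h_{t-1}\) (through U), the cell state is \(c_{t} = f_{t} \cdot c_{t-1} + i_{t} \cdot \widetilde{c}_{t}\), and the hidden state is \(h_{t} = o_{t} \cdot \tanh(c_{t})\). The dictionary layout of `params` and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [sum(wr[j] * x[j] for j in range(len(x)))
            + sum(ur[j] * h[j] for j in range(len(h))) + bk
            for wr, ur, bk in zip(W, U, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params holds W*, U*, b* for the gates f, i, c, o."""
    pre = {g: affine(params["W" + g], x_t, params["U" + g], h_prev, params["b" + g])
           for g in "fico"}
    f_t = [sigmoid(z) for z in pre["f"]]        # forget gate
    i_t = [sigmoid(z) for z in pre["i"]]        # input gate
    c_tilde = [math.tanh(z) for z in pre["c"]]  # candidate values
    o_t = [sigmoid(z) for z in pre["o"]]        # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy usage: one input feature, one hidden unit, all parameters zero,
# so every gate equals 0.5 and the candidate value equals 0.
zero_params = {}
for gate_name in "fico":
    zero_params["W" + gate_name] = [[0.0]]
    zero_params["U" + gate_name] = [[0.0]]
    zero_params["b" + gate_name] = [0.0]
h_t, c_t = lstm_step([1.0], [0.0], [2.0], zero_params)
```

With all gates at 0.5, half of the previous cell state 2.0 is retained, which gives a new cell state of 1.0.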
Unlike traditional LSTM, BiLSTM comprises forward and backward LSTM components processing bidirectional temporal information, better capturing contextual information in time series data with bidirectional dependency capabilities while retaining LSTM’s gradient vanishing mitigation40. BiLSTM structure appears in Fig. 3.
As shown in Fig. 3, the forward LSTM processes the sequence from its start, receiving the current input \(x_{t}\) and the previous forward hidden state \(h^{f}_{t-1}\), and updates the forward hidden state and memory cell via Eq. (8) to obtain \(h^{f}_{t}\).
where LSTM(·) represents LSTM unit computations following Eq. (7) for hidden state updates.
Simultaneously, the backward LSTM processes the sequence from its end, receiving the current input \(x_{t}\) and the subsequent backward hidden state \(h^{b}_{t+1}\), and updates the backward hidden state and memory cell via Eq. (9) to obtain \(h^{b}_{t}\).
Finally, Eq. (10) concatenates forward and backward hidden state sequences temporally to produce final output sequence yt.
where Wy represents weight matrices for linear combination of hidden layer states; by denotes bias terms.
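The bidirectional wiring can be sketched independently of the cell internals. The sketch below runs one recurrent cell from the start of the sequence and another from the end, then combines the two hidden states at each time step through a linear layer, assuming the form \(y_{t} = W_{y}[h^{f}_{t}; h^{b}_{t}] + b_{y}\). To keep the example traceable, a running-sum toy cell stands in for the LSTM unit; all function names are hypothetical.

```python
def run_direction(xs, cell, h0, reverse=False):
    """Run a recurrent cell over xs and return hidden states aligned with xs."""
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    states = [None] * len(xs)
    h = h0
    for t in order:
        h = cell(xs[t], h)
        states[t] = h
    return states


def bidirectional(xs, fwd_cell, bwd_cell, h0, W_y, b_y):
    """Concatenate forward and backward states per step and apply W_y, b_y."""
    h_f = run_direction(xs, fwd_cell, h0)
    h_b = run_direction(xs, bwd_cell, h0, reverse=True)
    outputs = []
    for hf_t, hb_t in zip(h_f, h_b):
        concat = hf_t + hb_t  # list concatenation [h_f ; h_b]
        outputs.append([sum(w * v for w, v in zip(row, concat)) + b
                        for row, b in zip(W_y, b_y)])
    return outputs


def running_sum_cell(x, h):
    """Toy stand-in for an LSTM unit: the hidden state accumulates the inputs."""
    return [h[0] + x]


sequence = [1.0, 2.0, 3.0]
y_seq = bidirectional(sequence, running_sum_cell, running_sum_cell,
                      [0.0], [[1.0, 1.0]], [0.0])
```

In the toy run, the forward states are 1, 3, 6 and the backward states are 6, 5, 3, so each output combines information from both the past and the future of its time step.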
Attention mechanism
While CNN-BiLSTM models can capture nonlinear relationships between PV power data and multi-dimensional input features, prediction performance may suffer from increasing time series lengths. Attention mechanisms simulate biological attention, analogous to humans focusing on specific regions due to limited visual resources, successfully applied in machine translation and time series prediction41. Applying temporal attention mechanisms to CNN-BiLSTM outputs enables adaptive selection of relevant historical time series state information, obtaining corresponding temporal sequence hidden state weights and determining temporal state influence on current PV power output, thereby extracting critical temporal information to improve prediction performance. The temporal attention mechanism operates as follows:
The final hidden layer output of the BiLSTM, h = {h1, h2, ···, ht, ···, hT}, is encoded to obtain the query vector q; additive attention is then employed to compute the importance of each historical temporal state for the current output. The attention scoring formula follows Eq. (11).
where Score(ht) represents similarity scores indicating correlation between historical temporal state ht and output; Ws denotes weight matrices; bs represents bias terms.
Subsequently, softmax normalization of the scores produces the attention weights \(a_{t}\) via Eq. (12).
Based on the attention weights computed for the historical temporal states, weighted averaging produces the final attention-optimized output h via Eq. (13).
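The three steps (scoring, softmax normalization, weighted averaging) can be sketched as follows. The scoring form used here, Score(ht) = q · tanh(Ws ht + bs), is one common additive-attention variant and is an assumption, since the exact form of Eq. (11) is not restated in this section. The function name `additive_attention` is hypothetical.

```python
import math


def additive_attention(hidden_states, q, W_s, b_s):
    """Return softmax attention weights and the weighted average of hidden states."""
    scores = []
    for h_t in hidden_states:
        proj = [math.tanh(sum(w * v for w, v in zip(row, h_t)) + b)
                for row, b in zip(W_s, b_s)]
        scores.append(sum(qi * pi for qi, pi in zip(q, proj)))
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(a * h_t[d] for a, h_t in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Toy usage: three time steps with two-dimensional hidden states.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
uniform_w, uniform_ctx = additive_attention(states, [0.0, 0.0], identity, [0.0, 0.0])
focused_w, focused_ctx = additive_attention(states, [1.0, 0.0], identity, [0.0, 0.0])
```

With a zero query every time step receives the same weight, so the output is the plain average of the hidden states. A query aligned with the first hidden dimension shifts weight toward the time steps where that dimension is active.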
K-shape clustering
The K-shape clustering algorithm is specifically designed for time series data clustering. Unlike traditional clustering algorithms such as K-means, K-means++, and DBSCAN, which rely primarily on numerical values, K-shape identifies the underlying “shape” or pattern of time series data. This capability stems from K-shape’s ability to perform normalization, horizontal translation, and vertical stretching operations on the clustered objects. Consequently, K-shape is particularly well-suited for handling time series data that exhibit obvious dynamic changes and nonlinear characteristics42. The specific clustering steps of K-shape are as follows:
Measure the distance between time series data. Unlike traditional clustering algorithms, K-shape uses a shape-based distance (SBD) based on slope, intercept and normalized cross-correlation coefficient (NCCC) to measure the distance between time series data. The SBD-based distance metric ensures that the scale and offset of time series data do not affect the accuracy of the distance calculation. For two time series x = (x1, x2, ···, xm) and y = (y1, y2, ···, ym), the distance metric formula is as follows:
where CCω(x, y) represents the cross-correlation sequence of length 2m − 1, ω ∈ {1, 2, ···, 2m − 1}. The value of SBD(x, y) lies in the range [0, 2], and smaller values indicate greater shape similarity between the two time series.
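A minimal sketch of the shape-based distance follows, assuming the usual definition SBD(x, y) = 1 − max over ω of CCω(x, y) / sqrt(R0(x, x) · R0(y, y)), where R0 is the zero-shift autocorrelation. The cross-correlation is computed directly over all 2m − 1 shifts, not with the FFT used in efficient implementations. The function names are hypothetical, and `z_normalize` does not handle constant series.

```python
import math


def z_normalize(x):
    """Remove offset and scale so that only the shape of the series remains."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std for v in x]


def cross_correlation(x, y, shift):
    """Sum of x[i + shift] * y[i] over the indices where both series overlap."""
    m = len(x)
    return sum(x[i + shift] * y[i]
               for i in range(max(0, -shift), min(m, m - shift)))


def sbd(x, y):
    """Shape-based distance in [0, 2]; 0 means identical shape up to shift and scale."""
    m = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        raise ValueError("SBD is undefined for an all-zero series")
    best = max(cross_correlation(x, y, s) for s in range(-(m - 1), m))
    return 1.0 - best / norm


# Toy usage: the same bump, once delayed by one step and once tripled in height.
bump = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
delayed = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
d_shift = sbd(bump, delayed)
d_scale = sbd(bump, [3.0 * v for v in bump])
```

Both toy distances are zero, which illustrates why SBD groups curves by shape and not by magnitude or exact timing.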
Extract representative features from time series data. K-shape calculates cluster centroids and assigns each sequence to a cluster based on its distance to these centroids. The centroid computation is posed as an optimization problem in which the algorithm seeks the cluster centroid μi* that maximizes the sum of squared similarities to all time series in the cluster, as shown in Eq. (18).
where Ci represents the i-th cluster and μi represents the initial centroid of the i-th cluster.
Perform shape-based clustering of time series. The algorithm implements shape-based clustering through the distance measurement and centroid extraction procedures described above. Initially, cluster centroids are randomly initialized, followed by iterative recomputation of each cluster’s centroid using the optimization in Eq. (18). The algorithm then uses the SBD metric to refine cluster assignments. The iterative process continues until convergence, which occurs when cluster assignments remain unchanged between consecutive iterations or when the maximum number of iterations is reached (typically 100), following the standard k-means convergence paradigm. The specific flow of K-shape clustering is shown in Fig. 4.
Adaptive bandwidth kernel density estimation
Kernel density estimation (KDE) is a nonparametric method for estimating the probability density function of data that offers high adaptability and flexibility; its main principle is to estimate the density around a sample with the help of a sliding window. Because a fixed window bandwidth may not adapt well to the probability density of the PV output prediction error, this paper uses ABKDE to estimate the kernel density of the PV output prediction error. ABKDE allows the bandwidth to vary according to local sample density and determines the optimal local bandwidth through iterative optimization. This approach provides particularly effective density estimation for unevenly distributed samples43. Typical density estimation functions based on Gaussian kernel functions are shown in Eqs. (19) and (20). The formulas for iteratively solving the optimal local bandwidth are shown in Eqs. (21)–(23).
where σK(ξ0) represents the kernel density at the point ξ0 to be estimated; K(·) represents the kernel function; ξi represents the i-th sample; and μ is the bandwidth of the kernel density estimate, with larger values giving a smoother estimated density and smaller values a steeper one. λi,k+1 represents the local bandwidth factor of the i-th sample obtained from the k-th iteration, and K represents the number of iterations. σk(ξ) is the kernel density at ξ obtained in the k-th iteration, and gk represents the kernel density normalization factor obtained in the k-th iteration. α is the sensitivity factor, which indicates the sensitivity of the local bandwidth to the sparsity of the sample distribution; if α is 0, ABKDE degenerates to fixed-bandwidth kernel density estimation.
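The iterative bandwidth adaptation can be sketched as below. Because Eqs. (19)–(23) are not restated in this section, the sketch assumes the widely used Abramson-style scheme: the local factor of sample i is (σk(ξi) / gk) raised to the power −α, with gk taken as the geometric mean of the densities at the samples. The function names, the default α = 0.5, and the number of iterations are hypothetical choices.

```python
import math


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(point, samples, mu, factors):
    """Density at `point`, where sample i uses the bandwidth mu * factors[i]."""
    n = len(samples)
    return sum(gaussian((point - s) / (mu * lam)) / (mu * lam)
               for s, lam in zip(samples, factors)) / n


def adaptive_factors(samples, mu, alpha=0.5, iterations=3):
    """Iteratively widen the kernels in sparse regions and narrow them in dense ones."""
    factors = [1.0] * len(samples)
    for _ in range(iterations):
        dens = [kde(s, samples, mu, factors) for s in samples]
        g = math.exp(sum(math.log(d) for d in dens) / len(dens))  # geometric mean
        factors = [(d / g) ** (-alpha) for d in dens]
    return factors


# Toy error samples: five clustered near zero and one isolated at 0.9.
errors = [-0.3, -0.1, 0.0, 0.05, 0.1, 0.9]
factors = adaptive_factors(errors, mu=0.2)
density_at_zero = kde(0.0, errors, 0.2, factors)
```

The isolated sample receives a larger bandwidth factor than the samples in the dense cluster, and setting `alpha=0.0` leaves every factor at 1, which recovers the fixed-bandwidth estimate mentioned above.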
BO-CNN-BiLSTM-attention-based short-term PV power interval prediction method considering conditional dependence of prediction errors
This study integrates deep learning algorithms, time series clustering algorithms, and statistical probabilistic density analysis methods, proposing a short-term PV output interval prediction method based on BO-CNN-BiLSTM-Attention considering prediction error conditional dependency. First, MIC analysis comprehensively examines correlations between input features and PV power data for feature selection. Then, BO-CNN-BiLSTM-Attention hybrid neural networks achieve high-precision point prediction, establishing foundations for subsequent interval prediction. Based on point predictions, K-shape algorithms match historical similar-day PV power data for target prediction periods, while ABKDE fits prediction errors from historical similar days to construct prediction intervals.
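The last stage, turning a point prediction and a set of similar-day errors into an interval, can be illustrated with the sketch below. It is a simplified stand-in: empirical quantiles of the error samples replace the quantiles of the ABKDE-fitted distribution, the error is assumed to be defined as actual minus predicted power, and the bounds are clipped to the physically feasible range. The function names and the optional `capacity` argument are hypothetical.

```python
def empirical_quantile(sorted_values, p):
    """Linear-interpolation quantile of pre-sorted values for 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def prediction_interval(point_forecast, errors, confidence=0.9, capacity=None):
    """Shift the point forecast by the lower and upper error quantiles."""
    ordered = sorted(errors)
    tail = (1.0 - confidence) / 2.0
    lower = point_forecast + empirical_quantile(ordered, tail)
    upper = point_forecast + empirical_quantile(ordered, 1.0 - tail)
    lower = max(lower, 0.0)  # PV output cannot be negative
    if capacity is not None:
        upper = min(upper, capacity)  # nor exceed the installed capacity
    return lower, upper


# Toy usage: errors (in MW) collected from similar historical days.
similar_day_errors = [-2.0, -1.0, 0.0, 1.0, 2.0]
interval = prediction_interval(10.0, similar_day_errors, confidence=0.5)
```

Because the error samples come only from similar days, the width of the interval reflects the error behavior under comparable conditions, which is the idea behind the conditional dependence described above.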
Because the many hyperparameters of the hybrid neural network directly affect prediction performance, hyperparameter optimization is essential. For neural network hyperparameter optimization, Bayesian algorithms efficiently explore and exploit the hyperparameter space with adaptive search strategies, providing reliable optimization results for complex black-box functions. These algorithms use previous evaluation results to update the posterior probability distribution of the objective function, which guides subsequent sample selection toward superior hyperparameter settings. Details of the Bayesian optimization algorithm are given in literature44. Figure 5 presents the integrated framework combining the Bayesian optimization workflow with the proposed methodology.
Considering the strong volatility and high stochasticity of PV power, a single recurrent neural network prediction model, such as RNN, GRU, LSTM, BiLSTM, etc., may not be able to fit the extremely complex nonlinear relationship between PV power and each input feature well. Therefore, in this paper, three-layer CNN is used to process the input data and extract its local spatial features, and the convolution kernel sizes of the three-layer CNN are set to 1, 3, and 5 sequentially to fully extract the time series features at different scales layer by layer. Then, the long-term dependence between the PV power and the input features is fully captured based on the three-layer BiLSTM to realize the nonlinear fitting. Finally, Temporal Pattern Attention mechanism is employed to focus on identifying important temporal patterns in the CNN-BiLSTM output data and dynamically weighting different time steps to further improve the prediction accuracy of the model. The overall framework of CNN-BiLSTM-Attention proposed in this paper is shown in Fig. 6.
Case study
In this paper, the PV power dataset and the corresponding NWP dataset of a PV power plant with a capacity of 50 MW in Xinjiang, China, are selected for the study. The time span of the dataset is February 28, 2019–May 31, 2019, and the sampling interval is 15 min, with a total of 8832 sets of data, and the dataset has no missing data.
Data pre-processing
The original PV output data with zero values during nighttime periods were excluded, and only valid PV output data from the 56 sampling points during 7:00–20:45 of each day were retained. Thus, the total sample size is reduced from 8832 to 5152. Meanwhile, because the historical PV power data and the multidimensional features differ in magnitude, which could degrade prediction accuracy if they were used directly as input features, Min–Max normalization is applied to linearly scale the original data as follows:
\[x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}\]
where x' is the normalized data; x is the original data; and xmax and xmin are the maximum and minimum values in the original data.
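The sample counts and the scaling step can be checked with a short sketch. The helper name is hypothetical, and the guard for a constant feature is an added assumption that the text does not discuss.

```python
def min_max_normalize(values):
    """Linearly scale values to [0, 1] using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to zeros to avoid 0/0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# 15 min slots from 07:00 to 20:45 inclusive.
slots_per_day = (20 * 60 + 45 - 7 * 60) // 15 + 1
# 96 slots per full day at a 15 min sampling interval.
days = 8832 // 96
retained_samples = days * slots_per_day

print(slots_per_day, days, retained_samples)  # 56 92 5152
print(min_max_normalize([0, 25, 50]))         # [0.0, 0.5, 1.0]
```

The 8832 raw samples correspond to 92 full days, and keeping 56 daytime slots per day gives the 5152 retained samples quoted above.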
Feature selection
MIC was used to quantify the correlation between the PV power data and each input feature; all features and their MIC values are shown in Table 1. As seen in Table 1, the three meteorological factors directly related to solar irradiance (global horizontal irradiance, direct normal irradiance, and diffuse horizontal irradiance) are the most strongly correlated with PV power, with MIC values all above 0.7. They are followed by three historical features: the PV power one day, two days, and one week earlier. The module temperature of the PV generation device is also clearly correlated with PV power, with an MIC value close to 0.5. Only the ambient temperature and relative humidity are weakly correlated with PV power, with an MIC value of only 0.17 when rounded to two decimals. In summary, to make full use of the feature extraction ability of the CNN, this paper excludes only the two features with very low correlation, ambient temperature and relative humidity, and uses the remaining eight features as inputs for PV power prediction.
Bayesian-based hyperparameter optimization
The CNN-BiLSTM-Attention combination prediction model can achieve highly accurate photovoltaic power point prediction. However, the model’s performance is critically dependent on proper hyperparameter configuration. If the high-dimensional hyperparameters are not reasonably set, the prediction effectiveness will be significantly compromised. Unfortunately, commonly used optimization approaches such as grid search methods and other intelligent optimization algorithms consume considerable computational time and resources. Therefore, this paper adopts the Bayesian optimization algorithm based on mathematical statistics to realize the efficient optimization of high-dimensional hyperparameters. This study implements Bayesian optimization using the Tree-structured Parzen Estimator (TPE) algorithm within the Optuna package. The optimization process is configured with 200 search iterations and incorporates a pruning strategy to enhance computational efficiency. Additionally, a patience parameter of 50 is implemented as an early stopping mechanism. Specifically, if 50 consecutive training iterations fail to improve the objective function value, the optimization process terminates early to prevent unnecessary computational overhead. The optimization range of hyperparameters and optimization results are shown in Table 2.
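The interaction between the 200-trial budget and the patience of 50 can be sketched as follows. This is not the Optuna API: plain random sampling stands in for the TPE sampler, the patience is interpreted as a count of consecutive non-improving trials, and the objective, parameter names, and ranges are hypothetical placeholders rather than the search space of Table 2.

```python
import random


def search_with_patience(objective, sample, n_trials=200, patience=50, seed=0):
    """Minimize `objective`, stopping after `patience` trials without improvement.

    Assumption: random sampling replaces TPE here purely for illustration.
    """
    rng = random.Random(seed)
    best_value = float("inf")
    best_params = None
    stale = 0
    trials_run = 0
    for _ in range(n_trials):
        params = sample(rng)
        value = objective(params)
        trials_run += 1
        if value < best_value:
            best_value, best_params, stale = value, params, 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stop: no improvement for `patience` trials
    return best_params, best_value, trials_run


def sample_params(rng):
    # Hypothetical search space, not the ranges from Table 2.
    return {"lr": rng.uniform(1e-4, 1e-1), "units": rng.choice([32, 64, 128])}


def toy_objective(params):
    # Hypothetical stand-in for the validation loss of a trained model.
    return (params["lr"] - 0.01) ** 2 + 1e-6 * (params["units"] - 64) ** 2


best_params, best_value, trials_run = search_with_patience(toy_objective, sample_params)
print(best_params, best_value, trials_run)
```

With an objective that never improves after the first trial, the loop stops after 51 evaluations instead of spending the full budget of 200.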
Indicators for the assessment of prediction results
Indicators for evaluating the results of point forecasting
To assess the accuracy of the proposed model, the mean absolute error (MAE) and root mean square error (RMSE) were selected to evaluate the point prediction results. The evaluation metrics are calculated as follows:
\[\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|{\widehat{y}}_{i} - y_{i}\right|\]
\[\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({\widehat{y}}_{i} - y_{i}\right)}^{2}}\]
where both RMSE and MAE are in MW; n is the number of test samples; \({\widehat{y}}_{i}\) is the predicted value of the ith test sample; and \(y_{i}\) is the true value of the ith test sample.
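A minimal sketch of the two point metrics, using the standard definitions of MAE and RMSE. The sample values are made up purely to show how a single large error affects the two metrics differently.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the inputs (MW here)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the inputs (MW here)."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Illustrative values only: three small errors and one large error.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 19.0, 30.0, 46.0]
print(mae(y_true, y_pred))   # 2.0
print(rmse(y_true, y_pred))  # sqrt(9.5), about 3.08
```

The single 6 MW error raises the RMSE well above the MAE, which is why RMSE is read as the metric more sensitive to large deviations.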
Indicators for evaluating the results of interval forecasting
The prediction interval coverage EPICP, the average width of the prediction interval EPINAW, and the composite indicator ECWC are selected as the evaluation indexes. The larger the EPICP, the smaller the EPINAW, the lower the composite indicator ECWC, the better the prediction effect. The calculation equations of EPICP, EPINAW, and ECWC are shown below.
where n is the number of samples; τi is a Boolean indicator for the ith sample, equal to 1 if the actual value falls within the prediction interval and 0 otherwise; δ is the range of the prediction target, that is, the difference between its maximum and minimum values, and is used to normalize EPINAW; U(xi) and L(xi) are the upper and lower boundaries of the prediction interval, respectively; xi is the input variable; γ(·) is the penalty coefficient; and η is the confidence penalty coefficient of EPICP. When EPICP does not meet the required confidence level μ, the composite indicator ECWC is amplified according to the gap between EPICP and μ; when EPICP meets the required confidence level μ, the value of ECWC is determined jointly by EPINAW and η.
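The three interval metrics can be sketched as below. The coverage and width functions follow the definitions of τi, δ, U(xi), and L(xi) given in the text. The composite indicator uses one commonly used form, PINAW · (1 + γ · exp(−η (PICP − μ))) with γ = 0 when PICP ≥ μ and 1 otherwise; this form is an assumption and may differ from the exact expression used by the authors. All sample values are illustrative.

```python
import math


def picp(y, lower, upper):
    """Share of actual values that fall inside [lower, upper]."""
    hits = sum(1 for v, lo, hi in zip(y, lower, upper) if lo <= v <= hi)
    return hits / len(y)


def pinaw(y, lower, upper):
    """Average interval width normalized by the range of the target."""
    delta = max(y) - min(y)
    return sum(hi - lo for lo, hi in zip(lower, upper)) / (len(y) * delta)


def cwc(picp_value, pinaw_value, mu, eta):
    """Composite indicator (assumed form, see the lead-in)."""
    gamma = 0.0 if picp_value >= mu else 1.0
    return pinaw_value * (1.0 + gamma * math.exp(-eta * (picp_value - mu)))


# Illustrative values only.
y = [10.0, 20.0, 30.0, 40.0]
lower = [8.0, 18.0, 31.0, 35.0]
upper = [12.0, 22.0, 35.0, 45.0]

coverage = picp(y, lower, upper)   # 0.75: the third point is missed
width = pinaw(y, lower, upper)     # 22 / (4 * 30)
print(coverage, width, cwc(coverage, width, mu=0.90, eta=50.0))
```

In this assumed form, an interval set that reaches the target coverage is scored by its normalized width alone, while a shortfall in coverage is penalized exponentially.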
Analysis of the results of point forecasting
The dataset was divided into training, validation, and test sets in a 7:2:1 ratio for single-step photovoltaic power prediction. To validate the effectiveness of the proposed BO-CNN-BiLSTM-Attention method for point prediction, five comparative models were constructed: BO-CNN-LSTM-Attention45, BO-CNN-BiLSTM46, BO-FCN-BiLSTM47, BO-BiLSTM48, and CNN-BiLSTM-Attention49. For fairness, Bayesian optimization was applied to the models from45,46,47,48, while the CNN-BiLSTM-Attention model proposed in49 was kept without Bayesian optimization to demonstrate the benefit of Bayesian optimization. The prediction performance metrics of each model on the test set are presented in Table 3, and the violin plots of prediction errors for each model on the test set are shown in Fig. 7.
As shown in Table 3, the proposed BO-CNN-BiLSTM-Attention point prediction model achieves the best predictive performance, outperforming the models from the literature45,46,47,48,49 in both the mean absolute error (MAE), which reflects the overall error level, and the root mean square error (RMSE), which penalizes large errors more heavily. Compared with the CNN-LSTM-Attention model employed in45, the present study uses BiLSTM as the nonlinear fitting model, which better captures bidirectional temporal dependencies in photovoltaic power. Therefore, even though both CNN-LSTM-Attention and the CNN-BiLSTM-Attention used in this study are tuned by Bayesian optimization, the hybrid model employed in this work still achieves superior predictive performance. Furthermore, the Temporal Pattern Attention (TPA) mechanism enables the proposed model to learn more complex and detailed time series patterns. Consequently, compared with the CNN-BiLSTM model in46, even when Bayesian hyperparameter optimization is also applied, the proposed model demonstrates significantly improved prediction accuracy. The proposed model also substantially outperforms the CNN-BiLSTM-Attention model presented in49 on both metrics, which highlights the importance of Bayesian optimization for the prediction accuracy of deep learning ensemble forecasting models. Finally, as shown in Fig. 7, the prediction errors of the proposed model are concentrated around zero, indicating a relatively low overall error level. Moreover, the proposed point prediction model exhibits smaller extreme outliers than the other five models. In summary, these results support the superiority of the proposed point prediction model for photovoltaic power forecasting.
Clustering of similar day data and construction of prediction intervals
To verify the effectiveness of the proposed interval construction method, this study conducts comprehensive validation experiments. The K-shape and ABKDE-based approach, which accounts for conditional dependence of prediction errors, is applied to three distinct weather scenarios: sunny, cloudy, and overcast conditions. For each weather type, one representative day is randomly selected for interval prediction analysis. The results are subsequently visualized and analyzed to assess method performance. The specific process is as follows: first, similar day data are matched for the time period to be predicted based on the K-shape clustering algorithm, as shown in Fig. 8. Then, the prediction error distribution of similar data clustering under different weather types is analyzed, and the histograms of prediction error distribution of similar data clustering under three weather types are given in Fig. 9. Finally, the ABKDE method is used to fit the histograms of the frequency distributions of different weather types to realize the interval prediction of the time period to be predicted.
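The similar-day matching step relies on the shape-based distance (SBD) that underlies K-shape, defined as one minus the maximum normalized cross-correlation over all shifts. The sketch below uses that distance for a simple nearest-neighbour lookup; it omits the centroid (shape extraction) step of full K-shape clustering, assumes equal-length series, and uses hypothetical function names and made-up daily profiles.

```python
import math


def z_normalize(series):
    """Zero-mean, unit-variance scaling; a constant series maps to zeros."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in series]


def shape_based_distance(x, y):
    """SBD = 1 - max over shifts of the normalized cross-correlation."""
    x, y = z_normalize(x), z_normalize(y)
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 1.0  # no shape information in a constant series
    best = -float("inf")
    for shift in range(-(n - 1), n):
        cc = sum(x[i + shift] * y[i] for i in range(n) if 0 <= i + shift < n)
        best = max(best, cc / norm)
    return 1.0 - best


def most_similar_days(target, history, k=3):
    """Indices of the k historical profiles closest in shape to the target."""
    scored = sorted(
        (shape_based_distance(target, day), idx) for idx, day in enumerate(history)
    )
    return [idx for _, idx in scored[:k]]


# Illustrative daily profiles only.
target = [0, 1, 3, 6, 3, 1, 0]
history = [
    [0, 2, 6, 12, 6, 2, 0],   # same shape, twice the amplitude
    [6, 3, 1, 0, 1, 3, 6],    # inverted shape
    [1, 1, 1, 2, 1, 1, 1],    # narrow spike
]
print(most_similar_days(target, history, k=1))  # [0]
```

Because both series are z-normalized, a profile with the same shape but a different amplitude has a distance of essentially zero, which is the property that makes the measure suitable for matching days by curve shape rather than by magnitude.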
Analysis of interval forecast results
To evaluate the proposed ABKDE method, the kernel density estimation (KDE) method proposed in50 and the Bootstrap method proposed in51 were used as comparative interval estimation methods, with confidence levels set at 90% and 95%. The interval estimation evaluation metrics for the three methods under different weather conditions are presented in Tables 4, 5 and 6, respectively.
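How a confidence level turns an error distribution into an interval can be sketched with a generic adaptive-bandwidth Gaussian KDE. This is not the iterative convergence-based ABKDE of this study: it is a single-pass variant that assumes a normal-reference pilot bandwidth, a square-root sensitivity (alpha = 0.5), and errors defined as actual minus predicted. All names and sample values are hypothetical.

```python
import math


def _gauss(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def adaptive_bandwidths(errors, alpha=0.5):
    """One bandwidth per sample: wider where the pilot density is low."""
    n = len(errors)
    mean = sum(errors) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / (n - 1))
    h0 = 1.06 * std * n ** (-0.2)  # normal-reference pilot bandwidth
    pilot = [sum(_gauss((x - e) / h0) for e in errors) / (n * h0) for x in errors]
    geo_mean = math.exp(sum(math.log(p) for p in pilot) / n)
    return [h0 * (p / geo_mean) ** (-alpha) for p in pilot]


def kde_cdf(x, errors, bandwidths):
    """Exact CDF of the Gaussian mixture defined by the KDE."""
    total = sum(
        0.5 * (1.0 + math.erf((x - e) / (h * math.sqrt(2.0))))
        for e, h in zip(errors, bandwidths)
    )
    return total / len(errors)


def kde_quantile(p, errors, bandwidths):
    """Invert the CDF by bisection."""
    lo = min(errors) - 10.0 * max(bandwidths)
    hi = max(errors) + 10.0 * max(bandwidths)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, errors, bandwidths) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prediction_interval(point_forecast, errors, confidence=0.95):
    """Shift the point forecast by the lower and upper error quantiles."""
    bw = adaptive_bandwidths(errors)
    tail = (1.0 - confidence) / 2.0
    return (
        point_forecast + kde_quantile(tail, errors, bw),
        point_forecast + kde_quantile(1.0 - tail, errors, bw),
    )


# Illustrative errors (MW) from similar historical periods.
similar_day_errors = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
print(prediction_interval(20.0, similar_day_errors, confidence=0.95))
print(prediction_interval(20.0, similar_day_errors, confidence=0.90))
```

Raising the confidence level from 90% to 95% moves both quantiles outward, so the 95% interval always contains the 90% interval for the same error sample.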
Overall, the iterative convergence-based ABKDE interval estimation method employed in this study outperforms the other two interval estimation methods across all three weather conditions, followed by the KDE method proposed in50, while the Bootstrap method proposed in51 performs worst. In terms of weather type, interval estimation performs best under sunny conditions, moderately under cloudy conditions, and worst under rainy conditions. As shown in Tables 4, 5 and 6, under sunny conditions the ABKDE interval coverage reaches 98.21% at the 95% confidence level and 92.86% at the 90% confidence level. Under cloudy conditions, the coverage is 96.43% at the 95% confidence level, while under rainy conditions it drops to 94.64%. Figure 10 presents the interval estimation performance of the ABKDE method under different weather conditions. As shown in Fig. 10, under sunny and cloudy conditions, the prediction intervals constructed by the proposed method at the 95% confidence level cover the vast majority of photovoltaic output points. However, under rainy conditions, large variations in meteorological conditions cause rapid changes in photovoltaic power, so interval prediction becomes substantially more difficult, and more than 10% of the photovoltaic output points fall outside the prediction interval at the 90% confidence level. Nevertheless, at the 95% confidence level, most photovoltaic output points are still covered, except for some points with rapid power fluctuations. In comparison, the interval prediction performance of the KDE and Bootstrap methods is inferior to that of ABKDE, particularly for the Bootstrap method. Although the average width of the Bootstrap prediction intervals is narrower than those of ABKDE and KDE, its interval coverage is low, reaching only 0.9286 at the 95% confidence level. In summary, the proposed iterative convergence-based ABKDE interval estimation method demonstrates superior predictive performance and achieves effective interval prediction.
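The coverage values quoted in this section can be read as counts of covered points. The sketch below assumes that each evaluated day contains the 56 retained sampling points described in the data pre-processing step; that link is an inference from the numbers, not a statement in the text.

```python
POINTS_PER_DAY = 56  # assumption: one evaluated day of retained samples


def covered_points(coverage, n_points=POINTS_PER_DAY):
    """Number of points inside the interval implied by a coverage value."""
    return round(coverage * n_points)


for reported in (0.9821, 0.9643, 0.9464, 0.9286):
    covered = covered_points(reported)
    missed = POINTS_PER_DAY - covered
    print(reported, covered, missed, round(covered / POINTS_PER_DAY, 4))
```

Under this assumption the four values correspond to 55, 54, 53, and 52 covered points, so a coverage of 0.9286 at the 95% level means that 4 of the 56 points fall outside the interval.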
Conclusion
To address the volatility, stochasticity, and strong nonlinearity that make PV power difficult to predict accurately, this paper proposes a short-term PV power interval prediction method based on BO-CNN-BiLSTM-Attention that takes into account the conditional dependence of the prediction error. The experiments support the following conclusions:
1. The BO-CNN-BiLSTM-Attention hybrid neural network point prediction method fully combines the feature extraction capability of CNN, the nonlinear fitting capability of BiLSTM, and the TPA mechanism for the identification of complex time series patterns, and adopts an efficient Bayesian optimization algorithm to optimize the hyper-parameters of the combined prediction model, which can bring out the maximum performance of the combined prediction model and achieve better results than BO-FCN-BiLSTM, BO-CNN-BiLSTM and other models, and lays the foundation for the subsequent realization of accurate uncertainty prediction.
2. Considering the conditional dependence of the prediction error of PV power, this paper realizes the construction of future PV power prediction intervals based on the characteristics of the prediction error distribution of historical similar PV power data, and adopts the K-shape time series clustering algorithm to match the new predicted value of PV power for the time period to be predicted with historical similar PV power data to support the prediction of PV power intervals under different weather conditions.
3. The ABKDE method based on iterative convergence is used to realize the probability density estimation of PV power prediction error, which achieves better interval estimation than the KDE and Bootstrap methods, and is conducive to better capturing PV power output points.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Mathieu, D. et al. Value of deterministic day-ahead forecasts of PV generation in PV+ storage operation for the Australian electricity market. Sol. Energy 224, 672–684 (2021).
Lai, C. et al. Review of photovoltaic power output prediction technology. Trans. China Electrotech. Soc. 34(06), 1201–1217 (2019).
Li, B. & Zhang, J. A review on the integration of probabilistic solar forecasting in power systems. Sol. Energy 210, 68–86 (2020).
Zhi, Y., Sun, T. & Yang, X. A physical model with meteorological forecasting for hourly rooftop photovoltaic power prediction. J. Build. Eng. 75, 106997 (2023).
János, M. M. & Gyula, G. Extensive comparison of physical models for photovoltaic power forecasting. Appl. Energy (prepublish), 116239 (2020).
Makridakis, S. & Hibon, M. ARMA models and the Box–Jenkins methodology. J. Forecast. 16(3), 147–163 (1997).
Fara, L., Diaconu, A., Craciunescu, D. & Fara, S. Forecasting of energy production for photovoltaic systems based on ARIMA and ANN advanced models. Int. J. Photoenergy. 2021(1), 6777488 (2021).
Binbin, Z., Ying, W. & Bin, W. Photovoltaic power prediction in distribution network based on ARIMA model time series. Renew. Energy Resour. 37 (06), 820–823 (2019).
Pieri, E. et al. Forecasting degradation rates of different photovoltaic systems using robust principal component analysis and ARIMA. IET Renew. Power Gener. 11(10), 1245–1252 (2017).
Bouzerdoum, M., Mellit, A. & Pavan, A. M. A hybrid model (SARIMA–SVM) for short-term power forecasting of a small-scale grid-connected photovoltaic plant. Sol. Energy 98, 226–235 (2013).
Mouatasim, A. E., Darmane, Y. Regression analysis of a photovoltaic (PV) system in FPO. In AIP Conference Proceedings, Vol. 2056, no. 1 (AIP Publishing, 2018).
Zhang, Y., Li, G. & Li, X. Short-term forecasting method for regional photovoltaic power based on typical representative power stations and improved SVM. Electr. Power Autom. Equipment. 41(11), 205–210 (2021).
Ding, M., Wang, L. & Bi, R. An ANN-based approach for forecasting the power output of photovoltaic system. Procedia Environ. Sci. 11, 1308–1315 (2011).
Shi, S. et al. Short time solar power forecasting using P-ELM approach. Sci. Rep. 14(1), 30999 (2024).
Liu, X., Wang, Y. & Ji, Z. Short-term wind power prediction method based on random forest. J. Syst. Simul. 33(11), 2606 (2021).
Raza, Q. M., Mithulananthan, N. & Summerfield, A. Solar output power forecast using an ensemble framework with neural predictors and Bayesian adaptive combination. Sol. Energy 166, 226–241 (2018).
Liu, L., Zhan, M. & Bai, Y. A recursive ensemble model for forecasting the power output of photovoltaic systems. Sol. Energy 189, 291–298 (2019).
Konstantinou, M., Peratikou, S. & Charalambides, A. G. Solar photovoltaic forecasting of power output using lstm networks. Atmosphere 12(1), 124 (2021).
Sun, Y., Venugopal, V. & Brandt, R. A. Short-term solar power forecast with deep learning: Exploring optimal input and output configuration. Sol. Energy 188, 730–741 (2019).
Zhang, Z. et al. Solar-mixer: An efficient end-to-end model for long-sequence photovoltaic power generation time series forecasting. IEEE Trans. Sustain. Energy 14(4), 1979–1991 (2023).
Theocharides, S. et al. Day-ahead photovoltaic power production forecasting methodology based on machine learning and statistical post-processing. Appl. Energy 268, 115023 (2020).
Chai, S., Xu, Z., Jia, Y., et al. A robust spatiotemporal forecasting framework for photovoltaic generation. IEEE Trans. Smart Grid 99(1) (2020).
Li, Y. et al. A TCN-based hybrid forecasting framework for hours-ahead utility-scale PV forecasting. IEEE Trans. Smart Grid 14(5), 4073–4085 (2023).
Niu, Y. et al. De-trend first, attend next: A mid-term PV forecasting system with attention mechanism and encoder–decoder structure. Appl. Energy 353, 122169 (2024).
Niu, Y. et al. Amplify seasonality, prioritize meteorological: Strengthening seasonal correlation in photovoltaic forecasting with dual-layer hierarchical attention. Appl. Energy 394, 126104 (2025).
Cao, Y. et al. Stacking algorithm based framework with strong generalization performance for ultra-short-term photovoltaic power forecasting. Energy 322, 135599 (2025).
Ni, Q. et al. An ensemble prediction intervals approach for short-term PV power forecasting. Sol. Energy 155, 1072–1083 (2017).
Zhao, K. et al. Probabilistic forecasting for photovoltaic power based on improved Bayesian neural network. Power Syst. Technol. 43(12), 4377–4386 (2019).
Han, Y. et al. A PV power interval forecasting based on seasonal model and nonparametric estimation algorithm. Sol. Energy 184, 515–526 (2019).
Wen, Y. et al. Performance evaluation of probabilistic methods based on bootstrap and quantile regression to quantify PV power point forecast uncertainty. IEEE Trans. Neural Netw. Learn. Syst. 31(4), 1134–1144 (2020).
Yang, H., Yang, M. & Xin, Su. Intraday photovoltaic output interval prediction method considering the spatiotemporal-conditional dependence of prediction error. South. Power Syst. Technol. 17(02), 128–136 (2023).
Bruninx, K. et al. A statistical description of the error on wind power forecasts for probabilistic reserve sizing. IEEE Trans. Sustain. Energy 5(3), 995–1002 (2014).
Khosravi, A. et al. Prediction intervals for short-term wind farm power generation forecasts. IEEE Trans. Sustain. Energy 4(3), 602–610 (2013).
Wang, C. et al. Ultra-short-term power output forecasting of distributed photovoltaic based on error classification. South. Power Syst. Technol. 9(04), 41–46 (2015).
Zhao, J. et al. A method of probabilistic distribution estimation of conditional forecast error for photovoltaic power generation. Autom. Electr. Power Syst. 39(16), 8–15 (2015).
Liu, R. et al. A short-term probabilistic photovoltaic power prediction method based on feature selection and improved LSTM neural network. Electr. Power Syst. Res. 210, 108069 (2022).
Wei, J. et al. Ultra-short-term forecasting of wind power based on multi-task learning and LSTM. Int. J. Electr. Power Energy Syst. 149, 109073 (2023).
Reshef, N. D. et al. Detecting novel associations in large data sets. Science 334(6062), 1518–1524 (2011).
Wang, F. et al. Wavelet decomposition and convolutional LSTM networks based improved deep learning model for solar irradiance forecasting. Appl. Sci. 8 (8), 1286 (2018).
Xiang, J. & Yin, K. Short-term bus load forecasting model based on KICEEMDAN and IWOA optimized bidirectional long- and short-term memory network. In 2023 IEEE International Conference on Image Processing and Computer Applications (ICIPCA), Changchun, China 458–466 (2023).
Lin, Z., Cheng, L. & Huang, G. Electricity consumption prediction based on LSTM with attention mechanism. IEEJ Trans. Electr. Electron. Eng. 15(4), 556–562 (2020).
Paparrizos, J. & Gravano, L. k-Shape: efficient and accurate clustering of time series. SIGMOD Record 45(1), 69–76 (2016).
Zhang, K. et al. Medium-and Long-term industry load forecasting method considering multi-dimensional temporal features. Autom. Electr. Power Syst. 47(20), 104–114 (2023).
Snoek, J., Larochelle, H. & Adams, R. P. Practical bayesian optimization of machine learning algorithms. Adv. Neural Inf. Proc. Syst. 25, 2951–2959 (2012).
Rui, L., Ming, W., Wan-Xing, S., et al. Prediction of distributed photovoltaic power based on CNN-LSTM-attention fusion model. In 2024 International Symposium on Electrical, Electronics and Information Engineering (ISEEIE) 541–545 (IEEE, 2024).
Anu Shalini, T. & Sri, R. B. Power generation forecasting using deep learning CNN-based BILSTM technique for renewable energy systems. J. Intell. Fuzzy Syst. 43(6), 8247–8262 (2022).
Naz, A. et al. Electricity consumption forecasting using gated-fcn with ensemble strategy. IEEE Access 9, 131365–131381 (2021).
Li, Y. et al. Short-term PV power prediction based on meteorological similarity days and SSA-BiLSTM. Syst. Soft Comput. 6, 200084 (2024).
Liu, W. & Mao, Z. Short-term photovoltaic power forecasting with feature extraction and attention mechanisms. Renew. Energy 226, 120437 (2024).
Shi, M., Yin, R., Wang, Y., et al. Photovoltaic power interval forecasting method based on kernel density estimation algorithm. In IOP Conference Series: Earth and Environmental Science, Vol. 615, no. 1 012062 (IOP Publishing, 2020).
Herrera-Casanova, R., Conde, A. & Santos-Pérez, C. Hour-ahead photovoltaic power prediction combining BiLSTM and Bayesian optimization algorithm, with bootstrap resampling for interval predictions. Sensors 24(3), 882 (2024).
Author information
Contributions
Y. C.: Project administration, Investigation, Supervision. X. W.: Software, Visualization, Formal analysis, Writing—original draft. R. H.: Supervision, Investigation, Formal analysis, Writing—original draft, Writing—review & editing. G. Y.: Writing—review & editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, Y., Wang, X., Huang, R. et al. Photovoltaic power interval prediction with conditional error dependency using Bayesian optimized deep learning. Sci Rep 15, 43887 (2025). https://doi.org/10.1038/s41598-025-19602-6