Introduction

Clean energy targets and falling technology costs are driving a rapid increase in the share of wind and solar generation, also known as Variable Renewable Energy (VRE), in electricity systems1,2. At the same time, electricity demand is growing due to air conditioning adoption to mitigate the increase in average temperatures3, growing use of electric vehicles4, and increasing deployment of data centers5. Accurately forecasting demand and supply at hourly and sub-hourly resolution is essential for power system operators to commit adequate generation, storage, and demand resources a day ahead of the actual dispatch in order to maintain reliability of the electricity system. However, increasing electricity demand and weather-dependent energy sources, as wind and solar, add variability and uncertainty to both demand (Fig. 1a–c) and supply (Fig. 1d–h), increasing the challenges for power system operators to forecast these resources. These challenges are even more intensified under the effects of climate change and extreme weather conditions6.

Fig. 1: Raising day-ahead forecast errors from electricity demand and renewable generation increases operating requirements and area control errors.
Fig. 1: Raising day-ahead forecast errors from electricity demand and renewable generation increases operating requirements and area control errors.The alternative text for this image may have been generated using AI.
Full size image

Electricity demand variability experienced by California’s three main Investor-owned Utilities, Pacific Gas and Electric (PG& E) (a), Southern California Edison (SCE) (b), and San Diego Gas and Electric (SDG& E) (c). Solar generation variability in trading zone NP15 (d), SP15 (e), and ZP26 (f). Wind generation variability in the trading zone NP15 (g) and SP15 (h). The gray regions represent the spread of the 95% Confidence Interval (CI), and the lines are the average hourly demand or generation in 2020. i The map show the approximate boundaries of the zones managed by CAISO (adapted from oasis.caiso.com). j Areas served by three major utilities (PG& E, SCE, and SDG& E) in CAISO's electricity market. k Hourly errors in the CAISO's day-ahead forecast for wind and solar generation and electricity demand, and net demand. l Imports and exports from CAISO in the energy imbalance market. (m) Average annual ancillary services capacity requirements for regulation up and down, spinning, and non-spinning reserves in CAISO. Regulation down and non-spinning reserves had a major increase, while spinning reserves were the only product that decreased in 2023.

In the United States, Independent System Operators (ISOs) or Regional Transmission Organizations (RTOs) operate wholesale electricity markets, manage the power grid, and serve about 70% of electricity demand, the rest of which is served by vertically integrated utilities7. The main goal of the power system operators is to balance electricity demand and supply to ensure the reliability of the electricity system. Through the day-ahead wholesale electricity market, the system operators schedule the least-cost generators, subject to power flow constraints, based on day-ahead forecasts of electricity demand and generator (and storage) availability8. To account for day-ahead forecast errors, the system operators run a real-time market an hour or two before real-time dispatch. In the case of CAISO, the real-time market has evolved into an Energy Imbalance Market (EIM) that spans across several regions in the Western United States, which was designed to reduce costs by allowing access to many more generation resources beyond California. To address the remaining differences between committed resources in the real-time market and actual demand, as well as for contingency events such as generator or transmission outages, the power system operator also procures operating reserves in the day-ahead market. Operating reserves are additional capacity purchased from committed generators to compensate for forecasting errors9 and ensure reliability during contingency events10. As the share of weather-dependent generation has grown, aggregated system-level forecast errors have also grown (see CAISO day-ahead forecast errors of wind and solar generation, electricity demand, and net demand or the difference between demand and wind and solar generation in Fig. 1k). Larger forecast errors lead to significant changes in the generation, storage, and transmission scheduling close to real-time dispatch11, which has increased the electricity exchange in the energy imbalance market12 (see imports and exports in CAISO’s Energy Imbalance Market in Fig. 1l), and the operating reserve requirements necessary to ensure a reliable electricity supply13,14 (see CAISO’s reserves or Ancillary Services requirements in Fig. 1m), both of which have increased operating costs15.

The additional costs associated with operating reserves (Supplementary Fig. 1), procured in the day-ahead ancillary services markets, are ultimately borne by electricity ratepayers16. The generators that respond the fastest to provide upward operating reserves are often natural gas combustion turbines, which are expensive and have higher greenhouse gas and criteria emissions compared to other technologies17. Storage, including battery technologies that also provide operating reserves, may also have greenhouse gas and criteria emissions associated with charging energy and energy losses, depending on the grid conditions during charging. Failure to compensate for forecast errors through scheduling changes and operating reserves leads to area control errors, variations in system frequency, and blackouts in severe cases6. Therefore, reducing the errors of the electricity supply and demand forecast is critical to minimize costs18, emissions, and reliability issues19.

Addressing these challenges introduced by the uncertainty in the electricity demand and the VRE generation requires characterizing the relation between weather and energy accurately20, identifying the information sources from Numerical Weather Forecasts (NWFs) that improve day-ahead energy forecasts21. Furthermore, probabilistic forecasts could enable us to determine operating reserve levels dynamically based on the uncertainty in the prediction, which may reduce the requirements for and costs of operating reserves.

Previous research focused on Machine Learning (ML) to improve electricity demand and VRE generation forecasts from NWFs instead of using physical models22. NWFs have numerous weather features, so identifying the most informative variables is essential to reduce collinearity23. The discovery of patterns leads to an increase in the effectiveness of a forecast24. Deep learning methods learn patterns from high-dimensional data with spatial structures and efficiently deal with collinearity, but they require substantial amounts of data25,26,27,28. With fewer observations, structured sparsity regularization methods efficiently uncover spatial patterns and reduce the dimensionality of input features. More recently, deep learning methods based on Temporal Fusion Transformers29,30, Informers31, and TimesNets32 were used in energy forecasts. However, the proposed methodologies generally forecast a single energy feature (demand, solar, or wind) and do not provide a predictive multivariate density function to draw predictive scenarios that preserve the time structure in risk assessment applications.

Probabilistic day-ahead energy forecasts at the asset level based on pattern similarity improved on Bayesian forecasts33 by adding time correlation between intervals to generate predictive scenarios34,35. Both studies assume a uniform relation between the input weather features to forecast a single resource, but they do not consider collinearity reduction and the joint nature of weather-dependent resources. System-level forecasts are less researched despite their role in determining operating reserve requirements. In addition, asset-level demand and forecasts28,36,37 generally utilized the open-source NWFs provided by the European Centre for Medium-Range Weather Forecast (ECMWF), which has a 9 × 9 km spatial resolution38, or historical data. Yet, the High-Resolution Rapid Refresh (HRRR) NWFs is also publicly available and provide continental-scale NWFs with a temporal resolution of 1 h and 3 × 3 km spatial resolution39,40, superior to global-scale NWFs41,42.

In this study, we show that joint probabilistic day-ahead forecasts of electricity demand and wind and solar generation improve system-level uncertainty characterization using data from the HRRR NWF. We develop a probabilistic ML framework43 that combines sparse feature selection44 with multi-task Gaussian Process (GP) regression45,46 to jointly model electricity demand, solar generation, and wind generation at the system level. The approach produces full predictive density functions and time-consistent predictive scenarios, enabling a probabilistic assessment of forecast uncertainty and operating reserve requirements. We apply the proposed methodology encompassing processing, modeling, and model selection (Supplementary Fig. 2), to the electricity system operated by California Independent System Operator (CAISO), producing hourly day-ahead forecasts for three nodal regions (Fig. 1i)—Northern (NP15), Southern (SP15), and Central (ZP26) California—and major load-serving utilities (Fig. 1j)—Pacific Gas & Electric (PG&E), Southern California Edison (SCE), and San Diego Gas & Electric (SDG&E). Using multivariate proper scoring rules47, we evaluate multiple combinations of sparse and Bayesian learning methods (4 sparse × 4 Bayesian, plus 4 joint models at the regional and nodal level) and demonstrate improved forecast calibration and skill relative to existing approaches. These results highlight the value of joint probabilistic forecasting for improving operational planning, reserve allocation, and reliability in electricity systems with high shares of variable renewable energy.

Results

AI-based probabilistic models enhance the performance of a day-ahead energy forecast

Deterministic and probabilistic forecasts differ fundamentally. Probabilistic forecasts, specifically Bayesian forecasts, predict a density function, whereas deterministic forecasts predict point estimates. However, the predictive mean of probabilistic forecasts could be compared with the point estimates of deterministic forecasts. In this study, we compare the results of the proposed probabilistic forecasts with three different deterministic forecasts—the persistence (naive), the climatology (autoregressive), and reference forecasts (CAISO), which is a standard practice in the day-ahead forecasting literature48.

In the first step of our analysis, we identify the reference forecast, which is the baseline forecast with the lowest Root Mean Squared Error (RMSE), to compare with the proposed forecast (see the forecast’s operational characteristics in Fig. 2i). We define the day-ahead forecast for each energy feature \(\widehat{y}\)—electricity demand \(({\mathcal{L}})\), solar \(({\mathcal{S}})\) and wind (\({\mathcal{W}}\))—at a node z as \({\widehat{y}}_{t,z}\), where t is the hour of the day (1, …, 24) and z corresponds to each node (NP15, SP15, ZP26). The system-wide forecast can then be estimated by aggregating the nodal forecasts as \({\widehat{y}}_{t}={\sum }_{z=1}^{Z}{\widehat{y}}_{t,z}\). Similarly, we estimate day-ahead forecasts for net demand (\({\mathcal{N}}\)) as \({\widehat{y}}_{t,z}^{{\mathcal{N}}}={\widehat{y}}_{t,z}^{{\mathcal{L}}}-{\widehat{y}}_{t,z}^{{\mathcal{S}}}-{\widehat{y}}_{t,z}^{{\mathcal{W}}}\) at the nodal level (NP15, SP15, and ZP26), and \({\widehat{y}}_{t}^{{\mathcal{N}}}={\sum }_{z=1}^{Z}{\widehat{y}}_{t,z}^{{\mathcal{L}}}-{\widehat{y}}_{t,z}^{{\mathcal{S}}}-{\widehat{y}}_{t,z}^{{\mathcal{W}}}\) at the system level. We then evaluate RMSE over all hours (Supplementary Note 1) and normalize by the mean target value \(\bar{y}=\frac{1}{KT}{\sum }_{k,t}{y}_{k,t}\), k is the day and K is the number of days in the testing set. CAISO forecast has the lowest Normalized RMSE (NRMSE) for electricity demand (4.3%), solar (23.2%), and wind generation (16.3%); see Fig. 2a. Similarly, CAISO day-ahead forecasts for net demand have the lowest NRMSE at NP15 (6.6%), SP15 (13.1%), and ZP26 (15.1%); and system level (8.6%); see Fig. 2b. Note that ZP16 does not have wind resources.

Fig. 2: Performance comparisons between day-ahead probabilistic and deterministic baseline forecasting models.
Fig. 2: Performance comparisons between day-ahead probabilistic and deterministic baseline forecasting models.The alternative text for this image may have been generated using AI.
Full size image

Normalized Root Mean Squared Error (RMSE) achieved by the baseline models (NRMSE lower is better). The baselines are the persistence, climatology, and CAISO forecasts. a System-level day-ahead forecast NRMSE for each energy feature (electricity demand, solar, and wind generation), and b net demand forecast NRMSE at each node (NP15, SP16, ZP26) and at the system level. Forecast Skill Scores (SS) for (b) independent and (e) joint day-ahead forecasts for each energy feature aggregated across nodes (SS higher is better), and net demand (e) independent and (f) joint probabilistic forecast SS at nodal and system levels ( marks the model with highest SS). The colors vary by sparse learning methods: Lasso, Orthogonal Matching Pursuit (OMP), Elastic Net (EN), and Group Lasso (GL). The markers denote the Bayesian methods: Bayesian Linear Regression (BLR), RVM (Relevance Vector Machine), GPR (Gaussian Process for Regression), and System-Level GPR (SLGPR). Empirical Cumulative Distribution Function (eEDF) of the absolute percentage residual value (g) and the residuals (f) for the day-ahead system-level net demand forecast from the persistence, the climatology, CAISO, and the best node-level (NL) ensemble and the best system-level (SL) ensemble. i Proposed day-ahead forecast for day d submitted in d − 1 with lag ( = 8 h) and horizon (H = 24 hours). NWF (HRRR) is the Numerical Weather Forecast High-Resolution Rapid Refresh model.

In the second step, we compare our proposed day-ahead forecast to the CAISO forecast (reference). The Skill Score (SS) assesses improvements in the RMSE (Supplementary Note 1), resulting in a different SSRMSE for each combination of sparse and Bayesian methods (Fig. 2a, b). Our proposed sparse methods, Lasso, Orthogonal Matching Pursuit (OMP), Elastic Net (EN), and Group Lasso (GL) have different formulations (see Section Sparse Learning). The objective is to identify the most effective regularization, which forces the model to discover simpler patterns in the input feature vectors. The feature vectors for electricity demand (\({{\bf{x}}}_{i}^{{\mathcal{L}}}\)), solar (\({{\bf{x}}}_{i}^{{\mathcal{S}}}\)) and wind generation (\({{\bf{x}}}_{i}^{{\mathcal{W}}}\)) are from the reanalysis dataset (\({\mathcal{A}}\)); see in the Feature Vectors for Sparse Learning.

The Bayesian learning methods explore different data assumptions: Bayesian Linear Regression (BLR) assumes linearity, Relevance Vector Machine (RVM) emphasizes sparsity, Gaussian Process Regression (GPR) accounts for non-linearity, and Multi-Task GPR (MTGPR) models joint distributions among response variables. In particular, the System-Level MTGPR (SLGPR) assumes a joint distribution across nodes (NP15, SP16, or ZP26) for the independent energy features (electricity demand, solar or wind generation); see Fig. 2c, d. In addition, the Node-Level MTGPR (NLGPR) assumes a joint distribution across energy features for the independent nodes (Fig. 2e, f). The non-linear properties come from mapping the feature vectors to high-dimensional space with a kernel function (Supplementary Note 2). The formulations include the model chain to preserve the time structure (see Sections Bayesian learning and Model Chain). The graphical representations of the algorithms are in Supplementary Fig. 3. The feature vectors are from the forecasts dataset (\({\mathcal{F}}\)) for electricity demand (\({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{L}}}\)), solar (\({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{S}}}\)), wind generation (\({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{W}}}\)), and joint demand and generation (\({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{E}}}\)); see in the Pattern Vectors for Bayesian Learning. These feature vectors differ from those used in the sparse learning step. The hyperparameters, which control the different aspects of the learning process in the sparse and Bayesian learning methods, are cross-validated jointly (see Section Experimental Setup).

The proposed forecasts improve over the CAISO forecasts for each energy feature and net demand at the nodal and system levels (see results in Supplementary Tables 1 and 2). The models with higher SSRMSE with independent energy features at the system level are Lasso-SLGPR (6.1%) and EN-SLGPR (16.8%) with a linear kernel (\({{\mathcal{K}}}_{L}\)) for electricity demand and solar generation, and EN-SLGPR (5.9%) with Matérn kernel and parameter ν = 2.5 (\({{\mathcal{K}}}_{{M}_{2.5}}\)) for wind generation (Fig. 2c, e). The models with higher SSRMSE with independent nodes are EN-BLR (19.4%) at NP15, OMP-BLR (16.7%) at SP15, and OMP-NLGPR with \({{\mathcal{K}}}_{L}\) (10.1%) at ZP26 (Supplementary Fig. 2d,f). EN-BLR (25.2%) and Lasso-NLGPR with \({{\mathcal{K}}}_{L}\) (24.8%) have the highest SSRMSE at the system level (CAISO) when assessing a model across nodes or energy features, respectively (Fig. 2f).

In the third step, we analyze the day-ahead net demand forecast residuals at the system level. The empirical Cumulative Distribution Function (eCDF) of absolute percentage residuals gives a sense of the probability of large forecasting errors (5833 test samples); see Fig. 2g. Day-ahead forecasts using persistence or climatology, which do not include information from NWF, have large errors. CAISO’s forecast significantly reduces the probability of errors greater than 25% in the net demand (0.1), while our SLGPR and NLGPR models reduce it even further (0.05). Furthermore, errors over 50% of net demand from the SLGPR and NLGPR are almost negligible when the CAISO forecast has a similar probability to the persistence and climatology (0.05). The residual eEDF indicates a potential bias in the CAISO forecast to overestimate net demand (Fig. 2h). When calculating the absolute residual statistics for each day hour (243 test samples), SLGPR and NLGPR have a low median at all hours, and the CAISO forecast has the lowest median from 6 pm to 9 pm (Supplementary Fig. 4). The absolute residual statistics calculated for the different net demand percentiles (5832 samples) and CAISO have the highest median in the 1st percentile and the lowest in the 10th percentile (Supplementary Fig. 5).

Model selection of a probabilistic day-ahead energy forecasts with independent energy features

A joint forecast is often more desirable because it captures dependencies between response variables. Multivariate proper scoring rules assess the advantage of a joint distribution between nodes (Supplementary Note 1). The proposed scoring rules are the Energy Score (ES), Variogram Score with p = 0.5 (VS0.5), and Interval Score (IS). Each score evaluates a different property of a probabilistic forecast, so assessing multiple scores is necessary to select a balanced model47.

The ES assesses how well the predictive scenarios \({\widehat{{\bf{y}}}}_{t}^{\star }=[{y}_{t,1}^{\star }\cdots {y}_{t,Z}^{\star }]\) from the predictive distribution \({\widehat{{\bf{y}}}}_{t}^{\star } \sim {\mathcal{N}}({\widehat{{\boldsymbol{\mu }}}}_{t}^{\star },{\widehat{{\mathbf{\Sigma }}}}_{t}^{\star }+\,{\rm{diag}}\,({{\boldsymbol{\sigma }}}_{{n}_{t}}^{2}))\) represent a predictive distribution. VS0.5 evaluates how well the scenarios preserve the time structure. IS evaluates the intervals derived from the predictive density \({\mathcal{N}}({\widehat{\mu }}_{t,z}^{\star },{\widehat{\sigma }}_{t,z}^{2\star })\) at different confidence levels (60%, 80%, 90%, 95%, and 97.5%). IS is not a multivariate score, so it is evaluated aggregated across nodes \({\widehat{y}}_{t}^{\star }={\sum }_{z}{\widehat{y}}_{t,z}^{\star }\), and only with variances \({\widehat{\sigma }}_{t,z}^{2\star }=\,{\rm{diag}}\,({\widehat{{\mathbf{\Sigma }}}}_{t}^{\star })+{\sigma }_{{n}_{t,z}}^{2}\) obtained from the predictive covariance \({\widehat{{\mathbf{\Sigma }}}}_{t}^{\star }+\,{\rm{diag}}\,({{\boldsymbol{\sigma }}}_{{n}_{t}}^{2})\).

The most suitable model for a probabilistic day-ahead electricity demand forecast is EN-BLR (see results in Supplementary Table 3). This model has the lowest ES (39.3) and IS (8,087) in the test (Fig. 3a and g), but Lasso-BLR has the lowest VS0.5 (2380); see Fig. 3d. We find that not all combinations produce an improvement in SSRMSE when expanding Fig. 2b to include all model combinations (Fig. 3j), but both Lasso-BLR (2.4%) and EN-BLR (4.2%) do. The computation time required to train both models is under 100 s, but EN-BLR generates 100 predictive scenarios faster (less than 1 s); see Fig. 3m. OMP-SLGPR with \({{\mathcal{K}}}_{L}\) achieved similar results with a much higher computational time of around 2000 s (train) and 10 s (test). EN has difficulties identifying the weather features correlated with an electricity demand but selected the discomfort index (Supplementary Note 3) and assigned higher weights around the Bay Area and the Central Valley for NP15 and around the greater Los Angeles for SP16 (Supplementary Fig. 6c–e).

Fig. 3: Model selection based on multivariate scoring rules.
Fig. 3: Model selection based on multivariate scoring rules.The alternative text for this image may have been generated using AI.
Full size image

Proper scoring rules are applied to evaluate the different day-ahead probability forecasting models of electricity demand (a, d, g), solar generation (b, e, h), and wind generation (c, f, i). The marker color represents Lasso, Orthogonal Matching Pursuit (OMP), Elastic Net (EN), and Group Lasso (GL). The marker shape denotes Bayesian Linear Regression (BLR), Relevance Vector Machine (RVM), Gaussian Process for Regression (GPR), and System-Level GPR (SLGPR)---Multi-Task GPR forecasts a single energy feature but jointly across multiple nodes. The scoring rules are Energy Score (ES) for the electricity demand (a), solar generation (b), and wind generation (c); Variogram Score (VS0.5) for the electricity demand (d), solar generation (e), and wind generation (f); and Interval Score (IS) for the electricity demand (g), solar generation (h), and wind generation (i). Lower ES, VS0.5, and IS is preferable. The enlarged points in (ai) have the lowest values, identifying the best performing model based on that scoring rule. jl The Skill Score RMSE-based (SSRMSE) compares the proposed forecast with the forecast provided by CAISO (lower is preferable). Training and testing computing time (lower time is preferable) for the proposed day-ahead forecasting models for the electricity demand (m), solar (n), and wind generation (o).

Following the same model selection procedure, GL-GPR with \({{\mathcal{K}}}_{L}\) is the most suitable model for a probabilistic day-ahead solar generation forecast. GL-GPR with \({{\mathcal{K}}}_{L}\) has the lowest ES (32.2) and the highest SSRMSE (13.7%) in Fig. 3b, k Lasso-SLGPR with \({{\mathcal{K}}}_{L}\) has the lowest VS0.5 (2,118), and OMP-GPR with \({{\mathcal{K}}}_{L}\) kernel has the lowest IS (4628); see Fig. 3e, h. OMP models consistently have low computation costs (Fig. 3n), and SLGPR models have very low VS0.5 but very high IS (Fig. 3e, h). BLR and GPR models achieved similar VS0.5, but the OMP-GPR with \({{\mathcal{K}}}_{L}\) had the lowest IS and similar ES (32.6) at lower computation costs (less than 1000 s in training), and it achieved similar SSRMSE (12.5%) than GL-GPR (12.8%). GL selects features across California (Supplementary Fig. 6f–h), but features are consistent and have more weight in regions with installed solar capacity.

The model selected for a probabilistic day-ahead wind generation forecast is EN-SLGPR with a Matérn kernel and parameter ν = 1.5 (\({{\mathcal{K}}}_{{M}_{1.5}}\)). EN-SLGPR with \({{\mathcal{K}}}_{{M}_{1.5}}\) has the lowest ES (30.7) and the highest SSRMSE (5.9%); see Fig. 3c, l. GL-GPR with rational quadratic kernel (\({{\mathcal{K}}}_{RQ}\)) has the lowest VS0.5 (2135) and a SSRMSE of 4.5%; most all GPR and SLGPR models achieved similar results (Fig. 3f and l). GL-GPR also has the lowest IS (4,716), BLR and ARM achieved a similar score (Fig. 3i), but their SSRMSE is negative for most models (Fig. 3l). EN-SLGPR and GL-GPR have similar computing performances in the test ( < 10 s), but their training times are different (around 3000 s and 6000 s), which makes EN-SLGPR with \({{\mathcal{K}}}_{{M}_{1.5}}\) the most suitable model (Fig. 3o). EN selected features from two regions with high installed wind capacity (Supplementary Fig. 6i, j).

Predictive density function, intervals, and scenario generation

To assess the performance of our forecast under stress events, we evaluated our proposed forecast on Sep 2022 (Supplementary Fig. 7). Sep 2022 registered a record high mean temperature in western North America49, producing several consecutive days of high demand and peak demand50. We show the forecasts of the proposed probabilistic day-ahead forecast in southern California (SDG&E and SP15) in Fig. 4, and northern and central California (PG&E, SCE, NP15, and ZP26) in Fig. 4.

Fig. 4: Independent electricity demand, solar and wind generation day-ahead probabilistic forecasts.
Fig. 4: Independent electricity demand, solar and wind generation day-ahead probabilistic forecasts.The alternative text for this image may have been generated using AI.
Full size image

Probabilistic day-ahead SDG& E electricity demand forecast during the high peak demand event on Sep 6, 2022, predictive intervals (a), predictive scenarios (b), and density function detail at 17:00 (c). Probabilistic day-ahead solar generation forecast at SP15 in Sep 6, 2022, predictive intervals (d), predictive scenarios (e), and density function detail at 17:00 (f). Probabilistic day-ahead wind generation forecast at SP15 on Sep 6, 2022, predictive intervals (d), predictive scenarios (e), and density function detail at 17:00 (f). a, d, g Probabilistic forecast predictive mean μ compared to the actual y, and the baselines \({\widehat{{\bf{y}}}}^{\star }\) (persistence, climatology, and CAISO). The represented predictive intervals (60%, 80%, 90%, 95%, and 97.5%) have a color gradient from dark (60%) to light gray (97.5%). c, f, i dashed lines mark detailed hours in (b, e, h). The color gradient in the predictive scenarios represents the probability \(\,\Pr ({\mathop{y}\limits^{ \sim }}_{s}^{\star }| {\mathcal{D}},\widehat{\Theta },{\widehat{{\bf{X}}}}^{\star })\)ftive scenarios represents the probabilit of the sth scenario (\({\widehat{y}}_{s}^{\star }\)) of electricity demand (\({\mathcal{L}}\)), solar (\({\mathcal{S}}\)) or wind (\({\mathcal{W}}\)) generation (darker means higher probability).

The probabilistic day-ahead electricity demand forecast, EN-BLR, estimated SDG&E peak demand in a high peak event on Sep 6, 2022 (Fig. 4a) with less than 5% error. The persistence forecast overestimated demand during off-peak hours and underestimated demand during peak hours (from 4 pm to 9 pm), while the climatology forecast produced a result similar to the proposed forecast. The CAISO forecast had high-magnitude errors (16%) in the morning but accurately predicted peak demand and time (9 pm). Similarly, the CAISO forecast matches the predictive mean from our forecast but is more accurate at the peak demand hour for the PG&E and SCE (Fig. 4a, d). Our solar generation forecast (GL-GPR with \({{\mathcal{K}}}_{L}\)) at SP15 had a lower error than the persistence and CAISO forecast during off-peak hours when the CAISO forecast overestimated the generation and the persistence underestimated it by 10% (Fig. 4d). In contrast, the climatology forecast produced a result similar to the proposed forecast mean. The persistence, climatology, and CAISO forecast overestimate solar generation by 10%, every hour, at NP15 and ZP16 (Supplementary Fig. 8g, j). The wind generation forecast (EN-SLGPR with \({{\mathcal{K}}}_{{M}_{1.5}}\)) had low-magnitude errors (200 MW) until the evening when the errors reached 600 MW (55%). The persistence forecast produced low errors when the wind generation was negligible (morning and afternoon). The climatology forecast had high errors in the morning and evening. The CAISO forecast was similar to the mean of our forecast. Our forecast at NP15 failed by 50% during peak demand (5 pm) and by 17% during the evening (from 7 pm to 11 pm); see Supplementary Fig. 8m. Similarly, the persistence forecast had low errors, the CAISO forecast had large errors in the morning and evening (50%), and the climatology forecast had large errors.

The probabilistic day-ahead electricity demand forecast (EN-BLR) predicted the peak demand within the 80% interval in the same high peak event for PG&E, SCE, SDG&E (Supplementary Fig. 8a, d, and Fig. 4a). The actuals were outside the 97.5% interval at 10 pm and 11 pm for SDG&E, during off-peak hours for PG&E, and at 9 am and 10 am for SCE. The electricity demand scenarios generated from the predictive density functions capture the time structure (Fig. 4b) and correctly represent the density function at the time of the peak for SDG&E and SCE (Fig. 4c). The actual exceeded the most extreme scenario at night and off-peak hours for PG&E when the forecast error was high (17%); see (Supplementary Fig. 8b).

The uncertainty of the probabilistic day-ahead solar generation forecast (GL-GPR with \({{\mathcal{K}}}_{L}\)) at the NP15 (Fig. 4e), SP15 (Supplementary Fig. 8g), and ZP26 (Supplementary Fig. 8j) did not change significantly during off-peak hours despite providing an accurate forecast. In contrast, the forecast does not correctly predict the actual at NP15 from 2 pm to 4 pm but is within the 90% −97.5% interval. The distribution of predictive scenarios approximates the density function at SP15 (Fig. 4e and f) and ZP26 (Supplementary Fig. 8k), and the most extreme scenarios include the actual, even at the time of highest magnitude error (4 pm) at the NP15 (Supplementary Fig. 8h).

The probabilistic day-ahead wind generation forecast (EN-SLGPR with \({{\mathcal{K}}}_{{M}_{1.5}}\)) fails from 9 pm with high errors (500 MW), though the actual fell within the 90% interval. The predictive intervals are adaptable but are still wide when the model produces accurate forecasts during the morning (Fig. 4g). The predictive scenarios do not represent the entire range of the density function at the peak demand, and no scenario covers the actual generation at 10 pm (Fig. 4h, i). In contrast, the scenarios in our forecast at NP15 correctly represent the density function, and the scenarios enveloped the actuals (Supplementary Fig. 8m).

Joint probabilistic day-ahead energy forecast

Joint forecasts can reduce uncertainty and assist ISOs in operating power grids more efficiently by characterizing the dependencies between variables. A Multi-Task Gaussian Process for Regression (MTGPR) model captures the underlying correlation between multiple response variables to generate a joint forecast. However, estimating the joint predictive covariance between response variables with an MTGPR is computationally challenging. The confidence intervals derived from the predictive covariance may not accurately reflect the true distribution of the forecasting errors—i.e., the probability that actual realizations fall outside the upper and lower bounds of the intervals can exceed the stated confidence level (see Section Predictive Density Calibration). The formulation proposed in ref.45 accurately estimates the joint predictive covariance when the response variables belong to the same energy domain (e.g., electricity demand). However, this limitation persists when the response variables come from different energy domains. To address this, we adopt an approach based on conformal learning51,52. In this approach, the dataset is split into training and calibration sets during cross-validation to properly calibrate the confidence intervals in the joint forecast (see Section Experimental Setup).

We assess the joint forecast performance during an stress event when the CAISO forecast had high-magnitude errors on May 2022. The largest curtailment occurred on May 29, as an unusual storm crossed the western United States53, coincidental with an high CAISO forecast error event (Fig. 5), which produced large VRE curtailments54. Similarly, we evaluate the models with the Energy Score (ES), Variogram Score (VS0.5), and Interval Score (IS). ES and VS0.5 assess distribution and shape of multivariate predictive scenarios \(({\widehat{{\bf{y}}}}_{t,z}^{\star })\). IS measures how many samples are outside at different confidence intervals (60%, 80%, 90%, 95%, and 97.5%) and for how much. The confidence intervals are derived from the predictive net demand distribution \({\mathcal{N}}({\widehat{\mu }}_{t}^{{\mathcal{N}}\star },{\widehat{\sigma }}_{t}^{2{\mathcal{N}}\star })\) at the system level. The predictive mean of net demand is given by \({\widehat{\mu }}_{t,z}^{{\mathcal{N}}\star }={\sum }_{z}{\widehat{\mu }}_{t,z}^{{\mathcal{L}}\star }-{\widehat{\mu }}_{t,z}^{{\mathcal{S}}\star }-{\widehat{\mu }}_{t,z}^{{\mathcal{W}}\star }\) and the variance by \({\widehat{\sigma }}_{t}^{2{\mathcal{N}}\star }={\sum }_{z=1}^{Z}{\widehat{\sigma }}_{t,z}^{2{\mathcal{L}}\star }+{\widehat{\sigma }}_{t,z}^{2{\mathcal{S}}\star }+{\widehat{\sigma }}_{t,z}^{2{\mathcal{W}}\star }\). In the case of a joint forecast (SLGPR or NLGPR), the variances are derived from the predictive covariance \({\widehat{\sigma }}_{t,z}^{2\star }=\,{\rm{diag}}\,({\widehat{{\mathbf{\Sigma }}}}_{t,z}^{\star })+{\sigma }_{{n}_{t,z}}^{2}\) for each energy feature (\({\mathcal{L}}\), \({\mathcal{S}}\), and \({\mathcal{W}}\)).

Fig. 5: Joint day-ahead energy forecast model selection.
Fig. 5: Joint day-ahead energy forecast model selection.The alternative text for this image may have been generated using AI.
Full size image

Scores achieved by the proposed forecasting models when evaluated at the system level: a Energy Score (ES), b Variogram Score (VS0.5), c Interval Score (IS), d Skill Score (SSRMSE) compared to CAISO's forecast, and e computational time. Lower ES, VS0.5, IS, computational time, and higher SSRMSE are preferable. The different colors represent the sparse learning: Lasso, Orthogonal Matching Pursuit (OMP), Elastic Net (EN), and Group Lasso (GL). The shapes represent the Bayesian learning: Bayesian Linear Regression (BLR), Relevance Vector Machine (RVM), Gaussian Process for Regression (GPR), System-Level GPR (SLGPR), and Node-Level GPR (NLGPR). e Lasso-NLGPR and EN-NLGPR have the same computational requirements (hexagons overlap). Lasso-NLGPR forecast during an large CAISO forecast error at NP15 on May 28, 2022. f, i, l Joint forecast predictive mean μ, actual y and the baseline forecasts \({\widehat{{\bf{y}}}}^{\star }\) (persistence, climatology, and CAISO) of PG& E electricity demand (f), solar (i), and wind generation (l) at NP15. The lines are joint scenarios drawn from the predictive density function (the gray color intensity represents their probability). The highlighted scenario is a joint draw of electricity demand (g), solar generation (j), and wind generation (m). Density function details (h, k, n) are of the hours (marked by vertical dashed lines) with the largest error between the predictive mean \({\mu }_{t}^{\star }\) and the actual \({y}_{t}^{\star }\).

Lasso-NLGPR with a linear kernel \({{\mathcal{K}}}_{L}\) in NP15, SP15, and ZP26 (\({{\mathcal{K}}}_{L}\)) had the lowest ES in test (85.3), Lasso-SLGPR with \({{\mathcal{K}}}_{L,L,{M}_{1.5}}\) has the lowest VS0.5 (21,528) and EN-BLR the lowest IS (14,009); see Fig. 5a-c, and Supplementary Table 4. The model with higher SSRMSE is EN-BLR (25.2%), but Lasso-NLGPR has similar score (24.8%); see Fig. 5d. In contrast, SLGPR models have high SSRMSE but low IS compared to BLR and NLGPR. Lasso-SLGPR and OMP-SLGPR require 5 × more training and testing time than EN-BLR (Fig. 5c). However, the testing time is still less than 60 s. The training time increases from 30 min to greater than 2 h, but the proposed model only requires training updates once every half or one year.

The joint predictive electricity demand, solar, and wind scenarios include the actual demand realization (Fig. 5g) but do not include scenarios with the actual solar generation at 12 pm and 2 pm (Fig. 5j), and the actual wind generation at 9 pm and 11 pm (Fig. 5m); which indicates that 100 predictive scenarios are not sufficient to represent the full range of possible outcomes. Since, during the event of the largest errors between the predictive mean and the actual for the electricity demand (7 pm), solar (2 pm), and wind forecasts (8 pm), the actual electricity demand was within the 90% predictive interval (Fig. 5h), the solar generation was outside the 97.5% predictive interval (Fig. 5k) and the wind generation was within the 80% predictive interval (Fig. 5n). The joint predictive scenario (bright green) represents a mid-low electricity demand (Fig. 5g) with mid-low solar generation from 10 am to 2 pm (Fig. 5j) and mid-low electricity demand with mid-high wind generation (Fig. 5j) from 1 am to 10 am and from 8 pm to midnight. These results are for node NP15, but the findings in node SP15 are similar under high-magnitude forecasting errors (Supplementary Fig. 9).

Probabilistic day-ahead energy forecast for reserves allocation

Independent system operators (ISOs), including CAISO, procure ancillary services or operating reserves to address day-ahead electricity demand and wind and solar generation forecasting errors12,55, as well as to prepare for contingency events such as loss of a generator or transmission line56; see the estimation of different ancillary services in Section Energy Imbalance. As the growing share of VRE generation adds to the uncertainty introduced by electricity demand, a net demand forecast has become more informative in quantifying reserves requirements9. When positive net demand forecast errors (generation overestimation or demand underestimation) exceed upward reserves57, ISOs import energy from neighboring interconnected regions or shed demand in severe cases. When negative net demand forecast errors (generation underestimation or demand overestimation) exceed downward reserves, ISOs export electricity to neighboring regions or curtail VRE generation. In California, the CAISO imports from and exports to the Western Energy Imbalance Market12.

To reduce the risk of forecasting errors, we use the confidence intervals for the net demand forecast that adapt depending on the similarity of the input features to past patterns22. This section’s experiment validates the hypothesis that using confidence intervals to determine operating reserves will decrease imbalance market trades. The proposed methodology utilizes the predictive density function from the day-ahead net demand forecast to find the confidence interval in which the total aggregated capacity equals the total aggregated capacity allocated following CAISO’s methodology as in the Section Energy Imbalance. The area between the selected confidence interval’s lower and upper bounds indicates the reserve requirements. This methodology does not determine the aggregated reserves requirement but simply redistributes the aggregated reserves determined by CAISO across time. As such, this methodology is applied only to illustrate the value of using confidence intervals from probabilistic forecasts to determine operating reserves.

The idea is that a day-ahead net demand forecasting model with accurate confidence intervals will allocate the reserves following the proposed methodology more efficiently than CAISO’s methodology. Therefore, lower IS in validation is our model selection criterion, since IS is a proper scoring rule that rewards calibrated confidence intervals. We calculate the IS for confidence intervals of 60%, 80%, 90%, 95%, and 97.5% and aggregate to select a model for the application to determine operational reserves. The probabilistic day-ahead net demand forecast with the lowest IS at the system level in the test set is OMP-NLGPR with \({{\mathcal{K}}}_{L}\) (Fig. 6a).

Fig. 6: Day-Ahead operational reserves allocation.
Fig. 6: Day-Ahead operational reserves allocation.The alternative text for this image may have been generated using AI.
Full size image

a Interval Score (IS) from the proposed independent and joint probabilistic day-ahead forecasts calculated for the 60%, 80%, 90%, 95%, and 97.5% confidence intervals (lower values are better). The white dots indicated the best model. The independent and joint forecasts are aggregated at the system level. The independent models are Elastic Net and Bayesian Linear Regression (EN-BLR), Orthogonal Matching Pursuit (OMP) and Relevance Vector Machine (OMP-RVM), and OMP and Gaussian Process for Regression (GPR). The joint models are OMP and System-Level GPR (SLGPR), as well as OMP and Node-Level GPR (NLGPR). b Actual net demand (y) and net demand forecasts for OMP-NLGPR (μ) and CAISO on Sep 22, 2022. Net demand forecast reserves' upper bound (ub) and lower bound (lb) shown by lower solid and upper dashed lines and reserves (rs) shown by shaded areas between solid and dashed lines---the reserves' upper and lower bounds are the confidence interval lower and upper bounds for OMP-NLGPR (μ). c Upward + downward reserves allocation for CAISO and OMP-NLGPR (μ) forecasts. d Energy imports (im) (and demand shedding if energy is not available in the regional energy imbalance market) and exports (and wind or solar curtailment, ct, if demand is lacking in the energy imbalance market) resulting from the CAISO and the proposed allocation methodology using OMP-NLGPR (μ) forecast (positive means energy imports required, while negative means energy exports available). e Average imports and energy curtailment obtained from CAISO's forecast and the proposed ensemble of probabilistic day-ahead net energy forecasts across the simulated period from May 2022 to Feb 2023 (test samples). White dots represent the net balance between imports, exports, and energy curtailment.

We illustrate the reserves allocation results for Sep 23, 2022, a day in our sample when both CAISO and our method have imbalances. The reserves allocation implemented with CAISO’s forecast and method allocates a similar capacity across all day hours. However, the capacity slightly increases during the peak hour and decreases during midday and night hours (Fig. 6c). In contrast, our method of estimating operational reserves based on confidence intervals allocates more reserves during daylight hours (Fig. 6c), displacing reserves from the night to daylight hours. Although mid-day hours experience lower net demand, the uncertainty in the day-ahead solar generation forecast is compared to evening peak or non-solar hours, which drives the higher reserve requirement (Supplementary Fig. 7).

For all days in the test set (243), we evaluate whether the net demand is above the maximum upward reserves where the reserves respond to avoid exports (and VRE curtailment) or imports (and demand shedding) in the imbalance market; see equation (27). For the featured day in (Fig. 6b), using CAISO’s reserves allocation methodology, the net demand positive forecast error is greater than the maximum upward reserves available from 8 am to 2 pm, which required energy imports (Fig. 6d). With our proposed reserves allocation methodology, for the same day, the net demand negative forecast error is greater than the maximum downward reserves from 2 am to 6 am, which results in wind generation curtailment (Fig. 6d).

At an aggregate level across all days in test set, the reserve capacity estimated using the CAISO’s forecast and reserves allocation method did not produce exports (or VRE curtailment) but required 5 GWh of energy imports per day on average (Fig. 6e). In contrast, the proposed joint forecasting model at the nodal level OMP-NLGPR with \({{\mathcal{K}}}_{L}\), results in exports of 260 MWh and imports of 80 MWh per day on average. The total imbalance, calculated as daily exports (energy curtailment) plus imports (demand shedding), is 340 MWh per day on average. We compare the joint and independent forecasts with the lower IS to verify that a joint energy forecast produces fewer energy imbalances. The forecasting model with independent energy features that achieved the lowest IS, evaluated on the days in the test set, is EN-BLR (Supplementary Fig. 10a). This model produces more imbalances (410 MWh per day) than the joint energy features (OMP-NLGPR) model. On average, it has fewer exports (110 MWh) but more imports (300 MWh per day).

The probabilistic day-ahead net demand forecast with the lowest total imbalances is an ensemble model formed by the joint energy forecasting models with the lowest IS in each node: OMP-NLGPR with \({{\mathcal{K}}}_{L}\) at NP15 (Supplementary Fig. 10i), Lasso-NLGPR with \({{\mathcal{K}}}_{L}\) at SP15 (Supplementary Fig. 10j), GL-NLGPR with \({{\mathcal{K}}}_{L}\) at ZP26 (Supplementary Fig. 10k). The daily average exports (VRE curtailment) are 160 MWh, and imports (demand shedding) are 180 MWh, producing the lowest imbalance (340 MWh). The independent ensemble day-ahead net demand forecast with the lowest IS for each energy feature has EN-BLR for electricity demand (Supplementary Fig. 10f), EN-BLR for solar (Supplementary Fig. 10g), and EN-GPR with \({{\mathcal{K}}}_{L}\) for wind generation (Supplementary Fig. 10h). The average daily energy exports (or VRE curtailment) are 120 MWh, and imports are 230 MWh, while the average imbalance is 350 MWh. This result demonstrates that the IS is an effective indicator for selecting a reserve allocation method based on confidence intervals. Ensemble probabilistic day-ahead energy forecasts lead to less total imbalance in power grid operation in line with the No Free Lunch theorem58.

Additionally, jointly forecasting energy features minimizes total imbalance and reduces the bias, further enhancing the efficiency of the reserve allocation method and potentially minimizing the operating costs, while not affecting or even improving the effectiveness of the net demand forecast for coordinating the wholesale electricity markets and, overall, reducing the risk of large forecasting errors.

Discussion

Our results, using the case study of the California Independent System Operator (CAISO), show that a probabilistic day-ahead forecast for wind, solar, and electricity demand performs better than deterministic forecasts (Fig. 2) with the additional benefit of having a predictive density distribution to generate predictive scenarios and derive intervals. Consequently, probabilistic day-ahead energy forecasts based on Bayesian learning are more suitable and versatile for applications in power systems, including resource assessment, stochastic operational planning, and operational reserve allocation (Fig. 6).

We analyze our results to isolate the impacts on the performance of sparse learning, the joint distribution, and the kernel function. We first assess the sparse learning impact on the performance by adjusting the hyperparameters to achieve different sparsity levels (ranging from 10 to 1000). We find that the best models consistently include features in the magnitude order of 100s for electric load and solar and 10s for wind (Supplementary Fig. 6). However, sparse learning is always necessary to reduce the computational complexity of the problem. The Relevance Vector Machine (RVM) is an indicator of when the sparse model is unnecessary. The RVM underperforms because it cannot identify the most relevant features (Fig. 2b, e).

We then validate whether modeling the joint distribution across nodes or features improves performance. The performance increased when incorporating the joint distribution (across energy features at the nodal level) into the Gaussian Process for Regression (NLGPR) for electricity demand with Lasso, and solar generation with Elastic Net (EN), compared to independent models (Fig. 2b, c). We can also confirm this in ZP26, which does not have wind generators (Fig. 2e, f). Additionally, we assess the joint distribution (across nodes for a single energy feature) at the system level (SLGPR) when each feature has a kernel, and find that it provides an advantage for wind generation (Fig. 5l).

Finally, we can isolate the effect of the kernel function on the performance. The performance of wind generation declines because electricity load and solar generation favor a linear kernel (\({{\mathcal{K}}}_{L}\)), which negatively impacts the performance when evaluating net demand at the nodal level (Fig. 2f). Wind generation requires an EN-NLGPR with rational quadratic (\({{\mathcal{K}}}_{RQ}\)) or Matérn (\({{\mathcal{K}}}_{{M}_{1.5}}\)) kernel (Supplementary Tables 1–4), indicating the existence of non-linear relations between the input features and wind generation. Multiple kernel learning exploits non-linear relations by explicitly defining a transformation for each feature source59, but the current MTGPR library does not support multiple kernel functions45.

As a consequence, Elastic Net with Bayesian Linear Regression (EN-BLR) performs better overall. Still, when we consider ensembles to take advantage of the joint forecasts and evaluate them with proper scoring rules, we observe that joint forecasts are advantageous in power system applications, such as reserve allocations (Fig. 6e).

According to previous research, the most suitable model for a probabilistic day-ahead electricity demand forecast was the RVM60. In contrast, our research shows that Bayesian methods based on kernel learning improve the performance of an RVM by exploiting underlying non-linear relations in the input feature space to produce a joint forecast across features and nodes (Fig. 5). A previous non-linear probabilistic forecast of wind generation proposed a Radial Basis Function kernel (\({{\mathcal{K}}}_{RBF}\)). Instead, after cross-validating different kernels (including \({{\mathcal{K}}}_{RBF}\)), we found that the Matérn kernel (\({{\mathcal{K}}}_{{M}_{1.5}}\)) considerably reduces the forecasting errors in our experiments61.

Other recent studies have introduced day-ahead forecasting methods that minimize the energy score62. Yet our findings suggest that selecting a model based solely on a score leads to poor performance in other scores (skill score and interval score) under certain conditions (Fig. 5). Bayesian learning models that optimize a purely probabilistic loss and provide probability calibration frameworks, in conjunction with examining the model performance across multiple proper scoring rules, are a more explainable and risk-averse approach to encourage industry adoption. Furthermore, the best-performing models consistently selected sparse representations of the input space to enhance the effectiveness of a probabilistic day-ahead forecast (Supplementary Fig. 6), contradicting earlier approaches that relied on dense weather features from NWFs35.

For California, our findings indicate more uncertainty in electricity demand during daylight hours compared to night hours. This increase in the electricity demand uncertainty during daylight hours could likely be explained by social or behavioral patterns, rather than solely by weather features. Additional research is needed to understand which social patterns (e.g., traffic, rain, or electric vehicle charging patterns) significantly impact electricity demand forecasts and how to quantify these effectively. Contrarily, the confidence interval during the daytime or nighttime hours remains conservative even when baseline models and the proposed probabilistic forecast align with actual outcomes in solar (Fig. 4d) and wind generation (Fig. 4g), which reflects uncertainty introduced by the predicted weather features from the HRRR, congestion or generator outages.

CAISO’s approach for allocating operational reserves combines the deterministic day-ahead electricity demand forecast with historical forecasting errors55. In contrast, our approach utilizes the net demand predictive interval equivalent to the total operational reserves allocated following CAISO’s approach (Fig. 6b). The bias in CAISO’s forecast reduces curtailment but overestimates solar generation, producing substantial energy imports (Fig. 6e). This bias allows CAISO to effectively predict peak capacity and ramp-up steepness, which are critical during stress events on the power grid. However, during hours of high net demand, CAISO’s forecast performs worse than persistence and climatology, leading to high energy imports from the energy imbalance market (Fig. 2j). The proposed joint probabilistic day-ahead forecast has a negligible bias, making it more suitable for determining the reserves (Fig. 6e). Furthermore, optimizing the lower and upper predictive intervals could reduce energy curtailment.

The proposed dynamic reserves allocation serves as an example of how to utilize a probabilistic forecast in power system operations. Our imbalance market simulations do not consider real-time prices. In addition, the simulations do not enforce the minimum stable capacity constraint on generators during unit commitment. We also do not include the largest committed unit capacity and contracted imports and exports in CAISO’s method for reserves allocation, since this information was not available for comparison56.

We intentionally integrated the ability to generate predictive scenarios associated with probabilities in a day-ahead forecast to assess the Conditional Value at Risk (CVaR)—a risk measure to estimate the expected losses in the tails of a distribution—for dynamic reserves allocation and analyze the impact of spatial correlations on both demand and generation on grid congestion, considering congestion and committed units to account for energy imports and exports more accurately. However, this investigation will require methods and datasets beyond the scope of this study.

The results may vary when applied to other ISOs due to different regional demand patterns, generation mixes, and specific operational requirements (i.e., forecasting horizon and lag). However, the proposed joint day-ahead probabilistic forecasting method remains applicable and transferable with the appropriate datasets. Since our net demand forecast is based on a top-down approach that learns short-term patterns aggregated at the three main zones in CAISO, we implicitly model the bottom-up breakdown of disaggregated demand (e.g., residential, industrial, and weather-sensitive loads) and distributed generation. While this abstraction level may be insufficient for long-term disaggregated bottom-up demand forecasts, it is effective in our aggregated day-ahead net demand forecast application.

In summary, the proposed probabilistic day-ahead forecasts, based on Bayesian and kernel learning, improve the accuracy and effectively capture the uncertainty in electricity demand, wind, and solar generation. By jointly forecasting these energy features, the model enhances reserve allocation and generates electricity demand and supply scenarios for risk assessment in power grid planning. Ultimately, the proposed reserve allocation method has the potential to reduce energy imports and curtailment, promoting the integration of variable renewable energy sources and enabling a more efficient, sustainable, and resilient electricity system.

Methods

This section includes the description of the Data, the Processing and Filtering, the Theoretical Background, the Data Structure, and the Experimental Setup. The Theoretical Background is divided into Sparse Learning and Bayesian learning, where the different methods are explained. The sections about Scenario Smoothing and Predictive Density Calibration propose approaches to overcome limitations.

Data

The weather forecast is from the HRRR (Supplementary Note 5). It has a spatial resolution of 3 × 3 km and covers the Continental United States (CONUS) and Alaska. The NWF assimilates radar information every 15 min over 1 h to add further detail to the RAP weather forecast, which has a spatial resolution of 13 × 13 km. HRRR provides 192 weather variables, which include actuals (i.e. analyzed) and forecasts at different altitudes in the atmosphere. The forecast has a 48 h horizon and is provided 4 times a day (00, 06, 12, and 18 UTC). This investigation assesses the following weather variables obtained from the HRRR: atmospheric pressure, dew point, relative humidity, temperature, direct long-wave radiation flux, direct short-wave radiation flux, and wind velocity components at 10 m and 80 m.

The wind velocity magnitude is interpolated by applying the Power Law in supplementary equation 8 at 60 m, 100 m, and 120 m from the wind velocity at 10 m and 80 m63. The discomfort index is derived from the relative humidity and the temperature64, see supplementary equation 9. The Global Horizontal Irradiance (GHI) is estimated with a theoretical model65. The theoretical model requires the elevation and latitude of each point in the 3 × 3 km, in addition to the date (i.e., year, month, day, and hour). Ultimately, the model is capable of providing intra-hour estimation. The elevation information is from the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010)66, developed by the U.S. Geological Survey (USGS) and the National Geospatial-Intelligence Agency (NGA). The elevation estimation comprises data assimilated from multiple sources (i.e., radar and satellite information).

The aggregated electricity demand and solar and wind generation are from the Open Access Same-time Information System (OASIS) platform maintained by CAISO67. This platform provides real-time information about the transmission system and electricity market operation within the Western Interconnection68. OASIS publishes DA operational forecasts and actual realizations with a 1 h resolution for each training hub and utility in CAISO. Solar generation is available in NP15 (north), ZP26 (central), and SP15 (south) trading hubs. Wind generation is available in NP15 and SP15 trading hubs. Electricity demand forecasts and actuals are also available for MWD, PG&E (north-central), SCE (south-central), and SDG&E (south) utility companies. MWD and VEA are water companies and mainly operate hydroelectric pumps and storage systems, so we do not consider them in this investigation.

The solar and wind power plant locations and specifications are part of the California Energy Commission (CEC) critical infrastructure geospatial datasets69. The dataset contains information about the energy type and the total nameplate capacity of all plants with a nameplate capacity ≥1MW in California. It is updated annually from the CEC QFER-1304 Power Plant Owner Reporting Database. This investigation uses solar and wind plant locations in Supplementary Fig. 2b.

The information about the population density in California is from the Gridded Population of the World, Version 4 (GPWv4) dataset (see Supplementary Fig. 2b). GPWv4 provides high spatial resolution worldwide sub-national population density for multiple years in the number of persons per km2, with counts consistent with national census and population register70. This investigation utilizes data from the 2020 update.

Processing and filtering

The multiple spatial data sources (i.e., HRRR, GPWv4, and GMTED2010) were interpolated to have the same spatial resolution. The original resolution of HRRR is approximately \(1.{7}^{{\prime} }\times 1.{7}^{{\prime} }\), GPWv4 is \(2.{5}^{{\prime} }\times 2.{5}^{{\prime} }\), and GMTED2010 is 30 × 30. The final resolution is \({7}^{{\prime} }\times {7}^{{\prime} }\) equivalent to approximately 13 × 13 km grid. The method implemented was 2-dimensional nearest-neighbor interpolation. The result is a spatial forecast in a 104 × 88 grid (N × M) per weather feature.

The HRRR provides reanalyzed observations and weather forecasts. We downloaded the reanalyzed weather features that match the energy feature time series: electricity demand (\({\mathcal{L}}\)), solar (\({\mathcal{S}}\)), and wind generation (\({\mathcal{W}}\)). We call them the reanalysis dataset (\({\mathcal{A}}\)). Similarly, we downloaded the matching HRRR forecast from the previous day at 00 UTC (4 pm PTZ) for the same energy feature time series, and we call it the forecast dataset (\({\mathcal{F}}\)). We define a sample in the time series k (1, ] from the reanalysis or the forecast datasets as having a 1-day resolution and containing information for each hour t in a day T = 24. In particular, the sample k for hour t is \({{\bf{X}}}_{k,t}\in {{\mathbb{R}}}^{N\times M}\) when is from the reanalysis dataset (\({\mathcal{A}}\)), and \({\widehat{{\bf{X}}}}_{k,t}\in {{\mathbb{R}}}^{N\times M}\) when is from the forecast dataset (\({\mathcal{F}}\)).

The second part of the processing is to remove spatial dimensions from the HRRR forecast that do not contain population, or solar or wind power plants in proximity. However, the spatial distribution of errors in the weather forecast is unknown; we include information from nearby regions to the plants and let the sparse learning method select the most suitable features. We calculate the population density to define the electricity demand mask \({\psi }^{{\mathcal{L}}}\), the solar capacity density for solar mask \({\psi }^{{\mathcal{S}}}\), and the wind generation density for wind mask \({\psi }^{{\mathcal{W}}}\). The solar and wind density is derived from the power plant locations in Supplementary Fig. 2b. We apply the masks to the low-resolution weather features from the HRRR forecast. A detailed explanation of the spatial filtering steps is in Supplementary Note 4.

Theoretical background

The canonical formulation of a multivariate regression problem, having a set of N observations \({\mathcal{D}}=\{({y}_{i},{{\bf{x}}}_{i})\}\), is

$${y}_{i}={{\bf{w}}}^{\top }{{\bf{x}}}_{i}+{\varepsilon }_{i},$$
(1)

where the response variable is a scalar \({y}_{i}\in {\mathbb{R}}\), and the covariates are feature vectors \({{\bf{x}}}_{i}\in {{\mathbb{R}}}^{D}\). The error term \({\varepsilon }_{i} \sim {\mathcal{N}}\left(0,{\sigma }_{n}^{2}\right)\) is assumed a i.i.d. random variable, and \({\mathcal{N}}(\cdot )\) is a Normal distribution.

Sparse learning

The objective of the sparse learning is to discover which weather variables are more informative about the electricity demand (\({y}_{k,t,z}^{{\mathcal{L}}}\)) and solar (\({y}_{k,t,z}^{{\mathcal{S}}}\)) and wind (\({y}_{k,t,z}^{{\mathcal{W}}}\)) generation time series. Note that k = 1, …, K represents the day, and t = 1, …, T is the hour. Sparse learning models aim to find a solution to equation (1) in which only fractions of the model parameters w are non-zero. This part is the first stage of the workflow (Supplementary Fig. 2a) and utilizes reanalyzed weather features from the HRRR, \({\mathcal{A}}=\{({y}_{{j},{z}},{{\bf{x}}}_{j}) | \forall j=1,\ldots,K \cdot T,\ \forall z,=1,\ldots,Z\}\), where \({{\bf{x}}}_{j}\in {{\mathbb{R}}}^{{D}_{1}\times 1}\) (as found in the Feature Vectors for Sparse Learning) and z represent each node (NP15, SP16, and ZP26).

Lasso

This model was introduced for geophysical problems with high-dimensional data to perform variable selection (i.e., bandwidths) while providing interpretability71. The objective is to reduce the complexity of a linear regression model by selecting a reduced number of covariates. The Lagrangian formulation of Lasso is,

$${\widehat{{\bf{w}}}}_{z}=\mathop{{\rm{argmin}}}\limits_{{{\bf{w}}}_{z}} \, \left\Vert {{\bf{y}}}_{z}-{{\bf{w}}}_{z}^{\top }{\bf{X}} \right\Vert_{2}^{2}+\lambda \parallel {{\bf{w}}}_{z}{\parallel }_{1},$$
(2)

where 1 and 2 represent the L1-norm and L2-norm, respectively, and λ is the regularization term72.

Orthogonal matching pursuit (OMP)

It is the orthogonal version of Matching Pursuit73. The primal formulation of OMP is similar to Lasso. The difference is that OMP implements the L0-norm of the parameters wz instead of the L1-norm74,

$${\widehat{{\bf{w}}}}_{z}=\mathop{{\rm{argmin}}}\limits_{{{\bf{w}}}_{z}} \, \left\Vert {{\bf{y}}}_{z}-{{\bf{w}}}_{z}^{\top }{\bf{X}} \right\Vert_{2}^{2},\,\,{\rm{s.t.}}\,\,\parallel {{\bf{w}}}_{z}{\parallel }_{0}\le \beta .$$
(3)

The hyperparameter β represents the maximum number of non-zero elements in the model, and 0 is the L0-norm.

Elastic net

This model adds a quadratic regularization term (L2-norm) to the Lasso formulation to overcome the potential saturation (selecting many variables) or group (selecting a unique variable in a group) selection problems75. The Elastic Net formulation is

$${\widehat{{\bf{w}}}}_{z}=\mathop{{\rm{argmin}}}\limits_{{{\bf{w}}}_{z}} \, \left\Vert {{\bf{y}}}_{z}-{{\bf{w}}}_{z}^{\top }{\bf{X}} \right\Vert_{2}^{2}+{\Omega }_{1}\parallel {{\bf{w}}}_{z}{\parallel }_{1}+{\Omega }_{2}\parallel {{\bf{w}}}_{z}{\parallel }_{2}^{2},$$
(4)

the hyperparameters Ω1 and Ω2 weight the regularization terms. If Ω1 = 0 or Ω2 = 0, the model is equivalent to Ridge Regression or the Lasso, respectively.

Group Lasso

This model is an extension of the Lasso with grouped covariates76. We apply the Lasso regularization (L1-norm) to all model coefficients, and the group regularization (L1-norm) to the coefficients grouped by weather features in a location (i.e., coordinate pairs). The L2-norm group regularization is not squared, which makes the penalty non-differentiable at zero, enabling the group variable selection. Its optimization problem is

$${\hat{{\bf{w}}}}_{z}=\mathop{{\rm{a}}{\rm{r}}{\rm{g}}{\rm{m}}{\rm{i}}{\rm{n}}}\limits_{{{\bf{w}}}_{z}}{\left\Vert {{\bf{y}}}_{z}-\left(\mathop{\sum }\limits_{c\in {\mathcal{C}}}{{\bf{w}}}_{z,c}^{{{\top }}}{{\bf{X}}}_{c}\right)\right\Vert }_{2}^{2}+{\xi }_{1}\Vert {{\bf{w}}}_{z}{\Vert }_{1}+{\xi }_{2}\mathop{\sum }\limits_{c\in {\mathcal{C}}}{\sqrt{d}}_{c}\Vert {{\bf{w}}}_{z,c}{\Vert }_{2},$$
(5)

where the covariance in each group c are \({{\bf{X}}}_{c}\in {{\mathbb{R}}}^{M\times {d}_{c}}\), and \({{\bf{w}}}_{z,c}\in {{\mathbb{R}}}^{{d}_{c}\times 1}\) are the model parameters for group c, dc is the number of dimensions in group c, and \({\mathcal{C}}\) is the total number of groups. ξ1 is the Lasso regularization and ξ2 is the group regularization hyperparameter. If ξ2 = 0 the model becomes equivalent to the Lasso. The optimal coefficients are found with the fast iterative shrinkage-thresholding algorithm77.

Bayesian learning

Bayesian learning models estimate the uncertainty in the prediction produced by the intrinsic epistemic uncertainty in the parameters and aleatory uncertainty in the observations (i.e., noise). This part is the second stage of the workflow (Supplementary Fig. 2a) and utilizes the forecasted weather features from the HRRR, \({\mathcal{F}}=\left\{({y}_{k,t,z},{\widehat{{\bf{x}}}}_{k,t})| \forall k=1,\ldots,K,\,\forall t=1,\ldots,T,\,\forall z=1,\ldots,Z\right\}\), where \({\widehat{{\bf{x}}}}_{k,t}\in {{\mathbb{R}}}^{{D}_{2}\times 1}\) (as found in the Pattern Vectors for Bayesian Learning). Index z corresponding to the node number is omitted from the nomenclature for simplicity.

Bayesian linear regression (BLR)

The objective in this model is to find the parameters w that maximize the posterior probability72,

$$p\left({\bf{w}}| \widehat{{\bf{X}}},{\bf{y}},{\sigma }_{n}^{2}\right)\propto p\left({\bf{y}}| \widehat{{\bf{X}}},{\bf{w}},{\sigma }_{n}^{2}\right)p\left({\bf{w}}| {{\mathbf{\Sigma }}}_{p}\right).$$
(6)

The distribution of the response variable is \(p({\bf{y}}| \widehat{{\bf{X}}},{\bf{w}},{\sigma }_{n}^{2}) \sim {\mathcal{N}}({{\bf{w}}}^{\top }\widehat{{\bf{X}}},{\sigma }_{n}^{2})\), and the prior distribution of the model parameters is \(p({\bf{w}}| {{\mathbf{\Sigma }}}_{p}) \sim {\mathcal{N}}({\bf{0}},{{\mathbf{\Sigma }}}_{p})\), where \({{\mathbf{\Sigma }}}_{p}={\sigma }_{p}^{2}{{\bf{I}}}_{{D}_{2}\times {D}_{2}}\). In addition, it is possible to regularize the model by adding a conjugate prior of the model hyperparameters \(\widehat{\theta }=\{{\sigma }_{n}^{2},{\sigma }_{p}^{2}\}\), so that \(p\left({\sigma }_{n}^{2}| {\alpha }_{n},{\beta }_{n}\right) \sim {\mathcal{G}}\left({\alpha }_{n},{\beta }_{n}\right)\) and \(p({\sigma }_{p}^{2}| {\alpha }_{p},{\beta }_{p}) \sim {\mathcal{G}}({\alpha }_{p},{\beta }_{p})\), where \({\mathcal{G}}(\cdot )\) is a gamma distribution. This is known as Bayesian hierarchical linear regression (Supplementary Fig. 3a). The optimal parameters in hierarchical prior αnβnαp and βp are found with a Gaussian approximation78.

Relevance vector machine (RVM)

The prior of w is different when implementing the Automatic Relevance Determination (ARD) mechanism in BLR79,80. Here, the prior \({\bf{w}} \sim {\mathcal{N}}\left({\bf{0}},{\boldsymbol{\Gamma }}\right)\), has a standard deviation γj for each parameter wj, defined as \({\boldsymbol{\Gamma }}={\rm{diag}}([{\gamma }_{1}\cdots {\gamma }_{{D}_{2}}])\). This model has a hyperparameter γ0 which defines the threshold to eliminate dimensions in the input space81; see Supplementary Fig. 3b.

Gaussian process for regression (GPR)

The kernel trick enables a linear model to have nonlinear properties. A kernel function is a positive definite function that maps a feature vector \({\mathcal{K}}(\cdot,\cdot ):\widehat{{\mathcal{X}}}\times \widehat{{\mathcal{X}}}\to {\mathcal{R}}\) into reproducing kernel Hilbert space \({\mathcal{H}}\) spanned by a function \(\varphi :\widehat{{\mathcal{X}}}\to {\mathcal{H}}\) and reproduced by the inner product \({\mathcal{K}}\left({\widehat{{\bf{x}}}}_{i},{\widehat{{\bf{x}}}}_{i}^{{\prime} }\right)\triangleq \langle \varphi ({\widehat{{\bf{x}}}}_{i}),\varphi ({\widehat{{\bf{x}}}}_{i}^{{\prime} })\rangle\). As a consequence, applying the Generalized Representer Theorem to the model parameters allows us to express them as a linear combination of the data \({\bf{w}}={{\mathbf{\Sigma }}}_{p}^{1/2}{\mathbf{\Phi }}{\boldsymbol{\alpha }}\). In the context of BLR, the extension is called GPR46. The Maximum A Posteriori (MAP) estimation of α is

$$p\left({\boldsymbol{\alpha }}| {\mathbf{\Phi }},{\bf{y}},{\boldsymbol{\theta }},{\sigma }_{n}^{2}\right)\propto p\left({\bf{y}}| {\mathbf{\Phi }},{\boldsymbol{\alpha }},{\boldsymbol{\theta }},{\sigma }_{n}^{2}\right)p\left({\boldsymbol{\alpha }}| {{\mathbf{\Sigma }}}_{p},{\boldsymbol{\theta }}\right),$$
(7)

the likelihood is \(p({\bf{y}}| {\mathbf{\Phi }},{\boldsymbol{\alpha }},{\boldsymbol{\theta }},{\sigma }_{n}^{2}) \sim {\mathcal{N}}({{\boldsymbol{\alpha }}}^{\top }{{\mathbf{\Phi }}}^{\top }{{\mathbf{\Sigma }}}_{p}{\mathbf{\Phi }},{\sigma }_{n}^{2})\) and the prior is \(p({\boldsymbol{\alpha }}| {{\mathbf{\Sigma }}}_{p},{\boldsymbol{\theta }}) \sim {\mathcal{N}}({\bf{0}},{{\mathbf{\Phi }}}^{\top }{{\mathbf{\Sigma }}}_{p}{\mathbf{\Phi }})\). Similarly, the optimal representation of the dual parameters \(\widehat{{\boldsymbol{\alpha }}}\) has analytical solutions, and the optimal hyperparameters \(\widehat{{\boldsymbol{\theta }}}\) are found by minimizing the Negative Marginal Log-Likelihood (NMLL). The plate diagram is in Supplementary Fig. 3c.

Multi-task Gaussian process for regression (MTGPR)

The proposed multi-tasks regression problem82,83 aims to estimate τ response variables \({{\bf{y}}}_{k}\in {{\mathbb{R}}}^{\tau }\), in vector form yk = [ yk,1 yk,τ], from a feature vector \({\widehat{{\bf{x}}}}_{k}\) mapped \(\varphi ({\widehat{{\bf{x}}}}_{k})\) into a reproducing kernel Hilbert space \({\mathcal{H}}\) endowed with a dot product \(k({\widehat{{\bf{x}}}}_{k},{\widehat{{\bf{x}}}}_{l})\), where function k( , ) is a Mercer’s kernel84, with the following canonical model,

$${{\bf{y}}}_{k}={{\bf{W}}}^{\top }\varphi ({\widehat{{\bf{x}}}}_{k})+{{\boldsymbol{\varepsilon }}}_{k},$$
(8)

where W is the matrix of primal parameters \({\bf{W}}=\left[{{\bf{w}}}_{1}\cdots {{\bf{w}}}_{\tau }\right]\) that have a dual representation W = AΦ. The error term εk in the multi-task regression problem is a vector εk = [εk,1 εk,τ] that we assumed to have Gaussian distribution \(p\left({{\boldsymbol{\varepsilon }}}_{k}\right) \sim {\mathcal{N}}\left({\bf{0}},{{\mathbf{\Sigma }}}_{n}\right)\) with zero mean and covariance matrix Σn. Under this assumption, the likelihood function is also Gaussian \(p\left({{\bf{y}}}_{k}| {\bf{W}},{\mathbf{\Phi }},{{\mathbf{\Sigma }}}_{n}\right) \sim {\mathcal{N}}\left({{\bf{W}}}^{\top }\varphi ({\widehat{{\bf{x}}}}_{k}),{{\mathbf{\Sigma }}}_{n}\right)\), and we can obtain the MAP estimation of the dual parameters A by assuming a Gaussian prior on the prior parameters \(p\left(\,{\rm{vec}}\,({\bf{W}})\right) \sim {\mathcal{N}}({\bf{0}},{\bf{C}}\otimes {{\mathbf{\Sigma }}}_{p})\), where C is the inter-task covariance, Σp is the parameters covariance and denotes the Kronecker product between matrices.

In the problem at hand, we explore τ as the number of nodes (NP15, SP15, ZP26), so τ = 3 when forecasting electricity demand (\({\mathcal{L}}\)) and solar generation (\({\mathcal{S}}\)). When forecasting wind (\({\mathcal{W}}\)), τ = 2, since wind is only available in the NP15 and SP15 nodes. Similarly, when forecasting all energy features (\({\mathcal{L}}\), \({\mathcal{S}}\) and \({\mathcal{W}}\)) τ = 3 in node NP15 and SP15, but node ZP26 only has electricity demand and solar generation (τ = 2).

In a conditional one-output likelihood multi-task Gaussian process for regression (Cool-MTGPR)45, the task of estimating τ regressors \({{\bf{y}}}_{k}\in {{\mathbb{R}}}^{{\mathcal{T}}}\) from predictor \(\varphi ({\widehat{{\bf{x}}}}_{k})\) is done with the model

$${y}_{k,\tau }=\varphi {({\widehat{{\bf{x}}}}_{k})}^{\top }{{\bf{w}}}_{\tau }+{\varepsilon }_{k,\tau }=\varphi {({\widehat{{\bf{x}}}}_{k})}^{\top }{{\bf{w}}}_{x,\tau }+{{\bf{y}}}_{k,1:\tau -1}^{\top }{{\bf{w}}}_{y,\tau }+{\varepsilon }_{k,\tau }.$$
(9)

In this formulation, each factorized task yk is modeled as dependent of the previous ones, and therefore the corresponding weights are split into \({{\bf{w}}}_{k,\tau }\in {\mathcal{H}}\) for the input sample \({\widehat{{\bf{x}}}}_{k}\) and wyτ for the previous tasks. Indeed, weight vectors wτ in an MTGPR can be recovered as wτ = wx,τ + W1:τ−1wy,τ. Here, model error ε is assumed to have the form of a Gaussian distribution \(p\left({{\boldsymbol{\varepsilon }}}_{k}\right) \sim {\mathcal{N}}({\bf{0}},{{\mathbf{\Sigma }}}_{p})\).

By applying the chain rule of probability to the standard joint multitask likelihood, we can factorize it into a product of conditional probabilities, each one corresponding to each one of the conditional tasks in equation (9),

$$p\left({\rm{vect}}({\bf{Y}})| {\mathbf{\Phi }},{\bf{W}}\right)=\mathop{\prod }\limits_{k=1}^{T}p\left({{\bf{y}}}_{k}| {{\bf{Y}}}_{1:k-1},{\mathbf{\Phi }},{{\bf{w}}}_{x,\tau },{{\bf{w}}}_{y,\tau }\right),$$
(10)

where each conditional GP at the right side of the equation has a likelihood,

$$p\left({{\bf{y}}}_{k}| {{\bf{Y}}}_{1:k-1},{\mathbf{\Phi }},{{\bf{w}}}_{x,\tau },{{\bf{w}}}_{y,\tau }\right)={\mathcal{N}}\left({{\bf{y}}}_{k}| \,{{\mathbf{\Phi }}}^{\top }{{\bf{w}}}_{x,\tau }+{{\bf{Y}}}_{1:k-1}^{\top }{{\bf{w}}}_{y,\tau },{\sigma }_{k}^{2}I\right).$$
(11)

The prior distribution of each weight vector wx,τ is modeled as

$$p({{\bf{w}}}_{x,\tau })={\mathcal{N}}\left({{\bf{w}}}_{x,\tau }| {\bf{0}},{b}_{\tau }{{\mathbf{\Sigma }}}_{p}\right),$$
(12)

and where for each task, a conditional one output likelihood GP is modeled with mean \({{\bf{w}}}_{y,\tau }^{\top }{{\bf{y}}}_{k,1:\tau -1}\) and covariance bτK (see Figure 8d), where \({[{\bf{K}}]}_{k,l}=k({\widehat{{\bf{x}}}}_{k},{\widehat{{\bf{x}}}}_{l})\).

To solve for primal weight vectors wy,τ, we define a prior of these parameters with zero mean and identity covariance matrix45 and infer a posterior all parameters following the formulation of the standard GPR46

$$p\left([{{\bf{w}}}_{x,\tau },\,{{\bf{w}}}_{y,\tau }]| {\mathbf{\Phi }},{{\bf{Y}}}_{1:\tau }\right)={\mathcal{N}}\left(\left.\left[\begin{array}{c}{{\bf{w}}}_{x,\tau }\\ {{\bf{w}}}_{y,\tau }\end{array}\right]\,\right| \,\left[\begin{array}{c}{\bar{{\bf{w}}}}_{x,\tau }\\ {\bar{{\bf{w}}}}_{y,\tau }\end{array}\right],\,{{\bf{A}}}_{\tau }^{-1}\right).$$
(13)

The posterior is proportional to the product of the prior times the model likelihood. Solving for matrix A and vector \({\bar{{\bf{w}}}}_{y,\tau }\) gives the solution, which has to be obtained through a dual formulation, provided that the observations are transformed into space \({\mathcal{H}}\).

After this, parameters \({b}_{\tau },{\sigma }_{k}^{2}\) and the kernel parameters are solved by maximizing the joint log-likelihood over all tasks. Once this optimization is done, the solution for wx,τ is given in dual form as \({\bar{{\bf{w}}}}_{x,\tau }={\mathbf{\Phi }}{{\boldsymbol{\alpha }}}_{\tau }\), where

$${{\boldsymbol{\alpha }}}_{\tau }={{\bf{K}}}_{x,\tau }^{-1}\left({{\bf{y}}}_{\tau }-{\bar{{\bf{w}}}}_{y,\tau }^{\top }{{\bf{Y}}}_{1:\tau -1}\right).$$
(14)

In the equation, \({{\bf{K}}}_{x,\tau }=\left({b}_{\tau }{\bf{K}}+{\sigma }_{\tau }^{2}{\bf{I}}\right)\), and K is the kernel matrix containing the kernel dot products k(xkxl) between samples.

Data structure

The 48 h forecasting horizon provided by the HRRR is at 4 pm. The proposed day-ahead energy forecast is provided at 5 pm, assuming a 1 h lag. The data structure in the forecasting feature vectors is in this section.

Feature vectors for sparse learning

The feature vectors in the sparse models xi are from the reanalysis dataset (\({\mathcal{A}}\)); as found in the Processing and Filtering. We have a different multivariate feature vector for each energy feature: electricity demand (\({{\bf{x}}}_{i}^{{\mathcal{L}}}\)), solar generation (\({{\bf{x}}}_{i}^{{\mathcal{S}}}\)), and wind geHneration (\({{\bf{x}}}_{i}^{{\mathcal{W}}}\)). The weather features have an image-like structure and are filtered, ψ( ), to reduce the spatial dimensions. The spatial mask \({\psi }^{{\mathcal{L}}}(\cdot )\) applied to the electricity demand is based on the population density. The spatial mask \({\psi }^{{\mathcal{S}}}(\cdot )\) applied to the solar features is based on the installed solar generation capacity, and the spatial mask \({\psi }^{{\mathcal{W}}}(\cdot )\) applied to the wind features represents the installed wind generation capacity. The spatial mask \({\psi }^{{\mathcal{E}}}(\cdot )\) contains the intersection of \({\psi }^{{\mathcal{L}}}\cap {\psi }^{{\mathcal{S}}}\cap {\psi }^{{\mathcal{W}}}\) for the nodel-level model.

The weather features in the vector for electricity demand \({{\bf{x}}}_{i}^{{\mathcal{L}}}\) is formed by \({{\bf{r}}}_{k}^{{\prime} }={\psi }^{{\mathcal{L}}}({{\bf{r}}}_{i})\) (DSWRF), \({{\bf{d}}}_{i}^{{\prime} }={\psi }^{{\mathcal{L}}}({{\bf{d}}}_{i})\) (dew point), \({{\bf{h}}}_{i}^{{\prime} }={\psi }^{{\mathcal{L}}}({{\bf{h}}}_{i})\) (relative humidity), \({{\boldsymbol{\tau }}}_{i}^{{\prime} }={\psi }^{{\mathcal{L}}}({{\bf{t}}}_{i})\) (temperature), and \({{\bf{p}}}_{i}^{{\prime} }={\psi }^{{\mathcal{L}}}({{\bf{p}}}_{i})\) (discomfort index). The dimensions of the resulting features are \({{\bf{r}}}_{i}^{{\prime} },{{\bf{d}}}_{i}^{{\prime} },{{\bf{h}}}_{i}^{{\prime} },{{\boldsymbol{\tau }}}_{i}^{{\prime} },{{\bf{p}}}_{i}^{{\prime} }\in {{\mathbb{R}}}^{{D}_{1}^{{\mathcal{L}}}}\). The feature vector for electricity demand is,

$${{\bf{x}}}_{i}^{{\mathcal{L}}}=\left[{r}_{i,1}^{{\prime} }\cdots {r}_{i,{D}_{1}^{{\mathcal{L}}}}^{{\prime} }\,{d}_{i,1}^{{\prime} }\cdots {d}_{i,{D}_{1}^{{\mathcal{L}}}}^{{\prime} }\,{h}_{i,1}^{{\prime} }\cdots {h}_{i,{D}_{1}^{{\mathcal{L}}}}^{{\prime} }\,{\tau }_{i,1}^{{\prime} }\cdots {\tau }_{i,{D}_{1}^{{\mathcal{L}}}}^{{\prime} }\,{p}_{i,1}^{{\prime} }\cdots {p}_{i,{D}_{1}^{{\mathcal{L}}}}^{{\prime} }\right],\,\forall i=1,\ldots,K\cdot T.$$
(15)

Similarly, the features in the vector \({{\bf{x}}}_{i}^{{\mathcal{S}}}\) for solar generation are \({{\bf{i}}}_{i}^{{\prime} }={\psi }^{{\mathcal{S}}}({{\bf{i}}}_{i})\) (DLWRF), \({{\bf{r}}}_{i}^{{\prime} }={\psi }^{{\mathcal{S}}}({{\bf{r}}}_{i})\) (DSWRF), and \({{\bf{g}}}_{i}^{{\prime} }={\psi }^{{\mathcal{S}}}({{\bf{g}}}_{i})\) (GHI). The dimensions of the feature vector are \({{\bf{i}}}_{i}^{{\prime} },{{\bf{r}}}_{i}^{{\prime} },{{\bf{g}}}_{i}^{{\prime} }\in {{\mathbb{R}}}^{{D}_{1}^{{\mathcal{S}}}}\). The resulting feature vector for solar generation is,

$${{\bf{x}}}_{i}^{{\mathcal{S}}}=\left[{i}_{i,1}^{{\prime} }\cdots {i}_{i,{D}_{1}^{{\mathcal{S}}}}^{{\prime} }\,{r}_{i,1}^{{\prime} }\cdots {r}_{i,{D}_{1}^{{\mathcal{S}}}}^{{\prime} }\,{g}_{i,1}^{{\prime} }\cdots {g}_{i,{D}_{1}^{{\mathcal{S}}}}^{{\prime} }\right],\,\forall i=1,\ldots,K\cdot T.$$
(16)

The feature vector \({{\bf{x}}}_{i}^{{\mathcal{W}}}\) for wind generation contains wind speed features: \({{\boldsymbol{\omega }}}_{i}^{{\prime} 60}={\psi }^{{\mathcal{W}}}({{\boldsymbol{\omega }}}_{i}^{{\prime} 60})\) (wind speed at 60 m), \({{\boldsymbol{\omega }}}_{i}^{{\prime} 80}={\psi }^{{\mathcal{W}}}({{\boldsymbol{\omega }}}_{i}^{{\prime} 80})\) (wind speed at 80 m), \({{\boldsymbol{\omega }}}_{i}^{{\prime} 100}={\psi }^{{\mathcal{W}}}({{\boldsymbol{\omega }}}_{i}^{{\prime} 100})\) (wind speed at 100 m), and \({{\boldsymbol{\omega }}}_{i}^{{\prime} 120}={\psi }^{{\mathcal{W}}}({{\boldsymbol{\omega }}}_{i}^{{\prime} 120})\) (wind speed at 120 m), so that \({{\boldsymbol{\omega }}}_{i}^{{\prime} 60},{{\boldsymbol{\omega }}}_{i}^{{\prime} 80},{{\boldsymbol{\omega }}}_{i}^{{\prime} 100},{{\boldsymbol{\omega }}}_{i}^{{\prime} 120}\in {{\mathbb{R}}}^{{D}_{1}^{{\mathcal{W}}}}\). The features in the vector for wind generation are,

$${{\bf{x}}}_{i}^{{\mathcal{W}}}=\left[{\omega }_{i,1}^{{\prime} 60}\cdots {\omega }_{i,{D}_{1}^{{\mathcal{W}}}}^{{\prime} 60}\,{\omega }_{i,1}^{{\prime} 80}\cdots {\omega }_{i,{D}_{1}^{{\mathcal{W}}}}^{{\prime} 80}\,{\omega }_{i,1}^{{\prime} 100}\cdots {w}_{i,{D}_{1}^{{\mathcal{W}}}}^{{\prime} 100}\,{\omega }_{i,1}^{{\prime} 120}\cdots {\omega }_{i,{D}_{1}^{{\mathcal{W}}}}^{{\prime} 120}\right],\,\forall i=1,\ldots,K\cdot T.$$
(17)

Pattern vectors for Bayesian learning

The feature vectors in for Bayesian learning step are the same as in the sparse learning step but are from the forecast dataset (\({\mathcal{F}}\)) instead of the reanalysis dataset (\({\mathcal{A}}\)); as found in the Processing and Filtering. Therefore, we have a different feature vector for each energy feature: \({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{L}}}\) (electricity demand), \({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{S}}}\) (solar generation) and \({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{W}}}\) (wind generation). The feature vectors contain information for each sample k (days) and forecasting horizon t (day hour), so the data structure is different. The forecasted weather features have an additional filtering step, δ( ), that utilizes the coefficients from the sparse learning methods \({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{X}}}=\{{\widehat{x}}_{k,t,d}^{{\mathcal{X}}}| {\sum }_{z=1}^{Z}| {\widehat{w}}_{d,z}^{{\mathcal{X}}}| > 0,\forall d=1,\ldots,{D}_{1}^{{\mathcal{X}}}\}\) for each independent energy feature (\({\mathcal{X}}\in \{{\mathcal{L}},{\mathcal{S}},{\mathcal{W}}\}\)).

The features in the forecasting vector for electricity demand (\({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{L}}}\)) are: \({\widehat{{\bf{r}}}}_{k,t}^{{\prime\prime} }=\delta ({\widehat{{\bf{r}}}}_{k,t}^{{\prime} })\), \({\widehat{{\bf{d}}}}_{k,t}^{{\prime\prime} }=\delta ({\widehat{{\bf{d}}}}_{k,t}^{{\prime} })\), \({\widehat{{\bf{h}}}}_{k,t}^{{\prime\prime} }=\delta ({\widehat{{\bf{h}}}}_{k,t}^{{\prime} })\), \({\widehat{{\boldsymbol{\tau }}}}_{k,t}^{{\prime\prime} }=\delta ({\widehat{{\boldsymbol{\tau }}}}_{k,t}^{{\prime} })\), and \({\widehat{{\bf{p}}}}_{k,t}^{{\prime\prime} }=\delta ({\widehat{{\bf{p}}}}_{k,t}^{{\prime} })\), \({\widehat{{\bf{r}}}}_{k,t}^{{\prime\prime} },{\widehat{{\bf{d}}}}_{k,t}^{{\prime\prime} },{\widehat{{\bf{h}}}}_{k,t}^{{\prime\prime} },{\widehat{{\boldsymbol{\tau }}}}_{k,t}^{{\prime\prime} },{\widehat{{\bf{p}}}}_{k,t}^{{\prime\prime} }\in {{\mathbb{R}}}^{{D}_{2}^{{\mathcal{L}}}}\). The electricity demand vector includes \({{\bf{y}}}_{k-{\ell }_{1},t}^{{\mathcal{L}}}\) demand observations of the 1 past days at hour t, and 2 from the same operational day \({{\bf{y}}}_{k,t-{\ell }_{2}}^{{\mathcal{L}}}\). The vector also has temporal features (Supplementary Note 6): zk,1 (year), zk,2 (year day), and zk,3 (week day); and auxiliary variables zk,4 (weekend), zk,5 (Holiday), zk,6 (daylight saving time), and t (day hour),

$$\begin{array}{l}{\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{L}}}=\left[{\widehat{r}}_{k,t,1}^{\,{\prime\prime} }\cdots {\widehat{r}}_{k,t,{D}_{2}^{{\mathcal{L}}}}^{\,{\prime\prime} }\,{\widehat{d}}_{k,t,1}^{{\prime\prime} }\cdots {\widehat{d}}_{k,t,{D}_{2}^{{\mathcal{L}}}}^{{\prime\prime} }\,{\widehat{h}}_{k,t,1}^{{\prime\prime} }\cdots {\widehat{h}}_{k,t,{D}_{2}^{{\mathcal{L}}}}^{{\prime\prime} }\,{\widehat{\tau }}_{k,t,1}^{\,{\prime\prime} }\cdots {\widehat{\tau }}_{k,t,{D}_{2}^{{\mathcal{L}}}}^{\,{\prime\prime} }\right.\\ \left.{\widetilde{p}}_{k,t,1}^{{\prime\prime} }\cdots {\widehat{p}}_{k,t,{D}_{2}^{{\mathcal{L}}}}^{{\prime\prime} }\,{y}_{k-{\ell }_{1},t}^{{\mathcal{L}}}\cdots {y}_{k-1,t}^{{\mathcal{L}}}\,{y}_{k,t-{\ell }_{2}}^{{\mathcal{L}}}\cdots \,{y}_{k,t-1}^{{\mathcal{L}}}\,{z}_{k,1}\cdots {z}_{k,6}\,t\right].\end{array}$$
(18)

The features in vector \(({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{S}}})\) to forecast solar generation are \({\widehat{{\bf{i}}}}_{k,t}^{\,{\prime\prime} }=\delta ({\widehat{{\bf{i}}}}_{k,t}^{\,{\prime} })\), \({{\bf{r}}}_{k,t}^{{\prime\prime} }=\delta ({\widehat{{\bf{r}}}}_{k,t}^{\,{\prime} })\), and \({\widehat{{\bf{g}}}}_{k,t}^{{\prime\prime} }=\delta ({\widehat{{\bf{g}}}}_{k,t}^{{\prime} })\), \({\widehat{{\bf{r}}}}_{k,t}^{\,{\prime\prime} },{\widehat{{\bf{i}}}}_{k,t}^{\,{\prime\prime} },{\widehat{{\bf{g}}}}_{k,t}^{{\prime\prime} }\in {{\mathbb{R}}}^{{D}_{2}^{{\mathcal{S}}}}\). In addition, the feature vector includes time series from \({{\bf{y}}}_{k-{\ell }_{1},t}\) actual solar generation of 1 = 6 past operational days, and past 2 = 16 day hours \({{\bf{y}}}_{k,t-{\ell }_{2}}\) and temporal features zk,1, zk,2, and zk,6,

$${\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{S}}} =\left[{\widehat{i}}_{k,t,1}^{\,\,{\prime\prime} }\cdots {\widehat{i}}_{k,t,{D}_{2}^{{\mathcal{S}}}}^{\,\,{\prime\prime} }\,{\widehat{r}}_{k,t,1}^{\,{\prime\prime} }\cdots {\widehat{r}}_{k,t,{D}_{2}^{{\mathcal{S}}}}^{\,{\prime\prime} }\,{\widehat{g}}_{k,t,1}^{{\prime\prime} }\cdots {\widehat{g}}_{k,t,{D}_{2}^{{\mathcal{S}}}}^{{\prime\prime} }\,{y}_{k-{\ell }_{1},t}^{{\mathcal{S}}}\cdots {y}_{k-1,t}^{{\mathcal{S}}}\, \right. \\ \left.{y}_{k,t-{\ell }_{2}}^{{\mathcal{S}}}\cdots {y}_{k,t-1}^{{\mathcal{S}}}\,{z}_{k,1}\,{z}_{k,2}\,{z}_{k,6}\right].$$
(19)

The features in the vector to forecast wind generation (\({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{W}}}\)) are: \({\widehat{{\bf{w}}}}_{k,t}^{{\prime\prime} 60}=\delta ({\widehat{{\bf{w}}}}_{k,t}^{{\prime} 60})\), \({\widehat{{\bf{w}}}}_{k,t}^{{\prime\prime} 80}=\delta ({\widehat{{\bf{w}}}}_{k,t}^{{\prime} 80})\), \({\widehat{{\bf{w}}}}_{k,t}^{{\prime\prime} 100}=\delta ({\widehat{{\bf{w}}}}_{k,t}^{{\prime} 100})\), and \({\widehat{{\bf{w}}}}_{k,t}^{{\prime\prime} 120}=\delta ({\widehat{{\bf{w}}}}_{k,t}^{{\prime} 120})\), \({\widehat{{\bf{w}}}}_{k,t}^{{\prime\prime} 60},{\widehat{{\bf{w}}}}_{k,t}^{{\prime\prime} 80},{\widehat{{\bf{w}}}}_{k,t}^{{\prime\prime} 100},{\widehat{{\bf{w}}}}_{k,t}^{{\prime\prime} 120}\in {{\mathbb{R}}}^{{D}_{2}^{{\mathcal{W}}}}\). Similarly, the wind generation vector includes \({{\bf{y}}}_{k-{\ell }_{1},t}^{{\mathcal{W}}}\) observed wind energy generation from the 1 past days at the t time of the day, and 2 from the same day \({{\bf{y}}}_{k,t-{\ell }_{2}}^{{\mathcal{W}}}\). The vector has the same temporal features and auxiliary variables as the solar generation pattern \({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{S}}}\). The forecast vector for wind generation is

$$\begin{array}{l}{\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{W}}}=\left[{\widehat{w}}_{k,t,1}^{{\prime\prime} 60}\cdots {\widehat{w}}_{k,t,{D}_{2}^{{\mathcal{W}}}}^{{\prime\prime} 60}\,{\widehat{w}}_{k,t,1}^{{\prime\prime} 80}\cdots {\widehat{w}}_{k,t,{D}_{2}^{{\mathcal{W}}}}^{{\prime\prime} 80}\,{\widehat{w}}_{k,t,1}^{{\prime\prime} 100}\cdots {\widehat{w}}_{k,t,{D}_{2}^{{\mathcal{W}}}}^{{\prime\prime} 100}\right.\\ \left.{\widehat{w}}_{k,t,1}^{{\prime\prime} 120}\cdots {\widehat{w}}_{k,t,{D}_{2}^{{\mathcal{W}}}}^{{\prime\prime} 120}\,{y}_{k-{\ell }_{1},t}^{{\mathcal{W}}}\cdots {y}_{k-1,t}^{{\mathcal{W}}}\,{y}_{k,t-{\ell }_{2}}^{{\mathcal{W}}}\cdots {y}_{k,t-1}^{{\mathcal{W}}}\,{z}_{k,1}\,{z}_{k,2}\,{z}_{k,6}\right].\end{array}$$
(20)

The energy vector includes non-repeated features from the electricity demand \({\widehat{{\bf{x}}}}_{k,t,z}^{{\mathcal{L}}}\), solar generation \({\widehat{{\bf{x}}}}_{k,t,z}^{{\mathcal{S}}}\), and wind generation \({\widehat{{\bf{x}}}}_{k,t,z}^{{\mathcal{W}}}\) vectors. The filter δ( ) applied to the joint energy features \({\widehat{{\bf{x}}}}_{k,t,z}^{{\mathcal{X}}}=\{{\widehat{x}}_{k,t,d,z}^{{\mathcal{X}}}| | {\widehat{w}}_{d,z}^{{\mathcal{X}}}| > 0,\forall d=1,\ldots,{D}_{1}^{{\mathcal{X}}}\}\) selects weather features independently for electricity demand (\({\mathcal{L}}\)), solar generation (\({\mathcal{S}}\)) and wind generation (\({\mathcal{W}}\)), and does not sum the sparse learning coefficients \({\widehat{w}}_{d,z}^{{\mathcal{X}}}\) across nodes z (NP15, SP15, ZP26). A different energy feature vector \({\widehat{{\bf{x}}}}_{k,t,z}^{{\mathcal{E}}}\) exists for each node. The nodal-level energy feature vectors are,

$${\widehat{{\bf{x}}}}_{k,t,z}^{{\mathcal{E}}}=\left[{\widehat{{\bf{x}}}}_{k,t,z}^{{\mathcal{L}}}\,{\widehat{{\bf{x}}}}_{k,t,z}^{{\mathcal{S}}}\,{\widehat{{\bf{x}}}}_{k,t,z}^{{\mathcal{W}}}\right].$$
(21)

The parameters 1 and 2 are set to 1 = 2 = 6 for all the proposed feature vectors (\({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{L}}}\), \({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{S}}}\), \({\widehat{{\bf{x}}}}_{k,t}^{{\mathcal{W}}}\), and \({\widehat{{\bf{x}}}}_{k,t,z}^{{\mathcal{E}}}\)).

Model chain

We propose a model based on a conditional one-output likelihood multi-task GPR formulation45,

$${y}_{k,t}={\varphi }^{\top }\left({\widehat{{\bf{x}}}}_{k,t}^{\star }\right){{\bf{w}}}_{\widehat{{\bf{x}}},t}+{{\bf{y}}}_{k,1:t-1}^{\top }{{\bf{w}}}_{{\bf{y}},t}+{\varepsilon }_{k,t},$$
(22)

where each solar generation forecasting horizon yk,t (task) is conditional to previous forecasting horizon yk,1:t−1 (tasks).

Scenario smoothing

The predictive probability density for a new sample (), contains predictions from previous models in the chain \(p({\widehat{y}}_{t,z}^{\star }| {{\mathcal{D}}}_{z},{\widehat{{\bf{x}}}}_{t,z}^{\star },{\widehat{y}}_{1,z}^{\star },\ldots {\widehat{y}}_{t-1,z}^{\star })\), but not previous predictions \({\widehat{{\bf{x}}}}_{1,z}^{\star },\ldots,{\widehat{{\bf{x}}}}_{t-1,z}^{\star }\) (as found in the Model Chain). Due to this partially independent assumption on the weather forecasts from previous time instants, the shape of the scenario is not completely realistic despite representing the full probabilistic density (i.e., VS is low). To overcome this limitation, we apply to predictive scenarios a 1-dimensional Gaussian convolutional kernel to smooth them and reduce oscillations introduced by discretizations,

$${\widetilde{y}}_{t,z,s}^{\star }=\mathop{\sum }\limits_{{t}^{{\prime} }=1}^{T}{\widehat{y}}_{{t}^{{\prime} },z,s}^{\star }\left(\frac{1}{\sqrt{2\pi {\sigma }_{z}}}\exp \left\{-\frac{{({t}^{{\prime} }-t)}^{2}}{2{\sigma }_{z}^{2}}\right\}\right),\forall t=1,\ldots,T,$$
(23)

where \({t}^{{\prime} }=1,\ldots,T\) is the scalar corresponding to hour. The convolutional kernel is applied to the scenarios s = 1, …, S generated from the predictive posteriors \({\widehat{y}}_{t,z,s}^{\star } \sim p({\widehat{y}}_{t,z}^{\star }| {{\mathcal{D}}}_{z},{\widehat{{\bf{x}}}}_{t,z}^{\star },{\widehat{y}}_{1,z}^{\star },\ldots {\widehat{y}}_{t-1,z}^{\star })\). The parameter σz is different for each energy feature (\({\mathcal{L}}\), \({\mathcal{S}}\) and \({\mathcal{W}}\)) and zone z, and requires cross-validation.

Predictive density calibration

The Bayesian methods predict the mean and covariance from the predictive density function. However, when validating confidence intervals, we found that the MTGPR does not produce a predictive covariance matrix under all circumstances. This is a common problem in other formulations of an MTGPR82,83. In particular, we observed a constant bias on the predictive covariance function on the MTGPR model when forecasting different energy features, electricity demand (\({\widehat{{\bf{y}}}}_{z}^{{\mathcal{L}}}\)), solar generation (\({\widehat{{\bf{y}}}}_{z}^{{\mathcal{S}}}\)) and wind generation (\({\widehat{{\bf{y}}}}_{z}^{{\mathcal{L}}}\)) from the same node (z = 1 for instance). The MTGPR model does not have this limitation when predicting the same energy feature at different nodes, for solar generation at NP15 (\({\widehat{{\bf{y}}}}_{z=1}^{{\mathcal{S}}}\)), SP15 (\({\widehat{{\bf{y}}}}_{z=2}^{{\mathcal{S}}}\)) and ZP26 (\({\widehat{{\bf{y}}}}_{z=3}^{{\mathcal{S}}}\)).

To unbias the predictive covariance matrix, we first calculate the true predictive covariance matrix Γk,t,z for a sample yk,t,z given that the predictive mean is \({\widehat{{\boldsymbol{\mu }}}}_{k,t,z}\),

$${{\boldsymbol{\Gamma }}}_{k,t,z}={\left({{\bf{y}}}_{k,t,z}-{\widehat{{\boldsymbol{\mu }}}}_{k,t,z}\right)}^{\top }\left({{\bf{y}}}_{k,t,z}-{\widehat{{\boldsymbol{\mu }}}}_{k,t,z}\right).$$
(24)

Then, we propose the following model to unbias the predictive covariance matrix \({\widehat{{\mathbf{\Sigma }}}}_{k,t,z}\),

$${{\boldsymbol{\gamma }}}_{k,t,z}= \, {{\bf{w}}}_{t,z}^{\top }{\widehat{{\bf{s}}}}_{k,t,z}+{{\bf{e}}}_{k,t,z}\\ {\widehat{{\bf{w}}}}_{t,z}= \, {\left({\widehat{{\bf{S}}}}_{t,z}^{\top }{\widehat{{\bf{S}}}}_{t,z}\right)}^{-1}{\widehat{{\bf{S}}}}_{t,z}^{\top }{{\boldsymbol{\Gamma }}}_{t,z}\\ {\widehat{{\boldsymbol{\gamma }}}}_{t,z}^{\star }= \, {\widehat{{\bf{w}}}}_{t,z}^{\top }{\widehat{{\bf{s}}}}_{t,z}^{\star }$$
(25)

where \({\widehat{{\bf{s}}}}_{k,t,z}=[\,{\rm{vec}}\,({\widehat{{\mathbf{\Sigma }}}}_{k,t,z})\,1]\), \({{\boldsymbol{\gamma }}}_{k,t,z}=\,{\rm{vec}}\,\left({{\boldsymbol{\Gamma }}}_{k,t,z}\right)\), and \({\widehat{{\boldsymbol{\gamma }}}}_{t,z}^{\star }\) is the unbiased covariance matrix in vector form for a new sample () at the hour t in the node z. The predictive probability distribution with the unbiased covariance matrix \({\widehat{{\boldsymbol{\Gamma }}}}_{t,z}^{\star }={{\rm{vec}}}^{-1}\left({\widehat{{\boldsymbol{\gamma }}}}_{t,z}^{\star }\right)\) is now \({\mathcal{N}}({\widehat{{\boldsymbol{\mu }}}}_{t,z}^{\star },{\widehat{{\boldsymbol{\Gamma }}}}_{t,z}^{\star }) \sim p({\widehat{{\bf{y}}}}_{t,z}^{\star }| {{\mathcal{D}}}_{z},{\widehat{{\bf{x}}}}_{t,z}^{\star },{\widehat{{\bf{y}}}}_{1,z}^{\star },\ldots {\widehat{{\bf{y}}}}_{t-1,z}^{\star })\). vec( ) represents the vectorization of a matrix, and vec−1( ) restores the vector to its original matrix shape.

Energy imbalance

The operational reserves are a statistic of the historical magnitude of the errors in the day-ahead forecast. In particular, the fraction Δup of up reserves is 6% of the day-ahead forecasted electricity demand at each time point in CAISO55, 50% must be spinning reserves. The system must have enough reserves to supply non-delivered contracted imports and guarantee the supply of contracted exports56. In addition, the current trend follows an increase in regulation down reserves equivalent to the regulation up plus spinning reserves combined, and an increase in non-spinning, so that regulation up, regulation down plus spinning reserves are approximately equal to the non-spinning12. Therefore, we assume a total estimation for the downward reserves is Δdown ≈ 0.12 (12%), and that the total upward and downward reserves are the same Δup = Δdown = Δ for simplicity.

The reserve levels are \({r}_{t}^{\,{\rm{CAISO}}\star }=({\Delta }^{{\rm{up}}}+{\Delta }^{{\rm{down}}})\cdot {\sum }_{z=1}^{Z}{\widehat{y}}_{\,{\rm{CAISO}}\,,t,z}^{{\mathcal{L}}\star }=2\Delta \cdot {\sum }_{z=1}^{Z}{\widehat{y}}_{\,{\rm{CAISO}}\,,t,z}^{{\mathcal{L}}\star }\). We want to find the z-interval ϱ that defines reserve levels \({r}_{t,z}^{\,{\rm{ML}}\,\star }=2\varrho \cdot {\sum }_{z=1}^{Z}({\widehat{\sigma }}_{t,z}^{{{\mathcal{L}}\star }^{2}}+{\widehat{\sigma }}_{t,z}^{{{\mathcal{S}}\star }^{2}}+{\widehat{\sigma }}_{t,z}^{{{\mathcal{W}}\star }^{2}})\) with the same capacity that \({r}_{t}^{\,{\rm{CAISO}}\,\star }\), but allocated according to the variance on the predictive posterior aggregated across nodes (assuming independence between \({\mathcal{L}}\), \({\mathcal{S}}\) and \({\mathcal{W}}\)). We propose to derive the z-interval ϱ from this equivalence,

$$\sum_{t=1}^{T}{r}_{t}^{\,{\rm{CAISO}}\,\star }= \mathop{\sum }\limits_{t=1}^{T}{r}_{t}^{\,{\rm{ML}}\,\star }\\ \varrho= \frac{2\Delta \cdot \sum_{t=1}^{T} \sum_{z=1}^{Z}{\widehat{y}}_{\,{\rm{CAISO}}\,,t,z}^{{\mathcal{L}}\star }}{2 \sum_{t=1}^{T} \sum_{z=1}^{Z}({\widehat{\sigma }}_{t,z}^{{{\mathcal{L}}\star }^{2}}+{\widehat{\sigma }}_{t,z}^{{{\mathcal{S}}\star }^{2}}+{\widehat{\sigma }}_{t,z}^{{{\mathcal{W}}\star }^{2}})}.$$
(26)

In this way, the operational reserves are the lower and upper \([{u}_{t}^{\star },{l}_{t}^{\star }]\) confidence bounds of the net demand \({\widehat{y}}^{{\mathcal{N}}}\star\) defined as \(\left[{\sum }_{z=1}^{Z}{\widehat{y}}_{\,{\rm{CAISO}}\,,t,z}^{{\mathcal{N}}\star }-0.5\cdot {r}_{t}^{\,{\rm{CAISO}}\,\star },{\sum }_{z=1}^{Z}{\widehat{y}}_{\,{\rm{CAISO}}\,,t,z}^{{\mathcal{N}}\star }+0.5\cdot {r}_{t}^{\,{\rm{CAISO}}\,\star }\right]\) forCAISO’s, and equivalently \(\left[{\sum }_{z=1}^{Z}{\widehat{\mu }}_{t,z}^{{\mathcal{N}}\star }-0.5\cdot {r}_{t}^{\,{\rm{ML}}\,\star },{\sum }_{z=1}^{Z}{\widehat{\mu }}_{t,z}^{{\mathcal{N}}\star }+0.5\cdot {r}_{t}^{\,{\rm{ML}}\,\star }\right]\) for the alternative approach based on a probabilistic forecast. The net demand is \({\widehat{y}}_{\,{\rm{CAISO}}\,,t,z}^{{\mathcal{N}}\star }={\widehat{y}}_{\,{\rm{CAISO}}\,,t,z}^{{\mathcal{L}}\star }-{\widehat{y}}_{\,{\rm{CAISO}}\,,t,z}^{{\mathcal{S}}\star }-{\widehat{y}}_{\,{\rm{CAISO}}\,,t,z}^{{\mathcal{W}}\star }\) for CAISO forecast and \({\widehat{\mu }}^{{\mathcal{L}}}={\widehat{\mu }}_{t,z}^{{\mathcal{L}}\star }-{\widehat{\mu }}_{t,z}^{{\mathcal{S}}\star }-{\widehat{\mu }}_{t,z}^{{\mathcal{W}}\star }\) for the probabilistic forecast.

Operators participate in the real-time imbalance markets to balance energy supply with demand. Assuming that a grid operator commits enough capacity to supply the lower confidence interval \({l}_{t}^{\star }\) and can regulate the dispatch up to \({u}_{t}^{\star }\), the operator will interact with the market to buy \({\iota }_{t}^{\star } > 0\) or curtail energy \({\iota }_{t}^{\star } < 0\) at each time t in response to the net demand \({y}_{t}^{{\mathcal{N}}\star }\) so that,

$${\iota }_{t}^{\star }=\left\{\begin{array}{ll}{y}_{t}^{{\mathcal{N}}\star }-{u}_{t}^{\star } & {y}_{t}^{{\mathcal{N}}\star } > {u}_{t}^{\star }\\ 0 & {l}_{t}^{\star }\le {y}_{t}^{{\mathcal{N}}\star }\le {u}_{t}^{\star }\\ {l}_{t}^{\star }-{y}_{t}^{{\mathcal{N}}\star } & {y}_{t}^{{\mathcal{N}}\star } < {l}_{t}^{\star }.\end{array}\right.$$
(27)

Therefore, if the net demand \({y}_{t}^{{\mathcal{N}}\star }\) is \({l}_{t}^{\star }\le {y}_{t}^{{\mathcal{N}}\star }\le {u}_{t}^{\star }\), then the system is balanced. If \({y}_{t}^{{\mathcal{N}}\star } > {u}_{t}^{\star }\), operators need to import energy from the imbalance market, and if \({y}_{t}^{{\mathcal{N}}\star } < {l}_{t}^{\star }\), an operator needs to curtail renewable energy (i.e., solar or wind generation).

Experimental setup

Data preprocessing

While the HRRR data is available from Sep. 30, 2014, the OASIS data structure is consistent only from Jun. 30, 2019. Thus, our dataset includes days from Jun. 30, 2019, to Feb. 11, 2023. 17% of the samples (184 days) have at least a missing entry (an hour) either in the HRRR or OASIS database. We excluded these samples from the datasets (\({\mathcal{A}}\) and \({\mathcal{F}}\)). After reducing the resolution (as found in the Processing and Filtering), each weather feature has 108 × 88 spatial dimensions (9,152).

The weather vectors from the HRRR include 5 weather features for electricity demand, 3 for solar generation, and 4 for wind generation. Therefore, the feature vector for electricity demand has 5 × 108 × 88 dimensions (47,520), the vector for solar generation has 3 × 108 × 88 dimensions (27,456), the one for wind generation has 4 × 108 × 88 dimensions (38,016), and node-level has 11 × 108 × 88 dimensions (104,544) before applying the spatial filtering. The covariates and dependent variable are standardized (see Section Data Structure), so that \({\bar{y}}_{i,t}=\left({y}_{i,t}-{\mathbb{E}}\left[{{\bf{y}}}_{t}\right]\right)/{{\mathbb{V}}}^{1/2}\left[{{\bf{y}}}_{t}\right]\), and \({\bar{x}}_{i,j}=({x}_{i,j}-{\mathbb{E}}[{{\bf{x}}}_{j}])/{{\mathbb{V}}}^{1/2}[{{\bf{x}}}_{j}]\), which implies \({\sum }_{i=1}^{N}{\bar{y}}_{i,t}={\sum }_{i=1}^{N}{\bar{x}}_{i,j}=0\), and \({\sum }_{i=1}^{N}{\bar{y}}_{i,t}^{2}={\sum }_{i=1}^{N}{\bar{x}}_{i,j}^{2}=N\).

Validation, training, and testing

The datasets \({\mathcal{A}}\) and \({\mathcal{F}}\) were divided into training (75%, or 729 days) and testing (25%, or 243 days) sets, maintaining the original time structure of the data (i.e., without random sampling). The hyperparameter’s cross-validation was performed on the training set, implementing a k-fold cross-validation method. The number of folds was set to k = 5 to limit the computational time. After the cross-validation, the model is trained using the entire training set with a fixed set of optimal hyperparameters. The optimal hyperparameter selection criterion (i.e., model selection) is the lowest ES in cross-validation. The performances are evaluated on the testing set. This investigation implements four multivariate proper scoring rules to evaluate different characteristics of a probabilistic forecast: Energy Score (ES), Variogram Score with p = 0.5 (VS0.5), and Interval Score (IS)47.

The proper scoring rules are evaluated in the solar hours of the day to avoid numerical problems with non-zero entries (8 am to 4 pm). In the case of the MTGPR with all energy features (electricity demand, solar generation, and wind generation), the proper scoring rules are evaluated at three different time intervals to avoid this problem: midnight-sunrise (0 am to 7 am), daylight (8 am to 4 pm), and sunset-midnight (5 pm to 11 pm). We evaluate the electricity demand and wind generation forecast in the midnight-sunrise and sunset-midnight periods, and the electricity demand, solar generation, and wind generation forecast during daylight. The scores of each period are added together to obtain the final score used for the evaluation.

Hyperparameters

The sparse learning methods have hyperparameters that require cross-validation. Lasso hyperparameter λ, swept between 10−4 and 10, in equation (2) adjusts a trade-off in the L1-norm. β, swept between 10 and 640, in equation (3), defines the number of coefficients. The hyperparameters in Elastic Net that define the trade-off between the L1-norm and L2-norm in equation (4) are cross-validated as Ω1 = κρ and Ω2 = (1 − κ)ρ, κ is swept between 10−4 and 1, and ρ between 0.01 and 0.75. The hyperparameters in equation (5) set the trade-off between the L1-norm and the group L2-norm for Group Lasso, which are defined as ξ2 = (1 − η)χ and ξ1 = ηχ, χ is swept between 1 and 100, η between 0.25 and 1.

The Bayesian learning methods also have hyperparameters that require cross-validation. We cross-validate the ARD hyperparameter γ0 between 10−4 and 100; as found in Relevance Vector Machine (RVM), and the kernel functions in the GPR and MTGPR (Supplementary Note 2): Linear (L), Rational Quadratic (RQ), and Matérn (M0.5, M1.5, and M2.5). The hyperparameters in BLR are initialized to αn = βn = αp = βp = 10−6 (non-informative) and they do not require cross-validation; as found in Bayesian Linear Regression (BLR). The BLR formulation is not equivalent to a GPR with a linear kernel (\({{\mathcal{K}}}_{L}\)). \({{\mathcal{K}}}_{L}\) has amplitude θ1 and bias θ2 hyperparameters (Supplementary Note 2).

The criterion for selecting the hyperparameters shown in the results of this article is the lowest ES in validation. When the criterion is lower VS0.5 or IS, the selected models may be different, see Supplementary Figs. 11-16.

Predictive scenarios smoothing and density calibration

The approaches proposed to smooth the generated scenarios (see Section Scenario Smoothing) have a parameter σz in equation (23). This parameter requires cross-validation for each node z and energy feature (\({\mathcal{L}}\), \({\mathcal{S}}\), and \({\mathcal{W}}\)). The testing set in the cross-validation is used to fine-tune σz, sweeping values from 0 to 1.25. The shape of the scenarios is evaluated with the ES and the VS0.5, and the smaller σz parameter with lower ES or VS0.5 is selected. The optimal σz is the average across the optimal values obtained for each 5-fold interaction.

The calibration of the predictive covariance in an MTGPR with all energy features requires estimating the parameters \({\widehat{{\bf{w}}}}_{t,z}\) for each hour t at each node z; see equation (25). In each interaction of the cross-validation, the testing set is used to find the optimal parameters \({\widehat{{\bf{w}}}}_{t,z}\). The optimal \({\widehat{{\bf{w}}}}_{t,z}\) is the average across the optimal values obtained for each 5-fold interaction. This approach is based on the conformal learning literature51,52.

The forecasting models are evaluated without smoothing or calibration to avoid overfitting issues. Only testing scores are calculated with smoothed scenarios and calibrated predictive covariance. A potentially better praxis is to split the validation set into training, calibration, and testing. However, we do not have enough observations to implement this approach.

Computing resources

The experiments were performed in POD, a cluster computer maintained by the Center for Scientific Computing (CSC) at the UC Santa Barbara. POD has 70 nodes with a Dual Intel Xeon Gold 6148 Processor at 2.40 GHz. Each node has 20 CPUs, 40 threads, and 187 GB of RAM.