Introduction

Accurate forecasting of electrical load is a fundamental requirement for effective power system operation, planning, and reliability management. As global electricity consumption continues to rise, driven by economic growth, increasing electrification, and climate-induced variability, the need for reliable and interpretable long-term load forecasts has become increasingly critical1,2,3. Long-term forecasting is inherently challenging because load behavior is shaped by diverse and dynamic factors, including seasonal patterns, economic conditions, weather fluctuations, and evolving consumer behavior4,5. These complexities introduce substantial uncertainty that traditional point-based forecasting methods fail to adequately address.

Existing forecasting approaches, ranging from classical econometric models to advanced machine learning (ML) and deep learning (DL) architectures, have achieved notable progress in improving predictive accuracy. Classical time-series models such as ARIMA and regression-based techniques offer interpretability but struggle with nonlinear or rapidly changing load patterns. ML methods including support vector regression, gradient boosting, and tree-based ensembles provide increased nonlinear modeling capability, yet they continue to produce deterministic forecasts that do not quantify uncertainty. DL models, including LSTM networks, CNNs, transformers, and hybrid spatiotemporal architectures, have further advanced forecasting performance by capturing long-term dependencies, multiscale temporal relationships, and spatial correlations6,7,8. Recent innovations include geometric loss-based multi-energy forecasting9, clustering-enhanced quantile LSTM models10, graph-attention transformers for spatiotemporal residential load forecasting11, pyramidal graph structures for multiresolution load representation12, and Bayesian transformer frameworks for uncertainty decomposition across multiple energy carriers13. Hybrid models that integrate DL, fuzzy logic, ensemble learning, or reinforcement learning have demonstrated improved robustness to noise and irregular consumption patterns14,15.

Despite these advancements, several limitations remain. First, probabilistic forecasting often relies on restrictive distributional assumptions that may not hold in practice and can produce outputs that are difficult for non-experts to interpret16,17. Second, advanced DL architectures frequently impose high computational costs, complicating real-time deployment in large-scale systems. Third, model interpretability remains limited, posing challenges for regulatory acceptance and operational trust in critical energy infrastructure. Finally, the generalizability of these models across diverse climatic conditions, consumption patterns, and geographic regions remains insufficiently validated, particularly for long-term forecasting tasks.

These limitations underscore the need for forecasting methodologies that are uncertainty-aware, interpretable, and computationally efficient, while remaining adaptable to multiple temporal horizons and heterogeneous load dynamics.

To address these challenges, this study introduces a novel interval-based forecasting framework grounded in the principles of granular computing18. Rather than generating single-point predictions, the proposed methodology produces data-driven uncertainty intervals centered around a typical forecast value. At the core of this approach is a justification criterion that balances two inherently conflicting objectives: coverage, which measures the proportion of actual values contained within the predicted interval, and specificity, which quantifies the sharpness or informativeness of the interval19. By optimizing this trade-off, the model constructs intervals that avoid being excessively wide or unrealistically narrow20,21. The idea of justifiable granularity offers a structured method for creating intervals that are both supported by data and practically useful for decision-making22. This principle has proven effective in clustering20,21 and classification tasks20,23, where choosing the right level of granularity directly affects performance and interpretability. Its adaptive nature is especially valuable in multiscale forecasting, where uncertainty patterns vary significantly across daily, weekly, and monthly timeframes. In contrast to classical forecasting methods (Table 1), the proposed approach explicitly models prediction uncertainty through interval-valued granules. The main advantages include:

  • Quantified confidence: Each forecast is accompanied by an interpretable uncertainty bound.

  • Adaptability: Interval widths adjust dynamically based on local variability in the data.

  • Distribution-free design: The method does not rely on any assumed probability distribution.

Table 1 Comparison between classical and granule-based forecasting approaches.

To validate the proposed approach, real-world electricity load data from the Sultanate of Oman for the years 2020-2022 are employed. This dataset presents a rich testing environment characterized by climatic variability, industrial load fluctuations, and strong seasonal demand patterns. Model generalizability is assessed using unseen data from 2023. Additionally, an interval overlap robustness analysis is conducted to evaluate the stability of the generated intervals across multiple years and temporal resolutions, offering deeper insight into the consistency and reliability of the proposed forecasting framework24.

Beyond its methodological contributions, the interval-driven framework provides practical operational value. By expressing future load uncertainty in an interpretable manner, the resulting intervals can directly support grid operators in unit commitment, reserve allocation, and electricity market operations, enabling more risk-aware and transparent decision-making25.

This work proposes a data-driven interval forecasting methodology based on justifiable granularity that produces interpretable, uncertainty-aware predictions across multiple temporal scales. The key contributions of this study are as follows:

  • Developing an interval-based forecasting methodology that generates uncertainty-aware forecasts rather than single-point predictions.

  • Introducing a justification-based mechanism rooted in granular computing to construct adaptive and data-supported intervals.

  • Incorporating an interval overlap robustness analysis to assess the stability and reliability of the generated intervals.

  • Validating the framework using three years of load data from Oman (2020-2022) and assessing generalizability using unseen 2023 data.

The remainder of this paper is organized as follows. Section "Proposed methodology: interval based load forecasting using granular computing" describes the proposed interval based modeling framework, including the justification mechanism. Section "Results and discussion" presents the case study, implementation details, and experimental evaluation. Section "Conclusions and future directions" concludes the paper and discusses potential directions for future research.

Proposed methodology: interval based load forecasting using granular computing

Forecasting long-term energy demand is inherently challenging due to unpredictable variations in environmental, socio-economic, and operational factors. To address this uncertainty, an interval forecasting approach is proposed based on granular computing. Unlike traditional models that generate single-point predictions, the proposed method constructs informative prediction intervals that not only indicate expected values but also quantify the associated uncertainty. This section explains how these intervals are formed, how they are optimized, and how they are applied to real-world energy load data.

Granular computing is a paradigm that represents knowledge through granules: groups of elements that are similar, functionally related, or close in proximity. In the context of forecasting, each granule corresponds to an interval prediction centered around an estimate of the target variable. For a given time step \(t_i\), the prediction granule is expressed as:

$$G_i = [L_i, U_i] = [\hat{y}_i - \delta _{iL}, \hat{y}_i + \delta _{iU}]$$

Here, \(\hat{y}_i\) denotes the central prediction (such as the mean or median), and \(\delta _{iL}\) and \(\delta _{iU}\) define the lower and upper half-widths of the interval, capturing the uncertainty associated with the prediction.
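As a minimal illustration of this construction (our own sketch, not code from the study; the function name and MW values are hypothetical), a granule is simply a pair of bounds around the central estimate:

```python
def make_granule(y_hat, delta_lower, delta_upper):
    """Form the prediction granule G = [L, U] around a central
    estimate y_hat with (possibly asymmetric) half-widths."""
    return (y_hat - delta_lower, y_hat + delta_upper)

# Hypothetical values in MW: central forecast 1500, asymmetric bounds.
granule = make_granule(1500.0, 120.0, 180.0)
# granule == (1380.0, 1680.0)
```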

When analyzing complex data like electric load over time, looking at the big picture alone can sometimes hide important details. Breaking the data down into intervals, or “granules,” helps in identifying local patterns and changes that might be overlooked with a simple overall summary.

Using granularity offers several benefits:

  • Making the data easier to understand by focusing on meaningful chunks instead of overwhelming details.

  • Reducing noise and the effect of outliers, thus providing more reliable insights.

  • Achieving a balance between too much detail, which can be confusing, and too little, which can overlook important trends.

  • Allowing the use of interval based measures such as coverage and specificity to better evaluate how well the data and predictions represent reality.

Justifiable granularity and interval optimization

The principle of justifiable granularity19 provides a systematic mechanism for constructing interval-valued information granules that balance reliability and informativeness. The key objective is to generate intervals that are sufficiently wide to cover the underlying data while remaining narrow enough to convey meaningful information. This balance is achieved through the optimization of a justification index that integrates both coverage and specificity.

Granule definition and interval metrics

Consider a set of M granules

$$\{G_i = [a_i, b_i]\}_{i=1}^{M},$$

where each granule \(G_i\) contains a local set of data points

$$X_i = \{x_{i1}, x_{i2}, \ldots , x_{in_i}\}.$$

For each candidate interval, we define the following metrics:

$$\begin{aligned} \text {Coverage}_i(a_i, b_i)&= \frac{1}{n_i} \sum _{j=1}^{n_i} \textbf{1}\{x_{ij} \in [a_i, b_i]\}, \end{aligned}$$
(1)
$$\begin{aligned} \text {Specificity}_i(a_i, b_i)&= 1 - \frac{b_i - a_i}{\text {Range}}, \end{aligned}$$
(2)
$$\begin{aligned} \text {justifiability index}_i(a_i, b_i)&= \text {Coverage}_i(a_i, b_i) \times \text {Specificity}_i(a_i, b_i). \end{aligned}$$
(3)

Here, Range is a global normalization factor (e.g. \(\max (x)-\min (x)\)), and the indicator \(\textbf{1}\{\cdot \}\) evaluates whether a point lies inside the interval.
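The three metrics in Eqs. (1)-(3) translate directly into code. The following sketch is an illustrative implementation under the definitions above (the function names are ours, not from the study):

```python
def coverage(points, a, b):
    """Eq. (1): fraction of the granule's data points inside [a, b]."""
    return sum(a <= x <= b for x in points) / len(points)

def specificity(a, b, data_range):
    """Eq. (2): one minus the interval width normalized by the global range."""
    return 1.0 - (b - a) / data_range

def justifiability(points, a, b, data_range):
    """Eq. (3): product of coverage and specificity."""
    return coverage(points, a, b) * specificity(a, b, data_range)

x = [2.0, 3.0, 4.0, 5.0, 10.0]
rng = max(x) - min(x)                 # global normalization factor, Range = 8
v = justifiability(x, 2.0, 5.0, rng)  # coverage 0.8, specificity 0.625
# v == 0.5
```

The example shows the trade-off concretely: the interval [2, 5] misses one point (coverage 0.8) but stays narrow relative to the global range (specificity 0.625).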

Joint optimization of coverage and specificity

The optimal boundaries \((a_i^*, b_i^*)\) are obtained by maximizing the justification index:

$$\begin{aligned} (a_i^*, b_i^*) = \arg \max _{a_i \le b_i} \Bigg [ \frac{\#\{x_{ij} \in [a_i, b_i]\}}{n_i} \cdot \left( 1 - \frac{b_i - a_i}{\text {Range}}\right) \Bigg ]. \end{aligned}$$
(4)

This optimization enforces two key constraints:

  1. Boundary validity: \(a_i \le b_i\) must always hold.

  2. Data consistency: interval boundaries must lie within the global data range, i.e.,

     $$a_i, b_i \in [\min (x), \max (x)].$$

The product in (4) ensures that increasing coverage (favoring wider intervals) must be balanced with increasing specificity (favoring narrower intervals). The optimal interval is therefore the one where both terms are jointly maximized, avoiding overly wide or overly narrow solutions.

For each granule, the algorithm evaluates all candidate boundaries generated from the local data. Since each boundary is selected from the \(n_i\) data points, the search space is:

$$\mathcal {O}(n_i^2) \quad \text {per granule}.$$

For M granules, the worst-case complexity is:

$$\mathcal {O}\left( \sum _{i=1}^{M} n_i^2\right) .$$

In practice, the granules are small because they correspond to natural temporal partitions (daily, weekly, monthly), making the approach computationally tractable even for multi-year datasets.
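The exhaustive \(\mathcal {O}(n_i^2)\) search over candidate boundaries can be sketched as follows. This is our own illustrative implementation of Eq. (4), not the authors' code:

```python
def optimize_granule(points, data_range):
    """Exhaustive search for the interval [a, b] maximizing
    coverage * specificity (Eq. 4), with both boundaries drawn
    from the local data points (O(n^2) candidate pairs)."""
    xs = sorted(points)
    n = len(xs)
    best_a, best_b, best_v = xs[0], xs[0], -1.0
    for i in range(n):
        for j in range(i, n):          # enforces a <= b by construction
            a, b = xs[i], xs[j]
            cov = sum(a <= x <= b for x in points) / n
            spec = 1.0 - (b - a) / data_range
            v = cov * spec
            if v > best_v:
                best_a, best_b, best_v = a, b, v
    return best_a, best_b, best_v

# A granule containing an outlier (100): the optimal interval excludes it,
# since covering it would drive specificity toward zero.
a_star, b_star, v_star = optimize_granule([1, 2, 3, 4, 100], data_range=99.0)
# (a_star, b_star) == (1, 4)
```

Note how the outlier is excluded automatically: widening the interval to [1, 100] would yield full coverage but zero specificity, so the product in Eq. (4) rejects it.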

To enhance clarity and reproducibility, the interval construction procedure is summarized below.

Algorithm: Interval optimization using justifiable granularity.

Once all intervals are optimized, aggregate metrics are computed:

$$\begin{aligned} \text {Total Coverage}&= \frac{1}{M} \sum _{i=1}^{M} \text {Coverage}_i, \end{aligned}$$
(5)
$$\begin{aligned} \text {Total Specificity}&= \frac{1}{M} \sum _{i=1}^{M} \text {Specificity}_i, \end{aligned}$$
(6)
$$\begin{aligned} \text {Total justifiability index}&= \frac{1}{M} \sum _{i=1}^{M} \text {justifiability index}_i. \end{aligned}$$
(7)

To assess forecasting robustness, interval overlap is measured between predicted and true intervals. Let \([y_i^L, y_i^U]\) denote the true interval and \([\hat{y}_i^L, \hat{y}_i^U]\) the predicted interval:

$$\begin{aligned} \text {Overlap}_i = \frac{ \max \bigl (0, \min (y_i^U, \hat{y}_i^U) - \max (y_i^L, \hat{y}_i^L)\bigr ) }{ \max (y_i^U, \hat{y}_i^U) - \min (y_i^L, \hat{y}_i^L) } \times 100\%. \end{aligned}$$
(8)

The average overlap across all N intervals is:

$$\begin{aligned} \text {Average Overlap} = \frac{1}{N} \sum _{i=1}^{N} \text {Overlap}_i. \end{aligned}$$
(9)

A high overlap indicates stable and reliable interval estimation across temporal scales. The metric in Eq. (8) expresses agreement between true and predicted intervals as a percentage, with 100% for a perfect match and 0% for no intersection. It is computed for each time period and summarized by its average to show overall alignment. However, once the intervals are disjoint it does not distinguish between a small shift and a large deviation, as both yield zero overlap, reflecting a known limitation of percentage-based overlap metrics. While intuitive for overall assessment, it may therefore understate differences in mismatch severity.
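Eq. (8) can be sketched in a few lines (an illustrative implementation of the formula, with hypothetical interval values):

```python
def interval_overlap(true_iv, pred_iv):
    """Eq. (8): intersection length over the span of the smallest
    interval containing both, expressed as a percentage."""
    (tl, tu), (pl, pu) = true_iv, pred_iv
    intersection = max(0.0, min(tu, pu) - max(tl, pl))
    hull = max(tu, pu) - min(tl, pl)
    return 100.0 * intersection / hull

interval_overlap((0.0, 10.0), (5.0, 15.0))   # partial overlap: 5/15 of the hull
interval_overlap((0.0, 10.0), (20.0, 30.0))  # disjoint intervals -> 0.0%
```

The second call illustrates the limitation noted above: any disjoint pair scores 0%, regardless of how far apart the intervals lie.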

The proposed interval forecasting framework shifts the focus from point accuracy to reliability-aware prediction. The method provides interpretable and data-driven uncertainty quantification. This framework complements existing ML and DL forecasting techniques and offers actionable insights for energy system planning and operation.

To effectively capture uncertainty in time-series data such as electricity load, a granule-based forecasting framework that constructs interval-valued predictions is adopted. These intervals are assessed using three core metrics: coverage, specificity, and their combined trade-off, the justifiability index. Coverage and specificity are inherently opposing metrics: lengthening the interval improves coverage but reduces specificity, while narrowing the interval increases specificity but reduces coverage. The justifiability index balances these two metrics and determines the optimal interval width. Importantly, outliers will never be included in the justifiable interval, as including extreme values would force the interval to expand excessively, driving specificity toward zero and lowering the overall justification score. The method begins by dividing the data into local granules (intervals) and evaluating how well each interval includes the corresponding data points (coverage), how narrow the interval is (specificity), and the product of the two (justifiability index). The granules are then optimized to balance informativeness and reliability. Furthermore, when true and predicted intervals are both available, the degree of agreement between them is quantified using an overlap-based robustness analysis.

Results and discussion

In this section, the results of applying both traditional ML models and advanced DL techniques to short- and medium-term electric load forecasting are presented. The performance of these models is compared on both the training and testing datasets.

To evaluate the performance, commonly accepted metrics like RMSE, MAPE, and \(R^2\) are adopted. The results shed light on the strengths and weaknesses of both classical and DL approaches.

The dataset used in this study consists of electric load measurements from the Omani Main Interconnected System, recorded at 30-minute intervals over a four-year period from 2020 to 2023. This extensive dataset provides a comprehensive view of the system’s load behavior across different seasons and operating conditions, making it well-suited for detailed analysis and modeling. Figure 1a–d depict the annual load patterns, highlighting both the overall consistency and the minor variations observed in certain months across different years.

Fig. 1 Annual electric load patterns for 2020–2023.

One common approach for forecasting a specific data point is to use a fixed number of recent past values. Table 2 presents a performance comparison of ML, DL, and classical time series models for short-term forecasting, using the five most recent data points to predict the next point (i.e., lag = 5, output = 1).
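The lagged-input setup can be sketched as a simple sliding window. This is a generic illustration of the lag = 5, output = 1 configuration, not the authors' preprocessing code:

```python
def make_lag_dataset(series, lag=5):
    """Sliding-window construction: each target value is paired with
    the `lag` most recent observations (one-step-ahead setup)."""
    X, y = [], []
    for t in range(lag, len(series)):
        X.append(series[t - lag:t])  # the lag most recent values
        y.append(series[t])          # the next point to predict
    return X, y

load = [10, 11, 13, 12, 14, 15, 16]  # toy half-hourly load values
X, y = make_lag_dataset(load, lag=5)
# X == [[10, 11, 13, 12, 14], [11, 13, 12, 14, 15]], y == [15, 16]
```

The resulting (X, y) pairs can then be fed to any of the regression models compared in Table 2.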

Overall, all models demonstrate relatively good performance. However, classical time series models such as ARIMA and SARIMA exhibit the weakest results, confirming that ML models are better suited for capturing the nonlinear patterns present in the data. Among the DL models, the LSTM architecture performs competitively, highlighting its potential when trained with sufficient data and proper tuning.

Table 2 Performance comparison of ML, DL, and time series models (lag = 5, output = 1).

Table 3 presents the performance of various models across different forecast horizons. The forecast horizon significantly impacts model accuracy. For one-step-ahead predictions, all models show substantial improvement, with boosted ensembles and LSTM delivering the best results. When the forecast horizon extends to 48 steps (approximately two days), performance generally declines, as indicated by lower \(R^2\) values and higher error metrics. Nevertheless, Linear Regression and KNN continue to offer relatively stable results, suggesting that simpler models may be more robust for medium-range forecasting within the same dataset.

At very long horizons, such as 336 steps (around two weeks), most models struggle considerably. Only the LSTM and Transformer models exhibit some degree of resilience, while traditional models like ARIMA and SARIMA perform poorly, reaffirming their limitations in capturing nonlinear patterns over extended periods.

It is worth noting that model performance continues to degrade as the number of forecasted time points increases, especially when attempting to predict an entire year. This highlights the inherent difficulty of long-term forecasting in complex, nonlinear time series.

Table 3 Model performance comparison with varying forecast horizons.

Another approach for forecasting a specific future data point is to use similar data points from historical data. When using three data points from previous years as inputs to predict a future value, models generally exhibit higher errors compared to using recent lagged inputs, as shown in Table 4, indicating that older data, while informative, may not capture recent trends well. Linear Regression and Bagging methods provide a balanced trade-off between training and testing errors, suggesting reasonable generalization. The zero training error reported for the Decision Tree implies perfect fitting of the training data but reduced test performance, indicating overfitting. Neural networks do not outperform traditional regression here, possibly due to limited feature complexity or insufficient data to train deeper models effectively.

Table 4 Performance comparison of models using 3 inputs from previous years.

These results emphasize the importance of choosing the right model based on forecast horizon and input features. Simpler models can sometimes outperform complex ones depending on the forecasting context and dataset characteristics. DL models shine particularly for short-term forecasting but require careful tuning.

Another approach for predicting load profiles is proposed based on the granular interval method described in the methodology section. Figure 3 illustrates the load interval variations on a daily, weekly, and monthly basis for the years 2020 through 2023. Three interval construction methods are employed in this analysis: the minimum-maximum range, the interquartile range (25th-75th percentile), and the mean ± one standard deviation. A consistent trend of increasing stability and predictability is observed over the years, particularly when transitioning from daily to weekly intervals. As anticipated, daily intervals exhibit higher variability due to the finer temporal resolution, whereas weekly intervals present smoother trends, underscoring the benefits of temporal aggregation in enhancing forecast stability. Notably, the interval widths vary depending on the chosen time window, raising an important consideration regarding the optimal interval width for accurately representing the load profile. Furthermore, the requirements of the forecasting task itself must be taken into account. In certain cases, narrower prediction intervals may not present significant limitations, as the aggregation of predictions over extended periods can yield more informative insights for operational planning and maintenance scheduling.
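The three baseline interval constructions can be sketched as follows (our own illustration; note that `statistics.quantiles` uses the exclusive method by default, so the quartile convention may differ slightly from the one used in the study):

```python
import statistics

def candidate_intervals(points):
    """Three baseline interval constructions: min-max range,
    interquartile range (25th-75th percentile), and mean +/- one
    standard deviation."""
    xs = sorted(points)
    mu = statistics.fmean(xs)
    sd = statistics.stdev(xs)
    q = statistics.quantiles(xs, n=4)  # q[0] ~ 25th, q[2] ~ 75th percentile
    return {
        "min_max": (xs[0], xs[-1]),
        "iqr": (q[0], q[2]),
        "mean_std": (mu - sd, mu + sd),
    }

candidate_intervals([1, 2, 3, 4, 5])
# min_max spans the full data; iqr and mean_std give progressively
# tighter bands around the center.
```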

To determine the appropriate type of interval, the impact of interval length on coverage, specificity, and justification is analyzed, as illustrated in Fig. 2. As the interval length increases, coverage generally improves, while specificity declines. Consequently, the optimal interval length can be identified by maximizing the justification metric, defined as the product of coverage and specificity. Therefore, directly applying the previously constructed intervals shown in Fig. 3 would not necessarily yield maximum justification. Instead, applying the granular interval approach allows for the generation of daily, weekly, and monthly load profiles with optimized and justifiable intervals centered around the median load values, as shown in Fig. 4. Figure 4(b) presents the interval analysis for the combined data from 2020 to 2022 across daily, weekly, and monthly resolutions, which will serve as the predicted interval load profile for 2023, based on the consistent patterns observed throughout the year. The granular interval plots for weekdays and weekends reveal no significant differences between weekday and weekend load profiles, with both exhibiting closely aligned consumption patterns. This observation suggests that, for this dataset, modeling weekdays and weekends together is sufficient, as their load profiles are largely similar.

Figures 5, 6 and 7 depict the coverage, specificity, justifiability index (v), and overlap (p) for daily, weekly, and monthly granular intervals, respectively. Across all granularities, coverage tends to increase from daily to monthly intervals, reflecting improved model confidence with more aggregated data. Specificity remains relatively stable across granularities, suggesting consistent model performance across all intervals. The justifiability index (v) and overlap (p) further support the conclusion that longer intervals yield better predictive performance, with monthly intervals showing the highest values for these indices. This indicates that the model’s predictions are more reliable and less ambiguous at coarser granularities. Table 5 quantitatively summarizes these trends, showing an increase in coverage from 0.31 for daily to 0.53 for monthly intervals in the combined dataset, with similar trends for the 2023 subset. The overlap metric also improves, reaching 0.57 for monthly intervals in 2023, demonstrating better alignment between predicted and actual loads. The overlap plots in Fig. 8 demonstrate the degree to which the predicted load intervals coincide with the actual observed values. The improvement in overlap from daily to monthly intervals further confirms the enhanced prediction accuracy achieved through data aggregation. The consistent trend across years indicates that the model’s generalization capabilities remain stable over time. Although the justifiability index v for the daily intervals is relatively low, the prediction intervals still overlap with the true intervals, making this form of quantification suitable and informative for planning and maintenance purposes.

Furthermore, Table 6 presents the combined performance indices of load prediction across different time intervals, ranging from one to 18 hours, and various aggregation levels. As expected from previous results, coverage and overlap tend to decrease as the time interval shortens, while specificity remains almost stable.

The analysis confirms that load prediction models benefit significantly from temporal aggregation, with monthly intervals providing the most reliable and interpretable forecasts. Justifiability and overlap indices offer a multi-faceted evaluation of model performance, allowing for more informed interpretation of prediction quality. It should be noted that the performance metrics observed for the combined three-year intervals are influenced by the natural variability across the years. The lower metric values are not a limitation of the method but rather a direct consequence of optimizing interval boundaries under multi-year fluctuations. In contrast, constructing intervals using a single year of data would typically yield higher coverage and specificity, since the intervals would fit a more homogeneous distribution. However, such intervals would have weaker generalization capability and may not remain valid when applied to data from other years.

Fig. 2 Coverage, specificity, and justifiability index vs. length of the interval.

Fig. 3 Granular interval representations: daily, weekly, and monthly (from left to right) for the years (a) 2020 and (b) 2023.

Fig. 4 Granular interval representations: daily, weekly, and monthly (from left to right) for the years (a) 2020, (b) combined 2020-2023, and (c) 2023.

Fig. 5 Performance index of daily granular intervals: (a) coverage, (b) specificity, (c) v, (d) p.

Fig. 6 Performance index of weekly granular intervals: (a) coverage, (b) specificity, (c) v, (d) p.

Fig. 7 Performance index of monthly granular intervals: (a) coverage, (b) specificity, (c) v, (d) p.

Fig. 8 Overlap area of combined and 2023 loads: (a) daily, (b) weekly, and (c) monthly.

Table 5 Performance indices of load prediction.
Table 6 Combined performance indices of load prediction for different time intervals and aggregation levels.

To complement the granular evaluation framework, we conducted additional experiments using representative interval forecasting methods, including Quantile Regression (QR), Conformal Prediction, LSTM, and multi-layer neural models. Unlike these models, which generate prediction intervals by extrapolating from recent input values, the proposed granular approach constructs intervals directly from the training set and performs daily, weekly, and monthly forecasting by intersecting information granules. This yields a conceptually different evaluation mechanism, where interval validity is assessed based on granule-granule overlap rather than pointwise prediction. For a fair comparison, the probabilistic baselines were evaluated using standard calibration metrics: Prediction Interval Coverage Probability (PICP), Mean Prediction Interval Width (MPIW), and the Winkler score, as shown in Table 7.

Table 7 Results of interval forecasting approaches, lag = 48.
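These calibration metrics have standard formulas, sketched below with a common formulation of the Winkler score at nominal miscoverage level alpha; the exact settings used in the study are not specified here, and the example values are hypothetical:

```python
def picp(y, lower, upper):
    """Prediction Interval Coverage Probability: fraction of
    observations that fall inside their prediction intervals."""
    return sum(l <= yi <= u for yi, l, u in zip(y, lower, upper)) / len(y)

def mpiw(lower, upper):
    """Mean Prediction Interval Width."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)

def winkler(y, lower, upper, alpha=0.1):
    """Average Winkler score: interval width, plus a penalty of
    (2/alpha) times the distance of any observation lying outside."""
    total = 0.0
    for yi, l, u in zip(y, lower, upper):
        score = u - l
        if yi < l:
            score += 2.0 * (l - yi) / alpha
        elif yi > u:
            score += 2.0 * (yi - u) / alpha
        total += score
    return total / len(y)

# Two observations against identical intervals [0, 2]: one covered, one not.
y_obs, lo, hi = [1.0, 5.0], [0.0, 0.0], [2.0, 2.0]
# picp -> 0.5, mpiw -> 2.0; the miss at y = 5 dominates the Winkler score.
```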

Conclusions and future directions

In this paper, the challenges of long-term electric load forecasting are addressed by moving beyond traditional point predictions and introducing an interval-based modeling framework grounded in granular computing. Beyond their methodological contribution, the justifiable intervals developed here offer practical value for power system operators. These interval-based predictions balance coverage and specificity, serving as reliable uncertainty envelopes for operational decisions. Grid operators can use the predicted bounds to support unit commitment, ensuring flexible scheduling for demand or renewable output variations. The intervals also guide reserve capacity planning by quantifying the load range, helping operators allocate reserves more effectively. Integrating this uncertainty into operational planning enhances system reliability and reduces the risk of relying solely on point forecasts.

Using historical load data from the Sultanate of Oman between 2020 and 2022, intervals at daily, weekly, and monthly resolutions are generated and validated on unseen data from 2023. The overlap analysis demonstrated a strong alignment between predicted intervals and actual loads, highlighting the model’s robustness and generalizability across different time scales. The visualizations further emphasize how these intervals can offer practical insights by clearly expressing forecast uncertainty.

Future research could focus on integrating external factors like weather, economic trends, or policies to improve interval accuracy. Developing adaptive methods that respond to changing data patterns may boost reliability. Applying this approach to domains like renewable energy and grid management could reveal broader benefits and stability insights.