Abstract
Interval prediction requires not only accuracy but also consideration of interval width and coverage, making model selection complex. However, research rarely addresses this challenge in interval combination forecasting. To address this issue, this study introduces a model selection method for interval forecast combination based on the Shapley value (MSIFC–SV). The algorithm calculates Shapley values to measure each model’s marginal contribution and establishes a redundancy criterion based on changes in the interval score: if removing a model does not increase the interval score of the combination, the model is considered redundant and is excluded. The selection process starts with all the models and ranks them by their Shapley values. Models are then assessed for retention or removal according to the redundancy criterion, and the process continues until all redundant models are excluded. The remaining subset is used to generate interval forecast combinations through interval Bayesian weighting. Empirical analysis of carbon prices shows that MSIFC–SV outperforms individual models and derived subsets across metrics such as the prediction interval coverage probability (PICP), mean prediction interval width (MPIW), coverage width criterion (CWC), and interval score (IS). Comparisons with benchmark methods further demonstrate the superiority of MSIFC–SV. Furthermore, MSIFC–SV extends successfully to a public housing price dataset, indicating its generality. In summary, MSIFC–SV provides reliable model selection and delivers high-quality interval forecasts.
Introduction
Combined forecasting is widely recognized as an effective method to increase prediction accuracy. By leveraging the strengths of multiple models, combined forecasts often outperform individual models1,2,3,4,5. However, the success of combined forecasting depends heavily on the quality of the selected models. Not all combined forecasts exceed the performance of individual models, and including redundant models can reduce accuracy6. Therefore, eliminating redundant models is crucial for improving the performance of combined forecasts6,7. Most existing model selection methods focus on point forecasts, utilizing techniques such as the Akaike information criterion (AIC)8, the Bayesian information criterion (BIC)9,10, encompassing tests1,7, dynamic model selection4, two-stage procedures11, Harvey‒Leybourne‒Newbold (HLN) tests2,12, mutual information13, redundancy-removing linear mutual information model selection (RRLS)14, and comprehensive evaluation index (CEI) methods15,16,17,18,19. While these methods have been successful in point forecasting, the inherent uncertainty in predictions makes errors inevitable.
Interval forecasting directly quantifies prediction uncertainty by constructing prediction intervals, providing a range within which future values are likely to fall, thus reducing the risk of prediction failure. Current interval forecasting methods can be broadly categorized into two types: those based on statistical assumptions, such as the delta method20, Bayesian methods21,22, mean-variance estimation (MVE)23,24, bootstrapping25,26, Gaussian processes27, and distribution estimation (DE)28; and those that directly generate prediction intervals, such as quantile regression (QR)29,30,31 and the lower upper bound estimation (LUBE) method32. Although these methods are theoretically sound and widely used, individual forecasting models are often constrained by their assumptions, limiting their ability to capture data complexity fully. In contrast, interval combination models, which integrate the strengths of multiple models, generally offer more accurate predictions.
In recent years, Wu et al. (2021) proposed a method that combines heterogeneous prediction intervals via continuous Young optimal weighted arithmetic averaging (C-YOWA) and geometric (C-YOWG) operators, with the Theil index as the optimization criterion33. Wang et al. (2023) enhanced interval forecast combination by adopting a ‘divide-and-conquer’ approach34. Meira et al. (2021) suggested eliminating underperforming models before applying the AIC for model selection35, whereas Long et al. (2022) developed an interval forecasting strategy based on linear regression36. Although these studies emphasize model diversity, they often overlook redundancy and interactions between models, limiting further improvements in forecast performance. Thus, the key challenge in enhancing interval forecast performance lies in maintaining model diversity while removing redundant models and accounting for their interactions. Furthermore, interval forecasting must account not only for prediction accuracy but also for interval width and coverage, which complicates the selection of suitable models for combination. Existing model selection methods focus primarily on point forecasts and rarely address the selection of models for interval forecasting. Moreover, interval combinations often assign equal weights to models, which may not reflect each model’s actual contribution. Weight allocation significantly affects the performance of ensemble forecasting, making it crucial to assign weights that fairly represent each model’s contribution to improve the accuracy of interval combination forecasts.
The Shapley value, a concept from cooperative game theory, quantifies each member’s marginal contribution to a coalition, ensuring a fair distribution of benefits37. Chen (2003) was the first to apply the Shapley value in combined forecasting to determine model weights, treating each predictive model as a participant in a cooperative game38. Subsequent studies have further validated the effectiveness of the Shapley value in optimizing weight allocation. By quantifying each model’s contribution and naturally accounting for interactions between models, the Shapley value ensures that the collective contribution of multiple models is greater than or equal to the sum of their individual contributions. Ding et al. (2023)39 applied the Shapley value to single-item model selection in portfolio modeling, demonstrating its validity, superiority, and robustness in this context. However, their approach focuses on point prediction model selection and employs a simple average weighting method, without considering interval-based model selection or performance-driven weighting strategies.
To address the challenges of model selection in interval combination forecasting and enhance prediction accuracy, this paper proposes a model selection method for interval forecast combination based on the Shapley value (MSIFC–SV). The method calculates each model’s marginal contribution via the interval score (IS) and establishes a redundancy criterion. The model selection process starts with all the models included, with Shapley values determining the order of elimination. Models with lower contributions are prioritized for removal until all redundant models are excluded. Inspired by weighting methods used in point forecasting, interval Bayesian weighting (IBSW), inverse CWC weighting (ICWCW), and inverse IS weighting (IISW) are developed to further optimize the weight allocation for the combined models. The final combined prediction is generated via the best-performing method, interval Bayesian weighting. This approach avoids the complexity of evaluating all possible model combinations and significantly improves the efficiency of model selection.
To ensure model diversity, we selected 10 individual interval prediction models from various forecasting frameworks, including QR, LUBE, DE, and bootstrapping; each individual forecast is generated independently. The LUBE method, combined with neural networks, directly generates prediction intervals through dual outputs. QR constructs intervals by estimating specific quantiles of the target variable. DE assumes that the target variable follows a known distribution and generates intervals by estimating the distribution parameters. Bootstrapping, a nonparametric resampling method, generates new samples through repeated sampling to estimate the distribution of statistics and build prediction intervals. The effectiveness and robustness of our selection method are validated through empirical analysis of housing and carbon price datasets.
The main contributions of this paper are as follows:
(1) This study provides a new approach to model selection in interval combination forecasting and improves prediction accuracy.
(2) This study innovatively applies the Shapley value from cooperative game theory to model selection in interval combination forecasting. By quantifying each model’s marginal contribution and considering interactions between models, a more rational model selection process is achieved.
(3) A redundancy criterion based on interval score increments is established. By assessing changes in the interval score after removing a model, the effectiveness and precision of model selection are enhanced.
(4) The introduction of the IISW, ICWCW, and interval Bayesian weighting methods optimizes the weight allocation in interval ensemble forecasting, further enhancing the reliability and accuracy of the prediction results.
The remainder of this paper is organized as follows. Section "Preliminary knowledge" reviews the techniques involved in the proposed MSIFC–SV. The specific framework of the proposed MSIFC–SV is presented in Sect. "Selection procedure for the superior subset". The experimental preparations and experiments are presented in Sect. "Experiment". Section "Conclusions" concludes our study.
Preliminary knowledge
This section introduces two prevalent predictors, artificial neural networks and long short-term memory networks, along with four single-interval prediction frameworks: lower upper bound estimation, bootstrap, quantile regression, and distribution estimation, which do not require uncertainty labels. It also defines key evaluation metrics for interval prediction: the prediction interval coverage probability, mean prediction interval width, coverage width criterion, and interval score.
Predictors
Artificial neural network
An artificial neural network (ANN) is a mathematical model known for its adaptability, self-learning capabilities, and ability to learn in real time. It consists of a vast network of interconnected neurons that process information distributively. Typically, an ANN includes three parts: an input layer, one or more hidden layers, and an output layer, as shown in Fig. 1. The ANN creates a nonlinear mapping between input and output samples by leveraging the self-learning capability of its neurons. Each layer of the network consists of multiple nodes connected by weights across layers. Upon receiving an input signal, a neuron applies an activation function to perform a nonlinear transformation of the signal before forwarding it to the next layer. Initially, weights are set randomly but are adjusted continuously during training to align the neuron outputs with a predefined target, thereby iteratively refining the network’s accuracy in mapping inputs to outputs.
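As a minimal illustration (a sketch, not the paper's implementation), the forward pass of such a network can be written as follows; the sigmoid activation and layer shapes are illustrative choices:

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a simple fully connected network.

    `weights` and `biases` hold one matrix/vector per layer; a sigmoid
    activation is applied in the hidden layers, and the output layer is linear.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))  # nonlinear transformation per layer
    return a @ weights[-1] + biases[-1]          # linear output layer
```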
Long short-term memory
First introduced by Hochreiter & Schmidhuber (1997)40, the LSTM network is an advanced variant of recurrent neural networks (RNNs) designed to address issues such as ‘gradient explosion’ and ‘gradient vanishing’, which are common in traditional RNNs. LSTM, a key deep learning algorithm, excels at analyzing and predicting time series data with long intervals and delays. An LSTM cell consists of four interconnected components: an input gate, an output gate, a forget gate, and a cell state. These components work in concert to enable the LSTM’s time memory function through the toggling of gates. The cell architecture is illustrated in Fig. 2.
The cell state is the central component of the LSTM network, acting as its ‘memory’, which can be selectively retained or discarded through a ‘gate’ mechanism. This allows the network to maintain or forget information as needed. Its operation is governed by the following formulas (restated here in their standard form):
\[\begin{gathered} {f_t}=\sigma ({W_f} \cdot [{h_{t - 1}},{x_t}]+{b_f}) \hfill \\ {i_t}=\sigma ({W_i} \cdot [{h_{t - 1}},{x_t}]+{b_i}) \hfill \\ {g_t}=\tanh ({W_c} \cdot [{h_{t - 1}},{x_t}]+{b_c}) \hfill \\ {C_t}={f_t} \odot {C_{t - 1}}+{i_t} \odot {g_t} \hfill \\ {o_t}=\sigma ({W_o} \cdot [{h_{t - 1}},{x_t}]+{b_o}) \hfill \\ {h_t}={o_t} \odot \tanh ({C_t}) \hfill \\ \end{gathered}\]
where \({i_t}\), \({f_t}\) and \({o_t}\) represent the input, forget, and output gates, respectively, and where \({x_t}\) is the current input. \({W_f}\), \({W_i}\), \({W_c}\) and \({W_o}\) denote the weights corresponding to each gate, whereas \({b_f}\), \({b_i}\), \({b_c}\) and \({b_o}\) are the biases for each state. \({g_t}\) refers to the candidate vector at time t, and \({C_t}\) indicates the cell state at time t. The sigmoid function \(\sigma\) and hyperbolic tangent function tanh are the activation functions used. \({h_t}\) denotes the output of the \(t{\text{-}}th\) cell, and \({h_{t - 1}}\) is the output of the previous cell.
Methodology of interval forecasting
Interval forecasting
The principle of regression involves the use of a data generation function, denoted as \(f( \cdot )\), in combination with additive noise to model the observable target values \({y_i}\):
\[{y_i}=f({x_{i - 1}})+{\sigma _{noise}},\]
where \({x_{i - 1}}=[{x_{1(i - 1)}},{x_{2(i - 1)}}, \cdots ,{x_{m(i - 1)}}]\) and where m represents the number of independent variables. For \({x_i}\) and \({y_i}\), \(i=1,2, \cdots ,n\). \({\sigma _{noise}}\) is referred to as irreducible noise or data noise, which may arise from explanatory variables that are not observed within x or from inherently stochastic processes in the data.
The size of a minibatch is n, and \({x_{i - 1}}\) is the \(i{\text{-}}th\) input corresponding to \({y_i}\). The goal of uncertainty quantification is to forecast intervals \([\hat {y}_{{_{i}}}^{L},\hat {y}_{{_{i}}}^{U}]\) such that \({y_i}\) satisfies the following:
where \(\Pr [y_{{_{i}}}^{L} \leqslant {y_i} \leqslant y_{{_{i}}}^{U}]\) represents the PI coverage probability (PICP) and where \({P_c}\) denotes the preset confidence level. The PICP measures the credibility of the created PI and is represented as:
\[PICP=\frac{1}{n}\sum\limits_{{i=1}}^{n} {{c_i}},\]
where \({c_i}=1\) if \({y_i}\) is within the interval \([\hat {y}_{{_{i}}}^{L},\hat {y}_{{_{i}}}^{U}]\); otherwise, \({c_i}=0\). Note that a PI is meaningless if \(\hat {y}_{{_{i}}}^{L}\) is extremely small and if \(\hat {y}_{{_{i}}}^{U}\) is excessively large. The ideal PI should be as narrow as possible while satisfying Eq. (8)41.
MPIW denotes the mean PI width23,42,43,44, which is defined as:
\[MPIW=\frac{1}{n}\sum\limits_{{i=1}}^{n} {(\hat {y}_{i}^{U} - \hat {y}_{i}^{L})}.\]
Lower upper bound estimation (LUBE)
In Lai et al.41, the loss function of the uncertainty-based PI (UBPI) is defined with two terms:
where \({L_{MPIW}}\) represents the loss of the PI width and point estimate and where \({L_{PI}}\) denotes the loss of the PI coverage probability. \({\lambda _1}\) is the hyperparameter that determines the relative importance of two terms.
\({L_{MPIW}}\) is obtained by estimating the uncertainty in the regression task42,45. \({L_{MPIW}}\) is defined as follows:
where MSE represents the mean squared error of a minibatch; ref. 44 employs the midpoint of the interval as the point estimate.
Studies42,43 inspired the development of \({L_{PI}}\), which is developed as follows to learn the PI:
The Boolean values of \({c_i}\) in Eq. (9) result in a nondifferentiable \({L_{PI}}\) loss in the backpropagation algorithm because of the max operation in Eq. (15). It is therefore replaced with a softened version, as follows32,44,46,47:
where \({f_{sigmoid}}\) is the sigmoid activation function and where \(s>0\) is the hyperparameter factor of softening.
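A minimal sketch of this softening, assuming NumPy arrays of bounds and targets: the hard indicator \({c_i}\) is replaced by a product of sigmoids so that gradients can flow during backpropagation:

```python
import numpy as np

def soft_picp(y, y_low, y_up, s=50.0):
    """Softened coverage: sigmoid(s*(y - y_low)) * sigmoid(s*(y_up - y))
    approaches 1 inside [y_low, y_up] and 0 outside; s controls the softening."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    c_soft = sigmoid(s * (y - y_low)) * sigmoid(s * (y_up - y))
    return c_soft.mean()  # differentiable surrogate for the PICP of a minibatch
```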
Thus, the loss function of interval prediction proposed by Lai et al.41 can be defined as:
where n is the number of predicted \({y_i}\). \({\lambda _1}\) balances the importance between the PICP and MPIW.
Salem et al. (2020)43 proposed an effective method based on split normal mixtures (SNM). Their loss function is defined as follows:
where \(\xi\) is a preset value; following previous studies, we take \(\xi =10\).
The flow chart for generating prediction intervals based on the LUBE framework with ANN and LSTM predictors is shown in Fig. 3.
Quantile regression (QR)
Regression analysis essentially minimizes the mean squared error to estimate the conditional expectation of a dependent variable given the independent variables. Quantile regression integrates the concept of quantiles into ordinary linear regression: it can predict any conditional quantile of the target variable and is not restricted to forecasting only the mean. The QR (pinball) loss is defined as
\[{L_{QR}}=\frac{1}{n}\sum\limits_{{i=1}}^{n} {\hbox{max} \left( {\alpha {\xi _i},(\alpha - 1){\xi _i}} \right)},\]
where \({\xi _i}={y_i} - f({x_{i - 1}})\) and \(\alpha\) is the desired quantile. \({y_i}\) is the target value for the corresponding input \({x_{i - 1}}\), and \(f( \cdot )\) is the quantile regression function.
The PI is constructed by integrating the QR loss function into ANN and LSTM models. Two separate models with various quantiles are trained, one for the upper bound (\({\alpha _u}\)) and the other for the lower bound (\({\alpha _l}\)). For example, to create a PI with a 95% confidence level, we apply the 97.5% and 2.5% quantiles in the QR loss function. This method circumvents issues such as model complexity, the need for prior distribution assumptions, and biased parameter estimation errors often encountered in traditional interval predictions. The structure is illustrated in Fig. 4.
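A sketch of the pinball loss and of the dual-quantile setup described above; `model_low` and `model_up` are hypothetical trained regressors:

```python
import numpy as np

def pinball_loss(y, y_hat, alpha):
    """Quantile (pinball) loss: its minimizer is the alpha-quantile of y given x."""
    xi = y - y_hat
    return np.mean(np.maximum(alpha * xi, (alpha - 1.0) * xi))

# For a 95% PI, one model is trained toward the 2.5% quantile (lower bound)
# and another toward the 97.5% quantile (upper bound):
# loss_low = pinball_loss(y, model_low(x), alpha=0.025)
# loss_up  = pinball_loss(y, model_up(x),  alpha=0.975)
```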
Bootstrap
The bootstrap method is a statistical technique that involves repeated sampling from the original sample and is also known as the self-help method. It does not require prior assumptions about the sample distribution48. Bootstrapping is an effective method for estimating machine learning uncertainty. It obtains L sets of training samples by randomly sampling the original data with replacement and trains L learning models separately to construct the model estimation error variance \(\sigma _{i}^{2}({x_{i - 1}})\), which represents model uncertainty caused by factors such as the model structure and parameter settings49. On the basis of the obtained \(\sigma _{i}^{2}({x_{i - 1}})\) and the data noise variance \(\sigma _{\varepsilon }^{2}({x_{i - 1}})\), a prediction interval with a confidence level of \((1 - \alpha )\%\) is constructed:
\[\bar {\hat {y}}({x_{i - 1}}) \pm {Z_{1 - \frac{\alpha }{2}}}\sqrt {\sigma _{i}^{2}({x_{i - 1}})+\sigma _{\varepsilon }^{2}({x_{i - 1}})} ,\]
where \({Z_{_{{1 - \frac{\alpha }{2}}}}}\) is the \(1 - \frac{\alpha }{2}\) quantile of the standard normal distribution.
The structure of the bootstrap-ANN and bootstrap-LSTM is shown in Fig. 5, with the following steps:
(1) Data preprocessing. The samples are normalized to the interval [0, 1].
(2) Bootstrap resampling. The original training set \(D=\left\{ {({x_{i - 1}},{y_i})} \right\}_{{i=1}}^{n}\) is sampled L times with replacement to create L training sample sets \(\left\{ {({x_{i - 1}}^{l},{y_i}^{l})} \right\}_{{i=1}}^{n},l=1,2, \cdots L\).
(3) The first bootstrap sample \(\left\{ {({x_{i - 1}}^{1},{y_i}^{1})} \right\}_{{i=1}}^{n}\) is used to train the ANN and LSTM and to obtain the predicted values \({\hat {y}_1}(x_{{i - 1}}^{1})\).
(4) The above steps are repeated L times to obtain L predicted values \(({\hat {y}_1}(x_{{i - 1}}^{1}),{\hat {y}_2}(x_{{i - 1}}^{2}), \cdots ,{\hat {y}_L}(x_{{i - 1}}^{L}))\) and the model error variance \(\sigma _{i}^{2}\).
(5) The mean of the predicted values \(\bar {\hat {y}}({x_{i - 1}})\) of the ANNs and LSTMs is obtained from the above steps, and the model error variance \(\sigma _{i}^{2}\) is used to construct a new training sample \({D_{new}}\).
(6) Similar to Steps (2)–(5), L draws from the training sample \({D_{new}}\) are used to train ANNs and LSTMs to obtain the data noise error variance \(\sigma _{\varepsilon }^{2}\).
(7) The prediction interval is obtained from \(\bar {\hat {y}}({x_{i - 1}})\), \(\sigma _{i}^{2}\), and \(\sigma _{\varepsilon }^{2}\).
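A compact sketch of Steps (2)-(7), using scikit-learn's MLPRegressor as a stand-in for the ANN/LSTM predictors and a constant placeholder for the data-noise variance of Steps (5)-(6):

```python
import numpy as np
from scipy.stats import norm
from sklearn.neural_network import MLPRegressor

def bootstrap_pi(X_train, y_train, X_test, L=50, alpha=0.05):
    """Bootstrap PI: resample with replacement, train L models, and use the
    ensemble spread as the model-uncertainty variance (NumPy arrays assumed)."""
    n = len(X_train)
    preds = []
    for _ in range(L):
        idx = np.random.randint(0, n, n)              # Step (2): sample with replacement
        model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500)
        model.fit(X_train[idx], y_train[idx])         # Steps (3)-(4): train on resample
        preds.append(model.predict(X_test))
    preds = np.asarray(preds)
    y_mean = preds.mean(axis=0)                       # Step (5): ensemble mean
    var_model = preds.var(axis=0, ddof=1)             # model error variance
    # Steps (5)-(6) would estimate the data-noise variance from a second
    # ensemble trained on squared residuals; a constant placeholder is used here.
    var_noise = np.full_like(y_mean, 1e-3)
    half = norm.ppf(1 - alpha / 2) * np.sqrt(var_model + var_noise)
    return y_mean - half, y_mean + half               # Step (7)
```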
Distribution estimator (DE)
The interval prediction method using distribution estimators was proposed by Saeed et al. (2022)28. DE can predict the distribution of predictions without any retraining. The core concept involves training a system to predict two values: the forecast value and an estimation of the error of the forecast value. It is assumed that potential predictions follow a normal distribution. Hence, the distribution estimator considers the forecast as the mean and uses the forecast error as an approximation of the distribution’s standard deviation. The distribution is estimated as a Gaussian function centered on the forecast, with its standard deviation equal to the estimation error. DE then uses these estimates and estimation errors to generate prediction intervals.
To obtain the prediction values and prediction errors, DE employs a two-step system, as shown in Fig. 6. The first step trains the model on the training dataset to generate the predictions for the test set. The first-step model is referred to as the base model. The loss function for model training is the MSE, which is defined as:
\[MSE=\frac{1}{n}\sum\limits_{{i=1}}^{n} {{{({y_i} - {{\hat {y}}_i})}^2}},\]
where \({\hat {y}_i}\) is the predicted value of the base model and where \({y_i}\) is the observation value.
The second step is to predict the variance, and the model used in this step is known as the error model. The model uses the test data as input, and the predicted label is the square of the difference between the predicted value of the first step and the actual value. The structure of the second step model is similar to that of the base model, with the loss function being defined as
where \(\hat {\varepsilon }_{i}^{2}\) denotes the value predicted from the error model. By utilizing the outputs from both the base and error models, prediction intervals can be generated with the distribution estimator. These intervals are constructed according to Eqs. (23) and (24):
\[\hat {y}_{i}^{L}={\hat {y}_i} - B,\quad \hat {y}_{i}^{U}={\hat {y}_i}+B,\]
where \(\hat {y}_{i}^{L}\) and \(\hat {y}_{i}^{U}\) denote the lower and upper bounds of the PI at the \(i{\text{-}}th\) point, respectively. B denotes the error range, which is defined by
\[B=\left| {{Z_{\frac{\alpha }{2}}}} \right| \cdot s,\]
where s is an unbiased estimate of \(\sigma\). s is defined by Eq. (26)28,48.
where \({Z_{\frac{\alpha }{2}}}\) is a standard normal variable with a cumulative probability level of \(\frac{\alpha }{2}\).
The constructed ANNDE and LSTMDE are able to estimate any quantile of the target without the need for retraining. The process is shown in Fig. 6.
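A sketch of the two-step DE system under the stated Gaussian assumption; for brevity, the error model here is fit on training-set residuals, and the unbiased correction of Eq. (26) is omitted:

```python
import numpy as np
from scipy.stats import norm
from sklearn.neural_network import MLPRegressor

def de_interval(X_train, y_train, X_test, alpha=0.05):
    """Two-step distribution estimator: a base model predicts the mean and an
    error model predicts the squared error; a Gaussian interval is then formed
    around the base prediction."""
    base = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500)
    base.fit(X_train, y_train)                        # step 1: base model (MSE loss)
    resid2 = (base.predict(X_train) - y_train) ** 2   # squared-error labels
    err = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500)
    err.fit(X_train, resid2)                          # step 2: error model
    y_hat = base.predict(X_test)
    s = np.sqrt(np.clip(err.predict(X_test), 1e-8, None))  # estimated std of the error
    B = abs(norm.ppf(alpha / 2)) * s                  # error range around the forecast
    return y_hat - B, y_hat + B
```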
Evaluation metrics for the PI
We evaluate the quality of the PI in terms of the PICP, MPIW, CWC and IS. The PICP and MPIW are defined as shown in Eqs. (9) and (10) above, respectively. The CWC and IS are defined as follows.
The PI normalized average width (PINAW) is defined as follows:
\[PINAW=\frac{{MPIW}}{R},\]
The CWC, which combines the coverage and width of the PI, is defined as follows:
\[CWC=PINAW\left( {1+\gamma (PICP) \cdot {e^{ - \eta (PICP - {P_c})}}} \right),\]
where \(R=\mathop {\hbox{max} }\limits_{{1 \leqslant i \leqslant n}} ({y_i}) - \mathop {\hbox{min} }\limits_{{1 \leqslant i \leqslant n}} ({y_i})\) is the range of the predicted target and where \(\eta\) is the penalty coefficient used to penalize the case in which the PICP is smaller than \({P_c}\). If the PICP is smaller than \({P_c}\), \(\gamma (PICP)\) activates an exponential penalty that minimizes the gap between the PICP and \({P_c}\); when the PICP is greater than or equal to \({P_c}\), the primary focus is the PINAW.
The interval score (also known as the Winkler score)50,51 is a scoring mechanism used to evaluate the quality of prediction intervals. It not only accounts for the interval width, which is defined as \(\hat {y}_{i}^{U} - \hat {y}_{i}^{L}\), but also imposes penalties when the actual value \({y_i}\) falls outside the interval. The interval score is defined as follows:
\[I{S_i}=(\hat {y}_{i}^{U} - \hat {y}_{i}^{L})+\frac{2}{\alpha }\hbox{max} (\hat {y}_{i}^{L} - {y_i},0)+\frac{2}{\alpha }\hbox{max} ({y_i} - \hat {y}_{i}^{U},0),\quad IS=\frac{1}{n}\sum\limits_{{i=1}}^{n} {I{S_i}},\quad \alpha =1 - {P_c},\]
where \(I{S_i}\) is the interval score at time i. In particular, IS represents the final interval score. The score encourages intervals that are narrow yet contain the actual value and penalizes cases where the actual value lies outside the predicted interval.
The PICP reflects the reliability of the PI, indicating the probability that the predicted target falls into the PI; the larger the PICP is, the more reliable the interval. The MPIW measures the accuracy of the PI, and a smaller MPIW indicates greater accuracy. Once the PICP exceeds \({P_c}\), attention should focus on the width of the PI, with a smaller MPIW preferred. The CWC is an evaluation metric that considers both the PI’s coverage and width; a lower CWC signifies a higher-quality PI. Likewise, a lower IS indicates a more accurate and reliable prediction interval.
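A sketch of these metrics, assuming the commonly used forms from the cited literature; the paper's exact CWC and IS variants (and the value of \(\eta\)) may differ slightly:

```python
import numpy as np

def pi_metrics(y, y_low, y_up, pc=0.95, eta=50.0):
    """PICP, MPIW, PINAW, CWC and IS for a set of prediction intervals.
    `eta` is an assumed penalty coefficient."""
    alpha = 1.0 - pc
    inside = (y >= y_low) & (y <= y_up)
    picp = inside.mean()
    width = y_up - y_low
    mpiw = width.mean()
    pinaw = mpiw / (y.max() - y.min())               # width normalized by target range
    gamma = 1.0 if picp < pc else 0.0                # penalize only under-coverage
    cwc = pinaw * (1.0 + gamma * np.exp(-eta * (picp - pc)))
    # Winkler/interval score: width plus penalties for missed observations
    is_i = width + (2.0 / alpha) * (np.maximum(y_low - y, 0) + np.maximum(y - y_up, 0))
    return picp, mpiw, pinaw, cwc, is_i.mean()
```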
Selection procedure for the superior subset
Combined interval prediction
Combined interval forecasting is a method that uses specific mathematical models or algorithms to integrate multiple independent interval forecasts into a single prediction by assigning specified weights. Assuming that there are K individual interval predictions, with \({\hat {f}_k}\) representing the prediction interval for the \(k{\text{-}}th\) individual forecast, the combined interval forecast can be formulated as follows:
\[{\hat {f}_c}=\sum\limits_{{k=1}}^{K} {{w_k}{{\hat {f}}_k}},\]
where \({\hat {f}_c}\) represents the prediction interval of the combined model and where \({w_k}\) denotes the weight assigned to the \(k{\text{-}}th\) individual interval prediction within the combined model. \({w_k}\) takes values in the range of [0, 1], and \(\mathop \sum \limits_{k = 1}^K {w_k} = 1\).
We assume that \(\{ {\hat {f}_k}\}\) is the prediction interval for the \(k{\text{-}}th\) forecast and that \(\hat {y}_{{kt}}^{L},\hat {y}_{{kt}}^{U}\) are the lower and upper bounds of the prediction interval for the \(k{\text{-}}th\) forecast at time t, respectively.
This study introduces an interval Bayesian weighting (IBSW) algorithm based on Bayes’ theorem52. It calculates the posterior probability of each model by evaluating the relationship between the predicted intervals and observed values. The posterior probabilities are then used as weights to combine the models’ predictions.
Assuming that each model has an equal prior probability, the prior probability can be defined as:
\[P({M_{tk}})=\frac{1}{K},\quad k=1,2, \cdots ,K.\]
For each model, the likelihood function is defined on the basis of the predicted interval bounds \([\hat {y}_{i}^{L},\hat {y}_{i}^{U}]\) and the observed value \({y_i}\):
This function represents the probability of the observed value \({y_i}\) falling within the predicted interval \([\hat {y}_{i}^{L},\hat {y}_{i}^{U}]\).
According to Bayes’ theorem, the posterior probability of the model is calculated as follows:
\[P({M_{tk}}|{y_i})=\frac{{P({y_i}|{M_{tk}})P({M_{tk}})}}{{\sum\nolimits_{{l=1}}^{K} {P({y_i}|{M_{tl}})P({M_{tl}})} }},\]
where \(\sum\nolimits_{{l=1}}^{K} {P({y_i}|{M_{tl}})P({M_{tl}})}\) is the sum of the products of the likelihoods and prior probabilities across all models, serving as a normalization factor.
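A sketch of the IBSW update; the likelihood used below (a uniform density inside the interval, with a small floor outside) is an illustrative assumption standing in for the likelihood function defined above:

```python
import numpy as np

def ibsw_weights(y, lows, ups, eps=1e-6):
    """Interval Bayesian weighting: posterior model probabilities.

    `lows`/`ups` have shape (K, n): bounds of K models at n time points.
    The likelihood here (density 1/width inside the interval, `eps` outside)
    is an illustrative assumption.
    """
    K, n = lows.shape
    post = np.full(K, 1.0 / K)                        # equal prior probabilities
    for i in range(n):
        inside = (y[i] >= lows[:, i]) & (y[i] <= ups[:, i])
        lik = np.where(inside, 1.0 / np.maximum(ups[:, i] - lows[:, i], eps), eps)
        post = lik * post                             # Bayes' theorem (unnormalized)
        post /= post.sum()                            # normalization factor
    return post

# The combined interval is the posterior-weighted sum of the individual bounds:
# low_c = post @ lows; up_c = post @ ups
```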
Unlike point forecasting, interval forecasting must account for both interval width and coverage, which often conflict with each other. Evaluating interval forecasts on the basis of only one of these metrics is incomplete. CWC and IS are comprehensive metrics that consider both width and coverage. Inspired by the inverse variance weighting method used in point forecasting, this study proposes a new weighting algorithm based on the reciprocals of CWC and IS.
Assume that \(CW{C_k}\) and \(I{S_k}\) represent the CWC and IS of the \(k{\text{-}}th\) interval prediction, respectively, which are defined as follows:
where \(\gamma (PIC{P_k})=\left\{ \begin{gathered} 0,{\text{ }}PIC{P_k} \geqslant {P_c} \hfill \\ 1,{\text{ }}PIC{P_k}<{P_c} \hfill \\ \end{gathered} \right.\).
After individual forecasts are obtained, we can combine these models via various methods to obtain a more comprehensive reflection of the quality of the prediction intervals. The corresponding weighting methods are defined below.
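By analogy with inverse variance weighting, the ICWCW and IISW weights can be written as follows (a reconstruction consistent with the text; the IBSW weights are the posterior probabilities derived above):
\[w_{k}^{{ICWCW}}=\frac{{CWC_{k}^{{ - 1}}}}{{\sum\nolimits_{{l=1}}^{K} {CWC_{l}^{{ - 1}}} }},\quad w_{k}^{{IISW}}=\frac{{IS_{k}^{{ - 1}}}}{{\sum\nolimits_{{l=1}}^{K} {IS_{l}^{{ - 1}}} }}.\]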
Definition 1
\(I{S_c}\) represents the combined IS
We assume that \(F=\left\{ {{{\hat {f}}_1},{{\hat {f}}_2}, \cdots ,{{\hat {f}}_K}} \right\}\) represents a combination of K single-interval predictions and that \(I{S_c}(F)\) denotes the IS of the interval forecast combined from F.
We define \(\Delta IS\) as the increment of the \(IS\), and \(\Delta I{S_c}(F,F - \{ {\hat {f}_k}\} )=I{S_c}(F) - I{S_c}(F - \{ {\hat {f}_k}\} )\) represents the increment of the combination F containing the model \(\{ {\hat {f}_k}\}\) relative to the combination without \(\{ {\hat {f}_k}\}\). If
\[\Delta I{S_c}(F,F - \{ {\hat {f}_k}\} ) \geqslant 0,\]
then \(\{ {\hat {f}_k}\}\) is a redundant forecast in the combination.
Definition 2
We assume that \(F=\left\{ {{{\hat {f}}_1},{{\hat {f}}_2}, \cdots ,{{\hat {f}}_K}} \right\}\) is a combination of K interval forecasts and that \(S \subseteq F\). S is considered the superior subgroup within the combination F if it satisfies the following two conditions:
\[I{S_c}(S)<I{S_c}(S - \{ {\hat {f}_k}\} ),\quad \forall \{ {\hat {f}_k}\} \in S,\]
\[I{S_c}(S) \leqslant I{S_c}(S \cup \{ {\hat {f}_k}\} ),\quad \forall \{ {\hat {f}_k}\} \in F - S.\]
Shapley values for interval prediction combinations
By quantifying each model’s contribution and accounting for interactions between models, the Shapley value ensures that the combined contribution of multiple models is greater than or equal to the sum of their individual contributions. We apply Shapley value theory to model selection in interval forecast combinations, which we refer to as MSIFC-SV. Instead of profitability, the IS is used to calculate the Shapley value of each interval forecast in the coalition, because the IS is a comprehensive metric that balances interval width and coverage, two often conflicting criteria in interval forecasting. By using the opposite of the IS, the Shapley values more accurately capture the overall performance of each interval forecast. The order in which individual interval predictions enter the selection process is determined by their Shapley values. Predictions are then evaluated sequentially to determine whether they are redundant, without considering all possible combinations, thus improving the efficiency of the selection process.
Definition 3
We suppose that \(M=\left\{ {1,2, \cdots ,K} \right\}\) is a coalition consisting of K participants. For any subset \(S \subseteq M\), if there is a corresponding real-valued function \(v(S):{2^M} \to R\) that satisfies \(v(\phi )=0\), then \(v(S)\) is said to be the characteristic function, representing the payoff to subset S of M when acting in unison. K denotes the number of participants.
\(v(S)\) has superadditivity, that is,
\[v(S \cup T) \geqslant v(S)+v(T),\quad \forall S,T \subseteq M,{\text{ }}S \cap T=\phi .\]
Definition 4
We assume that \(F=\left\{ {{{\hat {f}}_1},{{\hat {f}}_2}, \cdots ,{{\hat {f}}_K}} \right\}\) is a combination of K individual interval predictions. For any subset \(S \subset F\),
\[{v_f}(S)= - I{S_c}(S),\]
where \({v_f}(S)\) represents the opposite of the IS of set S and where \({\Gamma _f}=[F,{v_f}]\) is defined as the cooperative game of K individual interval predictions; then,
According to Eq. (49), we have
Definition 5
A cooperative game \(\Gamma =[M,v]\) is given, and its Shapley values can be described as:
\[{\varphi _k}(v)=\sum\limits_{{S \subseteq M - \{ k\} }} {w(s)\left[ {v(S \cup \{ k\} ) - v(S)} \right]} ,\]
where
\[w(s)=\frac{{s!(K - s - 1)!}}{{K!}},\quad s=\left| S \right|.\]
In Eq. (54), \(v(S \cup \{ k\} ) - v(S)\) is the marginal contribution of the \(k{\text{-}}th\) participant when added to the coalition S in the cooperative game. The Shapley value is described as the weighted average of the marginal contributions of the \(k{\text{-}}th\) participant in the cooperative game.
Since \({\Gamma _f}=[F,{v_f}]\) is regarded as a cooperative game with K participants, we present the following Corollary 1 from Definition 5.
Corollary 1
If \({\Gamma _f}=[F,{v_f}]\) is a cooperative game defined by interval forecasting combinations and \(F=\left\{ {{{\hat {f}}_1},{{\hat {f}}_2}, \cdots ,{{\hat {f}}_K}} \right\}\) represents a combination of K individual interval predictions, then the Shapley value is given by
\[{\varphi _k}({v_f})=\sum\limits_{{S \subseteq F - \{ {{\hat {f}}_k}\} }} {\frac{{s!(K - s - 1)!}}{{K!}}\left[ {{v_f}(S \cup \{ {{\hat {f}}_k}\} ) - {v_f}(S)} \right]} ,\]
where S is a subset of \(F - \{ {\hat {f}_k}\}\) and s=|S| is the number of the set S.
The incremental IS of adding an interval prediction \(\{ {\hat {f}_k}\}\) to coalition S can be expressed as
\[\Delta I{S_c}(S \cup \{ {\hat {f}_k}\} ,S)=I{S_c}(S \cup \{ {\hat {f}_k}\} ) - I{S_c}(S)={v_f}(S) - {v_f}(S \cup \{ {\hat {f}_k}\} ).\]
If \({\varphi _k}({v_f}) \leqslant {\varphi _j}({v_f})\), according to Eq. (46), we have
Equation (57) shows that a prediction with a larger Shapley value has a smaller IS increment when it is added to the coalition.
On the basis of the above, we obtain the following inference:
(1) \(I{S_k}\) is used to measure the quality of the prediction intervals for \(\{ {\hat {f}_k}\}\), with smaller values of \(I{S_k}\) causing a larger \({\varphi _k}({v_f})\).
(2) \(I{S_c}(F - \{ {\hat {f}_k}\} )\) reflects the quality of the prediction intervals for coalition \(F - \{ {\hat {f}_k}\}\), and as \(I{S_c}(F - \{ {\hat {f}_k}\} )\) decreases, \({\varphi _k}({v_f})\) increases.
(3) \(\Delta I{S_c}(F,F - \{ {\hat {f}_k}\} )\) describes the IS increment between the combinations F and \(F - \{ {\hat {f}_k}\}\). As \(\Delta I{S_c}(F,F - \{ {\hat {f}_k}\} )\) increases, \({\varphi _k}({v_f})\) decreases.
Model selection for interval prediction combination based on the Shapley value
In this section, the MSIFC-SV algorithm is used to identify the superior subset from a set of interval forecasts. Initially, all the K interval predictions are included in the superior combination. During the selection procedure, the order in which individual interval predictions are evaluated is determined by their Shapley values. The algorithm iteratively eliminates redundant models in a loop until all redundant models have been eliminated.
The specific steps of the selection algorithm based on the Shapley value are as follows:
Step 1
Input the prediction intervals and the actual observed values for all individual forecasts.
Step 2
Compute the IS for each forecast via Eqs. (30) and (31).
Step 3
Initialize the superior combination set to include all individual forecasts.
Step 4
Define the prediction coalition \({F_0}\) as containing \({n_0}\) interval forecasts. For any \(S \subseteq {F_0}\), let \({v_f}(S)= - I{S_c}(S)\). \({\Gamma _0}=[{F_0},{v_f}]\) is a cooperative game comprising \({n_0}\) individual interval forecasts in \({F_0}\).
Step 5
For \({n_0}\) interval forecasts within \({\Gamma _0}\), the Shapley values \({\varphi _k}({v_f})\), where \(k=1,2, \cdots ,{n_0}\), of the different prediction coalitions can be calculated via Eq. (55). Let \(V=\left\{ {{\varphi _1}({v_f}),{\varphi _2}({v_f}), \cdots ,{\varphi _{{n_0}}}({v_f})} \right\}\) be the set of Shapley values.
Step 6
The minimum value in V is denoted by MinV, and the size of V is defined as \(\left| V \right|\).
Step 7
Determine whether the interval forecast \(\{ {\hat {f}_k}\}\) corresponding to MinV is considered redundant in accordance with Definition 1.
If \(\{ {\hat {f}_k}\}\) is determined to be a redundant model, then it is removed from the coalition \({F_0}\). Subsequently, Steps 4–7 are repeated for the new coalition \({F_0}={F_0} - \{ {\hat {f}_k}\}\).
If \(\{ {\hat {f}_k}\}\) is not a redundant model and \(\left| V \right|>1\), then let \(V=V - \left\{ {MinV} \right\}\) and repeat Steps 6–7.
If \(\{ {\hat {f}_k}\}\) is not a redundant model and \(\left| V \right|=1\), then there is no redundant model in \({F_0}\); thus, \({F_0}\) is the superior subgroup.
According to the criterion for determining redundant forecasts established in Definition 1, there is no redundant model in \({F_0}\). Therefore, for any \(\{ {\hat {f}_k}\} \in {F_0}\), we have
\[\Delta I{S_c}({F_0},{F_0} - \{ {\hat {f}_k}\} )<0.\]
In contrast, \(F - {F_0}\) comprises all the redundant models removed from F. For any \(\{ {\hat {f}_k}\} \in F - {F_0}\), we have
\[\Delta I{S_c}({F_0} \cup \{ {\hat {f}_k}\} ,{F_0}) \geqslant 0.\]
Equations (49) and (50) are combined to yield:
1. \(I{S_c}({F_0})<I{S_c}({F_0} - \{ {\hat {f}_k}\} ),\forall \{ {\hat {f}_k}\} \in {F_0},\)
2. \(I{S_c}({F_0}) \leqslant I{S_c}({F_0} \cup \{ {\hat {f}_k}\} ),\forall \{ {\hat {f}_k}\} \in F - {F_0}.\)
On the basis of Definition 2, we can conclude that \({F_0}\) is the superior subgroup.
The process of selecting the superior subgroup on the basis of the Shapley value is shown in Fig. 7 and Algorithm 1, and the interval forecast flow of the superior subgroup is shown in Fig. 8.
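A compact sketch of the selection loop is given below. It assumes a user-supplied `interval_score(subset)` helper returning \(I{S_c}\) for a set of model indices (for example, by combining the subset's bounds with IBSW weights before scoring); exact Shapley values are computed by enumerating subsets, which is feasible for the ten models used here:

```python
from itertools import combinations
from math import factorial

def shapley_values(models, v):
    """Exact Shapley value of each model under characteristic function v,
    where v(frozenset) = -IS_c of that subset and v(empty set) = 0."""
    K = len(models)
    phi = {}
    for k in models:
        others = [m for m in models if m != k]
        total = 0.0
        for s in range(K):                       # subset sizes 0 .. K-1
            for S in combinations(others, s):
                w = factorial(s) * factorial(K - s - 1) / factorial(K)
                total += w * (v(frozenset(S) | {k}) - v(frozenset(S)))
        phi[k] = total
    return phi

def msifc_sv(models, interval_score):
    """Shapley-guided elimination of redundant interval forecasts (Steps 1-7)."""
    F0 = set(models)                             # Step 3: start with all models
    v = lambda S: 0.0 if not S else -interval_score(S)
    while True:
        phi = shapley_values(sorted(F0), v)      # Steps 4-5
        removed = False
        # Steps 6-7: examine candidates in increasing order of Shapley value
        for k in sorted(F0, key=lambda m: phi[m]):
            if len(F0) > 1 and interval_score(F0 - {k}) <= interval_score(F0):
                F0.remove(k)                     # redundant: removal does not worsen IS
                removed = True
                break
        if not removed:
            return F0                            # superior subset
```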
Experiment
To assess the effectiveness, superiority, and robustness of the MSIFC–SV algorithm, three experiments are conducted. Experiments 1 and 2 each generate ten individual interval forecasts, employing ANN and LSTM predictors within the LUBE, QR, bootstrap, and DE frameworks; Experiment 3 further enlarges the model pool with GRU and CNN predictors. To reduce the randomness inherent in neural networks, the experimental results represent the averages of 15 repetitions. In Experiment 1, the MSIFC–SV selection method is applied to assess the algorithm’s effectiveness by forecasting daily average prices of the Hubei carbon emission trading market in China from January 3, 2017, to February 28, 2022 (data from http://www.tanpaifang.com/). The performance of the subset selected by the proposed algorithm is compared with that of its derivative subsets and the best individual interval predictions via the PICP, MPIW, CWC, and IS metrics. Additionally, these subsets are compared with those from other selection methods. In Experiment 2, Boston housing price data from the Statistical Data Sets at Carnegie Mellon University (https://lib.stat.cmu.edu/datasets/boston) are used to verify the algorithm’s superiority and robustness. The methodologies for Experiment 2 are consistent with those of Experiment 1.
In Experiment 1, following the methodology outlined by Xie et al. (2022)53, 15 variables from diverse domains, including macroeconomics, international carbon markets, energy prices, climate and environment, and public concerns, were preliminarily selected. All these variables are considered potential influencers of the carbon price. To increase the forecasting efficiency, we employed LassoCV for feature selection, as it is known for effectively identifying key features in high-dimensional data54. The results of the feature selection for all the experiments are presented in Table 1. During the experiment, the data are divided into training and testing sets, with the specific split detailed in Table 2. Model parameters are set on the basis of referenced studies and specific experimental requirements, as detailed in Supplementary Table 1.
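A sketch of this screening step, assuming a feature matrix of the 15 candidate variables (the function and names here are illustrative, not the paper's exact pipeline):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

def lasso_select(X, y, feature_names):
    """Keep the candidate variables to which LassoCV assigns nonzero coefficients."""
    Xs = StandardScaler().fit_transform(X)            # standardize before the L1 penalty
    lasso = LassoCV(cv=5).fit(Xs, y)                  # penalty strength chosen by CV
    keep = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)
    return [feature_names[i] for i in keep]
```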
Experiment 1
Table 3 presents the PICP, MPIW, CWC, and IS for the various individual interval predictions. The results show that SNM–ANN outperforms all the other methods, achieving a PICP of 0.9835, an MPIW of 10.7694, a CWC of 0.2834, and an IS of 10.8786. Notably, SNM–ANN achieves the smallest CWC and MPIW values among all the models whose PICP exceeds Pc. This finding indicates that SNM–ANN provides a narrower prediction interval while ensuring satisfactory coverage, thereby delivering higher-quality prediction intervals. In contrast, QR-LSTM performs the worst, with coverage significantly below the target Pc, indicating low prediction reliability. Among the individual predictions, only SNM–ANN, SNM-LSTM, and ANNDE achieve interval coverage at the predefined confidence level.
Table 4 presents the results of the PICP, MPIW, CWC, and IS for all individual model combinations and the best individual prediction under different weighting methods. The simple average (SA) weighting method computes the mean of the upper and lower bounds of multiple forecast intervals to generate an interval combination prediction. In contrast, the median weighting method selects the median55 of the upper and lower bounds across forecast intervals to form the interval combination prediction. The results indicate that all combinations achieve a PICP above the set threshold (Pc). However, only the weighted combinations via IBSW and IISW outperform the best individual prediction in terms of the MPIW, with values of 10.7455 and 10.6458, respectively, both lower than the optimal individual model’s MPIW of 10.7694. Additionally, the weighted combination via IBSW achieves the lowest CWC and IS values of 0.2661 and 10.7455, respectively, compared with the optimal single model’s values of 0.2834 and 10.8786. This finding indicates that including all the individual models may introduce redundancy, leading to decreased prediction performance, and it further highlights the importance of careful model selection.
Moreover, the significant differences in performance across the various weighting methods underscore the need for careful allocation of prediction weights in ensemble models. Given that the weighted combination using IBSW achieves the smallest CWC and IS values, the IBSW method is used to construct the ensemble in subsequent experiments.
To evaluate the effectiveness of the MSIFC–SV algorithm, we compared the selected superior subset with the best individual prediction and further analyzed the performance changes when adding or removing a model from the superior subset. As shown in Table 5, the subset selected on the basis of Shapley values significantly outperforms its derived subsets and the best individual prediction across the PICP, MPIW, CWC, and IS metrics. The superior subset {5,6} achieves PICP, MPIW, CWC, and IS values of 1, 5.4061, 0.1397, and 5.4061, respectively. This subset not only provides 100% coverage but also presents the narrowest interval width among the compared subsets. Compared with the best individual model and the combination of all individual models, the interval width of the superior subset is reduced by 49.80% and 49.69%, respectively. Removing any model from the optimal subset results in coverage falling below the confidence level, whereas adding any model maintains 100% coverage but widens the interval to between 6 and 8. These results demonstrate that the proposed selection algorithm effectively identifies a superior subset.
To further validate the effectiveness of the MSIFC–SV algorithm, we compared it with existing point prediction model selection methods (e.g., the HLN test2, encompassing test7, CEI18, MI13, and RRLS14), as there are no established methods for interval prediction model selection. We used the midpoints of all the individual prediction intervals and the true values as inputs to apply these point prediction model selection algorithms. We subsequently combine the selected intervals into an interval prediction via the interval Bayesian weighting algorithm. The comparison results are presented in Table 6 and Fig. 9.
Across all combinations, every subset’s PICP exceeded the threshold Pc, with the subset selected by the MSIFC–SV algorithm demonstrating superior performance. The subset identified by the CEI method matched the subset chosen by the MSIFC–SV algorithm, whereas the HLN method performed the worst. The MI and RRLS methods, both of which are based on mutual information, performed well but were still outperformed by our proposed method. Notably, the subset selected by the RRLS method did not achieve the minimum MPIW but yielded the smallest CWC and the largest IS. This discrepancy highlights limitations in existing comprehensive evaluation metrics for interval prediction, such as CWC and IS. Future work will focus on identifying more suitable evaluation metrics.
To further validate the effectiveness of the proposed algorithm, a Wilcoxon test is conducted to compare the prediction intervals of the superior subsets selected by the proposed algorithm with those selected by the baseline algorithm. Since the Wilcoxon test is based on point differences, this study revises the calculation to better account for the differences between the bounds of intervals. For n pairs of data \(({x_i},{y_i})~\), where i = 1,2,…,n, the difference for each pair is calculated as:
\({d_i}={x_i} - {y_i}\).
In the case of interval data, the revised calculation is applied to n pairs of interval data \(([x_{i}^{L},x_{i}^{U}],[y_{i}^{L},y_{i}^{U}])\), with the difference redefined in terms of the interval bounds.
All remaining steps and the theoretical framework of the method remain consistent with the original approach.
The significance level is set at 0.1, where a p value less than 0.1 indicates a significant difference between the two methods. Since the CEI method selects the same subset as the proposed algorithm does in this example, it is not included in this test. The results are shown in Table 7.
Table 7 shows that the proposed algorithm significantly differs from HLN (0.05)2, the encompassing test7, MI13, and RRLS14. Although the differences with HLN (0.15, 0.25) and HLN (0.35) are not statistically significant, the analysis indicates that, while maintaining confidence levels, the proposed algorithm reduces the width of the prediction intervals by 54.19% and 53.73%, respectively, compared with these two methods.
Experiment 2
Table 8 presents the individual interval prediction results for the Boston housing prices, highlighting significant differences in model performance. The ANNDE model demonstrated the best predictive performance, achieving a PICP of 0.9509, an MPIW of 19.5354, a CWC of 0.3992, and an IS of 23.8322. Notably, the SNM-ANN, bootstrap-ANN, and bootstrap-LSTM models failed to reach the specified confidence level Pc. Among these, the PICP of the bootstrap-based models fell below 0.9, substantially lower than the target of 0.95, indicating a lower predictive reliability in this context. This underscores the limitations of single models in capturing complex information patterns.
To further assess the robustness of the MSIFC–SV algorithm, we conducted a comparative analysis using the Boston housing price dataset from two perspectives. Table 9 presents the results for the optimal individual prediction, various combinations of individual predictions via different weighting methods, the subset selected via the MSIFC–SV algorithm and its derived subsets, and the subsets selected via benchmark selection methods. Figure 10 provides a visual comparison of the various metrics. The subset {5, 6, 7}, selected by the MSIFC–SV algorithm, with PICP, MPIW, CWC, and IS values of 1, 16.4609, 0.3116, and 16.4609, respectively, outperforms the optimal individual prediction, all individual model combinations, its derived subsets, and the subsets selected by the benchmark algorithms.
For all combinations of single models using different weighting methods, the coverage rates exceeded the confidence level Pc, but their MPIWs were all greater than 20, even larger than the best-performing individual interval. This finding indicates that redundant models within the combinations limit predictive performance, underscoring the importance of model selection in interval combination forecasting. Among all individual model combinations, the IBSW weighting method achieved the smallest MPIW and IS, with the CWC being the second smallest, highlighting the method’s superiority. However, even when coverage meets the target and the interval width is minimized, the CWC does not reach its lowest value, revealing the limitations of the CWC in providing a comprehensive evaluation of interval predictions.
Removing model {7} from the superior subset results in a coverage rate below Pc, demonstrating that model {7} is essential in the combination. Adding any other model to the superior subset maintains a coverage rate of 100%, but with increased interval width, further highlighting the superiority of the selected subset.
Additionally, in the benchmark selection methods, the coverage rates all exceeded the confidence level, but the interval widths ranged between 18 and 20, which are noticeably larger than those of the subset selected by the MSIFC-SV algorithm. This further emphasizes the robustness of the proposed algorithm, which identifies a superior predictive subset by considering diversity, redundancy, and interdependencies within the combinations.
A Wilcoxon test is also conducted to compare the prediction intervals of the superior subsets selected by the proposed algorithm with those selected by the baseline algorithm in this case. The results are shown in Table 10.
The proposed algorithm shows a significant difference compared with the HLN test2, MI13, and RRLS14. Although the differences between the proposed algorithm and the encompassing test7 and the CEI method do not reach significance, as shown in Table 8 and Fig. 10, when the PICP exceeds the confidence level, the prediction interval of the proposed algorithm reduces the MPIW, CWC, and IS by 13.17%, 33.23%, and 13.17%, respectively, compared with the encompassing test, and by 15.71%, 38.27%, and 45.69%, respectively, compared with the CEI method.
Experiment 3
To evaluate the scalability of the proposed model selection algorithm, we apply it to interval forecasting of West Texas Intermediate (WTI) crude oil prices, with data sourced from the U.S. Energy Information Administration (https://www.eia.gov/). The daily data spans from January 4, 2010, to December 31, 2024. Additionally, to assess the performance of the algorithm as the number of models increases, we conduct model selection experiments using model pools containing 10, 15, and 20 sub-models. Based on the aforementioned interval forecasting framework, this section introduces interval prediction models under both GRU and CNN predictors, including SNM-GRU, UBPI-GRU, bootstrap-GRU, QR-GRU, GRUDE, SNM-CNN, UBPI-CNN, bootstrap-CNN, QR-CNN, and CNNDE. The parameter settings are provided in the Supplementary Table 1. Table 11 presents the forecasting performance of each individual model.
Table 12 compares the subsets of models selected by the proposed method from model pools of 10, 15, and 20 models with those selected by the Encompassing test7, CEI18, MI13, and RRLS14 methods. As shown in the table, the proposed method demonstrates advantages in PICP, MPIW, CWC, and IS metrics across all three model pool sizes (10, 15, and 20), confirming its effectiveness and superiority in interval forecasting for WTI, as well as its scalability. Additionally, it is observed that as the number of sub-models in the model pool increases, the performance of the final selected subset from the proposed method improves.
Specifically, the subsets selected by all model selection algorithms achieved the pre-set coverage rate of 0.95. The proposed method consistently maintains the interval width below 9, ensuring both high coverage and narrower intervals. Among the comparison methods, the subsets selected by CEI and RRLS showed better performance. The RRLS method selects fewer sub-models compared to the others, while the Encompassing test tends to select a larger number of sub-models for combination. Notably, the subsets selected by RRLS in both the 15-model and 20-model pools are identical. The performance of MI fluctuates significantly across different model pool sizes.
Conclusions
To address the challenges of model selection in interval combination forecasting, this work introduces a novel algorithm, MSIFC–SV, for selecting models in interval forecasting combinations. This algorithm calculates the marginal contribution of each individual prediction within the combination via the IS and establishes a criterion for identifying redundant models on the basis of the incremental change in the IS of the combination after a model is removed. Initially, the combination includes all the models, and the models with lower marginal contributions are sequentially removed until all the redundant models are eliminated. To further optimize the weighting of the combined models, interval Bayesian weighting, inverse CWC weighting, and inverse IS weighting methods are introduced, with the final combination determined via the best-performing interval Bayesian weighting method. As a result, MSIFC–SV avoids the complexity of an exhaustive search across all possible model combinations, thereby enhancing the efficiency of model selection.
A series of comparative experiments on the carbon price, housing price, and crude oil price datasets demonstrates that the proposed algorithm outperforms its derived subsets, the optimal individual prediction, and the subsets selected by benchmark algorithms such as the HLN test, encompassing test, CEI, MI, and RRLS in terms of the PICP, MPIW, CWC, and IS. The empirical analysis validates the effectiveness and robustness of the proposed algorithm, providing a new perspective and a practical tool for model selection in interval combination forecasting.
However, some challenges remain. First, the computational cost of calculating Shapley values is high when addressing large datasets and numerous models. Second, existing comprehensive interval metrics, such as CWC and IS, have limitations and may not always accurately measure model performance. In future research, we will focus on improving the efficiency of Shapley value computation and developing more accurate comprehensive evaluation metrics for interval forecasting.
Data availability
The datasets used in this study are publicly available at http://www.tanpaifang.com/ and https://lib.stat.cmu.edu/datasets/boston. The code supporting this study is available upon request from the corresponding author, Prof. Huayou Chen, at huayouc@126.com.
References
Fang, Y. Forecasting combination and encompassing tests. Int. J. Forecast. 19(1), 87–94 (2003).
Costantini, M. & Pappalardo, C. A hierarchical procedure for the combination of forecasts. Int. J. Forecast. 26(4), 725–743 (2010).
Thomson, M. E., Pollock, A. C., Önkal, D. & Gönül, M. S. Combining forecasts: performance and coherence. Int. J. Forecast. 35(2), 474–484 (2019).
Mendes-Moreira, J., Jorge, A. M., Soares, C. & de Sousa, J. F. Ensemble learning: A study on different variants of the dynamic selection approach. In Machine Learning and Data Mining in Pattern Recognition: 6th International Conference, MLDM 2009, Leipzig, Germany, July 23–25, 2009. Proceedings 6 (pp. 191–205). Springer Berlin Heidelberg. (2009).
Ashton, R. H. Combining the judgments of experts: how many and which ones? Organ. Behav. Hum Decis. Process. 38(3), 405–414 (1986).
Lichtendahl, K. C. Jr & Winkler, R. L. Why do some combinations perform better than others? Int. J. Forecast. 36(1), 142–149 (2020).
Kışınbay, T. The use of encompassing tests for forecast combinations. J. Forecast. 29(8), 715–727 (2010).
Schmittlein, D. C., Kim, J. & Morrison, D. G. Combining forecasts: operational adjustments to theoretically optimal rules. Manage. Sci. 36(9), 1044–1056 (1990).
Schwarz, G. Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978).
Swanson, N. R. & Zeng, T. Choosing among competing econometric forecasts: regression-based forecast combination using model selection. J. Forecast. 20(6), 425–440 (2001).
Yu, L., Wang, S., Lai, K. K. & Nakamori, Y. Time series forecasting with multiple candidate models: selecting or combining? J. Syst. Sci. Complexity. 18(1), 1 (2005). (In Chinese).
Harvey, D. I., Leybourne, S. J. & Newbold, P. Tests for forecast encompassing. J. Bus. Economic Stat. 16(2), 254–259 (1998).
Cang, S. & Yu, H. A combination selection algorithm on forecasting. Eur. J. Oper. Res. 234(1), 127–139 (2014).
Che, J. Optimal sub-models selection algorithm for combination forecasting model. Neurocomputing 151, 364–375 (2015).
Zhou, Q., Wang, C. & Zhang, G. A combined forecasting system based on modified multi-objective optimization and sub-model selection strategy for short-term wind speed. Appl. Soft Comput. 94, 106463 (2020).
Liu, T., Liu, W., Liu, Z., Zhang, H. & Liu, W. Ensemble water quality forecasting based on decomposition, sub-model selection, and adaptive interval. Environ. Res. 237, 116938 (2023).
Liu, Z., Jiang, P., Wang, J. & Zhang, L. Ensemble forecasting system for short-term wind speed forecasting based on optimal sub-model selection and multi-objective version of mayfly optimization algorithm. Expert Syst. Appl. 177, 114974 (2021).
Jiang, P., Liu, Z., Wang, J. & Zhang, L. Decomposition-selection-ensemble forecasting system for energy futures price forecasting based on multi-objective version of chaos game optimization algorithm. Resour. Policy. 73, 102234 (2021).
Jiang, P., Liu, Z., Niu, X. & Zhang, L. A combined forecasting system based on statistical method, artificial neural networks, and deep learning methods for short-term wind speed forecasting. Energy 217, 119361 (2021).
Khosravi, A., Nahavandi, S. & Creighton, D. Construction of optimal prediction intervals for load forecasting problems. IEEE Trans. Power Syst. 25(3), 1496–1503 (2010).
Hamaguchi, Y., Noma, H., Nagashima, K., Yamada, T. & Furukawa, T. A. Frequentist performances of bayesian prediction intervals for random-effects meta‐analysis. Biom. J. 63(2), 394–405 (2021).
Yang, X., Fu, G., Zhang, Y., Kang, N. & Gao, F. A naive Bayesian wind power interval prediction approach based on rough set attribute reduction and weight optimization. Energies, 10(11), 1903. (2017).
Nix, D. A. & Weigend, A. S. Estimating the mean and variance of the target probability distribution. In Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94) (Vol. 1, pp. 55–60). IEEE. (1994), June.
Yao, W., Zeng, Z. & Lian, C. Generating probabilistic predictions using mean-variance estimation and echo state network. Neurocomputing 219, 536–547 (2017).
Zhu, S., Luo, X., Yuan, X. & Xu, Z. An improved long short-term memory network for streamflow forecasting in the upper Yangtze River. Stoch. Environ. Res. Risk Assess. 34, 1313–1329 (2020).
Beyaztas, U., Arikan, B. B., Beyaztas, B. H. & Kahya, E. Construction of prediction intervals for Palmer Drought Severity Index using bootstrap. J. Hydrol. 559, 461–470 (2018).
Acharki, N., Bertoncello, A. & Garnier, J. Robust prediction interval estimation for gaussian processes by cross-validation method. Comput. Stat. Data Anal. 178, 107597 (2023).
Saeed, A., Li, C., Gan, Z., Xie, Y. & Liu, F. A simple approach for short-term wind speed interval prediction based on independently recurrent neural networks and error probability distribution. Energy 238, 122012 (2022).
Conde-Amboage, M., González-Manteiga, W. & Sánchez-Sellero, C. Predicting trace gas concentrations using quantile regression models. Stoch. Environ. Res. Risk Assess. 31, 1359–1370 (2017).
He, Y. & Li, H. Probability density forecasting of wind power using quantile regression neural network and kernel density estimation. Energy. Conv. Manag. 164, 374–384 (2018).
Zhang, W., Quan, H. & Srinivasan, D. Parallel and reliable probabilistic load forecasting via quantile regression forest and quantile determination. Energy 160, 810–819 (2018).
Khosravi, A., Nahavandi, S., Creighton, D. & Atiya, A. F. Lower upper bound estimation method for construction of neural network-based prediction intervals. IEEE Trans. Neural Networks. 22(3), 337–346 (2011).
Wu, Q., Lin, J., Zhang, S. & Tian, Z. A theil coefficient-based combination prediction method with interval heterogeneous information for wind energy prediction. J. Intell. Fuzzy Syst. 41(1), 1031–1048 (2021).
Wang, P., Tao, Z., Liu, J. & Chen, H. Improving the forecasting accuracy of interval-valued carbon price from a novel multi-scale framework with outliers detection: an improved interval-valued time series analysis mode. Energy Econ. 118, 106502 (2023).
Meira, E., Oliveira, F. L. C. & Jeon, J. Treating and pruning: new approaches to forecasting model selection and combination using prediction intervals. Int. J. Forecast. 37(2), 547–568 (2021).
Long, H., Geng, R. & Zhang, C. Wind speed interval prediction based on the hybrid ensemble model with biased convex cost function. Front. Energy Res. 10, 954274 (2022).
Shapley, L. S. A value for n-person games. In Contributions to the Theory of Games Vol. 2 (1953).
Chen, H. A kind of cooperative games method determining weights of combination forecasting. Forecasting 22(1), 75–80 (2003). (In Chinese).
Ding, Z., Chen, H. & Zhou, L. Using Shapley values to define subgroups of forecasts for combining. J. Forecast. 42, 905–923 (2023).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997).
Lai, Y. et al. Exploring uncertainty in regression neural networks for construction of prediction intervals. Neurocomputing 481, 249–257 (2022).
Hu, J., Lin, Y., Tang, J. & Zhao, J. A new wind power interval prediction approach based on reservoir computing and a quality-driven loss function. Appl. Soft Comput. 92, 106327 (2020).
Salem, T. S., Langseth, H. & Ramampiaro, H. Prediction intervals: Split normal mixture from quality-driven deep ensembles. In Conference on Uncertainty in Artificial Intelligence (pp. 1179–1187). PMLR. (2020), August.
Yan, L., Verbel, D. & Saidi, O. Predicting prostate cancer recurrence via maximizing the concordance index. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 479–485). (2004), August.
Quinonero-Candela, J., Rasmussen, C. E., Sinz, F., Bousquet, O. & Schölkopf, B. Evaluating predictive uncertainty challenge. In Machine Learning Challenges Workshop (1–27). Berlin, Heidelberg: Springer Berlin Heidelberg. (2005), April.
Pearce, T., Brintrup, A., Zaki, M. & Neely, A. High-quality prediction intervals for deep learning: A distribution-free, ensembled approach. In International Conference on Machine Learning (pp. 4075–4084). PMLR. (2018), July.
Wang, B., Li, T., Yan, Z., Zhang, G. & Lu, J. DeepPIPE: a distribution-free uncertainty quantification approach for time series forecasting. Neurocomputing 397, 11–19 (2020).
Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap (Chapman and Hall/CRC, 1994).
Ni, Q., Zhuang, S., Sheng, H., Wang, S. & Xiao, J. An optimized prediction intervals approach for short term PV power forecasting. Energies 10(10), 1669 (2017).
Winkler, R. L. A decision–theoretic approach to interval estimation. J. Am. Stat. Assoc. 67(337), 187–191 (1972).
Maciejowska, K., Nowotarski, J. & Weron, R. Probabilistic forecasting of electricity spot prices using factor Quantile Regression Averaging. Int. J. Forecast. 32(3), 957–965 (2016).
Darbandsari, P. & Coulibaly, P. Assessing entropy-based bayesian model averaging method for probabilistic precipitation forecasting. J. Hydrometeorol. 23(3), 421–440 (2022).
Xie, Q., Hao, J., Li, J. & Zheng, X. Carbon price prediction considering climate change: a text-based framework. Economic Anal. Policy. 74, 382–401 (2022).
Duan, S. et al. LightGBM Low-Temperature Prediction Model based on LassoCV Feature Selection. Math. Probl. Eng. 2021(1), 1776805 (2021).
Gaba, A., Tsetlin, I. & Winkler, R. L. Combining interval forecasts. Decis. Anal. 14(1), 1–20 (2017).
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 72371001).
Author information
Contributions
Jingling Yang: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. Liren Chen: Validation, Writing – review & editing. Huayou Chen: Validation, Writing – review & editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, J., Chen, L. & Chen, H. Shapley value-driven superior subset selection algorithm for carbon price interval forecast combination. Sci Rep 15, 7087 (2025). https://doi.org/10.1038/s41598-025-90006-2