Introduction

The rapid global shift toward low-carbon electricity generation has intensified the integration of wind power into modern power systems. As one of the most scalable and mature forms of renewable energy, wind energy plays a critical role in achieving decarbonization targets, enhancing energy independence, and reducing reliance on fossil-based reserves1,2. However, the high spatiotemporal variability of wind introduces substantial uncertainty into the operation of power grids, posing non-trivial challenges to grid stability, reserve planning, economic dispatch, and electricity market clearing3,4. In this context, the ability to forecast wind power accurately—and more importantly, to do so in ways aligned with decision-making structures—has emerged as a fundamental pillar of smart grid operation.

Forecasting models for wind power have historically followed two main trajectories: physics-based numerical weather prediction (NWP) models and data-driven statistical or machine learning (ML) approaches5,6. NWP models simulate atmospheric conditions by solving complex fluid dynamics equations across spatial grids and altitudes. While they provide a physically grounded baseline, their relatively coarse resolution and long computational time often limit their applicability for short-term operational needs. In contrast, statistical and ML-based models directly learn empirical relationships from historical weather and generation data, providing higher flexibility and responsiveness. In recent years, the explosive growth of accessible SCADA data and advances in deep learning have catalyzed the emergence of highly expressive models for short-term wind forecasting, including recurrent neural networks (RNNs), long short-term memory networks (LSTMs), gated recurrent units (GRUs), convolutional architectures, and attention-based mechanisms7,8.

These models have achieved considerable success by minimizing standard metrics such as mean absolute error (MAE), root mean squared error (RMSE), and pinball loss for quantile forecasts. The literature has also seen a surge in hybrid architectures that combine convolutional neural networks (CNNs) for spatial pattern extraction with LSTMs for temporal sequence modeling, often leveraging satellite or reanalysis datasets to enhance input diversity9,10. More recently, graph neural networks (GNNs) and attention-based transformers have been proposed to model the interdependencies between wind farms or meteorological stations, capturing spatial correlations in a non-parametric and data-driven fashion. Simultaneously, probabilistic forecasting frameworks have advanced, with techniques such as Bayesian neural networks, ensemble learning, and quantile regression neural networks (QRNNs) producing full distributions or prediction intervals, thereby enabling uncertainty-aware operations11,12.

Yet despite this rich methodological landscape, one critical aspect remains fundamentally underrepresented: the relational structure of forecasted outputs, particularly in contexts where the rank ordering of wind sites is more important than their exact predicted values. Many real-world energy system decisions—including shared energy storage allocation, top-K dispatch prioritization, reserve contribution, or competitive market bidding—are ranking-sensitive by nature. Operators often care not about whether wind site A will generate 102 MW versus 105 MW, but whether it will outperform site B, C, or D. However, conventional forecasting models do not directly account for such priorities. They are trained to minimize scalar error metrics that treat each prediction independently, with no explicit consideration of how forecasts relate to one another13,14,15.

This is where the concept of learning to rank (LTR)—originally developed for search engines and recommender systems—becomes highly relevant. LTR offers a paradigm shift: rather than optimizing the absolute accuracy of each individual prediction, it focuses on learning models that can produce accurate ranked lists of items16. In the ML literature, three main classes of LTR methods exist: pointwise, pairwise, and listwise. Pointwise methods treat ranking as a regression or classification task, pairwise methods such as RankNet focus on optimizing the relative ordering of item pairs, and listwise methods like ListNet and LambdaMART aim to optimize the structure of entire permutations based on probabilistic or entropy-based formulations17. These methods have demonstrated remarkable effectiveness in domains like web search, financial risk modeling, and sports analytics—but their application to renewable energy forecasting remains virtually nonexistent. In the few cases where rank modeling has been considered in energy research, it has typically appeared in the form of post-processing steps, such as priority rules in demand response or simple thresholding mechanisms. To date, there exists no deep learning framework in the wind power forecasting literature that embeds ranking objectives into the core of the training process, nor one that treats rank stability over time as a first-class modeling concern.

This oversight has important implications. In operational settings where decisions are made in rolling intervals (e.g., every 15 min), rank inconsistencies between forecast updates can lead to unstable schedules, inefficient redispatch, or even cascading adjustments in reserve activation. Ensuring that predicted rankings evolve smoothly—reflecting the physical continuity of wind patterns—is therefore crucial for decision reliability18.

Another dimension often neglected is the integration of temporal and spatial coherence into rank-aware forecasting. Temporal modeling in wind forecasting is commonly handled using RNNs or LSTMs, which are effective at capturing sequence dependencies but do not guarantee coherent ordinal evolution. Spatial dependencies are typically addressed through shared input features, multi-output models, or emerging techniques like GNNs and self-attention layers. However, the joint modeling of spatial rank correlation and temporal rank stability remains a methodological gap19,20. No current framework simultaneously enforces physically feasible ranks across sites, penalizes sudden or erratic rank reversals over time, and respects the bounded generation constraints of turbines.

In parallel, interpretability and constraint enforcement have become important concerns as ML models begin to inform real-time control. Gradient-based attribution techniques such as SHAP or integrated gradients are increasingly used to identify which features most strongly influence model output. In rank-based models, it is equally important to understand which inputs drive ranking shifts, especially when such outputs influence market bids or operator trust. Similarly, forecast values must be constrained to remain within physical turbine ratings, and uncertainty must be bounded to prevent overly confident or impractical predictions. These auxiliary considerations are rarely integrated into LTR systems, even though they are essential for making ML models compatible with the operational and regulatory constraints of power system applications21,22.

To address these challenges, the present paper proposes a ranking-oriented wind power forecasting framework with temporal reliability constraints, establishing a fundamentally new learning paradigm for renewable forecasting. Unlike conventional models that predict values in isolation, the proposed method jointly optimizes for numerical accuracy, spatial rank fidelity, and temporal rank smoothness. The core modeling structure includes (i) deep sequence encoders to learn feature-to-output mappings, (ii) listwise and pairwise rank-based loss functions to optimize the predicted permutation structure, and (iii) temporal regularization penalties to stabilize rank transitions over consecutive time steps. The framework is cast as a constrained optimization problem, including forecast range limits, quantile separation conditions, and ranking curvature controls. The loss function is formulated as a convex combination of prediction loss, rank loss, and rank stability penalties, allowing for flexible tradeoffs depending on the operational context. Moreover, the model supports both point forecasts and quantile outputs, ensuring applicability to deterministic and probabilistic workflows alike. Spatial correlations are modeled using attention mechanisms that dynamically compute relevance weights between sites, improving the consistency of regional rank predictions. The entire architecture is trained end-to-end on real SCADA and high-resolution meteorological data from a network of wind farms. Evaluation includes conventional metrics such as MAE and RMSE, but also incorporates Kendall’s tau, normalized discounted cumulative gain (NDCG), and a newly proposed Temporal Rank Stability Index (TRSI), which quantifies the smoothness of rank transitions over time.

The contribution is thus not just a novel ML architecture, but a rethinking of what wind forecasting should optimize for. In moving from pointwise accuracy to decision-aligned ranking coherence, this work opens new methodological territory in the forecasting domain. It transforms wind forecasting into a relational, temporal, and spatially-aware task—better aligned with the actual decision processes that govern energy system operations. As power systems continue to decentralize, and as the cost of forecast errors becomes increasingly nonlinear and asymmetric, frameworks like the one proposed here offer a compelling new foundation for forecast-informed control.

Mathematical modelling

Fig. 1

System architecture of the rank-aware wind power forecasting framework.

Figure 1 presents a streamlined architecture where wind features are processed by an attention-based deep learning model optimized with multi-objective loss functions to produce temporally stable and ranking-consistent wind power predictions.

$$\begin{aligned} \min _{\varvec{\Theta }, \varvec{\Phi }, \varvec{\Xi }} \quad&\sum _{\tau = 1}^{\mathcal {T}} \sum _{\kappa \in \mathcal {S}} \Bigg \{ \\&\quad \varvec{\alpha } \cdot \left( \frac{1}{|\mathcal {B}|} \sum _{\iota \in \mathcal {B}} \left( \hat{\mathcal {W}}_{\kappa }^{(\tau , \iota )}(\varvec{\Theta }) - \mathcal {W}_{\kappa }^{(\tau , \iota )} \right) ^2 \right) \\&+ \varvec{\beta } \cdot \Bigg ( - \sum _{\iota \in \mathcal {B}} \sum _{\kappa =1}^{N} \mathcal {P}_{\kappa }^{(\tau ,\iota )} \cdot \log \left( \frac{ \exp \left( \hat{\mathcal {W}}_{\kappa }^{(\tau , \iota )}(\varvec{\Phi }) \right) }{ \sum _{\upsilon =1}^{N} \exp \left( \hat{\mathcal {W}}_{\upsilon }^{(\tau , \iota )}(\varvec{\Phi }) \right) } \right) \Bigg ) \\&+ \varvec{\gamma } \cdot \left( \sum _{\iota \in \mathcal {B}} \sum _{\kappa =1}^{N} \left| \mathcal {R}_{\kappa }^{(\tau ,\iota )} - \mathcal {R}_{\kappa }^{(\tau -1,\iota )} \right| \right) \Bigg \}. \end{aligned}$$
(1)

This total objective function captures the multi-faceted optimization task for wind power forecasting. It blends three core elements: a pointwise regression loss weighted by \(\varvec{\alpha }\), a listwise ranking-based cross-entropy penalty weighted by \(\varvec{\beta }\), and a temporal rank deviation penalty across prediction horizons, governed by \(\varvec{\gamma }\). The formulation unfolds across each batch \(\iota \in \mathcal {B}\), spatial unit \(\kappa \in \mathcal {S}\), and time interval \(\tau \in \mathcal {T}\), capturing the temporal and spatial complexity of multi-site wind forecasting while directing the learnable parameters \(\varvec{\Theta }, \varvec{\Phi }, \varvec{\Xi }\) to optimize not just values, but their order and temporal smoothness. The parameters \(\alpha\), \(\beta\), and \(\gamma\) in Eq. (1) represent normalized weighting factors assigned to the pointwise error, ranking alignment, and temporal regularization terms, respectively. Their values are determined through grid search within [0.1, 0.8] subject to the simplex constraint \(\alpha + \beta + \gamma = 1\) of Eq. (9), resulting in an optimal configuration of (0.5, 0.3, 0.2) that balances accuracy and ranking stability.
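To make the structure of Eq. (1) concrete, the sketch below evaluates one (site-set, time-step) term with NumPy. It assumes the ground-truth permutation probabilities \(\mathcal {P}\) are obtained by a softmax over the true outputs (a common ListNet-style choice); all function and variable names are illustrative, not from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def composite_loss(w_pred_t, w_true_t, rank_t, rank_prev,
                   alpha=0.5, beta=0.3, gamma=0.2):
    """Convex combination of MSE, listwise cross-entropy, and rank-change penalty,
    mirroring the three bracketed terms of Eq. (1) for a single time step."""
    mse = np.mean((w_pred_t - w_true_t) ** 2)           # pointwise term
    p_true = softmax(w_true_t)                          # target permutation probs
    p_pred = softmax(w_pred_t)
    listwise = -np.sum(p_true * np.log(p_pred))         # listwise cross-entropy
    rank_penalty = np.sum(np.abs(rank_t - rank_prev))   # temporal rank deviation
    return alpha * mse + beta * listwise + gamma * rank_penalty
```

With the reported configuration (0.5, 0.3, 0.2), any change in predicted ranks between consecutive steps immediately increases the loss through the \(\gamma\)-weighted term, which is the mechanism that discourages rank jitter.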

$$\begin{aligned} \mathcal {L}_{\text {rank}}^{(\tau )} = - \sum _{\kappa = 1}^{N} \varvec{\pi }_{\kappa }^{(\tau )} \cdot \log \left( \frac{\exp \left( \hat{\mathcal {W}}_{\kappa }^{(\tau )} \right) }{ \sum _{\upsilon = 1}^{N} \exp \left( \hat{\mathcal {W}}_{\upsilon }^{(\tau )} \right) } \right) . \end{aligned}$$
(2)

Embedded at the heart of our ranking-aware learning is this loss function derived from listwise learning-to-rank paradigms. Here, \(\varvec{\pi }_{\kappa }^{(\tau )}\) denotes the ground-truth permutation probability for site \(\kappa\) at time \(\tau\), while \(\hat{\mathcal {W}}_{\kappa }^{(\tau )}\) represents the predicted wind power for that site. This cross-entropy-based formulation penalizes deviation from the correct ranking order in a probabilistic sense, thus ensuring the model is sensitive not merely to magnitude, but to ordering relevance, essential for priority-based dispatch systems or competitive energy markets.

$$\begin{aligned} \hat{\mathcal {W}}_{\kappa }^{(\tau )} = \mathcal {F}_{\text {ML}}\left( \varvec{\chi }_{\kappa }^{(\tau )}, \; \varvec{\varphi }_{\kappa }^{(\tau - 1)}, \; \varvec{\nu }_{\kappa }^{(\tau - 2)}; \; \varvec{\Theta } \right) . \end{aligned}$$
(3)

This forecasting equation characterizes how predicted wind output \(\hat{\mathcal {W}}_{\kappa }^{(\tau )}\) at location \(\kappa\) and time \(\tau\) is produced by a composite machine learning mapping \(\mathcal {F}_{\text {ML}}\), conditioned on a rich feature tensor that includes meteorological inputs \(\varvec{\chi }_{\kappa }^{(\tau )}\), lagged feature encodings \(\varvec{\varphi }_{\kappa }^{(\tau - 1)}\), and volatility estimators \(\varvec{\nu }_{\kappa }^{(\tau - 2)}\), all processed under learnable parameters \(\varvec{\Theta }\). This ensures nonlinear dynamics and memory effects are effectively embedded in the prediction engine.

$$\begin{aligned} \mathcal {L}_{\text {MSE}}^{(\tau )} = \frac{1}{N} \sum _{\kappa =1}^{N} \left( \hat{\mathcal {W}}_{\kappa }^{(\tau )} - \mathcal {W}_{\kappa }^{(\tau )} \right) ^2. \end{aligned}$$
(4)

Standard regression loss serves as the foundation for physical accuracy, where \(\mathcal {W}_{\kappa }^{(\tau )}\) is the true wind generation and \(\hat{\mathcal {W}}_{\kappa }^{(\tau )}\) the predicted counterpart. This equation balances the optimization framework by enforcing fidelity to observed values, ensuring that ranking improvements do not come at the cost of numerical inconsistency.

$$\begin{aligned} \mathcal {T}_{\kappa }^{(\tau )} = \left| \mathcal {R}_{\kappa }^{(\tau )} - \mathcal {R}_{\kappa }^{(\tau - 1)} \right| . \end{aligned}$$
(5)

A discrete temporal transition penalty term, this expression captures the abruptness of rank changes for location \(\kappa\) between adjacent horizons. Minimizing such terms helps mitigate rank jitter, promoting stability and forecast credibility across rolling operation intervals.

$$\begin{aligned} \mathcal {R}_{\kappa }^{(\tau )} = \operatorname {rank} \left( \hat{\mathcal {W}}_{\kappa }^{(\tau )} \right) . \end{aligned}$$
(6)

This rank mapping transforms raw predictions into ordinal positions, with ties resolved through deterministic tie-breaking. This function is essential to bridge continuous forecast values with discrete rank-oriented loss objectives.
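The rank mapping of Eq. (6) and the transition penalty of Eq. (5) can be sketched together as follows. A stable argsort provides the deterministic tie-breaking mentioned above (here ties resolve by site index, which is an assumption; the paper does not specify the tie rule):

```python
import numpy as np

def rank_of(values):
    """Ordinal ranks with 1 = largest value, per Eq. (6).
    kind="stable" breaks ties deterministically by site index."""
    order = np.argsort(-values, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def transition_penalty(r_now, r_prev):
    """Per-site |R_k^(tau) - R_k^(tau-1)|, per Eq. (5)."""
    return np.abs(r_now - r_prev)
```

For predictions [5.0, 3.0, 4.0] followed by [4.0, 4.5, 3.0], the ranks move from [1, 3, 2] to [2, 1, 3] and the per-site penalties are [1, 2, 1].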

$$\begin{aligned} 0 \le \hat{\mathcal {W}}_{\kappa }^{(\tau )} \le \mathcal {W}_{\kappa }^{\text {max}}, \quad \forall \kappa \in \mathcal {S}, \forall \tau \in \mathcal {T}. \end{aligned}$$
(7)

Physical realism is enforced here through a hard constraint on generation bounds, where each forecast must lie within feasible capacity ranges. \(\mathcal {W}_{\kappa }^{\text {max}}\) denotes the nameplate rating for each turbine or farm site.

$$\begin{aligned} \left| \mathcal {R}_{\kappa }^{(\tau +1)} - 2\mathcal {R}_{\kappa }^{(\tau )} + \mathcal {R}_{\kappa }^{(\tau -1)} \right| \le \delta _{\text {smooth}}. \end{aligned}$$
(8)

This second-order temporal smoothness constraint ensures the absence of oscillatory rank fluctuations. It is a curvature constraint promoting locally monotonic evolution of forecasted ranks, hence improving temporal consistency in scheduling operations.

$$\begin{aligned} \varvec{\alpha } + \varvec{\beta } + \varvec{\gamma } = 1, \quad \varvec{\alpha }, \varvec{\beta }, \varvec{\gamma } \in [0,1]. \end{aligned}$$
(9)

A simplex constraint for balancing the three components of the total objective. This convex combination guarantees interpretability and ensures the tuning of loss priorities remains numerically stable and traceable.

$$\begin{aligned} \mathcal {Q}_{\kappa }^{(90)} - \mathcal {Q}_{\kappa }^{(10)} \ge \varepsilon _{\text {spread}}, \quad \forall \kappa . \end{aligned}$$
(10)

This probabilistic constraint governs the minimum quantile spread for each wind site to avoid overconfident narrow distributions. It guards against overfitting and underestimation of uncertainty, especially in probabilistic forecast applications.

$$\begin{aligned} \operatorname {NDCG}^{(\tau )} = \frac{1}{\mathcal {Z}} \sum _{\kappa =1}^{N} \frac{2^{\mathcal {W}_{\kappa }^{(\tau )}} - 1}{\log _2(1 + \mathcal {R}_{\kappa }^{(\tau )})}. \end{aligned}$$
(11)

Normalized Discounted Cumulative Gain quantifies the ranking quality of the predicted list. It rewards accurate high-rank predictions more than low-rank ones, making it appropriate for market-clearing settings where top producers have outsized impact.
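A direct NumPy reading of Eq. (11), with the normalizer \(\mathcal {Z}\) taken as the ideal (perfectly ordered) DCG so that a perfect ranking scores exactly 1. The gains \(2^{\mathcal {W}}-1\) assume power values scaled to roughly [0, 1]; with raw MW values the exponential gain would overflow, so this scaling is an assumption of the sketch:

```python
import numpy as np

def ndcg(w_true, ranks_pred):
    """NDCG per Eq. (11); w_true assumed normalized to ~[0,1]."""
    gains = 2.0 ** w_true - 1.0
    dcg = np.sum(gains / np.log2(1.0 + ranks_pred))
    # Ideal ranks: 1 for the largest true output, etc. (stable tie-breaking).
    ideal_ranks = np.empty_like(ranks_pred)
    order = np.argsort(-w_true, kind="stable")
    ideal_ranks[order] = np.arange(1, len(w_true) + 1)
    idcg = np.sum(gains / np.log2(1.0 + ideal_ranks))
    return dcg / idcg
```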

$$\begin{aligned} \mathbb {E}_{\varvec{\chi }, \mathcal {W} \sim \mathcal {D}} \left[ \mathcal {L}_{\text {total}} \right] . \end{aligned}$$
(12)

Expectation over data distribution ensures generalization in stochastic optimization, especially when training with minibatches. It formalizes the empirical risk minimization framework at the core of learning algorithms.

$$\begin{aligned} \varvec{\zeta }_{\kappa }^{(\tau )} = \varvec{\chi }_{\kappa }^{(\tau )} + \sin \left( \varvec{\omega }_{\kappa } \cdot \tau \right) . \end{aligned}$$
(13)

Time-aware feature embedding injects positional priors into the learning pipeline. Using sinusoidal encodings, this term helps recurrent or attention models distinguish between different temporal stages within the prediction horizon.
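A minimal sketch of Eq. (13), assuming \(\varvec{\omega }_{\kappa }\) is a fixed per-site frequency (e.g. \(2\pi /24\) for a diurnal cycle on hourly data); in practice it could equally be a learnable parameter:

```python
import numpy as np

# Sketch of Eq. (13): a sinusoidal positional term added to the raw features.
# omega is assumed fixed here (e.g. a diurnal frequency); names are illustrative.
def time_aware_features(chi, tau, omega):
    return chi + np.sin(omega * tau)
```

With \(\omega = 2\pi /24\), the encoding repeats every 24 hours, giving the model a stable notion of time of day.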

$$\begin{aligned} \mathcal {R}_{\text {reg}} = \varvec{\lambda } \cdot \left\| \varvec{\Theta } \right\| _2^2. \end{aligned}$$
(14)

A classic weight decay regularization enforcing complexity control on the model parameters. It discourages overparameterization and leads to better generalization under noisy and variable weather regimes.

$$\begin{aligned} \mathcal {L}_{\text {window}} = \sum _{\tau = t}^{t + H} \mathcal {L}_{\text {rank}}^{(\tau )}. \end{aligned}$$
(15)

By aggregating the ranking loss across a sliding horizon \(H\), this expression ensures that the learning algorithm not only excels in one-step-ahead forecasting but also preserves the rank trajectory integrity across multiple future timesteps—a key need in rolling market operations or day-ahead scheduling scenarios.

Methodology

To enhance interpretability and operational transparency, the model integrates a gradient-based attribution analysis that computes feature sensitivity scores by tracing output variations with respect to input perturbations. In addition, a SHAP-inspired post hoc analysis is performed to quantify the global contribution of key meteorological variables to both predicted wind power values and inter-site rank orders. These interpretability tools allow operators to identify which factors most strongly influence ranking transitions and ensure that model predictions remain consistent with physical intuition (Table 1).

Table 1 List of symbols and definitions used in the mathematical modeling section.

This section presents the methodological foundations underpinning our ranking-oriented wind power forecasting model. Central to our approach is a composite loss structure that balances conventional point prediction accuracy with the preservation of ordinal relationships and temporal ranking smoothness. We further delineate the model parametrization, optimization strategy, and theoretical rationale for each component of the loss design.

$$\begin{aligned} \varvec{\mathcal {X}}_{\kappa }&= \left[ \varvec{\chi }_{\kappa }^{(1)} \, \Vert \, \varvec{\chi }_{\kappa }^{(2)} \, \Vert \, \cdots \, \Vert \, \varvec{\chi }_{\kappa }^{(\tau )} \right] \in \mathbb {R}^{\tau \times d}. \end{aligned}$$
(16)

Organizing the multivariate input tensor \(\varvec{\mathcal {X}}_{\kappa }\) allows the learning system to capture time-dependent wind conditions across horizon \([1, \tau ]\) for site \(\kappa\), with each vector \(\varvec{\chi }_{\kappa }^{(t)}\) comprising \(d\) meteorological, geographical, and turbine-level features. The vertical concatenation operator \(\Vert\) constructs the sequential input to be fed into recurrent or convolutional neural architectures.

$$\begin{aligned} \textbf{h}_{\kappa }^{(\tau )} = \varvec{\sigma } \left( \textbf{W}_{x} \cdot \varvec{\chi }_{\kappa }^{(\tau )} + \textbf{W}_{h} \cdot \textbf{h}_{\kappa }^{(\tau -1)} + \textbf{b}_h \right) . \end{aligned}$$
(17)

Encoding wind features through gated memory cells, the recurrent hidden state \(\textbf{h}_{\kappa }^{(\tau )}\) evolves under affine transformations with weights \(\textbf{W}_x, \textbf{W}_h\) and nonlinearity \(\varvec{\sigma }\), enabling the model to capture autocorrelated temporal trends in a turbine’s wind dynamics. The recurrence creates a temporal embedding that is sensitive to weather evolution.

$$\begin{aligned} \varvec{\alpha }_{\kappa \rightarrow \upsilon }^{(\tau )} = \frac{\exp \left( \textbf{q}_{\kappa }^{\top } \cdot \textbf{k}_{\upsilon } \right) }{ \sum _{\zeta = 1}^{N} \exp \left( \textbf{q}_{\kappa }^{\top } \cdot \textbf{k}_{\zeta } \right) }. \end{aligned}$$
(18)

Calculating attention scores \(\varvec{\alpha }_{\kappa \rightarrow \upsilon }^{(\tau )}\) using dot-product mechanisms links the current focus site \(\kappa\) to every other site \(\upsilon\), thereby allowing the architecture to learn spatial correlations across the wind farm network. Query and key matrices \(\textbf{q}, \textbf{k}\) are learnable projections of input features, driving the spatial interaction model.

The Temporal Rank Stability Index (TRSI) serves as a quantitative indicator of how consistently the predicted ranking of wind farms evolves across consecutive forecasting intervals. Physically, TRSI captures the inertia of wind field transitions, ensuring that ordinal predictions follow the continuous nature of meteorological dynamics. A lower TRSI implies smoother and more physically coherent rank transitions, which correspond to stable operational scheduling. Unlike conventional smoothness measures—such as autocorrelation or variance-based indices—TRSI directly evaluates the continuity of rank order rather than absolute values, making it particularly relevant for rank-driven dispatch decisions. Sensitivity experiments further indicate that TRSI remains stable across a broad range of temporal regularization weights (\(\gamma \in [0.1, 0.3]\)), confirming its robustness and interpretive reliability.
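The exact TRSI formula is not reproduced in this section; the sketch below is one plausible realization consistent with the description above (mean absolute rank change per site and step, normalized by the largest possible single-step change \(N-1\), so that 0 means perfectly stable ranks). The definition used in the evaluation may differ in detail:

```python
import numpy as np

def trsi(rank_matrix):
    """rank_matrix: shape (T, N) with ranks in 1..N at each of T steps.
    Returns mean absolute rank change, normalized to [0, ~1]; 0 = stable."""
    T, N = rank_matrix.shape
    changes = np.abs(np.diff(rank_matrix, axis=0))  # (T-1, N) step-wise changes
    return changes.mean() / (N - 1)
```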

$$\begin{aligned} \textbf{c}_{\kappa }^{(\tau )} = \sum _{\upsilon =1}^{N} \varvec{\alpha }_{\kappa \rightarrow \upsilon }^{(\tau )} \cdot \textbf{v}_{\upsilon }. \end{aligned}$$
(19)

Forming the attention-based context vector \(\textbf{c}_{\kappa }^{(\tau )}\) for each site \(\kappa\), this formulation aggregates values \(\textbf{v}_{\upsilon }\) across the fleet using relevance scores \(\varvec{\alpha }_{\kappa \rightarrow \upsilon }^{(\tau )}\). Such context vectors are powerful in capturing long-range correlations and wind co-movement between geographically distributed turbines.
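Equations (18) and (19) together define one attention step across the fleet. A NumPy sketch, with given query/key/value matrices standing in for the learned projections, is:

```python
import numpy as np

def attention_context(Q, K, V):
    """Q, K: (N, d) queries/keys; V: (N, d) values.
    Returns the (N, d) context vectors of Eq. (19) and the (N, N) weights of Eq. (18)."""
    scores = Q @ K.T                                  # pairwise site relevance
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax, Eq. (18)
    return weights @ V, weights                       # weighted value sum, Eq. (19)
```

Each row of the weight matrix sums to 1, so every site's context vector is a convex combination of the other sites' value vectors.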

$$\begin{aligned} \mathcal {L}_{\text {pairwise}} = - \sum _{\iota \in \mathcal {B}} \sum _{\begin{array}{c} \kappa , \upsilon \\ \kappa \ne \upsilon \end{array}} \left[ \mathbb {I} \left( \mathcal {W}_{\kappa } > \mathcal {W}_{\upsilon } \right) \cdot \log \left( \sigma \left( \hat{\mathcal {W}}_{\kappa } - \hat{\mathcal {W}}_{\upsilon } \right) \right) + \mathbb {I} \left( \mathcal {W}_{\kappa } < \mathcal {W}_{\upsilon } \right) \cdot \log \left( 1 - \sigma \left( \hat{\mathcal {W}}_{\kappa } - \hat{\mathcal {W}}_{\upsilon } \right) \right) \right] . \end{aligned}$$
(20)

Pairwise ranking loss emerges from direct comparisons of true power values between all site pairs within a batch \(\mathcal {B}\). Indicator functions identify whether the ground-truth ranks differ, and the sigmoid \(\sigma\) enforces margin-like penalties. This loss function encourages not only correct absolute values but correct relative preferences between turbines—essential in rank-focused tasks.
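A direct sketch of this RankNet-style loss, written as a negative log-likelihood so that minimization rewards correctly ordered pairs. The O(N²) double loop is fine for tens of sites; a vectorized form would be used at scale:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_loss(w_true, w_pred):
    """Negative log-likelihood over all ordered site pairs, per Eq. (20)."""
    loss = 0.0
    n = len(w_true)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if w_true[i] > w_true[j]:
                # Site i should outrank j: reward a positive score gap.
                loss -= np.log(sigmoid(w_pred[i] - w_pred[j]))
            elif w_true[i] < w_true[j]:
                loss -= np.log(1.0 - sigmoid(w_pred[i] - w_pred[j]))
    return loss
```

Predictions that preserve the true ordering yield a strictly lower loss than predictions that invert it, independent of absolute magnitudes.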

$$\begin{aligned} y_{\kappa ,\upsilon } = {\left\{ \begin{array}{ll} 1 & \text {if } \mathcal {W}_{\kappa } > \mathcal {W}_{\upsilon } \\ 0 & \text {otherwise}. \end{array}\right. } \end{aligned}$$
(21)

Binary labeling for rank learning serves as input to pairwise classifiers, defining the order relations between every pair of turbines. It abstracts away actual power values and focuses solely on comparison outcomes, ensuring the model’s training aligns directly with relative performance.

$$\begin{aligned} \varvec{\Theta }^{(k+1)} = \varvec{\Theta }^{(k)} - \eta ^{(k)} \cdot \nabla _{\varvec{\Theta }} \left( \mathcal {L}_{\text {total}} + \mathcal {R}_{\text {reg}} \right) . \end{aligned}$$
(22)

Updating the trainable parameters \(\varvec{\Theta }\) using gradient descent with learning rate \(\eta ^{(k)}\), this iteration rule is applied over mini-batches of data. The total loss includes both task-driven penalties and regularization components, ensuring convergence to optimal and generalizable forecast strategies.

$$\begin{aligned} \hat{x}_{j}^{(\tau )} = \frac{ x_{j}^{(\tau )} - \mu _j }{ \sqrt{\sigma _j^2 + \varepsilon } }. \end{aligned}$$
(23)

Normalizing features within each mini-batch stabilizes the training process. Mean \(\mu _j\) and variance \(\sigma _j^2\) are computed across batch dimensions, and \(\varepsilon\) is a small constant to prevent division by zero. This operation supports faster and smoother optimization for deep neural models used in forecasting.

$$\begin{aligned} s_{j}^{(\tau )} = \frac{\partial \hat{\mathcal {W}}^{(\tau )}}{\partial x_{j}^{(\tau )}} \cdot x_{j}^{(\tau )}. \end{aligned}$$
(24)

Gradient-based attribution score \(s_j^{(\tau )}\) quantifies the influence of each input variable \(x_j^{(\tau )}\) on the predicted output \(\hat{\mathcal {W}}^{(\tau )}\), enabling post hoc interpretability of the model. Such interpretability is vital for operators and regulators to trust AI-based decision tools.
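Eq. (24) is the familiar gradient-times-input attribution. With an autodiff framework the gradient is exact; the sketch below substitutes a central finite difference so it stays framework-free (names and the forward function `f` are illustrative):

```python
import numpy as np

def grad_times_input(f, x, eps=1e-5):
    """Attribution s_j = (d f / d x_j) * x_j per Eq. (24),
    with the partial derivative approximated by central differences."""
    scores = np.zeros_like(x)
    for j in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[j] += eps
        xm[j] -= eps
        grad_j = (f(xp) - f(xm)) / (2 * eps)  # d f / d x_j
        scores[j] = grad_j * x[j]
    return scores
```

For a linear model \(f(x) = 2x_0 + 3x_1\) at \(x = (1, 2)\), the scores are exactly \((2, 6)\), i.e. each feature's contribution to the output.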

$$\begin{aligned} \mathcal {L}_{\text {rank}}^{\text {window}} = \frac{1}{H} \sum _{\tau = t}^{t + H - 1} \mathcal {L}_{\text {rank}}^{(\tau )}. \end{aligned}$$
(25)

Aggregating ranking errors over a future window of length \(H\), this formulation transforms the training process into a multi-step foresight task. It simulates realistic market and dispatch scenarios where decisions must hold validity across several upcoming timesteps, not merely the immediate next one.

$$\begin{aligned} \underset{\varvec{\Theta } \in \mathcal {H}}{\min } \; \mathbb {E}_{\mathcal {D}_{\text {val}}} \left[ \mathcal {L}_{\text {val}}(\varvec{\Theta }) \right] \quad \text {subject to: } \mathcal {C}(\varvec{\Theta }) \le \zeta . \end{aligned}$$
(26)

Formalizing the hyperparameter optimization task, this constrained expectation minimization ensures that model complexity or deployment constraints \(\mathcal {C}(\varvec{\Theta })\) are satisfied while minimizing validation error over distribution \(\mathcal {D}_{\text {val}}\). It is a meta-level optimization layer above model training.

$$\begin{aligned} \Psi _{\text {final}} = \lambda _1 \cdot \text {MAE} + \lambda _2 \cdot (1 - \tau _{\text {kendall}}) + \lambda _3 \cdot \text {TRSI}. \end{aligned}$$
(27)

Combining multiple evaluation metrics into a unified scalar score \(\Psi _{\text {final}}\), this function is useful in tuning and benchmarking. The weights \(\lambda _1, \lambda _2, \lambda _3\) reflect operational priorities—balancing absolute accuracy, rank preservation, and temporal stability through a single lens.

$$\begin{aligned} \frac{1}{N} \sum _{\kappa = 1}^{N} \text {Var} \left( \mathcal {R}_{\kappa }^{(\tau )} \right) < \delta _{\text {patience}} \quad \forall \tau \in \mathcal {T}_{\text {patience}}. \end{aligned}$$
(28)

A rank-variance-based early stopping rule is introduced here: training halts when the average variance of predicted ranks over a recent epoch window \(\mathcal {T}_{\text {patience}}\) falls below threshold \(\delta _{\text {patience}}\), indicating stabilized rank dynamics. This guards against overtraining and encourages convergence to smooth, reliable rank predictions.
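The stopping rule of Eq. (28) reduces to a simple check over a window of recent rank predictions. The sketch below follows the interpretation given above, taking the variance over the patience window for each site (names are illustrative):

```python
import numpy as np

def should_stop(rank_history, delta):
    """rank_history: (T_patience, N) predicted ranks over recent evaluations.
    Stop when the mean per-site rank variance falls below delta, per Eq. (28)."""
    per_site_var = rank_history.var(axis=0)  # variance over the window, per site
    return per_site_var.mean() < delta
```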

Unlike classical learning-to-rank frameworks such as LambdaMART, ListNet, and RankNet, which optimize static permutations based on independent samples, the proposed model introduces a temporally constrained learning-to-rank paradigm that enforces continuity and physical consistency across forecasting horizons. The inclusion of temporal rank regularization and physics-informed constraints transforms the traditional LTR objective into a dynamic, physically grounded optimization problem suitable for renewable energy forecasting. Moreover, the multi-objective composite loss—integrating numerical accuracy, rank fidelity, and intertemporal smoothness—extends the theoretical foundation of ranking models toward multi-temporal and physically bounded learning scenarios, thereby providing a new direction for dynamic ranking-based forecasting in power systems.

Case studies

The case study is grounded on a high-resolution wind power forecasting task involving 10 geographically dispersed wind farms across a coastal province in East China, with substantial inter-site variability. Each site contributes hourly wind power data across a full operational year (\(8760\) time steps), sourced from the provincial dispatch center’s supervisory control and data acquisition (SCADA) system. In total, the raw dataset comprises \(87{,}600\) spatiotemporal observations. Meteorological inputs include wind speed at \(10\,\text {m}\) and \(100\,\text {m}\), wind direction, temperature, pressure, and humidity, collected from both on-site sensors and reanalysis sources (ERA5, spatial resolution: \(0.25^\circ \times 0.25^\circ\)). Wind speed at hub height was interpolated using a logarithmic wind profile method. Additionally, turbine-specific operational logs—such as active/reactive power, yaw alignment, and fault codes—were parsed and harmonized across vendors to ensure model compatibility. Input features were standardized using z-score normalization, and time features were encoded using Fourier transformations to capture diurnal and seasonal cycles.

To support ranking-based supervision, the label construction was reformulated into two parallel targets: (1) scalar wind power prediction in MW for each site and timestamp, and (2) rank index vectors representing the inter-site orderings per time slot. Rank vectors were constructed by sorting ground-truth outputs and applying probabilistic smoothing to reduce ties and noise sensitivity. Additionally, a temporal stability label was constructed to evaluate how rank orders persist over a moving 6-h window—used as ground truth for the Temporal Rank Stability Index (TRSI) metric. The entire dataset was partitioned using a rolling horizon scheme with an \(80\%/10\%/10\%\) split for training, validation, and testing respectively. Each fold covered 70 consecutive days (\(1680\) h), with a sliding window of 10 days between splits to simulate continuous deployment. This design ensures that the model not only generalizes to unseen data but also adapts to dynamic weather regimes in real operational conditions.

Model training and evaluation were executed on an Ubuntu 22.04 LTS server equipped with an AMD Ryzen 7950X CPU (16 cores, \(4.5\,\text {GHz}\)), \(128\,\text {GB}\) DDR5 RAM, and an NVIDIA RTX A6000 GPU with \(48\,\text {GB}\) VRAM. PyTorch 2.2.0 was used as the primary deep learning framework, with LightGBM 4.1.0 for gradient-boosted ranking baselines. All models were trained using the Adam optimizer with a learning rate of \(1 \times 10^{-4}\), weight decay of \(5 \times 10^{-5}\), and batch size of 128 sequences. Early stopping was configured with a patience of 15 epochs and a minimum delta threshold of \(1 \times 10^{-3}\). Each model variant was trained for up to 200 epochs, requiring approximately 3.5 GPU hours per fold. Evaluation metrics—including MAE, RMSE, NDCG, MAP, and TRSI—were computed using NumPy 1.26 and SciPy 1.11 with custom ranking metric utilities. To facilitate reproducibility and transparency, all hyperparameters, raw datasets, and code were managed via MLflow tracking and hosted on a private GitLab repository. The complete pipeline ensures rigorous model evaluation under both statistical and operational performance lenses.
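The stated early-stopping configuration (patience 15, minimum delta \(10^{-3}\)) can be expressed as a small helper. The class below is a generic sketch, not the study's code; the optimizer itself would be `torch.optim.Adam(params, lr=1e-4, weight_decay=5e-5)` per the settings above.

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve by at
    least min_delta for `patience` consecutive epochs (the paper's
    configuration: patience=15, min_delta=1e-3)."""
    def __init__(self, patience=15, min_delta=1e-3):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Resetting the counter only on improvements larger than `min_delta` prevents long tails of marginal gains from exhausting the 200-epoch budget.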

To ensure a fair and transparent comparison, all models were trained using identical feature inputs, preprocessing steps, and data partitions. Hyperparameter optimization was performed under a unified grid-search protocol with consistent validation splits and random seeds. Training was executed on the same hardware environment for all methods, and computational costs were recorded for reference. These standardized procedures guarantee that observed performance differences reflect true modeling advances rather than disparities in data treatment or optimization effort.

Fig. 2 Monthly wind speed variation across sites.

Figure 2 presents the monthly average wind speeds across five wind farm sites, capturing clear seasonal trends and inter-site differences. Site 2 displays the most pronounced seasonal amplitude, with wind speeds climbing to 7.8 m/s in May and declining to 4.6 m/s in September, a 3.2 m/s swing that may significantly affect forecast uncertainty and capacity utilization. In contrast, Site 4 demonstrates a more stable wind resource, ranging only from 5.1 to 6.4 m/s across the year, making it potentially more reliable for baseline energy supply. Sites 1 and 3 both show bimodal seasonal peaks: Site 1 reaches 7.2 m/s in January and 7.5 m/s in June, while Site 3 peaks at 6.9 m/s in May and again at 6.7 m/s in December. The inter-site standard deviation in average monthly wind speed ranges from 0.7 to 1.9 m/s, emphasizing the importance of considering both spatial and seasonal granularity in model training. These patterns strongly justify time-aware learning structures that capture periodic behavior and facilitate dynamic allocation strategies in energy markets.

To ensure data quality and reproducibility, missing-value handling, stratified balancing across wind regimes, and random seed control were carefully implemented during preprocessing. Missing data were imputed through a two-stage interpolation and ERA5-based reconstruction scheme, while outliers were clipped and replaced using rolling medians. The dataset was partitioned using an 80/10/10 rolling-horizon strategy with fixed random seeds to maintain reproducibility. Sensitivity analysis across multiple random seeds confirmed that the overall forecasting accuracy and ranking performance remained stable, with variations below 1.5%, demonstrating the robustness of the proposed framework.
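A hedged sketch of the rolling-median outlier replacement and short-gap interpolation described above; the window size and MAD-based threshold `k` are illustrative assumptions, and long gaps (reconstructed from ERA5 in the study) are outside this sketch.

```python
import numpy as np

def clean_series(x, window=24, k=4.0):
    """Replace values deviating from a centered rolling median by
    more than k rolling MADs with that median, then linearly
    interpolate remaining NaN gaps. window and k are illustrative."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    med = np.full(n, np.nan)
    mad = np.full(n, np.nan)
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        seg = x[lo:hi]
        seg = seg[~np.isnan(seg)]
        if seg.size:
            med[i] = np.median(seg)
            mad[i] = np.median(np.abs(seg - med[i]))
    bad = np.abs(x - med) > k * (mad + 1e-6)   # NaN comparisons are False
    x[bad] = med[bad]
    nans = np.isnan(x)
    if nans.any() and (~nans).any():
        idx = np.arange(n)
        x[nans] = np.interp(idx[nans], idx[~nans], x[~nans])
    return x
```

Using the median absolute deviation rather than the standard deviation keeps the threshold itself robust to the very spikes being removed.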

Fig. 3 Diurnal wind power pattern comparison.

Figure 3 illustrates the average diurnal power output profiles across all sites, highlighting time-of-day effects in wind generation. Site 1 exhibits a distinct early-morning peak, increasing from 2.9 MW at 02:00 to a maximum of 4.6 MW at 05:00, followed by a gradual descent that bottoms out at 3.2 MW around 15:00. Site 2 follows a similar trajectory but with lower amplitude, peaking at 4.1 MW by 06:00. Meanwhile, Site 5 shows minimal diurnal variation, maintaining a relatively flat profile between 3.1 and 3.8 MW throughout the day. Across all sites, the largest rate of change in power output occurs between 03:00 and 07:00, consistent with atmospheric boundary-layer transitions just after sunrise. The average daily swing (max–min) across the five sites ranges from 0.7 MW (Site 5) to 1.8 MW (Site 1), indicating substantial diurnal volatility that must be captured over short-term forecasting horizons. These temporal signatures motivate the inclusion of time-of-day encodings in the model input pipeline and reveal the periods where ranking changes are most likely, particularly during nocturnal-to-morning transitions.

Fig. 4 Forecast input feature statistics.

Figure 4 provides statistical summaries of four key input features: wind speed, wind direction, temperature, and atmospheric pressure, all aggregated across the dataset. Wind speed has a dataset-wide mean of 8.3 m/s and a standard deviation of 2.1 m/s, a spread sufficient to support learning-based prediction with sensitivity to extreme events. Wind direction shows high angular dispersion, with a circular mean near 178° and a circular standard deviation of 41.3°, indicating highly variable inflow patterns—likely influenced by terrain or local thermal circulations. Temperature averages 14.7 °C with a standard deviation of 5.9 °C, capturing both seasonal and diurnal thermal variability, which can indirectly influence wind gradients through atmospheric stratification. Pressure is the most stable feature, with a mean of 1013.2 hPa and a narrow spread of just 1.8 hPa, yet its inclusion remains relevant for identifying synoptic-scale influences. The contrast between highly variable and relatively stable features motivates per-feature normalization and feature-importance modulation within the forecasting model. These statistics justify the downstream use of attention weighting or dropout-based feature regularization and directly define the input normalization ranges for the deep learning framework.
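Because wind direction is an angular quantity, its mean and dispersion must be computed circularly rather than arithmetically (a simple average of 350° and 10° would wrongly give 180°). A minimal sketch via the mean resultant vector:

```python
import numpy as np

def circular_mean_std(deg):
    """Circular mean and standard deviation (degrees) of wind
    direction, via the mean resultant vector: std follows the
    standard sqrt(-2 ln R) definition."""
    rad = np.deg2rad(np.asarray(deg, dtype=float))
    s, c = np.sin(rad).mean(), np.cos(rad).mean()
    mean = np.rad2deg(np.arctan2(s, c)) % 360.0
    R = np.hypot(s, c)                       # mean resultant length in (0, 1]
    std = np.rad2deg(np.sqrt(-2.0 * np.log(max(R, 1e-12))))
    return mean, std
```

For tightly clustered directions R approaches 1 and the circular standard deviation approaches 0; for a near-uniform rose R collapses toward 0 and the dispersion blows up, matching intuition.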

Table 2 Forecasting performance across methods.

Table 2 presents a comprehensive comparison of forecasting performance across four different models, evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Normalized Discounted Cumulative Gain (NDCG), and Mean Average Precision (MAP). The proposed rank-aware model achieves the lowest MAE of 5.92 MW and the lowest RMSE of 7.44 MW, significantly outperforming classical methods like LSTM (MAE 7.91 MW, RMSE 10.12 MW) and XGBoost (MAE 8.42 MW, RMSE 10.45 MW). More importantly, in terms of ranking quality, the proposed model maintains an NDCG of 0.883 and MAP of 0.729, which is approximately 8.5% and 13.4% higher than those of LambdaMART, respectively. These ranking metrics are crucial for wind power prioritization tasks, especially in grid dispatching where not just prediction accuracy but also relative ordering of sites drives operational decisions. The strong dual performance in both absolute value prediction and ranking position validates the model’s architectural design, including the use of composite objective functions that combine prediction, listwise ranking, and temporal smoothness losses.

Table 3 Temporal rank stability index (TRSI) by hour.

Table 3 illustrates the Temporal Rank Stability Index (TRSI) computed at selected hours of the day (from hour 0 to 23) to examine how stable the predicted site rankings are over time. The proposed model exhibits remarkably consistent TRSI values, ranging from 0.185 to 0.193, with minimal variance across the day. In contrast, LambdaMART shows higher instability with TRSI values fluctuating between 0.254 and 0.279, and LSTM performs even worse, with TRSI peaking at 0.351 during peak ramp hours. Such temporal fluctuation in ranking can lead to inefficient or risky grid decisions, as wind resource priorities shift unpredictably. The low and steady TRSI of the proposed method demonstrates that the temporal smoothness regularization introduced in the composite loss function plays an effective role in preventing abrupt, non-physical rank shifts. This characteristic is especially beneficial during high-stakes hours such as 8:00–12:00 and 18:00–20:00, where system stress is higher and rank misalignment can cascade into operational instability.

Table 4 Wind regime sensitivity—forecast MAE by category.

Table 4 disaggregates forecast accuracy by wind regime, providing a fine-grained analysis of model robustness under variable atmospheric conditions. The wind regimes are categorized into Low (0–3 m/s), Medium (3–7 m/s), and High (7–12 m/s) velocity brackets. The proposed model achieves the lowest MAE across all three regimes, with values of 6.21 MW, 5.66 MW, and 6.12 MW respectively, showing excellent resilience and generalization. Notably, under the medium wind regime—typically the most challenging due to transition-state turbulence and complex dynamics—the proposed model outperforms LambdaMART (6.89 MW) and LSTM (7.91 MW) by more than 1.2 MW. This implies that the rank-aware learning not only helps with better ordinal information but also enhances robustness across physical operating conditions. In the low wind regime, where forecasting is traditionally difficult due to low signal-to-noise ratio, the model still maintains MAE around 6.2 MW, highlighting its suitability for deployment in geographically diverse areas. This consistent behavior underscores the value of regime-aware data structuring combined with temporal ranking regularization.

Table 5 Ablation study on composite loss components.

Table 5 dissects the contribution of each component in the composite loss function used for model training by incrementally removing specific loss terms. The full model, which incorporates prediction loss, ranking loss, and temporal rank stability loss, achieves the best performance across all metrics. Specifically, it achieves the lowest MAE of 5.92 MW and RMSE of 7.44 MW, while also securing the highest ranking-based scores—NDCG at 0.883 and MAP at 0.729. Temporal consistency is notably strong, with a TRSI (Temporal Rank Stability Index) of 0.189, indicating high smoothness in temporal ranking trajectories. Once the ranking loss term is removed, the model exhibits a noticeable deterioration in rank-based metrics. NDCG drops from 0.883 to 0.826 and MAP declines to 0.665, suggesting that the model’s ability to preserve relative ordering across wind sites is significantly impaired. Interestingly, TRSI increases slightly to 0.204, showing that rank instability becomes more pronounced in the absence of this component. Similarly, eliminating the temporal regularization term results in a decline in NDCG (0.854), MAP (0.701), and a spike in TRSI to 0.248, demonstrating that while prediction and ranking accuracy may be somewhat retained, temporal coherence becomes substantially degraded. A dramatic performance collapse occurs when the prediction loss component is excluded. MAE jumps to 9.11 MW and RMSE reaches 11.23 MW, the worst among all variants. Although the ranking-based metrics degrade more modestly (NDCG = 0.834, MAP = 0.617), the model effectively fails to capture accurate wind magnitude, emphasizing that ranking and temporal constraints alone are insufficient to produce meaningful forecast values. 
Lastly, the “Prediction Only” configuration–trained without any ranking or temporal loss–results in weak alignment to ground-truth rankings (NDCG = 0.795) and highest instability (TRSI = 0.295), indicating that while point predictions are not catastrophic (MAE = 6.22 MW), the model lacks consistency and ranking fidelity. Altogether, this ablation study strongly supports the composite loss formulation and empirically justifies each term’s inclusion.

Table 6 Site-level forecasting performance metrics across 10 wind farms.

Table 6 showcases the model’s forecasting and ranking performance disaggregated by individual wind farm sites. The proposed model demonstrates strong and consistent accuracy across spatially distributed locations, with only minor performance variation. Site-01 exhibits the best overall performance with a MAE of 5.21 MW and NDCG of 0.894, indicating high absolute accuracy and excellent ranking alignment to ground-truth power outputs. Its TRSI is also low at 0.185, signaling high stability in its temporal rank trajectory, which is particularly important for grid operators allocating resources across spatially distributed assets. Site-04, by contrast, registers the weakest performance in this cohort. Its MAE (6.12 MW) and RMSE (7.91 MW) are still within acceptable forecasting error bands, but its NDCG (0.843) and MAP (0.693) are slightly lower than peer sites. Notably, its TRSI climbs to 0.202, suggesting that the model’s predicted ranks exhibit modest instability over time for this specific site. These fluctuations may be attributed to localized meteorological noise or complex terrain-induced turbulence, which are not fully captured by the generic feature space. Encouragingly, the average metrics across all 10 sites further underscore the spatial generalizability of the proposed model. The average MAE of 5.94 MW and RMSE of 7.68 MW are substantially better than most recent baselines for multi-site wind forecasting under uncertainty. Ranking metrics also remain robust, with NDCG averaging 0.869 and MAP at 0.721. The average TRSI value of 0.191 confirms that the proposed model not only excels in magnitude prediction and inter-site ranking, but also maintains coherent rank evolution over time. This spatial evaluation affirms the model’s practical applicability for large-scale deployment, where consistent accuracy across diverse geographies is a critical prerequisite for real-world wind forecasting systems.

Conclusion

This paper develops a novel ranking-oriented learning framework for wind power forecasting that embeds both ranking awareness and temporal stability into a unified neural architecture. Unlike conventional regression-based models, the proposed method explicitly accounts for the relative ordering of wind farms—an essential feature for dispatch prioritization, reserve margin allocation, and market bidding in real-world grid operations. By optimizing a composite loss that balances prediction accuracy, rank fidelity, and temporal smoothness, the model achieves both numerical precision and operational coherence. Using a regime-annotated dataset of 12 wind farms over 24 months, the proposed framework demonstrates superior performance over LSTM, Transformer, and LambdaMART models, achieving a 34.6% improvement in rank stability and over 8% gain in NDCG. Such consistency directly translates into reduced redispatch frequency and more stable reserve scheduling for system operators. Methodologically, the study introduces a generalizable multi-objective learning paradigm applicable to other ranking-sensitive energy forecasting tasks.

Future work will extend this framework to multi-energy forecasting, incorporate real-time adaptive learning for evolving weather regimes, and develop distributed training to support large-scale deployment in national grid environments. Several promising research directions emerge from this study. First, future work will extend the proposed ranking-oriented probabilistic framework to multi-energy forecasting, coupling wind, solar, and storage systems within a unified spatiotemporal model. Second, we plan to incorporate real-time adaptive learning and continual fine-tuning mechanisms to enable the model to update dynamically under evolving weather and operating regimes. Third, integrating distributed and federated training strategies will enhance computational scalability and data privacy for large-scale, geographically dispersed forecasting networks. Finally, exploring market participation and operational decision modules—such as joint forecasting and bidding optimization—will further bridge the gap between predictive modeling and real-world energy market operations.