Introduction

Wind energy is now a central element of the global Renewable Energy (RE) mix, with an installed capacity of over 906 GW and contributing nearly 10% of global electricity production1. The optimization of wind turbine performance using Maximum Power Point Tracking (MPPT) algorithms remains essential for ensuring efficient energy capture under variable atmospheric conditions2. Large-scale Wind Energy (WE) systems generate multidimensional data streams from meteorological instrumentation, mechanical subsystems, electrical interfaces, and environmental monitors, resulting in feature spaces that frequently exceed 400–500 variables in utility-scale operations3.

Although high-dimensional data streams present opportunities to improve MPPT using data-driven methods, they also impose heavy computational demands on real-time control systems, which are constrained by latency requirements. The dimensionality problem in turbine monitoring introduces processing bottlenecks that impair control responsiveness, while redundant or unrelated variables contribute noise, reducing predictive reliability4. The central task is to isolate feature subsets that preserve predictive strength while lowering computational expense and removing redundancy5.

Feature selection (FS) in wind energy MPPT systems is a multi-objective task that demands a balance between predictive accuracy and computational efficiency. Using the complete set of measurements leads to prohibitively complex solutions, limiting their suitability for embedded controllers, whereas overly aggressive reductions risk degrading control quality. This trade-off remains difficult to address in the absence of systematic optimization models6.

The complexity of FS is exacerbated by the heterogeneity of wind farms, which span diverse geographical settings, turbine designs, and operating conditions. Temporal variability in wind regimes, nonlinear aerodynamic interactions, and changing environmental factors require methods that generalize across deployment scenarios while maintaining reliable real-time performance7.

The FS methods for WE applications can be grouped into several classes, each with inherent limitations. Filter Methods (FM), including mutual information ranking and correlation analysis, offer speed but neglect interactions vital for MPPT optimization. Wrapper Methods (WMs), such as Genetic Algorithms (GA) and Particle Swarm Optimization (PSO), capture dependencies but incur high computational costs and a risk of overfitting in real-time conditions8. Embedded methods, such as the Least Absolute Shrinkage and Selection Operator (LASSO), perform selection during training but remain tied to narrow model families and struggle with the nonlinearities inherent in WE data9. Hybrid methods aim to merge paradigms, though many lack structured integration and fail to account for the multi-objective requirements of the optimization task10.

Critical gaps include the absence of systematic multi-dataset validation models, insufficient consideration of temporal dependencies, and a lack of adaptive mechanisms for handling dynamic operational conditions across diverse WE contexts. To address these gaps, this research introduces a novel two-stage hybrid FS designed explicitly for WE-based MPPT applications. The methodology uniquely integrates Mutual Information (MI)-based statistical filtering for initial dimensionality reduction with an Adaptive Multi-Objective Binary Harmony Search (AMO-BHS) algorithm for fine-tuning the accuracy-efficiency trade-off.

Unlike conventional single-stage methods or static metaheuristics, the proposed model introduces three key innovations:

  1. Synergistic two-stage decomposition that achieves computational tractability (MI pre-filtering reduces search space complexity) while preserving multi-objective optimization capability;

  2. Adaptive AMO-BHS mechanisms including dynamic parameter control (power-law HMCR adjustment, exponential PAR decay), diversity preservation methods, and constraint-aware repair procedures precisely tailored for binary FS problems;

  3. Systematic multi-dataset validation across operationally heterogeneous environments (utility-scale temperate, tropical monsoon, laboratory experimental) that ensures methodological robustness beyond single-site evaluations. This integrated method addresses the identified gaps by developing algorithms and empirically validating them across diverse WE contexts.

The primary contributions include:

  (a) Development of a systematic hybrid model that integrates statistical and metaheuristic methods for multi-objective FS in WE systems,

  (b) Introduction of AMO-BHS with adaptive parameter control, diversity preservation, and constraint handling mechanisms designed for binary optimization problems,

  (c) Comprehensive validation across three distinct datasets representing experimental laboratory conditions, utility-scale operations, and tropical wind farm environments,

  (d) Statistical analysis demonstrating significant performance improvements over established baseline methods.

The methodology addresses the critical need for computationally efficient yet accurate FS to enhance MPPT performance across diverse operational conditions. The economic implications are substantial, as even modest improvements in wind energy conversion efficiency translate into significant revenue increases over the operational lifetimes of wind installations.

The remainder of this paper is structured as follows: Section “Literature review” reviews related work in FS and WE optimization. Section “Problem definition and formulation” presents the mathematical formulation of the problem. Section “Multi-dataset model and data preprocessing” describes the multi-dataset model and preprocessing procedures. Section “Methodology” details the proposed hybrid model. Section “Experimental setup” outlines the experimental setup and evaluation metrics. Section “Results” presents comprehensive results and performance analysis. Section “Conclusion and future works” discusses implications and limitations and concludes with future research directions.

Literature review

Multi-objective feature selection models (MOFSM)

Recent studies have increasingly emphasized multi-objective optimization in WE prediction. Alharthi et al.11 introduced a Non-dominated Sorting Genetic Algorithm-III (NSGA-III) that integrates FS with a hybrid Deep Residual Network-Long Short-Term Memory (DRN–LSTM) model, embedding a data transformation step before training. Their method achieved substantial gains, with error metrics reduced to MSE = 2.65 × 10⁻¹⁰ and RMSE = 1.63 × 10⁻⁵, outperforming conventional baselines. This work demonstrated the value of Pareto-based search in maintaining predictive accuracy while enhancing computational efficiency, thereby positioning NSGA-III as a benchmark for MOFSM in RE.

Cai et al.12 developed an Orthogonalized Maximal Information Coefficient (OMNIC) filter combined with an adaptive fractional Pareto motion model. By modelling long-range dependence among meteorological variables, they reported reductions of over 9% in Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). Collectively, these studies highlight the increasing importance of integrating multi-objective optimization with information-theoretic measures to help more reliable feature subset discovery in wind prediction.

Harmony search algorithms (HSA) for feature selection (FS)

Beyond genetic approaches, Harmony Search (HS) and its extensions have been applied as competitive metaheuristics for FS. Gholami et al.13 proposed the Improved Binary Global Harmony Search (IBGHS), introducing modified improvisation rules that accelerated convergence. Tests on 18 benchmarks showed performance comparable to that of leading methods, such as GA, PSO, and Ant Colony Optimization (ACO), while preserving the algorithmic simplicity of HS.

Shi et al.14 tailored Binary Harmony Search (BHS) for channel selection in brain–computer interfaces, reporting significant improvements in accuracy (p < 0.05) and reduced computational cost. The study established BHS as a practical tool for high-dimensional, noise-sensitive domains, with implications for WE-based Supervisory Control and Data Acquisition (SCADA) analysis.

Building on these ideas, Wang et al.15 developed an Equilibrium Optimization-based Harmony Search Algorithm (HSA) with Nonlinear Dynamic Domains (EO-HS-NDD). By adaptively balancing exploration and exploitation, EO-HS-NDD surpassed nine HS variants and several Differential Evolution (DE) implementations on Congress on Evolutionary Computation (CEC) benchmarks and real-world problems, confirming its robustness and scalability. Together, these works establish HSA as a strong candidate for binary optimization in large feature spaces.

Application-oriented feature selection in wind energy

The FS has also been embedded within specialized prediction tasks. Baisthakur and Fitzgerald16 combined MI ranking with recursive feature addition to improve LSTM-based blade tip deformation prediction, reducing dependence on high-cost network simulations and enabling virtual sensing for turbine management.

Zhou et al.17 integrated Bayesian Optimization with Boruta and LightGBM for grid frequency prediction, surpassing the Least Absolute Shrinkage and Selection Operator (LASSO) and Recursive Feature Elimination (RFE) in terms of accuracy and robustness under fault conditions.

Wang et al.18 introduced a pre-filtering scheme for SCADA data, linking MPPT approaches with rotor–pitch–speed distributions to reinforce multivariate power prediction. Belletreche et al.19 extended the scope to desert climates, employing hybrid attention-driven Deep Neural Networks (DNNs) with FS analysis, which lowered the RMSE by 22.9%. Chinnappan and A.20 proposed a hybrid Machine Learning (ML)-based physics-informed neural networks (ML–PINN)–Simulink network that combines statistical accuracy with physical interpretability and is validated under climate change scenarios in southern India. Collectively, these contributions highlight the significance of domain-aware preprocessing and hybrid coupling with physical or neural models, ensuring that feature subsets remain efficient and physically interpretable.

Recent advances in RE optimization have extended wrapper-based FS methodologies beyond traditional GA. Chen et al.21 verified PSO-based Integral Backstepping for sensorless MPPT control in photovoltaic systems, achieving 99.99% efficiency through metaheuristic parameter optimization that eliminates environmental sensor dependencies while maintaining robust tracking performance. Harrison et al.22 introduced hybrid analytical-ML models combining Bayesian regularization neural networks with fractional-area MPPT algorithms, outperforming conventional metaheuristics (PSO, Grey Wolf Optimizer (GWO), Differential Evolution (DE)) in convergence speed and stability. Abdelsattar et al.23 applied STATCOM to improve voltage stability in the Egyptian WE grid, demonstrating intelligent control system enhancements enabled by optimization-driven parameter tuning.

ML and Deep Learning (DL) have proven effective for RE prediction tasks: Emad-Eldeen et al.24 achieved improved prediction accuracy in energy storage performance prediction from data-driven Feature Extraction (FE), while Abdelsattar et al.25 developed advanced ML for Solar Energy (SE) generation prediction and fault detection, highlighting feature engineering as critical for predictive accuracy. Metaheuristic optimization methods have successfully addressed hybrid RE systems: Abdelsattar et al.26 introduced the mountain gazelle optimizer for standalone power system sizing, and Abdelsattar et al.27 compared multiple meta-heuristic algorithms for optimal hybrid model configuration in remote regions. El Sayed et al.28 integrated Artificial Intelligence (AI) fuzzy control with optimization of the lightning attachment method for grid-connected wind/PV systems, thereby enhancing low-voltage ride-through and power quality. These studies exemplify contemporary trends toward adaptive WM and intelligent optimization methods that balance solution quality with computational constraints—paralleling the proposed integration of multi-objective harmony search with statistical filtering for WE applications, where dimensional tractability and real-time feasibility similarly drive algorithmic design choices.

ML and intelligent control methods have been extensively applied to wind energy MPPT optimization. Govinda Chowdary et al.29 developed hybrid fuzzy-logic-based MPPT controllers for WE conversion systems, demonstrating that intelligent control approaches can adapt to nonlinear wind dynamics more effectively than conventional methods. Mariprasath et al.30 designed Artificial Neural Network (ANN) controllers for wind power plants, achieving energy efficiency enhancements from data-driven parameter optimization. Recent advances in nature-inspired metaheuristics have expanded optimization capabilities: Anbazhagan et al.31 introduced the Egret Swarm Algorithm for hybrid solar-wind systems, demonstrating that bio-inspired optimization can navigate complex multi-modal search spaces in RE applications. These studies exemplify the increasing integration of computational intelligence and metaheuristic optimization in WE control, contextualizing our AMO-BHS method within contemporary trends toward adaptive, data-driven MPPT methodologies.

Summary and gap identification

The literature highlights three converging themes: (a) the use of multi-objective evolutionary selection (e.g., NSGA-III, IBGHS, EO-HS-NDD) in balancing accuracy–efficiency trade-offs, (b) the importance of information-theoretic filters (e.g., OMNIC, MI) in ranking relevance under non-linear dependencies, and (c) the integration of domain-specific preprocessing and Hybrid Neural Networks (HNNs) to ensure predictive accuracy and operational interpretability.

However, existing work is limited by either single-dataset evaluations or static parameter settings in metaheuristics, which can lead to overfitting and reduce the generalizability of results. Few studies provide systematic multi-dataset validation across heterogeneous wind environments or explicitly integrate adaptive mechanisms for parameter control and diversity preservation. These gaps motivate the development of a hybrid MI–AMO-BHS model, which embeds statistical filtering and adaptive harmony search into a unified FS method for MPPT optimization.

Problem definition and formulation

WE systems generate extensive high-dimensional sensor data derived from meteorological, mechanical, electrical, and environmental channels. Such data complexity poses significant computational challenges for real-time MPPT, where prediction speed and accuracy are critical. The essential problem addressed in this study is selecting optimal feature subsets that retain predictive accuracy while reducing computational cost and eliminating redundancy. This challenge is formalized as a constrained multi-objective optimization problem tailored to WE deployment scenarios. Mathematical symbols and notations used throughout this manuscript are defined in Table 1 (Nomenclature) for reference.

Table 1 Nomenclature.

The complete monitoring feature set is expressed as Eq. (1)

$$X=\left\{{x}_{1},{x}_{2},\dots ,{x}_{d}\right\}$$
(1)

where,

  • \(X\)→ The comprehensive set of available measurements

  • \(d\)→ The total feature dimensionality

  • each \({x}_{i}\) →The operational variable such as Wind Speed (WS), wind direction, turbulence intensity, generator torque, rotor speed, blade pitch angle, ambient temperature, air density, humidity, active power, reactive power, or voltage.

FS is represented by a binary decision vector, Eq. (2)

$$z={\left[{z}_{1},{z}_{2},\dots ,{z}_{d}\right]}^{T}\in \{\text{0,1}{\}}^{d}$$
(2)

where,

  • \({z}_{i}=1\) → Inclusion of feature \({x}_{i}\)

  • \({z}_{i}=0\) → Exclusion of feature \({x}_{i}\)

The optimal feature subset is thus defined as, Eq. (3)

$${F}^{*}=\left\{{x}_{i}\in X\mid {z}_{i}=1\right\}$$
(3)

The FS simultaneously optimizes two conflicting objectives: minimizing prediction loss and minimizing the subset’s cardinality.

The problem is formulated as Eq. (4)

$$\underset{z\in \{\text{0,1}{\}}^{d}}{\text{min}} J(z)=\left[{L}_{M}(z),\frac{1}{d}\sum_{i=1}^{d} {z}_{i}\right]$$
(4)

where,

  • \({L}_{M}(z)\) →The validation loss obtained by predictive model \(M\) trained on the subset \({F}^{*}\)

  • \(\frac{1}{d}\sum_{i=1}^{d} {z}_{i}\) →The normalized feature count (fraction of selected features), which ensures comparability across different dimensionalities.

Practical deployment imposes additional feasibility constraints. A minimum feature-count constraint ensures that a sufficient number of critical variables are retained for stable MPPT control, Eq. (5)

$$\sum_{i=1}^{d} {z}_{i}\ge {n}_{\text{min}}$$
(5)

where, \({n}_{\text{min}}\)→The minimum number of features required to guarantee control stability and observability of the WE system.

Conversely, computational latency constraints require an upper bound, Eq. (6)

$$\sum_{i=1}^{d} {z}_{i}\le {n}_{\text{max}}$$
(6)

where, \({n}_{\text{max}}\)→The computationally permissible feature count for real-time processing.

The formulation assumes binary feature selection (features are either fully included or excluded), mutual information as a valid measure of feature relevance, and temporal stability of feature-target relationships within the training period. The cardinality constraints (Eqs. 5, 6) represent operational requirements: \({n}_{min}\) ensures sufficient variables for control stability and observability, while \({n}_{max}\) enforces computational feasibility for real-time SCADA processing.
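As an illustration, the constrained bi-objective evaluation of Eqs. (2)–(6) can be sketched in Python. The `model_loss` callable is a placeholder for the predictive model \(M\), and the constraint bounds below are illustrative values rather than the paper's settings; infeasible masks are simply rejected here, whereas the proposed method repairs them.

```python
import numpy as np

def evaluate_subset(z, X, y, model_loss, n_min=3, n_max=50):
    """Evaluate the two objectives of Eq. (4) for a binary mask z.

    z          : binary inclusion vector of length d (Eq. 2)
    X, y       : feature matrix (n_samples x d) and MPPT target
    model_loss : callable returning the validation loss L_M for a
                 given feature subset (placeholder for model M)
    Returns (L_M(z), normalized cardinality), or None if the mask
    violates the cardinality constraints of Eqs. (5)-(6).
    """
    z = np.asarray(z, dtype=int)
    k = int(z.sum())
    if k < n_min or k > n_max:     # feasibility check, Eqs. (5)-(6)
        return None
    subset = X[:, z == 1]          # F* of Eq. (3)
    f1 = model_loss(subset, y)     # prediction loss L_M(z)
    f2 = k / z.size                # normalized feature count
    return f1, f2
```

In a Pareto-based search, both returned objectives are minimized jointly rather than combined into a single weighted score.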

Multi-dataset model and data preprocessing

Data sources

The data sources and collection procedures provide the empirical basis for evaluating metaheuristic-optimized DL models for MPPT-based WE conversion across 3 distinct datasets spanning utility-scale operations, laboratory conditions, and operational environments. Each dataset provides independent validation of the proposed optimization model’s applicability while contributing unique insights into algorithm performance under specific operational contexts. The three-dataset model ensures comprehensive analysis of the complete methodology across diverse WE scenarios while maintaining clear experimental limits and reproducible evaluation protocols.

Dataset 1: Kelmarsh Wind Farm (Public Dataset)

The Kelmarsh Wind Farm dataset32 contains 4.5 GB of 10-min interval time-series data collected from a 6-turbine wind farm in the UK between January 2016 and June 2021. It includes 287,897 PMU records across 111 electrical parameters (active power: 0–2100 kW, reactive power: –800 to 800 kVAR, voltage: 380–420 V, frequency: 49.5–50.5 Hz), and 289,152 Grid Meter records spanning 14 import/export metrics. Each turbine is represented by 36 files, totaling 464 sensor streams. These sensor streams include wind speed (0–30 m/s with ± 0.1 m/s accuracy), rotor speed (0–25 rpm with ± 0.01 rpm precision), and pitch angle (0–90° with 0.1° resolution). SCADA logs document 2,122 events, including 1,797 informational, 168 warnings, 129 stops, and 28 communication failures, aligned with IEC classification standards.

Dataset 2: Reverse-Acting Controller-Wind Energy Conversion System (RAC-WECS)- (Public Dataset)

The RAC-WECS dataset33 captures high-resolution experimental data from a small-scale turbine using a multilevel boost converter MPPT control method over a 76.6-s trial, sampled at 20 Hz, resulting in a total of 299.5 KB. It includes 5 time-series streams: output voltage (1,515 records; –0.95 to 150.4 V; mean 73.9 V), output current (1,532 records; 0.024–0.75 A; mean 0.44 A), rotor speed (1,498 records; 0.21–111.4 rad/s; mean 75.9 rad/s), reference speed setpoints (1,478 records; 58.9–109.5 rad/s; Mean 87.9 rad/s), and PWM duty cycles (1,459 records; 5.0–90.0%; mean 33.3%). The experiment utilizes a structured wind profile (5 → 7 → 6 → 4 → 7 m/s) to evaluate control performance under varying conditions, with turbine specifications held constant (0.6 m rotor radius, 1.13 m2 swept area, tip speed ratio of 9.1). A MATLAB analysis model supports the prediction of empirical power coefficients, enabling comprehensive validation of MPPT control usage.
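For intuition, the derived quantities mentioned above can be reproduced from the logged channels. The air density value below is a standard-atmosphere assumption not stated in the dataset description, and computing \(C_p\) from electrical output folds converter and generator losses into the coefficient, so this is an illustrative sketch rather than the dataset's MATLAB analysis model.

```python
RHO = 1.225   # air density (kg/m^3); standard-atmosphere assumption
R = 0.6       # rotor radius (m), from the dataset description
A = 1.13      # swept area (m^2), from the dataset description

def tip_speed_ratio(omega, v):
    """lambda = omega * R / v from logged rotor speed (rad/s)
    and wind speed (m/s)."""
    return omega * R / v

def power_coefficient(p_elec, v):
    """Cp = P / (0.5 * rho * A * v^3); using electrical output
    lumps drivetrain losses into the coefficient."""
    return p_elec / (0.5 * RHO * A * v ** 3)
```

As a sanity check, the mean rotor speed (75.9 rad/s) at the 5 m/s segment gives a tip speed ratio of about 9.1, consistent with the value stated for the experiment.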

Dataset 3: VV Wind Farms (Collected Dataset)

The VV Dataset comprises 12 months (January–December 2023) of high-quality 10-min interval data from two turbines (UD-ET-07: 2.1 MW, DFIM; UD-ET-11: 1.5 MW, SCIG) at VV Wind Farms, Tamil Nadu, totalling ~ 105,000 records across 87 features under tropical monsoon conditions. Each turbine contributed ~ 52,560 samples spanning meteorological (e.g., wind speed: 0–60 m/s, accuracy ± 0.1 m/s), rotor, generator, pitch, yaw, control, electrical, turbulence, wind shear, air density, and derived performance parameters (e.g., MPPT accuracy, tip speed ratio, Cp, η). Measurements adhered to IEC standards using calibrated anemometers, wind vanes, RTDs, barometers, torque sensors, and 0.2S-class power analyzers. Data availability exceeded 90%, with rigorous validation pipelines for outlier removal, synchronization, normalization, and sensor redundancy checks. The dataset enables the robust evaluation of MPPT optimization and FS methodologies in real-world settings across diverse operational states and seasonal regimes.

Dataset selection rationale

The three datasets were strategically selected to validate FS robustness across operationally heterogeneous WE contexts, addressing the critical limitation that existing studies evaluate only single sites and therefore risk overfitting to site-specific features.

The selection criteria ensure coverage of 3 key dimensions of operational variability:

  A. Scale and Complexity Gradient: Kelmarsh (464 features, utility-scale 6-turbine farm) represents high-dimensional industrial deployments where FS is most critical for computational tractability. RAC (5 features, laboratory experimental) represents minimal-complexity controlled environments in which feature interactions are transparent and ground truth is verifiable. VV (87 features, operational dual-turbine farm) represents an intermediate level of complexity, typical of modern commercial installations. This gradient (5 → 87 → 464 features) enables a systematic test of how algorithmic performance scales with dimensionality.

  B. Environmental and Climatic Diversity: Kelmarsh captures temperate maritime conditions (UK, moderate wind variability, seasonal patterns, offshore-influenced regimes) representing the dominant deployment context for European and North American wind farms. VV captures tropical monsoon conditions (Tamil Nadu, India; high humidity; rapid wind regime transitions; extreme seasonal variability), representing emerging markets in South and Southeast Asia where FS must accommodate diverse meteorological dynamics. RAC provides controlled laboratory conditions with minimal environmental noise, serving as a baseline for algorithm validation under ideal conditions. This diversity ensures the methodology generalizes beyond single climatic contexts.

  C. Data Accessibility and Reproducibility: Kelmarsh and RAC are publicly available datasets (Zenodo https://doi.org/10.5281/zenodo.5841834; Mendeley https://doi.org/10.17632/363d24mcb6.2), commonly used in WE research, enabling direct comparison with the existing literature and helping reproducibility. VV represents proprietary operational data from a commercial wind farm under a data-sharing agreement (VV-RA-2023-047), providing real-world validation beyond publicly curated benchmarks that may be subject to selection bias.

  D. MPPT Control Paradigm Coverage: Kelmarsh employs Grid-connected Doubly-Fed Induction Generator (DFIG) turbines with pitch control and variable-speed operation, representing modern utility-scale MPPT. RAC implements a multilevel boost-converter MPPT for small-scale standalone systems, applicable to off-grid and distributed generation contexts. VV includes DFIG (UD-ET-07) and Squirrel-Cage Induction Generator (SCIG) (UD-ET-11) configurations, capturing control heterogeneity within single installations. This coverage ensures FS applicability across diverse MPPT control strategies.

Collectively, these 3 distinct datasets provide comprehensive validation spanning nearly two orders of magnitude in feature dimensionality (5–464), 3 climatic regimes (temperate, tropical, controlled), 2 accessibility types (public, proprietary), and 3 MPPT paradigms (grid-connected DFIG, standalone converter, mixed DFIG/SCIG). This dataset selection directly addresses the multi-dataset validation gap identified in Section “Summary and gap identification”, ensuring that performance improvements are methodologically robust rather than artifacts of single-site optimization.

Data preprocessing

The data preprocessing model transforms raw WE measurements into analysis-ready formats while preserving the information content relevant to MPPT optimization. The imputation methodology assumes missing data mechanisms are random (not systematically related to unobserved values), which enables valid statistical inference from imputed values. Missing data analysis revealed incompleteness levels of 2.3% (Kelmarsh), 0.8% (RAC), and 8.9% (VV). A duration-based imputation method applied linear interpolation to short gaps (< 30 min, 65% of instances), seasonal-trend decomposition to medium gaps (30 min–2 h, 28%), and multivariate regression to extended gaps (> 2 h, 7%). Turbine-month segments with cumulative missingness exceeding 15% were excluded from analysis to maintain data quality standards. Outlier detection used the modified z-score method, identifying nearly 3.2% of measurements as outliers. These combined measures restored temporal integrity and raised final effective completeness to approximately 95% or higher across all datasets.
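A minimal sketch of the modified z-score detector used for outlier screening. The 0.6745 consistency constant and the 3.5 cutoff are the conventional choices for this method; the text does not specify the exact settings used.

```python
import numpy as np

def modified_zscore_outliers(x, threshold=3.5):
    """Flag outliers with the modified z-score, which replaces the
    mean/standard deviation with the median and the median absolute
    deviation (MAD) for robustness to the outliers themselves."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:                      # degenerate case: no spread
        return np.zeros(x.shape, dtype=bool)
    m = 0.6745 * (x - med) / mad      # modified z-scores
    return np.abs(m) > threshold
```

Flagged samples would then be treated as missing and handed to the duration-based imputation described above.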

Temporal alignment standardized datasets to a 10 min reference grid, anchored to the Kelmarsh sampling interval. The RAC dataset, initially sampled at 20 Hz, was decimated using an 8th-order Butterworth anti-alias filter with a 0.5 Hz cutoff, and then aggregated to a 1 s resolution to preserve short-term dynamics. The VV dataset required timestamp correction for 2.1% irregular intervals, using limited interpolation to restore nominal 10 min spacing. All datasets were standardized to Coordinated Universal Time (UTC), including correction of Kelmarsh daylight saving offsets and conversion of VV data from Indian Standard Time (UTC + 5:30).
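The RAC decimation step can be sketched with SciPy as follows. Zero-phase filtering (`sosfiltfilt`) and mean aggregation over 1 s windows are assumptions, since the text specifies only the filter order (8th) and cutoff (0.5 Hz).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def antialias_decimate(x, fs=20.0, cutoff=0.5, order=8, out_rate=1.0):
    """Low-pass filter then downsample, mirroring the RAC pipeline:
    an 8th-order Butterworth at 0.5 Hz before aggregation to 1 s.
    Second-order sections keep the high-order filter numerically
    stable; zero-phase filtering avoids introducing time lag."""
    sos = butter(order, cutoff / (fs / 2.0), btype="low", output="sos")
    filtered = sosfiltfilt(sos, np.asarray(x, dtype=float))
    step = int(round(fs / out_rate))
    n = (len(filtered) // step) * step
    # aggregate each output-period window by its mean
    return filtered[:n].reshape(-1, step).mean(axis=1)
```

For a 20 Hz input, each output sample summarizes 20 filtered samples, preserving short-term dynamics below the 0.5 Hz cutoff.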

Feature scaling applied z-score normalization across all variables. To address the bimodal distribution in power output, the Kelmarsh and VV datasets were subjected to a Box–Cox transformation (λ = 0.23) before normalization. The VV validation dataset used normalization parameters derived from the Kelmarsh training set, ensuring consistency and preventing data leakage across experimental phases. Following z-score normalization, the target variables (Kelmarsh active power in kW, VV active power in kW, and RAC rotor speed in RPM) were standardized to zero mean and unit variance. All RMSE values reported in subsequent sections are computed on the original physical scales (kW or RPM) after inverse-transforming normalized predictions to ensure interpretability and comparability across datasets.
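A sketch of the scaling convention described above, with parameters fitted on the training split and reused downstream; λ = 0.23 is the reported Box–Cox parameter, while the helper names are illustrative. Note that the Box–Cox transform requires strictly positive values, so zero-power records would need a small offset in practice.

```python
import numpy as np
from scipy.stats import boxcox

LAM = 0.23  # Box-Cox parameter reported for the power channels

def fit_scaler(train_power):
    """Fit Box-Cox + z-score parameters on the training split only,
    so that validation data reuse them (as done for VV with
    Kelmarsh-derived statistics)."""
    t = boxcox(np.asarray(train_power, dtype=float), lmbda=LAM)
    return t.mean(), t.std()

def apply_scaler(x, mean, std):
    """Transform new (strictly positive) data with training-derived
    parameters, avoiding leakage across experimental phases."""
    t = boxcox(np.asarray(x, dtype=float), lmbda=LAM)
    return (t - mean) / std
```

Inverting this pair of transforms on model predictions recovers the original physical scale used for the reported RMSE values.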

Following preprocessing, the three datasets retained characteristics reflective of their operational contexts. The Kelmarsh dataset achieved 96.5% effective completeness across 464 features with a standardized 10 min resolution, enabling robust evaluation of utility-scale MPPTs. The RAC dataset achieved 99.0% completeness across 5 variables at 1 s resolution, providing controlled laboratory precision with minimal corrections required. The VV dataset achieved 94.8% effective completeness across 87 features at a 10 min resolution, preserving tropical operational variability that is significant for monsoon-influenced wind analysis.

Overall, the preprocessing model delivered datasets aligned to their intended analytical purposes: Kelmarsh and VV maintained utility-scale temporal granularity, while RAC preserved high-frequency experimental resolution. This multiscale preprocessing design ensures methodological consistency while supporting comprehensive FS validation across laboratory, utility-scale, and tropical deployment environments.

Methodology

The proposed hybrid model

The proposed model, outlined in Fig. 1, follows a structured two-stage pipeline. It begins with processing multi-dataset input, where the Kelmarsh (464 features), RAC (5 features), and VV Farms (87 features) datasets undergo standardized preprocessing to ensure temporal alignment and consistent quality. This stage addresses missing values using duration-based imputation, removes outliers through modified z-score detection, and applies z-score normalization to standardize feature scales, thereby preparing the data for subsequent analysis.

Fig. 1. Proposed model.

Hybrid optimization is then conducted in two stages that integrate statistical filtering with metaheuristic search. In Stage 1, MI filtering with Silverman’s bandwidth rule ranks features by relevance to MPPT control variables, and the top 40% (β = 0.4) are retained to reduce dimensionality while preserving the most informative variables. In Stage 2, the Adaptive Multi-Objective Binary Harmony Search (AMO-BHS) algorithm explores the reduced feature space, using binary encoding with constraint repair to maintain feasibility. The method generates Pareto-optimal feature subsets that capture trade-offs between predictive accuracy and computational complexity, enabling selection of configurations based on operational requirements and hardware constraints.
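The constraint-aware repair used to keep binary harmony vectors feasible might look as follows. Using MI scores to choose which bits to flip is an illustrative heuristic, not necessarily the exact AMO-BHS repair rule, and the bound values shown are placeholders.

```python
import numpy as np

def repair(z, scores, n_min=3, n_max=20):
    """Repair a binary harmony vector so that
    n_min <= sum(z) <= n_max (Eqs. 5-6).

    scores : per-feature relevance values (e.g., MI scores) used to
             decide which bits to flip; an illustrative choice.
    """
    z = np.asarray(z, dtype=int).copy()
    k = int(z.sum())
    if k < n_min:
        # too few features: switch on the best excluded ones
        off = np.flatnonzero(z == 0)
        add = off[np.argsort(scores[off])[::-1][: n_min - k]]
        z[add] = 1
    elif k > n_max:
        # too many features: switch off the weakest included ones
        on = np.flatnonzero(z == 1)
        drop = on[np.argsort(scores[on])[: k - n_max]]
        z[drop] = 0
    return z
```

Repair (rather than rejection) keeps every improvised harmony usable, which preserves search diversity under tight cardinality bounds.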

Theoretical rationale for two-stage decomposition

The two-stage network is motivated by principles of computational complexity theory and information theory. Pure metaheuristic optimization over the entire feature space faces exponential search complexity in the number of features (a space of \(2^{d}\) candidate subsets), rendering exhaustive or near-exhaustive exploration intractable even for moderate feature sets (e.g., Kelmarsh’s 464 features). Directly applying AMO-BHS to this space would require prohibitively large population sizes and iteration budgets to achieve adequate coverage.

  A. Stage 1 (MI filtering) provides computational tractability using dimensionality reduction grounded in information theory. MI quantifies the statistical dependence between the feature and the target, capturing linear and nonlinear relationships without making parametric assumptions (Cover & Thomas, 2006). By retaining only features with \(R({x}_{i})\ge {\tau }_{\text{MI}}\), this study eliminates statistically independent or weakly relevant variables, reducing search dimensionality from \(d\) to approximately \(\beta d\) while preserving information content. For Kelmarsh (\(d=464\), \(\beta =0.4\)), this reduces the search space from \({2}^{464}\) to roughly \({2}^{186}\) candidate subsets, a reduction of more than 80 orders of magnitude, making metaheuristic search computationally feasible.

  B. Stage 2 (AMO-BHS) provides optimization capability that MI filtering alone cannot achieve. While MI ranking identifies univariate feature-target relevance, it cannot account for: (1) feature redundancy—two highly relevant features may provide overlapping information; (2) feature complementarity—individually weak features may improve prediction when combined; (3) accuracy-efficiency trade-offs—different applications require different optimal subset sizes. Multi-objective metaheuristic search addresses these limitations by exploring feature combinations within the reduced space and explicitly optimizing prediction accuracy (f₁) and computational cost (f₂) via Pareto-based evaluation.

Synergistic integration achieves capabilities unattainable by either stage alone: MI filtering transforms an intractable optimization into a manageable problem, while AMO-BHS refines selection beyond univariate ranking to discover interaction effects and generate deployment-flexible Pareto fronts. This decomposition is theoretically grounded in divide-and-conquer optimization: reduce problem dimensionality (Stage 1), then solve the refined problem (Stage 2), avoiding the “curse of dimensionality” inherent in direct high-dimensional search.
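The Pareto-based evaluation of candidate subsets reduces to non-dominated filtering over \((f_1, f_2)\) pairs, both minimized. A minimal sketch:

```python
def dominates(q, p):
    """q dominates p: no worse in both objectives and strictly
    better in at least one (both objectives minimized)."""
    return q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])

def pareto_front(points):
    """Return the non-dominated subset of (f1, f2) pairs, e.g.
    (prediction loss, normalized feature count) per candidate."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

The surviving points form the deployment-flexible front from which an operator picks a subset matching hardware and latency constraints.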

Stage 1: statistical filtering using mutual information

To reduce the combinatorial burden of FS before metaheuristic optimization, the hybrid model’s first stage employs MI-based statistical filtering. This method ranks features according to their statistical dependence on the target variable and retains only the most informative features for subsequent optimization.

The MPPT control target is formally defined as Eq. (7)

$$y={f}_{\text{MPPT}}(x)$$
(7)

where,

  • \(y\) →The control signal (e.g., rotor speed or power reference),

  • \(x\in {\mathbb{R}}^{d}\)→The input feature vector

  • \({f}_{\text{MPPT}}(\cdot )\) →The control function derived from operational turbine data collected under normal wind farm conditions.

For each feature \({x}_{i}\in X\), its relevance to the target is quantified by MI, Eq. (8)

$$R\left({x}_{i}\right)=I\left({x}_{i};y\right)$$
(8)

Generating the relevance score set, Eq. (9)

$$R=\left\{R\left({x}_{1}\right),R\left({x}_{2}\right),\dots ,R\left({x}_{d}\right)\right\}$$
(9)

Features are then ranked in descending order of MI.

The rank of each feature is defined as, Eq. (10)

$$\text{Rank}\left({x}_{i}\right)=\left|\left\{{x}_{j}\in X:R\left({x}_{j}\right)>R\left({x}_{i}\right)\right\}\right|+1$$
(10)

A filtering threshold \({\tau }_{\text{MI}}\) is applied to retain only the most informative features, Eq. (11)

$${X}_{\text{filtered }}=\left\{{x}_{i}\in X:R\left({x}_{i}\right)\ge {\tau }_{\text{MI}}\right\}$$
(11)

The threshold is determined using a percentile-based rule that retains a fixed proportion of top-ranked features, Eq. (12)

$${\tau }_{\text{MI}}=\text{Percentile}(R,100\times (1-\beta ))$$
(12)

where, \(\beta \in (0,1]\)→The feature retention ratio.

Based on cross-validation sensitivity analysis across the three datasets, \(\beta =0.4\) was selected, corresponding to retention of the top \(60\text{\%}\) of features.

To compute MI, bandwidth parameters for kernel density estimation are defined using Silverman’s rule, which provides improved robustness to skewed and heavy-tailed distributions commonly encountered in WE data, compared to Gaussian-assuming rules such as Scott’s, as shown in Eqs. (13) and (14).

$${h}_{x}=0.9\cdot \text{min}\left({\sigma }_{x},{\text{IQR}}_{x}/1.34\right)\cdot {n}^{-1/5}$$
(13)
$${h}_{y}=0.9\cdot \text{min}\left({\sigma }_{y},{\text{IQR}}_{y}/1.34\right)\cdot {n}^{-1/5}$$
(14)

where,

  • \(\sigma\), IQR →The sample standard deviation and interquartile range of the input and target variables

  • \(n\) →The number of observations.

MI estimation employs kernel density estimation implemented via sklearn.feature_selection.mutual_info_regression (scikit-learn 1.0.2), which uses k-nearest neighbors (k-NN) entropy estimation with k = 3 neighbors for continuous variable pairs. The estimator computes joint and marginal entropies H(X), H(Y), and H(X, Y) through adaptive partitioning of the feature-target space. Silverman’s bandwidth rule (Eqs. 13–14) provides initial kernel width parameters; the implementation automatically clips bandwidth to the range [0.01σ, 2.0σ] to prevent numerical instability at distribution tails and ensure robust density estimation. Edge effects at domain limits are handled through reflection padding in the kernel density computation. All MI calculations were performed on a CPU setup, as the moderate dimensionality retained after 40% filtering (185 features for Kelmarsh, 21 for VV, 2 for RAC) remained computationally tractable without requiring GPU acceleration. The filtered feature set \({X}_{\text{filtered}}\) serves as the reduced search space for Stage 2 optimization, detailed in the subsequent section.
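The ranking and percentile-threshold steps of Stage 1 (Eqs. 8–12) can be sketched with scikit-learn’s `mutual_info_regression`; the synthetic data, seed, and helper name `mi_filter` are illustrative, and the bandwidth handling described above is internal to the estimator:

```python
# Sketch of Stage 1 (Eqs. 8-12): MI relevance scoring and percentile
# filtering with beta = 0.4 (retain the top 60% of features). The data
# below are synthetic placeholders for a SCADA feature matrix.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_filter(X, y, beta=0.4, random_state=0):
    """Return indices of retained features, ranked by descending MI."""
    # R(x_i) = I(x_i; y), k-NN entropy estimate with k = 3 (Eq. 8)
    relevance = mutual_info_regression(X, y, n_neighbors=3,
                                       random_state=random_state)
    # tau_MI = Percentile(R, 100 * (1 - beta))  (Eq. 12)
    tau_mi = np.percentile(relevance, 100 * (1 - beta))
    keep = np.flatnonzero(relevance >= tau_mi)          # Eq. 11
    return keep[np.argsort(relevance[keep])[::-1]], relevance

rng = np.random.default_rng(0)
X = rng.random((500, 20))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.random(500)
selected, scores = mi_filter(X, y)
print(len(selected))   # size of the retained subset
```

Note that features with tied (often zero) MI estimates at the threshold may all be retained, so the retained fraction can slightly exceed β on sparse data.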

Stage 2: adaptive multi-objective binary harmony search

The second stage of the hybrid model is applied to the dimensionally reduced feature space \({X}_{\text{filtered}}\), obtained by the MI-based statistical filtering method. This stage uses the proposed AMO-BHS algorithm to identify optimal feature subsets that balance predictive accuracy and computational efficiency. The algorithm preserves the core principles of classical HS while introducing critical improvements explicitly designed for binary FS in WE-based MPPT applications. These improvements include discrete encoding operators, multi-objective evaluation without scalarization, adaptive parameter adjustment, diversity preservation, elite retention, and adaptive local refinement. Together, these modifications address the significant shortcomings of traditional HS, rendering AMO-BHS a domain-specific optimization tool.

Classical harmony search foundation and limitations

Classical HS is a population-based metaheuristic inspired by the improvisation method of musicians seeking aesthetically pleasing harmonies. In the computational adaptation, each candidate solution (harmony) corresponds to a decision vector. For binary FS, a harmony is encoded as a binary vector \({z}^{(i)}\in \{\text{0,1}{\}}^{\left|{X}_{\text{filtered}}\right|}\), where each element of the vector represents inclusion (\({z}_{i}=1\)) or exclusion ( \({z}_{i}=0\) ) of the corresponding feature. The algorithm maintains a Harmony Memory (HM) of size HMS, which stores candidate solutions evaluated in previous iterations.

New harmonies are formed through three probabilistic mechanisms.

(a) Memory consideration samples values from the HM, allowing the algorithm to exploit previously identified high-quality solutions.

(b) Pitch adjustment introduces local modifications to the selected values through controlled bit-flip operations, mimicking the fine-tuning of musical notes.

(c) Random generation creates entirely new values, ensuring exploration of unvisited regions of the solution space.

Together, these mechanisms balance exploitation, local refinement, and exploration during the search process.

The fitness of each candidate solution is typically assessed using a scalarized objective function that combines predictive accuracy (e.g., validation error) with subset size, often through weighted aggregation. While this method is simple and computationally efficient, classical HS displays several weaknesses when applied to high-dimensional binary FS for WE-based MPPT optimization.

  • Static Parameter Configurations: Parameters such as HMCR (Harmony Memory Considering Rate) and PAR (Pitch Adjustment Rate) remain fixed throughout the search, limiting the algorithm’s adaptive balance between exploration and exploitation.

  • Scalarized Objective Function: The use of a weighted aggregation requires predefined weights, which introduce bias, reduce flexibility, and constrain exploration of the trade-off space.

  • Premature Convergence: The absence of diversity-preserving mechanisms leads to population collapse and stagnation in local optima.

  • Simplistic Local Search: Relying on single-bit flips fails to capture non-linear and interaction-driven feature dependencies, which are crucial for accurate MPPT prediction.

These limitations motivate the development of the AMO-BHS, which introduces dynamic parameter control, multi-objective evaluation, and diversity-preserving mechanisms specifically designed to address the challenges of binary FS in complex WE environments.

Binary encoding and discrete operators

The AMO-BHS algorithm represents each candidate solution as a binary decision vector that explicitly encodes whether each feature is included or excluded. This design ensures computational efficiency, interpretability, and seamless integration with MPPT prediction pipelines.

Formally, the representation is specified as Eq. (15)

$$z={\left[{z}_{1},{z}_{2},\dots ,{z}_{\left|{X}_{\text{filtered }}\right|}\right]}^{T}\in \{\text{0,1}{\}}^{\left|{X}_{\text{filtered }}\right|},$$
(15)

where,

  • \({z}_{i}=1\) →Signifies inclusion of feature \(i\)

  • \({z}_{i}=0\) →Exclusion.

This direct encoding enables intuitive interpretation of optimization results, computational efficiency, and seamless integration into MPPT prediction pipelines.

To evolve solutions, AMO-BHS employs 3 discrete operators that regulate the generation of new harmonies by balancing exploitation, local refinement, and exploration.

A. Memory Consideration (Exploitation): This operator promotes exploitation by reusing high-quality solutions stored in HM. The decision for the \(j\)-th feature in a new harmony is sampled from a randomly selected stored solution:

$${z}_{j}^{\text{new }}={z}_{j}^{(k)}, k\sim \text{Uniform}(1,\text{HMS}),$$
(16)

where,

  • \({z}_{j}^{\text{new}}\) →The binary decision for the \(j\)-th feature in the new harmony

  • \({z}_{j}^{(k)}\) →The corresponding decision in the \(k\)-th stored harmony.

This operator exploits existing high-quality harmonies.

B. Pitch Adjustment (Local Search): This operator introduces controlled perturbations into the solution, enabling local exploration around existing harmonies.

The adjustment is performed by probabilistic bit-flipping:

$${z}_{j}^{\text{new }}=\left\{\begin{array}{ll}1-{z}_{j}^{\text{new }}& \text{ if rand }()<\text{PAR}\\ {z}_{j}^{\text{new }}& \text{ otherwise}\end{array}\right.$$
(17)

where, PAR→The pitch adjustment rate.

This operator conducts a localized search by controlled bit flipping, thereby balancing stability and perturbation.

C. Random generation (Exploration): To avoid stagnation and ensure continuous discovery of new feature subsets, a random generation operator is applied with probability (\(1-\text{HMCR}\)).

Each decision variable is set as, Eq. (18)

$${z}_{j}^{\text{new }}=\left\{\begin{array}{ll}1,& \text{ if }\text{rand}()<0.5\\ 0,& \text{ otherwise}\end{array}\right.$$
(18)

This ensures exploration of previously unexplored regions of the solution space and mitigates stagnation.

Together, these three operators act synergistically: memory consideration exploits existing solutions, pitch adjustment performs local refinement, and random generation injects diversity. This balance ensures that AMO-BHS can explore the solution space effectively while maintaining convergence pressure.
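As an illustration, one improvisation step combining the three operators (Eqs. 16–18) might be sketched as follows; the HMCR, PAR, and memory contents here are placeholder values, not the tuned settings of Table 3:

```python
# Sketch of one AMO-BHS improvisation step (Eqs. 16-18); HMCR, PAR, and
# the memory contents are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(42)

def improvise(harmony_memory, hmcr=0.9, par=0.3):
    """Generate one new binary harmony from the harmony memory."""
    hms, dim = harmony_memory.shape
    new = np.empty(dim, dtype=int)
    for j in range(dim):
        if rng.random() < hmcr:
            # memory consideration: copy bit j from a random harmony (Eq. 16)
            new[j] = harmony_memory[rng.integers(hms), j]
            if rng.random() < par:
                new[j] = 1 - new[j]           # pitch adjustment: bit flip (Eq. 17)
        else:
            new[j] = int(rng.random() < 0.5)  # random generation (Eq. 18)
    return new

hm = rng.integers(0, 2, size=(5, 12))         # HMS = 5 harmonies, 12 features
print(improvise(hm))                          # a new binary harmony
```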

Multi-objective formulation and pareto optimization

Classical HS typically evaluates solutions using a single scalarized objective function. In contrast, AMO-BHS treats FS as a true multi-objective optimization problem, simultaneously minimizing predictive error and feature subset size.

The two objectives are defined as Eq. (19)

$${f}_{1}(z)={L}_{M}(z), {f}_{2}(z)=\frac{1}{\left|{X}_{\text{filtered }}\right|}\sum_{i=1}^{\left|{X}_{\text{filtered }}\right|} {z}_{i}$$
(19)

where,

  • \({f}_{1}(z)\) →The validation loss for MPPT prediction

  • \({f}_{2}(z)\) →The normalized subset size.

The normalization ensures comparability across different dimensionalities of \({X}_{\text{filtered}}\).

The quality of solutions is assessed using the Pareto dominance principle. A solution \({z}^{(1)}\) is said to dominate \({z}^{(2)}\) if it is no worse in all objectives and strictly better in at least one:

$${z}^{(1)}\prec {z}^{(2)}\Leftrightarrow {f}_{1}\left({z}^{(1)}\right)\le {f}_{1}\left({z}^{(2)}\right)\wedge {f}_{2}\left({z}^{(1)}\right)\le {f}_{2}\left({z}^{(2)}\right)\wedge \exists \text{ strict inequality }.$$
(20)

This formulation provides a Pareto front of non-dominated solutions that represent distinct trade-offs between prediction accuracy and computational efficiency. Instead of committing to a single solution, practitioners face a spectrum of choices, allowing them to select the most suitable feature subset based on deployment constraints, such as hardware size or response-time requirements.
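The dominance test of Eq. (20) and the resulting front extraction can be sketched as below; the (f₁, f₂) pairs are illustrative objective values, not results from the study:

```python
# Sketch of Pareto dominance (Eq. 20) and front extraction; the (f1, f2)
# pairs are illustrative objective values (minimization in both).
def dominates(a, b):
    """True if solution a = (f1, f2) dominates b."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the non-dominated subset of (f1, f2) pairs."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]

objs = [(0.10, 0.50), (0.12, 0.30), (0.09, 0.80), (0.15, 0.60)]
print(pareto_front(objs))   # -> [(0.1, 0.5), (0.12, 0.3), (0.09, 0.8)]
```

Here (0.15, 0.60) is dominated by (0.10, 0.50), so only the other three points form the front.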

Constraint handling and feasibility maintenance

In practical MPPT applications, feature subsets must satisfy strict cardinality bounds \(\left[{n}_{\text{min}},{n}_{\text{max}}\right]\) to ensure computational feasibility and control stability. Without these bounds, solutions may either omit too many critical variables or include an excessive number of features that exceed real-time processing capacity. To ensure feasibility, AMO-BHS incorporates a violation-detection-and-repair method that is applied after each new harmony is generated.

Violation Detection: The degree of violation is quantified separately for the minimum and maximum bounds.

The minimum constraint violation is computed as

$${\text{Violation }}_{\text{min}}=\text{max}\left(0,{n}_{\text{min}}-\sum_{i=1}^{\left|{X}_{\text{filtered }}\right|} {z}_{i}\right)$$
(21)

while the maximum constraint violation is measured as

$${\text{Violation }}_{\text{max}}=\text{max}\left(0,\sum_{i=1}^{\left|{X}_{\text{filtered }}\right|} {z}_{i}-{n}_{\text{max}}\right)$$
(22)
Repair Mechanism: Whenever a violation is detected, corrective actions are taken to restore feasibility.

  • If \({\text{Violation}}_{\text{min}}>0\), an equal number of indices with \({z}_{i}=0\) are randomly selected and flipped to 1, ensuring that at least \({n}_{\text{min}}\) features are included.

  • If \({\text{Violation}}_{\text{max}}>0\), an equal number of indices with \({z}_{i}=1\) are randomly selected and flipped to 0, reducing the subset to no more than \({n}_{\text{max}}\) features.

This repair process guarantees that every candidate solution remains within the hard feasibility bounds defined in Section “Problem definition and formulation”. At the same time, stochastic selection of repair indices preserves diversity in the search and prevents deterministic bias.
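A minimal sketch of the violation-detection-and-repair step (Eqs. 21–22) follows; the cardinality bounds and seed are illustrative:

```python
# Sketch of the repair mechanism (Eqs. 21-22): random bit flips restore
# the cardinality bounds [n_min, n_max]; bound values are illustrative.
import numpy as np

rng = np.random.default_rng(7)

def repair(z, n_min, n_max):
    """Flip randomly chosen bits until n_min <= sum(z) <= n_max."""
    z = z.copy()
    count = int(z.sum())
    if count < n_min:                        # Violation_min > 0 (Eq. 21)
        off = np.flatnonzero(z == 0)
        z[rng.choice(off, n_min - count, replace=False)] = 1
    elif count > n_max:                      # Violation_max > 0 (Eq. 22)
        on = np.flatnonzero(z == 1)
        z[rng.choice(on, count - n_max, replace=False)] = 0
    return z

z = np.zeros(20, dtype=int)                  # infeasible: no features selected
print(repair(z, n_min=3, n_max=10).sum())    # -> 3
```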

Adaptive mechanisms and diversity preservation

To address the limitations of classical HS and maintain robust search performance, AMO-BHS integrates adaptive mechanisms with explicit diversity-preserving methods. These enhancements enable the algorithm to balance exploration and exploitation dynamically, prevent premature convergence, and sustain variety along the Pareto front.

A. Dynamic Parameter Adjustment

The algorithm adjusts the Harmony Memory Considering Rate (HMCR) and Pitch Adjustment Rate (PAR) as functions of the iteration index \(t\) :

$${\text{HMCR}}^{(t)}={\text{HMCR}}_{\text{min}}+\left({\text{HMCR}}_{\text{max}}-{\text{HMCR}}_{\text{min}}\right){\left(1-t/{T}_{\text{max}}\right)}^{\alpha }$$
(23)
$${\text{PAR}}^{(t)}={\text{PAR}}_{\text{min}}+\left({\text{PAR}}_{\text{max}}-{\text{PAR}}_{\text{min}}\right)\text{exp}\left(-{\alpha }_{PAR}t/{T}_{\text{max}}\right)$$
(24)

Equation (23) adjusts HMCR according to a power-law schedule of the iteration index, rebalancing memory consideration against random generation as the search progresses. Equation (24) applies exponential decay to PAR, reducing disruptive local search near convergence.

Together, these schedules adaptively regulate search dynamics.
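The schedules of Eqs. (23)–(24) can be sketched directly; the bound and shape parameters below are illustrative placeholders, not the tuned values reported in Table 3:

```python
# Sketch of the adaptive schedules in Eqs. (23)-(24); the bounds and
# shape parameters are illustrative placeholders.
import math

def hmcr_t(t, t_max, hmcr_min=0.7, hmcr_max=0.95, alpha=2.0):
    """HMCR^(t), Eq. (23): power-law schedule of the iteration index."""
    return hmcr_min + (hmcr_max - hmcr_min) * (1 - t / t_max) ** alpha

def par_t(t, t_max, par_min=0.1, par_max=0.5, alpha_par=3.0):
    """PAR^(t), Eq. (24): exponential decay toward par_min."""
    return par_min + (par_max - par_min) * math.exp(-alpha_par * t / t_max)

T_MAX = 100
print(hmcr_t(0, T_MAX), hmcr_t(T_MAX, T_MAX))   # HMCR moves between its bounds
print(par_t(0, T_MAX))                           # PAR starts at par_max
```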

B. Population Diversity Monitoring

To avoid population collapse, AMO-BHS quantifies diversity at both the binary representation level and the Pareto front.

The normalized Hamming distance is used to measure population diversity:

$${D}^{(t)}=\frac{2}{\text{HMS}(\text{HMS}-1)}\sum_{i=1}^{\text{HMS}-1} \sum_{j=i+1}^{\text{HMS}} \frac{H\left({z}^{(i)},{z}^{(j)}\right)}{\left|{X}_{\text{filtered}}\right|}$$
(25)

where, \(H(\cdot ,\cdot )\) →The Hamming distance between two solutions.
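Equation (25) reduces to a mean pairwise normalized Hamming distance, which can be sketched over a small illustrative memory as:

```python
# Sketch of the normalized Hamming-distance diversity metric (Eq. 25)
# over a small illustrative harmony memory.
import numpy as np
from itertools import combinations

def population_diversity(hm):
    """Mean pairwise normalized Hamming distance over the memory."""
    hms, dim = hm.shape
    total = sum(np.sum(hm[i] != hm[j]) / dim
                for i, j in combinations(range(hms), 2))
    return 2 * total / (hms * (hms - 1))

hm = np.array([[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]])
print(population_diversity(hm))   # -> 0.5
```

A value near 0 signals population collapse (all harmonies identical), which triggers the diversification mechanisms described below.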

The spacing metric captures Pareto front diversity:

$${S}^{(t)}=\frac{1}{\left|{P}^{(t)}\right|-1}\sum_{i=1}^{\left|{P}^{(t)}\right|-1} {d}_{i}$$
(26)

where, \({d}_{i}\) →Distance between consecutive non-dominated solutions. These metrics guide the algorithm in maintaining a wide distribution of solutions.

C. Convergence Detection

To detect search stagnation, AMO-BHS evaluates a composite metric of Pareto front improvement:

$${C}^{(t)}=\frac{\left|{P}^{(t)}\right|-\left|{P}^{(t-w)}\right|}{w}+\gamma \cdot \frac{\sum_{i=1}^{\left|{P}^{(t)}\right|} {f}_{1}\left({z}_{i}^{(t)}\right)-\sum_{j=1}^{\left|{P}^{(t-w)}\right|} {f}_{1}\left({z}_{j}^{(t-w)}\right)}{\left|{P}^{(t)}\right|\cdot w}.$$
(27)

When \({C}^{(t)}<\varepsilon\), diversification methods such as parameter tuning and memory reshuffling are invoked.

D. Elite Preservation

To safeguard high-quality solutions, AMO-BHS maintains an archive of non-dominated solutions across all iterations, Eq. (28)

$${E}^{(t)}=\left\{z\in \bigcup_{k=1}^{t} H{M}^{(k)}:z\text{ is non-dominated}\right\}$$
(28)

Periodic reinjection of elites ensures that valuable solutions are not lost and contributes to sustained diversity.

E. Adaptive Local Search

Promising harmonies undergo selective refinement within a \(k\)-bit Hamming neighborhood. The best neighbor is selected as:

$${z}_{\text{LS}}=\underset{{z}^{\prime}\in {N}_{k}(z)}{\text{argmin}}\, {f}_{1}\left({z}^{\prime}\right), \text{ subject to } {f}_{2}\left({z}^{\prime}\right)\le {f}_{2}(z)+{\varepsilon }_{\text{LS}}$$
(29)

This mechanism allows the algorithm to improve accuracy locally without significantly increasing subset size.
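A minimal sketch of this refinement, restricted to a 1-bit neighborhood for brevity, is shown below; the toy objectives stand in for the RF validation loss f₁ and the subset-size ratio f₂, and the target vector is hypothetical:

```python
# Sketch of the adaptive local search (Eq. 29), restricted to 1-bit
# neighbors; the toy objectives stand in for f1 (validation loss) and
# f2 (normalized subset size).
import numpy as np

def local_search(z, f1, f2, eps_ls=0.3):
    """Best 1-bit neighbor by f1, subject to the budget f2(z) + eps_ls."""
    best, best_f1 = z, f1(z)
    for j in range(len(z)):
        cand = z.copy()
        cand[j] = 1 - cand[j]                # 1-bit Hamming neighbor
        if f2(cand) <= f2(z) + eps_ls and f1(cand) < best_f1:
            best, best_f1 = cand, f1(cand)
    return best

target = np.array([1, 1, 0, 0])              # hypothetical optimal subset
f1 = lambda z: float(np.sum(z != target))    # toy loss: distance to target
f2 = lambda z: z.mean()                      # normalized subset size
z0 = np.array([1, 0, 0, 0])
print(local_search(z0, f1, f2))              # -> [1 1 0 0]
```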

AMO-BHS integrates initialization, adaptive solution generation, constraint handling, multi-objective evaluation, diversity monitoring, elite preservation, and local refinement into an iterative loop. The algorithm terminates once the maximum number of iterations is reached or the convergence condition in Eq. (27) is satisfied.

At this point, the final Pareto front is extracted from the union of the harmony memory and the elite archive, Eq. (30)

$${\mathcal{P}}^{*}=\text{ ExtractParetoFront}\left(HM\cup {\mathcal{E}}^{\left({T}_{\text{max}}\right)}\right)$$
(30)

Each \({z}^{*}\in {\mathcal{P}}^{*}\) is a Pareto-optimal feature subset representing a specific trade-off between MPPT prediction accuracy and computational complexity.

The complete AMO-BHS procedure (Algorithm 1) integrates harmony search principles with adaptive parameter control and multi-objective evaluation mechanisms to form a robust optimization method for FS applications.

The algorithm retains the core structure of HS while embedding adaptive mechanisms that dynamically adjust to the progress of the search and population diversity. Multi-objective evaluation with Pareto-based selection preserves diverse, high-quality solutions across iterations. The complete hybrid FS method is shown in Fig. 2, which integrates statistical filtering with metaheuristic optimization in a sequential pipeline. The flowchart outlines input data preprocessing, MI-based ranking, and thresholding in Stage 1, followed by iterative AMO-BHS optimization in Stage 2. Key components include HM initialization, adaptive parameter control, constraint repair, multi-objective assessment, and Pareto-based selection within the main loop. The method concludes with the extraction of the Pareto front and mapping of feature subsets back to the original space, providing multiple configurations that balance MPPT predictive accuracy against computational cost in WE applications.

Algorithm 1. For AMO-BHS.

Fig. 2. Hybrid FS flowchart.

Experimental setup

Computational environment and implementation model

Experimental validation was conducted on a high-performance computing platform to ensure reproducibility across the 3 distinct WE datasets. The environment consisted of dual Intel Xeon Gold 6248R processors operating at 3.0 GHz with 48 cores, paired with 256 GB of DDR4-3200 ECC memory for handling large-scale feature matrices. MI calculations were parallelized across dual NVIDIA Tesla V100 GPUs with 32 GB of HBM2 each, delivering 15–20× speedups over CPU-only execution on the high-dimensional Kelmarsh dataset.

Software implementation was based on Python 3.9.16, with NumPy 1.21.6 for numerical routines, SciPy 1.8.1 for statistical functions, and scikit-learn 1.0.2 for machine learning integration. Parallelization during AMO-BHS feature subset evaluation used Python’s multiprocessing pool, while GPU acceleration of MI density estimation employed CUDA 11.7 with CuPy 10.6.0. Efficiency was further enhanced through sparse matrix representations and streaming data processing, enabling scalability across datasets with variable dimensionality.

Prediction task and target specification

The FS is evaluated through supervised learning tasks targeting active energy prediction for utility-scale datasets and rotor speed prediction for the laboratory-scale controller dataset. Table 2 specifies the complete prediction configuration for each dataset, including target variables, temporal horizons, and evaluation protocols.

Table 2 Prediction task specification across datasets.

For the Kelmarsh and VV datasets, the task involves 10-min-ahead prediction of active power output, where features measured at time ‘t’ are used to predict WE generation at t + 10 min. This prediction horizon aligns with operational MPPT control cycles in utility-scale wind farms. For the RAC-WECS dataset, the high-frequency sampling rate (20 Hz) supports current-state prediction of instantaneous rotor speed without temporal offset, reflecting the real-time control requirements of laboratory-scale MPPT systems.

During AMO-BHS optimization, candidate feature subsets are evaluated using RF regression (scikit-learn 1.0.2) with 100 estimators and a maximum depth of 10. This learner configuration was selected for its computational efficiency, robustness to variable feature spaces, and ability to capture non-linear relationships inherent in WE systems. The fitness function f₁ defined in Eq. (19) uses fivefold cross-validation MSE as the validation loss metric.
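The fitness evaluation described above can be sketched as follows; the learner settings (100 trees, depth 10, fivefold CV MSE) follow the text, while the data and the mask `z` are synthetic placeholders:

```python
# Sketch of the f1 fitness (Eq. 19): fivefold cross-validated MSE of an
# RF regressor (100 trees, max depth 10) on the columns selected by the
# binary mask z. X and y below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def f1_fitness(z, X, y):
    """Validation loss f1(z): mean 5-fold CV MSE on the selected columns."""
    cols = np.flatnonzero(z)
    if cols.size == 0:
        return np.inf                        # empty subsets are infeasible
    model = RandomForestRegressor(n_estimators=100, max_depth=10,
                                  random_state=0)
    scores = cross_val_score(model, X[:, cols], y,
                             scoring="neg_mean_squared_error", cv=5)
    return -scores.mean()

rng = np.random.default_rng(1)
X = rng.random((300, 8))
y = 3 * X[:, 0] + X[:, 1] ** 2 + 0.05 * rng.random(300)
z = np.array([1, 1, 0, 0, 0, 0, 0, 0])
print(f1_fitness(z, X, y))                   # CV MSE of the informative subset
```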

Dataset partitioning and cross-validation strategy

The validation methodology employs temporal-aware partitioning protocols to prevent data leakage and ensure realistic performance assessment under operational conditions. The chronological split maintains temporal integrity with 70% of the earliest data allocated for training, 15% of the intermediate data for validation, and the latest 15% for testing, strictly enforcing \({t}_{\text{train}}<{t}_{\text{validation}}<{t}_{\text{test}}\) ordering.

Dataset-specific partitioning follows the temporal boundaries developed for each wind energy context. The Kelmarsh dataset spanning 2016–2021 allocates January 2016 through September 2019 for training (70%), October 2019 through June 2020 for validation (15%), and July 2020 through June 2021 for testing (15%). The high-resolution RAC dataset covering 76.6 s uses the first 53.6 s for training, the next 11.5 s for validation, and the final 11.5 s for testing. The VV dataset representing 2023 operations employs January through August for training, September through October for validation, and November through December for testing.

Robustness assessment incorporates forward-chaining cross-validation with 5 temporally ordered folds, incrementally increasing the training window size and testing on subsequent segments.

The aggregated cross-validation performance is computed as, Eq. (31)

$$C{V}_{\text{performance}}=\frac{\sum_{i=1}^{5} \left|{\text{Test}}_{i}\right|\cdot {P}_{i}}{\sum_{i=1}^{5} \left|{\text{Test}}_{i}\right|}$$
(31)

where,

  • \(\left|{\text{Test}}_{i}\right|\) →The size of test fold \(i\)

  • \({P}_{i}\) →The corresponding performance metric.
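Equation (31) is a size-weighted average over the five folds, sketched below with illustrative fold sizes and per-fold RMSE values:

```python
# Sketch of Eq. (31): size-weighted aggregation of per-fold performance
# over the 5 forward-chaining folds; fold sizes and metrics are
# illustrative placeholders.
def cv_performance(test_sizes, metrics):
    """Weighted mean of per-fold metrics, weighted by |Test_i|."""
    return sum(n * p for n, p in zip(test_sizes, metrics)) / sum(test_sizes)

sizes = [100, 150, 200, 250, 300]            # |Test_i|
rmse = [0.20, 0.18, 0.22, 0.19, 0.21]        # P_i for each fold
print(round(cv_performance(sizes, rmse), 4)) # -> 0.2015
```

Weighting by test-fold size prevents the small early folds of forward chaining from dominating the aggregate score.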

Statistical significance is measured using 95% confidence intervals, and performance improvements are considered significant when the confidence intervals of competing methods do not overlap. Statistical significance testing employs paired t-tests to compare method performance metrics (RMSE, feature count) across the three independent datasets (Kelmarsh, VV, RAC), treating each dataset as an independent observation. This cross-dataset comparison approach avoids temporal dependency problems, as the statistical units are geographically and operationally distinct WE systems rather than time-ordered observations within a single system. Within each dataset, forward-chaining cross-validation with 5 temporally ordered folds ensures proper handling of the time-series structure during model evaluation; however, the final performance metric for each dataset-method combination is the aggregated cross-validation score (Eq. (31)), generating 3 independent results per method for statistical comparison. Effect sizes are reported using Cohen’s d to quantify the magnitude of performance differences beyond statistical significance.

Performance evaluation metrics

The comprehensive evaluation model assesses MI-AMO-BHS across multiple dimensions, including feature relevance, subset quality, computational efficiency, and MPPT control performance.

A. Feature Relevance Assessment

The average MI of selected features quantifies data content preservation:

$$\overline{I}_{{\text{selected }}} = \frac{1}{{\left| {F^{*} } \right|}}\mathop \sum \limits_{{f_{i} \in F^{*} }} I\left( {f_{i} ;y} \right)$$
(32)

Feature-target correlation analysis evaluates linear relationship strength:

$$\overline{\rho }_{{\text{selected }}} = \frac{1}{{\left| {F^{*} } \right|}}\mathop \sum \limits_{{f_{i} \in F^{*} }} \left| {\rho \left( {f_{i} ,y} \right)} \right|$$
(33)

where, \(\rho \left({f}_{i},y\right)\)→The Pearson correlation coefficient between the feature \({f}_{i}\) and target variable \(y\).

B. Subset Quality Evaluation

Classification performance for discrete MPPT targets is evaluated using accuracy:

$$\text{Accuracy }=\frac{TP+TN}{TP+TN+FP+FN}$$
(34)

where, TP, TN, FP, and FN are true positives, true negatives, false positives, and false negatives.

Regression performance assessment uses Root Mean Square Error (RMSE) for continuous MPPT parameters:

$$RMSE = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }$$
(35)

Scale-independent evaluation employs Mean Absolute Percentage Error (MAPE):

$$MAPE = \frac{{100{\text{\% }}}}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\frac{{y_{i} - \hat{y}_{i} }}{{y_{i} }}} \right|$$
(36)

Feature redundancy within selected subsets is measured through pairwise MI:

$$\text{Redundancy }=\frac{1}{\left|{F}^{*}\right|\left(\left|{F}^{*}\right|-1\right)}\sum_{{f}_{i}\in {F}^{*}} \sum_{\begin{array}{c}{f}_{j}\in {F}^{*}\\ j\ne i\end{array}} I\left({f}_{i};{f}_{j}\right)$$
(37)

C. Computational Efficiency Metrics

Dimensionality reduction effectiveness is quantified through the feature reduction ratio:

$$\text{Reduction\_Ratio }=\frac{d-\left|{F}^{*}\right|}{d}\times 100\text{\%}$$
(38)

where,

  • \(d\) →The original feature space dimensionality

  • \(\left|{F}^{*}\right|\) →The selected subset size.

Baseline methods for comparative analysis

Four established FS methods provide comprehensive benchmarking across filter, wrapper, embedded, and regularization paradigms:

  • Pure MI Ranking: This filter-based baseline ranks features by MI with the target variable and retains the top-\(k\) subset.

  • Genetic Algorithm Wrapper: The evolutionary method utilizes binary encoding for feature subsets, employing tournament selection, single-point crossover, and bit-flip mutation. The population size is set to 50 individuals over 100 generations, providing a computational budget comparable to AMO-BHS.

  • Random Forest Recursive Feature Elimination: This embedded method iteratively removes features with the lowest importance scores, based on the reduction in Gini impurity across Decision Trees (DTs).

  • LASSO Regularization: The L1-regularized regression method selects features through sparsity-inducing optimization.

Parameter configuration and optimization

Systematic parameter optimization was performed using nested cross-validation to avoid overfitting during hyperparameter tuning and to maintain fair comparisons across methods. The outer loop delivered unbiased performance predictions, while the inner loop adjusted parameters solely on the training partitions. Table 3 reports the complete parameter configuration of the proposed model, covering MI filtering and AMO-BHS optimization settings. Table 4 summarizes the configurations of the four baseline methods, with consistent computational costs and optimization protocols applied to ensure comparability across all methods.

Table 3 MI-AMO-BHS parameter configuration.
Table 4 Baseline methods parameter configuration.

Computational cost allocation ensures a fair comparison: 20% to parameter tuning and 80% to FS execution. Convergence is declared when performance improvement falls below 0.1% over 10 consecutive iterations or when maximum iteration limits are reached, maintaining consistent termination conditions across all methods.

Results

To establish performance benchmarks for evaluating FS effectiveness, this study evaluated two baseline configurations. The full-feature baseline employs all available features without selection (464 features for Kelmarsh, 52 for VV, 5 for RAC) to assess whether dimensionality reduction maintains prediction accuracy. In addition, a naïve persistence prediction baseline (predicting future energy from current energy for the Kelmarsh/VV datasets) provides a lower bound for comparing the value of feature-based prediction models to simple temporal extrapolation. Detailed performance comparisons between these baselines and FS methods are presented in Sections “Stage 1 results: MI-based filtering” and 7.3.

Stage 1 results: MI-based filtering

The MI-based filtering stage achieves substantial dimensionality reduction while preserving the critical data content required for MPPT control applications. This section presents comprehensive evaluation results across the 3 distinct datasets, examining FS distributions, correlation patterns among statistical measures and empirical importance, and the effectiveness of dimensionality reduction via threshold-based selection.

A. Feature Ranking and MI Score Distributions

The MI score distributions across the 3 datasets (Fig. 3a to f) reveal distinct patterns that reflect the primary physical features and operational contexts of each WE system. The distribution analysis provides insight into FS heterogeneity and the effectiveness of MI-based discrimination between relevant and irrelevant variables.

Fig. 3. (a)–(f): MI score distribution results for (a) MI score range (minimum–maximum), (b) mean MI with standard deviation, (c) top 10% vs. bottom 10% mean MI scores, (d) total features per dataset, (e) distribution skewness, and (f) distribution kurtosis. Together, the panels illustrate dataset-specific variability in FS across Kelmarsh Wind Farm, Reverse-Acting Controller, and VV Wind Farms.

The Kelmarsh Wind Farm Dataset shows a wide spread between the minimum and maximum MI scores (Fig. 3a), highlighting substantial heterogeneity in feature informativeness. The overall mean MI is moderate (0.1247) with a standard deviation of 0.0891 (Fig. 3b), and the separation between the top 10% (0.3654) and bottom 10% (0.0187) mean scores (Fig. 3c) indicates a sharp divide between highly relevant and irrelevant features. With the most prominent feature set among the datasets (464 variables; Fig. 3d), Kelmarsh also exhibits positive skewness (1.23; Fig. 3e) and kurtosis (2.14; Fig. 3f), reflecting a right-tailed distribution where most features provide minimal predictive value but a small subset contributes disproportionately high information content.

The Reverse-Acting Controller dataset exhibits a narrower MI score range (Fig. 3a), yet achieves the highest mean MI (0.2456) and a relatively low standard deviation (0.0654; Fig. 3b). The gap between the top 10% (0.3125) and the bottom 10% (0.0987) in mean scores (Fig. 3c) is comparatively small, indicating greater uniformity in informativeness across features. With only five total variables (Fig. 3d), the dataset exhibits low skewness (0.34; Fig. 3e) and negative kurtosis (–0.87; Fig. 3f), signifying a flatter distribution with fewer extremes. These features reflect the controlled laboratory setting, where all measurements exhibit consistently strong associations with MPPT targets.

The VV Wind Farms Dataset exhibits a wide range of MI scores (Fig. 3a), with the lowest mean MI (0.0905) and relatively high variability (SD = 0.0721; Fig. 3b). The disparity between the top 10% and bottom 10% mean scores (Fig. 3c) confirms that predictive information is concentrated in only a small fraction of features. With 87 total features (Fig. 3d), VV exhibits the highest skewness (1.78; Fig. 3e) and kurtosis (3.82; Fig. 3f), reflecting a highly right-tailed distribution. This suggests that tropical operating conditions and heterogeneous turbine configurations generate sparse data landscapes, in which only a few variables retain substantial predictive value.

The top-ranked features derived from MI analysis confirm strong physical relevance across diverse WE operational contexts, as shown in Fig. 4a to c. Power-related variables and rotor dynamics consistently dominate the rankings, underscoring their importance in optimizing MPPT control.

Fig. 4

(a)–(c): Top-ranked features by MI scores for: (a) Kelmarsh Wind Farm; (b) VV Wind Farms; (c) Reverse-Acting Controller datasets. Each panel lists features in descending order of MI score, highlighting dataset-specific patterns of FS.

In the Kelmarsh Wind Farm Dataset (Fig. 4a), nacelle wind speed emerges as the most informative feature (MI = 0.4876), followed by active power output (MI = 0.4534). These results confirm that direct performance measures and rotor inflow conditions are primary drivers of control accuracy. Secondary features, such as instantaneous rotor speed, generator torque, and tip speed ratio, rank among the top ten. At the same time, engineered variables such as power coefficient and blade pitch angle also appear, indicating the added value of derived measurements that combine multiple physical inputs.

The VV Wind Farms dataset (Fig. 4b) places active power output at the top (MI = 0.4123), followed closely by instantaneous rotor speed (MI = 0.3789). Features such as generator torque, tip speed ratio, and power coefficient also rank highly, indicating that under tropical operating conditions, real-time performance metrics and rotor dynamics provide greater predictive value than environmental signals such as nacelle wind speed.

The Reverse-Acting Controller dataset (Fig. 4c) achieves the highest single-feature relevance, with the reference speed setpoint (MI = 0.6234) ranking first. Instantaneous rotor speed (MI = 0.4567) and output voltage (MI = 0.2891) also score highly, reflecting the robust role of control loop parameters and power electronics variables under laboratory conditions.

Across all three datasets, rotor speed consistently appears among the most informative variables, validating its universal role in wind energy MPPT control. Moreover, the presence of derived indicators such as tip speed ratio and power coefficient in the Kelmarsh and VV datasets demonstrates that engineered features, which capture physical interactions between turbine subsystems, provide significant information content beyond raw sensor readings.

B. Correlation Analysis Between MI Scores and Actual Feature Importance: The correlation analysis examines the relationship between MI scores computed during statistical filtering and empirical feature importance derived from trained ML models. This analysis validates the effectiveness of MI-based significance assessment in predicting actual contributions to MPPT performance across different modelling methods and operational contexts.

The correlation analysis between MI scores and empirical feature importances from trained ML models (Fig. 5) validates the use of MI-based relevance assessment for predicting actual predictive contribution across diverse WE contexts. All correlations achieve statistical significance (p < 0.05), indicating substantial agreement between information-theoretic measures and model-based importance rankings. The Reverse-Acting Controller dataset exhibits the strongest correlation consistency across all metrics, with RF achieving the highest Pearson correlation (r = 0.892) and coefficient of determination (R² = 0.796), indicating that controlled laboratory conditions enable precise alignment between statistical dependence measures and actual model performance contributions. The Kelmarsh dataset demonstrates moderate-to-strong correlations across all ML methods, with RF consistently outperforming the other algorithms in correlation strength (Pearson r = 0.743 vs. 0.689 for Gradient Boosting and 0.634 for SVR), indicating that tree-based ensemble methods better capture the non-linear feature relationships prevalent in utility-scale wind farm operations. The VV Wind Farms dataset presents the most challenging correlation environment, with consistently lower correlation coefficients across all models (RF Pearson r = 0.671), reflecting the complex tropical operating conditions and diverse turbine configurations that generate non-trivial relationships between statistical dependence measures and actual predictive utility for MPPT control. Nevertheless, all correlations remain statistically significant and practically meaningful for FS guidance.
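The Fig. 5 validation step amounts to correlating the filter-stage MI scores with the wrapper-stage model importances. A minimal sketch on synthetic data, with the model settings and coefficients illustrative, not the paper's:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 20))
y = 3.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.2, size=400)

# Filter-stage relevance scores vs. wrapper-stage empirical importances.
mi = mutual_info_regression(X, y, random_state=1)
rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=1).fit(X, y)

# Agreement between the two rankings (the quantity plotted in Fig. 5).
r, p = pearsonr(mi, rf.feature_importances_)
```

A high, significant r indicates that the cheap MI filter is a reliable proxy for the expensive model-based importance, which is the premise of the two-stage pipeline.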

Fig. 5

Correlation analysis between MI scores and feature importance.

Stage 2 Results: AMO-BHS Optimization

The AMO-BHS optimization stage operates on the filtered feature spaces generated by the statistical filtering process, conducting multi-objective optimization to identify Pareto-optimal feature subsets that balance prediction accuracy with model complexity. This section presents comprehensive test results demonstrating the algorithm’s convergence behavior, solution quality evolution, and adaptive mechanism effectiveness across the three experimental datasets. The optimization results validate the theoretical contributions established in Sect. 3.5 and provide empirical evidence of superior performance compared to static-parameter methods.

C. Convergence Analysis

The convergence analysis evaluates AMO-BHS performance across the three datasets, focusing on iteration dynamics, Pareto set evolution, solution quality, and efficiency, as illustrated in Fig. 6a to f.

Fig. 6

(a) to (f): AMO-BHS convergence analysis results: (a) convergence vs. maximum iterations, (b) final HMS and Pareto set size, (c) hypervolume performance, (d) Inverted Generational Distance (IGD), (e) convergence rate, and (f) efficiency comparison (convergence rate vs. iterations). Results are shown for Kelmarsh Wind Farm, VV Wind Farms, and Reverse-Acting Controller datasets.

The Kelmarsh Wind Farm Dataset demonstrates the most extensive search behavior. Convergence occurs at iteration 2,234, representing 80.4% of the maximum iteration budget (Fig. 6a). Despite the high computational demand imposed by its 278-dimensional filtered feature space, the algorithm achieves a robust Pareto set of 23 non-dominated solutions (Fig. 6b). Performance indicators confirm strong trade-off quality, with a hypervolume of 0.7834 (Fig. 6c) and an IGD of 0.0234 (Fig. 6d). The convergence rate is the highest among the datasets at 80.4% (Fig. 6e), although the efficiency analysis reveals that this comes at the cost of many iterations (Fig. 6f).

The Reverse-Acting Controller Dataset achieves the most efficient convergence profile. Optimization stabilizes after only 31 iterations, representing 77.5% of the allowed budget (Fig. 6a and e), and produces a compact Pareto set of four solutions (Fig. 6b). Despite its small search space (4 filtered features), the dataset achieves the highest hypervolume (0.8567; Fig. 6c) and the lowest IGD (0.0156; Fig. 6d). Efficiency plots (Fig. 6f) confirm that the laboratory-controlled environment enables rapid identification of high-quality trade-offs, making RAC the best-performing dataset in terms of convergence speed and solution quality.

The VV Wind Farms Dataset exhibits intermediate performance. Convergence is reached after 378 iterations (72.7% of maximum; Fig. 6a), producing 18 Pareto-optimal solutions (Fig. 6b). Its hypervolume (0.7156; Fig. 6c) is the lowest across all datasets, and its IGD (0.0312; Fig. 6d) is the highest, indicating reduced Pareto front accuracy and coverage. Efficiency analysis (Fig. 6e and f) shows weaker performance relative to Kelmarsh and RAC, consistent with the dataset’s heterogeneous tropical operating conditions and higher noise levels.

Collectively, Fig. 6a to f demonstrate that AMO-BHS adapts effectively to varying dataset complexities. Specifically, Kelmarsh requires deep exploration but produces substantial Pareto diversity, RAC converges fastest with the most accurate trade-offs, and VV Farms presents the most challenging landscape and yields lower solution quality. These results validate the robustness of the adaptive mechanisms in AMO-BHS for FS across diverse WE environments.
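The two quality indicators used in Fig. 6c and d can be computed directly for a two-objective minimisation front. A minimal sketch, with the fronts and the reference point illustrative, not the paper's values:

```python
import numpy as np

def hypervolume_2d(front, ref):
    """Area dominated by `front` (both objectives minimised) up to `ref`."""
    pts = sorted(p for p in front if p[0] <= ref[0] and p[1] <= ref[1])
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:                        # sweep left-to-right, add rectangular slabs
        if y < prev_y:
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

def igd(front, reference_front):
    """Inverted Generational Distance: mean distance from each reference
    point to its nearest obtained solution (lower is better)."""
    F = np.asarray(front, dtype=float)
    T = np.asarray(reference_front, dtype=float)
    d = np.linalg.norm(T[:, None, :] - F[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

front = [(0.1, 0.9), (0.3, 0.5), (0.7, 0.2)]   # normalised (error, complexity) pairs
hv = hypervolume_2d(front, ref=(1.0, 1.0))
```

A larger hypervolume means the front dominates more of the objective space (better coverage), while a smaller IGD means the obtained front lies closer to the reference front, matching the interpretation of Fig. 6c and d.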

The convergence trajectories of AMO-BHS exhibit consistent patterns across all datasets, as illustrated in Fig. 7a to d. In terms of fitness dynamics, the average fitness decreases steadily from initial values between 0.45 and 0.67 to final convergence levels below 0.30 (Fig. 7a). This reduction demonstrates progressive improvement in predictive performance as the search refines feature subsets over successive iterations. Population diversity exhibits a systematic decline from initial values above 0.90 to stable levels between 0.52 and 0.62 (Fig. 7b). This controlled reduction reflects the balance between exploration and exploitation, while preserving sufficient diversity to avoid premature convergence.

Fig. 7

(a) to (d): Convergence behavior analysis across iteration evolution: (a) average fitness convergence, (b) population diversity, (c) Pareto front size growth, and (d) convergence efficiency (fitness improvement rate). Results are shown for Kelmarsh Wind Farm, VV Wind Farms, and RAC datasets.

The Pareto front size exhibits rapid expansion during the early 0–50% of iteration progress, corresponding to the exploratory phase, followed by gradual stabilization in later iterations as the search emphasizes refinement (Fig. 7c). Kelmarsh achieves the most pronounced Pareto front evolution, VV shows intermediate expansion, and RAC stabilizes early due to its compact feature space. Finally, convergence efficiency, expressed as the cumulative fitness improvement rate, shows smooth, monotonic growth across all datasets (Fig. 7d). By the end of the optimization process, all three datasets achieve fitness improvements of more than 50%, confirming the robustness of the adaptive mechanisms in maintaining consistent gains across different problem complexities.
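Tracking the Pareto front size in Fig. 7c requires extracting the non-dominated set at each iteration. A minimal sketch for the two minimised objectives used throughout (RMSE, selected-feature count), with illustrative candidate values:

```python
def dominates(a, b):
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(solutions):
    """Return the candidates not dominated by any other candidate."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Illustrative (RMSE, feature-count) candidates echoing the Kelmarsh solution types.
candidates = [(0.12, 187), (0.26, 97), (0.42, 58), (0.30, 120), (0.50, 60)]
front = pareto_front(candidates)
```

The quadratic all-pairs check is adequate for the Pareto set sizes reported here (4 to 23 solutions); faster sweep-based methods exist for larger archives.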

D. Selected Feature Subsets

The selected feature subset analysis examines the composition of the optimal solutions identified by the AMO-BHS optimization process, providing insights into feature importance patterns and practical deployment considerations for MPPT control applications. This analysis focuses on representative solutions from different regions of the Pareto front to demonstrate the algorithm’s ability to identify diverse, high-quality feature combinations.

The analysis of optimal feature subsets reveals systematic trade-offs between predictive accuracy and computational efficiency, as illustrated in Fig. 8a to f.

Fig. 8

(a) to (f): Optimal subset characteristics across solution types: (a) feature reduction percentages, (b) number of selected features, (c) RMSE vs. feature density trade-off, (d) computational savings, (e) efficiency ratio (savings/RMSE), and (f) error per feature. Results are shown for Kelmarsh Wind Farm, VV Wind Farms, and Reverse-Acting Controller datasets.

For the Kelmarsh Wind Farm Dataset, best-efficiency solutions achieve a 79.1% reduction in dimensionality (58 selected features; Fig. 8a and b) with an RMSE of 0.4234 kW (Fig. 8c). Best-accuracy solutions retain 67.3% of features (187) and achieve the lowest RMSE of 0.1234 kW. Computational savings scale directly with the reduction percentage, reaching nearly 80% in best-efficiency mode (Fig. 8d). The efficiency ratio (savings/RMSE) peaks at 2.65 for the best-efficiency solutions (Fig. 8e), while the error-per-feature metric remains low (0.0062; Fig. 8f), confirming the feasibility of aggressive dimensionality reduction in utility-scale deployments.

The VV Wind Farms Dataset exhibits similar patterns, but with smaller absolute feature counts due to its intermediate dimensionality. Best-efficiency solutions reduce the feature set by 76.9% (to 12 selected features; Fig. 8a and b) while maintaining an RMSE below 0.40 kW (Fig. 8c). Balanced solutions retain 22–42 features, corresponding to feature densities of 0.388–0.423, with RMSE between 0.2467 and 0.2567 kW. Computational savings reach 77% (Fig. 8d), and the efficiency ratio stabilizes near 2.03 (Fig. 8e). However, error-per-feature values are higher than at Kelmarsh (0.0112–0.0247; Fig. 8f), indicating that predictive information is concentrated in fewer variables under tropical conditions.

The Reverse-Acting Controller dataset, constrained by its five-feature space, cannot benefit from aggressive reduction. Both best-accuracy and balanced solutions use 75–100% of features (Fig. 8a and b), generating RMSE values between 0.0987 and 0.1387 RPM (Fig. 8c). Computational savings plateau at 25% (Fig. 8d), and efficiency ratios remain lower (1.49–1.80; Fig. 8e). The error-per-feature metric is the lowest across datasets (0.0042–0.0077; Fig. 8f), reflecting consistently strong informativeness across all features in controlled laboratory settings.
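The Fig. 8 deployment metrics reduce to simple ratios per Pareto solution. A sketch using the Kelmarsh best-efficiency numbers from above; note that computational savings are approximated here by the reduction fraction, so the resulting efficiency ratio does not reproduce the paper's exact 2.65 (its savings model is not specified in this section):

```python
def subset_metrics(n_total, n_selected, rmse):
    """Deployment metrics of Fig. 8 for one Pareto solution.
    Savings approximated by the reduction fraction (an assumption)."""
    reduction = 1.0 - n_selected / n_total
    return {
        "reduction_pct": 100.0 * reduction,        # Fig. 8a
        "efficiency_ratio": reduction / rmse,      # Fig. 8e (savings/RMSE)
        "error_per_feature": rmse / n_selected,    # Fig. 8f
    }

# Kelmarsh best-efficiency solution: 58 of 278 filtered features, RMSE 0.4234 kW.
m = subset_metrics(n_total=278, n_selected=58, rmse=0.4234)
```

The same three ratios applied per dataset and solution type recover the qualitative ordering in Fig. 8: aggressive reduction maximises the efficiency ratio, while the error-per-feature metric flags datasets whose information is concentrated in few variables.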

Table 5 presents the selected features for representative Pareto solutions across all datasets, demonstrating the physical interpretability of the optimization results through selection patterns and MI importance scores. For Kelmarsh, the best-efficiency solution (58 features) prioritizes core operational variables directly linked to wind energy generation: nacelle wind speed (MI = 0.4876), active power output (MI = 0.4534), and rotor speed (MI = 0.4012) consistently rank among the top selections across all Pareto points. The balanced solution (97 features, RMSE = 0.2567 kW) adds electrical parameters (reactive power, MI = 0.2187; grid voltage, MI = 0.2054) and thermal sensors (generator temperature, MI = 0.1987; bearing temperature, MI = 0.1876), improving prediction accuracy while maintaining computational tractability. Best-accuracy solutions (187 features) incorporate complete sensor suites, including gearbox temperatures, bearing conditions, and grid interface parameters, achieving minimal RMSE (0.1234 kW) at the cost of increased dimensionality.

Table 5 Selected features for representative Pareto solutions with importance metrics.

For VV Farms, the tropical operational environment emphasizes environmental adaptation features: air density (MI = 0.2098), ambient temperature (MI = 0.2432), and humidity (MI = 0.1654) appear prominently in the selected subsets alongside standard power-torque-speed variables. Derived aerodynamic indicators, namely tip speed ratio (MI = 0.3234) and power coefficient (MI = 0.3012), rank among the top-5 features, reflecting monsoon variability where aerodynamic efficiency metrics provide complementary predictive information beyond direct measurements. The compact best-efficiency solution (12 features) achieves an RMSE of 0.40 kW, validating that the critical information for MPPT control resides in a small subset of highly relevant features.

The feature importance analysis confirms physical interpretability across all datasets: power-related variables and rotor dynamics achieve the highest MI scores (> 0.40 for the top-3 features), with no spurious correlations to irrelevant variables. The consistency of rotor speed, wind speed, and active power across all Pareto solutions and datasets validates their universal role in wind energy MPPT control, ensuring that the selected subsets align with established domain knowledge rather than statistical artifacts. The feature count transitions demonstrate effective dimensionality reduction: Kelmarsh (464 → 185 after MI filtering → 58/97/187 via AMO-BHS), VV (87 → 35 → 12/21/38), and RAC (5 → 2 → 2/3/4).
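The Stage-1 step behind these count transitions (e.g. Kelmarsh 464 → 185) is a rank-and-retain filter over MI scores. A minimal sketch on synthetic data, with the retention ratio and all names illustrative (the paper's retention is dataset-specific):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_filter(X, y, retain=0.4, random_state=0):
    """Stage 1: rank features by MI and keep the top `retain` fraction."""
    mi = mutual_info_regression(X, y, random_state=random_state)
    k = max(1, int(retain * X.shape[1]))
    keep = np.sort(np.argsort(mi)[::-1][:k])   # indices of the k most informative features
    return keep, mi

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
y = X[:, 0] + 0.7 * X[:, 4] + rng.normal(scale=0.1, size=300)
keep, mi = mi_filter(X, y)                     # keeps 4 of 10 features here
```

The retained index set then defines the reduced binary search space handed to the Stage-2 optimizer.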

E. Performance Comparison with Baseline FS Methods

The performance comparison evaluates MI-AMO-BHS against four baseline methods representing different FS paradigms: pure MI ranking, classical GA, recursive feature elimination with RF (RF-RFE), and LASSO-based embedded selection. The evaluation employs consistent metrics and experimental conditions across all methods to ensure fair comparison and statistical validity of the observed performance differences.
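Two of these baselines have standard scikit-learn implementations. A sketch on synthetic data, with estimator settings and feature counts illustrative rather than the paper's experimental configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 12))
y = X[:, 0] + 0.7 * X[:, 5] + rng.normal(scale=0.1, size=300)

# RF-RFE baseline: recursively drop the least important feature.
rfe = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
          n_features_to_select=4).fit(X, y)
rfe_keep = np.flatnonzero(rfe.support_)

# LASSO embedded baseline: features with non-zero coefficients are selected.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_keep = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)
```

Both baselines optimize a single criterion (model importance or regularised fit), which is the key difference from the multi-objective search evaluated against them here.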

The comparative evaluation of FS highlights the superiority of MI-AMO-BHS across accuracy, efficiency, and feature-reduction metrics, as shown in Fig. 9a to f. In terms of predictive accuracy, MI-AMO-BHS consistently achieves the lowest RMSE across the datasets, with values of 0.2567 for Kelmarsh, 0.2467 for VV Wind Farms, and 0.1234 for the RAC (Fig. 9a). Compared to Pure MI Ranking, this corresponds to accuracy improvements between 20.6 and 28.6%, while against LASSO Embedded selection, the improvements range from 25.7 to 34.7%.

Fig. 9

(a) to (f): Performance comparison of FS methods across datasets: (a) RMSE, (b) computation time, (c) feature reduction percentage, (d) RMSE vs. computation time trade-off, (e) overall performance ranking, and (f) efficiency score. Results are shown for MI-AMO-BHS and the four baseline methods.

Computation time analysis (Fig. 9b) indicates that MI-AMO-BHS requires between 45.6 s (VV Farms) and 187.3 s (Kelmarsh). While this is higher than lightweight baselines such as Pure MI Ranking or LASSO, the additional computational cost is offset by substantially improved predictive accuracy. RF-RFE shows similar execution times (34.1–156.4 s) but delivers inferior accuracy. Dimensionality reduction ratios are also competitive (Fig. 9c): MI-AMO-BHS achieves over 60% feature reduction on the Kelmarsh and VV datasets while maintaining balanced performance on RAC. The method thus preserves subset compactness without sacrificing accuracy. The RMSE vs. computation time trade-off (Fig. 9d and f) confirms that MI-AMO-BHS solutions lie closer to the optimal frontier, achieving lower error for a given computational cost than all baselines.

The overall ranking analysis (Fig. 9a to f) consolidates the results across the evaluation criteria. MI-AMO-BHS attains the best average ranking (≈2.17), outperforming GA, RF-RFE, Pure MI Ranking, and LASSO Embedded across datasets. Finally, efficiency scores (reduction/RMSE) further validate the benefit of MI-AMO-BHS, which consistently generates the highest ratios across all datasets (2.39–2.98), indicating a superior trade-off between dimensionality reduction and prediction accuracy.

The validity of the performance improvements achieved by MI-AMO-BHS was assessed using statistical significance testing and robustness evaluation, as presented in Tables 6 and 7.

Table 6 Statistical significance analysis.
Table 7 Robustness analysis across cross-validation folds (RMSE in kW for Kelmarsh/VV, RPM for RAC).

Statistical Significance Testing (Table 6). Comparisons against Pure MI Ranking confirm highly significant improvements on the Kelmarsh and VV datasets (p < 0.001, effect sizes 0.76–0.89), while the Reverse-Acting Controller also records a statistically significant gain (p = 0.021, effect size 0.54). Against LASSO Embedded, MI-AMO-BHS again achieves strong significance across all datasets (p < 0.001 for Kelmarsh and VV, p = 0.034 for RAC) with large effect sizes up to 0.94, highlighting substantial advantages over regularization-based selection. Comparisons with GA show moderate significance on the operational datasets (Kelmarsh, p = 0.003; VV, p = 0.012; effect sizes 0.38–0.45), although differences on RAC are not statistically significant (p = 0.187). Against RF-RFE, differences are minor and not statistically significant (p > 0.05 across all datasets), indicating the relative competitiveness of that method. Overall, these results confirm that MI-AMO-BHS delivers statistically reliable gains over MI- and LASSO-based methods while maintaining comparable performance to tree-based feature elimination.

Robustness Analysis (Table 7). MI-AMO-BHS demonstrates stable performance across cross-validation folds, with coefficients of variation (CV) below 8% on all datasets and tight 95% confidence intervals: Kelmarsh [0.2459, 0.2675] kW, VV [0.2303, 0.2631] kW, RAC [0.1156, 0.1312] RPM. The non-overlapping confidence intervals between MI-AMO-BHS and the weaker baselines (Pure MI, LASSO) confirm that the performance improvements exceed sampling variation. In contrast, the GA exhibits high variability, with CV ranging from 13.26 to 15.27% and wider confidence intervals (e.g., Kelmarsh [0.2587, 0.3387] kW), indicating susceptibility to stochastic fluctuations. RF-RFE also achieves low variability (5.99–8.63% CV) with confidence intervals of [0.2642, 0.2936] kW for Kelmarsh, comparable to MI-AMO-BHS, while LASSO shows higher instability in some cases (up to 9.96% CV). These results indicate that MI-AMO-BHS combines accuracy improvements with reliability, ensuring that performance gains are not artifacts of sampling variation.
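The Table 7 statistics (CV and 95% confidence interval over fold RMSEs) can be sketched directly; the fold values below are illustrative numbers centred near the Kelmarsh mean, not the paper's actual folds, and a t-based interval is assumed:

```python
import numpy as np
from scipy import stats

def robustness(fold_rmse, confidence=0.95):
    """Coefficient of variation and a t-based confidence interval over CV folds."""
    r = np.asarray(fold_rmse, dtype=float)
    mean, sd = r.mean(), r.std(ddof=1)
    half = stats.t.ppf(0.5 + confidence / 2, df=len(r) - 1) * sd / np.sqrt(len(r))
    return {"cv_pct": float(100 * sd / mean),
            "ci95": (float(mean - half), float(mean + half))}

# Illustrative fold RMSEs (kW), not the paper's measured values.
stats_k = robustness([0.249, 0.261, 0.254, 0.258, 0.252])
```

Non-overlap between two methods' intervals computed this way is the criterion used above to argue that improvements exceed sampling variation.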

Ablation study: component contribution analysis

To validate the necessity of both stages in the proposed hybrid method, this study conducted systematic ablation experiments comparing three configurations: Pure MI filtering (Stage 1 only), AMO-BHS optimization without pre-filtering (Stage 2 only), and the complete hybrid method (Stage 1 + Stage 2). Table 8 presents comparative results across all three datasets using identical RF evaluation settings (100 estimators, max depth 10) and computational budgets.

Table 8 Ablation study—component contribution analysis.
A. Stage 1 Only (Pure MI Filtering): This configuration represents a pure filter-based method in which features are ranked by MI (Eq. (9)) and the top features are retained without further optimization. For the Kelmarsh dataset, Pure MI achieves RMSE = 0.2891 kW using 93 features with minimal computation time (12.3 s). While computationally efficient, this method suffers from two key limitations: (i) threshold-based selection ignores the feature interactions and complementarity effects that impact predictive performance, and (ii) the fixed retention ratio lacks adaptability to dataset-specific optimal subset sizes. Similar patterns emerge on the VV (RMSE = 0.2745 kW, 11 features) and RAC (RMSE = 0.1389 RPM, 2 features) datasets, where ranking alone proves insufficient for identifying optimal feature combinations.

B. Stage 2 Only (AMO-BHS without MI Pre-filtering): This configuration applies multi-objective binary harmony search directly to the whole feature space without dimensionality reduction, representing a pure wrapper method. For Kelmarsh, AMO-BHS-only achieves higher accuracy (RMSE = 0.2734 kW) than Pure MI, underscoring the value of optimization-guided selection. However, the method faces severe computational scalability challenges due to exponential search complexity. Operating on Kelmarsh’s 464-dimensional binary space, AMO-BHS-only requires 1628.4 s (8.7 times longer than the hybrid approach) and exhibits premature convergence, as the harmony memory struggles to adequately explore the vast solution landscape within the iteration budget. For VV (52 dimensions) and RAC (5 dimensions), computation times remain substantial at 287.3 and 45.2 s, representing 3.2× and 3.6× overheads compared to the hybrid method.

C. Hybrid Architecture (Proposed): The complete two-stage pipeline synergistically combines the strengths of both components while mitigating their individual weaknesses. MI filtering (Stage 1) rapidly eliminates clearly irrelevant features, reducing Kelmarsh’s search space to 185 features after retention and thereby transforming an intractable optimization problem into a manageable one. AMO-BHS (Stage 2) then refines the selection through multi-objective evaluation, identifying feature subsets that balance prediction accuracy and computational efficiency beyond what simple ranking can achieve.
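The two-stage pipeline can be illustrated with a heavily simplified sketch: an MI pre-filter followed by a plain binary harmony search. Everything here is an assumption for illustration only: the paper's AMO-BHS adapts HMCR/PAR online and maintains a Pareto archive, whereas this sketch fixes both parameters and scalarises accuracy and subset size into a single fitness value.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """Scalarised objective (illustrative): cross-validated RMSE plus a sparsity penalty."""
    if mask.sum() == 0:
        return 1e9                              # empty subsets are invalid
    rf = RandomForestRegressor(n_estimators=25, max_depth=5, random_state=0)
    mse = -cross_val_score(rf, X[:, mask], y, cv=3,
                           scoring="neg_mean_squared_error").mean()
    return float(np.sqrt(mse) + 0.01 * mask.mean())

def binary_harmony_search(X, y, iters=15, hms=6, hmcr=0.9, par=0.3, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    memory = rng.random((hms, d)) < 0.5         # initial harmony memory (bit masks)
    scores = np.array([fitness(m, X, y) for m in memory])
    for _ in range(iters):
        new = np.empty(d, dtype=bool)
        for j in range(d):
            if rng.random() < hmcr:             # memory consideration
                new[j] = memory[rng.integers(hms), j]
                if rng.random() < par:          # pitch adjustment = bit flip
                    new[j] = ~new[j]
            else:                               # random consideration
                new[j] = rng.random() < 0.5
        s = fitness(new, X, y)
        worst = int(scores.argmax())
        if s < scores[worst]:                   # replace the worst harmony
            memory[worst], scores[worst] = new, s
    best = int(scores.argmin())
    return memory[best], float(scores[best])

# Stage 1 (MI pre-filter) followed by Stage 2 (binary harmony search), on toy data.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))
y = X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=200)
mi = mutual_info_regression(X, y, random_state=0)
X_red = X[:, np.sort(np.argsort(mi)[::-1][:5])]  # keep the top-5 features by MI
mask, score = binary_harmony_search(X_red, y)
```

The pre-filter shrinks the binary search space before the wrapper stage runs, which is the mechanism behind the large runtime reductions reported above relative to AMO-BHS-only.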

The hybrid method achieves superior performance across all metrics. For Kelmarsh, RMSE = 0.2567 kW (an 11.2% improvement over Pure MI and 6.1% over AMO-BHS-only) with 58 features (37.6% fewer than Pure MI, 13.4% fewer than AMO-BHS-only) in 187.3 s (88.5% faster than AMO-BHS-only). Similar advantages are observed for VV (10.1% accuracy gain over Pure MI, 68.8% time reduction vs. AMO-BHS-only) and the RAC dataset (11.2% accuracy improvement, 72.6% computational savings).

The ablation results reveal distinct computational profiles. Pure MI exhibits linear time complexity dominated by MI estimation, providing rapid but suboptimal solutions. AMO-BHS-only shows polynomial-to-exponential scaling, where the high dimensionality of unreduced spaces causes prohibitive computation times and convergence difficulties. The hybrid model achieves favorable complexity through a two-phase decomposition: MI pre-filtering runs in linear time, after which AMO-BHS operates on the tractable reduced space, resulting in substantial acceleration without sacrificing solution quality.

Conclusion and future works

This paper presents a two-stage hybrid FS model for wind energy MPPT systems that addresses the computational challenges posed by high-dimensional sensor data across diverse operational contexts. The methodology combines MI filtering with Adaptive Multi-Objective Binary Harmony Search and is validated across three distinct datasets: Kelmarsh Wind Farm (464 features, temperate UK conditions), VV Wind Farms (87 features, tropical monsoon environment), and RAC-WECS (5 features, laboratory conditions). Results demonstrate RMSE improvements of 9.4–14.7% compared to full-feature baselines, with compact feature subsets of 58–97 features (76.9–87.5% dimensionality reduction) maintaining prediction accuracy within 95% confidence intervals of [0.2459, 0.2675] kW for Kelmarsh and [0.2303, 0.2631] kW for VV. Statistical significance (p < 0.001, effect sizes 0.76–0.89) confirms reliability across cross-validation folds.

However, dataset-specific challenges emerged: the VV tropical dataset displayed lower Pareto quality (hypervolume = 0.7156 vs. 0.8234 for Kelmarsh), reflecting inherent difficulties in monsoon wind regimes with high temporal variability. This highlights that FS complexity scales with operational regime variability, not merely dimensionality. Deployment considerations include sensor drift risks, where calibration degradation of high-importance features (e.g., nacelle wind speed, MI = 0.4876) may disproportionately impact compact subsets. Mitigation methods include periodic online feature re-ranking (quarterly MI recomputation), drift detection via residual monitoring, and maintaining redundant Pareto solutions for graceful degradation under sensor failures.

The AMO-BHS algorithm’s adaptive parameter control and principled Pareto optimization represent advances for binary optimization in RE applications, though site-specific tuning may be required for novel turbine models or extreme environments. Multi-dataset validation ensures methodological robustness within the tested conditions, whereas cross-site transfer learning remains unvalidated.

Future research should address: (1) scalability to ultra-high dimensions (> 1000 features), (2) temporal dependency integration for sequential prediction, (3) sensor reliability incorporation into selection criteria, (4) online adaptive mechanisms for non-stationary regimes, and (5) cross-domain transfer to solar prediction and grid integration applications. The methodology provides practical solutions for controlled deployments while highlighting operational considerations essential for long-term production reliability, supporting the transition toward computationally efficient RE systems.