Introduction

The unprecedented growth of data in modern information systems has created vast opportunities for knowledge discovery while imposing significant computational challenges1,2. High-dimensional datasets exacerbate the curse of dimensionality, leading to overfitting, prolonged training times, and diminished model interpretability due to redundant, noisy, or irrelevant features3,4. Feature selection (FS) is a vital preprocessing technique that identifies compact, informative feature subsets to enhance classification performance and reduce computational overhead5,6.

Among FS strategies, filter methods7 prioritize efficiency but often sacrifice accuracy, while embedded approaches8 integrate selection into model training at the risk of overfitting due to model-specific dependencies. Wrapper methods10, particularly when coupled with meta-heuristic optimization9, consistently deliver superior accuracy by leveraging external classifiers, making them ideal for complex, nonlinear datasets. Meta-heuristic algorithms including evolutionary, physics-based, and swarm intelligence techniques11,12 excel in high-dimensional, non-linear FS problems13. However, challenges such as parameter sensitivity and exploration-exploitation imbalance persist.

Grey Wolf Optimization (GWO)14 is a prominent swarm intelligence algorithm valued for its simplicity and rapid convergence. Yet, its continuous formulation struggles with the discrete binary spaces required in FS. Although binary adaptations like bGWO16 and class-separability objectives15 have been proposed, hybrid variants such as ALO-GWO17, GWO-PSO18, and BIGWO19 frequently suffer from premature convergence, limited global exploration, poor scalability in high-dimensional spaces, and inadequate handling of discretization dynamics; these issues are underscored by the No Free Lunch theorem, which highlights the need for tailored hybrid innovations. Grey wolves typically live in packs of 5–12 individuals governed by a rigid four-level social dominance hierarchy, as illustrated in Fig. 1.

Fig. 1. Hierarchy of grey wolf pack (α, β, δ, ω).

This study introduces BGWOCS, a novel hybrid meta-heuristic that integrates Binary GWO (BGWO) with Cuckoo Search (CS) to address these limitations. Unlike prior hybrids, BGWOCS combines BGWO’s robust local exploitation with CS’s Lévy flight-driven global exploration, enhanced by an adaptive nonlinear convergence factor and a probabilistic variation operator to dynamically balance exploration-exploitation and prevent population stagnation. Validated across 10 UCI datasets, BGWOCS achieves superior accuracy and feature reduction with statistical significance (p < 0.05). The primary contributions are:

  • A unique BGWO-CS integration leveraging Lévy flight alternation for enhanced global search and local precision.

  • An adaptive nonlinear scaling parameter and probabilistic variation operator to ensure diversity and robust convergence.

  • Statistically validated superiority in accuracy, compactness, and efficiency over state-of-the-art methods on diverse datasets.

BGWOCS offers a generalizable framework for high-dimensional data analysis, effectively tackling key FS challenges: accuracy-feature trade-offs, local optima avoidance, and computational efficiency across varying dimensions.

The paper is organized as follows: Sect. 2 reviews related work. Section 3 details the BGWOCS framework, including its algorithmic structure and novel components. Section 4 describes its application to FS, including fitness function design. Section 5 presents experimental results, comparisons, and statistical validations. Section 6 concludes with findings and future research directions.

Related works

Feature selection (FS) is an effective dimensionality reduction method that can successfully eliminate redundant features. Metaheuristic algorithms, such as the Grey Wolf Optimizer (GWO), have been used extensively in FS and have demonstrated good performance. However, when dealing with high-dimensional data, GWO and its variants suffer from poor accuracy, low diversity, and limited adaptability. The hybrid rice optimization (HRO) algorithm is a newer metaheuristic inspired by the mechanism of hybrid heterosis and breeding in nature, and it is effective at locating and moving toward high-quality solutions.

Thus, a novel method based on multi-strategy collaborative GWO coupled with HRO algorithm (HRO-GWO) for FS was proposed by researchers in20. Four novel tactics, including three search strategies and a dynamic adjustment strategy, are used to improve the HRO-GWO algorithm. First, a dynamic tuning technique is developed to optimize the GWO parameter in order to increase the adaptability of GWO. Next, an HRO-inspired multi-strategy co-evolution model is created that increases population variety through the use of neighborhood search, double crossover, and self-assembly strategies.

Researchers in21 have suggested GWOGA, a novel hybrid algorithm that combines the Genetic Algorithm (GA) and the Grey Wolf Optimizer (GWO). GWOGA's innovation comprises three primary strategies: (1) a hybrid optimization mechanism in which GWO guarantees fast convergence in the early stages and GA refines the global search in the later stages to avoid local optima; (2) an elite learning strategy that prioritizes high-rank solutions, improving search hierarchy and efficiency; and (3) a chaos map and opposition-based learning (OBL) to initialize a uniformly distributed population, increasing diversity and reducing premature convergence.

The study in22 proposes a threshold binary gray wolf optimizer for feature selection (MTBGWO) based on multi-elite interaction. To optimize search space usage and boost population diversity, a multi-population topology is used in the initial step. To increase the subpopulation's capacity for local exploitation, the second phase adopts an information interaction learning method that updates the subpopulation elite wolf's position (ideal position) by learning a better position than other elite wolves. To update the population position, the wolves in the second and third best positions are removed simultaneously. Finally, a threshold approach is used to transform the continuous positions of gray wolf individuals into binary positions for the feature selection problem.

Three enhanced binary gray wolf optimization (GWO) techniques are put forth in23 in an effort to maximize feature selection accuracy while choosing the smallest number of features possible. In each method, GWO is implemented first, followed by particle swarm optimization (PSO); the results produced by both algorithms are then altered differently by each method. This combination aims to apply the large search space capabilities of PSO to the solutions acquired by GWO, in order to address GWO's tendency to become stuck in local optima. The continuous solutions produced by each suggested method were converted into their corresponding binary equivalents using both S-shaped and V-shaped binary transfer functions.

For the analysis of biological protein sequences, researchers in24 present SBSM-Pro, a machine learning-based technique that performs well when applied to intricate biological datasets. This strategy, which focuses on identifying important characteristics in biological data, is analogous to the current paper’s objective of feature selection optimization. This approach can be used as a benchmark to assess how well the suggested BGWOCS algorithm performs on datasets like Breastcancer.

The study25 focuses on the interpretation of complicated biological data and the inference of gene regulatory networks from single-cell transcriptome data using a graph self-encoding model. By optimizing the network structure, this approach can serve as a foundation for evaluating the exploration and exploitation tactics in the BGWOCS algorithm. Additionally, its use on a variety of datasets aligns with the current paper's objectives of increasing accuracy and minimizing features.

To find microRNA-disease connections, a low-rank approximation and multiple kernel learning approach is presented in26. It emphasizes feature selection and classification accuracy. This method, which aims to minimize data dimensionality and maximize performance, is comparable to the GWO and Cuckoo Search strategy merged in BGWOCS.

For the first time, research27 introduces the CS-ExtraTrees model, which combines cuckoo search (CS) with ExtraTrees to find the best hyperparameters. Cuckoos' brood-parasitic breeding habits and the idea of Lévy flight, which increases random flight ability, allow CS to search efficiently for ideal parameters on a global basis.

A population evaluation method and a collaborative development mechanism serve as the foundation for the multi-strategy distinctive creative search (MSDCS) proposed in28. To address the shortcomings of the DCS algorithm, such as its limited exploration ability and propensity to fall into local optima due to the guiding effect of dominant populations, it suggests a collaborative development mechanism that naturally integrates the estimation of distribution algorithm with DCS, and at the same time enhances the DCS algorithm's search efficiency and solution quality.

In29, a novel interpretability framework is introduced that combines causal reasoning and instance-based feature selection to explain the choices made by black-box image classifiers. Their approach finds input regions that have the biggest causal impact on the model’s predictions, as opposed to depending on feature importance or mutual information.

For the hybrid production batch flow scheduling problem with dynamic order entry (HFLSSP_DOA), a hybrid knowledge- and data-based method is suggested in30. The knowledge-based component formulates the problem as a dynamic heterogeneous graph with variable edge lengths, and solving the problem through graph updates motivates the development of a configurable and constructive group solution framework (CCESF).

Despite improving feature selection performance, the reviewed GWO-based hybrids consistently show important gaps in global exploration, premature convergence, and adaptability to nonlinear, high-dimensional search spaces. These restrictions, which are especially noticeable in ALO-GWO17, GWO-PSO18, BIGWO19, HRO-GWO20, GWOGA21, MTBGWO22, and IBGWO23, drive the design of BGWOCS. BGWOCS addresses these shortcomings directly, achieving greater balance, diversity, and scalability in wrapper-based FS by carefully integrating Cuckoo Search's Lévy flight mechanism for robust global search with adaptive nonlinear convergence and probabilistic variation.

The proposed hybrid binary approach

GWO in continuous optimization lets agents move freely in the search space. However, in order to successfully handle 0/1 decisions, feature selection (FS) needs a discrete, binary framework, which calls for modifications. GWO and other swarm intelligence algorithms depend on a careful balancing act between local exploitation to improve solutions and global exploration to prevent premature convergence. In this work, a unique hybrid strategy that combines Binary GWO (BGWO) and Cuckoo Search (CS) is presented: BGWOCS. With the use of a probabilistic variation operator and an adaptive scaling parameter, BGWOCS improves the exploration-exploitation trade-off for FS tasks.

Binary grey wolf optimization with cuckoo search (BGWOCS)

Due to its limited global search capability, traditional GWO excels at local exploitation but risks stagnating in local optima. Inspired by the cuckoo's reproductive strategy and driven by Lévy flight-based exploration, CS provides robust global search using long-step, randomized movements. By switching between the two every ten iterations (BGWO for the first ten, CS for the next ten, and so on), BGWOCS combines the concentrated local search of BGWO with the broad exploration of CS. According to ablation tests, this fixed-cycle alternation dynamically balances local and global search, is easy to implement, incurs no additional computational cost, and improves accuracy by 0.6% over static schedules (such as BGWO-only or CS-only).
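As a minimal sketch (not the authors' implementation), the fixed-cycle alternation can be expressed as follows; bgwo_step and cs_step are hypothetical placeholders for the Binary GWO and Cuckoo Search position-update rules:

```python
def bgwocs_alternation(population, bgwo_step, cs_step, T=150, cycle=10):
    """Alternate between BGWO and CS phases every `cycle` iterations.

    bgwo_step / cs_step are caller-supplied update functions standing in for
    the Binary GWO (local exploitation) and Cuckoo Search (Lévy-flight
    exploration) rules; both are hypothetical placeholders here.
    """
    for t in range(T):
        step = bgwo_step if (t // cycle) % 2 == 0 else cs_step
        population = step(population, t, T)
    return population
```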

Adaptive scaling parameter

A transfer function is necessary when switching from continuous to binary Grey Wolf Optimization (BGWO) in order to translate continuous position updates into binary (0 or 1) feature selection decisions. In swarm intelligence systems such as BGWO, the scaling parameter is crucial to maintaining a balance between exploration and exploitation31. In classical GWO, the convergence factor decreases linearly, which ignores the nonlinear dynamics of complex optimization tasks and limits both effective global exploration and accurate local exploitation.

To remedy this, BGWOCS introduces the adaptive scaling parameter \(D(t) = 1 + \exp(-t/T)\), where \(t\) is the current iteration and \(T\) is the maximum number of iterations. This exponential formulation starts at a large value (approximately 2), encouraging thorough exploration in early iterations and enabling rapid coverage of the search space. As iterations progress, \(D(t)\) steadily decreases, concentrating the search on exploitation to refine solutions. In contrast to the linear decline of standard GWO, this adaptive strategy matches the nonlinear character of feature selection, improving global search in the early stage and local precision in the final stage.

A smooth transition is ensured by using an exponential decay function, which steers clear of sudden changes that can interfere with convergence. Ablation experiments show that by better meeting the dynamic needs of exploration and exploitation, this adaptive scaling increases classification accuracy by 0.5–0.7% as compared to linear models, especially on high-dimensional datasets.
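A small sketch of the adaptive scaling parameter, together with one plausible way to combine it with a sigmoid transfer function when binarizing continuous position updates (the coupling shown is an assumption; the text above specifies only the form of D(t)):

```python
import numpy as np

def adaptive_scale(t, T):
    """Adaptive scaling parameter D(t) = 1 + exp(-t/T): ~2 at t = 0, ~1.37 at t = T."""
    return 1.0 + np.exp(-t / T)

def binarize(continuous_step, t, T, rng):
    """Assumed usage: scale the continuous update by D(t), squash it to [0, 1]
    with a sigmoid transfer function, then sample a binary decision per feature."""
    prob = 1.0 / (1.0 + np.exp(-adaptive_scale(t, T) * continuous_step))
    return (rng.random(prob.shape) < prob).astype(int)
```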

Explicit formula for nonlinear convergence factor

The nonlinear convergence factor \(\:a\) is defined as \(\:a=2\times\:{\left(\frac{t}{T}\right)}^{\gamma\:}\), where \(\:t\) is the current iteration, \(\:T\) is the maximum number of iterations, and \(\:\gamma\:=2\) is a nonlinear exponent chosen to ensure slow exploration in early iterations and rapid exploitation toward the end. This nonlinear growth contrasts with the linear version \(\:a=2\times\:(1-\frac{t}{T})\) used in standard GWO.

Figure 2 shows the behavior of the nonlinear convergence coefficient compared with the linear schedule \(a=2\times(1-t/T)\) over 100 iterations. The nonlinear approach grows gradually in the early stages, extending exploration, and then rises sharply to drive exploitation, which supports the improved performance of BGWOCS on high-dimensional datasets. The nonlinear coefficient (blue) shows slower initial growth and faster final convergence, increasing the efficiency of BGWOCS optimization.
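For reference, both schedules can be reproduced in a few lines; the checkpoint values in the final comment illustrate the slower initial growth of the nonlinear factor:

```python
import numpy as np

T = 100
t = np.arange(T + 1)
a_nonlinear = 2 * (t / T) ** 2   # proposed: slow growth early, steep rise near the end
a_linear = 2 * (1 - t / T)       # standard GWO: uniform linear decrease from 2 to 0

# e.g. at t = 25, 50, 75: nonlinear a = 0.125, 0.5, 1.125 vs. linear a = 1.5, 1.0, 0.5
```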

Fig. 2. Comparison of nonlinear and linear convergence coefficients over 100 iterations.

Probabilistic Gaussian variation

Maintaining population diversity is essential in meta-heuristic optimization to avoid premature convergence, especially in binary feature selection tasks where solutions are limited to either 0 or 1. In the classic GWO algorithm, convergence toward dominant solutions (represented by \(\alpha\), \(\beta\), and \(\delta\)) frequently reduces population diversity, raising the risk of becoming trapped in local optima6. To address this problem, the proposed Binary Grey Wolf Optimization with Cuckoo Search (BGWOCS) introduces a probabilistic diversity operator that adds controlled randomness to leader wolf position updates. This operator preserves convergence stability while improving global exploration.

There are two steps involved in implementing the diversity operator. To decide if a position update is necessary in the first step, a perturbation probability is computed. The definition of this probability is:

$$P_{pert}\left(t\right)=0.15\cdot \exp\left(-\frac{t}{T}\right)$$
(1)

where T is the maximum number of iterations, t is the current iteration, and the base probability is the constant 0.15. Early iterations favor exploration, which gradually gives way to exploitation as iterations go on according to the exponential decay function. This dynamic probability is intended to maintain a harmonious balance between exploration and exploitation by enhancing the adaptive scaling parameter discussed in Sect. 3.2.

The second step involves mapping the update to the binary space by applying a random perturbation using a logistic transformation for each dimension d of a leader’s position vector \(\:{P}_{old}\). This is how the perturbation value is calculated:

$$Z\left[d\right]=\frac{1}{1+\exp\left(-N\left(0,\,0.1\right)\right)}$$
(2)

Here N(0, 0.1) is a Gaussian random variable with mean 0 and standard deviation 0.1, and the logistic transformation maps it to a value between 0 and 1. The logistic function guarantees a continuous and smooth mapping, making it appropriate for procedures involving binary decisions. The position update rule is defined as follows:

$$P_{new}\left[d\right]=\begin{cases}1-P_{old}\left[d\right] & \text{if } rand<P_{pert}\left(t\right)\cdot Z\left[d\right],\\ P_{old}\left[d\right] & \text{otherwise,}\end{cases}$$
(3)

where rand is a uniformly distributed random number in [0, 1]. This rule encourages diversity, especially in the early phases of the optimization process, by inverting the binary value of the dimension (i.e., from 0 to 1 or vice versa) when the product of the perturbation probability and the logistic transformation exceeds the random threshold.

The modified leader positions undergo a supplementary diversity check to further improve exploration and avoid stagnation. The definition of this check is:

$$P_{new}\left[d\right]=\begin{cases}randint\left(0,1\right) & \text{if } rand<0.05 \text{ and } t<\frac{T}{2},\\ P_{new}\left[d\right] & \text{otherwise,}\end{cases}$$
(4)

where the restriction \(\:t\:<\frac{T}{2}\) limits this forceful perturbation to the first half of the iteration cycle, and randint (0, 1) randomly chooses either 0 or 1. This technique guarantees sustained diversity in high-dimensional datasets and is inspired by evolutionary strategies32. According to ablation research, this method preserves computational efficiency while improving solution quality by 0.5% to 0.7% when compared to non-perturbed runs.
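Putting Eqs. (1)–(4) together, the following is a minimal NumPy sketch of the probabilistic Gaussian variation applied to one binary leader position (illustrative only, not the authors' code):

```python
import numpy as np

def gaussian_variation(p_old, t, T, rng):
    """Probabilistic Gaussian variation of a binary leader position (Eqs. 1-4).

    p_old : 1-D binary NumPy array representing the leader's position.
    """
    d = p_old.size
    p_pert = 0.15 * np.exp(-t / T)                      # Eq. (1): decaying perturbation probability
    z = 1.0 / (1.0 + np.exp(-rng.normal(0.0, 0.1, d)))  # Eq. (2): logistic of N(0, 0.1)
    p_new = p_old.copy()

    flip = rng.random(d) < p_pert * z                   # Eq. (3): flip bits where the threshold is met
    p_new[flip] = 1 - p_new[flip]

    if t < T / 2:                                       # Eq. (4): extra diversity in the first half
        reinit = rng.random(d) < 0.05
        p_new[reinit] = rng.integers(0, 2, int(reinit.sum()))
    return p_new
```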

Fig. 3. Flowchart of the proposed BGWOCS algorithm.

Figure 3 presents the overall workflow of the proposed BGWOCS algorithm. The process begins with the initialization of the population and algorithmic parameters, followed by the evaluation of the initial fitness values for all candidate solutions. The best and worst wolves are identified, the leaders (\(\alpha\), \(\beta\), \(\delta\)) are selected, and the adaptive nonlinear convergence factor is computed to dynamically regulate the balance between exploration and exploitation. Subsequently, the Cuckoo Search mechanism with Lévy flights is applied to enhance global exploration and prevent premature convergence. The positions of the wolves are then updated under adaptive control, ensuring diversity within the population. The fitness values are recalculated, and the process iterates until the stopping criterion (maximum iterations or convergence) is met. The final output is the optimal subset of features that provides the best trade-off between accuracy and dimensionality reduction.

Algorithm 1. BGWOCS (Binary Grey Wolf–Cuckoo Search Hybrid) for Wrapper Feature Selection.

Described in Algorithm 1, the BGWOCS algorithm is a hybrid meta-heuristic intended for wrapper-based feature selection. The dataset is first divided into training, validation, and test sets. After a binary population is initialized at random, each solution is assessed using a fitness function that combines the KNN classification error (weight \(\alpha = 0.95\)) with a feature-count penalty (weight \(\beta = 0.05\)). The global best and the top three solutions (\(\alpha\), \(\beta\), and \(\delta\)) are determined. In the main loop, BGWOCS switches between BGWO and CS every 20 cycles. BGWO updates positions using sigmoid mapping and the dynamic adjustment factor \(D_{t} \leftarrow 1+\exp(-t/T)\), while CS uses Lévy flights (exponent 1.5) for global exploration. A stochastic perturbation method with probability \(p_{div}=0.15\cdot\exp(-t/T)\) increases diversity by flipping bits in the leader solutions. Random reinitialization is used in early iterations to avoid stagnation. The method returns the best feature subset and test accuracy, achieving strong performance across a variety of datasets.
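As an illustration of the CS component, the sketch below draws Lévy-flight steps with Mantegna's algorithm for exponent 1.5, a common choice in Cuckoo Search; the exact sampler used in BGWOCS is an assumption here:

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(dim, rng, beta=1.5):
    """Draw a Lévy-flight step per dimension via Mantegna's algorithm (exponent beta)."""
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)   # heavy-tailed numerator component
    v = rng.normal(0.0, 1.0, dim)       # standard normal denominator component
    return u / np.abs(v) ** (1 / beta)
```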

Algorithmic complexity analysis

With T the maximum number of iterations, M the population size, and D the number of features, the time complexity of BGWOCS is \(O(T \times M \times D)\). This follows from the position updates and fitness assessments performed for every individual in each iteration: \(O(D)\) operations per update and \(O(M \times D)\) per iteration for the whole population. Although the Cuckoo Search and variation phases add constant factors, they do not change the overall order. The space complexity is \(O(M \times D)\), used mostly to store the population positions and leader solutions, so BGWOCS handles high-dimensional FS tasks without requiring excessive memory.

BGWOCS for feature selection problem

Fitness evaluation function

Feature selection (FS) is a multi-objective optimization task whose goal is to find the feature subset that best balances feature reduction and classification accuracy. The proposed BGWOCS addresses this problem by expressing solutions as binary vectors, where 1 denotes a selected feature and 0 an unselected one. The quality of each subset is assessed with a fitness function that jointly minimizes the classification error and the number of selected features.

The definition of the fitness function is:

$$F\left(S\right)=\alpha\cdot E_{KNN}\left(S\right)+\beta\cdot\frac{\left|S\right|}{D}$$
(5)

where \(\:\left|S\right|\) is the number of features that were chosen, \(\:D\) is the total number of features, and \(\:{E}_{KNN}\left(S\right)\) is the classification error rate of the K-Nearest Neighbors (KNN) classifier. A fair trade-off is ensured by the weights \(\:\alpha\:\) = 0.95 and \(\:\beta\:\) = 0.05, which penalize large feature groups while prioritizing accuracy.

Because of its simplicity and low computational complexity, KNN with k = 5 is a good choice for wrapper-based FS. In the feature space, it classifies instances according to the majority class of their k closest neighbors33.
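A minimal sketch of this wrapper fitness (Eq. 5) with a KNN classifier; the cross-validated error estimate and the default weights shown are assumptions about the evaluation protocol rather than the exact implementation:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, alpha=0.95, beta=0.05, k=5, cv=5):
    """Wrapper fitness F(S) = alpha * E_KNN(S) + beta * |S| / D (Eq. 5).

    mask : binary NumPy array of length D (1 = feature selected).
    """
    D = mask.size
    if mask.sum() == 0:          # an empty subset gets the worst possible score
        return 1.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                          X[:, mask.astype(bool)], y, cv=cv).mean()
    return alpha * (1.0 - acc) + beta * mask.sum() / D
```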

(A) Justification of Weighting Parameters (α = 0.99, β = 0.01).

To meet the main objective of strong predictive performance in high-dimensional environments, the fitness function prioritizes classification accuracy (α = 0.99) while penalizing large feature subsets (β = 0.01). To validate this selection, a sensitivity study was conducted on four datasets spanning low to high dimensionality, varying α ∈ {0.90, 0.95, 0.99, 0.999} with β = 1 − α.

Table 1 reports the mean accuracy and number of selected features over 20 runs. For α below 0.99, features are reduced but accuracy is compromised; raising α above 0.99 yields only a minor accuracy gain (< 0.3%) at the cost of a larger subset. The Pareto front in Fig. 4 depicts the accuracy versus subset-size trade-off and confirms that the knee point (maximum accuracy with effective reduction) is reached at α = 0.99. Thus, α = 0.99 and β = 0.01 are set as the robust default for all experiments.

Fig. 4. Pareto front of accuracy vs. number of selected features for different α values.

Table 1 Sensitivity analysis of fitness weights (α, β) on four datasets (20 runs).
(B) Sensitivity Analysis of Weighting Parameters.

To evaluate the impact of the weighting parameters, a sensitivity analysis was conducted by varying \(\:\alpha\:\) (and correspondingly \(\:\beta\:=1-\alpha\:\)) with values \(\:\alpha\:=0.99,\:0.9,\:0.8,\:0.7\). The analysis was performed on four representative datasets to cover low, medium, and high-dimensional cases. Table 2 reports the average classification accuracy (mean ± SD) and the average number of selected features for each \(\:\alpha\:\) value across 20 independent runs. The results demonstrate a trade-off between accuracy and feature reduction: higher α values prioritize accuracy, yielding higher classification performance with slightly larger feature subsets, while lower α values reduce the number of selected features at the cost of decreased accuracy.

Table 2 Sensitivity analysis of weighting parameters (\(\:\alpha\:,\beta\:\)).
(C) Pareto-Style Trade-Off Analysis.

Figure 5 illustrates the Pareto front for the trade-off between classification accuracy and the number of selected features for the four datasets analyzed in Table 2. Each point on the Pareto front represents a configuration (\(\:\alpha\:,\beta\:\)) and its corresponding accuracy and feature count. The figure highlights that \(\:\alpha\:=0.99\) achieves near-optimal accuracy with a moderate number of features, while lower \(\:\alpha\:\) values shift the balance toward fewer features at the expense of accuracy. This analysis confirms that the chosen \(\:\alpha\:=0.99\),\(\:\:\beta\:=0.01\) provides a robust balance for most datasets, particularly for high-dimensional ones where accuracy is paramount.

Fig. 5. Pareto front of accuracy vs. feature subset size across different α values.

Experimental results

Datasets

Ten benchmark datasets from the UC Irvine Machine Learning Repository are used to test the effectiveness of the proposed BGWOCS across a range of dimensionalities, from low to very high. The selection, described in Table 3, consists of five datasets retained from previous work (Breastcancer, HeartEW, SonarEW, WineEW, and Gisette) to maintain continuity and five additional datasets (Musk1, Madelon, Arcene, Isolet, and Dexter) to improve originality and robustness. This varied collection, which includes Gisette specifically for its high-dimensional challenge, tests BGWOCS's capacity to handle different feature counts and instance sizes.

Table 3 Benchmark datasets used for evaluating BGWOCS.

To illustrate potential biases influencing BGWOCS's performance, Table 4 displays the class distribution and imbalance ratio for the ten benchmark datasets, taken from UCI metadata. Datasets such as Madelon and Dexter are balanced (1:1), enabling robust classification, whereas Isolet, with its 26-class structure, shows notable imbalance (6.25:1), highlighting BGWOCS's capacity to handle challenging multi-class settings.

Table 4 Class balance for benchmark Datasets.

The confusion matrices for datasets (Musk1, Madelon, Arcene, and Dexter) that achieved classification accuracy above 98% from the best of 20 independent runs using BGWOCS are shown in Table 5. These matrices show low misclassifications and high true positive and true negative rates, confirming the resilience of BGWOCS in choosing the best feature subsets for datasets with strong performance.

Table 5 Confusion matrices that are representative for High-Accuracy datasets (> 98%).

Experimental environment and parameter settings

To ensure full reproducibility and fair comparison, all experiments were conducted under identical, controlled conditions. A fixed random seed of 100 was applied globally using Python’s numpy.random.seed(100), random.seed(100), and sklearn.utils.check_random_state(100) to guarantee identical dataset splits, population initializations, and stochastic operations across all algorithms and runs. Each algorithm including BGWOCS, HRO-GWO20, GWOGA21, MTBGWO22, and IBGWO23 was executed in 20 independent runs per dataset.

Datasets were split into training (60%), validation (20%), and test (20%) sets using stratified random sampling to preserve class distribution. These splits are identical across all 20 runs and all algorithms due to the fixed seed. Feature selection is performed exclusively on training + validation sets using 5-fold cross-validation on the training set for fitness evaluation and the validation set for early stopping. The test set is used only once, at the end, for final performance reporting to prevent data leakage. All algorithms share the following settings:

  • Population size: M = 20.

  • Maximum iterations: T = 150.

  • Stopping criterion: fitness stable within 0.003 over 15 consecutive iterations or reaching T.

Experiments were run on an Intel Core i9-10900K (3.7 GHz, 10 cores) with 32 GB DDR4 RAM under Ubuntu 20.04 LTS, in a single-threaded Python 3.9 environment using scikit-learn (KNN classifier) and NumPy (optimization). To guarantee fairness and avoid potential inconsistencies, all algorithms (BGWOCS, HRO-GWO, GWOGA, MTBGWO, IBGWO) were executed with identical core parameters: population size M = 20, maximum iterations T = 150, stopping criterion (fitness change < 0.003 over 15 iterations), random seed = 100, and fitness weights (α = 0.99, β = 0.01). These shared settings are explicitly listed in Table 6.

Table 6 Parameter settings for fair comparison.
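As a concrete illustration of the setup described above, the following sketch reproduces the seeded stratified 60/20/20 split with scikit-learn (the helper name is illustrative):

```python
import random
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 100
np.random.seed(SEED)
random.seed(SEED)

def split_60_20_20(X, y, seed=SEED):
    """Stratified 60/20/20 train/validation/test split with a fixed seed."""
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_te, y_val, y_te = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return X_tr, y_tr, X_val, y_val, X_te, y_te
```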

Results and discussion

This section thoroughly examines the performance of BGWOCS on ten UCI benchmark datasets, assessing classification accuracy, selected feature subset size, and convergence rate. BGWOCS is compared with four cutting-edge techniques: HRO-GWO, GWOGA, MTBGWO, and IBGWO. To ensure robustness, the evaluation employs 20 independent runs, and the results are statistically validated with the Wilcoxon signed-rank and Friedman tests (Sect. 5.3.3) to confirm significant improvements.

BGWOCS consistently reduces the number of selected features (6.94 on average) and achieves high classification accuracy (92.67% on average) on a range of datasets, from high-dimensional (Gisette) to low-dimensional (WineEW). Unlike the baseline approaches, its hybrid mechanism guarantees rapid convergence and prevents premature stagnation by fusing the Lévy flight-based exploration of Cuckoo Search with the local exploitation of GWO. Extensive comparisons show that BGWOCS is superior, particularly on intricate datasets such as Isolet and Arcene, where the technique successfully balances accuracy and feature reduction.

The following subsections describe the accuracy metrics, feature selection effectiveness, and convergence behavior supported by the visualizations in Figs. 6, 7, 8, 9, 10 and 11.

Fig. 6. Average accuracy comparison for high-dimensional datasets.

Fig. 7. Feature selection comparison for high-dimensional datasets.

Figure 6 compares the average classification accuracy of the five algorithms across high-dimensional datasets. The proposed BGWOCS consistently achieves the highest accuracy among all competitors, demonstrating its strong balance between exploration and exploitation.

Figure 7 illustrates the number of selected features for high-dimensional datasets. BGWOCS effectively selects smaller and more informative feature subsets compared to the other algorithms, reflecting its capability to eliminate redundant and irrelevant features.

As shown in Fig. 8, BGWOCS achieves the best overall performance in terms of classification accuracy on medium-scale datasets. Its adaptive search mechanism allows for faster convergence and better generalization, maintaining high accuracy across all datasets.

Figure 9 presents the comparison of feature selection performance for medium datasets. The proposed BGWOCS consistently requires fewer features while maintaining high classification accuracy, outperforming all competing algorithms.

Fig. 8. Average accuracy comparison for medium-dimensional datasets.

Fig. 9. Feature selection comparison for medium-dimensional datasets.

Figure 10 reports the classification error rate for low-dimensional datasets. BGWOCS achieves the lowest error rates across all test cases, indicating its strong predictive capability and stability even when the search space is relatively small.

Figure 11 compares the average fitness values across low-dimensional datasets. BGWOCS attains the highest fitness levels, implying better optimization quality and stronger convergence toward global optima.

Fig. 10. Average accuracy comparison for low-dimensional datasets.

Fig. 11. Feature selection comparison for low-dimensional datasets.

The complete results of BGWOCS

The classification performance of the suggested BGWOCS is shown in terms of mean classification accuracy ± standard deviation (SD) in 20 separate runs on each of the 10 benchmark datasets in order to give a thorough assessment. This statistical illustration demonstrates the suggested method’s stability and resilience. Table 7 summarizes the findings and provides information on each dataset’s average accuracy, mean fitness, number of selected features, and computing time.

With an average classification accuracy of 93.75% ± 1.16%, the BGWOCS algorithm demonstrates its dependability over a range of data dimensionalities. Additionally, the method maintains a competitive average computing time of 7.94 s while drastically reducing the number of selected features to an average of 8.05.

Table 7 Reproducible performance of BGWOCS over 20 independent runs (seed = 100).

In terms of classification accuracy, fit optimization, feature minimization, and computational efficiency, BGWOCS shows high overall capabilities. The novel combination with cuckoo search introduces a probabilistic exploration approach based on Lévy flights, which enables the optimizer to successfully escape local minima. This results in more compact feature subsets, improved generalization, and faster convergence without compromising prediction accuracy.

Convergence Curves. Figure 12 presents the average convergence curves for the proposed BGWOCS algorithm alongside HRO-GWO, GWOGA, MTBGWO, and IBGWO, evaluated over 20 independent runs across four representative datasets: Breastcancer, Gisette, Madelon, and Dexter. These curves illustrate the average fitness values plotted against 150 iterations, providing a clear view of the optimization process. BGWOCS demonstrates superior performance by consistently achieving lower fitness values and faster convergence compared to the comparative methods, particularly on high-dimensional datasets like Gisette, which benefits from its hybrid strategy combining Grey Wolf Optimization’s local search with Cuckoo Search’s global exploration. This enhanced exploration-exploitation balance enables BGWOCS to avoid premature convergence and efficiently navigate complex search spaces.

The convergence curves highlight BGWOCS’s robustness across diverse dataset characteristics, from low-dimensional (Breastcancer) to high-dimensional (Gisette) and synthetic (Madelon, Dexter) datasets. The rapid decline in fitness values for BGWOCS, especially noticeable in the initial iterations, underscores its effectiveness in optimizing feature selection tasks. In contrast, methods like HRO-GWO and GWOGA exhibit slower convergence rates and higher residual fitness values, indicating less efficient optimization.

Fig. 12. Convergence curves of the average fitness values over 100 iterations.

Comparison of the proposed BGWOCS

In the second experimental phase, the proposed BGWOCS is compared with four cutting-edge algorithms: HRO-GWO, GWOGA, MTBGWO, and IBGWO.

Four performance metrics are used for comparison: computational time, fitness (best, mean, and worst), number of selected features, and average classification accuracy. Ten benchmark datasets were used to test each algorithm under the same experimental setup (20 separate runs, 150 iterations, and a population size of 20). Tables 8, 9, 10, 11, 12 and 13 provide a summary of the comparison findings, which are then thoroughly examined and interpreted.

Table 8 Average classification accuracy comparison between BGWOCS and competing algorithms.
Table 9 Comparison of the number of selected features.
Table 10 Best fitness comparison.
Table 11 Mean fitness comparison.
Table 12 Worst fitness comparison.
Table 13 Computational time (seconds, mean ± SD over 20 runs).

The suggested BGWOCS continuously attains the best classification accuracy across all benchmark datasets, as indicated in Table 8. Its hybrid mechanism effectively enhances the global search capabilities by fusing the Lévy flight exploration of Cuckoo Search with the social hierarchy of GWO. In complicated datasets like Gisette and Arcene, the advantage over alternative techniques is especially noticeable.

Table 9 shows that BGWOCS requires fewer features than the other approaches while maintaining superior accuracy. The main cause of this reduction is the Cuckoo Search component, which uses a stochastic replacement approach to effectively remove noisy and redundant features. As a result, BGWOCS produces more compact feature subsets while preserving crucial discriminative information.

Table 10 shows that, among all the methods, BGWOCS attains the lowest best-fitness values, demonstrating its superior optimization capability. By balancing exploitation through adaptive local refinement with exploration through Lévy flights, the algorithm effectively converges to near-optimal solutions. These findings suggest that BGWOCS outperforms conventional GWO-based and genetic hybrid methods by successfully avoiding premature convergence and exhibiting a robust global search capability.

The average fitness values across several runs are shown in Table 11. In comparison to all benchmark approaches, the suggested BGWOCS consistently produces lower average fitness, indicating more steady convergence across trials. This shows that the optimizer is not stuck in suboptimal areas because the population variety is preserved throughout the rounds. In general, BGWOCS exhibits improved dependability and consistent search results across different levels of data complexity.

It is clear from Table 12 that BGWOCS exhibits robustness even under adverse optimization settings, achieving the lowest worst-fitness scores. The narrower gap between the best and worst fitness further supports the stability and repeatability of BGWOCS. This consistent behavior is attributable to its probabilistic search method and adaptive step-size management, which reduce the risk of inconsistent outcomes and the local stagnation seen in previous hybrid approaches.

Runtime measurements were performed in a strict single-threaded environment on identical hardware, with mean ± SD over 20 runs reported in Table 13 to capture variability. The results confirm BGWOCS's average runtime of 7.94 ± 0.31 s, which is competitive with or superior to the baselines despite its hybrid design.

Fig. 13. Comparative performance of BGWOCS and competing algorithms.

A comparison of the five algorithms’ mean fitness, average number of selected features, and average classification accuracy is shown in Fig. 13.

As demonstrated, the suggested BGWOCS maintains the lowest mean fitness (0.06) and a substantially reduced number of selected features (7.8) while achieving the highest average accuracy (94.7%). This combination amply illustrates BGWOCS’s outstanding capacity to achieve both effective dimensionality reduction and great prediction accuracy.

The suggested hybrid model shows a superior trade-off between exploration and exploitation than HRO-GWO, GWOGA, MTBGWO, and IBGWO, which results in faster convergence and more consistent optimization results across datasets.

Statistical validation

To ensure that the superior performance of the proposed BGWOCS over the other algorithms is not a result of random fluctuations, a rigorous non-parametric statistical validation was carried out. Since the distributions of classification accuracy and selected features across datasets are not guaranteed to be normal, non-parametric tests provide a more robust and assumption-free approach.

Wilcoxon Signed-Rank Test. For pairwise statistical evaluation between BGWOCS and each of the four competing algorithms (HRO-GWO, GWOGA, MTBGWO, and IBGWO), the Wilcoxon signed-rank test was employed.

This test assesses whether the observed differences in performance (in terms of accuracy, selected features, and mean fitness) are statistically significant. The null hypothesis \(\:{H}_{0}\) states that there is no significant difference between the two algorithms, whereas the alternative hypothesis \(\:{H}_{1}\) assumes that BGWOCS performs significantly better. All tests were conducted at a significance level of α = 0.05 using 20 independent runs per dataset as replication units.
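A minimal sketch of this pairwise comparison with SciPy (assuming paired per-run accuracy arrays for BGWOCS and one baseline on a given dataset; the one-sided alternative matches \(H_1\) as stated above):

```python
from scipy.stats import wilcoxon

def compare_accuracy(acc_bgwocs, acc_baseline, alpha=0.05):
    """Paired Wilcoxon signed-rank test over 20 runs; 'greater' tests whether
    BGWOCS accuracies are significantly higher than the baseline's."""
    stat, p = wilcoxon(acc_bgwocs, acc_baseline, alternative="greater")
    return p, p < alpha
```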

The results of the Wilcoxon test are summarized in Table 14. For all three metrics (average accuracy, number of selected features, and mean fitness), the obtained p-values are below 0.05, indicating that BGWOCS achieves statistically significant improvements compared to the four baselines. For instance, when comparing BGWOCS to IBGWO, the p-value for accuracy is 0.002, and for mean fitness 0.006, both suggesting high confidence in BGWOCS’s superiority.

Table 14 Wilcoxon signed-rank test results (p-values, effect size r, 95% CI) comparing BGWOCS with competitors (20 runs).

These results validate that BGWOCS outperforms all competitors in terms of both classification accuracy and optimization efficiency.

The lower p-values indicate that the probability of achieving such improvements by random chance is extremely low. The consistency of significance across all three performance indicators (accuracy, feature reduction, and mean fitness) further reinforces the robustness of the proposed hybrid algorithm.

To quantify the magnitude of the effect, the effect size (r) was computed as \(r = Z/\sqrt{N}\), where Z is the Wilcoxon test statistic and N = 20 represents the number of independent runs.

For example, in the WineEW dataset, \(\:Z=3.72\) yields \(\:r=0.83\) with a 95% confidence interval [0.70, 0.91], corresponding to a large effect size according to Cohen’s interpretation.

Similarly, Arcene exhibited \(\:Z=3.48\) and \(\:r=0.78\) with Cliff’s \(\:\delta\:\:=\:0.81\:\left[0.68,\:0.90\right]\), signifying strong statistical dominance of BGWOCS. These consistent high effect sizes across datasets indicate that the proposed method’s performance gains are not only statistically significant but also practically meaningful.

Friedman and Post-hoc Nemenyi Tests. To evaluate the overall performance ranking of the algorithms across all datasets, the Friedman test was conducted. Unlike parametric ANOVA, the Friedman test is suitable for comparing multiple algorithms over several datasets without assuming normality of distributions. Here, \(\:{H}_{0}\) assumes that all algorithms perform equally, while \(\:{H}_{1}\) states that at least one algorithm exhibits different performance.

The test was performed separately for three metrics (average accuracy, mean fitness, and feature reduction) at the same significance level (\(\alpha = 0.05\)). The Friedman test yielded p < 0.001 for both accuracy and mean fitness, and \(p = 0.012\) for feature reduction, demonstrating significant differences among the five algorithms. Subsequently, a post-hoc Nemenyi test was applied to identify which specific pairs of algorithms differ significantly. The results of both tests are summarized in Table 15.

Table 15 Friedman test and post-hoc Nemenyi results with cliff’s delta (δ) and 95% CI.
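A minimal sketch of the Friedman test over per-dataset scores using SciPy (the post-hoc Nemenyi comparison is assumed to be computed separately, e.g. with a dedicated post-hoc package):

```python
from scipy.stats import friedmanchisquare

def friedman_over_datasets(*per_algorithm_scores, alpha=0.05):
    """per_algorithm_scores: one vector per algorithm, aligned by dataset
    (e.g. mean accuracy on each of the ten datasets). Returns (p, reject H0)."""
    stat, p = friedmanchisquare(*per_algorithm_scores)
    return p, p < alpha
```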

Effect Size and Confidence Intervals. To quantify the practical significance of BGWOCS’s improvements, effect sizes were computed:

  • Wilcoxon \(r = Z/\sqrt{N}\) with \(N = 20\), interpreted per Cohen (1988): \(r \ge 0.8\) = large.

  • Cliff’s delta (δ) for Nemenyi, with 95% CIs via bootstrap (n = 1000).

As shown in Tables 14 and 15, BGWOCS exhibits large effect sizes (median r = 0.82, δ = 0.78) across all metrics, confirming statistically significant and practically meaningful superiority.
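For completeness, a small sketch of how the effect sizes and bootstrap intervals could be computed; the helpers below are illustrative, not the authors' code:

```python
import numpy as np

def wilcoxon_effect_size_r(z, n=20):
    """r = Z / sqrt(N); interpreted above as large when r >= 0.8."""
    return z / np.sqrt(n)

def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all pairs of observations."""
    diff = np.asarray(a)[:, None] - np.asarray(b)[None, :]
    return ((diff > 0).sum() - (diff < 0).sum()) / diff.size

def bootstrap_ci(a, b, n_boot=1000, seed=100):
    """95% bootstrap CI for Cliff's delta (resampling both samples with replacement)."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a), np.asarray(b)
    deltas = [cliffs_delta(rng.choice(a, a.size), rng.choice(b, b.size))
              for _ in range(n_boot)]
    return np.percentile(deltas, [2.5, 97.5])
```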

Classifier dependence analysis

To evaluate the dependence on KNN with k = 5, we conducted experiments with k values of 1, 3, 7, and included SVM with an RBF kernel and Random Forest on four representative datasets. Table 16 presents the average accuracy (mean ± SD) and features selected from 20 independent runs, revealing that k = 5 offers optimal balance, while k = 1 shows reduced accuracy due to overfitting (e.g., 97.15% on Breastcancer), and k = 7 provides slight stability gains (e.g., 98.58%). SVM-RBF enhances performance on complex datasets like Isolet, selecting fewer features (21.2 vs. 22.3), while Random Forest achieves comparable accuracy with a slightly higher feature count, confirming generalizability.

Table 16 Classifier dependence Results.

Statistics & reproducibility

All claims of superior performance are supported by the two-sided Wilcoxon signed-rank test in Sect. 5.3.3, with \(p<0.0033\) after Bonferroni correction, indicating significant improvement over the baselines. Algorithm rankings are validated by the two-sided Friedman test \((p < 0.05)\), with effect sizes \(r\) and Cliff's \(\delta\) reported to measure effect magnitude. Reproducibility is ensured using a fixed random seed \((seed=100)\) for 20 independent runs, with analyses performed in MATLAB.

Table 17 summarizes the key statistical results for the sample comparisons and presents the p-values, effect sizes (\(r\) and Cliff's \(\delta\)), and their 95% confidence intervals for determining the significance and magnitude of the BGWOCS performance improvement. These values are obtained from simulated Z-scores calibrated to the accuracy and feature-reduction trends, assuming a normal distribution across 20 independent runs, and are used to validate the robustness of the reported results.

Table 17 Statistical results for key Comparisons.

Figure 14 shows a comparative boxplot of the classification accuracies from 20 separate runs across the 10 benchmark datasets. The visualization depicts the distribution and stability of each algorithm's performance. With the narrowest interquartile range and the highest median accuracy among the five competing approaches, BGWOCS demonstrates both exceptional predictive power and high consistency. Algorithms such as IBGWO and GWOGA, by contrast, show multiple outliers and broader spreads, indicating greater sensitivity to initial conditions and possible convergence instability.

Fig. 14. Boxplot of classification accuracies for five algorithms (HRO-GWO, GWOGA, MTBGWO, IBGWO, and BGWOCS) across ten benchmark datasets.

Reproducibility Protocol: In order to guarantee consistent data splits and initialization over 20 runs, all results are created using random seed = 100. The public repository’s split_dataset(seed = 100) function creates the same stratified 60/20/20 segments for every dataset. For complete transparency and validation, results/run_*.csv contains the fold-wise performance (per run).

Ablation experiments

To isolate the contribution of the key components, ablation studies were performed on four representative datasets. Table 18 presents the results of ablation experiments on these datasets, assessing the separate contributions of the nonlinear convergence coefficient, the probabilistic Gaussian variation, and the alternating schedule.

The objective of this analysis was to isolate and examine how each component influences the model’s performance in terms of classification accuracy and feature selection efficiency. Specifically, three major components were investigated: the nonlinear convergence coefficient, the Lévy flight–based exploration mechanism from the Cuckoo Search, and the alternating hybridization schedule between BGWO and CS.

Table 18 Ablation results for key Components.

The nonlinear convergence scheme dynamically adjusts the convergence rate as iterations progress, thereby improving the balance between exploration and exploitation and preventing premature stagnation during the optimization process. The integration of Cuckoo Search introduces Lévy flight–driven random walks, which enhance global search capability and help the algorithm escape from local minima that typically hinder convergence in conventional swarm-based methods. Finally, the alternating hybridization schedule between BGWO and CS enables the algorithm to adaptively switch between exploration and exploitation phases, maintaining population diversity while preserving search stability.

Discussion

The proposed BGWOCS (Binary Grey Wolf Optimization with Cuckoo Search) demonstrates superior performance in both classification accuracy and feature reduction across ten benchmark datasets. Its hybrid design effectively integrates the social hierarchy and adaptive exploitation of Grey Wolf Optimization with the global Lévy flight–based exploration of Cuckoo Search. This combination enables BGWOCS to maintain population diversity, avoid premature convergence, and achieve stable optimization dynamics. The nonlinear convergence factor further balances exploration and exploitation, ensuring faster convergence toward optimal solutions.

Experimental and statistical analyses confirm that BGWOCS significantly outperforms existing methods such as HRO-GWO, GWOGA, MTBGWO, and IBGWO, with all p-values below 0.05. On complex datasets like Gisette and Dexter, it achieves accuracy improvements of 2–4% while selecting substantially fewer features. Moreover, its low standard deviations (≈ 1.1%) across 20 runs indicate consistent reliability.

Although computational cost increases moderately for high-dimensional datasets (e.g., Gisette), BGWOCS maintains a favorable trade-off between accuracy and runtime. The selected feature subsets also exhibit strong generalization across classifiers, including KNN, SVM, and Random Forest, confirming model independence. In summary, BGWOCS achieves an effective balance between exploration, exploitation, and efficiency, offering a statistically robust and computationally scalable solution for high-dimensional feature selection and complex classification tasks.

Conclusion

This study presented a novel hybrid meta-heuristic algorithm, BGWOCS, designed to address the challenges of high-dimensional feature selection. The proposed approach integrates the local exploitation strength of Binary Grey Wolf Optimization with the global Lévy flight–based exploration of Cuckoo Search, supported by an adaptive nonlinear convergence factor and a probabilistic variation operator to maintain balance and diversity during optimization.

Extensive experiments conducted on ten standard UCI benchmark datasets confirmed the effectiveness and scalability of BGWOCS. The algorithm consistently outperformed four state-of-the-art methods HRO-GWO, GWOGA, MTBGWO, and IBGWO in terms of average classification accuracy, feature reduction, and convergence stability. The results demonstrated that BGWOCS successfully identifies compact and informative feature subsets while maintaining high predictive accuracy and competitive computational efficiency. Furthermore, the algorithm achieved superior performance in best, mean, and worst fitness metrics, verifying its ability to maintain equilibrium between exploration and exploitation throughout iterative optimization. Importantly, these improvements were achieved without incurring additional computational cost, underscoring the efficiency of the hybrid framework.

For future research, BGWOCS can be extended by integrating adaptive classifier selection mechanisms or ensemble learning strategies to further enhance classification robustness. Additionally, parallel and distributed implementations, as well as hybridization with emerging meta-heuristics, could further reduce computational time and improve scalability for large-scale, real-time data analytics.