Abstract
Feature selection (FS) is a significant dimensionality reduction technique that can effectively remove redundant features. Metaheuristic algorithms have been widely employed in FS and have obtained satisfactory performance; among them, the grey wolf optimizer (GWO) has received widespread attention. However, GWO and its variants suffer from limited adaptability, poor diversity, and low accuracy when faced with high-dimensional data. The hybrid rice optimization (HRO) algorithm is an emerging metaheuristic derived from the heterosis and hybrid breeding mechanism found in nature, and it possesses a robust capacity to identify and converge towards optimal solutions. Therefore, this paper proposes a novel FS approach based on a multi-strategy collaborative GWO combined with the HRO algorithm (HRO-GWO). The HRO-GWO algorithm is enhanced by four innovative strategies: a dynamical regulation strategy and three search strategies. First, to improve the adaptability of GWO, the dynamical regulation strategy is devised for parameter optimization. Then, a multi-strategy co-evolution model inspired by HRO is designed, which utilizes neighborhood search, dual-crossover, and selfing techniques to bolster population diversity. Finally, the study develops a hybrid filter-wrapper framework incorporating the chi-square test and the HRO-GWO algorithm to efficiently select pertinent and informative feature subsets, enhancing classification performance while conserving time. The performance of HRO-GWO has been rigorously assessed on benchmark functions, and the effectiveness of the proposed framework has been evaluated on small-sample high-dimensional biomedical datasets. Our experimental findings demonstrate that the approach based on HRO-GWO outperforms state-of-the-art methods.
Introduction
In the big data era, the feature dimensions of collected data have increased exponentially, from dozens to tens of thousands of dimensions, which has rapidly increased the difficulty of data mining tasks1,2. Therefore, how to efficiently extract valuable information from data is a popular research topic in data mining and machine learning3,4.
Feature selection (FS) stands as a pivotal and extensively employed technique for dimensionality reduction that can obtain effective information from big data5,6. It selects feature subsets that encapsulate the most pertinent features while retaining the inherent physical significance of the original data7. FS methods can be classified into three categories according to the use of class labels: unsupervised, semi-supervised, and supervised8. Unsupervised methods can identify a subset of features without relying on class labels, but they may exhibit instability attributed to the absence of prior information9. Semi-supervised methods can handle data that contain both labeled and unlabeled samples, but they rely heavily on the accuracy of the labeled data. In comparison, supervised methods tend to achieve superior FS results when abundant labeled data are available, benefiting from the inclusion of class labels.
Supervised FS methods involve three search strategies: exhaustive search, sequential search10, and random search. The first two are less efficient. Random search, however, introduces randomness into the search process, thereby yielding comparatively superior results with good efficiency. In recent years, several metaheuristic algorithms have been extensively used in FS owing to their powerful search capabilities in large-scale spaces11,12, such as the Genetic Algorithm (GA)13,14, Aquila Optimizer (AO)15, Ant Colony Optimization (ACO) algorithm16, Sine Cosine Algorithm (SCA)17,18, Whale Optimization Algorithm (WOA)19, and Particle Swarm Optimization (PSO) algorithm20,21. Therefore, this paper selects a metaheuristic algorithm for FS on high-dimensional data and verifies the feasibility and superiority of the proposed method through experiments.
In 2014, the Grey Wolf Optimizer (GWO) was proposed22, a population-based metaheuristic algorithm that mimics the social hierarchy and group hunting behavior of grey wolves. Owing to its inherent simplicity, few control parameters, and strong optimization performance, GWO has found extensive applications in engineering problems23, anomaly detection24, band selection25, path planning26,27, FS28,29, and other fields30,31,32. M. Mafarja et al.33 identify the primary limitation of the convergence factor strategy: it shifts the algorithm from the exploration phase to the exploitation phase irrespective of the outcomes achieved thus far. To address this issue, they proposed a convergence control parameter (cp) that regulates the shift from exploration in the initial stages of the optimization process to exploitation in the subsequent stages. Nadimi-Shahraki et al.23 enhanced the hunting search strategy of wolves through a new search strategy named dimension learning-based hunting (DLH), which aims to address weaknesses including a deficiency in population diversity. J. Pirgazi et al.34 underscored the significance of feature relevance in big data analytics by proposing a gene selection method for high-dimensional datasets; the method is based on hybrid filter-wrapper metaheuristics designed to facilitate effective FS in large-scale genetic datasets. In summary, the GWO algorithm has three main limitations in the process of FS for small-sample high-dimensional data:
1. Limited adaptability: The adaptability and the balance between exploration and exploitation of GWO are limited when the convergence factor changes linearly33.
2. Poor diversity: The population evolution mode of GWO is singular, so it may easily become trapped in a local optimum23.
3. Low accuracy: Directly using the GWO algorithm for FS may lead to low accuracy because it ignores the relationship between feature information and the category34.
Therefore, the paper studies how to improve the GWO algorithm to address its three limitations in the field of high-dimensional FS. To alleviate these constraints, the concept of hybridizing with complementary metaheuristics has emerged as a promising approach.
The Hybrid Rice Optimization (HRO) algorithm35, which takes inspiration from heterosis theory, is a newly developed metaheuristic algorithm. Its good performance in solving the 0-1 knapsack problem36, the band selection problem37, computer-aided diagnosis38, and intrusion detection39 demonstrates its notable search efficiency and robust global search capabilities. Furthermore, the concept of hybridization combined with metaheuristic algorithms has been successfully applied to FS40.
To overcome the deficiencies of GWO for FS in high-dimensional data, the paper proposes a novel framework based on multi-strategy collaborative GWO combined with HRO (HRO-GWO) utilizing four innovative strategies. Initially, within the original GWO algorithm and its variations, the convergence factor decreases linearly as the number of iterations increases, resulting in poor adaptability. Thus, the paper adjusts the convergence factor through a dynamic regulation strategy to facilitate exploration of the search space at higher iteration counts. Then, since all grey wolves in the pack adjust their trajectory based on the same position update equation in the original GWO, they eventually gravitate towards the same search direction. To address this issue of poor diversity, a multi-strategy co-evolution model is introduced to bolster population diversity; it allows each grey wolf to learn from multiple evolutionary strategies and adapt its search behavior accordingly. The evaluation of HRO-GWO involves twelve CEC 2005 benchmark functions and thirty CEC 2017 benchmark functions. A comparison is presented with eleven metaheuristic algorithms, including PSO41, SCA42, WOA43, HRO35, GWO22, the Multi-Strategy GWO algorithm (MSGWO1, 2020)44, Improved GWO (I-GWO)23, Multi-Strategy GWO (MSGWO2, 2023)33, the Hippopotamus Optimization (HO) algorithm45, and the Ivy algorithm (IVYA)46, to demonstrate the superiority of HRO-GWO. Details on the algorithms used for comparison are shown in Table 1.

Furthermore, hybrid methods are widely utilized, given the presence of many irrelevant and redundant features in high-dimensional data34,47. FS methods are primarily grouped into three categories: filter, wrapper, and embedded48. Filter approaches evaluate features individually by measuring the relationship between features and the class labels49, which may compensate for the shortcomings of GWO in high-dimensional FS tasks. Thus, to improve accuracy, chi-square filtering is used to screen the features while preserving the correlation between features and categories.
Twelve small-sample high-dimensional biomedical datasets are used as the experimental data. The Naive Bayes (NB) algorithm and the K-Nearest Neighbor (KNN) algorithm are selected as classifiers to test the robustness of the proposed framework. Furthermore, additional algorithms are incorporated into the comparison experiments, including the Elite Genetic Algorithm (EGA)50 and three methods based on variants of HRO: Modified HRO (MHRO)37 and improved binary ACO melded with HRO in the relay model (R-IBACO) and the collaborative model (C-IBACO)40, in addition to the eight algorithms included in the benchmark function test. The experimental outcomes demonstrate that the high-dimensional FS framework utilizing the HRO-GWO algorithm proposed in the paper delivers satisfactory classification performance across diverse feature subset sizes.
To alleviate the three limitations of GWO in high-dimensional FS, the paper targets three solutions. The main contributions are as follows.
1. To control the exploration and exploitation of the entire algorithm, a dynamic regulation strategy is developed to adjust the convergence factor of the GWO algorithm, replacing the linear change. It mitigates the limited adaptability of the original GWO. Accuracy can be improved by up to about 1.2\(\%\).
2. Aiming at the shortcoming that a single algorithm may easily fall into a local optimum, a multi-strategy collaborative approach based on neighborhood search, dual-crossover, and selfing techniques is designed. It ensures that the GWO algorithm maintains strong exploitation ability while preserving diversity, which alleviates its poor diversity problem. Accuracy can be improved by up to nearly 2.5\(\%\).
3. A hybrid filter-wrapper framework combining HRO-GWO and the chi-square test is proposed, which can save space and time resources while improving accuracy. It alleviates the low accuracy of the original GWO in FS tasks. Accuracy can be improved by up to about 15.4\(\%\).
The remainder of the paper is organized as follows: Section “Related work” analyses recent developments in metaheuristic algorithms and FS methods. Section “Preliminaries” describes the preliminary concepts and theories. The high-dimensional FS framework of the HRO-GWO algorithm is introduced in Section “The proposed method”. The experimental setup, results, and analysis are described in Section “Experimental results and discussions”. Section “Conclusion and future work” presents the conclusions and future research directions.
Related work
A substantial body of research has been conducted on the topic of FS. Metaheuristic algorithms, rooted in the emulation of natural evolutionary principles, have the advantages of simple concepts, high efficiency, and easy implementation. In recent years, many scholars have opted to employ metaheuristic algorithms for addressing FS problems. This section reviews recent research on metaheuristic algorithms and various FS methods.
Metaheuristic algorithms
Zamani et al.51 proposed an Evolutionary Crow Search Algorithm (ECSA) to optimize the hyperparameters of artificial neural network for diagnosing chronic diseases. The improved algorithm effectively explored and exploited problem spaces by maintaining population diversity through the implementation of innovative strategies. Salgotra et al.52 developed a Multi-Hybrid Differential Evolution (MHDE) and applied it to four engineering design problems and for the weight minimization of three frame design problems. The incorporation of adaptive parameters, enhanced mutation, and other features had yielded favorable outcomes.
Feature selection method based on other approaches
Sun et al.53 developed two mechanisms to propose several revisions of Binary Monarch Butterfly Optimization (BMBO), improving classification efficiency for metaheuristic FS. Xie et al.54 created an enhanced multilayer binary firefly algorithm, achieving higher classification accuracy with fewer features. Scholars have extensively investigated metaheuristic algorithms such as GA, PSO, and GWO, demonstrating their efficacy in addressing FS challenges. However, the search space of these metaheuristic algorithms continues to grow with the problem size, resulting in low search efficiency, premature convergence, and trapping in local optima.
Jiang et al.55 introduced a Bayesian Robit regression approach with Hyper-LASSO priors (BayesHL) for FS in high-dimensional genomic data with grouping structure. It proved an effective tool in terms of predictive capability, sparsity, and the ability to uncover grouping structures. Nevertheless, the approach had not been applied to other models. A two-step hybrid FS method was developed by Moslemi et al.56; the conjunction of sparse subspace learning with nonnegative matrix factorization and GA produced a powerful effect. However, the generalization ability of the algorithm may be influenced by the sample size.
Feature selection method based on GWO
Wang et al.57 developed a role-oriented binary GWO. The effectiveness of the method was illustrated by the results of benchmark and FS tasks. Zhou et al.44 introduced MSGWO1, incorporating random guidance, local search, and subgroup cooperation strategies for FS. The study demonstrated its capability to efficiently identify the optimal feature combination and enhance the performance of the classification model. Despite the demonstrated effectiveness of GWO in addressing FS challenges, the efficacy of these algorithms diminishes as the problem size increases. GWO still has limitations in the process of FS for high-dimensional data.
Preliminaries
Grey wolf optimizer
The GWO algorithm22 is a metaheuristic algorithm proposed by simulating the predation process of wolves. Its inspiration comes from the social leadership and hunting behavior of grey wolves in nature. Within this social hierarchy, wolves are divided into four classes: \(\alpha\), \(\beta\), \(\delta\) and \(\omega\). In the GWO algorithm, the \(\alpha\), \(\beta\), and \(\delta\) wolves are taken as the three best solutions, and the remaining \(\omega\) wolves are directed towards promising areas to locate a global solution. The process of wolf hunting comprises three primary stages: encircling, hunting, and attacking prey.
Encircling prey Throughout the optimization process, when a group of grey wolves locates prey, they gradually encircle it. The behavior can be expressed by Eqs. (1 and 2),
where D denotes the current distance between the grey wolf and prey, and t is the current iteration. \({X_p^t}\) is the position of prey in the iterative process. \({X^t}\) is the position of grey wolf. A and C are coefficient vectors calculated by Eqs. (3 and 4),
where \(m_1\) and \(m_2\) are random vectors in the interval \(\left[ {0,1}\right]\), a is the convergence factor whose initial value is 2 and decreases linearly to 0 over the course of iterations expressed as Eq. (5),
where \(t_{max}\) indicates the maximum number of iterations.
Hunting prey The formula for guiding the position update of other wolves in the group, taking \(\alpha\), \(\beta\), and \(\delta\) wolves as the highest quality grey wolves in the space, is described as Eq. (6),
where,
where the positions of the three wolves, \(\alpha\), \(\beta\), and \(\delta\) in the search space are denoted by \(X_\alpha\), \(X_\beta\), and \(X_\delta\). \(D_\alpha\), \(D_\beta\), and \(D_\delta\) represent the distances between the current grey wolf and wolves \(\alpha\), \(\beta\), and \(\delta\) respectively. The position update formula for \(\omega\) wolves is described as Eq. (8),
Attacking prey After the hunting process, the grey wolf begins to attack its prey. The value of A ranges from -2 to 2 as a decreases linearly from 2 to 0 during the iterative process. When A is within the interval [-1, 1], the grey wolf attacks the prey and exploits the search space near it; otherwise, the wolf diverges from the prey to explore the search space.
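The encircling, hunting, and convergence-factor equations referenced above (Eqs. 1-8 and Eq. 5) can be sketched as follows. This is a minimal illustration of the canonical GWO update as described in the text; the function names and NumPy formulation are chosen here for illustration, not taken from the paper.

```python
import numpy as np

def linear_a(t, t_max):
    """Eq. (5): convergence factor decreasing linearly from 2 to 0."""
    return 2 * (1 - t / t_max)

def gwo_step(wolves, x_alpha, x_beta, x_delta, a):
    """One GWO position update (Eqs. 1-4, 6-8): each wolf moves toward
    the average of the positions suggested by the alpha, beta, and
    delta wolves."""
    new_wolves = np.empty_like(wolves)
    for i, x in enumerate(wolves):
        candidates = []
        for leader in (x_alpha, x_beta, x_delta):
            A = 2 * a * np.random.rand(len(x)) - a   # Eq. (3)
            C = 2 * np.random.rand(len(x))           # Eq. (4)
            D = np.abs(C * leader - x)               # Eq. (1)
            candidates.append(leader - A * D)        # Eqs. (2), (7)
        new_wolves[i] = np.mean(candidates, axis=0)  # Eq. (8)
    return new_wolves
```

With this formulation, |A| shrinks with a over the iterations, which is exactly the exploration-to-exploitation transition discussed in the attacking-prey stage.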
Hybrid rice optimization algorithm
The HRO algorithm35 is a metaheuristic algorithm inspired by heterosis theory. It comprises three phases: hybridization operations, selfing operations, and renewal operations.
In each iteration, the population of rice seeds is sorted by fitness from superior to inferior and divided into three subpopulations of equal size. The subpopulation with the highest fitness is chosen as the maintainer line, the lowest as the sterile line, and the remaining subpopulation forms the restorer line.
Hybridization It is employed to renew the rice seed genes in the sterile line. Two rice seed populations with large differences in traits are randomly chosen from the maintainer line and the sterile line to generate new individuals. If the newly generated rice seed exhibits superiority over the incumbent seed, it will supersede the current seed. The process of generating new individuals could be expressed by Eq. (9),
where \(r_1\) and \(r_2\) are random values in the interval \(\left[ {0,1}\right]\). \(X_{s,a}^d\) represents the \(d\textrm{th}\) genes of the randomly chosen rice seeds from the sterile line. \(X_{m,b}^d\) denotes the \(d\textrm{th}\) genes of the randomly selected rice seeds from the maintainer line.
Selfing In the restorer line, selfing guides rice seeds to move closer to the current optimal solution, and the individuals can be updated as Eq. (10),
where \(r_3\) is a random value in the interval \(\left[ {0,1}\right]\). \(X_{best}^d\) signifies the \(d\textrm{th}\) genes of the optimum rice seeds. \(X_{r,j}^d\) denotes the \(d\textrm{th}\) genes of the randomly chosen rice seeds from the restorer line. \(X_{r,i}^d\) indicates the \(d\textrm{th}\) genes of the current rice seeds.
Renewal When a rice seed in the restorer line has not been upgraded for \(SC_{max}\) successive times, where \(SC_{max}\) is a pre-set parameter, the seed is reset as shown in Eq. (11),
where the random value \(r_4\) is in the interval \(\left[ {0,1}\right]\). \(X_{max}^d\) and \(X_{min}^d\) indicate the maximum and minimum limits of the \(d\textrm{th}\) dimensional search space.
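The three HRO operators described above can be sketched as below. The bodies assume the commonly published forms of Eqs. (9)-(11) consistent with the variable definitions in the text; function and variable names are illustrative.

```python
import numpy as np

def hybridization(x_sterile, x_maintainer):
    """Eq. (9), as commonly published: gene-wise weighted mix of a
    sterile-line and a maintainer-line seed (r1, r2 drawn per gene)."""
    r1 = np.random.rand(len(x_sterile))
    r2 = np.random.rand(len(x_sterile))
    return (r1 * x_sterile + r2 * x_maintainer) / (r1 + r2)

def selfing(x_best, x_rand, x_cur):
    """Eq. (10): move a restorer-line seed toward the current best seed,
    guided by the difference with a randomly chosen restorer seed."""
    r3 = np.random.rand(len(x_cur))
    return r3 * (x_best - x_rand) + x_cur

def renewal(x_min, x_max):
    """Eq. (11): reset a stagnant seed uniformly inside the bounds."""
    r4 = np.random.rand(len(x_min))
    return x_min + r4 * (x_max - x_min)
```

Hybridization produces a convex combination of the two parents, so new sterile-line genes always stay between the parents' gene values, while renewal reintroduces fully random individuals to escape stagnation.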
Dimension learning-based hunting search strategy
Nadimi-Shahraki et al.23 introduced the Dimension Learning-based Hunting (DLH) search strategy. It takes the distance between the current position of each wolf and its updated position as a radius and uses the current position as the center to construct a neighborhood in which to search for neighbors. The search radius \(R_i^t\) is calculated using Eq. (12).
Neighbor set \(N_i^t\) of grey wolf \(X_i\) at iteration t is expressed as Eq. (13),
where \(D_i\left( X_i^t,X_j^t\right)\) denotes the Euclidean distance between grey wolf \(X_i^t\) and grey wolf \(X_j^t\), and \(j\ne i\). The formula for creating a new wolf is expressed as Eq. (14),
where \(X_{idlh}^t\) is the new individual. \(N_{i,r}^t\) represents an individual randomly selected from the neighbor set. \(X_r^t\) denotes a randomly chosen individual in the current population.
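The DLH construction in Eqs. (12)-(14) can be sketched as follows; the function name and the fallback when the neighborhood is empty are choices made here for illustration, not specified in the text.

```python
import numpy as np

def dlh_candidate(X, i, x_gwo_next):
    """DLH search (Eqs. 12-14): the radius is wolf i's distance to its
    GWO-updated position (Eq. 12); wolves within that radius form the
    neighbor set (Eq. 13); the candidate learns from a random neighbor
    and a random population member (Eq. 14)."""
    radius = np.linalg.norm(X[i] - x_gwo_next)                 # Eq. (12)
    dists = np.linalg.norm(X - X[i], axis=1)
    neighbors = [j for j in range(len(X)) if dists[j] <= radius and j != i]
    if not neighbors:          # assumed fallback: use the whole pack
        neighbors = [j for j in range(len(X)) if j != i]
    x_n = X[np.random.choice(neighbors)]                       # N_{i,r}^t
    x_r = X[np.random.randint(len(X))]                         # X_r^t
    return X[i] + np.random.rand(X.shape[1]) * (x_n - x_r)     # Eq. (14)
```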
Chi-square feature selection
The chi-square test uses a normalized statistic to determine whether the difference between observed and expected frequencies is statistically significant. Chi-square FS chooses features based on their \(\chi ^2\) score in relation to the target variable. The \(\chi ^2\) statistic is defined as Eq. (15),
where N denotes the total number of samples. A represents the number of samples containing the feature t and labeled as category c. B is the number of samples with feature t but not labeled as category c. C represents the number of samples without feature t and belongs to category c. D indicates the number of samples that do not include t and do not belong to c.
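Given the counts A, B, C, D defined above, Eq. (15) corresponds to the standard chi-square statistic for a feature/class 2x2 contingency table; a minimal sketch (function name chosen here):

```python
def chi2_score(A, B, C, D):
    """Eq. (15): chi2 = N * (A*D - B*C)^2 / ((A+C)(B+D)(A+B)(C+D)).
    A: samples with feature t in class c; B: with t, not in c;
    C: without t, in c; D: without t, not in c."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - B * C) ** 2 / denom if denom else 0.0
```

A feature distributed independently of the class (A = B = C = D) scores 0, while a perfectly class-aligned feature attains the maximum score N, which is why ranking by this score performs the coarse relevance filtering used later in the framework.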
The proposed method
In this section, the overall framework of the proposed method is first outlined. Then the HRO-GWO algorithm, the dynamical regulation strategy, the multi-strategy co-evolution model, and the framework combining chi-square and HRO-GWO are explained in detail.
Model overview
Feature selection framework based on HRO-GWO algorithm.
In the study, a novel framework based on HRO-GWO is proposed with the aim of improving the performance of GWO for high-dimensional FS tasks. The proposed model can be divided into three distinct phases: data preprocessing, FS, and classification evaluation. The proposed hybrid filter-wrapper framework integrates chi-square and HRO-GWO into this process.
The architecture of the proposed framework is pictured in Fig. 1. The data initially undergo preprocessing before the chi-square filtering is applied. Then, chi-square is used to coarsely filter high-dimensional features to obtain a candidate feature subset. Subsequently, the proposed HRO-GWO algorithm is employed to select the most valuable features from the candidate feature subset according to the feedback results of the classifier, creating the final feature subset. Finally, the selected feature subset serves as input to the classifier to obtain the final evaluation result.
Feature selection based on the HRO-GWO algorithm
Aiming at the problems of poor adaptability and diversity of the GWO algorithm, the HRO-GWO algorithm introduces an adaptive adjustment strategy and three update strategies for individuals.
The HRO-GWO algorithm originally operates in continuous space, whereas the FS problem is a typical discrete problem. To bridge this gap, a conventional binary coding technique can be utilized to convert the solutions of metaheuristic algorithms designed for continuous spaces into discrete ones.
The HRO-GWO algorithm
The HRO-GWO algorithm has two key points. One is the dynamical regulation strategy for the convergence factor, which enhances the balance between exploration and exploitation of GWO. The other is a multi-strategy co-evolution model that incorporates three additional grey wolf position update strategies: neighborhood search, dual-crossover, and selfing. The model addresses the tendency of the algorithm to fall into local optima. The flowchart of the HRO-GWO algorithm is depicted in Fig. 2.
The flowchart of the HRO-GWO algorithm.
The pseudocode of HRO-GWO is shown in Algorithm 1. The HRO-GWO algorithm initiates by randomly generating an initial population within the predefined search space. At each iteration, the fitness function assesses the positions of wolves and identifies the top three with the best fitness value, designated as \(\alpha\), \(\beta\), and \(\delta\). Subsequently, the dynamical regulation strategy is developed to adjust the convergence factor a. Then, each wolf updates its position through the original update strategy and three novel update strategies. The iterative process continues until reaching the predetermined number of iterations (\(t_{max}\)), which serves as the stopping criterion.
Pseudocode for the HRO-GWO algorithm
Binary encoding rules and fitness function
To transform continuous data into discrete data, the paper utilizes a conversion function. The HRO-GWO algorithm converts the real value of each bit into 1 or 0: 1 indicates that the corresponding feature will be used for training, while 0 indicates that it will not. The formula is described as Eq. (16).
The FS method minimizes a fitness function that aims at enhancing classification accuracy while reducing redundant features. The fitness function is described as Eq. (17),
where fitness denotes the fitness value, error is the classification error rate. The \(n_s\) and \(n_c\) represent the number of selected feature subsets and the total number of features, respectively. The weighting factor \(\alpha\) is employed to balance the classification error rate and the number of selected features, and its range is [0,1]. For the paper, \(\alpha\) is set to 0.99.
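The binarization and fitness computation can be sketched as below. Eq. (17) follows directly from the definitions above; for Eq. (16) the paper's exact transfer function is not reproduced here, so a common sigmoid transfer is assumed and labeled as such.

```python
import numpy as np

def binarize(x):
    """Eq. (16), sketched with a common sigmoid transfer function
    (assumption; the paper's exact conversion may differ): a bit is 1
    when the sigmoid of the real value exceeds a uniform threshold."""
    return (1 / (1 + np.exp(-x)) > np.random.rand(len(x))).astype(int)

def fitness(error, n_selected, n_total, alpha=0.99):
    """Eq. (17): weighted sum of the classification error rate and the
    fraction of features kept; alpha = 0.99 as in the paper."""
    return alpha * error + (1 - alpha) * (n_selected / n_total)
```

With alpha = 0.99, classification error dominates the objective, and the subset-size term mainly breaks ties between subsets of equal accuracy in favor of fewer features.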
Dynamical regulation strategy for convergence factor
The GWO algorithm controls the global search or the local search by the value of coefficient A. As the value of A changes with the convergence factor a, the factor is essential in achieving a balance between exploration and exploitation in the algorithm.
The application of a fixed convergence factor schedule may result in a lack of adaptability: the transition from exploration to exploitation cannot be adjusted. Inspired by the literature58, the paper proposes a convergence factor adaptive strategy called the Dynamical Regulation Strategy (DRS). DRS utilizes Eq. (18) instead of the original linear function, enabling a non-linear decrease in values throughout the iteration and permitting the adjustment of both exploration and exploitation. Hence, this strategy can direct the algorithm to strengthen local search in the early stages of the iteration and to increase the probability of exploration towards the later stages, addressing the deficiency in adaptability inherent to the original algorithm.
where t is the current iteration, and \(t_{max}\) is the maximum iteration. \(\varepsilon\) and \(\omega\) are adjustment factors. This strategy allows the algorithm to tune the balance between exploration and exploitation throughout the iterative process.
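Since Eq. (18) itself is not reproduced in this excerpt, the sketch below uses one plausible non-linear decay with adjustment factors \(\varepsilon\) and \(\omega\); the exact expression, the function name, and the default factor values are assumptions for illustration only.

```python
def drs_a(t, t_max, eps=2.0, omega=0.5):
    """Dynamical Regulation Strategy for the convergence factor
    (stand-in for Eq. 18; the paper's exact form may differ).
    A non-linear decay from 2 to 0 whose shape is tuned by the
    adjustment factors eps and omega."""
    return 2 * (1 - (t / t_max) ** eps) ** omega
```

Compared with the linear schedule of Eq. (5), varying eps and omega changes how long the factor stays large (favoring exploration) before collapsing towards 0 (favoring exploitation).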
Multi-strategy co-evolution model
In the GWO algorithm and its variations, all grey wolves in the group change their trajectory based on the same position update equation and eventually gravitate towards the same search direction. Therefore, this section introduces the idea of multi-strategy co-evolution, allowing each grey wolf to adjust its search behavior from multiple perspectives. The Neighborhood Search Strategy enables the current wolf to learn from its good neighboring wolves, thereby conducting a local search. The Dual-Crossover Strategy randomizes the learning process from the three leading wolves at the outset of the iteration; this ensures that no choice is permanently excluded from consideration, avoiding the potential disadvantage of discarding wolves that have made superior choices but were not selected. Concurrently, randomly alternating between it and the Selfing Strategy circumvents the diminished convergence velocity that would result from an excess of randomness during the initial phase. The combination of these three strategies enhances the diversity of the population and mitigates the tendency to fall into local optima.
Neighborhood search strategy
The Neighborhood Search Strategy (NSS) is an optimization technique that begins with an initial solution and then searches for individual solutions within its neighborhood. The technique is often more effective than global search in discovering the current optimal solution in a vast search space.
The process of generating new individuals through neighborhood search.
In the GWO algorithm, \(\alpha\), \(\beta\), and \(\delta\) wolves guide the entire population forward, resulting in slow convergence of the GWO algorithm and an increased likelihood of getting trapped in a local optimum. To solve the problem, NSS inspired by the DLH search strategy is developed. The process of neighborhood search to generate new individuals is shown in Fig. 3. The updated position of the current wolf in iteration t is represented by \(X_{ins}^t\). The individual is generated by Eq. (19),
where,
and,
where \(X_i^t\) is the current position of the wolf in iteration t. \(N_{i,r}^t\) represents an individual randomly selected from the neighbor set. \(X_r^t\) represents a randomly selected individual in the current population.
Dual-crossover strategy
The inspiration for the Dual-Crossover Strategy (DCS) comes from hybridization operations of the HRO algorithm. It performs a hybridization operation on \(X_\alpha\), \(X_\beta\), and \(X_\delta\) while retaining the information of \(X_i\) to generate new individuals. In the GWO algorithm, a single evolutionary strategy causes the population to lose diversity prematurely. Therefore, DCS uses the crossover strategy combined with hybridization techniques to improve the way individuals are updated, as shown in Fig. 4. The individual for the first cross-update \(X_{cronew}\) is generated by Eq. (22),
The process of generating new individuals through dual-crossover.
where,
where A is calculated by Eq. (3), \(C_{R1}\) is set to 0.8 in the paper. \(X_1^d\) is the d dimension gene of individual \(X_1\). \(r_1\), \(r_2\), and \(r_3\) are random values in the interval \(\left[ {0,1}\right]\). \(X_{\alpha }^d\), \(X_{\beta }^d\), and \(X_{\delta }^d\) are the d dimension genes of individual \(X_{\alpha }\), \(X_{\beta }\), and \(X_{\delta }\).
The individual for the second cross-update \(X_{icross}\) is generated by Eq. (24),
where \(X_i^d\) denotes the d dimension gene of the current wolf \(X_i\). \(X_{cronew}^d\) is the d dimension gene of the individual for the first cross-update \(X_{cronew}\). \(C_{R2}\) is set to 0.3 in the paper.
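The two crossover steps of the DCS can be sketched as below. Since Eqs. (22)-(24) are not reproduced in this excerpt, the gene-selection rule of the first cross and the role of A are assumptions consistent with the surrounding description (leader genes mixed per dimension with crossover rate \(C_{R1}\), then a binomial crossover with the current wolf at rate \(C_{R2}\)); treat it as illustrative only.

```python
import numpy as np

def dual_crossover(x_alpha, x_beta, x_delta, x_cur, a, cr1=0.8, cr2=0.3):
    """Dual-Crossover Strategy, sketched (stand-in for Eqs. 22-24).
    First cross: each gene is drawn from a randomly chosen leader
    (alpha/beta/delta) and occasionally perturbed by A (assumed rule).
    Second cross: binomial crossover with the current wolf at rate cr2."""
    d = len(x_cur)
    leaders = np.stack([x_alpha, x_beta, x_delta])      # shape (3, d)
    pick = np.random.randint(0, 3, size=d)
    x_cronew = leaders[pick, np.arange(d)]              # first cross
    A = 2 * a * np.random.rand(d) - a                   # Eq. (3)
    perturb = np.random.rand(d) > cr1
    x_cronew = np.where(perturb, x_cronew - A * x_cronew, x_cronew)
    keep = np.random.rand(d) < cr2                      # second cross
    return np.where(keep, x_cronew, x_cur)
```

The second cross retains most genes of the current wolf (cr2 = 0.3), so DCS injects leader information gradually instead of overwriting the individual wholesale.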
Selfing strategy
In the original GWO algorithm, wolf positions are updated in line with the optimal solution, so GWO is easily trapped in a local optimum. Inspired by the selfing operation of the HRO algorithm, the paper presents the Selfing Strategy (SS) to enhance global search capability by combining genes among the current wolf, a randomly selected wolf, and the \(\alpha\) wolf. The individual is generated by Eq. (25),
where \(r_4\) is a random value in the interval \(\left[ {0,1}\right]\). \(X_\alpha ^d\) signifies the \(d\textrm{th}\) genes of the \(\alpha\) wolf. \(X_r^d\) denotes the \(d\textrm{th}\) genes of the randomly chosen wolf from the population. \(X_i^d\) indicates the \(d\textrm{th}\) genes of the current wolf.
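By analogy with the HRO selfing operation of Eq. (10) and the variable definitions above, Eq. (25) can be sketched as follows; the precise form is assumed, with the \(\alpha\) wolf playing the role of the best seed.

```python
import numpy as np

def selfing_strategy(x_alpha, x_rand, x_cur):
    """Selfing Strategy (assumed form of Eq. 25, mirroring HRO's
    Eq. 10): step from the current wolf along the difference between
    the alpha wolf and a randomly chosen wolf, scaled per gene by r4."""
    r4 = np.random.rand(len(x_cur))
    return r4 * (x_alpha - x_rand) + x_cur
```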
Feature selection framework based on chi-square and HRO-GWO
The application of metaheuristic algorithms in isolation renders high-dimensional FS less efficient, as an excessively expansive search space constrains their performance. Additionally, metaheuristic algorithms execute FS without considering the interrelationships between features and targets. The chi-square filter is an efficient method for removing the least relevant features. It is particularly well suited to assessing the relationship between categorical features and the target variable, allowing rapid identification of important features. In most cases, it is computationally efficient and well-suited to high-dimensional datasets. The incorporation of chi-square filtering enables HRO-GWO to conduct searches within the requisite feature dimensions, which not only reduces the time required but also enhances capability. Consequently, the proposed hybrid filter-wrapper framework combines chi-square and HRO-GWO.
The framework starts with a coarse filtering of the dataset using the chi-square technique: the \(\chi ^2\) score is calculated for each feature, and only the necessary and relevant features are selected, yielding a subset of candidate features. Moreover, the percentage of features retained by chi-square filtering can be chosen according to the specific situation.
After chi-square filtering, the candidate feature subset serves as input to HRO-GWO for further screening. The HRO-GWO algorithm selects the most valuable features from the candidate subset according to feedback from the classifier, iterating until it outputs a binary string representing the final feature subset.
Time complexity analysis
The time complexity of HRO-GWO depends on two principal phases: initialization, and individual evaluation and update. Therefore, the total time complexity of HRO-GWO is \(O(N + T \times (N\times (3\times D)))\), where T represents the number of iterations, N denotes the population size, and D symbolizes the dimension of individuals. The time complexity of GWO, HRO, their variants, and HRO-GWO at each stage is analyzed comprehensively in Table 2. Among them, the time complexity of GWO, I-GWO, and MSGWO2 arises from the same sources as that of HRO-GWO. In contrast, HRO, MSGWO1, MHRO, R-IBACO, and C-IBACO must additionally rank all fitness values after updating the individuals at each iteration. For C-IBACO, \(N_1\) and \(N_2\) signify the subpopulation sizes of HRO and IBACO. Furthermore, pheromone updates are necessary for R-IBACO and C-IBACO. Although the hybrid algorithm increases the time complexity, it markedly enhances the model’s performance, which is deemed acceptable.
Experimental results and discussions
This section evaluates the capabilities of the HRO-GWO algorithm on various test functions and the performance of the framework based on HRO-GWO on small-sample high-dimensional biomedical datasets.
The research in this paper focuses on remedying the deficiencies of the grey wolf optimizer in small-sample high-dimensional FS tasks by proposing the HRO-GWO algorithm. The CEC benchmark functions provide a standardized testing platform that enables researchers to evaluate various optimization algorithms under identical conditions. Accordingly, the performance of HRO-GWO is first assessed on the CEC benchmark functions. Subsequently, the efficacy of the hybrid FS framework based on HRO-GWO is evaluated using 12 biomedical datasets for dimensionality reduction on small-sample high-dimensional data. In the experiments, the framework based on HRO-GWO and chi-square is compared with other filtering methods and metaheuristic algorithms. The robustness of the method is verified by employing various classifiers and by varying the dimensionality reduction rate. Moreover, the validity of the results is verified using statistical evaluation. Finally, ablation studies are conducted on the proposed method to illustrate the impact of each strategy, substantiating that the combination of these strategies optimizes the performance of the approach.
Experiment on benchmark functions
Twelve test functions from the widely used CEC 2005 benchmark suite and twenty-nine from the CEC 2017 benchmark suite59 are applied to evaluate the performance of HRO-GWO. The results of HRO-GWO are compared with state-of-the-art metaheuristic algorithms: PSO41, SCA42, WOA43, HRO35, GWO22, MSGWO1 (2020)44, I-GWO23, MSGWO2 (2023)33, HO45, and IVYA46. As shown in Table 3, in all experiments the parameters of the comparative algorithms are set to the values recommended in their original studies.
CEC 2005 test functions include unimodal (F1-F3), ordinary multimodal (F4-F6), and fixed-dimensional multimodal functions (F7-F12). Six benchmark functions (F1-F6) are evaluated with different dimensions of 10, 20, and 30. Six benchmark functions (F7-F12) are evaluated with their stable dimensions. Each of them is evaluated by 30 independent runs. The total number of function evaluations of each algorithm is 50000. Table 4 shows the function expressions, test dimensions, variable ranges, and optima for the twelve benchmark functions.
Tables 5, 6 and 7 display the results for HRO-GWO and the baseline methods on the twelve test functions in terms of the best, average, worst, and standard deviation of fitness, with the best values in bold. Moreover, the last row of each table, labeled “w/t/l”, shows the number of wins (w), ties (t), and losses (l) of each algorithm. The results of the comparative analysis are illustrated in Fig. 5. Specifically, HRO-GWO wins on all three of the best, average, and worst metrics simultaneously in 66.7% of the comparisons. For the average value, HRO-GWO achieves a remarkable win rate of 95.8%; for the worst value, 91.7%; and for the best value, 79.2%.
The unimodal test functions are well-suited for assessing the exploitation capability of algorithms in finding optimal solutions. The results in Table 5 show that HRO-GWO delivers highly competitive outcomes on the unimodal test functions. In particular, it shows significantly enhanced results on F2 across all dimensions and evaluation metrics. In the nine comparisons on the unimodal test functions, HRO-GWO is the top performer in all nine average-case comparisons and in eight of the best-case and worst-case comparisons. HRO-GWO does not rank first for the best value on F1 with dimension 30 or for the worst value on F3 with dimension 10; in all other cases it achieves the best results in terms of best, worst, and average fitness values.
According to the results in Table 6, the experiments on F4-F6 across various dimensions, whose complexity escalates as the dimension increases, demonstrate the competitive exploration capability of the algorithm. Out of nine comparisons on the ordinary multimodal functions, HRO-GWO wins nine times in both the average-value and worst-value comparisons, and five times in the best-value comparison. On F4, both MSGWO2 and HRO-GWO achieve the theoretical optimum of 0 for all dimensions. On F5 and F6, HRO-GWO outperforms the other algorithms in terms of the average and worst fitness values for all dimensions. These results demonstrate the efficacy of the proposed HRO-GWO algorithm in addressing the challenges posed by ordinary multimodal functions.
The number of wins, losses and ties of each algorithm.
The results presented in Table 7 show that HRO-GWO is superior on most fixed-dimensional multimodal functions. Across the six experiments on these functions, HRO-GWO achieves the optimal average value in five instances, and wins in six and five instances for the best and worst values, respectively. Notably, HRO-GWO attains the theoretical optimum on all three evaluation metrics on F7, F8, and F9. As the results illustrate, HRO-GWO exhibits an excellent balance between exploration and exploitation that effectively avoids local optima to the maximum extent.
Furthermore, the CEC 2017 benchmark set, comprising 29 highly challenging single-objective optimization problems, is employed for performance testing. To evaluate the proposed HRO-GWO algorithm, several algorithms are selected for comparison: the original GWO algorithm22, the HRO algorithm35, and two more recent algorithms, HO45 and IVYA46, which performed well in the CEC competition. The total number of function evaluations of each algorithm is 50000, and each is evaluated over 30 independent runs. Table 8 presents the best, average, worst, and standard deviation of fitness for dimension sizes of 10, 30 and 50, with the best values in bold. Moreover, the last row of the table, labeled “w/t/l”, shows the number of wins (w), ties (t), and losses (l) of each algorithm. The results indicate the efficacy of HRO-GWO in addressing optimization problems.
For the unimodal problem F3, HRO-GWO achieves excellent results at dimensions 30 and 50, while HO and IVYA attain favorable outcomes in the remaining case. For the multimodal problems F4 to F10 at dimension 10, HRO-GWO exhibits excellent performance on more than 85% of the functions. On F5, F6, F7, F8, and F9, the algorithm outperforms the others at dimension 30 in terms of the average value. At dimension 50, HRO-GWO demonstrates optimal performance on F6, F7, and F9. For the hybrid problems, HRO-GWO wins 20, 14 and 18 times in the average, best and worst values, respectively. For the composite problems F21 to F30, HRO-GWO yields the most optimal outcomes on F27 and F29 across all dimensions and all evaluation indicators. Across all competitions, HRO-GWO achieves a first-place ranking on 132 occasions, while IVYA attains a second-place ranking on 76 occasions.
Experiment on biomedical datasets
To further investigate the superiority of proposed methods, the experiments are conducted on twelve public small-sample high-dimensional biomedical datasets by 20 independent runs. Table 9 summarizes the characteristics of these datasets, including the number of features, instances, and classes. The datasets are available at https://github.com/taiyang479/HRO-GWO.git.
First, since the effectiveness of the hybrid filter-wrapper method has been proven49,60, and to ensure fairness, the study applies the chi-square technique as a filter to select 10\(\%\) or 20\(\%\) of the features for all algorithms, improving the classification performance while saving time. This enables all algorithms to search within the same feature space. Then, to avoid model overfitting, ten-fold cross-validation is used to generate the training and test sets, owing to the small number of samples in biomedical datasets. The datasets are divided into ten parts, with nine employed as the training set and one as the testing set during the FS process. Both NB and KNN are relatively simple classification algorithms that are straightforward to implement and comprehend; they require neither complex parameter tuning nor training processes and are therefore well-suited as benchmark classifiers for assessing the efficacy of FS. Consequently, the classifiers NB and KNN are selected to test the robustness of the proposed method. To verify the generality of the method, the same parameters are applied to the algorithm when using different classifiers. In the context of biomedical data classification, accuracy represents a fundamental metric for evaluating model performance, and the reduction of redundant features is a pivotal objective of FS. Accordingly, the fitness function is constructed to minimize both the number of features and the error rate. Average fitness, average classification accuracy, and average number of selected features are selected as the principal evaluation metrics to provide a comprehensive performance appraisal of the different algorithms.
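A minimal sketch of a fitness function combining error rate and feature-subset size is given below. The weighting `alpha = 0.99` is a common choice in metaheuristic FS, not a value stated by the paper, and the helper names are illustrative:

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Wrapper fitness sketch: jointly minimize classification error and
    the fraction of selected features. alpha = 0.99 is an assumed weight
    emphasizing accuracy over subset size (lower fitness is better)."""
    return alpha * error_rate + (1 - alpha) * (n_selected / n_total)

def decode(binary_vector):
    """Map a binary wolf position to the indices of selected features."""
    return [d for d, bit in enumerate(binary_vector) if bit == 1]
```

For example, `fs_fitness(0.1, 10, 100)` evaluates a subset that misclassifies 10% of samples while keeping 10% of features; because the error term dominates, a subset with slightly fewer features never outranks one with noticeably lower error.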
Comparison to other FS approaches
To ascertain the superiority of chi-square filtering, the study conducts comparative experiments using three other filtering methods: Decision Tree, ReliefF, and Mutual Information (MI). The classifiers NB and KNN are selected for their ease of implementation. The filtering percentages are set at 10% and 20%, respectively, based on empirical evidence. The fitness values of the above methods are presented in Table 10, with the optimal values highlighted in bold. Moreover, the last row of the table, labeled “w/t/l”, shows the number of wins (w), ties (t), and losses (l) of each algorithm.
The experimental results demonstrate that the HRO-GWO framework, when combined with chi-square filtering, achieves the optimal fitness value on a total of 25 occasions. In comparison, Decision Tree, ReliefF, and MI achieve the optimal fitness value on 3, 6, and 13 occasions, respectively. The approach based on HRO-GWO and chi-square filtering is the most effective in the study.
A fair comparison is conducted with the ten state-of-the-art approaches mentioned in the previous section, as well as EGA50, MHRO37, R-IBACO and C-IBACO40. Each is evaluated over 20 independent runs, and in each run the total number of function evaluations of each algorithm is set to 50000. As illustrated in Table 11, the parameters of the comparative algorithms are set to the values recommended in their respective studies. Across the forty-eight experiments on high-dimensional biomedical datasets, the framework based on HRO-GWO emerges as the top performer in fitness value, accuracy, and number of features on the majority of occasions.
The fitness values of this group of methods are displayed in Table 12, with the best values in bold. The proposed method achieves the optimal fitness value under most conditions. For example, when 10% of the features are retained before applying the algorithms to the GLIOMA dataset with NB, HRO-GWO attains a fitness value of 6.180e−02. This value is lower than 2.110e−01, 1.689e−01, 1.850e−01, 2.501e−01, 1.736e−01, 9.673e−02, 9.672e−02, 8.331e−02, 9.442e−02, 1.462e−01, 7.127e−02, 7.762e−02, 1.395e−01, and 1.991e−01, which are obtained by PSO, SCA, WOA, EGA, HRO, GWO, MSGWO1, I-GWO, MSGWO2, MHRO, R-IBACO, C-IBACO, HO, and IVYA, respectively.
The convergence results are shown in Figs. 6 and 7, indicating that HRO-GWO quickly converges to the vicinity of the most promising region of the search space while maintaining a certain level of global exploration capability in subsequent iterations. The DRS balances the exploitation and exploration capabilities of the algorithm. The multi-strategy co-evolution model enables certain grey wolves within the pack to specialize in global search while others concentrate on local search; this enhances population diversity and mitigates the risk of becoming trapped in local optima. For example, the iteration curves of the algorithms on the CNS dataset, using the NB classifier with a filtering ratio of 10% of features, are presented in subfigure 7a of Fig. 7, where HRO-GWO achieves a better fitness value in the shortest time. Furthermore, as illustrated in subfigure 6a, the curve of HRO-GWO continues to fluctuate after 800 iterations, demonstrating that the algorithm retains the capacity to escape local solutions in the later stages of the process. In contrast, the iteration curves of the original GWO algorithm and the majority of its variants show no downward trend at the later stage.
Average fitness convergence curves on the first six datasets. P represents the proportion of selection features.
Average fitness convergence curves on the last six datasets. P represents the proportion of selection features.
The comparison results for accuracy and the number of selected features are presented in Table 13, with the best values in bold. Figure 8 illustrates the accuracy achieved by each algorithm in conjunction with the two classifiers and two filtering ratios. Under the four distinct experimental conditions across twelve datasets, HRO-GWO demonstrates the highest average classification accuracy in most cases. On certain datasets, the average accuracies of the algorithms differ only slightly. For instance, on the Leukemia 2 dataset all algorithms achieve a high level of accuracy under all four conditions: HRO-GWO provides the highest accuracy at 99.995%, while PSO provides the lowest at 97.054%. However, some datasets exhibit substantial variations in experimental outcomes. For example, on the TOX_171 dataset, the best and worst classification accuracies are 94.641% and 73.628%, obtained by HRO-GWO and EGA, respectively.
Average accuracy on NB and KNN. Percent represents the proportion of selection features.
Average selected features on NB and KNN. Percent represents the proportion of selection features.
The number of wins and ties for three metrics.
Figure 9 displays the average number of features selected by each algorithm. In the Leukemia 2, Lung_cancer, and Ovarian datasets, HRO-GWO selects the smallest average number of features but still obtains the optimal average accuracy. Particularly within the Ovarian dataset, HRO-GWO selects only 3.2 features but achieves an impressive accuracy of 99.999%. Furthermore, the gap between the maximum and minimum number of selected features amounts to 1472.8 in the Ovarian dataset. It is evident that the framework incorporating HRO-GWO exhibits a remarkable capacity for dimensionality reduction across high-dimensional datasets.
Table 14 presents a summary of the total number of wins, losses, and ties for all three metrics across all scenarios. As shown in Fig. 10, HRO-GWO wins in all cases for the fitness value. Concerning accuracy, HRO-GWO demonstrates the highest accuracy on 35 occasions, with a tie recorded on 11 occasions, illustrating its capacity to select the most pertinent features while ensuring accuracy. Regarding the number of features, HRO-GWO selects the fewest remaining features 23 times, indicating an excellent dimensionality reduction rate.
Nonparametric test
The results on the NB and KNN classifiers are evaluated through the implementation of Wilcoxon signed-rank test and Friedman test. The objective is to ascertain the significance of the discrepancy in accuracy between the proposed HRO-GWO and the comparative metaheuristic-based FS methods.
As shown in Table 15, a Wilcoxon signed-rank test is conducted as a pair-wise assessment. \(R^{+}\) represents the sum of the ranks at which HRO-GWO exhibits superior performance relative to the comparative method, while \(R^{-}\) indicates the negative rank sum. The P value denotes the level of significance, with \(P<0.05\) signifying a statistically notable discrepancy between the two algorithms under comparison. All P values displayed in the table are less than 0.05, indicating that the findings of the study are statistically significant.
The results of the Friedman test are presented in Table 16, which includes the average accuracy rankings across all datasets, the final rankings, and the P values for each algorithm under both classifiers. The P values of 9.860e−55 and 8.867e−52, both below the predetermined significance level of 0.01, indicate a statistically significant difference between the algorithms. Furthermore, the final rankings demonstrate that HRO-GWO exhibits the highest accuracy when evaluated using both NB and KNN.
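The Friedman statistic underlying such rankings can be sketched in a few lines. This simplified version ranks algorithms per dataset by accuracy and breaks rank ties by column order (a full implementation would assign average ranks, e.g. via `scipy.stats.friedmanchisquare`):

```python
import numpy as np

def friedman_statistic(results):
    """Friedman chi-square statistic sketch for an (N datasets x k
    algorithms) accuracy matrix. Rank 1 = highest accuracy per dataset;
    ties are broken by column order here, a simplification. The p-value
    follows a chi-square distribution with k - 1 degrees of freedom."""
    n, k = results.shape
    # double argsort converts each row of accuracies into ranks 1..k
    ranks = (-results).argsort(axis=1).argsort(axis=1) + 1.0
    mean_ranks = ranks.mean(axis=0)
    return 12.0 * n / (k * (k + 1)) * (
        np.sum(mean_ranks ** 2) - k * (k + 1) ** 2 / 4.0)
```

A large statistic means the algorithms' mean ranks diverge strongly from the equal-rank hypothesis, which is exactly the situation the very small P values in Table 16 reflect.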
Ablation studies
To understand how the DRS and the multi-strategy co-evolution model added to GWO in this paper affect performance, we analyze the results of adding each strategy on three datasets.
The layer-by-layer performance analysis is shown in Table 17, where MCM stands for the multi-strategy co-evolution model, and the best values are in bold. After adding DRS, the performance of GWO exhibits a slight enhancement, although it remains suboptimal. After incorporating the multi-strategy co-evolution model, the performance improves to a discernible extent, indicating that improving population diversity has an important effect on GWO. The optimal outcomes are attained when the original GWO algorithm employs both DRS and the multi-strategy co-evolution model. For instance, comparing the proposed method with the original algorithm at percent = 20, HRO-GWO yields the most promising outcomes on the CLL_SUB_111 dataset using KNN as the classifier. Specifically, GWO achieves a fitness value of 4.736e−02 after incorporating DRS, an improvement of 1.234e−02 over the original algorithm. Additionally, the multi-strategy co-evolution model improves the fitness value of GWO to 4.096e−02, verifying its superiority. The HRO-GWO algorithm combines both methods and achieves a notable fitness value of 3.100e−02. In general, the optimization mechanism proposed in the paper plays a key role in improving the performance of GWO.
Table 18 demonstrates a detailed comparison of the three update strategies in the multi-strategy co-evolution model added separately. And the best values are in bold. In most cases, the application of each strategy contributes to reinforcing the GWO. However, when only two of them are combined, they may show varying enhancement effects across diverse datasets, classifiers, and dimensions. The most substantial impact is realized when these strategies collaborate synergistically.
To assess the performance impact of integrating the chi-square technique into the HRO-GWO-based framework, we examine the mean fitness value and running time across three datasets, as shown in Table 19, with the best values in bold. The experimental findings demonstrate that substantial time savings are realized alongside accuracy enhancements after incorporating the chi-square technique into the FS framework. Notably, in the experiments with KNN, the accuracy on the LFPF\(\_\)1 dataset improves by about 15.4% while saving 8611 seconds.
Conclusion and future work
The paper proposes a novel FS framework based on a modified GWO to solve small-sample high-dimensional FS tasks. The HRO-GWO algorithm incorporates four innovative strategies, including DRS and three search strategies, to improve its performance. DRS adjusts the essential parameters of the GWO algorithm during the optimization process to enhance its adaptability. Subsequently, a multi-strategy co-evolution model is derived from the inspiration of HRO; poor population diversity is effectively mitigated by the model combining NSS, DCS, and SS. Moreover, to enhance the classification performance while conserving time, a hybrid filter-wrapper framework combining chi-square and HRO-GWO is designed to efficiently select relevant and informative feature subsets. Experimental results show that both the capability of HRO-GWO and the performance of the proposed FS framework based on it outperform the competing methods employed in the study.
In recent years, metaheuristic algorithms have become increasingly popular for solving high-dimensional problems in big data, and the rapidly expanding volume of data calls for more grouping strategies. The work provides independent insights into GWO-based FS methods, which can serve as a guide for future development. One future goal for HRO-GWO is to reduce time costs while preserving diversity: multi-strategy collaborative methods inevitably increase the running time of the algorithm while improving population diversity. Therefore, balancing classification accuracy and time cost will be a key difficulty in future work.
Data availability
The datasets are available at https://github.com/taiyang479/HRO-GWO.git. For further needs contact the corresponding author.
References
Atashgahi, Z. et al. Supervised feature selection with neuron evolution in sparse neural networks. arXiv preprint arXiv:2303.07200, https://doi.org/10.48550/arXiv.2303.07200 (2023).
Tijjani, S., Wahab, M. N. A. & Noor, M. H. M. An enhanced particle swarm optimization with position update for optimal feature selection. Expert Syst. Appl. 446, 123337. https://doi.org/10.1016/j.eswa.2024.123337 (2024).
Zhou, L., Pan, S., Wang, J. & Vasilakos, A. V. Machine learning on big data: Opportunities and challenges. Neurocomputing 237, 350–361. https://doi.org/10.1016/j.neucom.2017.01.026 (2017).
Lin, Q., Chen, X., Chen, C. & Garibaldi, J. M. Boundary-wise loss for medical image segmentation based on fuzzy rough sets. Inf. Sci. 661, 120183. https://doi.org/10.1016/j.ins.2024.120183 (2024).
Telikani, A., Gandomi, A. H. & Shahbahrami, A. A survey of evolutionary computation for association rule mining. Inf. Sci. 524, 318–352. https://doi.org/10.1016/j.ins.2020.02.073 (2020).
Azzam, S. M., Emam, O. & Abolaban, A. S. An improved differential evolution with sailfish optimizer (desfo) for handling feature selection problem. Sci. Rep. 14, 13517. https://doi.org/10.1038/s41598-024-63328-w (2024).
Zhang, A. et al. Hyperspectral band selection using crossover-based gravitational search algorithm. IET Image Proc. 13, 280–286. https://doi.org/10.1049/iet-ipr.2018.5362 (2019).
Ang, J. C., Mirzal, A., Haron, H. & Hamed, H. N. A. Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection. IEEE/ACM Trans. Comput. Biol. Bioinf. 13, 971–989. https://doi.org/10.1109/TCBB.2015.2478454 (2015).
Shi, J., Zhang, X., Liu, X., Lei, Y. & Jeon, G. Multicriteria semi-supervised hyperspectral band selection based on evolutionary multitask optimization. Knowl.-Based Syst. 240, 107934. https://doi.org/10.1016/j.knosys.2021.107934 (2022).
Bhadra, T. & Bandyopadhyay, S. Supervised feature selection using integration of densest subgraph finding with floating forward-backward search. Inf. Sci. 566, 1–18. https://doi.org/10.1016/j.ins.2021.02.034 (2021).
Turky, A., Sabar, N. R., Dunstall, S. & Song, A. Hyper-heuristic local search for combinatorial optimisation problems. Knowl.-Based Syst. 205, 106264. https://doi.org/10.1016/j.knosys.2020.106264 (2020).
Nssibi, M., Manita, G. & Korbaa, O. Advances in nature-inspired metaheuristic optimization for feature selection problem: A comprehensive survey. Comput. Sci. Rev. 49, 100559. https://doi.org/10.1016/j.cosrev.2023.100559 (2023).
Zhou, J. & Hua, Z. A correlation guided genetic algorithm and its application to feature selection. Appl. Soft Comput. 123, 108964. https://doi.org/10.1016/j.asoc.2022.108964 (2022).
Fang, Y., Yao, Y., Lin, X., Wang, J. & Zhai, H. A feature selection based on genetic algorithm for intrusion detection of industrial control systems. Comput. Secur. 139, 103675. https://doi.org/10.1016/j.cose.2023.103675 (2024).
Nadimi-Shahraki, M. H., Taghian, S., Mirjalili, S. & Abualigah, L. Binary aquila optimizer for selecting effective features from medical data: A covid-19 case study. Mathematics 10, 1929. https://doi.org/10.3390/math10111929 (2022).
Wan, Y., Wang, M., Ye, Z. & Lai, X. A feature selection method based on modified binary coded ant colony optimization algorithm. Appl. Soft Comput. 49, 248–258. https://doi.org/10.1016/j.asoc.2016.08.011 (2016).
Kale, G. A. & Yüzgeç, U. Advanced strategies on update mechanism of sine cosine optimization algorithm for feature selection in classification problems. Eng. Appl. Artif. Intell. 107, 104506. https://doi.org/10.1016/j.engappai.2021.104506 (2022).
Abed-Alguni, B. H., Alawad, N. A., Al-Betar, M. A. & Paul, D. Opposition-based sine cosine optimizer utilizing refraction learning and variable neighborhood search for feature selection. Appl. Intell. 53, 13224–13260. https://doi.org/10.1007/s10489-022-04201-z (2023).
Riyahi, M., Rafsanjani, M. K., Gupta, B. B. & Alhalabi, W. Multiobjective whale optimization algorithm-based feature selection for intelligent systems. Int. J. Intell. Syst. 37, 9037–9054. https://doi.org/10.1002/int.22979 (2022).
Amoozegar, M. & Minaei-Bidgoli, B. Optimizing multi-objective pso based feature selection method using a feature elitism mechanism. Expert Syst. Appl. 113, 499–514. https://doi.org/10.1016/j.eswa.2018.07.013 (2018).
Gao, J. et al. Information gain ratio-based subfeature grouping empowers particle swarm optimization for feature selection. Knowl.-Based Syst. 286, 111380. https://doi.org/10.1016/j.knosys.2024.111380 (2024).
Mirjalili, S., Mirjalili, S. M. & Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61. https://doi.org/10.1016/j.advengsoft.2013.12.007 (2014).
Nadimi-Shahraki, M. H., Taghian, S. & Mirjalili, S. An improved grey wolf optimizer for solving engineering problems. Expert Syst. Appl. 166, 113917. https://doi.org/10.1016/j.eswa.2020.113917 (2021).
Yang, G. et al. A modified gray wolf optimizer-based negative selection algorithm for network anomaly detection. Int. J. Intell. Syst. 2023, https://doi.org/10.1155/2023/8980876 (2023).
Wang, M., Liu, W., Chen, M., Huang, X. & Han, W. A band selection approach based on a modified gray wolf optimizer and weight updating of bands for hyperspectral image. Appl. Soft Comput. 112, 107805. https://doi.org/10.1016/j.asoc.2021.107805 (2021).
Cheng, X., Li, J., Zheng, C., Zhang, J. & Zhao, M. An improved PSO-GWO algorithm with chaos and adaptive inertial weight for robot path planning. Front. Neurorobot. 15, 770361. https://doi.org/10.3389/fnbot.2021.770361 (2021).
Liu, J., Wei, X. & Huang, H. An improved grey wolf optimization algorithm and its application in path planning. IEEE Access 9, 121944–121956. https://doi.org/10.1109/ACCESS.2021.3108973 (2021).
Pan, H., Chen, S. & Xiong, H. A high-dimensional feature selection method based on modified gray wolf optimization. Appl. Soft Comput. 135, 110031. https://doi.org/10.1016/j.asoc.2023.110031 (2023).
Abdel-Basset, M., El-Shahat, D., El-Henawy, I., De Albuquerque, V. H. C. & Mirjalili, S. A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection. Expert Syst. Appl. 139, 112824. https://doi.org/10.1016/j.eswa.2019.112824 (2020).
Adhikary, J. & Acharyya, S. Randomized balanced grey wolf optimizer (RBGWO) for solving real life optimization problems. Appl. Soft Comput. 117, 108429. https://doi.org/10.1016/j.asoc.2022.108429 (2022).
Premkumar, M. et al. Augmented weighted k-means grey wolf optimizer: An enhanced metaheuristic algorithm for data clustering problems. Sci. Rep. 14, 5434. https://doi.org/10.1038/s41598-024-55619-z (2024).
Bilal, A. et al. Breast cancer diagnosis using support vector machine optimized by improved quantum inspired grey wolf optimization. Sci. Rep. 14, 10714. https://doi.org/10.1038/s41598-024-61322-w (2024).
Mafarja, M. et al. An efficient high-dimensional feature selection approach driven by enhanced multi-strategy grey wolf optimizer for biological data classification. Neural Comput. Appl. 35, 1749–1775. https://doi.org/10.1007/s00521-022-07836-8 (2023).
Pirgazi, J., Alimoradi, M., Esmaeili Abharian, T. & Olyaee, M. H. An efficient hybrid filter-wrapper metaheuristic-based gene selection method for high dimensional datasets. Sci. Rep. 9, 18580. https://doi.org/10.1038/s41598-019-54987-1 (2019).
Ye, Z., Ma, L. & Chen, H. A hybrid rice optimization algorithm. In 2016 11th International Conference on Computer Science & Education (ICCSE), 169–174 (IEEE, 2016). https://doi.org/10.1109/ICCSE.2016.7581575.
Shu, Z. et al. A modified hybrid rice optimization algorithm for solving 0–1 knapsack problem. Appl. Intell. 52, 5751–5769. https://doi.org/10.1007/s10489-021-02717-4 (2022).
Ye, Z. et al. A band selection approach for hyperspectral image based on a modified hybrid rice optimization algorithm. Symmetry 14, 1293. https://doi.org/10.3390/sym14071293 (2022).
Mirza, O. M. et al. Computer aided diagnosis for gastrointestinal cancer classification using hybrid rice optimization with deep learning. IEEE Access https://doi.org/10.1109/ACCESS.2023.3297441 (2023).
Ye, Z., Luo, J., Zhou, W., Wang, M. & He, Q. An ensemble framework with improved hybrid breeding optimization-based feature selection for intrusion detection. Future Gener. Comput. Syst. https://doi.org/10.1016/j.future.2023.09.035 (2023).
Ye, A. Z. et al. High-dimensional feature selection based on improved binary ant colony optimization combined with hybrid rice optimization algorithm. Int. J. Intell. Syst. https://doi.org/10.1155/2023/1444938 (2023).
Kennedy, J. & Eberhart, R. Particle swarm optimization. In Proceedings of ICNN’95-International Conference on Neural Networks, vol. 4, 1942–1948 (IEEE, 1995). https://doi.org/10.1109/ICNN.1995.488968.
Hafez, A. I., Zawbaa, H. M., Emary, E. & Hassanien, A. E. Sine cosine optimization algorithm for feature selection. In 2016 International Symposium on Innovations in Intelligent Systems and Applications (INISTA), 1–5 (IEEE, 2016). https://doi.org/10.1109/INISTA.2016.7571853.
Mirjalili, S. & Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67. https://doi.org/10.1016/j.advengsoft.2016.01.008 (2016).
Zhou, G., Li, K., Wan, G. & Ji, H. Feature selection algorithm based on multi strategy grey wolf optimizer. In International Conference on Intelligent Information Processing, 35–45, (Springer, 2020). https://doi.org/10.1007/978-3-030-46931-3_4.
Amiri, M. H., Mehrabi Hashjin, N., Montazeri, M., Mirjalili, S. & Khodadadi, N. Hippopotamus optimization algorithm: A novel nature-inspired optimization algorithm. Sci. Rep. 14, 5032. https://doi.org/10.1038/s41598-024-54910-3 (2024).
Ghasemi, M. et al. Optimization based on the smart behavior of plants with its engineering applications: Ivy algorithm. Knowl.-Based Syst. 295, 111850. https://doi.org/10.1016/j.knosys.2024.111850 (2024).
Ganjei, M. A. & Boostani, R. A hybrid feature selection scheme for high-dimensional data. Eng. Appl. Artif. Intell. 113, 104894. https://doi.org/10.1016/j.engappai.2022.104894 (2022).
Moslemi, A. A tutorial-based survey on feature selection: Recent advancements on feature selection. Eng. Appl. Artif. Intell. 126, 107136. https://doi.org/10.1016/j.engappai.2023.107136 (2023).
Ali, W. & Saeed, F. Hybrid filter and genetic algorithm-based feature selection for improving cancer classification in high-dimensional microarray data. Processes 11, 562. https://doi.org/10.3390/pr11020562 (2023).
Ye, Z. et al. Elite GA-based feature selection of LSTM for earthquake prediction. J. Supercomput. https://doi.org/10.1007/s11227-024-06218-2 (2024).
Zamani, H. & Nadimi-Shahraki, M. H. An evolutionary crow search algorithm equipped with interactive memory mechanism to optimize artificial neural network for disease diagnosis. Biomed. Signal Process. Control 90, 105879. https://doi.org/10.1016/j.bspc.2023.105879 (2024).
Salgotra, R. & Gandomi, A. H. A novel multi-hybrid differential evolution algorithm for optimization of frame structures. Sci. Rep. 14, 4877. https://doi.org/10.1038/s41598-024-54384-3 (2024).
Sun, L. et al. Feature selection using binary monarch butterfly optimization. Appl. Intell. 53, 706–727. https://doi.org/10.1007/s10489-022-03554-9 (2023).
Xie, W., Wang, L., Yu, K., Shi, T. & Li, W. Improved multi-layer binary firefly algorithm for optimizing feature selection and classification of microarray data. Biomed. Signal Process. Control 79, 104080. https://doi.org/10.1016/j.bspc.2022.104080 (2023).
Jiang, L., Greenwood, C. M., Yao, W. & Li, L. Bayesian hyper-lasso classification for feature selection with application to endometrial cancer RNA-seq data. Sci. Rep. 10, 9747. https://doi.org/10.1038/s41598-020-66466-z (2020).
Moslemi, A. et al. Classifying future healthcare utilization in COPD using quantitative CT lung imaging and two-step feature selection via sparse subspace learning with the cancold study. Acad. Radiol. https://doi.org/10.1016/j.acra.2024.03.030 (2024).
Wang, Y., Ran, S. & Wang, G.-G. Role-oriented binary grey wolf optimizer using foraging-following and Lévy flight for feature selection. Appl. Math. Model. 126, 310–326. https://doi.org/10.1016/j.apm.2023.08.043 (2024).
Zhang, L., Shan, L. & Wang, J. Optimal feature selection using distance-based discrete firefly algorithm with mutual information criterion. Neural Comput. Appl. 28, 2795–2808. https://doi.org/10.1007/s00521-016-2204-0 (2017).
Salgotra, R., Singh, U. & Singh, G. Improving the adaptive properties of lshade algorithm for global optimization. In 2019 International Conference on Automation, Computational and Technology Management (ICACTM), 400–407 (IEEE, 2019).
Yan, C. et al. A novel hybrid filter/wrapper feature selection approach based on improved fruit fly optimization algorithm and chi-square test for high dimensional microarray data. Curr. Bioinform. 16, 63–79. https://doi.org/10.2174/1574893615666200324125535 (2021).
Acknowledgements
The authors wish to thank the National Natural Science Foundation of China (NSFC, http://www.nsfc.gov.cn/) for its support through Grant Numbers 62376089, 62202147, 62302154, and 42201464, and the Key Research and Development Program of Hubei Province for its support through Grant Number 2023BEB024.
Author information
Authors and Affiliations
Contributions
Conceptualization: Ruoxuan Huang; Methodology: Ruoxuan Huang, Zhiwei Ye, Wen Zhou; Formal analysis and investigation: Zhiwei Ye, Ruoxuan Huang, Wen Zhou; Writing - original draft preparation: Ruoxuan Huang; Writing - review and editing: Zhiwei Ye, Ruoxuan Huang, Wen Zhou, Mingwei Wang, Ting Cai, Qiyi He; Funding acquisition: Zhiwei Ye, Wen Zhou, Ting Cai, Qiyi He, Peng Zhang, Yuquan Zhang; Resources: Zhiwei Ye, Wen Zhou, Mingwei Wang, Ting Cai, Qiyi He; Supervision: Peng Zhang, Yuquan Zhang.
Corresponding author
Ethics declarations
Conflict of interest
All authors declare that they have no conflicts of interest.
Reprints and permissions information
is available at www.nature.com/reprints.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ye, Z., Huang, R., Zhou, W. et al. Hybrid rice optimization algorithm inspired grey wolf optimizer for high-dimensional feature selection. Sci Rep 14, 30741 (2024). https://doi.org/10.1038/s41598-024-80648-z
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41598-024-80648-z
This article is cited by
- A novel hybrid feature selection method combining binary grey wolf optimization and cuckoo search. Scientific Reports (2025)
- An improved Grey Wolf Optimizer based on mutation operator, evolutionary population dynamics, and nonlinear population size reduction strategy. Scientific Reports (2025)
- A dual-enhanced long short-term memory earthquake prediction method based on improved and hybrid rice-inspired gray wolf optimizers. The Journal of Supercomputing (2025)