Enhanced Grey Wolf Optimization (EGWO) and random forest based mechanism for intrusion detection in IoT networks

Alqahtany, Saad Said; Shaikh, Asadullah; Alqazzaz, Ali

doi:10.1038/s41598-024-81147-x

Download PDF

Article
Open access
Published: 14 January 2025

Enhanced Grey Wolf Optimization (EGWO) and random forest based mechanism for intrusion detection in IoT networks

Scientific Reports volume 15, Article number: 1916 (2025) Cite this article

3142 Accesses
9 Citations
3 Altmetric
Metrics details

Subjects

Abstract

Smart devices are enabled via the Internet of Things (IoT) and are connected in an uninterrupted world. These connected devices pose a challenge to cybersecurity systems due attacks in network communications. Such attacks have continued to threaten the operation of systems and end-users. Therefore, Intrusion Detection Systems (IDS) remain one of the most used tools for maintaining such flaws against cyber-attacks. The dynamic and multi-dimensional threat landscape in IoT network increases the challenge of Traditional IDS. The focus of this paper aims to find the key features for developing an IDS that is reliable but also efficient in terms of computation. Therefore, Enhanced Grey Wolf Optimization (EGWO) for Feature Selection (FS) is implemented. The function of EGWO is to remove unnecessary features from datasets used for intrusion detection. To test the new FS technique and decide on an optimal set of features based on the accuracy achieved and the feature taking filters, the most recent FS approach relies on the NF-ToN-IoT dataset. The selected features are evaluated by using the Random Forest (RF) algorithm to combine multiple decision trees and create an accurate result. The experimental outcomes against the most recent procedures demonstrate the capacity of the recommended FS and classification methods to determine attacks in the IDS. Analysis of the results presents that the recommended approach performs more effectively than the other recent techniques with optimized features (i.e., 23 out of 43 features), high accuracy of 99.93% and improved convergence.

A new intrusion detection method using ensemble classification and feature selection

Article Open access 20 April 2025

Machine learning based intrusion detection framework for detecting security attacks in internet of things

Article Open access 04 December 2024

(IoT) Network intrusion detection system using optimization algorithms

Article Open access 01 July 2025

Introduction

Background

Various fields require more frequent use of numerous applications and services on the Internet. Among them exist both e-commerce and e-learning. However, this tendency raises privacy and security issues. The recent rise in cyber-security breaches was caused by the convergence of the new scams and hacking tools’ targets, which seek to undermine the concepts of confidentiality, integrity, and availability. Any code that can be used to steal information, get around security measures, damage or compromise computer networks, software systems, or the Internet of Things (IoT) is referred to as malicious software¹. Several security measures, like firewalls, encryption, and anti-malware software, are employed to counter these attacks. To further analyze attacks, digital forensics techniques are used².

On the other hand, zero-day attacks and distinctive cyberattacks pose a serious threat to network security in widely used frameworks³. Nevertheless, cyber attackers appear to have no reservations or restrictions when it comes to carrying out their intrusions, regardless of the circumstances or nature of their targets. This includes compromising critical systems such as healthcare providers and the essential services they offer to patients. For example, on June 3, 2024, most hospitals in London were compelled to declare a severe incident after falling victim to a cyber-attack. This attack led to the cancellation of medical procedures and the diversion of emergency patients to other facilities. Initial investigations indicate that cyber criminals utilized ransomware to execute this assault.

Furthermore, cybersecurity expert Steve Sands from the Chartered Institute for IT warned that the ransomware threat has now become an ever-present danger for vital institutions, ranging from schools to hospitals (BBC, 2024). Malicious threats generate significant security challenges, necessitating the development of an innovative, adaptable, and more efficient intrusion detection system (IDS)⁴. An IDS is an alerting tool that identifies and categorizes compromises, attacks, or security policy breaches instantly at the network and host levels of infrastructure. Depending on intrusive behaviors, Intrusion detection is categorized into two types, i.e., IDS based on hosts (HIDS)⁵ and IDS based on networks (NIDS)⁶. Additionally, primary methods based on the detection mechanism are anomaly and signature-based methods. Using a signature-based technique, the IDS compares anomalous activity to stored and established trends. Therefore, the trend identification database needs to be updated on a regular basis to update IDS to identify newly generated attacks. In contrast, an anomaly-based IDS uses a pre-trained routine activity profile to identify various anomalous behaviors^7,8.

Heuristic methods are used in anomaly detection to identify undetected harmful activity⁹. A high false positive rate is produced by anomaly detection in many cases. Most businesses combine anomaly detection with fraud detection in their commercial solution platforms to address this issue. Although stateful protocol analysis operates at the network, application, and transport layers, it is the most effective approach when compared to the other highlighted methods used for IDS. To identify variations in suitable protocols and applications, this makes use of the specified vendor standard settings^10,11. Using machine learning (ML) approaches, an IDS that can effectively recognize and classify cyberattacks at the host and network layers is being created¹². Still, there are a lot of obstacles because malicious attacks happen frequently and in huge quantities, necessitating an adaptable approach. An IDS’s performance is significantly affected by the feature selection (FS) procedure in addition to its core, apart from the machine learning classification algorithms¹³. The effectiveness of the IDS can be significantly increased with strong FS and effective integration with the classification procedure. Meta-heuristic algorithms are one way to implement FS¹⁴.

Meta-heuristic algorithms that derive inspiration from biological systems mimic regular behaviors of living things under specific circumstances, such as hunting and pursuing prey^15,16. These approaches work well with data that has multiple aspects and in dynamic applications. Furthermore, the algorithms presented improved effectiveness in resolving optimization issues¹⁷. Additionally, according to researchers^18,19,20, bio-inspired meta-heuristic algorithms are frequently employed to address complex, real-time optimization and engineering challenges. Because IDSs handle a lot of data and must identify online invasions in a dynamic, multi-dimensional domain. Therefore, an issue that is thought to be real-world, it can be utilized to develop an effective IDS²¹.

FS is a pre-processing phase that gathers the most useful characteristics to create an efficient model²². This important phase directly impacts the effectiveness of the IDS. There are two primary techniques of FS wrapper-based and filter-based approaches. When searching and optimizing, the wrapper-based strategy evaluates the outcome based on the learning technique, whereas the filter-based approach leverages the correlation between data and the associated class label without consulting the learning technique. The wrapper-based strategy is more widely employed due to its proven results compared to the less expensive filter-based solution²³. Due to their exceptional accuracy, bio-inspired meta-heuristic algorithms are frequently employed in IDSs’ wrapper-based FS method²⁴.

Related work

Anomalous intrusion detection systems have employed a variety of methodologies in the past, including information theory²⁵, statistical analysis²⁶, clustering²⁷, and classification^28,29,30. Nonetheless, there exist certain research obstacles, including but not limited to network data size, unavailability of lab-led data and attributes, and adjustment to dynamic network environments³¹. Few studies have used machine learning and produced some encouraging initial findings^32,33. After a thorough literature analysis, it is discovered that machine learning could be effectively used for identifying network breaches^34,35,36, malicious program detection³⁷, and network traffic authentication^38,39.

Zhang et al.⁴⁰ presented an IDS model based on deep belief networks and improved genetic algorithm (IGA). Through multiple GA iterations, the optimum number of hidden layers and neurons in each layer are effectively created in response to different types of attacks, enabling the IDS based on the DBN (Deep Believe Networks) to facilitate an accurate detection rate with an efficient design. The framework and techniques are evaluated and simulated using the NSL-KDD dataset. The classification accuracy achieved is approximately 99%. Dwivedi et al.⁴¹ proposed an approach that combines the Ensemble of Feature Selection (EFS), and the metaheuristic-inspired Adaptive-Grasshopper Optimization Algorithm (AGOA) procedure. They demonstrate how the outcomes can be used to determine different types of attacks. The high-ranked selected group of characteristics is reduced using the EFS approach, and key attributes from the selected dataset are identified using the AGOA technique, which assists in predicting network traffic behavior.

According to Liu et al.’s⁴² discussion, the existing problems of low detection rate of the standard IDS framework and lack of scalability are two significant hurdles. The processes were established where standard IDS could not adapt to the dynamic and multidimensional IoT. Therefore, the authors solve the problems with a powered metaheuristic-based IDS that efficiently utilizes Particle Swarm Optimization using PSO amalgamated with these processes. To recognize and explain harmful data density using PSO, the function of the data is exemplified by the complete and conveyed data; in machine learning, it is labeled as OCSVM. The model proposed in this study is checked using the UNSW-NB15 dataset. The experiments showed the model’s accuracy in resolving various harmful or inconceivable data, particularly those minor examples, such as worms, backdoors, and shellcodes. Mehanović et al.⁴³ proposed an IDS by utilizing a Genetic algorithm for FS. This work aims to minimize the feature selection computation time by adapting the technique to a MapReduce framework suitable for cloud computing. SVM and ANN machine learning techniques were also utilized for classification.

Li et al.⁴⁴ presented to utilize of the concept of Artificial Neural Networks (ANN) for IoT security. The targeted area of the proposed framework is to detect possible anomalous behavior in a medical IoT system. Since the input characteristics of an ANN contribute to how much a technique can derive from it therefore the accuracy is improved. As a result, it could be difficult but crucial to identify the most significant and discriminative aspects in network traffic data. Butterfly Optimization Algorithm (BOA) is proposed for FS that can be useful for the learning process of ANN. The proposed system achieved 93.27% accuracy.

Neto et al.⁴⁵ proposed an exhaustive IoT attack dataset to improve IoT operations security. A topology made up of 105 IoT devices was developed to accomplish this. The CICIoT2023 dataset is the result of research. Verma and Chandra⁴⁶ proposed the RepuTE algorithm, which makes it possible to identify Sybil and DoS attacks. This research utilized the ToN-IoT dataset to evaluate the proposed work. Wang et al.⁴⁷ proposed the Deep Learning-based Bi-LSTM technique, which is a lightweight model for detecting IoT attacks. The study made use of the N-BaIoT, CICIoT2023, and CICIDS2017 datasets. The design of IDSs is the focus of multiple recent research works^48,49,50 with a focus to improve the accuracy of the model. Consequently, the ToN_IoT dataset was also utilized to evaluate the algorithm’s confidence. A hybrid deep learning technique utilizing LSTM and one-dimensional CNN was proposed by Yaras and Murat Dener⁵¹.

Various researchers^52,53 proposed hybrid state of the art IDS techniques for efficient detection of threats with high accuracy. The model’s accuracy, precision, recall, and F1 characteristics were used to assess its performance. Table 1 presents the summary of the existing studies conducted on IDS in which experiments are carried out in various networks. This table summarizes the details with reference in terms of the published year, utilize dataset, proposed algorithm(s) for model training and feature selection, performance limitations of the proposed algorithm(s) and obtained accuracy. In addition to this, Table 2 presents specific gaps the proposed model aimed to fill. It presents the clarity regarding this study’s unique contributions compared to existing work in IDS for IoT.

Table 1 Summary of the existing studies on IDS developed for various networks.

Full size table

Table 2 Contribution of proposed model along with the gaps and limitations of existing studies.

Full size table

This paper presents a metaheuristic-based strategy to improve the performance of the IDS, i.e., the EGWO algorithm. In this research, the suggested model for detecting IoT attacks is evaluated using the NF-ToN-IoT dataset as the desired dataset. This research uses an enhanced GWO-based approach to select the necessary attributes. The grey wolf algorithm searches for prey in three stages, which are modeled after hunting behavior that includes searching, encircling, and attacking. Exploration takes place in the first two stages, and exploitation occurs in the final one. One benefit of the GWO algorithms that is evident in a variety of applications is the decreased number of search parameters. This improves the convergence of the technique and improves the effectiveness of the GWO method. The core contribution of this research comprises the following objectives:

1.
To propose an Enhanced Grey Wolf Optimization (EGWO) technique by modifying the position update process in the standard GWO with respect to the omega wolf.
2.
To design an efficient and responsive intrusion detection system for detecting IoT network attacks by utilizing Random Forest.
3.
To test EGWO by incorporating the NF-ToN-IoT dataset based on true positive, true negative, and accuracy.

The remaining paper comprises of the following sections. Section 2 presents the overview framework architecture of IDS, features selection using Random Forest and Standard GWO Algorithm. Section 3 presents the methodology of the developed EGWO technique for IDS in IoT networks. Section 4 provides the results of the suggested approach. Section 5 presents a detailed conclusion and future recommendations.

Materials and methods

An overview of IDS designed specifically for IoT networks is provided at the beginning of this section. It also describes the theoretical foundation of IDS using the FS approach. In addition, it explains FS techniques for IoT network traffic-based intrusion detection. It also reviews the most popular IDS techniques. Furthermore, an explanation is provided for the chosen metaheuristic-based algorithm utilized in the suggested IDS.

Intrusion detection systems (IDS)

When an attacker exerts to access areas of a network that they are not authorized to access, it is known as interference into computer networks. During this procedure, intruders attempt to compromise network services and determine the best opportunity to breach the system. This criminal conduct is not always carried out by malware; network users may be the object of a hacker’s attention and accidentally give the information requested⁵⁵. Online theft or phishing is a well-known term for this type of attack⁵⁶. As seen in Fig. 1, implementing IDS is an essential element of defending against these types of attacks, which firewalls may reinforce to regulate traffic on IoT networks and identify network vulnerabilities.

The IDS in Fig. 1 analyses the network traffic pattern to identify a virus when the hacker creates abnormal activity in the system. IDS’s purpose is to produce a warning and notify the firewall or network supervisor. Typically, an IDS is positioned adjacent to a firewall, and in the event of an attack, an activation request is triggered. One of the leading security issues with information systems is intrusion and attack⁵⁷. Unsurprisingly, there has been a noticeable increase in the frequency of attacks on online users and networking devices in recent years. As Fig. 2 shows⁵⁸, the percentage of high-risk vulnerabilities found in 2022 that affect network and systems. Considering the prevalence of diverse attacks on networks and servers these days. It is required to put extra efforts into developing improved IDS for a range of network types to detect hackers and minimize potential harm⁵⁹.

IDS based on FS adoption

Feature selection is a technique for extracting a subset of significant features (characteristics) from the entire data set by removing unnecessary and duplicated characteristics to construct a suitable learning approach. Furthermore, this procedure can reduce complexities and computational time^60,61. Despite the evident shared characteristics, reducing dimensionality and FS are not identical. To create a classification model, it determines the best subset of relevant features that corresponds to the original set of features and has the lowest error rates⁴³. Figure 3 shows the IDS mechanism design, which is based on feature selection employing a Random Forest classifier. IoT network traffic is considered as the input (NF-ToN-IoT data set) and attack identification. Additionally, deduction and alerts are the ultimate outputs. Furthermore, three important step’s structure the IDS model. These steps include training, FS, and categorization.

The input was used as the training data set in the first stage to create feature positions and evaluate each position. The positions with the fewest characteristics and the highest accuracy rate of classification were chosen. The ultimate result in this stage was the positions of each network crime and regular traffic using such optimized features. Training is considered the initial stage of intrusion detection when a training data set is used to train a random forest. To accomplish more accurate identification in the next stage, Random Forest builds decision trees based on the categorization of the training data set. There are two categories for the traffic data set: anomalous traffic and regular traffic⁶².

Grey Wolf optimization (GWO) algorithm

The hunting behavior of grey wolves is the source of inspiration for the GWO algorithm. It is a revolutionary metaheuristic-based method that mimics both the natural hunting strategy of grey wolves and the structure of dominance. The four types of grey wolves that GWO employed to determine and replicate the leadership structure are alpha, beta, delta, and omega. Grey wolves tend to prefer living in packs. The group size often varies from 5 to 12 members. They have a very rigid and dominating social ranking system, as shown in Fig. 4. This social hierarchy demonstrates feature selection through the GWO Algorithm. Firstly, alpha (α) is the most appropriate solution for a mathematical model of GWO. As a result, the outcomes in second and third place, respectively, are appropriately represented by beta (β) and delta (δ). The omega (Ω) is another projected outcome. Ω wolves track three wolves, while α, β, and δ direct the hunt efficiency based on the GWO algorithm. The primary stage of the hunt occurs when the grey wolf surrounds its target. To mathematically simulate the surrounding behavior, Eqs. (1)–(4) are utilized⁶³. Table 3 presents the symbolic description used in the entire research work.

Table 3 Description of used symbols.

Full size table

D = |C. X_p(t) - X_p(t) | (1)

X(t + 1) = X_p(t) - A.D (2)

Where, “X_p” is the prey’s positioning, “A” and “C” are constants, “t” is the current iteration, and “X” denotes the position of a particular grey wolf in the search space⁶⁴.

A = 2.a.r₁ - a (3)

C = 2. r₂ (4)

Where r₁ and r₂ are the constants. Omega wolves adjust their positions based on positions α, β, and δ in every iteration since α, β, and δ are far more probable to know where prey is located. These adjustments are mathematically shown in Eqs. 5–11.

D_α = |C₁.X_α(t)-X(t)| (5)

D_β= |C₂.X_β(t)-X(t)| (6)

D_δ = |C₃.X_δ(t)- X(t)| (7)

X₁(t + 1) = X_α(t)- A₁. D_α (8)

X₂(t + 1) = X_β(t) - A₂. D_β (9)

X₃(t + 1) = X_δ(t)- A₃. D_δ (10)

X(t + 1) = (X₁ + X₂ + X₃)/3 (11)

The grey wolves figure out their hunt by attempting the target when it ends moving. To represent this mathematically, one can assume that “a*A” is a random variable in the interval [-2a, 2a], where “a” decreases from 2 to 0 over the span of several repetitions. The wolves are forced to assault the victim by |A|<1 (i.e. exploitation). While exploring for prey, the grey wolf pack is forced to stray from their target in the expectation of finding a more suitable meal (exploitation) because of |A|>1. Where, “C” is another element of GWO that encourages investigation. It has a random value in the interval [0, 2]. C > 1 highlights the attack, whereas C < 1 rejects it. In general, (EGWO) has accompanied promising performance, especially in terms of optimization problems. Furthermore, it is relatively easy to implement and apply in various sectors, including engineering, computer science, and mathematics.

Methodology of the proposed IDS Model

A description of the suggested approach’s methodology is given in this section. Figure 5 depicts the overview of the proposed framework. The proposed framework’s steps are covered in further detail in the subsequent sections. The proposed IDS for IoT networks aim to prevent crimes and increase the convenience of on-demand security measures. To identify malicious threats in the network traffic of the IoT network and in all other IoT system nodes, the suggested detection method developed a metaheuristic-inspired technique. The proposed technique improved the standard GWO algorithm. To analyze the suggested EGWO method, the NF-ToN-IoT dataset has been employed. During the data preprocessing stage of IDS, the suggested algorithm is used for testing and training with different values.

Step 1: Preparation of NF-ToN-IoT dataset

This step is crucial to the technique used in data analysis and artificial intelligence. Processing and presenting the data in the right way is the responsibility of data preparation. Particularly when the data includes a variety of information variables and types. The following steps constitute this step.

Preprocessing the NF-ToN-IoT dataset

The key tasks to do before providing data into EGWO are filtering and preprocessing to get optimal performance with improved accuracy. Missing data, categorical features, and class imbalance are among some of the many issues with the used dataset. Irrelevant characteristics could affect how well the proposed EGWO algorithm executes and performs. The present research employed various preprocessing techniques through permutations of multiple strategies for preprocessing and normalizing data.

Substitution of missing values:

Recommendations have been provided for oversampling, under-sampling, and hybrid approaches, along with alternatives to the imbalanced scenario. Replicating the minority class values is known as oversampling. It has been utilized by several scholars^62,65. The disadvantage of this method is that it overfits these values. Some adopt under-sampling, which lowers the score of the leading class. The In the NF-ToN-IoT dataset, missing values frequently occur. It is necessary to handle these missing values appropriately. The value that occurs most often in each feature with missing data is used in the suggested model to replace the value that is missing. Utilizing mean value, a second method involved imputed statistical characteristics.

Numeric conversion of categorical variables: The NF-ToN-IoT dataset contains a variety of categorical attributes. The category attributes need to be given numerical values. To perform this, one-hot encoding has been utilized.
Class-imbalance: For balancing the classes in the applied dataset, the Synthetic Minority Oversampling Technique (SMOTE) has been utilized. The distributions with class imbalances afflict the ToN-IoT dataset. issue with this approach is that it might only be possible to adequately represent the class accurately by including some of the features that have been discarded. A hybrid approach is employed, which removes some majority class points and duplicates minority class points. By offering synthetic minority class instances, SMOTE improves basic oversampling at random by resolving the overfitting issue that could result from this method. Instead of copying already available data points, SMOTE produces new data instances. To create additional minority data points, two comparable minority instances are combined linearly⁶⁶.
Remove irrelevant data: Since they could lead to overfitting, several characteristics, including time stamps, IP addresses, source-port, and destination-port, are also eliminated from the NF-ToN-IoT dataset.

Step 2: feature selection

The most effective subset of features is selected through the feature selection step once the dataset has been prepared. The GWO technique is modified for this research to choose the best possible set of features. Step 2 is explained in greater clarity in the following subsections:

Subset generation: preprocessing the NF-ToN-IoT dataset

Subset generation is a heuristic search strategy where each sample in the search space specifies a potential subset assessment alternative. A subset of characteristics is created for this investigation using the random subset creation technique⁶⁷. The solution was generated using Eq. 12 during the initialization process.

P_{(i, j)} = p_{(i, j)}^min+ ∆_{(i, j)}. (p_{(i, j)}^max- p_{(i, j)}^min) (12)

where P_{(i, j)} represents the dimension of the matrix created during the initialization stage, p_{(i, j)}^max and p_{(i, j)}^min denote the matrix’s lower and upper limits, and the values of the parameters i and j range from 1 to N and 1 to D, respectively. where D denotes the solution’s dimensionality in the matrix and N denotes the number of alternative responses.

Enhanced Grey Wolf Optimization (EGWO) algorithm

To decrease the impact of the best strategies, the wolves in EGWO continually shift positions in accordance with the four best options i.e. X_α, X_β, X_δ and X_Ω. EGWO took advantage of the split method to move the grey wolves to a specific location in the search space based on X_α, X_β, X_δ and X_Ω in EGWO for carrying out FS. The flow chart for the suggested EGWO algorithm with FS is shown in Fig. 6.

According to Eq. (13), a straightforward technique of random probability distribution split has been utilized per dimension to split Xα, Xβ and Xδ outcomes in case of standard GWO.

$$\:\mathbf{X}\text{d}=\left\{\:\begin{array}{c}X(\alpha\:,d),\:\:if\:rand<1/3\\\:X(\beta\:,d),\:\:if\frac{1}{3}\le\:rand<2/3\\\:X(\delta\:,d),\:\:otherwise\end{array}\right.$$

(13)

Where, “d” represents dimension. For FS problem, the EGWO is unrestricted to the standard GWO. But in EGWO, the split strategy was used to determine the subsequent position by comparing i.e. X_α, X_β, X_δ and X_Ω. By including the omega wolf to assist in shifting the positions of the grey wolves, it is inspired from the unique version of standard GWO. The effect rate of the decision made by any wolf decreased from 0.25 to 0.17 because of an attempt to lower the impact rate of any of the most effective options by increasing the number of solutions involved in the decision-making process. The details of this are given in Eqs. (14)–(20).

X_i^t+1= split (x₁, x₂, x₃, x₄) (14)

Where, x₁, x₂, x₃, and x₄ correspond to the implications of the wolf move on the best four candidate solutions that are alpha, beta, delta, and omega, respectively. The mathematical representation of x₁, x₂, x₃, and x₄ is shown in Eqs. (15–18).

$$\:{x}_{1}=\left\{\:\begin{array}{c}\:\:1,\:\:if\:\left(\mathbf{X}(\alpha\:,d)+\text{s}\text{t}\text{e}\text{p}\text{b}(\alpha\:,d)\right)\ge\:1\:\:\\\:0,\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:otherwise\end{array}\right.$$

(15)

$$\:{x}_{2}=\left\{\:\begin{array}{c}1,\:if\:\left(\mathbf{X}(\beta\:,d)+\text{s}\text{t}\text{e}\text{p}\text{b}(\beta\:,d)\right)\ge\:1\\\:0,\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:otherwise\end{array}\right.\:$$

(16)

$$\:{x}_{3}=\left\{\:\begin{array}{c}1,\:\:if\:\left(\mathbf{X}({\updelta\:},d)+\text{s}\text{t}\text{e}\text{p}\text{b}({\updelta\:},d)\right)\ge\:1\\\:0,\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:otherwise\end{array}\right.\:$$

(17)

$$\:{x}_{4}=\left\{\:\begin{array}{c}1,\:\:if\:\left(\mathbf{X}({\Omega\:},d)+\text{s}\text{t}\text{e}\text{p}\text{b}({\Omega\:},d)\right)\ge\:1\\\:0,\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:otherwise\end{array}\right.\:$$

(18)

Where, “b” is constant; its value varies between 0 and 2, “Stepb(x), d” represents binary steps in dimension “d” and rand ∈ [0,1]. The mathematical representation of Stepb(x), d is shown in Eq. 20.

$$\:\text{S}\text{t}\text{e}\text{p}\text{b}\left(\text{x}\right),\text{d}\:=\left\{\:\begin{array}{c}1,\:\:\:\:\:\:\:\:\:\:\:if\:s\left(\text{x}\right),d\ge\:rand\\\:0,\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:otherwise\end{array}\right.$$

(19)

Where, s$\:\left(\text{x}\right),\text{d}$ is mathematically represented as a sigmoidal function, as shown in Eq. 21.

$$\:s\left(x\right)=\frac{1}{1{+e}^{-x}}\:$$

(20)

To produce the crossover w₁, w₂, w₃, and w₄ improvements, a straightforward technique of random probability distribution crossover has been employed per dimension, as shown in Eq. (21).

$$\:\text{X}d=\left\{\:\begin{array}{c}X\left(\alpha\:,d\right),\:\:\:if\:rand<1/4\\\:X(\beta\:,d),\:\:if\frac{1}{4}\le\:rand<2/4\\\:X(\delta\:,d),\:\:if\frac{2}{4}\le\:rand<3/4\\\:X(\varOmega\:,d),\:\:otherwise\end{array}\right.$$

(21)

Pseudocode of EGWO
Step 1: Parameter Initialization 1.1. Maximum iterations ‘N’ 1.2. Number of wolves in pack ‘w_i’ 1.3. Dimensions in the search space ‘d’ Step 2: Calculate the fitness of each grey wolf
Step 3: Identify the best four wolves (X_α, X_β, X_δ and X_Ω) based on fitness value
Step 4: While (i < N) For each wolf “w_i” in the pack do: For each dimension “d” do: Generate a random probability range 0–1 Utilize Split method for updating position (by using Eq. 13) If rand < 1/3: X(d) = X(α,d) Else If 1/3 ≤ rand < 2/3: X(d) = X(β,d) Else: X(d) = X(δ,d) # Crossover with 4 wolves (Eq. 21) If rand < 1/4: X(d) = X(α,d) Else If 1/4 ≤ rand < 2/4: X(d) = X(β,d) Else If 2/4 ≤ rand < 3/4: X(d) = X(δ,d) Else: X(d) = X(Ω,d) Update position using Eq. 14: X_i^t+1= split (x₁, x₂, x₃, x₄) Calculate impact using conditions for x1, x2, x3, x4 by using Eqs. 15–18 Apply crossover Adaptive parameter tuning Calculate new position and fitness values Update X_α, X_β, X_δ and X_Ω
Step 5: Return the best solution “X_α”

Instead of considering the three wolves’ positions, the suggested algorithm showed an evolving solution iteration (next position change) process based on the positions of the four best wolves. Additionally, the EGWO used a different decision impact rate computation, which had a significant effect on the search performance of the algorithm. Ultimately, even though the suggested technique’s new position update method increased its processing time in general, it did so at the expense of higher algorithmic performance. The impact of all leader wolves is reduced from 0.25 to 0.17 by incorporating the omega wolf in updating the position.

This results in a more balanced influence across the search agents, preventing early convergence to suboptimal solutions and allowing for better exploration of the solution space. Additionally, the probabilistic split decision for position updates reduces the risk of over-relying on the current best solutions, thus maintaining diversity in the search space. This theoretical adjustment enhances the exploration capabilities of the algorithm, making it less likely to get trapped in local optima early in the process.

Improved Exploration and Exploitation

The Enhanced Grey Wolf Optimization (EGWO) algorithm modified the omega wolf position update which affects the trade-off between exploration and exploitation behavior, so this cause enhancement of accuracy detection for Intrusion Detection System (IDS). In EGWO, the level of omega wolf was adjusted in respect to alpha, beta and delta wolves so that it reduces the decision-making impact of any single wolf since then number of multiple wolves is involved. In the calculation this is represented algorithmically with a split strategy and an additional omega wolf’s position. The traditional GWO employs a pack of three leader wolves (alpha, beta and delta) to direct the exploration-exploitation process. This is achieved by EGWO that introduces the omega wolf, which modifies Eq. 3 for position update equations and reduces the effect of top wolves from 0.25 to 0.17 as: This change is then reflected in the mathematics of equations that share results between those four wolves, allowing them to explore a wider space during search and reduce the chance of premature convergence on suboptimal solutions. The net effect of this change is a more able algorithm to explore various promising solutions. These changes improve the tradeoff between exploration and exploitation as follows:

Better exploration

Using four wolves to search instead of three increases the width of spacing, hence reducing chances that we get trapped in a local minimum and help reach more optimal solutions early.

Balanced convergence

Reduced impact of the individual wolf which allows solution improvements throughout exploitation. The algorithm can afford more balanced convergence (i.e. avoid premature convergence) towards global optimum.

That has a huge direct effect on the accuracy of the IDS. This can be attributed to the fact that, EGWO-based feature selection method selects better subset of features and accordingly Random Forest classifier performs best. The convergence performance of the EGWO algorithm demonstrates optimal convergence than that of standard GWO which implies improved classification accuracy as illustrated in this research.

Improved scalability

A new perspective introduced on the EGWO algorithm and coupled with verifying scalability properties in detecting intrusions over IoT networks by utilizing a Random Forest-based mechanism. Moreover, the proposed EGWO algorithm using Random Forest based approach for intrusion detection in IoT networks has demonstrated good scalability behaviour as well. Convergence graphs of the EGWO algorithm show that this method gives optimal solutions in a shorter time compared to traditional GWO, moreover when we consider it after 10th iteration. So, this faster convergence will help the algorithm to manage a greater number of input data points without significantly increasing processing time.

Figure 7 presents the exploration and exploitation graph over iterations of EGWO algorithm. This graph presents the effectiveness of the proposed algorithm by representing the balance between the exploration and exploitation normalized values. It shows how exploration values start decreasing after the exploitation value increased, making sure the balance in this search process while training the proposed model with iterations. This balance is fundamental to obtain improved performance in feature selection process of IDS and allows the EGWO algorithm to converge effectively on optimized solutions.

Evaluation of the generated subset

The objective function is the fundamental part of EGWO and is used for evaluating the fact that the generated subset satisfies the objectives. Researchers have utilized both accuracy and feature size as significant metrics to assess each feature subset and determine the fitness function. The efficiency of IDSs is determined by their ability to identify attacks and detect them accurately. A critical component in identifying breaches in IoT network security is the accuracy of classification, along with the quantity of features. The model makes use of the fitness function shown in Eq. 22:

$$\:Objective\:Function={C}_{1}.\:A+\frac{{C}_{2}}{\text{N}\text{F}}$$

(22)

Where “A” is Accuracy and “NF” is a number of features. There is a list of features in each feature subset. In a scenario when two subsets with differing numbers of characteristics meet the same Accuracy, the subset with fewer features will be chosen. Furthermore, the values of C₁ and C₂ in Eq. (22) can be independently determined. To do this, multiplying C₁ by A and C₂ by the inverse NF is the prerequisite for completion. In addition, accuracy is the primary consideration when it comes to the quantity of selected characteristics, and as a result, C₁ is given more weight than C₂. Thus, under all circumstances, C₁ should be greater than C₂, ensuring that the priority of accuracy. The performance of IDS is dependent on the proposed objective function. This balanced selection of the fitness function enables EGWO algorithm to dynamically choose those features which can help in accurate classification, yet diminishing risk on over fitting.

Step 3: IDS Stage

The selected features from Stage 2 are evaluated in this step using the Random Forest (RF) algorithm. RF is a machine-learning approach that combines random branch splitting and random node resampling to create decision trees. The various trees cast their votes to determine the ultimate categorization outcome⁶⁸. The RF is an effective collective classifier that trains several decision trees by combining feature randomness with bagging. One standard classification method is the decision tree. To categorize the data, it attempts to learn a series of if-then conditions. RF algorithm in the proposed work handle the inherent randomness in decision tree construction, by using Random Initialization Control, Bootstrap Aggregating and Balanced Class Weights mechanisms to ensure consistent model performance across different iterations and data subsets. The constant parameter of “random_state” ensures the generation of consistent decision trees when the model reruns with same parameters and datasets. Additionally, bootstrapping mechanism keeps track of the diverse decision boundaries to deal with randomness. The parameters of “class_weight” are used to deal with the unbalanced IoT datasets.

It is evident from Fig. 5 that the classifier will receive the output of stage 2. Furthermore, the dataset was divided into two subsets: a testing set (P_test, Q_test) and a training set (P_train, Q_train). In the training set, Q_test denotes the class label, while P_train represents the features. Whereas the features in the testing set constitute P_test, the class label for the features in the testing set is included in Q_test. P_train and Q_test will be used to train the machine learning RF classifier, and P_test will subsequently be utilized as input into the model. P₁, Q₂, and Q stand for the feature set and the class label, respectively. Subsequently, the model’s output will be investigated against the Q_test; if the two match, the classifier accurately identified the dataset record’s behavior.

Overfitting mitigation strategies

To make the proposed model more robust and generalizable a number of strategies are applied to avoid overfitting. Complex models can easily go through overfitting issues; it becomes even more relevant when fitting high dimensional data such as NF-ToN-IoT. The following are the methods, used to avoid Overfitting and validate the proposed model:

Cross-validation

k-fold cross-validation has chosen to validate the model over different splits of data, hence making sure improved generalization and saves from training cycle bias (if any). This methodology is a better guarantee against the performance of models for unknown data because it calculates various training-validation partitions so that each model will be trained and validated several times.

Regularization

Random Forest algorithm resistant to overfitting due ensemble averaging process but regularization was used to keep a controlled model complexity. It puts a bound on the depth of trees and specifies the minimum samples splits as well as other conditions in leaf level. These parameters helped in limiting the size of individual trees, reduce overfitting (learns noise) fit from training data.

Random Forest’s Ensemble Approach

The ensemble method of the Random Forest algorithm makes itself a reliable to prevent from overfitting. Normal Random Forests reduce variance by building multiple decision trees on random samples and averaging their forecasts.

Comparative analysis based on performance

To determine the extent of the improvements that these overfitting mitigation measures have attained, a comparative analysis of the different model performance measures including accuracy, precision, recall and F1 scores has established with and without the mitigation strategies. The results in Table 4 presented show that through the application of the proposed strategies, the model robustness and stability were improved, and the model was more balanced in terms of complexity and generalizability.

Table 4 Values of the parameters used in Random Forest Algorithm.

Full size table

Results

The evaluation characteristics and test findings are provided in this section. The IDS dataset of NF-ToN-IoT has been used for the performance evaluation of the proposed technique. The main types of targeted attacks used for both 80% training and 20% testing datasets are shown in Fig. 8. The proposed parameters for simulations by using the NF-ToN-IoT dataset are shown in Table 5.

Table 5 Parameters values for simulations.

Full size table

Using the Jupyter Notebook platform, Pandas, which offers the ability to write in Python on Numfocus, was utilized in the research. The model was trained and tested employing the following configuration on a computer:

11th Gen Intel(R) Core (TM) i3-1115G4 @ 3.00 GHz–2.90 GHz.
8 GB RAM.
64-bit operating system, x64-based processor.
Windows 11 Home Single Language.
256 GB SSD.

EGWO selected 23 features from the NF-ToN-IoT dataset randomly. The selected features are shown in Table 6. In Table 6, ‘ID,’ ‘SF,’ and ‘DT’ represent the feature’s id, selected feature, and the data type of the selected feature, respectively. Figure 9 demonstrates the improved convergence of the proposed EGWO algorithm while selecting features over 20 iterations. It shows the superiority of EGWO over the standard GWO algorithm by plotting the fitness values against 20 iterations. The lower values of the objective function’s fitness represent better results, i.e., optimized fitness. EGWO starts converging more quickly after the 10th iteration. On the contrary, the standard GWO algorithm’s convergence is slower.

Table 6 Selected features by utilizing EGWO Algorithm.

Full size table

The feature engineering process to concentrate on identification of attributes that directly affect intrusion detection classification performance in IoT networks. The algorithm selected some features (F₀-F₂₂) as shown in Table 6 because they strongly described singular data which indicated possible intrusions. The selected features by the improved model increase the predictability of IDS by helping differentiate between legitimate and intrusive network activity, reducing false positives with higher accuracy.

The relationship between the accuracy score achieved during the FS process, benefiting from the EGWO algorithm and fitness values, is shown in Fig. 10. The graphical illustration shows how the IDS accuracy is affected by the optimization method. The curve in Fig. 10 shows that while the optimization proceeds and the value of the fitness converges, i.e., reduces, the accuracy score becomes much better. As the objective function’s value gets closer to its lowest, the accuracy score eventually stabilizes at 99.93%, proving that the EGWO algorithm has successfully found the optimal feature selection to maximize the IDS accuracy.

After optimized feature selection by utilizing EGWO, the RF algorithm has been developed using the scikit-learn package. This classification algorithm is used for evaluating the performance of the proposed IDS with optimized feature selection. Table 4 represents the parameter setting for the Random Forest Algorithm. Accuracy, precision, recall and F1 score are the evaluation measures used to evaluate the effectiveness of the classification algorithm. These measures are computed by using the four parameters named: true-positive (T_P), false-negative (F_N), false-positive (F_P), and true-negative (T_N). The number of events that have been accurately classified as normal is known as TP. The FN is the number of events that incorrectly identify normal data as an intrusion, whereas the FP is the number of malicious occurrences that are incorrectly identified as normal. The number of events that are accurately identified as malicious is represented by TN. Equations (23)–(26) have been employed to generate each of these performance evaluation measures. Table 7 presents performance metrics with and without Mitigation strategies. The results of the performance metrics of the classification RF algorithm are presented in Table 8.

$$\:Accuracy=\frac{{T}_{p}+{T}_{N}}{{T}_{p}+{T}_{N}+{F}_{p}+{F}_{N}}$$

(23)

$$\:Precision=\frac{{T}_{p}}{{T}_{p}+{F}_{p}}$$

(24)

$$\:Recall=\frac{{T}_{p}}{{T}_{p}+{F}_{N}}$$

(25)

$$\:F1\:Score=\frac{2*Precision*Recall}{Precision*Recall}$$

(26)

Table 7 Performance metrics with and without mitigation strategies.

Full size table

A high accuracy in IDS means the model is successfully identifying all attacks (intrusions) and normal behaviour. But in case if the data is imbalanced, while using accuracy as the proposed IDS metric would be surprisingly misleading, because it will bias heavily to the majority class. Another metric needs to be consider with IDS is Precision, which determines how reliable the model can distinguish false alarms. The higher the precision value, fewer flagged events are legitimate intrusions but also there is less likelihood of false positives. A trade-off also exists between precision and recall. To minimise false positives, a model that prioritizes accuracy can ignore some threats. In addition, Recall is important in an IDS to make sure that all or most known intrusions are detected. This approach maximizes recall i.e. misses few malicious activities. By achieving a higher recall, the model will be more aggressive in detection what leads to accumulating false positives. IDS needs a balanced F1-score as the true detection of intrusions at minimal false alarm indicates good trade-off. It is extremely helpful with imbalanced dataset. Rather it prevents the bias and count every precision with an equal weight as recall, which prevents IDS system to be improved in detection but on false alarm rate.

Table 8 Performance results IDS developed by using EGWO and RF algorithms.

Full size table

A comparison of various existing studies that are conducted by utilizing the ToN-IoT dataset to the proposed IDS is presented in Table 9. This comparison is presented based on the algorithm used for Feature selection, a technique used for intrusion detection, the number of selected features, and the obtained accuracy.

Table 9 Comparison of existing studies (utilizing ToN-IoT dataset) based on accuracy.

Full size table

Formal and Informal security analyses are also performed. For formal security analysis ProVerif tool is used. This tool facilitates verifying the confidentiality and authenticity properties. Table 10 contains the generated output from formal analyses. Furthermore, Table 11 presents the results of informal analysis. These results are summarized in the table illustrating the detection of the IDS model against security threats.

Table 10 Output of formal analysis by using ProVerif tool.

Full size table

Table 11 Output of informal analysis.

Full size table

The results in Table 10 show that the confidentiality of key and data in Query 1 has been successfully achieved. In addition, Query 2 found potential lack in data confidentiality and Query 3 ensure high authenticity of the protocol.

Interpretability of recommendations

For an IDS to be practical and useful in exercising practical real-life applications, there is need for understanding the model used and reasons for arriving at given decisions. In this section, different methods regarding model interpretability are described to enhance the understanding of practitioners studying the model’s decision paths and making its suggestions more comprehensible.

Feature selection analysis

Random Forest used in the proposed model provides the results of selected feature importance. The degree of the feature’s contribution to IDS model decisions. In Table 4, F₀-F₂₂ are the selected features that have relatively high importance because they directly impact the possibility of distinguishing intrusive network activity from the non-intrusive one. This way, based on this ranking of features, the network administrators get to know this list of attributes together with the threats that they identify for the purpose of monitoring and defense enhancements.

Decision path analysis

Random Forest state includes few decision trees where every tree is offering the path of decision depending on the features of the threshold values. From such paths, one can distinguish how decisions lead to certain classifications and, hence making comparisons.

Recommendation interpretations for actionable insights

The proposed IDS is not only able to detect threats but also offers recommendations based on the features analysis of situations. For instance, when the IDS alerts high tcp_win_max_out, and sign of anomalous activity, the IDS suggests capping TCP window sizes for malicious IPs. The recommendations provided in this paper are based on the IDS analysis of its most valuable functions, so they pertain to the identified threats directly.

Implementing all these techniques makes the IDS model an effective tool in that it provides understandable and constructive instructions in help with the proactive securing of the network. These interpretability elements enable users to believe, and act based on the IDS’s results that further increases operational value and applicability in diverse IoT environments.

Conclusion and future work

The purpose of this research is to evaluate how well an enhanced GWO algorithm can boost the IDS’s performance. This investigation employs the suggested EGWO technique to select an ideal subset of features with high accuracy in classification using the FS procedure. Furthermore, 20% of the NF-ToN-IoT dataset is utilized to evaluate the effectiveness of the suggested methodology. This study’s analysis method is predicated on data separation. This method is essential for assessing the effectiveness of the IDS system and demonstrating its performance by applying it to the test against different types of IoT network challenges. Additionally, research on the suggested method produced high classification accuracy using an ideal subset of data under various attack scenarios. Furthermore, the proposed approach’s accuracy has been verified by comparison with more contemporary methodologies, demonstrating superior accuracy. Despite achieving high accuracy the proposed approach may degrade in case of detections while the environment requires high scalability. Furthermore, the proposed EGWO algorithm introduces some computational complexities and overhead issues; mainly because of the alterations made over the standard GWO. The randomness in EGWO is designed via probabilistic split distribution in all dimensions, with an intention of enhancing exploration capabilities and avoiding premature convergence with inferior solutions. Nevertheless, the random characteristic generated by this approach implies extra position reconsiderations within the search space, which can increase computational complexity in case of highly scalable environment. Even though EGWO optimizes its scalability and convergence rate, the dependence on more features or higher data dimensionality in each iteration will lead to higher memory use and longer processing time.

In future work, the proposed metaheuristic-based technique of EGWO can be further improved to a more scalable solution by integrating it with the Particle Swarm Optimization Algorithm. This integration will facilitate a more optimal selection of features in terms of achieving improved convergence with scalability. The integration will support the feature selection granularity by utilizing PSO’s velocity and position selection mechanism. Additionally, at the same time EGWO’s adaptive learning mechanism improves precision.

Data availability

All the data is available within the document. No external data is used for findings.

References

Mohanty, J. et al. IoT security, challenges, and solutions: a review. Progress in Advanced Computing and Intelligent Engineering: Proceedings of ICACIE 2019, Volume 2, : pp. 493–504. (2021).
Yaacoub, J. P. A. et al. Advanced digital forensics and anti-digital forensics for IoT systems: techniques, limitations and recommendations. Internet Things. 19, 100544 (2022).
Article MATH Google Scholar
Ahmad, R. et al. Zero-day attack detection: a systematic literature review. Artif. Intell. Rev. 56 (10), 10733–10811 (2023).
Article MATH Google Scholar
Elrawy, M. F., Awad, A. I. & Hamed, H. F. Intrusion detection systems for IoT-based smart environments: a survey. J. Cloud Comput. 7 (1), 1–20 (2018).
Article MATH Google Scholar
Martins, I. et al. Host-based IDS: a review and open issues of an anomaly detection system in IoT. Future Generation Comput. Syst. 133, 95–113 (2022).
Article MATH Google Scholar
Bivens, A. et al. Network-based intrusion detection using neural networks. Intell. Eng. Syst. Through Artif. Neural Networks. 12 (1), 579–584 (2002).
MATH Google Scholar
Samrin, R. & Vasumathi, D. Review on anomaly based network intrusion detection system. in 2017 international conference on electrical, electronics, communication, computer, and optimization techniques (ICEECCOT). IEEE. (2017).
Almaraz-Rivera, J. G., Perez-Diaz, J. A. & Cantoral-Ceballos, J. A. Transport and application layer DDoS attacks detection to IoT devices by using machine learning and deep learning models. Sensors 22 (9), 3367 (2022).
Article ADS PubMed PubMed Central MATH Google Scholar
Guigou, F., Collet, P. & Parrend, P. SCHEDA: lightweight euclidean-like heuristics for anomaly detection in periodic time series. Appl. Soft Comput. 82, 105594 (2019).
Article Google Scholar
Salman, T. & Jain, R. Networking protocols and standards for internet of things. Internet of things and data analytics handbook, : pp. 215–238. (2017).
Selvarajan, S. et al. Integrated probability relevancy classification (IPRC) for IDS in SCADA’, design framework for wireless network. Lect Notes Netw. Syst. 82 (1), 41–64 (2019).
Google Scholar
Halimaa, A. & Sundarakantham, K. Machine learning based intrusion detection system. in 2019 3rd International conference on trends in electronics and informatics (ICOEI). IEEE. (2019).
Nimbalkar, P. & Kshirsagar, D. Feature selection for intrusion detection system in internet-of-things (IoT). ICT Express. 7 (2), 177–181 (2021).
Article MATH Google Scholar
Parimala, G. & Kayalvizhi, R. An effective intrusion detection system for securing IoT using feature selection and deep learning. in 2021 international conference on computer communication and informatics (ICCCI). IEEE. (2021).
Syed, D. et al. A comparative analysis of metaheuristic techniques for high availability systems (September 2023). IEEE Access 12, 7382–7398, (2024). 10.1109/ACCESS.2024.3352078
Syed, D., Shaikh, G. M. & Rizvi, S. Systematic review: particle swarm optimization (PSO) based load balancing for Cloud Computing. Sir Syed Univ. Res. J. Eng. Technol. 14 (1), 86–94 (2024).
Article MATH Google Scholar
Rabie, O. B. J. et al. A novel IoT intrusion detection framework using decisive red Fox optimization and descriptive back propagated radial basis function models. Sci. Rep. 14 (1), 386 (2024).
Article ADS PubMed PubMed Central MATH CAS Google Scholar
Elmagzoub, M. et al. A survey of swarm intelligence based load balancing techniques in cloud computing environment. Electronics 10 (21), 2718 (2021).
Article MATH Google Scholar
Al Reshan, M. S. et al. A fast converging and globally optimized approach for load balancing in cloud computing. IEEE Access. 11, 11390–11404 (2023).
Article MATH Google Scholar
Syed, D., Muhammad, G. & Rizvi, S. Systematic Review: Load Balancing in Cloud Computing by Using Metaheuristic Based Dynamic Algorithms39 (Intelligent Automation & Soft Computing, 2024). 3.
Halim, Z. et al. An effective genetic algorithm-based feature selection method for intrusion detection systems. Computers Secur. 110, 102448 (2021).
Article MATH Google Scholar
Prashanth, S. et al. Optimal feature selection based on evolutionary algorithm for intrusion detection. SN Comput. Sci. 3 (6), 439 (2022).
Article MATH Google Scholar
Almomani, A. et al. Metaheuristic algorithms-based feature selection approach for intrusion detection, in Machine Learning for Computer and Cyber Security. CRC. 184–208. (2019).
Acharya, N. & Singh, S. An IWD-based feature selection method for intrusion detection system. Soft. Comput. 22, 4407–4416 (2018).
Article MATH Google Scholar
Jain, A., Verma, B. & Rana, J. Anomaly intrusion detection techniques: a brief review. Int. J. Sci. Eng. Res. 5 (7), 1372–1383 (2014).
MATH Google Scholar
Jose, S. et al. A survey on anomaly based host intrusion detection system. in Journal of Physics: Conference Series. IOP Publishing. (2018).
Pattawaro, A. & Polprasert, C. Anomaly-based network intrusion detection system through feature selection and hybrid machine learning technique. in 2018 16th International Conference on ICT and Knowledge Engineering (ICT&KE). IEEE. (2018).
Pandey, S. K. Design and performance analysis of various feature selection methods for anomaly-based techniques in intrusion detection system. Secur. Priv. 2 (1), e56 (2019).
Article MATH Google Scholar
Ferrag, M. A. et al. Deep learning-based intrusion detection for distributed denial of service attack in agriculture 4.0. Electronics 10 (11), 1257 (2021).
Article MATH Google Scholar
Kumar, P. et al. Sad-iot: security analysis of ddos attacks in iot networks. Wireless Pers. Commun. 122 (1), 87–108 (2022).
Article ADS MATH Google Scholar
Shitharth, S. et al. Intelligent Intrusion Detection Algorithm Based on multi-attack for edge-assisted Internet of Things, in Security and Risk Analysis for Intelligent Edge Computingp. 119–135 (Springer, 2023).
Hajisalem, V. & Babaie, S. A hybrid intrusion detection system based on ABC-AFS algorithm for misuse and anomaly detection. Comput. Netw. 136, 37–50 (2018).
Article MATH Google Scholar
Kumar, P., Gupta, G. P. & Tripathi, R. Toward design of an intelligent cyber attack detection system using hybrid feature reduced approach for iot networks. Arab. J. Sci. Eng. 46 (4), 3749–3778 (2021).
Article MATH Google Scholar
Fiore, U. et al. Network anomaly detection with the restricted Boltzmann machine. Neurocomputing 122, 13–23 (2013).
Article MATH Google Scholar
Alom, M. Z. & Bontupalli, V. and T.M. Taha. Intrusion detection using deep belief networks. in 2015 National Aerospace and Electronics Conference (NAECON). IEEE. (2015).
Verma, A. & Ranga, V. Machine learning based intrusion detection systems for IoT applications. Wireless Pers. Commun. 111 (4), 2287–2310 (2020).
Article MATH Google Scholar
Liu, Z. et al. Anomaly detection on iot network intrusion using machine learning. in 2020 International conference on artificial intelligence, big data, computing and data communication systems (icABCD). IEEE. (2020).
Moustafa, N., Turnbull, B. & Choo, K. K. R. An ensemble intrusion detection technique based on proposed statistical flow features for protecting network traffic of internet of things. IEEE Internet Things J. 6 (3), 4815–4830 (2018).
Article MATH Google Scholar
Shitharth, S. et al. IDS Detection Based on Optimization Based on WI-CS and GNN Algorithm in SCADA Network. In: Das, S.K., Samanta, S., Dey, N., Patel, B.S., Hassanien, A.E. (eds) Architectural Wireless Networks Solutions and Security Issues. Lecture Notes in Networks and Systems, vol 196, (2021). Springer, Singapore. https://doi.org/10.1007/978-981-16-0386-0_14.
Zhang, Y., Li, P. & Wang, X. Intrusion detection for IoT based on improved genetic algorithm and deep belief network. IEEE Access. 7, 31711–31722 (2019).
Article Google Scholar
Dwivedi, S. et al. Implementation of adaptive scheme in evolutionary technique for anomaly-based intrusion detection. Evol. Intel. 13 (1), 103–117 (2020).
Article MATH Google Scholar
Liu, J. et al. Research on intrusion detection based on particle swarm optimization in IoT. IEEE Access. 9, 38254–38268 (2021).
Article Google Scholar
Mehanović, D. et al. Feature selection using cloud-based parallel genetic algorithm for intrusion detection data classification. Neural Comput. Appl. 33, 11861–11873 (2021).
Article Google Scholar
Li, Y., Ghoreishi, S. & Issakhov, A. Improving the accuracy of network intrusion detection system in medical IoT systems through butterfly optimization algorithm. Wireless Pers. Commun. 126 (3), 1999–2017 (2022).
Article MATH Google Scholar
Neto, E. C. P. et al. CICIoT2023: a real-time dataset and benchmark for large-scale attacks in IoT environment. Sensors 23 (13), 5941 (2023).
Article ADS PubMed PubMed Central MATH Google Scholar
Verma, R. & Chandra, S. RepuTE: a soft voting ensemble learning framework for reputation-based attack detection in fog-IoT milieu. Eng. Appl. Artif. Intell. 118, 105670 (2023).
Article MATH Google Scholar
Wang, Z. et al. A lightweight intrusion detection method for IoT based on deep learning and dynamic quantization. PeerJ Comput. Sci. 9, e1569 (2023).
Article PubMed PubMed Central Google Scholar
Qu, Z. et al. Localization of dummy data injection attacks in power systems considering incomplete topological information: a spatio-temporal graph wavelet convolutional neural network approach. Appl. Energy. 360, 122736 (2024).
Article Google Scholar
Li, Y. et al. Detection of false data injection attacks in smart grid: a secure federated deep learning approach. IEEE Trans. Smart Grid. 13 (6), 4862–4872 (2022).
Article MATH Google Scholar
Qu, Z. et al. Active and passive hybrid detection method for power CPS false data injection attacks with improved AKF and GRU-CNN. IET Renew. Power Gener. 16 (7), 1490–1508 (2022).
Article MATH Google Scholar
Yaras, S. & Dener, M. IoT-Based intrusion detection system using New Hybrid Deep Learning Algorithm. Electronics 13 (6), 1053 (2024).
Article MATH Google Scholar
Sangeetha, K., Shitharth, S. & Mohammed, G. B. Enhanced SCADA IDS security by using MSOM hybrid unsupervised algorithm. Int. J. Web-Based Learn. Teach. Technol. (IJWLTT). 17 (2), 1–9 (2022).
Article Google Scholar
Almomani, O. A Hybrid Model Using Bio-Inspired Metaheuristic Algorithms for Network Intrusion Detection System68 (Computers, Materials & Continua, 2021). 1.
Nazir, A. & Khan, R. A. A novel combinatorial optimization based feature selection method for network intrusion detection. Computers Secur. 102, 102164 (2021).
Article MATH Google Scholar
Khraisat, A. et al. Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2 (1), 1–22 (2019).
Article Google Scholar
Massa, D. & Valverde, R. A fraud detection system based on anomaly intrusion detection systems for e-commerce applications. Comput. Inform. Sci. 7 (2), 117–140 (2014).
Article MATH Google Scholar
Kizza, J. M. System Intrusion Detection and Prevention, in Guide to Computer Network Securityp. 295–323 (Springer, 2024).
Synopsys. Software Vulnerability Snapshot., (2023). https://www.synopsys.com/software-integrity/resources/analyst-reports.html., Editor.
Kamronbek, Y. et al. Time Series Mean Normalization for Enhanced Feature Extraction in In-Vehicle Network Intrusion Detection System. in International Conference on Broadband and Wireless Computing, Communication and Applications. Springer. (2023).
Mohy-Eddine, M. et al. An efficient network intrusion detection model for IoT security using K-NN classifier and feature selection. Multimedia Tools Appl. 82 (15), 23615–23633 (2023).
Article Google Scholar
Gharaee, H. & Hosseinvand, H. A new feature selection IDS based on genetic algorithm and SVM. in 2016 8th International Symposium on Telecommunications (IST). IEEE. (2016).
Wali, S. & Khan, I. Explainable AI and Random Forest Based Reliable Intrusion Detection System (Authorea Preprints, 2023).
Rezaei, H., Bozorg-Haddad, O. & Chu, X. Grey wolf optimization (GWO) algorithm. Advanced optimization by nature-inspired algorithms, : pp. 81–91. (2018).
Selvarajan, S. A comprehensive study on modern optimization techniques for engineering applications. Artif. Intell. Rev. 57 (8), 194 (2024).
Article MATH Google Scholar
Talukder, M. A. et al. Machine learning-based network intrusion detection for big and imbalanced data using oversampling, stacking feature embedding and feature extraction. J. big data. 11 (1), 33 (2024).
Article MATH Google Scholar
Bagui, S. & Li, K. Resampling imbalanced data for network intrusion detection datasets. J. Big Data. 8 (1), 6 (2021).
Article MATH Google Scholar
Kim, D. S. et al. Fusions of GA and SVM for anomaly detection in intrusion detection system. in Advances in Neural Networks–ISNN 2005: Second International Symposium on Neural Networks, Chongqing, China, May 30-June 1, 2005, Proceedings, Part III 2. Springer. (2005).
Dey, P. & Bhakta, D. A new random forest and support vector machine-based intrusion detection model in networks. Natl. Acad. Sci. Lett. 46 (5), 471–477 (2023).
Article MATH Google Scholar
Kumar, P., Gupta, G. P. & Tripathi, R. TP2SF: a trustworthy privacy-preserving secured Framework for sustainable smart cities by leveraging blockchain and machine learning. J. Syst. Architect. 115, 101954 (2021).
Article MATH Google Scholar
Gad, A. R., Nashat, A. A. & Barkat, T. M. Intrusion detection system using machine learning for vehicular ad hoc networks based on ToN-IoT dataset. IEEE Access. 9, 142206–142217 (2021).
Article Google Scholar
Mohamed, R. H., Mosa, F. A. & Sadek, R. A. Efficient intrusion detection system for IoT environment. Int. J. Adv. Comput. Sci. Appl., 13(4). (2022).
Ibrahim Hairab, B. et al. Anomaly detection of zero-day attacks based on CNN and regularization techniques. Electronics 12 (3), 573 (2023).
Article MATH Google Scholar
Dobrojevic, M. et al. Addressing internet of things security by enhanced sine cosine metaheuristics tuned hybrid machine learning model and results interpretation based on shap approach. PeerJ Comput. Sci. 9, e1405 (2023).
Article PubMed PubMed Central Google Scholar
Maseno, E. M. & Wang, Z. Intrusion Detection in IoT Using Ensemble Approach. in ATAIT. (2023).
Dey, A. K., Gupta, G. P. & Sahu, S. P. Hybrid meta-heuristic based feature selection mechanism for cyber-attack detection in iot-enabled networks. Procedia Comput. Sci. 218, 318–327 (2023).
Article MATH Google Scholar

Download references

Funding

The authors are thankful to the Deanship of Graduate Studies and Scientific Research at University of Bisha for supporting this work through the Fast-Track Research Support Program.

Author information

Authors and Affiliations

Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah, 42351, Saudi Arabia
Saad Said Alqahtany
Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran, 61441, Saudi Arabia
Asadullah Shaikh
Emerging Technologies Research Lab (ETRL), College of Computer Science and Information Systems, Najran University, Najran, 61441, Saudi Arabia
Asadullah Shaikh
College of Computing and Information Technology, University of Bisha, Bisha, Bisha, 61922, Saudi Arabia
Ali Alqazzaz

Authors

Saad Said Alqahtany
View author publications
Search author on:PubMed Google Scholar
Asadullah Shaikh
View author publications
Search author on:PubMed Google Scholar
Ali Alqazzaz
View author publications
Search author on:PubMed Google Scholar

Contributions

Saad Said Alqahtany.: Conceptualization, Software, Investigation, Data curation, Writing – original draft, Writing – review & editing. Asadullah Shaikh.: Conceptualization, Investigation, Project administration, Data curation, Writing – original draft, Writing – review & editing. Ali Alqazzaz.: Validation, Investigation, Data curation, Writing – review & editing.

Corresponding author

Correspondence to Asadullah Shaikh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Alqahtany, S.S., Shaikh, A. & Alqazzaz, A. Enhanced Grey Wolf Optimization (EGWO) and random forest based mechanism for intrusion detection in IoT networks. Sci Rep 15, 1916 (2025). https://doi.org/10.1038/s41598-024-81147-x

Download citation

Received: 05 July 2024
Accepted: 25 November 2024
Published: 14 January 2025
DOI: https://doi.org/10.1038/s41598-024-81147-x

Keywords

This article is cited by

Enhancing intrusion detection in wireless sensor networks using a Tabu search based optimized random forest
- Vivek Kumar Pandey
- Shiv Prakash
- Sheikh Tahir Bakhsh
Scientific Reports (2025)
An intelligent deep representation learning with enhanced feature selection approach for cyberattack detection in internet of things enabled cloud environment
- Hayam Alamro
- Sami Saad Albouq
- Ohud Alasmari
Scientific Reports (2025)
Artificial intelligence-driven cybersecurity system for internet of things using self-attention deep learning and metaheuristic algorithms
- Fahad Alblehai
Scientific Reports (2025)

Subjects

Abstract

Similar content being viewed by others

A new intrusion detection method using ensemble classification and feature selection

Machine learning based intrusion detection framework for detecting security attacks in internet of things

(IoT) Network intrusion detection system using optimization algorithms

Introduction

Background

Related work

Materials and methods

Intrusion detection systems (IDS)

IDS based on FS adoption

Grey Wolf optimization (GWO) algorithm

Methodology of the proposed IDS Model

Step 1: Preparation of NF-ToN-IoT dataset

Preprocessing the NF-ToN-IoT dataset

Step 2: feature selection

Subset generation: preprocessing the NF-ToN-IoT dataset

Enhanced Grey Wolf Optimization (EGWO) algorithm

Improved Exploration and Exploitation

Better exploration

Balanced convergence

Improved scalability

Evaluation of the generated subset

Step 3: IDS Stage

Overfitting mitigation strategies

Cross-validation

Regularization

Random Forest’s Ensemble Approach

Comparative analysis based on performance

Results

Interpretability of recommendations

Feature selection analysis

Decision path analysis

Recommendation interpretations for actionable insights

Conclusion and future work

Data availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

This article is cited by

Enhancing intrusion detection in wireless sensor networks using a Tabu search based optimized random forest

An intelligent deep representation learning with enhanced feature selection approach for cyberattack detection in internet of things enabled cloud environment

Artificial intelligence-driven cybersecurity system for internet of things using self-attention deep learning and metaheuristic algorithms

Search

Quick links