Introduction

The Internet of Things (IoT) integrates physical objects into the digital world, transforming how users engage with both realities within the emerging and evolving landscape of the Metaverse1. IoT networks drive the development of novel virtual ecosystems across sectors like smart cities, healthcare and entertainment, enabling data to be autonomously collected, processed, and shared without interruption2. The Metaverse enhances this connection by supporting immersive experiences, personalized interactions, and real-time decision-making3. As a crucial part of the Metaverse, IoT extends traditional networks into highly interconnected environments, promoting innovation and redefining user experiences by blending real-world interactions with virtual opportunities. Thus, one of the requirements is to provide reliable operation of these networks, along with a high level of availability4.

Personal IoT networks, consisting of wearables, smart home systems, and AR/VR gadgets, provide users with unprecedented levels of convenience and control over their Metaverse experiences. These devices establish a tangible connection between an individual’s virtual environment or avatar and their physical surroundings, supporting more intuitive and advanced management of virtual spaces. The swift expansion of IoT is propelling the evolution of the Metaverse, pushing the limits of connectivity and merging the physical and digital worlds into a seamless and immersive experience5,6.

However, this rapid expansion of IoT within the Metaverse faces significant challenges, including protecting interconnected devices that handle sensitive user data and mitigating real-time cyber threats that could disrupt immersive experiences7. IoT devices are highly vulnerable to cyberattacks because of their limited processing capabilities and reliance on basic systems8,9. This vulnerability is even more critical within the Metaverse, where essential virtual and physical infrastructures are managed by interconnected systems. Malicious users may exploit these weaknesses to disrupt online healthcare services, cause financial losses in Metaverse commerce, or gain unauthorized access to personal data streams, blurring the thin line between virtual and real-world consequences. Therefore, innovative security frameworks are essential for ensuring a secure and immersive Metaverse experience for all users. These solutions must strike a balance between the lightweight architecture of IoT gadgets and robust security measures, such as advanced encryption methods and real-time updates10,11,12.

The principal constraint of traditional security solutions is the difficulty of keeping pace with the dynamic, swiftly changing Metaverse environment. Being mostly reactive by design, they are not adaptable enough to counter novel emerging threats such as botnet attacks13, which attempt to exploit vulnerabilities in the complex interconnections between the real and digital worlds within the Metaverse. Cybersecurity solutions combined with artificial intelligence (AI), on the other hand, provide considerably more adaptable and data-driven defensive options14,15. AI-fueled solutions can analyze immense datasets in real time, identifying trends and shifts in risk to prevent damage before it happens. This is vital for maintaining the robustness of the expanding Metaverse, providing users with a safe and continuous experience while they explore and produce new virtual content.

Despite its numerous advantages, AI faces some weaknesses as well, chiefly inadequate data quality, ill-judged algorithm choices, and poorly chosen hyperparameter configurations. Models trained on low-quality datasets may produce unreliable outcomes, highlighting the necessity of high-quality data for proper training. Likewise, selecting the appropriate machine learning (ML) model is crucial, as different methods tend to perform differently depending on the challenge being solved and the dataset used. Hyperparameter configuration, such as the number of layers, learning rate, or dropout, can additionally have a heavy impact on a model’s performance and must be carefully optimized to achieve the best outcomes. Wolpert’s no free lunch (NFL) theorem16 establishes that no all-round solution works well for all classification challenges. As a result, models have to be selected and adapted to each specific task. Moreover, hyperparameter optimization is broadly recognized as an NP-hard optimization challenge due to its inherent complexity: determining the appropriate configuration by exhaustive search is computationally infeasible, which remains a key challenge for AI scientists. Conventional optimization algorithms regularly fall short in these scenarios, as they struggle to deliver the desired outcomes within a tolerable time frame. One potential answer is to utilize metaheuristic algorithms, capable of scanning immense solution spaces to deliver approximate solutions. These methods are well-suited for addressing complex real-world challenges where finding exact answers is impractical.

This paper proposes a framework consisting of two levels, inspired by the architecture explored in previous research17. A convolutional neural network (CNN) forms the primary layer of the architecture and is assigned the role of feature extraction. As outlined by other relevant publications18,19,20, significant improvements in CNN performance can be achieved by replacing the CNN’s final dense layer with other classifiers such as AdaBoost or XGBoost. Consequently, this study takes a similar approach, feeding the intercepted output of the CNN’s final data-processing layer into the second level of the framework, where CatBoost and LightGBM classifiers further improve the classification capability of the architecture, especially for the high-volume data streams generated by Metaverse IoT networks, which necessitate real-time processing. Moreover, the configuration of both levels of the framework is optimized by metaheuristic algorithms tasked with tuning the hyperparameters of the respective models. This approach ensures the finest possible outcomes of the proposed combined framework. Generally speaking, the proposed methodology maximizes the benefits provided by both deep learning and ensemble approaches, while metaheuristic algorithms ensure the proper configuration of the models’ hyperparameters for superior performance.

An altered version of the chimp optimization algorithm (ChOA)21 was used in this research to tune the hyperparameters of both layers of the framework to ensure good performance. The ChOA metaheuristic was selected after careful experimentation with different optimizers, since the NFL theorem16 states that no universal optimizer delivers the best performance for all optimization problems. Despite the existence of other powerful optimizers such as the crayfish optimization algorithm (COA)22, red fox optimizer (RFO)23 and reptile search algorithm (RSA)24, the elementary version of ChOA achieved remarkable results in smaller-scale simulations, and it was consequently selected for further modifications aimed at reaching even better outcomes on the intrusion classification problem.

In light of all the presented facts, the primary contributions of this research may be delineated along the following lines:

  • A proposition of a novel two-level AI framework for enhancing Metaverse IoT network security.

  • A framework combining a CNN with boosting ensemble classifiers to perform threat classification in IoT networks.

  • A proposition of a modified optimization metaheuristic tailored to the problem at hand, building upon the baseline ChOA, that was employed to tune the framework’s models.

  • Application of explainable AI to the top-performing models to establish the relative importance of the features and their effect on the forecasts made by the system.

The remainder of this study is arranged as follows. Section "Related works" puts forth the related works on this matter along with the foundations of the utilized techniques. Section "Methods" delineates the baseline ChOA metaheuristic and showcases the altered version of the algorithm that was later employed in the experiments. The settings of the experimental environment required for reproducibility of the simulations are set forth in Section "Experimental setup", while the outcomes of all simulations that were carried out are delineated and discussed in Section "Results". Ultimately, Section "Conclusion and future work" delivers the concluding remarks and suggests possible directions for future research.

Related works

Conventional systems used for network protection, which revolve around firewall and blacklist solutions, have very constrained capabilities. They are not flexible enough, depending on collections of rules and human intervention to adjust to novel attacks. Moreover, they can only be upgraded with novel attack patterns after a system has already been breached; in other words, they are capable of responding only to events that have already happened. This drawback makes conventional systems ineffective when encountering zero-day attacks and emerging menaces, leaving networks vulnerable and open to sophisticated cyber-threats. Many approaches have been used since the early 2000s25, typically divided into intrusion detection systems (IDS) and intrusion prevention systems (IPS). Nowadays, a wide spectrum of tools is openly available for securing systems, including firewall and antivirus applications; however, their restricted functionality leaves them open to novel types of threats.

One way to handle these novel types of threats that emerge each day revolves around the integration of AI into IoT security applications. Generally speaking, AI couples seamlessly with IoT networks for different purposes, as evidenced by numerous practical implementations26,27. The role of AI in this scenario is to enhance the security of IoT networks through identification of anomalous behavior in real time, where ML models are utilized to detect and distinguish possible threats from normal traffic. Hybrid ML solutions tailored specifically to IoT security challenges have been introduced by papers such as28, highlighting their superiority in threat detection across various IoT devices and architectures. More focused research, such as29, explored intrusion detection specifically within healthcare-related IoT networks, employing ML classifiers adjusted by hybrid metaheuristic techniques. While these studies showcased the significant potential of ML models, they also emphasized the challenges associated with selecting the appropriate hyperparameters, which is crucial for achieving optimal performance.

Optimizing the hyperparameters of ML models is essential for achieving optimal results and maximizing effectiveness, not only within cybersecurity but across various other fields. Poor tuning often leads to model failure and underperformance. A significant portion of recent research focuses on hyperparameter tuning for various ML structures utilizing metaheuristic algorithms30,31. This applies to IoT intrusion detection problems as well, where hybrid approaches in which ML models are tuned by metaheuristics have delivered promising outcomes32,33.

Despite recent progress in this field, a significant research gap remains. While metaheuristics-tuned ML models have been explored to some extent for IoT networks and intrusion detection, the focus has primarily been on models like XGBoost and AdaBoost, with limited investigation into LightGBM tuning. Additionally, the two-level framework suggested in this research, which combines a CNN with CatBoost and LightGBM classifiers and uses metaheuristic techniques to tune both levels, has not been previously studied for the observed challenge. Furthermore, the dataset34 employed in the experiments, published in 2023, has yet to be thoroughly explored.

The remainder of this section provides a brief background of the techniques utilized in this research, covering the basics of CNNs and the CatBoost and LightGBM classification models, followed by a short overview of metaheuristic approaches along with their successful applications.

Convolutional neural networks

Convolutional neural networks35 are famous for their image classification and object detection capabilities, but they also excel in other tasks. Inspired by the structure of the mammalian visual cortex, they follow a similar layered architecture. Input data passes through all layers in a particular order, making use of activation functions such as ReLU, tanh and sigmoid to map non-linear outputs.

To construct a deep CNN, it is essential to include a convolutional layer along with nonlinear, pooling, and fully connected layers36. For the provided input data, multiple filters slide over the input in the convolutional layer, producing an output as the sum of the element-wise multiplication of each filter and the receptive field of the input data. This weighted sum is then placed as an element in the subsequent layer. Nonlinear layers primarily function to alter or constrain the produced output. Various nonlinear functions are available for use in CNNs, but ReLU remains one of the most widely employed options37. The pooling layer effectively shrinks the dimensionality of the input data. The most commonly used method, max pooling, selects the highest value within each pooling filter. Max pooling is highly regarded in the relevant literature for its efficacy, as it downsamples the input by approximately 75% while delivering significant outcomes. Fully connected layers execute the classification task.

The convolution operation, expressed by Eq. (1), manages the processing of the inputs:

$$\begin{aligned} z^{[l]}_{i, j, k} = w^{[l]}_{k}x^{[l]}_{i, j} + b^{[l]}_{k}, \end{aligned}$$
(1)

here, \(z^{[l]}_{i, j, k}\) corresponds to the output produced by the k-th feature map at position (i, j) within the l-th layer. The input located at (i, j) is marked as x, w denotes the filter set, while b describes the bias scores.

Following the convolution operation, activation is executed according to Eq. (2):

$$\begin{aligned} g^{[l]}_{i, j, k} = g(z^{[l]}_{i, j, k}) \end{aligned}$$
(2)

here, \(g(\cdot )\) describes the non-linear operation applied to the outputs.

The output resolution is reduced by the pooling layers, which apply either average or max pooling in the majority of practical applications. This procedure is expressed by Eq. (3).

$$\begin{aligned} y^{[l]}_{i, j, k} = pooling(g^{[l]}_{i, j, k}). \end{aligned}$$
(3)

here, y represents the pooling layer’s result.

Ultimately, dense layers perform the classification task. For multi-labeled data, a softmax layer executes the classification, while for binary classification problems, the logistic (sigmoid) layer is employed. As training epochs progress, the network updates the weights and bias scores, reducing the cross-entropy loss function in a gradient-descent manner38. This is mathematically expressed by Eq. (4).

$$\begin{aligned} H(p,q) = - \sum _{x}p(x)\ln (q(x)) \end{aligned}$$
(4)

where p and q each denote distribution defined over discrete parameter x.

Optimizing a CNN’s hyperparameters is essential, as they greatly influence the network’s accuracy. Key hyperparameters encompass the count and size of kernels within every convolutional layer, the learning rate, batch size, the count of convolutional and fully connected (dense) layers, weight regularization within the dense layers, activation functions, dropout rate, and others. Since there is no universal solution for the hyperparameter tuning procedure, a “trial and error” approach is often necessary.
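For illustration, a minimal Python sketch of how such a configuration could be assembled is given below. It assumes a TensorFlow/Keras stack, and the hyperparameter names, value ranges and input dimensions are illustrative assumptions rather than the exact search space of Table 1.

```python
# Hedged sketch: assembling a 1-D CNN from a hyperparameter dictionary.
# All names and values are illustrative, not the tuned ones from Table 1.
import tensorflow as tf

def build_cnn(hp, n_features, n_classes):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features, 1)))
    for _ in range(hp["n_conv_layers"]):              # count of conv layers
        model.add(tf.keras.layers.Conv1D(hp["n_kernels"], hp["kernel_size"],
                                         activation="relu", padding="same"))
        model.add(tf.keras.layers.MaxPooling1D(2))    # halve the resolution
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(hp["dense_units"], activation="relu"))
    model.add(tf.keras.layers.Dropout(hp["dropout"]))
    model.add(tf.keras.layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.Adam(hp["learning_rate"]),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# A configuration such as a metaheuristic might propose during tuning:
cnn = build_cnn({"n_conv_layers": 2, "n_kernels": 32, "kernel_size": 3,
                 "dense_units": 64, "dropout": 0.2, "learning_rate": 1e-3},
                n_features=46, n_classes=8)
```

In a tuning loop, each candidate hyperparameter vector proposed by the metaheuristic would be decoded into such a dictionary, the model trained briefly, and its fitness returned to the optimizer.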

CNNs are widely adopted in computer vision35, with recent advancements across areas such as facial recognition39, document analysis40, and medical image classification and diagnostic support in general41. Additionally, CNNs also play an essential role in climate change analysis and extreme weather prediction42, among numerous other applications43,44.

CatBoost classification model

Handling categorical datasets poses a considerable challenge within machine learning. Often, substantial preprocessing or conversion is required before data can be used effectively in models. Categorical features are characterized by a set of distinct values known as categories that cannot be compared. One common approach for working with categorical features in boosted tree models is one-hot encoding45, where each category is represented by a new binary feature. However, for features with large cardinality, this approach can generate an impractically large number of new features. A solution to this issue is to group categories into a limited number of clusters prior to applying one-hot encoding. One popular method for this employs target statistics (TS)45, where each category is represented by its expected target value. Yandex scientists devised the CatBoost algorithm46 specifically to enhance the handling of categorical data compared to traditional approaches.

CatBoost adopts a more advantageous outlook inspired by online learning frameworks, which process training samples sequentially over time, relying on a concept of ordering. In this method, TS for each instance are computed based solely on prior observations. To adapt this concept to traditional offline environments, CatBoost introduces a pseudo-time by creating a random permutation of the training samples. This allows the TS for each instance to be calculated with respect to all available historical data up to that point. Additionally, CatBoost employs a technique called ordered boosting, which prevents prediction shift, further improving the model’s reliability46. CatBoost produces \(s + 1\) discrete random permutations of the training dataset at the beginning. Here, \(\sigma _0\) is utilized to select the leaf scores \(b_j\) of the generated trees \(h(x) = \sum _{j=1}^{J}b_j\mathbb {I}_{\{x\in R_j\}}\), and the permutations \(\sigma _1,...,\sigma _s\) are used to establish the tree structure (i.e., the internal nodes). Suppose the model is trained using I trees. To obtain an unshifted residual \(r^{I-1}(x_k,y_k)\), a model \(F^{I-1}\) trained without the sample \(x_k\) must exist; since unbiased residuals are necessary for all training samples, no sample may be used in training its own \(F^{I-1}\). Nevertheless, it is possible to maintain a set of models that differ with respect to the samples included in their training process. To compute the residual for a particular example, a model trained without that example is utilized. This set of models can be constructed through application of the same ordering principle utilized for TS. The algorithm for this approach is showcased as follows:

Algorithm 1: CatBoost ordered boosting procedure.

Within CatBoost, the base estimators are oblivious decision trees, meaning that the same splitting criterion is applied across an entire tree level. This structure considerably enhances execution speed during testing, creates balanced trees, reduces susceptibility to overfitting, and enables significant performance acceleration. The scores within the leaves of the ultimate model are established through the standard gradient boosting procedure, applied consistently across both modes, incorporating all constructed trees. Each training sample is mapped to specific leaves, such as \(leaf_0(i)\), with the permutation \(\sigma _0\) utilized to calculate TS within this context. In the testing phase, when the final model is applied to a novel example, TS values are calculated utilizing the entire training dataset.

As the count of categorical features within a dataset grows, the number of possible combinations increases exponentially, making it impractical to process them all. To address this, CatBoost uses a greedy approach to produce feature combinations. For each split in a tree, CatBoost combines all categorical features (and their combinations) previously employed in earlier splits of the current tree with all categorical features in the dataset.
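To make the ordered TS idea concrete, a minimal self-contained sketch is given below; the prior and smoothing values are illustrative assumptions, not CatBoost’s internal defaults.

```python
# Hedged sketch: ordered target statistics. Each sample's category is
# encoded using only the targets of samples preceding it in a random
# permutation (a "pseudo-time"), plus a smoothing prior, which avoids
# target leakage and the resulting prediction shift.
import numpy as np

def ordered_target_statistics(categories, targets, prior=0.5, a=1.0, seed=0):
    perm = np.random.default_rng(seed).permutation(len(categories))
    sums, counts = {}, {}
    ts = np.empty(len(categories))
    for idx in perm:                      # walk the pseudo-time ordering
        c = categories[idx]
        ts[idx] = (sums.get(c, 0.0) + a * prior) / (counts.get(c, 0) + a)
        sums[c] = sums.get(c, 0.0) + targets[idx]   # update the "history"
        counts[c] = counts.get(c, 0) + 1
    return ts

cats = np.array(["dos", "dos", "mirai", "recon", "mirai"])
y = np.array([1, 1, 1, 0, 1])
print(ordered_target_statistics(cats, y))
```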

LightGBM classification model

LightGBM (light gradient boosting machine)47 was introduced by Microsoft and made open source. It is a gradient boosting framework designed for high performance and efficiency when dealing with immense datasets. It achieves excellent performance thanks to a novel method labeled gradient-based one-side sampling (GOSS), which decreases the count of data samples while preserving accuracy. Moreover, LightGBM also employs exclusive feature bundling (EFB) to combine mutually exclusive attributes, effectively decreasing the data dimensionality and improving computational efficiency. This pair of innovative procedures enables LightGBM to train faster than conventional boosting models and to efficiently handle immense datasets comprising millions of samples and thousands of features.

LightGBM has exhibited excellent performance in classification, regression and ranking challenges, and has consequently become a popular choice for various ML-based applications spanning from medicine48 and climate factors49 all the way to civil engineering50 and fault detection51. Moreover, its innovative design supports parallel and distributed processing, allowing it to be scaled with great efficiency over several computing machines. The most commonly optimized LightGBM hyperparameters encompass the count of leaves in a tree (the principal parameter controlling tree complexity), the maximum depth and the learning rate, among others. The model’s level of performance is significantly affected by the proper choice of these values.
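A minimal usage sketch with the scikit-learn wrapper is shown below; the synthetic data and hyperparameter values are placeholders, not the tuned configuration reported later in Table 3.

```python
# Hedged sketch: multiclass LightGBM with the hyperparameters most often
# tuned (num_leaves, max_depth, learning_rate); values are placeholders.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=32, n_informative=10,
                           n_classes=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

clf = lgb.LGBMClassifier(num_leaves=31,      # principal complexity control
                         max_depth=-1,       # -1 leaves the depth unbounded
                         learning_rate=0.1,
                         n_estimators=200)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```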

Stochastic optimizers

Metaheuristic optimization encompasses a set of algorithms aimed at discovering approximate solutions to complex (NP-hard) optimization challenges, which are impractical to solve exactly with conventional deterministic mechanisms. Many of these methods take inspiration from natural phenomena, such as evolution or collective behavioral patterns52. They are especially valuable for resolving large-scale, nonlinear, or unstructured problems where deterministic techniques fall short because of excessive resource requirements and/or infeasible time frames53. Metaheuristics provide versatility and scale well, allowing them to explore a wide search domain while keeping the risk of becoming trapped in local optima at a minimum. Although they are not able to guarantee finding the global optimal solution, they can discover near-optimal results in acceptable time. Swarm intelligence algorithms represent a subset of these optimization techniques, drawing inspiration from nature, where simple individuals can express complex and intelligent collective behavior. Due to their distributed nature, algorithms belonging to this group are particularly effective for tackling large, high-dimensional optimization problems54,55.

Notable examples of metaheuristic approaches encompass conventional and broadly respected algorithms such as particle swarm optimization (PSO)56, the genetic algorithm (GA)57, variable neighborhood search (VNS)58, artificial bee colony (ABC)59, the firefly algorithm (FA)60 and the bat algorithm (BA)61. A considerable portion of more recent techniques were introduced in the last few years, such as COLSHADE62, the crayfish optimization algorithm (COA)22, the reptile search algorithm (RSA)24, the red fox optimizer (RFO)23 and the recently developed sinh cosh optimizer (SCHO)63. Methods belonging to this family of algorithms are well known as powerful optimizers, and as such have been applied in practice across a broad range of application domains, like time series forecasting64, software development17,65, healthcare31, cloud and edge computing systems66,67 and power grid tuning68. Moreover, the application of metaheuristic algorithms to the hyperparameter optimization of AI models can remarkably enhance their performance69, as evidenced by numerous preceding studies70,71,72. IoT networks have also been leveraged with the application of metaheuristic optimization algorithms73, addressing challenges like data aggregation74, blockchain performance optimization75 and security76,77.

Methods

This section commences by briefly introducing the concepts of the baseline chimp optimization algorithm, followed by its known constraints. After the limitations are discussed, a modified variant of the algorithm is offered that improves on the performance of the elementary version.

Baseline chimp optimization algorithm

The chimp optimization algorithm (ChOA) belongs to the group of swarm intelligence metaheuristic techniques, and it was developed to emulate the hunting technique and collective behavioral patterns of a troop of chimpanzees21. In this approach, chimpanzees are divided into four key subgroups: attackers, chasers, holders, and callers, each contributing uniquely to the optimization procedure. This collaborative approach helps the algorithm maintain a balance between exploration (searching novel areas) and exploitation (improving existing solutions).

In the baseline ChOA, the attacking group moves in line with their position update pattern, governed by Eq. (5):

$$\begin{aligned} X_{\text {attack}} = X_{\text {best}} - A \cdot \left| C \cdot X_{\text {best}} - X\right| \end{aligned}$$
(5)

here, \(X_{\text {best}}\) denotes the position of the top-performing chimp, while A and C correspond to coefficient vectors dynamically adjusted within each round, supporting the exploration procedure.

Individuals belonging to the chasing pack \(X_{\text {chase}}\) update their positions as governed by Eq. (6):

$$\begin{aligned} X_{\text {chase}} = X_{\text {attack}} - B \cdot \left| D \cdot X_{\text {attack}} - X\right| \end{aligned}$$
(6)

where B and D serve as control variables for maintaining the balance between the exploration and exploitation phases.

The individuals belonging to the holder troop \(X_{\text {hold}}\) refresh their positions in line with Eq. (7):

$$\begin{aligned} X_{\text {hold}} = X_{\text {chase}} - E \cdot \left| F \cdot X_{\text {chase}} - X\right| \end{aligned}$$
(7)

here, E and F serve as supplementary parameters that govern this update.

Finally, the individuals from the caller troop \(X_{\text {call}}\) perform the position update according to Eq. (8):

$$\begin{aligned} X_{\text {call}} = X_{\text {hold}} - G \cdot \left| H \cdot X_{\text {hold}} - X\right| \end{aligned}$$
(8)

here, G and H have roles similar to A and C, adjusted to the callers’ function in the optimization process.

By iteratively refining positions, ChOA leverages the collective intelligence of the different chimpanzee roles to solve complex optimization tasks, proving to be an efficient method for addressing high-dimensional search domains within a broad spectrum of real-world applications.
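For clarity, a minimal sketch of the four role updates (Eqs. (5)-(8)) is given below; a simple linear decay of the control coefficient is an assumption made here, and the chaotic-map components of the full algorithm are omitted for brevity.

```python
# Hedged sketch of the ChOA role updates from Eqs. (5)-(8). The coefficient
# vectors are drawn per role; the linear decay of f is an assumption.
import numpy as np

def chimp_step(x, x_best, t, T, rng):
    def coeffs():
        f = 2.5 - 2.5 * t / T                 # decays from 2.5 toward 0
        a = 2.0 * f * rng.random(x.size) - f  # plays the role of A/B/E/G
        c = 2.0 * rng.random(x.size)          # plays the role of C/D/F/H
        return a, c

    A, C = coeffs()
    x_attack = x_best - A * np.abs(C * x_best - x)       # Eq. (5)
    B, D = coeffs()
    x_chase = x_attack - B * np.abs(D * x_attack - x)    # Eq. (6)
    E, F = coeffs()
    x_hold = x_chase - E * np.abs(F * x_chase - x)       # Eq. (7)
    G, H = coeffs()
    return x_hold - G * np.abs(H * x_hold - x)           # Eq. (8)

rng = np.random.default_rng(7)
x = rng.uniform(-1.0, 1.0, 5)          # current solution vector
print(chimp_step(x, np.zeros(5), t=10, T=100, rng=rng))
```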

Altered ChOA

Notwithstanding the excellent optimization characteristics of the relatively novel ChOA algorithm, thorough experiments on the CEC benchmark function collection78 exposed some areas of the algorithm that may be targeted for enhancement. These empirical experiments showcased that the baseline ChOA could profit from an early boost of population diversity. Moreover, the baseline algorithm’s convergence speed and the balance between the diversification and intensification stages could also be improved. With these opportunities for improvement in mind, several alterations are proposed in this study.

The first alteration targets boosting of the population diversity during the initialization stage, by incorporating the quasi-reflection-based learning (QRL)79 procedure into the elementary ChOA. In the modified initialization stage, only half of the solutions are synthesized by applying the conventional ChOA initialization process. The other half of the solutions are synthesized with the QRL mechanism to boost diversification in the early phase of the algorithm’s run. Novel solutions are synthesized as quasi-reflexive opposite individuals according to Eq. (9).

$$\begin{aligned} X^{qr}_j = rnd \bigg ( \frac{lb_j + ub_j}{2}, x_j\bigg ) \end{aligned}$$
(9)

here, \(\frac{lb_j + ub_j}{2}\) is the arithmetic mean of each parameter’s search limits, while rnd() represents a uniform random draw within the given boundaries.
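A minimal sketch of this mixed initialization, assuming simple box constraints per parameter, could look as follows:

```python
# Hedged sketch of the QRL-enhanced initialization: half of the population
# is generated conventionally, the other half as quasi-reflexive opposites
# per Eq. (9), drawn between each parameter's midpoint and the base value.
import numpy as np

def qrl_init(pop_size, lb, ub, rng):
    half = pop_size // 2
    base = lb + rng.random((half, lb.size)) * (ub - lb)  # conventional half
    mid = (lb + ub) / 2.0
    lo, hi = np.minimum(mid, base), np.maximum(mid, base)
    qr = lo + rng.random(base.shape) * (hi - lo)         # quasi-reflexive half
    return np.vstack([base, qr])

rng = np.random.default_rng(42)
print(qrl_init(10, np.zeros(4), np.ones(4), rng).shape)  # (10, 4)
```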

Another modification implemented in the ChOA algorithm is the soft rollback mechanism, introduced by this study. If the algorithm stagnates for T/3 iterations (a threshold established empirically), where T is the maximum number of iterations, a rollback of the entire population to the previous state is performed. Two novel control parameters were introduced to support this alteration: the stagnation counter sc and the stagnation threshold st, initialized as \(sc=0\) and \(st=\frac{T}{3}\). If there is no improvement in the current iteration, sc is incremented. If the value of sc reaches st, a soft rollback is performed. Ultimately, elitism is also applied in the following way: when the rollback is performed, the best individual (the one with the best fitness value) is kept in the population, while the rest are produced by applying the proposed initialization described above.
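The following self-contained sketch illustrates this bookkeeping on a toy sphere objective; the random-perturbation search step is only a stand-in for the actual ChOA updates, and all names and values are illustrative.

```python
# Hedged sketch of the soft rollback mechanism with elitism; the search
# step below is a toy stand-in, not the actual ChOA position updates.
import numpy as np

def sphere(x):                        # toy objective standing in for fitness
    return np.sum(x ** 2, axis=-1)

rng = np.random.default_rng(0)
N, dim, T = 10, 4, 60
lb, ub = -5.0, 5.0
pop = lb + rng.random((N, dim)) * (ub - lb)

sc, st = 0, T // 3                    # stagnation counter, threshold st = T/3
best_fit, snapshot = np.inf, pop.copy()
for t in range(T):
    pop = np.clip(pop + rng.normal(0.0, 0.5, pop.shape), lb, ub)  # stand-in
    f = sphere(pop)
    if f.min() < best_fit:            # improvement: reset the counter
        best_fit, snapshot, sc = f.min(), pop.copy(), 0
    else:
        sc += 1
        if sc >= st:                  # stagnation confirmed: soft rollback
            elite = snapshot[np.argmin(sphere(snapshot))]
            pop = lb + rng.random((N, dim)) * (ub - lb)  # re-initialize rest
            pop[0] = elite            # elitism: keep the best individual
            sc = 0
print("best fitness found:", best_fit)
```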

Considering all the included modifications, the novel algorithm was named the iteration stagnation aware ChOA (ISA-ChOA), with the pseudo code given in Algorithm 2. It should also be noted that ISA-ChOA utilizes control parameter values and update procedures identical to those suggested by the authors of the baseline algorithm21. Regarding the complexity of the introduced algorithm, it is common to express it in terms of fitness function evaluations (FFEs), since these are the most expensive calculations during a metaheuristic algorithm’s execution. Accounting for the possible soft rollbacks (each executed once stagnation is confirmed), the worst-case complexity of the ISA-ChOA algorithm is \(O(n) = N + N \times T + (N-1) \times T / 3\), where N is the count of solutions and T the count of iterations. In practice, however, a soft rollback is on average triggered only once per run, which is significantly less than the worst-case scenario.

Algorithm 2: ISA-ChOA optimizer pseudo code.

Experimental setup

The experiments in this study were conducted with the recent CICIoT2023 intrusion detection dataset34, publicly accessible at https://www.unb.ca/cic/datasets/iotdataset-2023.html. This dataset was developed to evaluate security analytics programs for practical IoT environments and includes 33 distinct attack variants executed across an IoT topology of 105 devices. These attacks are categorized into seven types: DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai. This allows for both binary classification (attack versus benign traffic) and multiclass classification with either 8 classes (normal plus each attack type) or 34 classes (normal plus each of the 33 individual attacks). The class distribution for both the binary and multiclass problems is showcased in Fig. 1. This research addressed the 8-class prediction task. The original dataset contains 1,048,575 samples. Due to the immense size of the dataset and the overwhelming computing requirements, it was reduced to 20% of the initial size by stratified sampling over the 8 target classes, keeping the class imbalance identical to the original dataset. The resulting dataset contains 209,715 samples, and was subsequently separated into 70% training and 30% testing data. Models were validated by applying conventional ML metrics: accuracy, precision, sensitivity, and F1-score.
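A minimal sketch of this reduction and split is given below; the file and column names are assumptions about the CSV layout, not the exact preprocessing code used in this study.

```python
# Hedged sketch: stratified 20% subsample of CICIoT2023, then a stratified
# 70/30 train/test split. File and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("CICIoT2023.csv")                # hypothetical file name
X, y = df.drop(columns=["label"]), df["label"]    # 8-class target assumed

# Keep 20% of the samples while preserving the original class imbalance.
X_sub, _, y_sub, _ = train_test_split(X, y, train_size=0.2,
                                      stratify=y, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_sub, y_sub, test_size=0.3,
                                          stratify=y_sub, random_state=42)
print(len(X_tr), len(X_te))  # roughly 146,800 train and 62,915 test samples
```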

Fig. 1: CICIoT2023 class distribution for binary and multiclass classification.

The Matthews correlation coefficient (MCC)80 was chosen as the objective function to be maximized. MCC is an important indicator particularly when facing imbalanced datasets like CICIoT2023. The imbalance of the utilized dataset is indeed a challenge; however, it reflects a real-world situation, since most real-life network traffic is not balanced. Thus, it is crucial that the proposed model works properly with highly imbalanced data. The MCC value is computed using Eq. (10). Moreover, the classification error (defined as \(1-accuracy\)) was monitored across all simulations and acted as the indicator function.

$$\begin{aligned} \text {MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \end{aligned}$$
(10)

here, TP corresponds to the count of true positive predictions, TN represents the count of true negatives, FP is the count of false positives, and finally, FN denotes the count of false negative classifications.
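In practice, the multiclass generalization of MCC can be computed directly, as in the sketch below; scikit-learn’s matthews_corrcoef handles the multiclass case, and the toy labels are illustrative.

```python
# Hedged sketch: MCC as the maximized objective and classification error
# (1 - accuracy) as the monitored indicator function.
from sklearn.metrics import accuracy_score, matthews_corrcoef

def objective(y_true, y_pred):
    return matthews_corrcoef(y_true, y_pred)       # maximized by the optimizer

def indicator(y_true, y_pred):
    return 1.0 - accuracy_score(y_true, y_pred)    # monitored, not optimized

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]
print(objective(y_true, y_pred), indicator(y_true, y_pred))
```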

The CNN’s hyperparameters were tuned in the first tier of the introduced ML framework. The collection of selected hyperparameters, along with the search-region limits for each parameter, is presented in Table 1. A batch size of 512 was used, with early stopping enabled.

Table 1 Collection of CNN hyperparameters tuned in this study with their search boundaries.

The CNN’s intermediate outputs were wired to the framework’s second tier. These outputs were captured during the CNN’s classification pass for every sample in the dataset, and subsequently separated into another 70% training and 30% testing split, which was fed to the CatBoost and LightGBM structures throughout their respective tuning processes. The collection of CatBoost hyperparameters selected for the optimization procedure in this study is showcased in Table 2. Likewise, the LightGBM hyperparameters that were tuned are presented in Table 3. The selected parameters are known to have the most influence on the models’ behavior.
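A minimal sketch of this "tapping" is given below; the layer sizes and names are illustrative stand-ins for the tuned tier-one network (which, as shown later in Fig. 7, exposes 32 features).

```python
# Hedged sketch: intercepting a CNN's penultimate outputs so they can feed
# the second tier. The architecture here is a stand-in, not the tuned model.
import numpy as np
import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.Input(shape=(46, 1)),
    tf.keras.layers.Conv1D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu", name="feature_layer"),
    tf.keras.layers.Dense(8, activation="softmax"),
])

# "Tap" the network: reuse its weights but stop before the softmax head.
tapped = tf.keras.Model(inputs=cnn.inputs,
                        outputs=cnn.get_layer("feature_layer").output)
X = np.random.rand(5, 46, 1).astype("float32")
features = tapped.predict(X)           # shape (5, 32): inputs for tier two
print(features.shape)
```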

Table 2 CatBoost set of optimized hyperparameters with search boundaries.
Table 3 LightGBM set of optimized hyperparameters with search boundaries.

The suggested ISA-ChOA metaheuristic was utilized for optimization, and a comparative analysis with multiple cutting-edge optimizers was performed. The set of contending algorithms comprised the elementary ChOA21, VNS58, PSO56, BA61, ABC59, WOA81 and RSA24. The contending metaheuristics were independently implemented for the sake of this research, with the default control-parameter configurations recommended by their respective creators. In the case of the CNN simulations, every metaheuristic used 8 individuals in the population, with 5 iterations in each run and 5 separate executions, to account for the randomness linked to stochastic algorithms. Similarly, for CatBoost and LightGBM tuning, the metaheuristics used 10 individuals per population, 10 iterations per run and a total of 30 independent executions. For the CNN tuning process, the authors opted for a smaller population size and number of rounds, since CNN optimization requires considerably more computing resources. A simulation framework flowchart is provided in Fig. 2.

Fig. 2: Proposed simulation framework flowchart.

Results

This section showcases the experimental findings from the conducted simulations. In the multiclass simulations, the best outcome in each category is emphasized in bold in all tables showing the simulation findings.

Layer 1 CNN multiclass experiments

The simulation findings of the first tier of the framework, where the CNN was optimized with respect to the fitness function (MCC) for the multiclass classification task, are delivered in Table 4. The proposed ISA-ChOA algorithm delivered the highest-ranking results, attaining an MCC of 0.691852 in the best run and 0.639513 in the worst execution, with mean and median scores of 0.639513 and 0.671254, respectively. Conversely, the best stability, indicated by the lowest deviation and variance scores, was attained by the WOA metaheuristic. Despite its respectable stability, however, WOA lagged considerably behind the more advanced algorithms on the other metrics.

Table 4 Layer 1 CNN multiclass objective function scores over 30 simulations.

The indicator function (set as the classification error) results are outlined in Table 5. Once more, the supremacy of the introduced ISA-ChOA metaheuristic may be observed, reflected in the best outcome of 0.137344. ISA-ChOA also outclassed the other contending algorithms on the worst, mean and median scores, while again the best stability of the outcomes was exhibited by the WOA metaheuristic.

Table 5 Layer 1 CNN multiclass indicator function scores over 30 simulations.

Violin and swarm plots of the fitness function (MCC) for the multiclass classification problem are presented in Fig. 3. ISA-ChOA did not attain the highest stability of the results; nevertheless, the contenders that obtained better MCC stability across independent runs did not match the overall superior performance of ISA-ChOA. This is also visible from the swarm plot, showing the diversity of the population within the final round of the best execution. Supplementary visualizations of the outcomes are outlined in Fig. 4 through box and swarm plots of the indicator function.

Fig. 3: Layer 1 CNN multiclass objective function distribution and swarm diagrams.

Fig. 4: Layer 1 CNN multiclass indicator function distribution and swarm diagrams.

Convergence diagrams of both MCC and error rate, for every considered algorithm, are outlined in Fig. 5, where it is clear that the proposed ISA-ChOA demonstrated superior convergence and outclassed all contenders by establishing the best outcome of the fitness function. The same applies to the convergence of the error rate (indicator), although it was not specified as the goal of tuning.

Fig. 5: Layer 1 CNN multiclass objective and indicator function convergence diagrams.

Table 6 sets forth a comprehensive evaluation of the top-performing CNN architectures for the multiclass classification challenge, tuned with all optimizers encompassed in the comparative evaluation. Even optimized CNNs frequently struggle to properly detect Mirai and Recon attack patterns, which is evident from the provided results. Additionally, the PSO-based CNN fails entirely to converge, with abysmal final accuracy. Several measures should be considered when determining the optimal method, including per-class precision, recall and F1-score. Nevertheless, the greatest accuracy among all observed methods was achieved by the suggested ISA-ChOA CNN model, which outclassed the other approaches with a final overall accuracy of 0.862656.

Table 6 Layer 1 CNN multiclass comprehensive metrics for best tuned models.

The best sets of CNN hyperparameter values determined by each regarded optimizer are put forth in Table 7, to support subsequent replication studies. These values may help other scientists who seek to recreate the experimental outcomes on their own, as these CNN architectures achieved the results shown and discussed in Table 6. Ultimately, additional visualizations in the shape of PR curves and a confusion matrix for the most suitable model (CNN-ISA-ChOA in this scenario) are outlined in Fig. 6.

Table 7 Layer 1 CNN multiclass optimized CNN model parameter selections.
Fig. 6: Layer 1 CNN multiclass confusion matrix and PR diagram for the best-performing CNN-ISA-ChOA optimized model.

The best-performing CNN model architecture and the “tapped” intermediate version are provided visually in Fig. 7, where it can be noted that 32 features were extracted by the CNN.

Fig. 7: The best-performing CNN model, and the “tapped” version of the network where the output layers are intercepted.

Layer 2 CatBoost multiclass experiments

The findings of the conducted simulations of the framework’s second tier, in terms of the CatBoost tuning process with MCC set as the fitness function for the multiclass classification task, are showcased in Table 8. The suggested ISA-ChOA algorithm delivered the highest-ranking results, attaining the best MCC of 0.806747 in the best run, with a mean score of 0.805208. Moreover, ISA-ChOA shared the best worst-run and median metrics with a couple of other optimizers. Conversely, the best stability, indicated by the lowest deviation and variance scores, was attained by the BA and RSA metaheuristics. Despite their respectable stability, however, these algorithms were behind the other optimizers on the remaining metrics.

Table 8 Layer 2 CatBoost multiclass objective function scores over 30 simulations.

The indicator function (set as the classification error) findings are outlined in Table 9. Once more, the supremacy of the introduced ISA-ChOA metaheuristic may be observed, reflected in the best classification error of 0.082222. ISA-ChOA also outclassed the other contending algorithms on the mean score, and shared the best worst-run and median metrics with several other optimizers, while the best stability of the outcomes was exhibited by the baseline ChOA, BA and RSA metaheuristics.

Table 9 Layer 2 CatBoost multiclass indicator function scores over 30 simulations.

Violin and swarm plots of the fitness function (MCC) for the multiclass classification problem are presented in Fig. 8. ISA-ChOA did not attain the highest stability of the results; nevertheless, the contenders that obtained better MCC stability across independent runs did not match the overall superior performance of ISA-ChOA. This is also visible from the swarm plot, showing the diversity of the population within the final round of the best execution. Supplementary visualizations of the outcomes are outlined in Fig. 9 through box and swarm plots of the classification error rate.

Fig. 8: Layer 2 CatBoost multiclass objective score distribution and swarm diagrams.

Fig. 9: Layer 2 CatBoost multiclass indicator score distribution and swarm diagrams.

Convergence diagrams of both MCC and error rate, for every considered algorithm, are outlined in Figs. 10 and 11, where it is clear that the proposed ISA-ChOA demonstrated superior convergence and outclassed all contenders by establishing the best outcome of the fitness function. The same applies to the convergence of the error rate (indicator), although it was not targeted as the goal of tuning.

Fig. 10: Layer 2 CatBoost multiclass objective convergence diagrams.

Fig. 11: Layer 2 CatBoost multiclass indicator convergence diagrams.

Table 10 sets forth a comprehensive analysis of the top-performing CatBoost models for the multiclass classification challenge, tuned with all optimizers encompassed in the comparative evaluation. Even optimized CatBoost models frequently struggle to properly detect Mirai and Recon attack patterns, which is evident from the provided results. Several measures should be considered when determining the optimal method, including per-class precision, recall and F1-score. Nevertheless, the greatest accuracy among all observed methods was achieved by the suggested ISA-ChOA CatBoost model, which outclassed the other approaches with a final overall accuracy of 0.917778.

Table 10 Layer 2 CatBoost multiclass comprehensive metrics for best tuned models.

The best sets of CatBoost hyperparameter values determined by each regarded optimizer are put forth in Table 11, to support possible subsequent replication experiments. These values may help other scientists who seek to recreate the experimental outcomes on their own, as these CatBoost models achieved the results shown and discussed in Table 10. Ultimately, an additional visualization in the shape of a confusion matrix for the most suitable model (CNN-CB-ISA-ChOA in this scenario) is outlined in Fig. 12.

Table 11 Best CatBoost model parameter selections made by each optimizer.
Fig. 12: Layer 2 CatBoost multiclass confusion matrix for the best-performing CNN-CB-ISA-ChOA optimized model.

Layer 2 LightGBM multiclass experiments

The findings of the conducted simulations of the framework’s second tier, in terms of the LightGBM tuning process with MCC set as the fitness function for the multiclass classification task, are showcased in Table 12. The suggested ISA-ChOA algorithm delivered the best outcomes for all observed metrics, attaining an MCC of 0.996207 in the best run and 0.985756 in the worst execution, with mean and median outcomes of 0.991700 and 0.993378, respectively. Moreover, in this experiment ISA-ChOA obtained the best stability as well, indicated by the lowest deviation and variance scores of 0.003788 and 0.000014, respectively.

Table 12 Layer 2 LightGBM multiclass objective function scores over 30 simulations.

The indicator function (set as the classification error) findings are outlined in Table 13. Once again, the supremacy of the proposed ISA-ChOA metaheuristic may be noted, reflected in the best classification error of 0.001653. ISA-ChOA also outclassed the other contending algorithms on the worst, mean and median metrics. The best stability of the outcomes was exhibited by the ISA-ChOA metaheuristic as well.

Table 13 Layer 2 LightGBM multiclass indicator function scores over 30 simulations.

Violin and swarm plots of the fitness function (MCC) for the multiclass classification problem are presented in Fig. 13. ISA-ChOA established the highest stability of the results, while the other contenders that obtained good MCC stability across independent runs did not match the overall superior performance of ISA-ChOA. This is also visible from the swarm plot, showing the diversity of the population within the final round of the best execution. Supplementary visualizations of the outcomes are outlined in Fig. 14 through box and swarm plots of the classification error rate.

Fig. 13: Layer 2 LightGBM multiclass objective score distribution and swarm diagrams.

Fig. 14: Layer 2 LightGBM multiclass indicator score distribution and swarm diagrams.

Convergence diagrams of both MCC and error rate, for every considered algorithm, are outlined in Figs. 15 and 16, where it is clear that the proposed ISA-ChOA demonstrated superior convergence and outclassed all contenders by establishing the best outcome of the fitness function. The same applies to the convergence of the error rate (indicator), although it was not targeted as the goal of tuning.

Fig. 15: Layer 2 LightGBM multiclass objective convergence diagrams.

Fig. 16: Layer 2 LightGBM multiclass indicator convergence diagrams.

Table 14 sets forth a comprehensive analysis of the top-performing LightGBM models for the multiclass classification challenge, tuned with all optimizers encompassed in the comparative evaluation. Even optimized models can struggle to properly detect Mirai and Recon attack patterns, as the provided results indicate. Several measures should be considered when determining the optimal method, including per-class precision, recall and F1-score. Nevertheless, the greatest accuracy among all observed methods was achieved by the suggested ISA-ChOA LightGBM model, which outclassed the other approaches with a final overall accuracy of 0.998346.

Table 14 Layer 2 LightGBM multiclass comprehensive metrics for best tuned models.

The best sets of LightGBM hyperparameter values determined by each regarded optimizer are put forth in Table 15, to support possible subsequent replication experiments. These values may help other scientists who seek to recreate the experimental outcomes on their own, as these LightGBM models achieved the results shown and discussed in Table 14. Ultimately, an additional visualization in the shape of a confusion matrix for the most suitable model (CNN-LGBM-ISA-ChOA in this scenario) is outlined in Fig. 17.

Table 15 Best LightGBM model parameter selections made by each optimizer.
Fig. 17: Layer 2 LightGBM multiclass confusion matrix for the best-performing CNN-LGBM-ISA-ChOA optimized model.

Comparison with state-of-the-art classification models

To demonstrate the improvements attained by utilizing the introduced optimization framework, a comparative analysis with several baseline classifiers is included. Commonly used as well as relatively recent models have all been evaluated, including decision trees82, random forests83, KNN84, XGBoost85, AdaBoost86, baseline CatBoost46 and LightGBM models, as well as a simple multilayer perceptron (MLP)87. The results of this comparative analysis are provided in detail in Table 16. The introduced hybrid framework shows clear advantages over the contemporary baseline classifiers.

Table 16 Detailed metrics comparison between the proposed framework models and state-of-the-art baseline classifiers.

Statistical analysis and interpretation of the best models

When conducting comparative simulations between optimizers, several angles need to be considered before drawing a conclusion. Comparisons in terms of objective function scores alone are often insufficient to reach a definitive conclusion; therefore, statistical evaluations are conducted to establish whether an attained improvement is significant. Two approaches can be taken when comparing metaheuristics: parametric and non-parametric testing. For parametric tests to be safely applied, a set of criteria needs to be fulfilled88. These include the independence of runs, a condition fulfilled by conducting individual optimizations using independent random seeds; homoscedasticity, checked by carrying out Levene’s test89, which attained a p-value of 0.62 for the conducted simulations, so this condition can be considered fulfilled as well; and normality of the attained scores, which must be confirmed using the Shapiro-Wilk test90, with p-values presented in Table 17. With the p-values not meeting the established criteria, normality cannot be confirmed, and the use of parametric tests cannot be considered justified.

Table 17 Shapiro-Wilk scores for the classification experiments, used for normality condition evaluation.

With the required normality condition not fulfilled, non-parametric testing is applied to establish a further comparison. The Wilcoxon signed-rank test91 is applied, comparing the ISA-ChOA algorithm with the other algorithms included in the comparative simulations; the p-value scores are presented in Table 18. As none of the p-values exceed the significance threshold of \(\alpha = 0.05\), the outcomes attained in the comparative analysis can be considered statistically significant.

Table 18 Wilcoxon signed-rank test scores for the classification experiments.
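For reference, the statistical pipeline described above can be reproduced with scipy, as sketched below; the per-run score arrays are random placeholders, not the actual experimental results.

```python
# Hedged sketch of the statistical tests: Levene for homoscedasticity,
# Shapiro-Wilk for normality, Wilcoxon signed-rank for paired comparison.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
isa_choa = rng.normal(0.992, 0.004, 30)    # placeholder per-run MCC scores
contender = rng.normal(0.985, 0.006, 30)   # placeholder contender scores

print("Levene p =", stats.levene(isa_choa, contender).pvalue)
print("Shapiro-Wilk p =", stats.shapiro(isa_choa).pvalue)
res = stats.wilcoxon(isa_choa, contender)  # paired, non-parametric
print("Wilcoxon p =", res.pvalue,
      "(significant)" if res.pvalue < 0.05 else "(not significant)")
```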

Interpretation of the best performing models

In modern AI research, model predictions are without a doubt important. However, the contributing factors that allow a model to determine the class of a certain sample can also provide valuable feedback on the model’s decisions. Feature importance can help highlight hidden biases in the data, as well as help reduce the set of input features collected in future research. While models are often treated by researchers as black boxes, leveraging advanced model interpretation tools makes it possible to compute feature importances as well as their impact on classification.

Explainable AI (XAI) methods aim to make ML models significantly more transparent, interpretable, and trustworthy. XAI techniques help stakeholders comprehend how models reach decisions, increasing trust and aiding regulatory compliance along with ethical AI considerations, which is vital in security applications92. XAI can help users understand which features are critical for predictions, enabling domain experts to validate the model’s logic and performance. Finally, this process may aid in feature engineering by highlighting useful attributes and de-emphasizing redundant ones.

A notable contribution in terms of feature importance is the development and application of Shapley additive explanations (SHAP)93. Based on game theory concepts, SHAP analysis can help highlight feature importance at the global as well as the local level. SHAP interpretations on a global scale are presented in Fig. 18, while per-class interpretations are provided in swarm diagrams for each of the 7 classes in Fig. 19. Additionally, it is important to note that the SHAP analysis did not indicate any significant bias toward specific classes associated with certain features.
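A minimal sketch of such an analysis is shown below; the tiny synthetic dataset and model are stand-ins for the tuned tier-two LightGBM and the tapped CNN features, and the list-style multiclass output assumes an older shap API.

```python
# Hedged sketch: SHAP analysis of a tree ensemble classifier; the data and
# model below stand in for the tuned LightGBM of tier two.
import lightgbm as lgb
import shap
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=32, n_informative=10,
                           n_classes=8, random_state=0)
clf = lgb.LGBMClassifier(n_estimators=50).fit(X, y)

explainer = shap.TreeExplainer(clf)        # suited to tree-based models
shap_values = explainer.shap_values(X)     # one array per class (older API)

shap.summary_plot(shap_values, X)          # global feature importance
shap.summary_plot(shap_values[0], X)       # swarm diagram for a single class
```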

Fig. 18: Best-performing model feature importance diagram.

Fig. 19: Best-performing model per-class feature importance swarm diagrams.

Conclusion and future work

Integrating the Metaverse with IoT is crucial, since IoT devices deliver real-time data and enable smooth connection between the physical and virtual realms. Nevertheless, since attacks on IoT systems have become increasingly sophisticated, conventional security systems have struggled to keep up. Consequently, adaptive AI-driven methods were investigated to more appropriately tackle the challenges of today’s IoT infrastructure and create a safe environment for users. Effective AI models must manage intricate data correlations and remain adaptable to evolving conditions. Achieving optimal results also requires careful choice of algorithms and hyperparameter tuning. This study proposed a two-tier hybrid architecture that combines a CNN with the sophisticated ML classifiers CatBoost and LightGBM. Metaheuristic techniques were employed to enhance performance, optimize the models, and refine parameter selection. Utilizing a realistic dataset, the framework was evaluated through comparative analysis, targeting multi-class classification to identify various types of attacks against IoT systems. A custom-altered optimizer was developed particularly for this study, resulting in the best-performing models, which attained a top accuracy of 99.83% for multi-class classification. Afterwards, a rigorous statistical analysis outlined significant enhancements in comparison to the baseline metaheuristic and the other contending optimizers. Lastly, the explainable AI method SHAP was employed on the best-performing model to understand the significance of each feature and the model’s decision-making process.

The methodology introduced in this study yielded several benefits, notably enhanced optimizer performance over contemporary algorithms. The framework’s two-tier architecture outperformed baseline CNNs while keeping computational demands within acceptable levels. For practical implementations, the suggested system might be deployed on IoT nodes for traffic management, request processing, and mitigation of network-wide assaults. In the context of the Metaverse, this approach could improve general device safety, promoting trust and reinforcing the integration of virtual and physical domains. This advanced system could also support real-time attack detection in IoT by processing high-dimensional streaming data, identifying anomalies, and mitigating threats promptly.

Although the showcased study achieved promising results, some limitations remain. The comparative evaluations included just a small selection of optimization algorithms, and optimizations were executed with relatively small population sizes and numbers of iterations. Thus, future work aims to tackle these constraints if supplementary computing resources become available. Expanding the pool of optimization algorithms and conducting evaluations with larger population sizes and iteration counts could provide more robust insights and even stronger conclusions. Additionally, the altered metaheuristic described here could be further applied to address other pressing challenges, enhancing performance and equipping scientists with improved tools for hyperparameter optimization of ML models. The application of the developed methods in real-time or streaming environments, where data evolves continuously, represents another promising avenue for development.