Introduction

The adoption of machine learning (ML) and deep learning (DL) techniques has significantly advanced various domains, including information security, natural language processing, and computer vision. These approaches have enabled the development of applications with exceptional learning capabilities, particularly in classification and regression tasks, achieving demonstrably high accuracy and robust performance. Notable examples include financial systems1, healthcare applications2, and cybersecurity solutions3. In the context of cybersecurity, deep learning-based network intrusion detection systems (NIDS) have emerged as a powerful tool for safeguarding network traffic against security breaches. These systems can identify malicious activities within large and diverse traffic flows with remarkable precision. The unique strengths of deep learning, such as automated feature extraction, non-convex optimization, and end-to-end learning frameworks, enable it to outperform traditional signature-based and conventional machine learning-based approaches in terms of both detection accuracy and scalability4. However, despite their potential, deep neural networks (DNNs) are vulnerable to adversarial attacks. These attacks exploit the inherent weaknesses of DNNs, compromising their reliability and effectiveness. Consequently, while deep learning holds great promise for developing advanced intrusion detection systems, the susceptibility of these systems to adversarial threats poses a significant challenge that must be addressed to ensure their robustness and security5.

Adversarial attacks aim to deceive machine learning detectors by introducing subtle, carefully crafted perturbations to input samples, known as adversarial examples, during the deployment phase of deep neural networks (DNNs). These adversarial instances exploit vulnerabilities in the model, leading to misclassifications while remaining imperceptible to human observers. The existence of adversarial examples can be attributed to two primary factors6. First, there is often a discrepancy between the distribution of the training data and the distribution of real-world data. This mismatch creates a gap between the learned decision boundary of the model and the optimal decision boundary, rendering the model susceptible to adversarial manipulations. Second, the inherent linearity of neural networks, despite their nonlinear activation functions, contributes to their vulnerability. This linearity allows adversaries to generate adversarial examples near the manifold of the training data using gradient-based techniques, such as the fast gradient sign method (FGSM). These factors collectively underscore the challenges in developing robust machine learning models capable of withstanding adversarial threats16.

Deep learning-based network intrusion detection systems (NIDS) are threatened by adversarial attacks that compromise their performance. The attacker crafts adversarial examples by introducing perturbations into the training or testing inputs of the DL model, causing it to misclassify samples into specific classes (targeted attack) or simply to misclassify them without regard to a particular output class (untargeted attack). Several mitigation approaches have been proposed to defend against such attacks; common ones include adversarial training7, adversarial detection, feature transformation, label and spatial smoothing, adversarial perturbation elimination, and ensemble defense8. The purpose of this research is to assess the effect of adversarial evasion attacks introduced at test time on the performance of the intrusion detection classifier, and to examine the impact of assembling multiple adversarial defense strategies, namely adversarial training, Gaussian augmentation, and label smoothing, applied at training time to promote robustness of deep learning-based intrusion detection decisions. As an additional preprocessing defense, we investigate input denoising with an unsupervised deep denoising autoencoder that removes adversarial perturbations from input samples before they are fed into the IDS classifier at testing time, a proactive defense intended to boost the IDS classifier performance. Additionally, using an ensemble defense rather than an individual defense mechanism offers several advantages stemming from the diversity, robustness, and complementary strengths of multiple defense strategies working together. All the defense mechanisms are evaluated in semi-white box and black box settings. In a black box setting, the attacker has no prior knowledge of the trained model architecture, initialization parameters, or the defense mechanism, and can only access the dataset used. In contrast, in the white box setting the attacker has full knowledge of the trained model and the dataset, which is an unrealistic assumption. In a semi-white box setting, the attacker has only partial knowledge of the trained model architecture without the parameter initialization, which is considered more realistic. To evaluate the efficacy of the proposed ensemble approach, we implement a baseline IDS classifier and measure the impact of the generated adversarial examples on its performance before and after applying the defense approaches.

The proposed work has made the following significant contributions, as outlined below:

  • Building a multi-module adversarial defense ensemble technique in the context of intrusion detection systems. The proposed ensemble aggregates multi-source adversarial training, Gaussian augmentation retraining, adversarial denoising, and label smoothing, which significantly increase the IDS classifier's robustness against adversarial threats while maintaining the intrusion detector's accuracy on clean data.

  • Analyzing and assessing the effectiveness of the DNN-based IDS under adversarial-free settings, different state-of-the-art evasion attack scenarios, and adversarial mitigation strategies using the CIC-IDS2017 and CIC-IDS2018 benchmark datasets.

  • Allowing an effective one-time retraining and evaluation of the multi-class IDS detector, instead of retraining separately for each intrusion class or reducing the IDS detection task to binary classification.

  • Fulfilling the constraint of preserving the functionality of traffic features during adversarial generation by leveraging Extremely Randomized Trees for robust non-functional feature selection.

  • Providing a thorough theoretical foundation and technical understanding of the suggested ensemble defense strategy, with its two major stages at training and testing time, through detailed step-by-step implementation procedures and algorithms.

  • Highlighting the key benefits of using an ensemble defense mechanism rather than relying on individual defenses: enhanced robustness against adversarial attacks and improved detection accuracy through the complementary strengths of the ensemble members.

The remainder of this paper is organized as follows. Section 2 (Background) provides a brief background on machine learning-based intrusion detection, adversarial attack methods, and the adversarial threat model. Recent literature is discussed in Section 3 (Related articles). Section 4 (The proposed approach) describes the proposed two-phase ensemble defense approach. The experimental analysis and results are covered in Section 5 (Experimental results and evaluation). The results obtained and their comparison with existing approaches are discussed in Section 6 (Discussion and comparison with related works). Section 7 (Conclusion and future work) concludes the proposed work and provides recommendations for future work.

Background

Intrusion detection systems

An intrusion detection system is a mechanism that attempts to discover or detect abnormal access to computers or intrusions, encompassing unauthorized access, malicious activities, and cyber-attacks3. Current IDS face the problems of low detection rates, and hence slow response, in addition to high false positive and false negative rates due to the dynamic and complex nature of cyber threats. Deep neural networks (DNN)4 have become a powerful approach in cybersecurity IDS, achieving superior attack detection and prevention rates with lower false positive rates than many classical ML approaches.

Adversarial threat model

Although machine learning-based intrusion detection system (IDS) models demonstrate effective performance in classifying benign and malicious input instances while reducing false alarms, they remain vulnerable to certain types of attacks, particularly adversarial attacks. Adversarial attacks involve the introduction of subtle, often imperceptible perturbations into input samples, which can deceive the trained models into producing incorrect classifications or invalid responses15. These adversarial instances are carefully crafted to exploit the vulnerabilities of machine learning models, undermining their reliability and effectiveness. The optimization objective of adversarial attacks can be formally represented as shown in Eq. (1):

$$\max_{\left\|x_{adv}-x\right\|_{p}\le \varepsilon}\left|c\left(x_{adv}\right)-c\left(x\right)\right|$$
(1)

Here, p denotes the p-norm distance metric between the adversarial sample xadv and the original input x. The perturbation must remain within a predefined constraint ϵ, ensuring that the adversarial example xadv does not deviate significantly from x. This constraint is crucial to maintain the adversarial example within the bounds of its original class while maximizing the classification discrepancy between xadv and x16. In the context of network intrusion detection systems (NIDS), adversarial perturbations must also preserve the functionality of network traffic. Therefore, ϵ must be carefully chosen to be sufficiently small to avoid disrupting the traffic flow, yet large enough to induce misclassification by the IDS model. This balance ensures that the adversarial example remains effective without compromising the integrity of the network traffic.

Adversarial attacks in intrusion detection systems (IDS) can be categorized into two primary types based on their timing and objectives. Poisoning attacks occur during the training phase, where adversarial instances corrupt the training data to compromise the model’s learning process. In contrast, evasion attacks occur during the testing or validation phase, where adversarial perturbations are introduced to input samples to deceive the trained model into making incorrect predictions. Despite the growing interest in adversarial attacks and defense mechanisms for intrusion detection models, this area remains in its early stages of research, with significant opportunities for further exploration and development18. This paper focuses on evaluating the impact of various gradient-based evasion attacks on the performance of intrusion detection models. Specifically, we examine the Fast Gradient Sign Method (FGSM), Basic Iterative Method (BIM), Jacobian-based Saliency Map Attack (JSMA), DeepFool (DF), and Projected Gradient Descent (PGD) as well as the Carlini & Wagner (C&W) adversarial attack. Furthermore, we investigate a comprehensive ensemble of defense strategies designed to mitigate the effects of these attacks. The proposed defense ensemble integrates adversarial training, label smoothing, Gaussian augmentation, and adversarial denoising to enhance the robustness of the intrusion detection classifier. Adversarial denoising, a preprocessing defense mechanism, relies on transforming input samples to remove adversarial perturbations before they are fed into the model14. In this study, we employ a denoising sparse autoencoder to achieve this objective, ensuring that the input data is cleansed of adversarial noise while preserving its functional integrity.

Attack strategies

Adversarial example generation relies on various adversarial attack algorithms, which include the fast gradient sign method (FGSM), basic iterative method (BIM), DeepFool (DF), Jacobian-based saliency map attack (JSMA), and projected gradient descent (PGD), as well as the Carlini & Wagner (C&W) method.

Fast gradient sign method (FGSM)

FGSM9 computes the gradient of a loss function (such as categorical cross-entropy or mean-squared error) with respect to the input instance. An adversarial instance is then created by taking a step that maximizes the loss: the sign of the gradient is multiplied by a small adjustment Ɛ so that the perturbation remains indistinguishable. The perturbation equation is given by Eq. (2).

$$x_{adv}=x+\epsilon \cdot \text{sign}\left(\nabla_{x} J\left(\theta, x, y\right)\right)$$
(2)

In Eq. (2), xadv represents the perturbed variant of the input sample x with predicted class label y; θ stands for the network weights; and J(θ, x, y) is the loss function used to train the neural network. To perturb the benign data sample, the sign of the gradient of the loss function is scaled by ϵ and added to the input.
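
To make the single-step update concrete, the following is a minimal sketch of Eq. (2) in TensorFlow; the model, loss object, and the value of ϵ are illustrative placeholders rather than the exact configuration used in this work.

```python
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()

def fgsm_perturb(model, x, y, eps=0.1):
    """Single-step FGSM: x_adv = x + eps * sign(grad_x J(theta, x, y))."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x, training=False))
    grad = tape.gradient(loss, x)
    x_adv = x + eps * tf.sign(grad)
    # Keep the perturbed features inside the valid (scaled) range [0, 1].
    return tf.clip_by_value(x_adv, 0.0, 1.0)
```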

Basic iterative method (BIM)

The Basic Iterative Method (BIM) is an iterative extension of the Fast Gradient Sign Method (FGSM), designed to perform a more refined optimization of adversarial perturbations. Unlike FGSM, which applies a single-step perturbation, BIM iteratively applies FGSM with a small step size over multiple iterations. Starting from the original input, BIM incrementally adjusts the perturbation at each step, ensuring that the adversarial example remains within a predefined valid range (bound) by clipping the perturbation after each iteration. This iterative process allows BIM to generate more effective adversarial examples with subtle, carefully controlled modifications, making it a stronger attack compared to the single-step FGSM10.

The projected gradient descent (PGD)

PGD is a more general and powerful iterative attack method that can be used for both targeted and untargeted attacks. It is considered a universal first-order adversary because it is highly effective against many defenses. Like BIM, PGD iteratively applies small perturbations and projects the result back into the valid input space; the projection step ensures that the adversarial example stays within the ϵ-ball around the original input. It is often stronger than BIM because it starts from a random point within the ϵ-ball, making it harder to defend against.
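
A sketch of this iterative attack loop is given below: with `random_start=False` it reduces to BIM, while a random start inside the ϵ-ball gives the PGD behavior. The step size, budget, and iteration count are illustrative.

```python
import tensorflow as tf

def pgd_attack(model, x, y, eps=0.1, alpha=0.01, steps=40, random_start=True):
    """Iterative L-infinity attack: small gradient-sign steps projected back
    into the eps-ball around the original input (BIM when random_start=False)."""
    loss_fn = tf.keras.losses.CategoricalCrossentropy()
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    x_adv = x + (tf.random.uniform(tf.shape(x), -eps, eps) if random_start else 0.0)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = loss_fn(y, model(x_adv, training=False))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + alpha * tf.sign(grad)
        # Projection: clip to the eps-ball and to the valid feature range.
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)
    return x_adv
```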

The deepfool (DF)

DeepFool is a non-targeted attack introduced by Moosavi-Dezfooli et al.11. It is one of the attack methods that applies the minimal perturbation needed for misclassification under the L2 distance metric. The method performs iterative steps in the adversarial direction of the gradient provided by a locally linear approximation of the classifier until the decision hyperplane has been crossed.

Jacobian-based saliency map attack (JSMA)

Papernot et al.12 designed an efficient adversarial attack method called JSMA in 2016. The algorithm selects the features with the greatest impact on the output target class, called target features, using the forward derivative and adversarial saliency maps. It then obtains adversarial examples xadv by modifying the target features. If the adversarial example is unsuccessful in fooling the target DNN, the adversarial saliency map of xadv is recomputed until the attack is successful.

Carlini & Wagner (C&W) adversarial attack

The C&W attack is a powerful optimization-based method for generating adversarial examples introduced by Nicholas Carlini and David Wagner30. It is widely used to evaluate the robustness of machine learning models and to develop defenses against adversarial attacks. By formulating the problem as an optimization task, the C&W attack can generate highly effective adversarial examples with minimal perturbation.

The C&W attack solves the following optimization problem:

$$\min_{\delta}\left\|\delta\right\|_{p}+c\cdot f\left(x+\delta\right)$$
(3)

Subject to: \(x+\delta \in \left[0,1\right]^{n}\)

where δ is the perturbation added to the input x; ‖δ‖p is the Lp norm of the perturbation (e.g., L2, L∞) used to measure its magnitude; c is a hyperparameter that balances the trade-off between minimizing the perturbation and maximizing the effectiveness of the attack; and f(x + δ) is a loss function designed to ensure that the adversarial example x + δ is misclassified by the model.

The objective is typically a combination of two conflicting goals:

  • Minimizing the perturbation: to ensure that the adversarial example remains as similar as possible to the original input.

  • Maximizing the misclassification confidence: to guarantee that the perturbed input is misclassified by the target model; this term is weighted by c.

Deep autoencoder

A deep autoencoder14 is a type of neural network architecture trained in an unsupervised way to learn the manifold of the training sample distribution. The goal is to create an encoded representation of the input by extracting its most relevant aspects, called the latent space representation, and then to decompress it to recover the original input. The autoencoder is composed of two neural networks: an encoder that compresses the information of the input into a lower-dimensional latent code, and a decoder that decompresses the encoded representation back to the original input. In our work, we use a sparse autoencoder, which learns better representations and outperforms the plain autoencoder thanks to L1 sparsity regularization that makes its activations sparser.
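
A minimal Keras sketch of such a sparse autoencoder is shown below; the latent width and L1 penalty weight are illustrative choices, not the exact configuration used later in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

n_features = 58  # non-functional traffic features kept after feature selection

def build_sparse_autoencoder(latent_dim=32, l1_penalty=1e-5):
    inputs = layers.Input(shape=(n_features,))
    # Encoder: compress to a sparse latent code (L1 activity regularization).
    encoded = layers.Dense(latent_dim, activation="relu",
                           activity_regularizer=regularizers.l1(l1_penalty))(inputs)
    # Decoder: reconstruct the original input from the latent code.
    decoded = layers.Dense(n_features, activation="sigmoid")(encoded)
    autoencoder = tf.keras.Model(inputs, decoded)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder
```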

Related articles

IDS are implemented to find illicit activity and policy infractions in a system or network. Techniques employed by attackers to deceive or bypass detection systems are known as adversarial attacks. This implies creating perturbed input data for an intrusion detection system (IDS) that deceives the system into not detecting an intrusion. Adversarial threats against IDS encompass evasion, poisoning, and exploratory attacks. To defend against such attacks in the IDS domain, researchers develop several defense mechanisms based on data preprocessing, robust machine learning algorithms, feature engineering and selection, anomaly detection, regularization techniques, and adversarial example detection. The defense strategies are applied to different IDS architectures such as RNN, DNN, CNN, AE, LSTM, DRL, and GANs. Most research papers in this field reduce the classification task to binary classification rather than multi-class classification. Evaluation is performed with various metrics such as detection accuracy, false positive/negative rates, and robustness metrics on different benchmark datasets27.

A two-phase defense strategy against the most potent optimization-based adversarial attack, Carlini & Wagner (C&W), has been suggested by Roshan, MS Khushnaseeb et al.13. The training phase and the testing phase are the two defense phases. They employed Gaussian Data Augmentation (GDA) to modify the adversarial training strategy during the training phase. The resulting adversarial list was subjected to the Feature Squeezing (FS) approach during the testing phase and then sent to the robust NIDS model for final classification. The suggested two-phase defense strategy is successfully assessed in terms of classification reports and confusion matrices using the recent CIC-DDoS2019 dataset.

Alper Sarıkaya et al.14 proposed a technique that generates adversarial attack data using generative adversarial networks. They then present RAIDS, a robust intrusion detection system (IDS) model built to withstand adversarial attacks. The reconstruction error of an autoencoder is utilized in RAIDS as a prediction value for a classifier. Additionally, several feature sets are generated and used to train baseline machine learning classifiers, preventing the attacker from guessing the feature set. The output from two autoencoders and an ensemble of baseline machine learning classifiers is then used to train a LightGBM classifier. The findings demonstrate that, in the absence of adversarial training, the suggested robust model can boost overall accuracy by at least 13.2% and the F1-score by more than 110% against adversarial attacks.

Using adversarial training, Matheus P. Novaes et al.15 suggested a detection and protection system in an SDN environment. The Generative Adversarial Network (GAN) framework was utilized to discover DDoS attacks, and adversarial training was employed to make the system more resilient against adversarial attacks. Their system contains well-defined modules that enable continuous traffic monitoring by analyzing the IP traffic flow, allowing the anomaly IDS to act in near-real time. They ran tests on two different situations, using both synthesized data and the publicly available CIC-DDoS2019 dataset. In comparison to other techniques, experimental results revealed the system's effectiveness in detecting the newest unknown typical DDoS attack types. They compared the proposed system's results to those of various neural network architectures for detecting DDoS in SDN published in the literature, such as CNN, LSTM, and MLP.

Def-IDS, an ensemble defense method specifically created for NIDS, was proposed by Wang et al.16 to thwart both known and unseen adversarial attacks. It is a two-module training technique that combines multi-source adversarial retraining with multi-class generative adversarial networks to enhance model robustness while preserving detection accuracy on undisturbed samples. They assess the mechanism using the CSE-CIC-IDS2018 dataset and contrast its output with that of three other defense strategies. The outcomes show that Def-IDS achieves superior accuracy, precision, recall, and F1 score when detecting different hostile attacks.

MACGAN is an anomaly detector attack framework introduced by Zhong et al.17. It is composed of two elements. The first element is used to manually analyze the attack feature fields. Then, in the second element, the GAN is used to deceive the anomaly detection process. To evaluate their approach, the Kitsune2018 and CIC-IDS2017 datasets were utilized. The capability to evade the latest common machine learning classifiers has been demonstrated in experiments. This considerably aids security researchers in improving the detector's stability.

R. Abou Khamis et al.18 employ a min-max strategy for analyzing and evaluating the robustness of various forms of DNN-based IDS against a variety of adversarial threats including FGSM, C&W, BIM, PGD, and DeepFool. They constructed 36 structures to demonstrate that the adversarial training-based min-max strategy is a reliable protection in deep learning-based IDS.

AMBER (Autoencoder and Model-Based Elimination of features using Relevance and Redundancy scores) was introduced by Sharan Ramjee and Aly El Gamal19. It uses two models, namely a ranker model and an autoencoder. The latter aims to eject correlated features, while the former concentrates on ejecting features that are not essential for classification tasks. It strikes a compromise between filter and wrapper methods, which offer efficient computation and high performance, respectively.

In their study, Prasad and Chandra28 proposed the VMFCVD framework, an optimized machine learning-based approach for effectively detecting and mitigating volumetric DDoS attacks. The authors emphasized the importance of preprocessing network traffic data to improve the performance of their machine-learning models. Key preprocessing steps included: (1) data cleaning, which involves handling missing values, removing duplicates, and filtering irrelevant traffic; (2) feature extraction, which derives statistical, flow-based, and time-based features from raw network traffic data; (3) feature selection based on correlation analysis and machine learning techniques (e.g., Recursive Feature Elimination) to identify the most relevant features for DDoS detection; (4) data normalization by scaling features using min-max normalization or z-score standardization; (5) handling imbalanced data by addressing class imbalance through oversampling techniques like SMOTE; (6) data splitting by dividing the dataset into training, validation, and test sets while maintaining class distribution; and (7) data encoding by converting categorical features into numerical formats using one-hot or label encoding. These preprocessing steps ensured that the data was clean, normalized, and suitable for training machine learning models, ultimately leading to high detection accuracy and low false positive rates for volumetric DDoS attacks.

In their work, Prasad and Chandra29 introduced Bot Defender, a collaborative framework for detecting and mitigating botnet attacks using machine learning and network traffic analysis. The authors highlighted the critical role of the preprocessing stage in preparing network traffic data for effective machine learning model training. The main preprocessing steps included: data cleaning, which involves handling missing values, removing duplicates, and filtering out irrelevant or benign traffic; feature extraction, accomplished by deriving statistical, flow-based, and behavioral features (e.g., packet counts, flow duration, connection frequency) from raw network traffic data; feature selection, using correlation analysis and machine learning-based techniques (e.g., Recursive Feature Elimination) to identify the most relevant features for botnet detection; data normalization, scaling features using techniques like min-max normalization or z-score standardization; handling imbalanced data, addressing class imbalance through oversampling techniques such as SMOTE or under-sampling methods; data splitting, dividing the dataset into training, validation, and test sets while maintaining class distribution; data encoding, converting categorical features (e.g., protocol type) into numerical formats using one-hot encoding or label encoding; and data transformation, reshaping data into a format suitable for machine learning models, including creating time-series windows for sequential analysis. These preprocessing steps ensured that the data was clean, normalized, and ready for training machine learning models, resulting in high detection accuracy and robustness against botnet attacks.

The proposed approach

This section explains the proposed ensemble defense approach to mitigate and defend against multi-source adversarial examples. It presents the theoretical foundation of the ensemble defense model components, including a formal derivation of the optimization function of each defense method and the overall optimization function of the ensemble defense model. The suggested defense approach is implemented in two main stages. First, at training time, the DNN-based IDS classifier is retrained with adversarial examples (adversarial training), with data samples augmented with Gaussian noise (Gaussian augmentation), and with data samples carrying smoothed labels (label smoothing). The purpose of this stage is to boost the robustness of the target IDS classifier against adversarial threats. Second, at inference time, the input samples are denoised or purified using a sparse denoising autoencoder to mitigate the effect of adversarial examples. We begin by building and training a deep learning-based intrusion detection system on clean samples of the CIC-IDS2017 dataset20. Next, both before and after building the adversarial ensemble defense, the deep learning-based IDS's robustness to adversarial examples is investigated and confirmed. We generate multi-source adversarial samples based on a substitution-trained classifier and evaluate their impact on the DNN-based IDS classifier without any adversarial defense. The following techniques were used to create adversarial samples for this study: the Fast Gradient Sign Method (FGSM), DeepFool (DF), Basic Iterative Method (BIM), Jacobian-based Saliency Map Attack (JSMA), Projected Gradient Descent (PGD), and Carlini & Wagner (C&W).

This research investigates the following key propositions: Adversarial attacks are executed during the inference phase of the deep neural network (DNN)-based intrusion detection system (IDS) model, specifically focusing on evasion attacks. We assume that the attacker has limited or no knowledge of the IDS model’s internal architecture and parameters, thereby considering both semi-white box and black box attack scenarios. Furthermore, the adversarial attacks under examination are untargeted, meaning the objective is not to misclassify input samples into a specific target class but rather to deceive the DNN-based IDS classifier into producing incorrect predictions. Because of these attacks, we anticipate a degradation in the performance of the DNN-based IDS classifier, as evidenced by a decline in various performance metrics.

Figure 1 illustrates the framework of this study, which consists of nine sequential stages. The process begins with the collection of the CIC-IDS dataset, followed by data preprocessing to clean and prepare the data for analysis. Next, the Extremely Randomized Trees (ERTs) feature selection technique is applied to identify and retain the most relevant and functional traffic features while excluding redundant or irrelevant ones. The dataset is then split into training and testing sets to facilitate model development and evaluation. Subsequently, the intrusion detection system (IDS) model is trained, and its performance is evaluated on clean data. Adversarial examples are then generated to simulate potential attack scenarios, and the IDS model is re-evaluated under these adversarial conditions to assess its vulnerability. To mitigate the impact of adversarial attacks, the proposed adversarial ensemble defense is applied, incorporating multiple defense strategies to enhance model robustness. Finally, the IDS model is re-evaluated to measure the effectiveness of the defense mechanisms in improving its performance under adversarial settings.

To evaluate the effectiveness of each defense technique—adversarial training, Gaussian augmentation, label smoothing, and denoising autoencoder—and to assess their combined impact within the proposed framework, we conduct an ablation study. This study involves training and testing the intrusion detection system (IDS) model with each defense technique applied individually, followed by a comprehensive evaluation of their performance. The results are then compared against the performance of the ensemble approach, which integrates all defense mechanisms. The primary objective of this ablation study is twofold: (1) to determine the standalone effectiveness of each defense technique in mitigating adversarial threats, and (2) to demonstrate the synergistic advantages of combining these techniques into a unified defense mechanism, thereby enhancing the overall robustness and resilience of the IDS model.

Problem formulation

Let \(\mathcal{D}=\{\left(x_{i},y_{i}\right)\}_{i=1}^{N}\) be the training dataset, where xi is the input and yi is the corresponding label; let fθ be the model with parameters θ; and let L(fθ(x), y) be the loss function (e.g., cross-entropy loss). The goal is to train a robust model fθ that minimizes the classification loss on both clean and adversarial inputs, so the threat model is formalized as:

$$x_{\text{adv}}=x+\delta,\qquad \delta=\arg\max_{\left\|\delta\right\|\le \epsilon}\mathcal{L}\left(f_{\theta}\left(x+\delta\right),y\right)$$

where δ is the adversarial perturbation and ϵ is the perturbation bound; the remaining symbols are as defined above.

Theoretical foundation for ensemble defense mechanisms

Adversarial training, Gaussian augmentation, label smoothing, and denoising autoencoders (DAEs) are complementary defense mechanisms that enhance the robustness of machine learning models against adversarial attacks. Each technique addresses different aspects of the adversarial vulnerability problem, and their combination creates a more robust and generalized defense. Below is a theoretical foundation explaining their synergy and contribution to adversarial robustness of the model, along with formal derivation and mathematical modeling of the ensemble defense’s optimization function.

Adversarial training involves augmenting the training data with adversarial examples xadv, which are generated by perturbing the input data to maximize the model's loss. This forces the model to learn robust features that are less sensitive to small perturbations; as a contribution to robustness, the model learns to withstand worst-case perturbations. The adversarial training objective function can be written as:

$$\min_{\theta}\,E_{\left(x,y\right)\sim \mathbb{D}}\left[\max_{\left\|\delta\right\|\le \epsilon}\mathcal{L}\left(f_{\theta}\left(x+\delta\right),y\right)\right]$$

where:

fθ is the model with parameters θ, δ is the adversarial perturbation, ϵ is the perturbation bound, and L is the loss function (e.g., cross-entropy loss). The min-max optimization ensures the model is robust to worst-case perturbations.
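
As an illustration of this min-max objective, the sketch below augments each batch with single-step FGSM examples (reusing the `fgsm_perturb` helper sketched earlier) before a standard gradient update; the paper's adversarial training instead uses the multi-source adversarial samples described later.

```python
import tensorflow as tf

def adversarial_training_step(model, optimizer, x, y, eps=0.1):
    """One training step on a mixed batch of clean and FGSM-perturbed samples,
    approximating the inner maximization of the min-max objective."""
    loss_fn = tf.keras.losses.CategoricalCrossentropy()
    x_adv = fgsm_perturb(model, x, y, eps=eps)          # inner max (approximate)
    x_mix = tf.concat([x, x_adv], axis=0)
    y_mix = tf.concat([y, y], axis=0)
    with tf.GradientTape() as tape:
        loss = loss_fn(y_mix, model(x_mix, training=True))  # outer min
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```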

Gaussian augmentation adds random Gaussian noise \(\:\:\eta\:\:\:\)to the input data during training. This helps the model generalize better and become more robust to small input variations.

For a given input x, the augmented input \(\tilde{x}\) is: \(\tilde{x}=x+\eta,\qquad \eta\sim \mathcal{N}\left(0,\sigma^{2}I\right)\)

where σ controls the noise level and η is Gaussian noise with variance σ2. The training objective becomes:

$$\min_{\theta}\,E_{\left(x,y\right)\sim \mathbb{D},\,\eta\sim \mathcal{N}\left(0,\sigma^{2}I\right)}\left[\mathcal{L}\left(f_{\theta}\left(x+\eta\right),y\right)\right]$$

The overall goal is to find the optimal model parameters θ that minimize the average loss over the data samples (x, y) drawn from distribution D, considering the perturbations introduced by the Gaussian noise η. This encourages the model to learn features that are invariant to small noise; as a contribution to robustness, it improves generalization and resilience to small input variations.
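
A simple way to realize this augmentation is sketched below; the noise level, augmentation ratio, and clipping range are illustrative assumptions.

```python
import numpy as np

def gaussian_augment(X, y, sigma=0.05, ratio=1.0, seed=0):
    """Append noisy copies of the training samples drawn from N(0, sigma^2 I);
    sigma and the augmentation ratio are illustrative values."""
    rng = np.random.default_rng(seed)
    n_aug = int(len(X) * ratio)
    idx = rng.choice(len(X), size=n_aug, replace=False)
    # Keep augmented features inside the scaled [0, 1] range.
    X_noisy = np.clip(X[idx] + rng.normal(0.0, sigma, X[idx].shape), 0.0, 1.0)
    return np.vstack([X, X_noisy]), np.concatenate([y, y[idx]])
```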

Label smoothing replaces hard labels (e.g., one-hot encoded vectors) with smoothed labels, which distribute some probability mass to other classes. This reduces overconfidence and improves generalization.

For a label y (one-hot encoded), the smoothed label \(\tilde{y}\) is given by:

$$\tilde{y}=\left(1-\alpha\right)y+\frac{\alpha}{K}$$

where α is the smoothing parameter and K is the number of classes. The loss function becomes:

$$\min_{\theta}\,E_{\left(x,y\right)\sim \mathbb{D}}\left[\mathcal{L}\left(f_{\theta}\left(x\right),\tilde{y}\right)\right]$$

a general form of the optimization function that incorporates label smoothing:

$$L\left(\theta\right)=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{K}\tilde{y}_{ij}\log\left(p_{ij}\right)$$

where L(θ) is the loss function with label smoothing, θ represents the parameters of the IDS model, N is the number of samples, K is the number of classes, \(\tilde{y}_{ij}\) is the smoothed label for the i-th sample and the j-th class, and \(p_{ij}\) is the predicted probability for the i-th sample and the j-th class.

By optimizing this loss function with label smoothing, the IDS model becomes more robust to adversarial perturbations, as the smoothing reduces the model's confidence in any single class and encourages it to be more aware of the other classes. This lowers the model's sensitivity to adversarial examples by preventing overfitting to hard labels; as a contribution to robustness, it reduces overconfidence, improves generalization, and encourages the model to learn more generalized decision boundaries.
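
The smoothing rule and its use during training can be sketched as follows; the smoothing factor of 0.1 is illustrative, and the Keras loss shown applies the same smoothing internally.

```python
import numpy as np
import tensorflow as tf

def smooth_labels(y_hot, alpha=0.1):
    """Mix the one-hot vector with the uniform distribution: (1-alpha)*y + alpha/K."""
    num_classes = y_hot.shape[-1]
    return (1.0 - alpha) * y_hot + alpha / num_classes

# Equivalently, Keras can apply the same smoothing inside the loss function:
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)
```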

A denoising autoencoder (DAE) is trained to reconstruct clean inputs from perturbed inputs (adversarial examples) at testing time. The optimization function therefore focuses on minimizing the reconstruction error between the clean input x and the reconstructed output gϕ(xadv), where xadv is the adversarial example.

Let gϕ be the DAE with parameters ϕ. The DAE is trained to minimize the reconstruction error between the clean input x and the reconstructed output gϕ(xadv) according to the following optimization function:

$$\min_{\phi}\,E_{\left(x,y\right)\sim \mathbb{D}}\left[\left\|g_{\phi}\left(x_{\text{adv}}\right)-x\right\|^{2}\right]$$

where x is the clean input, xadv = x + δ is the adversarial example with δ being the adversarial perturbation bounded by ϵ, gϕ is the DAE with parameters ϕ, and ‖gϕ(xadv) − x‖2 is the reconstruction error measured using the squared L2 norm (mean squared error). The optimization function takes the expectation over the dataset D, ensuring that the DAE generalizes well to all inputs; as a contribution to robustness, it removes adversarial perturbations from the input, improving robustness at testing time.

At testing time, the input x is preprocessed by the DAE:

$$\tilde{x}=g_{\phi}\left(x\right)$$

which reduces the effect of adversarial perturbations before the input is passed to the model.

Ensemble defense optimization function

The ensemble defense model combines the strengths of all four mechanisms into a unified framework, ensuring the model is resilient to adversarial attacks in IDS.

The overall model optimization function can be modeled as:

For the training phase:

$$\min_{\theta_{1},\theta_{2},\theta_{3}}\left[\lambda_{1}\,E_{\left(x,y\right)\sim \mathbb{D}}\left[\max_{\left\|\delta\right\|\le \epsilon}\mathcal{L}\left(f_{\theta_{1}}\left(x+\delta\right),\tilde{y}\right)\right]+\lambda_{2}\,E_{\left(x,y\right)\sim \mathbb{D}}\left[\mathcal{L}\left(f_{\theta_{2}}\left(x\right),\tilde{y}\right)\right]+\lambda_{3}\,E_{\left(x,y\right)\sim \mathbb{D},\,\eta\sim \mathcal{N}\left(0,\sigma^{2}I\right)}\left[\mathcal{L}\left(f_{\theta_{3}}\left(x+\eta\right),\tilde{y}\right)\right]\right]$$

The weighting hyperparameters (λ1, λ2, λ3) are used to balance the contributions of each term. The use of the smoothed labels \(\tilde{y}\) ensures the model learns smoother decision boundaries and reduces overconfidence throughout the training phase.

For the testing phase:

At testing time, the input is preprocessed by the DAE \(g_{\phi}\) as follows:

$$\tilde{x}=g_{\phi}\left(x\right)$$

The model obtains the prediction from each model as:

$$\hat{y}^{1}=f_{\theta_{1}}\left(x_{adv}\right),\qquad \hat{y}^{2}=f_{\theta_{2}}\left(x_{adv}\right),\qquad \hat{y}^{3}=f_{\theta_{3}}\left(x_{adv}\right),\qquad \hat{y}^{4}=f_{\theta_{4}}\left(\tilde{x}\right)$$

The aggregation of the predictions using the weighted averaging combination method is modeled as:

$$\hat{y}_{\text{final}}=w_{1}f_{\theta_{1}}\left(x_{adv}\right)+w_{2}f_{\theta_{2}}\left(x_{adv}\right)+w_{3}f_{\theta_{3}}\left(x_{adv}\right)+w_{4}f_{\theta_{4}}\left(\tilde{x}\right)$$
$$\hat{y}_{\text{final}}=w_{1}\hat{y}^{1}+w_{2}\hat{y}^{2}+w_{3}\hat{y}^{3}+w_{4}\hat{y}^{4}$$

The aggregation of the predictions using the majority voting method is formulated as:

$$\hat{y}_{\text{final}}=\text{mode}\left(f_{\theta_{1}}\left(x_{adv}\right),f_{\theta_{2}}\left(x_{adv}\right),f_{\theta_{3}}\left(x_{adv}\right),f_{\theta_{4}}\left(\tilde{x}\right)\right)$$

where gϕ is the denoising autoencoder that reconstructs the input xadv, \(\tilde{y}\) are the smoothed labels used during training, the adversarial perturbation is bounded by ϵ, and η is the Gaussian noise added for augmentation.
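
The two aggregation rules above can be sketched as follows, assuming four trained Keras models (the adversarially trained, label-smoothed, and Gaussian-augmented classifiers, plus the classifier fed by the DAE) and equal weights; the weights and helper names are illustrative.

```python
import numpy as np

def ensemble_predict(models, dae, x_adv, weights=(0.25, 0.25, 0.25, 0.25)):
    """Aggregate the four defended models' predictions: the first three see the
    (possibly adversarial) input, the fourth sees the DAE-purified input."""
    x_denoised = dae.predict(x_adv, verbose=0)
    probs = [models[0].predict(x_adv, verbose=0),
             models[1].predict(x_adv, verbose=0),
             models[2].predict(x_adv, verbose=0),
             models[3].predict(x_denoised, verbose=0)]
    # Weighted-average combination of class probabilities.
    weighted = sum(w * p for w, p in zip(weights, probs))
    y_weighted = np.argmax(weighted, axis=1)
    # Majority voting over the individual hard decisions.
    votes = np.stack([np.argmax(p, axis=1) for p in probs], axis=1)
    y_vote = np.array([np.bincount(row).argmax() for row in votes])
    return y_weighted, y_vote
```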

Synergy between defense mechanisms

To elucidate the synergy between the defense mechanisms and their complementary strengths within the proposed ensemble defense framework, we outline the roles and contributions of each component:

  • Adversarial training: This mechanism directly enhances the model’s robustness by training it on adversarial examples, ensuring that the model can effectively handle worst-case perturbations and maintain performance under adversarial conditions.

  • Gaussian augmentation: By exposing the model to random noise during training, Gaussian augmentation improves the model’s generalization capabilities. It complements adversarial training by addressing small, non-adversarial perturbations that may otherwise degrade performance.

  • Label smoothing: This technique reduces the model’s overconfidence and improves its calibration by encouraging smoother decision boundaries. As a result, it becomes more challenging for adversarial examples to exploit sharp decision boundaries, thereby enhancing the model’s resilience to adversarial attacks.

  • Denoising autoencoders: Acting as a preprocessing step, denoising autoencoders remove noise and adversarial perturbations from input data. This enhances the effectiveness of adversarial training and Gaussian augmentation by ensuring that the input data fed into the model is cleansed of adversarial noise, thereby improving overall robustness.

Together, these defense mechanisms form a cohesive ensemble that leverages their complementary strengths to significantly enhance the robustness and performance of the intrusion detection system (IDS) against adversarial threats.

Platform and tools

We use Jupyter notebooks hosted in Google Colaboratory21 to develop our deep learning source code in Python. Google Colaboratory provides an interactive environment for writing and running code in Python and other languages. In addition to being cloud-hosted and equipped with multiple pre-installed deep learning frameworks and libraries to speed up the process of creating machine learning models, Colaboratory also provides powerful GPU features. All the models mentioned earlier were implemented in the TensorFlow and Keras frameworks. Multiple experiments were performed to evaluate the effectiveness of the proposed defense algorithm. They were run on the Google Colab platform from an Intel(R) Core(TM) i7-4600U CPU 2.70 GHz personal computer with 8 GB RAM. Adversarial examples are produced using the open-source IBM Adversarial Robustness Toolbox (ART) framework22.

  • Network traffic datasets

To demonstrate the robustness, generalization capability, and scalability of the proposed defense mechanisms, the experimental evaluation in our paper is carried out on two recent public network traffic datasets, CIC-IDS2017 [36] and CIC-IDS2018. CIC-IDS2017 is chosen because it is a good representation of large modern network environments, while CIC-IDS2018 is more realistic due to its larger scale, encrypted traffic, and modern attacks. Table 1 lists the key differences between the two datasets:

Table 1 Key differences between CIC-IDS2017 and CIC-IDS2018 datasets.

Preprocessing the datasets

For training and evaluating our model, we use the CIC-IDS2017 and CIC-IDS2018 datasets produced by the Canadian Institute for Cybersecurity by Lashkari et al.23,31, respectively. CIC-IDS2017 consists of labeled normal and attack traffic (flow) samples of 15 classes with 79 features, whereas CIC-IDS2018 consists of labeled normal and attack traffic (flow) samples of 8 classes with the same 79 features. Various preprocessing steps were performed on the datasets, including data cleaning, encoding of categorical features and label encoding, feature standardization (scaling) to limit the values to the range 0-1, and normalization to remove outliers and enhance performance. To balance the entire dataset before splitting, data sampling is adopted by selecting only 50% of each class, except for classes containing fewer than 2000 instances, which are taken entirely. A hedged sketch of these steps is given below.
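
In the sketch, the label column name and the random seed are assumptions; the 50%/2000-instance sampling rule follows the description above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

def preprocess(df, label_col="Label"):   # column name is an assumption
    """Clean, sample, encode, and scale the CIC-IDS flow records as described above."""
    df = df.replace([np.inf, -np.inf], np.nan).dropna().drop_duplicates()
    # Keep 50% of each class; classes with fewer than 2000 flows are kept whole.
    parts = [grp if len(grp) < 2000 else grp.sample(frac=0.5, random_state=42)
             for _, grp in df.groupby(label_col)]
    df = pd.concat(parts)
    y = LabelEncoder().fit_transform(df[label_col])
    X = MinMaxScaler().fit_transform(df.drop(columns=[label_col]))
    return X, y
```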

Fig. 1
figure 1

The Study Framework.

To ensure that the functionality of the network communication channel is preserved while the IDS handles the traffic flow samples, the non-functional features, which matter the least, are the ones selected for training and testing purposes. In this context, we use the Extremely Randomized Trees (ERTs) classifier to identify the most important (functional) features and exclude them. Thus, only data samples with 58 features are used for training and testing the IDS model; a sketch of this selection step follows. The data flow diagram (DFD) for the entire preprocessing stage is shown in Fig. 2, and Fig. 3 depicts the 20 most prominent features that are removed.
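
In the sketch below, which uses scikit-learn's ExtraTreesClassifier, the number of trees, the random seed, and the exact count of removed features are illustrative.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def select_nonfunctional_features(X_train, y_train, n_remove=21, seed=42):
    """Rank features with Extremely Randomized Trees and drop the top-ranked
    (functional) ones, keeping the remaining non-functional features."""
    ert = ExtraTreesClassifier(n_estimators=100, random_state=seed, n_jobs=-1)
    ert.fit(X_train, y_train)
    ranked = np.argsort(ert.feature_importances_)[::-1]
    functional = ranked[:n_remove]          # most important -> excluded
    keep = np.setdiff1d(np.arange(X_train.shape[1]), functional)
    return keep
```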

Fig. 2
figure 2

The preprocessing stage data flow diagram (DFD).

Fig. 3
figure 3

The functional features to be removed from the dataset.

Adversarial examples generation

Several experiments have been conducted to generate diverse adversarial samples, encompassing a range of state-of-the-art adversarial attack methodologies, including the Fast Gradient Sign Method (FGSM), Basic Iterative Method (BIM), Deep Fool, Jacobian-based Saliency Map Attack (JSMA), Projected Gradient Descent (PGD), and Carlini & Wagner (C&W) attacks. Table 2 lists the key attack parameters used.

Table 2 The key parameters of the generated adversarial attacks.

For the generation of attack instances, we consider as a substitute model a fully connected feed-forward neural network classifier with an input layer of dimension 58, accepting input samples with only the non-functional features to be perturbed, followed by two hidden layers of 100 neurons each. The output layer contains 15 neurons, corresponding to the 15 possible label classes. We employ a learning rate of 0.01, 30 training epochs, a mini-batch size of 256, and the categorical cross-entropy loss function. We obtain the same detection ability as the IDS baseline classifier on the clean data samples, with an average F1-score of 0.98.
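
A sketch of this substitute classifier in Keras is given below; the hidden-layer activations and the optimizer are assumptions, while the layer sizes, learning rate, epoch count, batch size, and loss follow the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_substitute_model():
    """Substitute classifier used to craft the attacks: 58 inputs, two hidden
    layers of 100 neurons, 15-class softmax output (activations/optimizer assumed)."""
    model = tf.keras.Sequential([
        layers.Input(shape=(58,)),
        layers.Dense(100, activation="relu"),
        layers.Dense(100, activation="relu"),
        layers.Dense(15, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# substitute_model = build_substitute_model()
# substitute_model.fit(X_train, y_train_onehot, epochs=30, batch_size=256)
```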

We implement a feed-forward deep neural network (DNN)-based IDS model as a baseline to evaluate the effect of the generated adversarial samples before and after applying the adversarial ensemble defense. From the training dataset, we randomly select 40,000 samples (10,000 for each attack), of which 20,000 represent regular traffic and the remainder represent intrusions. From the testing dataset, we select 20,000 samples (5,000 for each attack). The size of the perturbation ranges from 0.1 to 0.3 to preserve the attributes of the original traffic samples, with the mean squared error norm (p = 2) serving as the distance metric. With these perturbations added, the baseline IDS classifier incorrectly identifies normal traffic as intrusions, and vice versa.
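
Since the adversarial examples are produced with the IBM ART framework (see the Platform and tools section), the generation step can be sketched as follows; `substitute_model` and `X_subset` are placeholders, and the attack hyperparameters shown are illustrative values within the stated perturbation range.

```python
import tensorflow as tf
from art.estimators.classification import TensorFlowV2Classifier
from art.attacks.evasion import (FastGradientMethod, BasicIterativeMethod,
                                 ProjectedGradientDescent, DeepFool,
                                 SaliencyMapMethod, CarliniL2Method)

# Wrap the trained substitute model for ART (features scaled to [0, 1]).
classifier = TensorFlowV2Classifier(model=substitute_model, nb_classes=15,
                                    input_shape=(58,),
                                    loss_object=tf.keras.losses.CategoricalCrossentropy(),
                                    clip_values=(0.0, 1.0))

attacks = {
    "FGSM": FastGradientMethod(classifier, eps=0.1),
    "BIM":  BasicIterativeMethod(classifier, eps=0.1, eps_step=0.01),
    "PGD":  ProjectedGradientDescent(classifier, eps=0.1, eps_step=0.01),
    "DF":   DeepFool(classifier),
    "JSMA": SaliencyMapMethod(classifier),
    "C&W":  CarliniL2Method(classifier),
}
adv_samples = {name: atk.generate(x=X_subset) for name, atk in attacks.items()}
```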

Defense strategies

We begin by evaluating the robustness of the baseline deep learning-based intrusion detection system (IDS) model against the generated adversarial examples without implementing any defense mechanisms. This initial assessment serves as a benchmark to quantify the model’s vulnerability to adversarial attacks in its default state. Subsequently, we examine the impact of integrating multiple defense strategies—label smoothing, Gaussian augmentation and adversarial training —during the training phase to enhance the classifier’s robustness against adversarial samples. Notably, the application of these defense mechanisms, particularly when trained with smoothed labels, results in a significant improvement in the classifier’s accuracy and resilience. This demonstrates the effectiveness of the proposed defense strategies in mitigating adversarial threats and improving the overall performance of the IDS model.

As an additional defense strategy implemented during the testing phase, we propose the use of a Denoising Autoencoder (DAE) to serve as a preprocessing framework. This framework is designed to purify or denoise the input traffic flow samples by removing adversarial perturbations before they are fed into the intrusion detection system (IDS) classifier. As outlined in Algorithm 5, the denoising process involves training the DAE to reconstruct clean samples from perturbed inputs. Specifically, the DAE is trained on a combination of clean and adversarial samples, enabling it to learn robust representations that effectively filter out adversarial noise. Once trained, the DAE acts as a denoiser, processing perturbed test samples and producing denoised outputs. The IDS classifier is then evaluated using these denoised samples to assess its robustness against adversarial attacks. Figure 4 illustrates the proposed ensemble defense framework, which integrates this denoising mechanism alongside other defense strategies. The complete ensemble defense algorithm is formalized in Algorithm 1, providing a comprehensive approach to enhancing the robustness of the IDS model.
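
A sketch of this Algorithm-5-style denoising defense is given below, reusing the sparse-autoencoder builder sketched earlier; the clean/adversarial training pairs, epoch count, and variable names (`X_train_adv`, `X_test_adv`, `ids_classifier`) are illustrative assumptions.

```python
import numpy as np

# Train the denoiser to map perturbed samples back to their clean counterparts.
dae = build_sparse_autoencoder()                  # sketched earlier (illustrative sizes)
X_noisy_train = np.vstack([X_train, X_train_adv])
X_clean_target = np.vstack([X_train, X_train])    # clean targets for both halves
dae.fit(X_noisy_train, X_clean_target, epochs=20, batch_size=256, shuffle=True)

# At test time, purify the (possibly adversarial) traffic before classification.
X_test_denoised = dae.predict(X_test_adv, verbose=0)
y_pred = ids_classifier.predict(X_test_denoised, verbose=0)
```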

Fig. 4
figure 4

The Proposed Ensemble Defense Method.

Algorithm 1
figure a

IDS classifier training and evaluation on clean dataset, adversarial examples, and ensemble defense.

As mentioned earlier, the ensemble defense strategy is adopted in two stages. At training time, to enhance the IDS classifier's robustness, we first apply the label smoothing defense by training the IDS classifier with a vector of smoothed labels to prevent overfitting, stabilize the training process, and overcome over-confident predictions, as depicted in Algorithm 2. Second, we adopt the Gaussian augmentation defense as described in Algorithm 3 and, finally, adversarial training as illustrated in Algorithm 4.

Algorithm 2
figure b

Label Smoothing Defense.

As mentioned in24, label smoothing replaces the one-hot encoded label vector y_hot with a combination of y_hot and the uniform distribution, yielding y_ls according to Eq. (4).

$$\mathbf{y\_ls}=\left(1-\alpha\right)\,\mathbf{y\_hot}+\alpha/K$$
(4)
Algorithm 3
figure d

Gaussian Augmentation Defense.

Algorithm 4
figure c

Adversarial Training Defense.

where α is a hyperparameter that controls the degree of smoothing and K is the number of label classes. The original one-hot encoded y_hot is obtained if α = 0, whereas we obtain the uniform distribution if α = 1.

Gaussian augmentation (GA) defense promotes the IDS classifier's robustness against adversarial instances. It involves training the IDS classifier with a dataset composed of the original data samples as well as newly created noisy samples (augmentation = True) of the same size as the train and test samples, as illustrated in Eq. (5)15.

$$\min_{\theta}\,E_{\left(x,y\right)\sim D}\,E_{\Delta x\sim \mathcal{N}\left(0,\sigma^{2}\right)}\,J\left(\theta,\,x+\Delta x,\,y\right)$$
(5)

The symbol σ in Eq. (5) denotes the degree of unnoticeable disturbance introduced by GDA. The equation is a formal depiction of the training objective function for the NIDS model. The expected value over the training dataset is indicated by the expression E(x, y) ~ D, while the expectation over perturbations drawn from a normal distribution with mean 0 and variance σ2 is indicated by the term EΔx ~ N(0, σ2). The goal is to minimize this expectation. It guarantees that the predictions made by the model, p(y|x), roughly correspond to a Gaussian distribution with variance σ2 centered around the original input x. Through the introduction of variability during training, this technique helps to improve the robustness of the model and its ability to handle perturbed input data.

The adversarial training defense expands the training and testing datasets with the adversarial examples and then trains the IDS classifier with the generated samples.

Experimental results and evaluation

The following subsections explain the evaluation metrics that assess the IDS classifier performance on CIC-IDS2017 under different settings. Next, the DNN-based IDS architecture characteristics and the hyperparameters that control the learning process are clarified. Finally, a comparison of the adversarial resilience of the deep learning-based IDS is made with respect to adversarial-free settings, adversarial threats, individual adversarial defenses, and the ensemble defense. The experimental results and evaluation of the defense model on CIC-IDS2018 are included in the supplementary file.

Evaluation metric

To evaluate and interpret the impact of our defense strategies on the IDS classifier performance in the adversarial environment, several metrics are measured. Among them is the prediction accuracy (Acc), which refers to the total number of correctly classified samples, from both benign and attack samples, among all samples, as given by Eq. (6):

$$Acc=\frac{TP+TN}{TP+TN+FP+FN}$$
(6)

Precision: refers to the proportion of true positives among samples classified as positive, as formulated in Eq. (7):

$$Precision=\frac{TP}{TP+FP}$$
(7)

Sensitivity (or recall), also known as the True Positive Rate (TPR), is the number of true positives (TP) among all actual positive samples, as given by Eq. (8):

$$Recall=\frac{TP}{TP+FN}$$
(8)

The harmonic mean of precision and recall is the F1-score, which is an effective measurement, formulated in Eq. (9):

$$F1\text{-}score=\frac{2\times precision\times recall}{precision+recall}$$
(9)
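
In practice, these metrics can be computed directly with scikit-learn, as in the hedged sketch below; weighted averaging is one assumed way to aggregate them over the multi-class setting.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def evaluate_ids(y_true, y_pred):
    """Compute Eqs. (6)-(9); weighted averaging handles the multi-class labels."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
        "recall":    recall_score(y_true, y_pred, average="weighted", zero_division=0),
        "f1":        f1_score(y_true, y_pred, average="weighted", zero_division=0),
    }
```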

Architecture characteristics and learning setup

To evaluate the effectiveness of the proposed defense model, a series of experiments were conducted targeting the deep learning-based intrusion detection system (IDS) classifier. A feed-forward deep neural network (DNN) model was trained on a preprocessed clean training dataset. For the CIC-IDS2017 dataset, the training set initially comprised 79 traffic flow features, which were reduced to 58 features using the Extremely Randomized Trees (Extra Trees) classifier as a feature selection technique. This feature selection step retains only the non-functional features for model training, preserving the functionality of the traffic flows. The trained DNN-based IDS classifier serves as the baseline model, which is subsequently targeted by adversarial examples to assess its vulnerability. It is worth mentioning that we apply the same framework with similar settings to the CIC-IDS2018 dataset to demonstrate generalization capability. The architecture design and hyperparameters of the DNN-based IDS classifier are detailed in Table 3, providing a comprehensive overview of the model configuration used in this study.

Algorithm 5 Denoising autoencoder defense.

Table 3 DNN-based IDS classifier architecture design and hyperparameters.

Comparisons and analysis

This section presents a comparative analysis of the adversarial resilience of deep learning-based intrusion detection systems (IDS) when equipped with various adversarial defense mechanisms. The study evaluates the effectiveness of individual and ensemble defense strategies, including adversarial training, Gaussian augmentation, label smoothing, and denoising autoencoders, in mitigating the impact of adversarial attacks. The performance of the IDS model is assessed under both clean and adversarial conditions, with a focus on key metrics such as accuracy, precision, recall, and F1-score, as well as robustness and resilience to adversarial perturbations. The results highlight the relative strengths and limitations of each defense mechanism, as well as the synergistic benefits of combining them into a unified defense framework using majority voting and weighted average aggregation. This comparison provides valuable insights into the trade-offs and effectiveness of different adversarial defense strategies in enhancing the robustness of deep learning-based IDS models.

  • Comparison of adversarial resilience in deep learning-based intrusion detection systems with respect to adversarial defense mechanisms

In the following subsections, we analyze and compare the performance of the deep neural network (DNN)-based intrusion detection system (IDS) detector under three distinct scenarios: (1) adversarial-free conditions, (2) adversarial attack conditions, and (3) an ablation study evaluating the effectiveness of individual defense techniques and their combined impact. The ablation study involves training and testing the IDS model with each defense mechanism applied independently. The performance of these individual defense strategies is then compared against the ensemble approach, which integrates all defense mechanisms using majority voting and weighted average aggregation. This comprehensive evaluation aims to quantify the relative contributions of each defense technique and demonstrate the synergistic benefits of combining them into a unified defense framework. We evaluate the proposed defense mechanism on the CIC-IDS2017 and CIC-IDS2018 datasets. The evaluation results obtained from CIC-IDS2018 are included in Supplementary S1 through S6.

  • A. DNN-based IDS classifier performance under the adversarial-free setting:

We begin by evaluating the DNN-based IDS detector performance on clean data samples before any attack. Table 4 shows the DNN-based IDS model performance under the adversarial-free setting. We observe that the target DNN-based IDS classifier achieves outstanding classification metric values on clean data samples. Figure 5 illustrates its performance.

Table 4 DNN-based IDS detector performance under adversarial free settings.
Fig. 5 DNN-based IDS classifier performance under adversarial-free settings.

  • B. Adversarial samples’ impact on DNN-based IDS performance

Experimental results demonstrate that the efficacy of the target deep neural network (DNN)-based intrusion detection system (IDS) classifier significantly degrades when adversarial samples are introduced into the input data. The initial accuracy of the DNN-based IDS classifier, achieved under adversarial-free conditions, is illustrated in Fig. 5. However, when the model is evaluated using the generated adversarial instances, a substantial decline in prediction accuracy is observed. Specifically, the accuracy drops to 54% under Fast Gradient Sign Method (FGSM) adversarial samples, 53% under DeepFool adversarial samples, 81% under Jacobian-based Saliency Map Attack (JSMA) adversarial samples, 45% under Basic Iterative Method (BIM) adversarial samples, and 36% under Carlini & Wagner (C&W) adversarial samples. These results are summarized in Table 5 and visually represented in Fig. 6, highlighting the vulnerability of the IDS model to various adversarial attack strategies.

  • C. Ablation study

As mentioned above, to evaluate the effectiveness of each defense technique and their combined impact, we conduct an ablation study. The study involves training and testing the IDS model with each defense technique individually and comparing their performance against the ensemble approach.

  • Performance of individual techniques:

  • Label smoothing defense impact on DNN-based IDS performance:

Results listed in Table 6 demonstrate that the prediction accuracy of the IDS classifier against the tested adversarial attacks improved to almost 82.9% when the label smoothing defense is applied. This improvement verifies that applying label smoothing to the training data reduces overconfidence, prevents overfitting to hard labels, and hence improves generalization.
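A minimal sketch of the label-smoothing step, assuming one-hot encoded labels; the smoothing factor of 0.1 is an illustrative choice, not necessarily the value used in this study.

```python
import numpy as np

def smooth_labels(y_onehot, factor=0.1):
    """Soften hard one-hot targets: y = (1 - factor) * y + factor / num_classes."""
    num_classes = y_onehot.shape[1]
    return y_onehot * (1.0 - factor) + factor / num_classes

# In Keras the same effect can be obtained directly with
# tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1).
```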

  • Gaussian augmentation defense impact on DNN-based IDS performance:

Experiments verified that fitting the IDS classifier with Gaussian-noise-augmented training data and the corresponding smooth labels leads to an overall improvement in prediction accuracy against all the tested adversarial attacks, reaching almost 80.66% when evaluated against the PGD attack, as shown in Table 6. Additionally, as indicated by Table 7, all the performance measures improve when the Gaussian augmentation defense is applied at training time, which enhances the model's generalization and makes it more robust to noise variations.

  • Adversarial training defense impact on DNN-based IDS performance:

As depicted in Tables 6 and 7, further improvement in model performance is achieved by training the IDS model with adversarial examples and the corresponding smooth labels. This is reflected in particular by an improvement in accuracy to 80.75%, which verifies that the model's robustness against worst-case adversarial perturbations is enhanced and that it becomes less sensitive to them.

  • Denoising autoencoder defense impact on DNN-based IDS performance:

When the IDS is evaluated on the denoised version of the data obtained by applying the sparse denoising autoencoder at testing time, the model shows further improvements in robustness against the selected adversarial attacks and in the security metrics, since the autoencoder reconstructs clean inputs from noisy or perturbed ones. Tables 6 and 7 list the IDS performance results on the preprocessed input data.

Experiments verified that the prediction accuracy against the adversarial examples embedded within the input samples improves significantly after the DAE is used to purify the inputs, increasing to approximately 83.85% when the denoised output is evaluated with the DNN-based IDS model.
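A minimal sketch of such a sparse denoising autoencoder is given below, assuming a Keras implementation; the layer widths, sparsity penalty, and training setup are illustrative assumptions rather than the exact architecture of Algorithm 5.

```python
import tensorflow as tf

def build_dae(n_features):
    """Sparse denoising autoencoder mapping noisy/perturbed flows back to clean ones."""
    inputs = tf.keras.Input(shape=(n_features,))
    h = tf.keras.layers.Dense(
        64, activation="relu",
        activity_regularizer=tf.keras.regularizers.l1(1e-5))(inputs)  # sparsity penalty
    h = tf.keras.layers.Dense(32, activation="relu")(h)               # bottleneck
    h = tf.keras.layers.Dense(64, activation="relu")(h)
    outputs = tf.keras.layers.Dense(n_features, activation="linear")(h)
    dae = tf.keras.Model(inputs, outputs)
    dae.compile(optimizer="adam", loss="mse")
    return dae

# Training (illustrative): dae.fit(x_train_noisy, x_train_clean, epochs=20, batch_size=256)
# Test time: x_denoised = dae.predict(x_test_perturbed) is passed to the IDS classifier.
```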

Fig. 6 IDS classifier performance against the tested adversarial attacks.

Table 5 DNN-based IDS detector performance against the generated adversarial examples.
  • Performance of the ensemble defense model:

Table 7 shows the DNN-based IDS classifier accuracy after adopting the proposed defenses. We begin by analyzing the IDS classifier performance when applying the label smoothing defense, followed by the Gaussian augmentation and adversarial training defenses at training time, to improve the adversarial robustness of the IDS classifier. Additionally, at testing time the perturbed input is denoised by the sparse denoising autoencoder, and the IDS classifier performance is measured on the denoised samples. Figure 7 illustrates these results. Furthermore, to take full advantage of an ensemble defense rather than relying on individual defense mechanisms to combat adversarial attacks, the defense models' predictions are aggregated using the majority voting and weighted average combination methods and evaluated by the DNN-based IDS classifier. To achieve the optimal performance of the IDS classifier when applying the ensemble defense against adversarial threats, the majority voting prediction is optimized with soft voting, and the weighted average prediction is optimized with Bayesian optimization; a sketch of these aggregation rules is given below. Table 8 summarizes the improvement in accuracy score obtained by each ensemble defense aggregation method.
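The sketch below illustrates the three aggregation rules discussed above (hard majority voting, its soft-voting refinement, and the weighted average); `probs_list` is assumed to hold the class-probability outputs of the individually defended models on the same samples, and the weights are placeholders that, in this study, are tuned by Bayesian optimization.

```python
import numpy as np

def majority_vote(probs_list):
    """Hard voting: each model votes for its argmax class per sample."""
    votes = np.stack([p.argmax(axis=1) for p in probs_list], axis=1)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 1, votes)

def soft_vote(probs_list):
    """Soft voting: average the probability vectors, then take the argmax."""
    return np.mean(np.stack(probs_list, axis=0), axis=0).argmax(axis=1)

def weighted_average(probs_list, weights):
    """Weighted average of probability vectors; weights are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(probs_list, axis=0)      # (n_models, n_samples, n_classes)
    return np.tensordot(w, stacked, axes=1).argmax(axis=1)
```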

Table 6 DNN-based IDS detector defense ability after applying the ensemble defense mechanisms.
Table 7 IDS performance measures obtained by evaluating the defense mechanisms against the multisource adversarial samples.
Table 8 Accuracy score of majority voting and weighted average aggregation methods and their optimized versions.
Fig. 7 Post-adversarial-defense evaluation of IDS classifier performance against the tested adversarial attacks.

  • Optimizing ensemble defense model aggregation:

Implementing soft voting alongside majority voting in an ensemble defense model for an intrusion detection system (IDS) can significantly improve the IDS's performance against adversarial samples by leveraging both the confidence scores obtained through soft voting and the consensus of predictions obtained through majority voting. Table 8 depicts the improvements in the IDS accuracy score achieved after applying the soft voting technique alongside the majority voting combination method in the ensemble defense model.

As another optimization direction, we use Bayesian optimization to tune the weighted average aggregation of the ensemble defense model. Bayesian optimization efficiently searches for the optimal weights that maximize the accuracy of the ensemble defense model. We run the optimization for 50 iterations with 100 initial random points, setting up the optimizer at each iteration with the scoring function and the parameter bounds that constrain the weights. The process continues until the weights that maximize the performance of the weighted average ensemble defense model are found. Table 8 lists the optimized weighted average aggregation accuracy score obtained by the IDS classifier against the tested adversarial attack instances. Supplementary S7 illustrates the accuracy scores obtained from both the CIC-IDS2017 and CIC-IDS2018 datasets under all tested scenarios.
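As a rough illustration of this weight-tuning step, the sketch below uses scikit-optimize's `gp_minimize` to search for weights that maximize validation accuracy of the weighted-average ensemble. The search space, iteration budget, and the `val_probs`/`y_val` names are assumptions, and the optimizer settings reported in this study (50 iterations, 100 initial random points) may differ from those shown here.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

def tune_ensemble_weights(val_probs, y_val, n_models, n_calls=50):
    """Search for ensemble weights that maximize accuracy on a validation split."""
    space = [Real(0.0, 1.0, name=f"w{i}") for i in range(n_models)]

    def negative_accuracy(weights):
        w = np.asarray(weights, dtype=float)
        if w.sum() == 0.0:
            return 1.0                       # degenerate all-zero weights: worst score
        w = w / w.sum()
        fused = np.tensordot(w, np.stack(val_probs, axis=0), axes=1)
        return -np.mean(fused.argmax(axis=1) == y_val)   # gp_minimize minimizes

    result = gp_minimize(negative_accuracy, space, n_calls=n_calls, random_state=42)
    best_weights = np.asarray(result.x) / np.sum(result.x)
    return best_weights, -result.fun

# best_w, best_acc = tune_ensemble_weights(val_probs, y_val, n_models=4)
```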

The Matthews correlation coefficient (MCC) is a robust metric for evaluating the performance of binary classification models, especially on imbalanced datasets. It ranges from −1 to +1, where +1 indicates perfect prediction, 0 indicates prediction no better than random, and −1 indicates total disagreement between prediction and observation (i.e., total inverse prediction).

For adversarial robustness, MCC is particularly useful because it accounts for all four values in the confusion matrix (TP, FP, FN, TN) and provides a balanced measure even when the classes are imbalanced. If the IDS (Intrusion Detection System) model is a multiclass classification model, the Matthews Correlation Coefficient (MCC) can still be used, but it needs to be generalized to handle multiple classes. The multiclass MCC is an extension of the binary MCC and provides a single metric to evaluate the performance of a multiclass classifier. It accounts for all classes and is particularly useful for imbalanced datasets.

Since our primary goal is to evaluate the adversarial robustness of the DNN-based IDS model after adopting the ensemble defense methods, we reduce the classification task to binary classification for simplicity. The formula for MCC is:

$$MCC=\frac{TP\cdot TN-FP\cdot FN}{\sqrt{\left(TP+FP\right)\left(TP+FN\right)\left(TN+FP\right)\left(TN+FN\right)}}$$
Table 9 Matthews correlation coefficient (MCC) for each defense method.

Table 9 lists the Matthews correlation coefficient (MCC) for each defense method. The label smoothing defense has the highest MCC (0.698), indicating the best adversarial robustness among the evaluated methods, while the Gaussian augmentation defense has the lowest MCC (0.596), indicating relatively low robustness. The adversarial training and DAE defenses show moderate robustness, with MCC values of 0.608 and 0.640, respectively.
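For completeness, a small helper computing the binary MCC from confusion-matrix counts; scikit-learn's `matthews_corrcoef` yields the same value (and a multiclass generalization) directly from label vectors.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Usage (illustrative counts): mcc(tp=900, tn=850, fp=60, fn=40)
# Equivalent: from sklearn.metrics import matthews_corrcoef; matthews_corrcoef(y_true, y_pred)
```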

Discussion and comparison with related works

The proposed defense strategy fits into the category of adversarial detection by building a robust IDS detector. The main aim of the study is to boost the IDS detector's adversarial robustness through two major phases. The first phase encompasses multi-source adversarial retraining, Gaussian noise augmentation, and label smoothing for an improved training process. The second phase further promotes the IDS's robust detection by denoising the adversarial input samples before they are fed into it. In contrast with other defense techniques in13,14, and26 that use the same CIC-IDS2017 dataset, massive adversarial retraining enhances the IDS detector's robustness, as reflected in robust detection accuracy, but may degrade the detector's standard detection performance (standard accuracy) on clean data and unseen patterns of input samples. Moreover, selecting only the robust non-functional features for adversarial generation also supports this interpretation and explains why the techniques mentioned above may report higher detection performance.

Other criteria that may affect the detection performance in comparison with the above-mentioned techniques are the adversarial attack strength, type, and settings (especially the amount of perturbation), the dimension of the input samples to be perturbed, and the testing data size.

Despite the aforementioned differences from other techniques, we still obtain a reasonably close detection accuracy, reaching 84.34% and 84.45% when the majority voting and weighted average aggregation fusion rules, respectively, are applied to our ensemble defense. The reason may be that the tested input samples are denoised before being fed into the IDS detector, which serves as an extra layer of defense at testing time.

Using an ensemble defense rather than relying on individual defense mechanisms to combat adversarial attacks in Intrusion Detection Systems (IDS) offers several significant advantages. These advantages stem from the diversity, robustness, and complementary strengths of multiple defense strategies working together. The key benefits of this approach are (1) enhanced robustness against adversarial attacks, due to:

  • Diverse defense mechanisms: An ensemble combines multiple defense strategies (e.g., adversarial training, Gaussian augmentation, label smoothing, and denoising autoencoders), each targeting different vulnerabilities in the model. This diversity makes it harder for adversaries to craft perturbations that can bypass all defenses simultaneously.

  • Reduced single point of failure: Individual defenses may have specific weaknesses that adversaries can exploit. An ensemble mitigates this risk by ensuring that even if one defense fails, others can still detect or neutralize the attack.

and (2) improved detection accuracy due to complementary strengths, because different defense mechanisms excel at detecting and mitigating the effects of different types of adversarial perturbations. For example, adversarial training improves robustness against known attack patterns, denoising autoencoders can filter out noise and perturbations in the input data, and label smoothing reduces overconfidence in predictions, making the model less sensitive to hard labels.

To the best of our knowledge, this work introduces the first adversarial ensemble defense strategy designed for multi-class classification in Intrusion Detection System (IDS) detectors, utilizing the CIC-IDS2017 and CIC-IDS2018 benchmark datasets. Unlike conventional approaches that simplify the task to binary classification, our method preserves the granularity of attack types, offering a more intuitive and practical solution for real-world intrusion detection scenarios. The effectiveness of our model is demonstrated in Table 10, which provides a comparative analysis against state-of-the-art models on the CIC-IDS2017 dataset, highlighting the advancements achieved by our approach.

Table 10 Comparison of different models with our model.

Conclusion and future work

In this work, we investigate the impact of an adversarial ensemble defense on enhancing the robustness of deep neural network (DNN)-based intrusion detection systems (IDS) against multi-source adversarial attacks. The proposed ensemble defense integrates multiple adversarial mitigation strategies, including label smoothing, Gaussian augmentation, and adversarial training with smooth labels during the training phase. During the testing phase, a deep sparse autoencoder is employed to denoise adversarial samples, serving as an additional layer of defense to purify input samples from adversarial perturbations before they are fed into the DNN-based IDS classifier for evaluation. This preprocessing step ensures that the input data is cleansed of adversarial noise, thereby improving the system’s resilience. The proposed ensemble defense not only significantly enhances the robustness and prediction accuracy of the IDS in resisting adversarial attacks but also maintains high accuracy on clean data samples. Furthermore, it leverages the strengths of ensemble learning, which combines the diversity, robustness, and complementary capabilities of multiple defense mechanisms to achieve superior performance compared to individual defense strategies. A key feature of the proposed ensemble defense is its ability to preserve the functionality of traffic features through robust feature selection, ensuring that the integrity of the data is maintained throughout the detection process. Future research directions include testing the model under various adversarial attack settings, evaluating it with different machine learning (ML) and DNN-based IDS architectures, and validating its performance on other benchmark IDS datasets. Additionally, we aim to explore emerging adversarial attack strategies, investigate new mitigation techniques to further enhance the resilience of ML-based IDS models, and develop advanced methods for robust feature selection.