Enhancing SDN security with deep learning and F-balanced cross-entropy for DDoS detection

Naeen, Hossein Monshizadeh; Ghadamyari, Malihe; Barmar, Mobin

doi:10.1038/s41598-025-18826-w


Article
Open access
Published: 29 September 2025


Hossein Monshizadeh Naeen^1,2^na1,
Malihe Ghadamyari¹^na1 &
Mobin Barmar¹

Scientific Reports volume 15, Article number: 33419 (2025) Cite this article


Abstract

Software-Defined Networking (SDN) offers centralized control and programmability, transforming network management but also introducing vulnerabilities, particularly to Distributed Denial of Service (DDoS) attacks that can overwhelm the control plane and disrupt network functionality. Traditional DDoS detection methods, including rule-based systems and conventional machine learning models, often fall short in SDN due to high false-positive rates and limited adaptability to evolving network traffic. While recent deep learning approaches show promise, they continue to face challenges with real-time adaptability and scalability in SDN environments. In this study, we propose Attention-Enhanced Cross-Entropy (AECE), a novel Deep Neural Network (DNN)-based DDoS detection model that integrates attention mechanisms to prioritize critical features in network traffic data, allowing the model to focus on patterns indicative of DDoS attacks. A core innovation in AECE is the F-Balanced Cross-Entropy (FBCE) Loss function, which combines cross-entropy with an F1-score-based component to balance precision and recall, effectively reducing both false positives and false negatives. Additionally, AECE incorporates ReLU and GELU activations, batch normalization, dropout, and the Adamax optimizer to enhance learning stability and computational efficiency. Experimental results demonstrate that the proposed system achieves high detection accuracy, significantly outperforming existing DDoS detection methods and providing a robust, low-latency solution to safeguard SDN infrastructures against evolving DDoS threats.


Introduction

Distributed Denial of Service (DDoS) attacks have become a major challenge in modern network environments, especially with the increasing adoption of Software-Defined Networking (SDN)^1,2. SDN, known for its centralized control and programmability, enables flexible and efficient network management but is vulnerable to attacks that can exploit this centralized architecture³. In particular, DDoS attacks can severely disrupt SDN by overwhelming the control plane, leading to significant performance degradation and potential network outages. Therefore, enhancing the security of SDN against such threats is essential for ensuring reliable network operations⁴.

Several traditional approaches have been employed to detect and mitigate DDoS attacks, including rule-based intrusion detection systems (IDS), statistical analysis, and machine learning (ML) techniques^5,6. However, rule-based systems struggle with the rapidly evolving nature of DDoS attacks, while conventional ML models often fail to generalize across diverse traffic patterns and attack types encountered in SDN environments. Furthermore, the unique characteristics of SDN traffic, such as the high frequency of OpenFlow protocol messages and the dynamic nature of flows, necessitate a detection approach tailored to SDN-specific challenges².

Deep learning (DL) models, particularly deep neural networks (DNNs), have shown promise in handling complex data and detecting nuanced patterns in network traffic⁷. However, applying DNNs to SDN DDoS detection presents unique obstacles, including the need for high detection accuracy with low false positive rates, fast response times, and the ability to process large, diverse datasets typical of SDN traffic⁸. Existing DL solutions often lack the architecture and data handling methods required to achieve these objectives effectively within an SDN context.

In this paper, we introduce Attention-Enhanced Cross-Entropy (AECE), a DNN-based model specifically designed for DDoS detection in SDN environments. The main contributions of this paper include:

To address the challenge of high false positive rates, we introduce the F-Balanced Cross-Entropy (FBCE) Loss function, which combines binary cross-entropy with an F1-score-based component. This loss function is designed to handle class imbalance and to reduce both false positives and false negatives, which is essential for minimizing the disruption that incorrect attack classifications cause in SDN networks.
Two attention layers are strategically integrated within the AECE architecture, enhancing the model’s ability to focus on the most informative features of network traffic data. This mechanism prioritizes critical traffic patterns, improving detection accuracy and reducing false positives by allowing the model to adapt to SDN’s complex data structure.
We present an optimized DNN architecture with effective regularization techniques, suitable activation functions, and the Adamax optimizer for accurate and efficient SDN traffic analysis.
Our model leverages an updated SDN-focused dataset with features specifically chosen to capture real-time SDN traffic characteristics, including packet rates, byte counts, protocol details, and flow-specific metrics.

This study introduces AECE as a robust, attention-enhanced DL model for DDoS detection in SDNs, with the FBCE Loss as a core contribution that enhances detection accuracy by balancing classification precision and recall. Our approach not only achieves high accuracy and low false positive rates but also demonstrates enhanced adaptability to the unique characteristics of SDN traffic, contributing to the robust defense of SDN infrastructures against evolving DDoS threats. To facilitate understanding of the terminology used throughout this paper, Table 1 summarizes the key abbreviations and their corresponding full terms.

Table 1 Table of Abbreviations.


This paper continues by reviewing related literature in Sect. 2. Section 3 outlines the methodology, detailing dataset selection and preprocessing, model architecture, and the theoretical basis of the proposed model. In Sect. 4, the system implementation and evaluation are presented, focusing on the practical application of the model and its effectiveness. Finally, Sect. 5 concludes the paper and discusses future directions for research in this area.

Literature review

DDoS detection has been a critical focus in network security research due to the substantial damage such attacks can cause to both the network infrastructure and the services it supports.

Traditional DDoS detection approaches

Earlier DDoS detection efforts in SDN often adapted traditional intrusion detection systems (IDS), like Snort, by integrating them with OpenFlow, SDN’s foundational protocol for communication between the control and data planes. Systems such as SnortFlow⁹ combine Snort’s signature-based detection with OpenFlow’s programmability, enabling responsive network adjustments based on identified threats. However, this approach has notable limitations: matching traffic headers against signature databases is time-intensive, and the signature database must be updated frequently to remain relevant against evolving threats¹⁰. Such time costs and the reliance on known signatures limit SnortFlow’s scalability and effectiveness in real-time, high-throughput SDN environments, where ML and DL approaches now offer more adaptive solutions.

In recent research, the authors¹¹ emphasized the importance of maximizing intrusion detection efficiency in IoT networks using extreme learning machines, highlighting the evolving challenges in network security across different domains. Traditional detection methods, including rule-based IDS, statistical anomaly detection, and ML techniques, have been extensively studied for their effectiveness against DDoS attacks. However, these conventional approaches encounter limitations in adaptability and performance, particularly within SDN environments. SDN’s characteristics (centralized control, programmability, and decoupled control and data planes) amplify the challenges of DDoS detection and necessitate new methods capable of coping with high-dimensional data and dynamic network conditions^12,13.

Evolving research in DDoS detection

Machine Learning Approaches: ML methods have been widely explored in DDoS detection for SDN. Techniques like Support Vector Machines (SVM)¹⁴, k-Nearest Neighbors (KNN)^15,16, and decision trees¹⁷ are popular choices due to their adaptability in classifying network traffic. However, each method has unique limitations that hinder its scalability and effectiveness in complex SDN environments. For instance, SVM’s sensitivity to outliers and difficulty with high-dimensional data negatively affect its robustness under noisy SDN conditions. In contrast, KNN and decision trees exhibit high classification accuracy but struggle with computational efficiency, especially when applied to real-time DDoS detection tasks. These limitations reveal a pressing need for advanced approaches that can adapt to SDN’s dynamic environment without excessive computational overhead.
Deep Learning Approaches: Recent advancements have focused on DL models, given their capacity to handle high-dimensional data and recognize nuanced patterns in network traffic. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been used in DDoS detection, each with specific advantages and limitations. CNNs are effective in tasks with image-based data but have limited adaptability to sequential data like network traffic. RNNs, while better suited for sequence data, often require significant computational resources, reducing their feasibility for real-time SDN applications¹⁸. Studies have also explored Long Short-Term Memory (LSTM) networks³, which excel at managing sequence-based data but introduce high latency, potentially delaying response times in SDNs where rapid identification is essential^2,6. In a DL-based method¹⁹, the authors have leveraged a DNN to analyze traffic patterns in SDN, achieving high accuracy across datasets like CICIDS2018. While it demonstrates robustness, the approach lacks mechanisms to address imbalanced traffic and optimize precision-recall trade-offs, limiting its adaptability in real-time, diverse SDN scenarios.
Hybrid Approaches: Hybrid DL models also exist in the literature. For example, a hybrid CNN-LSTM model for DDoS detection in SDN-based industrial IoT environments was presented²⁰. By optimizing feature dimensions and employing a lightweight architecture, the model achieves a high accuracy on the CICDDoS2019 dataset. However, its focus on specific industrial IoT attack types and dependency on selected features limits generalization to diverse SDN scenarios. Hybrid approaches that combine DL with statistical features have also emerged. For example, entropy-based metrics integrated with ML models have been applied to capture the randomness in traffic, enabling early anomaly detection²¹. However, these methods typically demand high computational power and may introduce latency in high-throughput environments, posing challenges for real-time detection^15,22.

A recent effort by Zaidoun and Lachiri²³ introduced a hybrid DL architecture combining a DNN–LSTM structure with Transformer encoder layers for SDN-based DDoS detection. Their model benefits from LSTM’s ability to capture temporal dependencies and Transformers’ capacity for global feature contextualization, achieving promising results on SDN datasets. However, the architecture involves substantial computational overhead due to its sequential processing and attention-intensive design. Moreover, while effective in extracting temporal features, its generalization and efficiency under strict real-time constraints remain limited. In contrast, our proposed AECE framework offers a more computationally efficient architecture by integrating shallow attention layers selectively within a deep feed-forward network and combining them with a novel FBCE loss. This enables AECE to achieve higher detection performance while maintaining low latency, making it more suitable for real-time SDN deployments.

Data and loss function challenges in SDN security

The choice of dataset and features is another critical factor affecting detection performance. Much of the prior work still relies on legacy datasets like KDDCUP99¹ and NSL-KDD¹⁴, which, although foundational, contain outdated traffic patterns or features that do not reflect modern SDN dynamics. This misalignment can lead to models that perform well in controlled settings but fail with real-world SDN traffic. Our approach addresses this limitation by using an SDN-specific dataset²⁴ containing features that accurately represent current SDN traffic patterns and protocol-specific characteristics.

Concerning loss functions, many existing models rely on standard binary cross-entropy, which optimizes general accuracy but lacks a mechanism to equally prioritize reducing false positives and false negatives—a crucial consideration in SDN environments where misclassifications can result in costly restrictions on legitimate traffic⁷. To address this, our work introduces the FBCE Loss, which incorporates an F1 component, enhancing the model’s balance between precision and recall. This tailored loss function reduces both false positives and negatives, providing significant advantages over traditional loss functions in critical SDN applications.

Though the Adam optimizer has been widely used in network-related ML tasks, only a few studies have explored the Adamax variant, which is particularly suitable for handling sparse gradients in high-dimensional data. Adamax enhances convergence stability and computational efficiency²⁵, making it ideal for the large, diverse datasets typical of SDN environments. Our approach demonstrates the specific advantages of using Adamax within SDN, enhancing the model’s stability and efficiency in DDoS detection tasks. Table 2 provides a summary of related works.

Table 2 Summary of related works.


In summary, this study presents AECE, an optimized DNN-based detection model incorporating attention mechanisms, the SDN-specific FBCE Loss function, and tailored activation and optimization techniques to deliver high accuracy and real-time adaptability for DDoS detection. This work significantly contributes to DDoS detection research by addressing challenges in feature relevance, class imbalance, and model stability, thus providing a comprehensive solution for real-time SDN environments.

Methodology

The system model for DDoS detection is centrally located within the SDN controller, which provides a comprehensive view of network activity. By residing within the controller, the model leverages its centralized vantage point to monitor and analyze network-wide traffic data. This strategic placement enhances the model’s ability to quickly detect anomalies and trigger automated responses. The model processes incoming traffic data in real-time, extracting relevant features and classifying it as either benign or malicious. If a potential attack is identified, the controller can promptly update flow rules at the switch level, blocking or rerouting malicious traffic to safeguard network integrity. This integrated approach enables rapid, network-wide DDoS detection and mitigation, capitalizing on SDN’s inherent adaptability and centralized control.
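The detect-then-mitigate loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `FlowRule`, `mitigate`, `toy_classifier`, the flow dictionary keys, and the 0.5 decision threshold are all hypothetical names and values chosen for the example, and a real controller would emit OpenFlow flow-mod messages rather than Python objects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FlowRule:
    """Hypothetical stand-in for a flow rule pushed to a switch."""
    switch_id: str
    match_src: str
    action: str  # "drop" in this sketch


def mitigate(flows: List[Dict], classify: Callable[[Dict], float],
             threshold: float = 0.5) -> List[FlowRule]:
    """Classify each flow and return a drop rule for every flow judged malicious."""
    rules = []
    for flow in flows:
        p_attack = classify(flow)  # probability that the flow is an attack
        if p_attack >= threshold:  # threshold is an assumption, not from the paper
            rules.append(FlowRule(flow["switch"], flow["src"], "drop"))
    return rules


def toy_classifier(flow: Dict) -> float:
    """Placeholder for the trained model: flags very high packet rates."""
    return 0.9 if flow["pkt_rate"] > 1000 else 0.1


flows = [
    {"switch": "s1", "src": "10.0.0.1", "pkt_rate": 50},
    {"switch": "s1", "src": "10.0.0.2", "pkt_rate": 5000},
]
rules = mitigate(flows, toy_classifier)
```

In a deployment, `toy_classifier` would be replaced by the trained detector, and the returned rules would be translated into the controller's own flow-table updates.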

The proposed AECE system is a DNN designed for DDoS attack detection in SDNs. AECE leverages attention mechanisms, batch normalization, and dropout to enhance feature extraction and classification accuracy. Additionally, the FBCE loss function is employed to optimize detection performance by minimizing false positives and false negatives, ensuring reliable classification in high-stakes SDN environments. The model’s architecture, combined with the FBCE loss function, effectively addresses the unique challenges of DDoS detection in SDNs. The flowchart of the proposed system is illustrated in Fig. 1 and the details are presented in the following sections.

Dataset selection

Selecting an appropriate dataset is crucial for effective DDoS detection. Traditional datasets, such as KDDCUP99^1,31 and NSL-KDD^4,14,15,32, are outdated and do not reflect the unique characteristics of modern SDN traffic, leading to issues in compatibility and reduced detection accuracy. CAIDA DDoS2007²⁹ and ISCX-IDS2012^2,3 have limited types of attacks and traffic protocols that do not reflect the diversity and complexity of modern networks. CICIDS2017^5,6,8,33 and CSE-CICIDS2018³⁴ have missing values, class imbalance and large file sizes that make them difficult to process and analyze. CTU-13^2,27 and UNSW-NB15¹ have different attack classes and scenarios that are not relevant to our research problem. To address these limitations, this research employs a more recent and relevant SDN-specific dataset, containing 23 features directly relevant to attack detection within SDNs²⁴. Key features include packet and byte counts, which are indicators of abnormal traffic volume typical of DDoS attacks; port and protocol types, which are helpful in identifying attack patterns based on connection types; and flow metrics, such as packet rate and total data rate, which can detect deviations from normal traffic. These features align with the requirements of real-time SDN analysis, where timely and accurate classification of benign and malicious traffic flows is critical. Table 3 shows the description and type of each feature in this dataset.

Table 3 Attributes of the records of the selected SDN-specific dataset.


Data preprocessing

To ensure high-quality input data, we implemented the following preprocessing steps:

Feature Selection and Bias Removal: Redundant or non-informative features, such as source and destination IP addresses, are removed to minimize potential biases.
Encoding Non-Numerical Data: Categorical data, such as protocol names, are converted to numerical representations (e.g., TCP as 0, UDP as 1, ICMP as 2) to maintain compatibility with DNN input requirements.
Data Normalization: All features are normalized using $Z_{score}$ normalization, calculated with the dataset’s mean ($\mu$) and standard deviation ($\sigma$), ensuring consistency and improving model stability:

$$Z_{score}=\frac{X-\mu}{\sigma}$$

(1)

Finally, the dataset was partitioned into training and testing sets to facilitate model training and evaluation (Training-Test Split). This preprocessing approach ensures data integrity, consistency, and suitability for effective DNN training and testing.
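As a rough illustration of the first three preprocessing steps, the sketch below drops address columns, applies the protocol encoding given in the text, and standardizes one numeric column with the Z-score formula. The column names (`src`, `dst`, `Protocol`, `pktcount`) and the helper names are assumptions made for the example and are not taken from the paper's code.

```python
import statistics

# Encoding stated in the text: TCP as 0, UDP as 1, ICMP as 2.
PROTOCOL_CODES = {"TCP": 0, "UDP": 1, "ICMP": 2}


def encode_and_select(record, drop=("src", "dst")):
    """Remove address columns and map the protocol name to its integer code."""
    out = {k: v for k, v in record.items() if k not in drop}
    out["Protocol"] = PROTOCOL_CODES[out["Protocol"]]
    return out


def zscore_columns(rows, columns):
    """Standardize each listed column using the dataset mean and standard deviation."""
    stats = {}
    for col in columns:
        values = [r[col] for r in rows]
        stats[col] = (statistics.fmean(values), statistics.pstdev(values))
    scaled = []
    for r in rows:
        new = dict(r)
        for col, (mu, sigma) in stats.items():
            # Guard against constant columns, where sigma is zero.
            new[col] = (r[col] - mu) / sigma if sigma > 0 else 0.0
        scaled.append(new)
    return scaled


rows_raw = [
    {"src": "10.0.0.1", "dst": "10.0.0.8", "Protocol": "TCP", "pktcount": 100.0},
    {"src": "10.0.0.2", "dst": "10.0.0.8", "Protocol": "UDP", "pktcount": 300.0},
    {"src": "10.0.0.3", "dst": "10.0.0.8", "Protocol": "ICMP", "pktcount": 200.0},
]
rows = [encode_and_select(r) for r in rows_raw]
scaled = zscore_columns(rows, ["pktcount"])
```

After scaling, the `pktcount` column has zero mean and unit standard deviation, which is the property the normalization step relies on.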

Model architecture

The AECE model is a sophisticated DNN architecture designed to enhance DDoS detection in SDN environments. It incorporates several advanced techniques, including attention mechanisms, batch normalization and dropout, to improve feature extraction and classification accuracy. The architecture consists of four hidden layers, with attention layers strategically placed after the first and third hidden layers to capture critical features. The first two layers, each with 50 neurons, perform initial feature extraction, followed by an attention layer that focuses on the most informative patterns in the data. The third layer, expanded to 100 neurons, handles more complex data relationships, with an attention layer refining and focusing the learned features. The final layer, with 50 neurons, consolidates the learned features before passing them to the output layer.

The attention mechanism in the AECE model is crucial for adaptive feature weighting, allowing the network to emphasize critical traffic features indicative of DDoS attacks while reducing the influence of less relevant data points. This selective focus helps capture nuanced patterns in SDN data, often overlooked by dense layers that process all features with equal weight. The early-stage attention layer, positioned after the first hidden layer, assigns weights to output features, emphasizing those most relevant to early-stage attack detection. The late-stage refinement layer, placed after the third hidden layer, refines the focus on important features before further processing. This approach provides the AECE model with enhanced feature prioritization, improving classification accuracy and reducing false positives in DDoS detection tasks.
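The text does not give the exact form of the attention layers, so the following is only one plausible reading of "adaptive feature weighting": a score is computed for each hidden feature, the scores are normalized with a softmax, and the hidden vector is re-weighted element-wise. The function names, the weight matrix `W`, and the bias `b` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_attention(h, W, b):
    """Re-weight hidden vector h by attention weights derived from h itself.

    Assumed form (not confirmed by the paper):
        scores = W h + b,  weights = softmax(scores),  output = weights * h
    """
    scores = [sum(w * x for w, x in zip(row, h)) + b_i for row, b_i in zip(W, b)]
    weights = softmax(scores)
    gated = [a * x for a, x in zip(weights, h)]
    return gated, weights


h = [0.2, 1.5, 0.1]                       # toy hidden activations
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]                     # identity weights for illustration
b = [0.0, 0.0, 0.0]
gated, weights = feature_attention(h, W, b)
```

With identity weights, the feature with the largest activation receives the largest attention weight, so it dominates the gated output while weaker features are suppressed. In a trained network, `W` and `b` would be learned.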

To further enhance the model’s performance, each hidden layer is followed by batch normalization, which normalizes the layer’s activations to stabilize learning and reduce internal covariate shift. This process ensures consistent distributions of inputs to each subsequent layer, accelerating convergence and improving model generalization. Additionally, a dropout layer with a rate of 0.3 is added after the third hidden layer to prevent overfitting. By randomly disabling neurons during training, dropout encourages the model to develop redundant representations, enhancing its robustness on unseen data. Combining these techniques, the AECE model effectively extracts relevant features and accurately classifies DDoS attacks in SDN environments. Table 4 summarizes the model architecture, providing detailed information on the layer types, output shapes, and parameters.
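To make the two regularization steps concrete, here is a minimal sketch of training-time batch normalization for a single feature and of dropout with the stated rate of 0.3. It assumes the "inverted dropout" convention (surviving activations are scaled by 1/(1 - rate) during training), which is how common deep learning frameworks implement dropout; the function names and the `gamma`, `beta`, and `eps` defaults are illustrative.

```python
import random
import statistics


def batch_norm(column, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    mu = statistics.fmean(column)
    var = statistics.pvariance(column, mu)
    return [gamma * (x - mu) / (var + eps) ** 0.5 + beta for x in column]


def dropout(activations, rate=0.3, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate` during training."""
    if not training or rate == 0.0:
        return list(activations)  # identity at inference time
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    # Survivors are scaled by 1/keep so the expected activation is unchanged.
    return [x / keep if rng.random() < keep else 0.0 for x in activations]


normed = batch_norm([1.0, 2.0, 3.0, 4.0])
dropped = dropout([1.0] * 1000, rate=0.3, rng=random.Random(0))
```

At inference time, `dropout(..., training=False)` returns its input unchanged, and a full batch normalization layer would use running statistics instead of per-batch ones; that detail is omitted here for brevity.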

Table 4 Summary of AECE model architecture, including layer types, output shapes, and parameters.


Theoretical rationale for architectural choices

To ground our architectural design in principled theory, we draw from established concepts within hierarchical feature-learning, smooth gating mechanisms, and advanced activation function properties prevalent in modern DNNs. These principles inform our strategic decisions regarding layer types, attention placement, and activation function selection.

Attention mechanism placement

The integration of attention mechanisms at specific points within the network hierarchy is motivated by the need for dynamic feature weighting and prioritization at different levels of abstraction. Placing an attention module after the first dense layer (Layer 1) facilitates early-stage selection. At this initial stage, the network processes raw, potentially noisy packet-level statistics. The attention mechanism here enables the model to learn to suppress less informative or potentially misleading features (e.g., transient network noise) while amplifying features that serve as early indicators of anomalous behavior. This is analogous to early-stage attention in other domains, such as ‘patch-level’ attention in vision transformers, which focuses on salient low-level input patterns.

A second attention module is strategically placed after the third dense layer (Layer 3). By this stage, the network has processed the data through multiple transformations, building more complex and abstract feature representations. The attention mechanism at this deeper level serves as a late-stage refinement process. It operates on these higher-level features, allowing the network to identify and emphasize the most discriminative feature combinations and relationships that are crucial for distinguishing sophisticated attack patterns from benign traffic. This hierarchical application of attention ensures that the network can adaptively focus on relevant information throughout the feature extraction process, from raw inputs to complex learned representations.

Activation function selection

The choice of activation functions at each layer is critical for introducing non-linearity and enabling the network to learn complex mappings. We employ a combination of Rectified Linear Unit (ReLU), Gaussian Error Linear Unit (GELU), and Softmax activation, each selected for its specific properties and suitability within the network structure.

ReLU is utilized in the initial dense layers due to its computational efficiency and effectiveness in introducing sparsity and mitigating the vanishing gradient problem in shallower networks. Its piecewise linear nature allows for straightforward optimization.

GELU is employed in deeper layers (specifically after Layers 2 and 4, as per the model design). GELU is a smoother approximation of ReLU, which has shown performance improvements in more complex and deeper architectures, particularly those involving transformer-like components or attention mechanisms. Its smooth, non-monotonic shape allows for more stable gradient flow during training and has been shown to improve performance in various natural language processing and computer vision tasks, suggesting potential benefits in capturing nuanced patterns in complex network traffic data. The smoothness of GELU aligns well with the concept of smooth gating frameworks, providing a more continuous weighting of inputs compared to the hard thresholding of ReLU.

Finally, the Softmax activation function is applied to the output layer. This is a standard choice for multi-class classification tasks, as it converts the raw output scores into a probability distribution over the possible classes (benign or different types of attacks, if applicable), ensuring that the output values are positive and sum to one, providing a clear measure of confidence for each class prediction.
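The three activations discussed in this section can be written directly from their standard definitions: ReLU(z) = max(0, z), GELU(z) = z·Φ(z) with Φ the standard normal cumulative distribution function, and Softmax as a normalized exponential. The sketch below is a plain-Python illustration of those definitions rather than the authors' implementation.

```python
import math


def relu(z):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)


def gelu(z):
    """Gaussian Error Linear Unit: z times the standard normal CDF at z."""
    return z * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Compare the hard threshold of ReLU with the smooth gating of GELU.
samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
relu_out = [relu(z) for z in samples]
gelu_out = [gelu(z) for z in samples]
probs = softmax([2.0, 0.5])              # e.g. scores for "attack" vs "normal"
```

Unlike ReLU, GELU lets small negative inputs pass with a small negative weight, which is the "continuous weighting" contrasted with hard thresholding above.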

By carefully selecting and placing attention mechanisms and activation functions according to these theoretical considerations, the AECE model is designed to effectively extract, prioritize, and process features from SDN traffic data, leading to enhanced performance in DDoS detection.

FBCE loss: F-balanced cross-entropy loss

The BCE loss function ($\mathcal{L}_{BCE}$) is widely used for binary classification tasks and is defined as:

$$\mathcal{L}_{BCE}\left(y,y_{p}\right)=-\left[y\log\left(y_{p}\right)+\left(1-y\right)\log\left(1-y_{p}\right)\right]$$

(1)

where $y$ is the true class label and $y_{p}$ is the predicted probability. The gradient of the BCE loss with respect to the predicted probability, $y_{p}$, is:

$$\frac{\partial\mathcal{L}_{BCE}}{\partial y_{p}}=-\frac{y}{y_{p}}+\frac{1-y}{1-y_{p}}=\frac{y_{p}-y}{y_{p}\left(1-y_{p}\right)}$$

(2)
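Differentiating the BCE expression term by term gives −y/y_p + (1 − y)/(1 − y_p), which simplifies to (y_p − y)/(y_p(1 − y_p)). As a sanity check, the sketch below compares that closed form with a central finite-difference estimate; the function names and the step size `h` are choices made for this example.

```python
import math


def bce(y, y_p):
    """Binary cross-entropy for one sample with label y and predicted probability y_p."""
    return -(y * math.log(y_p) + (1 - y) * math.log(1 - y_p))


def bce_grad(y, y_p):
    """Closed-form derivative of BCE with respect to y_p."""
    return (y_p - y) / (y_p * (1 - y_p))


def numeric_grad(f, y, y_p, h=1e-6):
    """Central finite-difference estimate of df/dy_p."""
    return (f(y, y_p + h) - f(y, y_p - h)) / (2 * h)


# Largest disagreement between the two gradients over a small grid of cases.
max_gap = max(
    abs(bce_grad(y, p) - numeric_grad(bce, y, p))
    for y in (0, 1)
    for p in (0.1, 0.5, 0.9)
)
```

The sign behaves as expected: for a positive sample (y = 1) the gradient is negative, so gradient descent raises y_p, and for a negative sample it is positive.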

The gradient of BCE loss depends only on $y$ and $y_{p}$, which can lead to biased updates in imbalanced datasets toward the majority class. The FBCE loss ($\mathcal{L}_{FBCE}$) is designed to address class imbalance issues by balancing precision and recall through an F1-score component within the cross-entropy framework. The FBCE loss is defined as:

$$\mathcal{L}_{FBCE}\left(y,y_{p}\right)=\mathcal{L}_{BCE}\left(y,y_{p}\right)+\alpha\left(1-F_{1}\left(y,y_{p}\right)\right)$$

(3)

where $\alpha\in\left[0,1\right]$ is a weighting factor (set to 0.5 in this study) that controls the trade-off between the BCE and F1 components. A higher $\alpha$ places more emphasis on F1, enhancing balance in imbalanced datasets. Denoting the expected FBCE loss over the data distribution by $\mathbb{E}\left[\mathcal{L}_{FBCE}\right]$, we have:

$$\mathbb{E}\left[\mathcal{L}_{FBCE}\right]=\mathbb{E}\left[\mathcal{L}_{BCE}\right]+\alpha\left(1-\mathbb{E}\left[F_{1}\right]\right)$$

(4)

In particular, $\lim_{\alpha\to 0}\mathcal{L}_{FBCE}=\mathcal{L}_{BCE}$ and $\lim_{\alpha\to 1}\mathcal{L}_{FBCE}=\mathcal{L}_{BCE}+1-F_{1}$. Under perfect predictions $\left(y=y_{p}\right)$, $\mathcal{L}_{BCE}=0$ and $F_{1}=1$, so $\mathcal{L}_{FBCE}=0$. More generally, substituting the definitions of $\mathcal{L}_{BCE}$ and $F_{1}$ gives:

$$\mathcal{L}_{FBCE}\left(y,y_{p}\right)=-\left[y\log\left(y_{p}\right)+\left(1-y\right)\log\left(1-y_{p}\right)\right]+\alpha\left(1-\frac{2\times y\times y_{p}}{y+y_{p}}\right)$$

(5)

The gradient of the FBCE loss with respect to $\:{y}_{p}$ is:

$$\frac{\partial\mathcal{L}_{FBCE}}{\partial y_{p}}=\frac{y_{p}-y}{y_{p}\left(1-y_{p}\right)}+\alpha\frac{y-y_{p}}{\left(y-y_{p}\right)^{2}}$$

(6)

The additional gradient term in $\mathcal{L}_{FBCE}$, $\alpha\frac{y-y_{p}}{\left(y-y_{p}\right)^{2}}$, modulates parameter updates based on the F1-score. This term penalizes incorrect predictions and promotes better performance on minority class samples: (a) The gradient term is small when predictions align with true labels. (b) For incorrect positive predictions, the gradient is negative, discouraging these predictions. (c) For incorrect negative predictions, the gradient encourages recall improvement.

Therefore, the FBCE loss function ensures a balanced focus on both false positives and false negatives, crucial in high-stakes environments like SDN where misclassifications can disrupt legitimate traffic flows. By incorporating the F1-score component, the FBCE Loss enhances the model’s ability to provide reliable, high-accuracy detection. The computation steps of the FBCE loss are summarized in pseudocode form in Algorithm 1.

Algorithm 1 FBCE Loss Computation
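The body of Algorithm 1 is not reproduced in this text, so the following is a hedged sketch of how the loss in Eq. 3 could be computed over a mini-batch, not the authors' pseudocode. It assumes a differentiable "soft" F1 built from probabilistic counts (TP = Σ y·y_p, FP = Σ (1 − y)·y_p, FN = Σ y·(1 − y_p)). For a single sample, 2·TP + FP + FN equals y + y_p, so this soft F1 reduces to the per-sample expression in Eq. 5. The function name, the clipping constant `eps`, and the batch-mean reduction are assumptions.

```python
import math


def fbce_loss(y_true, y_prob, alpha=0.5, eps=1e-7):
    """F-balanced cross-entropy: mean BCE plus alpha * (1 - soft F1)."""
    n = len(y_true)
    # Clip probabilities so that log() is always defined.
    probs = [min(max(p, eps), 1.0 - eps) for p in y_prob]

    bce = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, probs)
    ) / n

    # Soft confusion-matrix counts computed from probabilities (assumed form).
    tp = sum(y * p for y, p in zip(y_true, probs))
    fp = sum((1 - y) * p for y, p in zip(y_true, probs))
    fn = sum(y * (1 - p) for y, p in zip(y_true, probs))
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)

    return bce + alpha * (1.0 - soft_f1)


good = fbce_loss([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
bad = fbce_loss([1, 0, 1, 0], [0.2, 0.8, 0.3, 0.9])
bce_only = fbce_loss([1], [0.8], alpha=0.0)
```

Setting `alpha=0.0` recovers plain BCE, and the default `alpha=0.5` matches the weighting used in this study. In a training framework, the same arithmetic would be written with tensor operations so that gradients flow through both terms.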


Activation and optimization functions

A combination of activation functions was selected to optimize detection performance across the network layers. ReLU (Rectified Linear Unit) is applied in the hidden layers (first and third hidden layers) and outputs zero for negative inputs and the input itself for positive inputs:

$$ReLU\left(z\right)=\max\left(0,z\right)$$

(7)

where $z$ represents the input to the activation function. ReLU avoids the vanishing gradient problem, enabling faster training on large datasets like network traffic. The simplicity and computational efficiency of ReLU align well with the real-time analysis needs of SDN traffic.

GELU provides smooth, probabilistic activation, particularly beneficial in deeper layers (second and fourth hidden layers) where more complex data transformations are required:

$$GELU\left( z \right) = z~ \times \Phi \left( z \right)$$

(8)

where $\Phi\left(z\right)$ represents the Gaussian cumulative distribution function. GELU’s probabilistic nature aids in capturing finer traffic patterns, especially in high-variance SDN data, making it a strategic choice for DDoS detection in complex environments. Softmax is used in the output layer to convert raw scores into class probabilities, assisting in distinguishing between “normal” and “attack” traffic:

$$Softmax\left(z_{i}\right)=\frac{e^{z_{i}}}{\sum_{j}e^{z_{j}}}$$

(9)

where $z_{i}$ represents the score of class $i$ and $j$ iterates over all possible classes. By normalizing output probabilities, Softmax supports accurate class separation, enabling precise detection and response to attacks. Finally, the Adamax optimizer, a variant of the Adam optimizer, is used for training the model due to its ability to handle sparse gradients and high-dimensional data. The Adamax update rule is based on the infinity norm, defined as:

$$\theta=\theta-\frac{\alpha}{\|v\|_{\infty}}\cdot m$$

(10)

where $\theta$ represents the model parameters being optimized, $\alpha$ is the learning rate, which controls the step size for updating $\theta$ (and is distinct from the weighting factor $\alpha$ in the FBCE loss), $m$ is the first moment vector, capturing the mean of the gradients, $v$ is the second moment vector, storing the uncentered variance of the gradients, and $\|v\|_{\infty}$ denotes the infinity norm of $v$, effectively limiting large updates. Adamax is well-suited for the variability and dimensionality of SDN traffic, providing stable convergence and faster training times in high-dimensional spaces. This makes it ideal for real-time applications where large datasets and rapid detection are essential.
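For readers who want to see the update in motion, the sketch below implements one step of the standard Adamax rule from Kingma and Ba's Adam paper: an exponential moving average `m` of the gradient, an exponentially weighted infinity norm `u` in place of the second moment, and a bias-corrected step. The hyperparameter defaults are the ones suggested in that paper and are not necessarily the settings used in this study. The function name and the toy objective f(θ) = θ² are illustrative.

```python
def adamax_step(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax update over a list of parameters; t is the 1-based step count."""
    new_theta, new_m, new_u = [], [], []
    for th, g, m_i, u_i in zip(theta, grad, m, u):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first-moment estimate
        u_i = max(beta2 * u_i, abs(g))               # weighted infinity norm
        step = (lr / (1.0 - beta1 ** t)) * m_i / (u_i + eps)
        new_theta.append(th - step)
        new_m.append(m_i)
        new_u.append(u_i)
    return new_theta, new_m, new_u


# Toy usage: minimise f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, u = [1.0], [0.0], [0.0]
history = []
for t in range(1, 201):
    grad = [2.0 * theta[0]]
    theta, m, u = adamax_step(theta, grad, m, u, t)
    history.append(theta[0])
```

Because the step is divided by the running maximum of gradient magnitudes, each update is bounded by roughly the learning rate regardless of how large an individual gradient is, which is the "limiting large updates" property described above.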

Adjusting the number of epochs

The number of epochs is one of the most important hyperparameters for training a neural network. In the proposed method, the data is split into two parts: training and validation. The number of epochs is initially set to a large value. The network is trained on the training data and evaluated on the validation data, and the loss and accuracy for each epoch are recorded for both sets so that the optimal number of epochs can be determined from the resulting curves. Training is worth continuing as long as the accuracy increases and the loss decreases on both the training and validation data. However, if the accuracy increases on the training data but decreases on the validation data, or if the loss decreases on the training data but increases on the validation data, the model is overfitting and further epochs are not beneficial.
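The curve inspection described above can be approximated programmatically. The sketch below is only an illustration under stated assumptions, not the procedure used in this study: `select_epoch`, `overfit_epochs`, and the `patience` rule are hypothetical, and the loss values are invented for the example.

```python
def select_epoch(val_loss, patience=3, min_delta=0.0):
    """Return the 1-based epoch with the best validation loss.

    Scanning stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs (assumed rule).
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_loss, start=1):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


def overfit_epochs(train_loss, val_loss):
    """1-based epochs where training loss fell while validation loss rose."""
    return [i + 1 for i in range(1, len(train_loss))
            if train_loss[i] < train_loss[i - 1]
            and val_loss[i] > val_loss[i - 1]]


# Illustrative loss histories (not results from the paper).
train = [0.90, 0.70, 0.50, 0.40, 0.30, 0.25]
val = [0.95, 0.80, 0.60, 0.62, 0.65, 0.70]
print(select_epoch(val, patience=2))  # epoch with the lowest validation loss
print(overfit_epochs(train, val))     # epochs showing the overfitting pattern
```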

System implementation and evaluation

In this section, we evaluate the proposed AECE model, implemented in Python using the Jupyter Notebook environment. The evaluation is conducted on a dataset comprising 23 features extracted from SDN traffic under both normal and DDoS attack scenarios, as discussed previously²³. The dataset is divided into two parts, with 70% used for training and 30% for testing. First, we performed data preprocessing, which included feature selection, normalization, shuffling, and splitting to enhance data quality and model performance.
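The normalization, shuffling, and 70/30 splitting steps can be sketched in plain Python as follows. This is an assumed, simplified pipeline: min-max scaling is one common choice of normalization (the text does not specify which was used), and `min_max_normalize`, `shuffle_split`, and the fixed `seed` are hypothetical names and values.

```python
import random


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0
             for v, lo, sp in zip(row, lows, spans)]
            for row in rows]


def shuffle_split(features, labels, train_fraction=0.7, seed=42):
    """Shuffle (feature, label) pairs and split them into train and test."""
    pairs = list(zip(features, labels))
    random.Random(seed).shuffle(pairs)
    cut = int(round(train_fraction * len(pairs)))
    return pairs[:cut], pairs[cut:]


# Toy flows with two features each (illustrative values only).
X = [[0, 10], [5, 10], [10, 10], [2, 10], [8, 10],
     [4, 10], [6, 10], [1, 10], [9, 10], [3, 10]]
y = [0, 1, 1, 0, 1, 0, 1, 0, 1, 0]
train, test = shuffle_split(min_max_normalize(X), y)
print(len(train), len(test))
```

In practice the scaling statistics are often computed on the training portion only and then applied to the test portion, so that no information from the test set influences preprocessing.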

While our experiments focus on an SDN-specific dataset chosen for its real-time flow features (packet/byte counts, protocol details, flow rates), we recognize the importance of validating detection performance across multiple public benchmarks. Legacy datasets such as NSL-KDD and KDDCUP99 contain outdated traffic patterns and do not reflect SDN control-plane characteristics, while CICIDS2017, CSE-CICIDS2018, and CTU-13 exhibit class imbalances, missing values, or non-SDN protocols that complicate direct comparison. Consequently, these benchmarks may yield misleading performance estimates for SDN deployments.

To determine the appropriate number of epochs for training, the accuracy and loss curves (Figs. 2 and 3) were plotted. Based on the approach explained in Sect. 3.5, no significant improvement was observed after 900 training epochs. Therefore, 900 epochs was taken as an appropriate number for training. After determining this number, the neural network was retrained for exactly 900 epochs.

Evaluation metrics

The performance of the AECE and benchmark FFDNN⁷ models was evaluated using several metrics, including accuracy, precision, recall, F1 score, and false positive and false negative rates. These metrics are calculated as follows:

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$

(11)

$$Precision=\frac{TP}{TP+FP}$$

(12)

$$Recall=\frac{TP}{TP+FN}$$

(13)

$$F_{score}=\frac{2\times Recall\times Precision}{Recall+Precision}$$

(14)

$$FP_{rate}=\frac{FP}{FP+TN}$$

(15)

$$FN_{rate}=\frac{FN}{FN+TP}$$

(16)

where TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative) are based on predictions made by the model.
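Eqs. (11)–(16) translate directly into code. The sketch below assumes that "attack" is the positive class; the function name `detection_metrics`, the zero-denominator guard, and the confusion-matrix counts in the usage example are hypothetical and are not taken from the experiments.

```python
def _ratio(numerator, denominator):
    """Safe division: returns 0.0 when the denominator is zero (assumed guard)."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, tn, fp, fn):
    """Compute Eqs. (11)-(16) from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_score": _ratio(2 * recall * precision, recall + precision),
        "fp_rate": _ratio(fp, fp + tn),
        "fn_rate": _ratio(fn, fn + tp),
    }


# Hypothetical counts: 100 attack flows and 100 normal flows.
for name, value in detection_metrics(tp=96, tn=93, fp=7, fn=4).items():
    print(f"{name}: {value:.4f}")
```

Note that $FN_{rate} = 1 - Recall$ by construction, so reporting both is redundant but convenient when false negatives are the main operational concern.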

Comparison of models

As discussed, the AECE model architecture incorporates attention layers, ReLU and GELU activation functions in the hidden layers, Softmax in the output layer, and the Adamax optimizer for loss minimization. In addition, AECE employs the custom FBCE loss function to penalize both false positives and false negatives, making it well-suited for high-stakes DDoS detection in SDNs. The performance of the AECE model was compared with the FFDNN model⁷, a recent hybrid DL model combining LSTM and attention mechanisms²³, and a variant of AECE without the custom FBCE loss function.

In contrast to AECE, the FFDNN model consists of three hidden layers with sigmoid activation functions and a Softmax output layer, optimized using binary cross-entropy and the Adamax optimizer. The lack of attention layers and FBCE loss in FFDNN offers a baseline to evaluate AECE’s innovations in terms of accuracy and error reduction. The LSTM-Attention hybrid model²³ utilizes a sequential processing approach with LSTM layers to capture temporal dependencies in network traffic, augmented by a multi-head attention mechanism to focus on the most relevant time steps and features. Its architecture, as detailed in²³, includes LSTM layers followed by dense layers and attention, aiming to leverage both sequential and salient features.

The results are presented in Fig. 4, illustrating improvements across accuracy, precision, recall, $F_{score}$, and false positive and false negative rates.

Fig. 4

The results of different evaluation metrics for the hybrid LSTM-Attention, FFDNN, AECE, and AECE without the customized loss function.


As shown in Fig. 4, the AECE model with FBCE Loss significantly outperforms FFDNN. AECE achieves a 10% improvement in accuracy, substantial increases in precision (11%) and recall (7%), and reductions in the false positive rate (13%) and false negative rate (7%), highlighting its superior ability to accurately classify both attack and non-attack traffic.

Furthermore, when compared to the recent LSTM-Attention hybrid model²³, AECE demonstrates superior performance. AECE achieves higher Accuracy (95% vs. 91%), Precision (94% vs. 88%), Recall (96% vs. 93%), and F1-Score (95% vs. 90%). Notably, AECE also exhibits a significantly lower False Positive Rate (7% vs. 10%) and False Negative Rate (4% vs. 7%). While the LSTM-Attention hybrid model leverages temporal features, AECE’s combination of strategic attention placement within a DNN framework and the FBCE loss function appears more effective in capturing the critical spatial and statistical features of SDN traffic relevant to DDoS detection, leading to more accurate and balanced classification performance. These results underscore the novelty and effectiveness of the proposed AECE model for robust and high-precision DDoS detection in SDN environments.
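As a consistency check, the reported F1-scores and false negative rates can be recomputed from the rounded precision and recall values above using Eqs. (14) and (16). The superscript labels are introduced here only for readability, and because the inputs are rounded percentages, the results agree with the reported figures only to the nearest percent:

$$F_{score}^{AECE}=\frac{2\times 0.96\times 0.94}{0.96+0.94}=\frac{1.8048}{1.90}\approx 0.950$$

$$F_{score}^{LSTM\text{-}Att}=\frac{2\times 0.93\times 0.88}{0.93+0.88}=\frac{1.6368}{1.81}\approx 0.904$$

$$FN_{rate}^{AECE}=1-0.96=0.04,\qquad FN_{rate}^{LSTM\text{-}Att}=1-0.93=0.07$$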

To visualize the classification effectiveness, Fig. 5 presents confusion matrices for FFDNN, hybrid LSTM-Attention, AECE without FBCE Loss, and AECE with FBCE Loss. The AECE model with the customized FBCE Loss demonstrates a higher true positive rate and lower false positive and false negative rates compared to FFDNN, indicating its effectiveness in reducing misclassifications. FBCE directly addresses the class imbalance problem, ensuring balanced precision and recall while promoting balanced learning through gradient modifications. This helps prevent overfitting to dominant classes. Empirical validation further supports these findings, with significant reductions in false positives and false negatives, as reflected in the improved precision (94%), recall (96%), F1-score (95%), and overall accuracy (95%).
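To illustrate how a loss can couple cross-entropy with an F1-based term, the sketch below adds a differentiable "soft" F1 penalty to binary cross-entropy. This is not the paper's exact FBCE definition, which is given in the methodology section; the function name `fbce_like_loss`, the weighting factor `lam`, and the soft-count formulation are assumptions chosen for illustration.

```python
import math


def fbce_like_loss(y_true, p_attack, lam=0.5, eps=1e-7):
    """Binary cross-entropy plus lam * (1 - soft F1).

    y_true: 0/1 labels (1 = attack); p_attack: predicted attack probabilities.
    Soft TP/FP/FN counts use probabilities instead of hard decisions, so the
    F1 term stays differentiable. Assumed form, NOT the paper's exact FBCE.
    """
    ce = 0.0
    tp = fp = fn = 0.0
    for y, p in zip(y_true, p_attack):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        ce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        tp += y * p
        fp += (1 - y) * p
        fn += y * (1.0 - p)
    soft_f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return ce / len(y_true) + lam * (1.0 - soft_f1)


labels = [1, 0, 1, 0]
print(fbce_like_loss(labels, [0.9, 0.1, 0.8, 0.2]))  # confident, mostly right
print(fbce_like_loss(labels, [0.4, 0.6, 0.3, 0.7]))  # mostly wrong
```

In a formulation of this kind, the F1 term penalizes soft false positives and soft false negatives jointly, which is one way a loss can discourage trading one error type for the other.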

In addition to FFDNN and hybrid LSTM-Attention, the performance of AECE was compared to traditional ML models, including SVM, KNN, Decision Tree (DT), and Logistic Regression. The AECE model achieved the highest accuracy (95%) (Fig. 6) and F score (96%) (Fig. 7) among all evaluated models, underscoring its robustness in DDoS detection.

Fig. 7

Comparison of $F_{score}$ of AECE and different benchmark methods.


The AECE model demonstrates marked improvements over FFDNN and other ML methods. Through its attention layers and FBCE Loss, AECE achieves a balanced optimization between precision and recall, reducing both false positives and false negatives. This balance is crucial for real-world DDoS detection applications, where misclassifications can lead to unnecessary network restrictions or security risks. The AECE model’s overall accuracy of 95% and F1 score of 95.4% affirm its efficacy and reliability for SDN-based DDoS detection. In conclusion, AECE’s innovative architecture and loss function make it a superior solution for high-precision, real-time DDoS detection in SDN environments.

Ablation study

To rigorously evaluate the contribution of the core components within the AECE architecture—namely, the attention layers, the FBCE loss function, the mixed ReLU and GELU activation functions, and the Adamax optimizer—we conducted an ablation study. This study compares the performance of the full AECE model against several ablated variants, each removing or altering one key component. The ablated variants are as follows:

AECE w/o Attention: The AECE model with the attention layers removed.
AECE with BCE (no FBCE): The AECE model trained using standard Binary Cross-Entropy loss instead of the proposed FBCE loss.
AECE (ReLU only): The AECE model where all hidden layers use only the ReLU activation function.
AECE (GELU only): The AECE model where all hidden layers use only the GELU activation function.
AECE with Adam (vs. Adamax): The AECE model trained using the standard Adam optimizer instead of Adamax.
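The variants listed above can be expressed as single-field overrides of one base configuration, which makes it easy to verify that each ablation changes exactly one component. The sketch below is hypothetical: the `AECEConfig` class, its field names, and the assumption of four hidden layers alternating ReLU and GELU (GELU in the second and fourth) are illustrative and not taken from the authors' code.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class AECEConfig:
    """Hypothetical configuration of the full AECE model."""
    use_attention: bool = True
    loss: str = "fbce"
    hidden_activations: tuple = ("relu", "gelu", "relu", "gelu")  # assumed layout
    optimizer: str = "adamax"


FULL = AECEConfig()
N_HIDDEN = len(FULL.hidden_activations)

# Each ablated variant overrides exactly one setting of the full model.
VARIANTS = {
    "AECE (full)": FULL,
    "AECE w/o Attention": replace(FULL, use_attention=False),
    "AECE with BCE (no FBCE)": replace(FULL, loss="bce"),
    "AECE (ReLU only)": replace(FULL, hidden_activations=("relu",) * N_HIDDEN),
    "AECE (GELU only)": replace(FULL, hidden_activations=("gelu",) * N_HIDDEN),
    "AECE with Adam (vs. Adamax)": replace(FULL, optimizer="adam"),
}


def changed_fields(variant, base=FULL):
    """Names of the settings in which a variant differs from the base model."""
    return [f.name for f in fields(base)
            if getattr(variant, f.name) != getattr(base, f.name)]


for name, cfg in VARIANTS.items():
    print(f"{name}: changes {changed_fields(cfg)}")
```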

The performance of the full AECE model and each ablated variant was evaluated on the held-out test set using the metrics defined in Sect. 4.1. Table 5 summarizes the results of this ablation study.

Table 5 Ablation study: effect of removing or replacing each key component of AECE.


The results presented in Table 5 provide empirical validation for the necessity of each component within the AECE architecture. Removing the attention layers (AECE w/o Attention) results in a modest drop in F1-Score from 95.62 to 94.86%. However, this change is accompanied by a significant increase in the False Positive Rate, rising from 6.53 to 8.58%. This confirms the crucial role of the attention mechanism in improving the model’s precision and reducing false alarms by enabling it to focus on the most discriminative features.

Using a single activation function throughout the hidden layers, either ReLU-only (AECE (ReLU only)) or GELU-only (AECE (GELU only)), also leads to decreased performance compared to the mixed ReLU + GELU strategy. The F1-Score drops to 94.63% for ReLU-only and 94.59% for GELU-only, representing a reduction of approximately 1.0 percentage points compared to the full model. This verifies that the combination of ReLU’s efficiency in earlier layers and GELU’s smoother properties in deeper layers is beneficial for optimizing performance.

Finally, swapping the Adamax optimizer for the standard Adam optimizer (AECE with Adam) results in an F1-Score of 94.51%, a reduction of approximately 1.11 percentage points compared to the full AECE model. This indicates that the characteristics of Adamax, particularly its effectiveness with sparse gradients, are advantageous for optimizing the AECE model on the given SDN traffic dataset, contributing to better overall performance.

Collectively, these ablation experiments provide strong empirical evidence that each proposed component—attention layers, mixed activations, and the Adamax optimizer—contributes positively and distinctly to the AECE model’s superior performance in DDoS detection.
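The differences quoted in the ablation discussion can be reproduced from the figures stated in the text. The short script below uses only those values (the BCE variant is omitted because its score is not quoted in the discussion); the variable names are illustrative.

```python
# Values quoted in the ablation discussion (percentages).
FULL_F1, FULL_FPR = 95.62, 6.53
FPR_WITHOUT_ATTENTION = 8.58
ablated_f1 = {
    "w/o Attention": 94.86,
    "ReLU only": 94.63,
    "GELU only": 94.59,
    "Adam": 94.51,
}

# Absolute F1 drop of each variant, in percentage points.
f1_drop = {name: round(FULL_F1 - f1, 2) for name, f1 in ablated_f1.items()}

# False positive rate increase without attention: absolute and relative.
fpr_rise = round(FPR_WITHOUT_ATTENTION - FULL_FPR, 2)
fpr_rise_relative = round(100.0 * (FPR_WITHOUT_ATTENTION - FULL_FPR) / FULL_FPR, 1)

for name, drop in f1_drop.items():
    print(f"{name}: F1 drop of {drop} percentage points")
print(f"FPR rise without attention: {fpr_rise} points ({fpr_rise_relative}% relative)")
```

Read this way, removing attention costs the least F1 but produces the largest change in false alarms, which matches the interpretation given above.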

Conclusion and future directions

SDNs provide a flexible and intelligent architecture for modern network management, but ensuring their security remains a critical challenge. In particular, the detection and mitigation of Distributed Denial of Service (DDoS) attacks require robust and scalable solutions. This paper presents the AECE model, integrating attention mechanisms and the FBCE loss function to address the limitations of existing DDoS detection approaches.

The proposed AECE model demonstrated superior performance in detecting subtle attack patterns while balancing false positives and false negatives, achieving notable improvements in precision, recall, and F1-score compared to benchmark methods. The combination of attention mechanisms, effective feature prioritization, and architectural enhancements, such as the use of GELU activation and dropout layers, contributed to its robustness and adaptability in dynamic SDN environments. Future work may extend this research by exploring novel activation functions to enhance neural network convergence and mitigate gradient-related issues. We also plan to employ advanced optimization techniques, such as genetic algorithms, for fine-tuning hyperparameters to further improve detection performance.

This study highlights the potential of integrating attention-based methods and tailored loss functions to advance DDoS detection within SDN architectures, paving the way for more secure and efficient networks.

Future research can focus on enhancing the interpretability and adaptability of the proposed AECE model to address evolving cyber threats in SDN environments. One promising avenue is the integration of Explainable AI (XAI) techniques, such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations), to provide transparency in decision-making processes. This would enable network administrators to better understand the model’s predictions and improve trust in its deployment for critical systems. Additionally, extending the model to handle zero-day attacks through transfer learning and semi-supervised learning frameworks could improve generalization capabilities for unseen attack patterns. Incorporating federated learning approaches may also address privacy concerns by enabling distributed detection across multiple SDN domains without centralizing sensitive traffic data.

Data availability

The datasets analyzed during the current study are available in the Mendeley Data repository: https://data.mendeley.com/datasets/jxpfjc64kr/1.

References

Catak, F. O. & Mustacoglu, A. F. Distributed denial of service attack detection using autoencoder and deep neural networks. J. Intell. Fuzzy Syst. 37 (3), 3969–3979 (2019).
Priyadarshini, R. & Barik, R. K. A deep learning based intelligent framework to mitigate DDoS attack in fog environment. J. King Saud University-Computer Inform. Sci. 34 (3), 825–831 (2022).
Li, Y. Ddos detection approach combining lstm and bayes, in 2019 Seventh International Conference on Advanced Cloud and Big Data (CBD). pp. 180–185. (2019).
Shaaban, A. R., Abd-Elwanis, E. & Hussein, M. DDoS attack detection and classification via Convolutional Neural Network (CNN), in 2019 Ninth International Conference on Intelligent Computing and Information Systems (ICICIS). pp. 233–238. (2019).
Haider, S. et al. A deep CNN ensemble framework for efficient DDoS attack detection in software defined networks. Ieee Access. 8, 53972–53983 (2020).
Roopak, M., Tian, G. Y. & Chambers, J. Deep learning models for cyber security in IoT networks, in 2019 IEEE 9th annual computing and communication workshop and conference (CCWC). pp. 0452–0457. (2019).
Cil, A. E., Yildiz, K. & Buldu, A. Detection of DDoS attacks with feed forward based deep neural network model. Expert Syst. Appl. 169, 114520 (2021).
Roopak, M., Tian, G. Y. & Chambers, J. An intrusion detection system against ddos attacks in iot networks, in 2020 10th annual computing and communication workshop and conference (CCWC). pp. 0562–0567. (2020).
Xing, T., Huang, D., Xu, L., Chung, C.-J. & Khatkar, P. Snortflow: A openflow-based intrusion prevention system in cloud environment. in 2013 Second GENI Research and Educational Experiment Workshop 89–92. https://doi.org/10.1109/GREE.2013.25 (IEEE, 2013).
Hu, Z. et al. Enhancing Detection of Malicious Traffic Through FPGA-Based Frequency Transformation and Machine Learning (IEEE Access, 2023).
Altamimi, S. & Abu Al-Haija, Q. Maximizing intrusion detection efficiency for IoT networks using extreme learning machine. Discov. Internet Things 4(1), 5. https://doi.org/10.1007/s43926-024-0006 (2024).
Costa, L. C. et al. OpenFlow data planes performance evaluation. Perform. Evaluation. 147, 102194 (2021).
Alhijawi, B. et al. A survey on dos/ddos mitigation techniques in sdns: classification, comparison, solutions, testing tools and datasets. Comput. Electr. Eng. 99, 107706 (2022).
Sahoo, K. S. et al. An evolutionary SVM model for DDOS attack detection in software defined networks. IEEE Access. 8, 132502–132513 (2020).
Dong, S. & Sarem, M. DDoS attack detection method based on improved KNN with the degree of DDoS attack in software-defined networks. IEEE Access. 8, 5039–5048 (2019).
Xu, Y. et al. Efficient DDoS detection based on K-FKNN in software defined networks. IEEE Access. 7, 160536–160545 (2019).
Chen, Y., Pei, J. & Li, D. DETPro: a high-efficiency and low-latency system against DDoS attacks in SDN based on decision tree, in ICC2019 –2019 IEEE International Conference on Communications (ICC). pp. 1–6. (2019).
Kim, J. et al. CNN-based network intrusion detection against denial-of-service attacks. Electronics 9 (6), 916 (2020).
Hnamte, V. et al. DDoS attack detection and mitigation using deep neural network in SDN environment. Computers Secur. 138, 103661 (2024).
Zainudin, A. et al. An efficient hybrid-dnn for Ddos detection and classification in software-defined Iiot networks. IEEE Internet Things J. 10 (10), 8491–8504 (2023).
Fan, C. et al. Detection of DDoS attacks in software defined networking using entropy. Appl. Sci. 12 (1), 370 (2021).
Mohammed, S. S. et al. A new machine learning-based collaborative DDoS mitigation mechanism in software-defined network, in 2018 14th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). pp. 1–8. (2018).
Zaidoun, A. S. & Lachiri, Z. A hybrid deep learning model for multi-class DDoS detection in SDN networks. Ann. Telecommun. 80, 459–472 (2025).
Ahuja, N., Singal, G. & Mukhopadhyay, D. DDOS attack SDN dataset. Mendeley Data, Version 1 (2020).
Shao, Y. et al. An improved BGE-Adam optimization algorithm based on entropy weighting and adaptive gradient strategy. Symmetry 16 (5), 623 (2024).
Polat, H., Polat, O. & Cetin, A. Detecting DDoS attacks in software-defined networks through feature selection methods and machine learning models. Sustainability 12 (3), 1035 (2020).
Banitalebi Dehkordi, A., Soltanaghaei, M. & Boroujeni, F. Z. The DDoS attacks detection through machine learning and statistical methods in SDN. J. Supercomputing. 77, 2383–2415 (2021).
de Rios, M. Detection of reduction-of-quality DDoS attacks using fuzzy logic and machine learning algorithms. Comput. Netw. 186, 107792 (2021).
Ma, Z. & Li, B. A DDoS attack detection method based on SVM and K-nearest neighbour in SDN environment. Int. J. Comput. Sci. Eng. 23 (3), 224–234 (2020).
Perez-Diaz, J. A. et al. A flexible SDN-based architecture for identifying and mitigating low-rate DDoS attacks using machine learning. IEEE Access. 8, 155859–155872 (2020).
Rasikha, V. & Marikkannu, P. An ensemble deep learning-based cyber attack detection system using optimization strategy. Knowl. Based Syst. 301, 112211 (2024).
Chen, Z. et al. An effective method for anomaly detection in industrial internet of things using XGBoost and LSTM. Sci. Rep. 14 (1), 23969 (2024).
Liang, X. & Znati, T. A long short-term memory enabled framework for DDoS detection, in 2019 IEEE global communications conference (GLOBECOM). pp. 1–6. (2019).
Hendrycks, D. & Gimpel, K. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016).


Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

These authors contributed equally to this work: Hossein Monshizadeh Naeen and Malihe Ghadamyari.

Authors and Affiliations

Department of Computer Engineering, Ne.C., Islamic Azad University, Neyshabur, Iran
Hossein Monshizadeh Naeen, Malihe Ghadamyari & Mobin Barmar
Department of Computer Engineering, Ma.C., Islamic Azad University, Mashhad, Iran
Hossein Monshizadeh Naeen


Contributions

M.G. conceived the initial idea and implemented the algorithms, including benchmarks. H.M. improved the code and contributed to the conceptual development and manuscript writing. M.B. drafted the manuscript and prepared some figures.

Corresponding author

Correspondence to Hossein Monshizadeh Naeen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Naeen, H.M., Ghadamyari, M. & Barmar, M. Enhancing SDN security with deep learning and F-balanced cross-entropy for DDoS detection. Sci Rep 15, 33419 (2025). https://doi.org/10.1038/s41598-025-18826-w


Received: 24 December 2024
Accepted: 03 September 2025
Published: 29 September 2025
DOI: https://doi.org/10.1038/s41598-025-18826-w
