Abstract
Denial of service (DoS) attacks are a persistent and serious threat to cybersecurity. A DoS attack typically forces the victim’s machine to reset or exhausts its resources. Distributed DoS (DDoS) attacks are among the most prominent attacks in today’s cyber world and have become a major threat on the Internet. Federated Learning (FL) is gaining attention in cybersecurity because it allows deep learning (DL) models to be trained collaboratively on distributed threat data without sharing raw data. However, the use of FL in this field is still in its early stages, with several key aspects yet to be explored. DL and machine learning (ML) enhance the ability to detect malicious traffic, but both generally depend on sufficient training samples to achieve accuracy and efficacy. This manuscript proposes a Mitigating DDoS attack in Federated Learning Using Deep Reinforcement Learning and Frilled Lizard Optimization (MDDoSFL-DRLFLO) technique. The proposed MDDoSFL-DRLFLO model presents a collaborative FL approach to recognize and classify DDoS attacks quickly. First, z-score normalization standardizes the input data to zero mean and unit standard deviation. Next, feature selection is performed using an improved bacterial foraging optimization algorithm (IBFOA). The Dueling Double Deep Q-Network (D3QN) model is then employed for classification. Finally, the hyperparameters of the D3QN model are tuned using the frilled lizard optimization (FLO) approach. Extensive experiments on the CICIDS 2017 and ToN-IoT datasets are carried out to validate the performance of the MDDoSFL-DRLFLO method. The MDDoSFL-DRLFLO technique achieved a superior accuracy of 99.52% compared with existing techniques under diverse evaluation measures.
Introduction
DoS attacks mainly aim to disrupt computer systems in networks. They are launched illegitimately from a single machine against a target server1. DDoS attacks are a more advanced form in which multiple sources launch the attack. Because they originate from a distributed environment, their impact is amplified compared with single-source DoS attacks2. A DDoS attack is a deliberate attack, typically mounted in a distributed computing environment, that targets a server or website to degrade its performance. To achieve this, an attacker employs several techniques3. Among cloud threats, DDoS is the most notorious because it causes poor user experience, severe economic losses, and service disruption, making operations unsustainable for businesses that rely on cloud computing4. In a DDoS attack, the attacker intends to deplete system capacity, compute resources, or infrastructure by overwhelming it with requests. This saturates the cloud services and makes it difficult to respond to legitimate users. FL addresses data privacy concerns efficiently5. FL builds a global model by letting multiple participants train the same model on their local data in a distributed manner, exchanging model parameters or intermediate results rather than data samples6. However, it carries hidden security and privacy risks of its own, such as poisoning and inference attacks. In an inference attack, intermediate parameters such as the model loss and gradients are exposed to outside parties, whether they are sent without encryption or intercepted maliciously. Still, better solutions for such threats exist at present7.
Prior investigations have explored FL applications in cybersecurity for intrusion detection. The FL mechanism requires a representative test set on the server side to control the training procedure8. Owing to their excellent classification abilities, ML methods, and DL in particular, have recently been widely employed for cyberattack detection9. DL methods can efficiently learn the signatures of many kinds of cyberattacks. Furthermore, they can identify new threats that were never seen during training. Consequently, besides detecting many known threats efficiently, DL-based approaches can be used to detect novel cyberattacks without prior knowledge of their signatures10. Reinforcement learning (RL) is also attracting researchers as ML methods advance quickly. It is application-oriented and well suited to dynamic environments. RL methods are endorsed by high-tech companies, particularly for massive datasets of sequential cyber threats. The primary aim of RL is to maximize reward by learning sequences of actions in response to a dynamic environment. For example, a deep RL (DRL) approach that is independent of the initialization of the attack variables has been presented to solve the optimal DoS attack scheduling problem in an iterative, model-free manner.
This manuscript proposes a Mitigating DDoS attack in Federated Learning Using Deep Reinforcement Learning and Frilled Lizard Optimization (MDDoSFL-DRLFLO) technique. The proposed MDDoSFL-DRLFLO model presents a collaborative FL approach to recognize and classify DDoS attacks quickly. First, z-score normalization standardizes the input data to zero mean and unit standard deviation. Next, feature selection is performed using an improved bacterial foraging optimization algorithm (IBFOA). The Dueling Double Deep Q-Network (D3QN) model is then employed for classification. Finally, the hyperparameters of the D3QN model are tuned using the frilled lizard optimization (FLO) approach. Extensive experiments on the CICIDS 2017 and ToN-IoT datasets are carried out to validate the performance of the MDDoSFL-DRLFLO method. The key contributions of the MDDoSFL-DRLFLO method are listed below.
- The MDDoSFL-DRLFLO model applies z-score normalization to standardize the dataset, ensuring that all features are scaled comparably. This improves the model’s stability and performance by preventing any single feature from dominating the learning process, which in turn improves the accuracy and reliability of the classification results.
- The MDDoSFL-DRLFLO approach uses the IBFOA technique to choose the most relevant features. By optimizing the feature space, it reduces dimensionality, improving both model efficiency and computational speed. This results in higher accuracy and a more focused feature representation for classification tasks.
- The MDDoSFL-DRLFLO methodology integrates D3QN, an RL technique, to improve classification performance. By evaluating both state values and action advantages, D3QN enhances decision-making and handles complex classification tasks effectively. This strengthens the model’s capability to differentiate between classes with higher accuracy and robustness.
- The MDDoSFL-DRLFLO model employs FLO for hyperparameter tuning, optimizing key parameters to improve performance. FLO searches for the optimal set of hyperparameters, which improves convergence during training and yields more accurate and reliable predictions.
- The MDDoSFL-DRLFLO model combines IBFOA for feature selection, D3QN for classification, and FLO for hyperparameter tuning within a unified framework. This integration handles complex classification tasks more effectively by jointly optimizing feature selection, classification strategy, and model parameters. The approach delivers enhanced accuracy and robustness compared to conventional methods, presenting a comprehensive solution to challenging classification problems.
Literature review
Majeed et al.11 present Decentralized Autonomous Organization with FL (DAO-FL), a smart contract-based framework. DAO-FL introduces DAO Membership Tokens (DAOMTs) as a governance instrument within a DAO. The framework integrates a Validation-DAO for Decentralized Input Verification (DIV) of the FL process, guaranteeing transparent and dependable local model (LM) validation. In addition, DAO-FL uses a multi-signature scheme supported by an Orchestrator-DAO to perform decentralized global model (GM) updates and, consequently, Decentralized Output Verification (DOV) of the FL process. In12, a grey-box approach for attacking DRL-based trading agents is described; the attack is possible merely by trading in the same stock market, with no further access to the trading agent. The adversary agent uses a hybrid DNN as its policy, comprising convolutional and fully connected (FC) layers. Saveetha et al.13 propose a distributed ML mechanism called Federated ML (FML). The framework embeds federated ML within a blockchain (BC) architecture to detect DDoS attacks. In the integrated framework, miners train the blocks and participate in ML training. A dynamic reputation-based mechanism that balances exploitation and exploration is presented for optimal miner selection. In14, ML methods are explored to improve ransomware protection for IoT devices running PureOS. The study also develops an ML-based ransomware detection framework that combines the ElasticNet and XGBoost models in a hybrid approach. The design and implementation of the framework are based on an evaluation of several existing ML methods. Khoa et al.15 present an effective collaborative cyberattack detection method to safeguard BC systems. A BC system is first set up in a laboratory to construct a new dataset containing attack and normal traffic data.
The study then presents a real-world collaborative learning method that allows nodes in the network to share learned knowledge without revealing their confidential data, considerably improving performance for the entire system. Aljrees et al.16 develop a paradigm that combines the quondam signature algorithm (QSA), FL, and effective data encryption to counteract randomly targeted attacks in IoT systems. The FL integration promotes continual learning, strengthens security and data privacy, and offers a strong defence against evolving attacks. The QSA is presented as a solution that is proficient at mitigating vulnerabilities related to man-in-the-middle (MiTM) attacks. Ismail et al.17 present a decentralized BC-based method (BC-DRLzSC) that incorporates smart contracts to handle authentication, participant registration, and access control for system resources. BC-DRLzSC adopts a zero-trust (ZT) architecture to reinforce SC security, using an advocate to verify the trustworthiness of every entity before maintaining or granting access to system resources. DRL is applied to develop a proactive threat detection method.
In18, a Non-Fungible Token (NFT)-aided Knowledge Distillation (KD) framework is presented. It leverages the data-security properties of BC to resolve the fundamental robustness flaws of a naive KD framework, and it reduces the overall processing time by inserting a local BC layer. The main contributions are a modified hybrid KD-NFT framework that lowers latency and improves effectiveness, a hybrid multilayer BC architecture linking public and private chains, and extensive experiments that validate the presented methods. Najar and Naik19 present an efficient DDoS detection system using Balanced Random Sampling (BRS) and Convolutional Neural Networks (CNNs) in SDN environments. It applies mitigation techniques such as filtering, rate limiting, and IP tables to block spoofed IPs, together with a monitoring system to safeguard legitimate traffic. Afraji, Lloret, and Peñalver20 discuss DDoS attack types, data quality challenges, and the need for explainable AI in cybersecurity. They highlight limitations in current detection systems and propose improvements through better datasets and more accurate algorithms. Najar21 proposes an innovative feature selection approach to build a robust intrusion detection system capable of detecting and classifying recent common DDoS attack types. Dilshad, Syed, and Rehman22 propose a DDoS detection approach that uses the Gini index for feature selection and FL for decentralized, privacy-preserving model training, improving accuracy and scalability. Najar et al.23 introduce a DDoS detection framework that applies random sampling to address class imbalance, and feature selection techniques such as low-information-gain filtering, quasi-constant elimination, and PCA for feature reduction. Popli et al.24 propose an FL framework for underwater networks that allows drones to collaboratively train an intrusion detection model while preserving data security.
The framework improves cyber intrusion detection, including the detection of zero-day attacks, by using local insights without exposing sensitive data. Najar and Manohar Naik25 propose the ‘AE-CIAM’ hybrid framework for detecting and classifying low-rate DDoS attacks in cloud environments. It integrates an autoencoder (AE) with an attention module for effective feature extraction and dimensionality reduction, followed by a CNN Inception model with attention for attack classification, offering high performance at low computational cost. Nomikos et al.26 present the DTE architecture for anomaly detection, encompassing data management, AI/ML training, and classification. An FL methodology is introduced to improve privacy, and the anomaly detection and model training of the AI/ML-aided DTE are discussed. Najar and Manohar Naik27 apply several ML methods to detect DDoS attack packets and types, namely random forest (RF), multilayer perceptron, support vector machine (SVM), and k-nearest neighbour, achieving promising results. Shaha et al.28 tackle DRDoS attack detection using four ML methods, namely SVM, decision tree (DT), RF, and logistic regression (LR), together with principal component analysis (PCA), achieving outstanding results. Saheed, Omole, and Sabit29 developed a robust threat intrusion detection model for the Industrial IoT (IIoT) using a genetic algorithm with an attention mechanism and modified adaptive moment estimation-optimized long short-term memory (GA-mADAM-LSTM) technique.
Saheed and Misra30 proposed a robust intrusion detection model for IoT networks using an ensemble learning technique optimized by the Gray wolf optimizer (GWO) model. The model integrates information gain (IG) and PCA for dimensionality reduction and combines multiple classifiers, namely DT, RF, kNN, and multilayer perceptron (MLP), through a voting-based ensemble framework to improve detection accuracy and reliability on real-world datasets. Saheed and Misra31 developed a privacy-preserving and explainable anomaly detection framework for cyber-physical systems in IoT (CPS-IoT) using a deep neural network (DNN) integrated with SHapley Additive exPlanations (SHAP) models. Saheed et al.32 presented a lightweight and efficient intrusion detection model for CPS using a transfer learning (TL) approach based on ResNet50 with one-dimensional CNNs (ResNet50-CNN1D). The adaptive gradient (Adagrad) optimizer is also utilized for loss minimization and is designed to improve detection accuracy and reduce training time. Oscar et al.33 proposed a bimodal access gateway authentication mechanism using an artificial neural network (ANN) approach. The model secures distributed cloud IoT data warehouse servers interconnected via a 5G radio frequency access network. Saheed and Chukwuere34 introduced an explainable and efficient intrusion detection model for the Internet of Vehicles (IoV) using an eXplainable Artificial Intelligence (XAI) Ensemble TL framework. The model integrates SHAP for interpretability, hybrid bidirectional LSTM with AE (BiLAE) for dimensionality reduction, and barnacle mating optimizer (BMO) for optimizing CNN-TL architectures such as ResNet, Inception and MobileNet to accurately detect zero-day botnet attacks with minimal reliance on labelled data. 
Manoj, Makkithaya, and Narendra35 proposed the AgriFLChain method by utilizing decentralized identifiers (DIDs), verifiable credentials (VCs), smart contracts, and differential privacy (DP) techniques to ensure authentication, data provenance, transparency, and traceability while maintaining predictive accuracy for crop yield forecasting using models like ResNet-16, CNN-LSTM, and others. Saheed, Misra, and Chockalingam36 developed a lightweight and efficient intrusion detection system for industrial control systems (ICS) using AE-based feature dimensionality reduction incorporated with deep CNN (DCNN) and LSTM models. Domb et al.37 presented a comprehensive security protocol for 5G networks, named Secure 5G Access Control (S5GAC), which integrates multi-factor authentication (MFA), contextual access control, and continuous monitoring. Using dynamic risk assessment based on network traffic anomalies, user activity, and device behaviour, S5GAC improves authentication accuracy and mitigates cyber threats. Saini et al.38 developed a hybrid model that enhances data privacy and trust in cloud environments by incorporating k-anonymity for user privacy, an optimized firefly algorithm (FA) for trust generation, and a time-aware modified best fit decreasing (T-MBFD) approach for efficient resource allocation.
Despite these important advances, existing studies still show several limitations and research gaps. Many frameworks depend heavily on large labelled datasets, which are often unavailable or costly to obtain, restricting the efficiency of ML and FL models in real-world scenarios. Various models lack interpretability, which reduces user trust and hinders practical deployment. Blockchain enhances transparency and security but faces scalability and latency challenges. Various IDS models concentrate on specific attack types such as DDoS, ransomware, or botnets, and lack adaptability across diverse threats. Privacy preservation models are not consistently integrated with anomaly detection, creating vulnerabilities in sensitive CPS, IoT, and ICS environments. Furthermore, managing decentralized authentication and identity verification mechanisms remains challenging. Existing resource allocation algorithms tend to overlook dynamic network conditions, resulting in suboptimal performance. The limited integration of adaptive risk assessment and continuous monitoring reduces responsiveness to evolving cyber threats. Lastly, many solutions do not adequately balance computational cost and detection accuracy, affecting the feasibility of deployment on resource-constrained devices. These gaps emphasize the need for more scalable, interpretable, and privacy-preserving frameworks with robust adaptive capabilities.
Proposed methodology
This manuscript develops the MDDoSFL-DRLFLO technique. The proposed model presents a collaborative FL approach for fast recognition and classification of DDoS attacks. To this end, the model comprises four phases: data normalization, feature selection, classification, and hyperparameter tuning. Figure 1 depicts the complete workflow of the MDDoSFL-DRLFLO technique.
Z-score normalization
First, z-score normalization is applied to standardize the input data39. It is chosen because it makes features comparable in scale by transforming the data to have a mean of zero and a standard deviation of one. This is crucial in models that are sensitive to the scale of input features, such as gradient-based methods or distance-based algorithms. By normalizing the data, the model ensures that no single feature dominates the learning process, resulting in more stable and reliable training. Furthermore, z-score normalization improves convergence speed during optimization, because features with vastly different ranges are brought onto a common scale. Z-score normalization is also less sensitive to outliers than techniques such as Min-Max scaling, making it a more robust choice for many real-world datasets. This approach improves the model’s performance and supports better generalization across various data types.
Standardization, also known as \(\:Z\)-score normalization, is a common pre-processing step for data used in DL models. Standardizing means transforming the data to have zero mean and unit variance. This step matters for DL models because it places the features on a comparable scale, which improves model convergence and training stability. The mean and standard deviation are computed for every feature in the dataset, and each feature value \(\:x\) is standardized using:
\[Z=\frac{x-\mu}{\sigma} \tag{1}\]
where \(\:Z\) is the standardized value, \(\:x\) is the input value, \(\:\mu\:\) is the mean of the feature over the dataset, and \(\:\sigma\:\) is its standard deviation. Standardizing data is an essential step for DL models: it puts every feature on a consistent scale, which promotes efficient NN training.
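As a minimal sketch of this step, the following NumPy snippet standardizes each feature column. The function name `zscore_normalize`, the toy matrix, and the `eps` guard for constant columns are illustrative assumptions and are not taken from the manuscript.

```python
import numpy as np

def zscore_normalize(X, eps=1e-12):
    """Standardize each column of X to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)        # per-feature mean
    sigma = X.std(axis=0)      # per-feature standard deviation
    # Assumption of this sketch: constant columns (sigma == 0) are only centred.
    sigma = np.where(sigma < eps, 1.0, sigma)
    return (X - mu) / sigma, mu, sigma

# Toy data: two features on very different scales plus one constant feature.
X = np.array([[1.0, 200.0, 5.0],
              [2.0, 400.0, 5.0],
              [3.0, 600.0, 5.0]])
Z, mu, sigma = zscore_normalize(X)
```

In a federated setting, `mu` and `sigma` would typically be computed on training data only and reused for test data, which is why the sketch returns them alongside the standardized matrix.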
IBFOA-based feature selection
Next, the feature selection process is implemented using IBFOA40. This model is chosen for its capability to explore and optimize the feature space by imitating the foraging behaviour of bacteria. Unlike conventional methods such as Recursive Feature Elimination (RFE), IBFOA selects the most relevant features and eliminates redundant or irrelevant ones, improving model performance. The global search capability of the algorithm helps it avoid local optima and explore the feature space more thoroughly. Furthermore, the ability of the IBFOA model to handle high-dimensional datasets with intrinsic relationships among features makes it more versatile and efficient than simpler feature selection techniques. The result is reduced dimensionality, faster training, and better generalization, making IBFOA a strong choice for complex classification tasks. This optimization also improves model interpretability by focusing on key features. Figure 2 illustrates the IBFOA method.
The BFOA is a nature-inspired, population-based stochastic optimization algorithm modelled on the foraging behaviour of Escherichia coli (E. coli) bacteria. It operates through four phases: chemotaxis, swarming, reproduction, and elimination-dispersal, enabling robust search and optimization in complex environments.
Position representation
Each bacterium’s position in a \(\:D\)-dimensional search space at chemotactic step \(\:t\), reproduction step \(\:r\), and elimination-dispersal step \(\:e\) is defined as:
1. Chemotaxis
Chemotaxis allows bacteria to move in the search space by alternating between tumbles and swims, simulating a response to chemical gradients. The movement direction after a tumble is determined using a unit vector:
The new position after movement is calculated as:
Where \(\:{\varDelta\:}_{n}\) is a random direction vector, and \(\:C\left(i\right)\) is the chemotactic step size (run length unit), which controls movement extent.
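In the BFOA literature the tumble is usually written as a move of length \(C(i)\) along the normalized random vector \(\varDelta_{n}/\sqrt{\varDelta_{n}^{T}\varDelta_{n}}\). The sketch below assumes that standard form; the function name `tumble_step` and the uniform sampling range for the direction vector are illustrative choices.

```python
import numpy as np

def tumble_step(theta, step_size, rng):
    """Move one bacterium by `step_size` along a random unit direction."""
    delta = rng.uniform(-1.0, 1.0, size=theta.shape)  # random direction vector
    unit = delta / np.sqrt(delta @ delta)             # normalize to unit length
    return theta + step_size * unit, unit

rng = np.random.default_rng(0)
theta = np.zeros(4)                                   # current position, D = 4
new_theta, direction = tumble_step(theta, step_size=0.1, rng=rng)
```

Because the direction is normalized, the distance travelled in one chemotactic step equals the step size regardless of the dimension of the search space.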
2. Swarming
Swarming models the collective movement toward nutrient-rich areas, affecting the overall objective function with a swarming cost component \(\:{J}_{CC}\):
This term biases the movement toward favourable regions while maintaining diversity.
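A common form of the swarming cost, assumed here, sums an attraction term \(-d_{attract}\exp(-\omega_{attract}\,\lVert\theta-\theta^{i}\rVert^{2})\) and a repulsion term \(h_{repellant}\exp(-\omega_{repellant}\,\lVert\theta-\theta^{i}\rVert^{2})\) over all bacteria \(i\). The default parameter values in the sketch are illustrative only.

```python
import numpy as np

def swarming_cost(theta, population, d_attract=0.1, w_attract=0.2,
                  h_repellant=0.1, w_repellant=10.0):
    """Cell-to-cell attraction/repulsion cost of position `theta`."""
    sq_dist = np.sum((population - theta) ** 2, axis=1)   # squared distances
    attract = -d_attract * np.exp(-w_attract * sq_dist)   # pulls cells together
    repel = h_repellant * np.exp(-w_repellant * sq_dist)  # keeps cells apart
    return float(np.sum(attract + repel))

population = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
j_cc = swarming_cost(population[0], population)
```

With equal attraction depth and repulsion height, the two terms cancel at zero distance, so a bacterium does not attract or repel itself; at moderate distances the wider attraction term dominates and lowers the cost.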
3. Reproduction
After a fixed number of chemotactic steps \(\:{N}_{c}\), bacteria are sorted by health, measured as the accumulated cost:
The least healthy half of the population is removed, and the healthier half reproduces, maintaining population size while promoting fitter individuals.
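The reproduction step can be sketched as follows, assuming a minimization problem (lower accumulated cost means healthier) and an even population size. The function name `reproduce` and the toy values are hypothetical.

```python
import numpy as np

def reproduce(population, health):
    """Keep the healthier half (lowest accumulated cost) and duplicate it."""
    order = np.argsort(health)                         # ascending cost
    survivors = population[order[: len(population) // 2]]
    return np.concatenate([survivors, survivors.copy()], axis=0)

population = np.array([[0.0], [1.0], [2.0], [3.0]])
health = np.array([4.0, 1.0, 3.0, 2.0])                # accumulated cost per bacterium
next_population = reproduce(population, health)
```

Each survivor splits into two bacteria at the same position, so the population size is unchanged while the search concentrates around the better regions.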
4. Elimination-dispersal
With a low probability, bacteria are randomly eliminated and dispersed to new locations to simulate sudden environmental changes. This phase improves exploration and helps avoid local optima.
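A minimal sketch of elimination-dispersal is given below. It assumes box bounds on the search space and a per-bacterium dispersal probability `p_ed`; both names, and the uniform re-sampling, are assumptions of the sketch.

```python
import numpy as np

def eliminate_disperse(population, lower, upper, p_ed, rng):
    """Relocate each bacterium to a random position with probability p_ed."""
    population = population.copy()
    mask = rng.random(len(population)) < p_ed          # who gets dispersed
    n_moved = int(mask.sum())
    population[mask] = rng.uniform(lower, upper,
                                   size=(n_moved, population.shape[1]))
    return population, mask

rng = np.random.default_rng(7)
population = np.zeros((6, 2))
dispersed, moved = eliminate_disperse(population, lower=-5.0, upper=5.0,
                                      p_ed=0.25, rng=rng)
```

Bacteria that are not selected keep their positions, so a small `p_ed` perturbs the population only occasionally while still giving it a chance to leave a local optimum.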
In the BFOA equations, \(\:{\theta\:}^{n}\left(t,\:r,\:e\right)\) denotes the position of the \(\:n\)-th bacterium at chemotactic step \(\:t\), reproduction step \(\:r\), and elimination-dispersal step \(\:e\) in a \(\:D\)-dimensional space. \(\:{\varDelta\:}_{n}\left(t,\:r,\:e\right)\) is a random direction vector used for movement, while \(\:C\left(i\right)\) specifies the chemotactic step size controlling the movement length. The swarming cost parameters \(\:{d}_{attract}\), \(\:{\omega\:}_{attract}\), \(\:{h}_{repellant}\), \(\:{\omega\:}_{repellant}\) regulate attraction and repulsion effects between bacteria. \(\:J\left(i,\:t,\:r,\:e\right)\) is the objective function value for bacterium \(\:i\), and \(\:{J}_{health}^{i}\) sums its cost over chemotactic steps, reflecting its overall fitness. \(\:S\) is the population size, and \(\:{N}_{c}\) is the number of chemotactic steps before reproduction.
An improved BFOA (IBFOA) is used, which modifies the chemotaxis process with ideas from the sine cosine algorithm (SCA). The main change is that the step size \(\:\left(C\left(i\right)\right)\) of the bacteria is no longer constant during chemotaxis. The traditional BFOA uses a static step size for bacterial movements in chemotaxis, whereas IBFOA uses an adaptive step size described by Eq. (6):
Here, \(A\) denotes a predefined constant, and \(\:i\) is the index of the chemotactic step. This modification produces a gradual increase in the step size as the chemotaxis procedure progresses. Furthermore, IBFOA uses an SCA-inspired method to generate the random movement direction of the bacteria in chemotaxis. Equation (7), derived from SCA, generates the random direction vector for every movement of the bacteria:
Here, \(\:{X}_{n}^{d}\left(t,\:r,e\right)\) and \(\:{P}^{d}\left(t,r,e\right)\) represent the current position of the \(\:n\)th bacterium and the best solution found so far, respectively, in the \(\:d\)th dimension at the \(\:t\)th chemotactic step, \(\:r\)th reproduction step, and \(\:e\)th elimination-dispersal step. \(\:{r}_{2},{r}_{3}\) and \(\:{r}_{4}\) are random numbers in \(\:(0\), \(\:2\pi\:)\), and \(\left|\cdot\right|\) denotes the absolute value, as in SCA. The position of the \(\:n\)th bacterium after a tumble is given by Eq. (8):
If the bacterium encounters a higher nutrient concentration after a tumble (Eq. (9)), it keeps swimming in the same direction for up to a predefined number of swim steps, as long as the concentration keeps improving. If the concentration decreases instead, the bacterium tumbles again to find a new direction. The rest of IBFOA follows the standard BFOA procedure, with reproduction and elimination-dispersal. This enhanced model aims to achieve better exploration and exploitation than the original BFOA.
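For orientation, an SCA-style direction vector can be sketched as below. The sketch follows the original sine cosine algorithm, in which each component is \(r_{1}\sin(r_{2})\left|r_{3}P^{d}-X^{d}\right|\) or \(r_{1}\cos(r_{2})\left|r_{3}P^{d}-X^{d}\right|\) depending on whether \(r_{4}<0.5\). The amplitude `r1` and the sampling ranges used for `r3` and `r4` are those of standard SCA and are assumptions here; they may differ from the ranges used in IBFOA.

```python
import numpy as np

def sca_direction(x, best, r1, rng):
    """Per-dimension SCA-style move direction towards/around `best`."""
    r2 = rng.uniform(0.0, 2.0 * np.pi, size=x.shape)  # phase of sine/cosine
    r3 = rng.uniform(0.0, 2.0, size=x.shape)          # weight on the best position
    r4 = rng.random(x.shape)                          # sine/cosine switch
    gap = np.abs(r3 * best - x)                       # absolute distance term
    return np.where(r4 < 0.5, r1 * np.sin(r2) * gap, r1 * np.cos(r2) * gap)

rng = np.random.default_rng(1)
x = np.ones(3)        # current bacterium position
best = np.ones(3)     # best position found so far
d = sca_direction(x, best, r1=2.0, rng=rng)
```

The resulting vector would play the role of the random direction in the tumble, so that moves are scaled by the distance to the best solution rather than drawn blindly.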
The fitness function (FF) used in the IBFOA method is designed to balance the number of selected features in each solution (to be minimized) against the classification accuracy obtained with those features (to be maximized). Equation (10) gives the FF used to assess solutions.
Here, \(\:{\gamma\:}_{R}\left(D\right)\) is the classification error rate of a given classifier, \(\:\left|R\right|\:\)is the cardinality of the selected subset, \(\:\left|C\right|\) is the total number of features in the dataset, and \(\:\alpha\:\) and \(\:\beta\:\) are two parameters that weight classification quality and subset length, with \(\alpha \in [0,1]\) and \(\:\beta\:=1-\alpha\:\).
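A fitness of this kind is commonly written as \(\alpha\,\gamma_{R}(D)+\beta\,\left|R\right|/\left|C\right|\), to be minimized. The sketch below assumes that form; the function name, the binary feature mask, and the value of `alpha` are illustrative.

```python
import numpy as np

def feature_subset_fitness(error_rate, mask, alpha=0.99):
    """Weighted sum of classifier error and fraction of selected features."""
    mask = np.asarray(mask, dtype=bool)
    beta = 1.0 - alpha
    selected_ratio = mask.sum() / mask.size            # |R| / |C|
    return alpha * error_rate + beta * selected_ratio

# 2 of 10 features selected, classifier error rate of 5 %.
fit = feature_subset_fitness(0.05, [1, 0, 1, 0, 0, 0, 0, 0, 0, 0], alpha=0.9)
```

Here the fitness is 0.9 × 0.05 + 0.1 × 0.2 = 0.065. A value of `alpha` close to 1 makes the error rate dominate, so a smaller subset is preferred only when it costs little accuracy.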
D3QN-based classification process
In addition, the D3QN model is employed for the classification process41. This technique is chosen because it handles complex decision-making tasks by estimating both state values and action advantages. This dual estimate helps the model distinguish important actions from less significant ones, improving classification accuracy. Unlike conventional classification models, D3QN integrates RL, enabling it to learn effective strategies in dynamic environments where label distributions might vary. D3QN reduces the overestimation bias typically seen in standard Q-learning by decoupling action selection from action evaluation (the double Q-learning component), while its separate value and advantage streams (the dueling component) lead to more stable predictions. Moreover, D3QN copes well with high-dimensional state spaces and discrete action sets, making it suitable for challenging classification problems. The method’s ability to keep improving through experience makes it a more adaptive and robust solution than conventional supervised learning techniques. Figure 3 represents the structure of the D3QN model.
DNNs have driven major advances in several artificial intelligence (AI) fields, mainly because of their strong data learning and approximation abilities. Combining RL with DNNs yields DRL. The Deep Q-Network (DQN) model is an RL approach that combines DNNs with Q‐learning. Its main goal is to solve complex RL problems in high‐dimensional state spaces.
Here, \(\:s\) denotes the state, \(\:a\) the action, \(\:r\) the reward function, \(\:\gamma\:\) the discount factor, and \(\:(s,\:a)\) the state-action pair under the policy \(\:\pi\:.\) At each time step of the DQN method, the agent observes the current state and selects the action with the highest Q-value. At the same time, the agent stores the observed transition in the experience replay buffer for later learning; Eq. (11) gives the Q-value computation. However, DQN may overestimate the Q-value. Double DQN (DDQN) was therefore introduced: it selects the action with the evaluation network and uses the target network to compute the value of that action, and the target is computed as:
Here, \(\:\theta\:\) and \(\:{\theta\:}^{{\prime\:}}\) denote the parameters of the target network and the evaluation network. However, the state-value and action-value functions in standard DQN have limited independence, which affects performance. To address this, the dueling DQN architecture was introduced. It modifies the DQN structure to separate the action-value function into two components, a state value and an advantage, as shown in Eq. (13). The state-value function \(\:V\) estimates the value of a given state, and the advantage function \(\:A\) estimates the relative benefit of each action in that state. In this way, the agent can assess the benefit of each action directly from the advantage and state-value functions, which improves performance when the state space is vast.
Here, \(\:\theta\:\) denotes the parameters of the shared part of the network, while \(\:\alpha\:\) and \(\:\beta\:\) denote the parameters specific to the respective streams. However, Eq. (13) cannot uniquely determine the relative contributions of the advantage \(\:A\) and the state value \(\:V\) to the output. To address this, the form used in practice is:
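A common way to write this practical form is the mean-centred aggregation used in dueling networks, \(Q(s,a)=V(s)+\big(A(s,a)-\frac{1}{|\mathcal{A}|}\sum_{a'}A(s,a')\big)\). The sketch below assumes that form and works on plain arrays rather than network outputs; the function name and the toy numbers are illustrative.

```python
import numpy as np

def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with mean-centred advantages."""
    advantages = np.asarray(advantages, dtype=float)
    centred = advantages - advantages.mean(axis=-1, keepdims=True)
    return state_value + centred

# One state with three candidate actions (e.g. three traffic classes).
q = dueling_q_values(2.0, [1.0, 3.0, -1.0])
```

For this toy input the Q-values are [2.0, 4.0, 0.0]. Subtracting the mean advantage fixes the average Q-value to V(s), which removes the ambiguity between the two streams.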
D3QN combines the benefits of DDQN and dueling DQN. The network structure of \(\:D3QN\) is the same as that of dueling DQN; the two differ only in how the target value is computed in the iterative update. In dueling DQN, the target value is calculated as:
Eq. (15) shows that the maximization operation can cause overestimation. To resolve this issue, D3QN uses the evaluation network to select the action with the highest action value in the next state. It then computes the target value of that action with the target network, as shown in Eq. (16):
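In the usual Double DQN formulation the target is \(y=r+\gamma\,Q_{target}\big(s',\arg\max_{a}Q_{eval}(s',a)\big)\), compared with \(y=r+\gamma\max_{a}Q_{target}(s',a)\) for the plain target. The sketch below contrasts the two under that assumption. The function names, the terminal flag `done`, and the toy Q-values are not from the manuscript.

```python
import numpy as np

def dqn_target(reward, q_target_next, gamma, done):
    """Plain target: maximize directly over the target network's values."""
    return reward + gamma * (1.0 - done) * np.max(q_target_next)

def double_dqn_target(reward, q_eval_next, q_target_next, gamma, done):
    """Double target: evaluation net picks the action, target net scores it."""
    best_action = int(np.argmax(q_eval_next))
    return reward + gamma * (1.0 - done) * q_target_next[best_action]

q_eval_next = np.array([1.0, 2.5, 2.0])     # evaluation-network Q(s', .)
q_target_next = np.array([1.2, 1.8, 3.0])   # target-network Q(s', .)
y_dqn = dqn_target(1.0, q_target_next, gamma=0.9, done=0.0)
y_d3qn = double_dqn_target(1.0, q_eval_next, q_target_next, gamma=0.9, done=0.0)
```

With these numbers the plain target is 1 + 0.9 × 3.0 = 3.7, while the double target is 1 + 0.9 × 1.8 = 2.62, illustrating how decoupling selection from evaluation tempers an optimistic estimate.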
FLO-based hyperparameter tuning process
Finally, the hyperparameter tuning of the D3QN model is performed using the FLO approach42. This method is chosen for its capability to search the hyperparameter space efficiently using a bio-inspired algorithm that replicates the distinctive behaviours of frilled lizards. Unlike grid or random search, which can be computationally expensive and inefficient, FLO adapts to the problem, balancing exploration and exploitation of the hyperparameter space. The ability of the algorithm to escape local optima and explore diverse regions of the search space makes it more robust than conventional optimization methods such as genetic algorithms or particle swarm optimization. The flexibility of the FLO model allows it to fine-tune complex models with greater precision, improving convergence rates and model performance. Moreover, the adaptive behaviour of the FLO method allows it to perform well across various tasks without extensive parameter adjustments, making it a more scalable and effective solution for hyperparameter tuning. This results in optimized model performance with fewer computational resources. Figure 4 demonstrates the FLO methodology.
This section describes the source of inspiration behind the design of FLO. The corresponding behavioural stages are then modelled mathematically so that they can be applied to optimization problems.
Initialization
The FLO model is a metaheuristic approach whose population members are frilled lizards. FLO finds near-optimal solutions to optimization problems by exploiting the search abilities of its members within the problem-solving space. Each frilled lizard determines values for the decision variables based on its position in that space. Each frilled lizard therefore represents a candidate solution, which is expressed mathematically as a vector. Together, the frilled lizards form the FLO population, which is defined mathematically as a matrix in Eq. (17). The initial positions of the frilled lizards in the problem-solving space are set by random initialization using Eq. (18):
Here, \(\:X\) denotes the FLO population matrix, \(\:{X}_{i}\) is the \(\:i\)th frilled lizard (candidate solution), \(\:{x}_{i,d}\) is its \(\:d\)th dimension in the search space (decision variable), \(\:N\) is the number of frilled lizards, \(\:m\) is the number of decision variables, \(\:r\) is a random number drawn from the range \(\:\left[\text{0,1}\right]\), and \(\:u{b}_{d}\) and \(\:l{b}_{d}\) are the upper and lower bounds on the \(\:d\)th decision variable, respectively. Since each frilled lizard represents a candidate solution, the objective function can be evaluated for each one. The resulting set of objective function values is represented by the vector given in Eq. (19):
Here, \(\:F\) is the vector of computed objective function values, and \(\:{F}_{i}\) is the objective function value obtained for the \(\:i\)th frilled lizard. These values provide a suitable criterion for assessing the quality of the population members (that is, the candidate solutions).
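The initialization and evaluation steps can be sketched as below, assuming the usual uniform rule \(x_{i,d}=lb_d+r\cdot(ub_d-lb_d)\) implied by the symbol definitions. The helper names, the bounds, and the `sphere` objective are hypothetical stand-ins chosen for illustration; they are not the paper's fitness function.

```python
import numpy as np

def initialise_population(n_lizards, lower, upper, rng):
    """Draw each coordinate as lb_d + r * (ub_d - lb_d), with r uniform on [0, 1)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    r = rng.random((n_lizards, lower.size))
    return lower + r * (upper - lower)

def evaluate_population(population, objective):
    """Return the vector F whose i-th entry is the objective value of row i."""
    return np.array([objective(x) for x in population])

def sphere(x):
    # Hypothetical objective to be minimised, used only as a placeholder.
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
lb, ub = [-5.0, 0.0, 1.0], [5.0, 1.0, 10.0]
X = initialise_population(6, lb, ub, rng)  # N = 6 lizards, m = 3 decision variables
F = evaluate_population(X, sphere)
best = X[np.argmin(F)]  # best candidate under minimisation
```

Each row of `X` is one candidate solution, so `F` has one entry per frilled lizard and the best member is read off with a single `argmin` (or `argmax` for a maximization problem).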
The best objective function value corresponds to the best population member (the best candidate solution), and the worst value corresponds to the worst member (the worst candidate solution). In each FLO iteration, the positions of the frilled lizards are updated within the solution space and the objective function is re-evaluated. The best member (the best candidate solution) is therefore updated in every iteration. When the FLO algorithm terminates, the best candidate solution found over all iterations is taken as the solution to the problem.
In each FLO iteration, the position of every frilled lizard in the problem-solving space is updated in two stages. First, the exploration stage mimics the movement of frilled lizards toward prey during hunting, and is designed to widen the search and discover new candidate solutions. This stage allows the algorithm to explore diverse regions of the solution space and helps locate promising areas. In the subsequent exploitation stage, inspired by the lizard’s retreat to a treetop, the model refines these promising regions to improve solution quality and converge toward the global optimum.
Stage 1: Exploration (hunting tactic).
In this phase, FLO imitates the frilled lizard’s hunting behaviour to enhance global exploration. Each lizard (individual) forms a set of candidate prey based on peers with better objective values:
A new position is generated by moving toward a randomly chosen candidate prey:
If the new position improves the objective function value, it replaces the current one:
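A minimal sketch of one exploration pass, assuming the update \(x_{i,d}^{P1}=x_{i,d}+r\cdot(SP_{i,d}-I\cdot x_{i,d})\) from the original FLO formulation and a minimization objective. The function name, the `sphere` objective, and the population values are hypothetical, and no bound handling is applied in this stage.

```python
import numpy as np

def exploration_step(X, F, objective, rng):
    """One hunting-stage pass over the population (minimisation assumed)."""
    X, F = X.copy(), F.copy()
    n, m = X.shape
    for i in range(n):
        # Candidate prey: peers with a strictly better objective value.
        candidates = np.flatnonzero(F < F[i])
        if candidates.size == 0:
            continue  # the current best lizard has no prey to chase
        prey = X[rng.choice(candidates)]
        r = rng.random(m)
        I = rng.integers(1, 3, size=m)  # each entry is 1 or 2
        # Assumed update form: x + r * (SP - I * x).
        x_new = X[i] + r * (prey - I * X[i])
        f_new = objective(x_new)
        # Greedy acceptance: keep the move only if it improves the objective.
        if f_new < F[i]:
            X[i], F[i] = x_new, f_new
    return X, F

def sphere(x):
    # Hypothetical objective to be minimised.
    return float(np.sum(x ** 2))

rng = np.random.default_rng(1)
X0 = rng.uniform(-5.0, 5.0, size=(8, 3))
F0 = np.array([sphere(x) for x in X0])
X1, F1 = exploration_step(X0, F0, sphere, rng)
```

Because a move is kept only when it improves the objective, no individual ever gets worse during this stage. For a fitness that should be maximized, the same code can be used by minimizing its negative.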
Stage 2: Exploitation (climbing the tree).
This phase imitates the lizard’s behaviour of climbing to a tree for safety, enhancing local search and exploitation. A new position is generated as:
The updated position is accepted if it improves the objective function:
In the above equations, \(\:{x}_{i,d}\) denotes the \(\:d\)-th dimension of the \(\:i\)-th frilled lizard (individual), \(\:N\) is the population size, and \(\:m\) is the number of decision variables. \(\:{F}_{i}\), \(\:{F}_{i}^{P1}\), and \(\:{\:F}_{i}^{P2}\) denote the objective function values at the current position, after the exploration stage, and after the exploitation stage, respectively. \(\:S{P}_{i,d}\) is the \(\:d\)-th dimension of the selected prey for the \(\:i\)-th lizard, while \(\:r\) is a random number uniformly drawn from [0, 1]. \(\:I\) is a randomly chosen integer from the set {1, 2}, and \(\:t\) is the current iteration, with \(\:u{b}_{d}\) and \(\:l{b}_{d}\) being the upper and lower bounds of the \(\:d\)-th variable, respectively. \(\:{CP}_{i}\) specifies the candidate prey set for individual \(\:i\), containing all individuals with better objective values.
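The exploitation stage can be sketched in the same way, assuming the update \(x_{i,d}^{P2}=x_{i,d}+(1-2r)\cdot\frac{ub_d-lb_d}{t}\) from the original FLO formulation and a minimization objective. The function name, the `sphere` objective, and the clipping to the bounds are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def exploitation_step(X, F, objective, lower, upper, t, rng):
    """One tree-climbing pass at iteration t >= 1 (minimisation assumed)."""
    X, F = X.copy(), F.copy()
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    for i in range(X.shape[0]):
        r = rng.random(X.shape[1])
        # Assumed update form: x + (1 - 2r) * (ub - lb) / t.
        # The step is symmetric around x and shrinks as t grows.
        x_new = X[i] + (1.0 - 2.0 * r) * (upper - lower) / t
        x_new = np.clip(x_new, lower, upper)  # bound handling is an added assumption
        f_new = objective(x_new)
        # Greedy acceptance: keep the move only if it improves the objective.
        if f_new < F[i]:
            X[i], F[i] = x_new, f_new
    return X, F

def sphere(x):
    # Hypothetical objective to be minimised.
    return float(np.sum(x ** 2))

rng = np.random.default_rng(2)
lb, ub = np.full(3, -5.0), np.full(3, 5.0)
X0 = rng.uniform(lb, ub, size=(8, 3))
F0 = np.array([sphere(x) for x in X0])
t = 10
X1, F1 = exploitation_step(X0, F0, sphere, lb, ub, t, rng)
```

Dividing the step by the iteration counter means early iterations take large local steps and later ones take small steps, which gradually turns the search from coarse to fine.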
The choice of fitness function strongly affects FLO performance. The hyperparameter selection procedure uses a solution encoding to evaluate the quality of each candidate solution. In this work, FLO uses precision, \(\:TP/\left(TP+FP\right)\), as the main criterion in designing the fitness function (FF).
Here, TP denotes the number of true positives and FP the number of false positives.
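As a small worked example, precision can be computed from the two counts as follows. The function name, the zero-division convention, and the counts are assumptions for illustration only.

```python
def precision_fitness(true_positives, false_positives):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return 0.0
    return true_positives / predicted_positive

# Hypothetical counts: 90 correct detections and 10 false alarms.
score = precision_fitness(90, 10)
```

With these counts the precision is 0.9. An optimizer written for minimization can use the negated score, so that higher precision corresponds to a lower objective value.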
Experimental analysis
The experimental results of the MDDoSFL-DRLFLO technique are validated on two datasets: the CICIDS 2017 dataset43 and the ToN-IoT dataset44. The CICIDS 2017 dataset contains 13,000 samples under six class labels, as illustrated in Table 1. Of its 78 features, 35 are selected.
Figure 5 shows the confusion matrices produced by the MDDoSFL-DRLFLO approach on the CICIDS 2017 dataset under different epoch counts. The results indicate that the MDDoSFL-DRLFLO method identifies all class labels accurately.
Table 2 and Fig. 6 report the classification results of the MDDoSFL-DRLFLO approach on the CICIDS 2017 dataset under different epoch counts. The values show that the MDDoSFL-DRLFLO approach classifies samples from all classes effectively. At 500 epochs, the MDDoSFL-DRLFLO approach attains an average \(\:acc{u}_{y}\) of 96.99%, \(\:pre{c}_{n}\) of 90.90%, \(\:rec{a}_{l}\) of 90.49%, \(\:{F}_{measure}\) of 90.68%, \(\:{AUC}_{score}\:\)of 94.33%, and Kappa of 94.39%. At 1000 epochs, it attains an average \(\:acc{u}_{y}\) of 98.92%, \(\:pre{c}_{n}\) of 96.69%, \(\:rec{a}_{l}\) of 96.66%, \(\:{F}_{measure}\) of 96.67%, \(\:{AUC}_{score}\:\)of 98.00%, and Kappa of 98.07%. At 1500 epochs, it attains an average \(\:acc{u}_{y}\) of 98.31%, \(\:pre{c}_{n}\) of 94.87%, \(\:rec{a}_{l}\) of 94.74%, \(\:{F}_{measure}\) of 94.80%, \(\:{AUC}_{score}\:\)of 96.86%, and Kappa of 96.92%. At 2500 epochs, it attains an average \(\:acc{u}_{y}\) of 97.99%, \(\:pre{c}_{n}\) of 93.90%, \(\:rec{a}_{l}\) of 93.71%, \(\:{F}_{measure}\) of 93.79%, \(\:{AUC}_{score}\:\)of 96.24%, and Kappa of 96.31%. Finally, at 3000 epochs, it attains an average \(\:acc{u}_{y}\) of 98.24%, \(\:pre{c}_{n}\) of 94.62%, \(\:rec{a}_{l}\) of 94.49%, \(\:{F}_{measure}\) of 94.55%, \(\:{AUC}_{score}\:\)of 96.71%, and Kappa of 96.77%.
Figure 7 presents the classification results of the MDDoSFL-DRLFLO approach on the CICIDS 2017 dataset. Figure 7a shows the accuracy analysis of the MDDoSFL-DRLFLO approach. The accuracy values rise as the epoch count increases. Moreover, validation accuracy that exceeds training accuracy suggests that the MDDoSFL-DRLFLO methodology generalizes well to the test data. Figure 7b shows the loss analysis of the MDDoSFL-DRLFLO approach. The training and validation loss values remain close to each other, which again indicates that the approach learns effectively. Figure 7c shows the PR curve, demonstrating superior performance across all class labels. Figure 7d shows the ROC analysis, with high ROC values for the different classes.
Table 3 and Fig. 8 present a comparative analysis of the MDDoSFL-DRLFLO method on the CICIDS 2017 dataset against existing techniques45,46. The results show that the RF, GoogLeNet, CNN, CNN + WDLSTM, SMOTE-RF, and SPE-GoogLeNet techniques report lower performance. The CNN-WDLSTM approach comes closer, with \(\:pre{c}_{n}\), \(\:rec{a}_{l}\), \(\:acc{u}_{y}\), and \(\:{F}_{measure}\) of 95.37%, 95.51%, 97.70%, and 95.50%, respectively. The MDDoSFL-DRLFLO method reports the highest performance, with \(\:pre{c}_{n}\), \(\:rec{a}_{l}\), \(\:acc{u}_{y}\), and \(\:{F}_{measure}\) of 96.69%, 96.66%, 98.92%, and 96.67%, respectively.
Table 4 and Fig. 9 illustrate the computational time (CT) analysis of the MDDoSFL-DRLFLO approach against existing models. The MDDoSFL-DRLFLO approach is the fastest, with a CT of 8.28 s, well ahead of models such as CNN + WDLSTM with 20.98 s and SMOTE-RF with 21.58 s. GoogLeNet is the next fastest at 14.19 s, while CNN-WDLSTM and SPE-GoogLeNet record 20.66 s and 21.36 s, respectively. CNN alone takes 18.68 s, and RF has a CT of 18.39 s. These values highlight the variation in computational efficiency across models. The MDDoSFL-DRLFLO method offers a substantial improvement in speed, supporting faster decision-making in intrusion detection systems while maintaining high classification performance.
Table 5 and Fig. 10 present the ablation study of the MDDoSFL-DRLFLO methodology. The ablation study on the CICIDS 2017 dataset evaluates the different components by comparing key performance metrics, including \(\:acc{u}_{y}\), \(\:pre{c}_{n}\), \(\:rec{a}_{l}\), and \(\:{F}_{measure}\). The MDDoSFL-DRLFLO methodology demonstrates superior performance across all metrics, achieving \(\:acc{u}_{y}\) of 98.92%, \(\:pre{c}_{n}\) of 96.69%, \(\:rec{a}_{l}\) of 96.66%, and \(\:{F}_{measure}\) of 96.67%. In comparison, D3QN records \(\:acc{u}_{y}\) of 98.38%, \(\:pre{c}_{n}\) of 96.09%, \(\:rec{a}_{l}\) of 96.03%, and \(\:{F}_{measure}\) of 95.88%, while FLO achieves \(\:acc{u}_{y}\) of 97.88%, \(\:pre{c}_{n}\) of 95.42%, \(\:rec{a}_{l}\) of 95.49%, and \(\:{F}_{measure}\) of 95.21%. The IBFOA method exhibits relatively lower metrics, with \(\:acc{u}_{y}\) of 97.16%, \(\:pre{c}_{n}\) of 94.89%, \(\:rec{a}_{l}\) of 94.97%, and \(\:{F}_{measure}\) of 94.46%. These results indicate that the full MDDoSFL-DRLFLO model is the most effective configuration for intrusion detection, delivering consistently higher classification performance.
The ToN-IoT dataset contains 119,957 instances under nine classes, as depicted in Table 6. The total number of features is 75, but only 40 have been selected.
Figure 11 presents the classification analysis of the MDDoSFL-DRLFLO approach on the ToN-IoT dataset. Figure 11a and b show the confusion matrices, which indicate accurate recognition and classification of all classes under the 70% training phase (TRPH) and the 30% testing phase (TSPH). Figure 11c shows the PR analysis, indicating high performance across all class labels. Figure 11d shows the ROC analysis, with high ROC values for the different classes.
Table 7 reports the intrusion detection results of the MDDoSFL-DRLFLO approach on the ToN-IoT dataset under 70%TRPH and 30%TSPH.
Figure 12 shows the average results of the MDDoSFL-DRLFLO approach on the ToN-IoT dataset under 70%TRPH and 30%TSPH. The outcomes indicate that the MDDoSFL-DRLFLO methodology classifies the samples accurately. With 70%TRPH, the MDDoSFL-DRLFLO methodology attains an average \(\:acc{u}_{y}\), \(\:pre{c}_{n}\), \(\:rec{a}_{l}\), \(\:{F1}_{score}\), \(\:MCC\), and Kappa of 99.52%, 92.78%, 90.43%, 91.33%, 91.09%, and 91.15%, respectively. With 30%TSPH, the MDDoSFL-DRLFLO technique attains an average \(\:acc{u}_{y}\), \(\:pre{c}_{n}\), \(\:rec{a}_{l}\), \(\:{F1}_{score}\), \(\:MCC\), and Kappa of 99.51%, 94.88%, 91.03%, 92.41%, 92.29%, and 92.35%, respectively.
Figure 13 illustrates the training (TRA) and validation (VAL) \(\:acc{u}_{y}\) curves of the MDDoSFL-DRLFLO method on the ToN-IoT dataset. The \(\:acc{u}_{y}\:\)values are computed over 0–25 epochs. Both the TRA and VAL \(\:acc{u}_{y}\) values show rising trends, indicating that the MDDoSFL-DRLFLO approach improves over successive iterations. In addition, the TRA and VAL \(\:acc{u}_{y}\) values remain close across the epochs, which indicates limited overfitting and suggests consistent predictions on unseen samples.
Figure 14 displays the TRA loss (TRALOS) and VAL loss (VALLOS) of the MDDoSFL-DRLFLO methodology on the ToN-IoT dataset. The loss values are computed over 0–25 epochs. Both TRALOS and VALLOS decrease, indicating that the MDDoSFL-DRLFLO model balances data fitting against generalization. The steady reduction in loss shows that the model progressively refines its predictions.
Table 8 and Fig. 15 compare the MDDoSFL-DRLFLO approach with existing methods on the ToN-IoT dataset. The results show that the TP2SF, Densely-Resnet, DFF, DenseNet, XGBoost, and Naïve Bayes (NB) methodologies report lower performance. The ANN method comes closer, with \(\:pre{c}_{n}\), \(\:rec{a}_{l}\), \(\:acc{u}_{y}\), and \(\:{F1}_{score}\) of 91.67%, 90.03%, 99.44%, and 90.58%, respectively. The MDDoSFL-DRLFLO model exhibits the best performance, with \(\:pre{c}_{n}\), \(\:rec{a}_{l}\), \(\:acc{u}_{y}\), and \(\:{F1}_{score}\) of 92.78%, 90.43%, 99.52%, and 91.33%, respectively.
Table 9 and Fig. 16 show the CT assessment of the MDDoSFL-DRLFLO model against existing techniques. The MDDoSFL-DRLFLO model completes the task in 6.60 s, faster than all other models. The NB model follows with a time of 9.72 s, while the DenseNet and TP2SF methods record CTs of 10.10 s and 10.28 s, respectively. The XGBoost model takes 10.82 s, the ANN approach 11.30 s, and DFF 11.41 s. The slowest model, Densely-Resnet, takes 11.87 s. These results emphasize the computational efficiency of the MDDoSFL-DRLFLO technique, whose shorter response times suit real-time intrusion detection in IoT environments.
Table 10 and Fig. 17 present the ablation study of the MDDoSFL-DRLFLO methodology on the ToN-IoT dataset. The MDDoSFL-DRLFLO methodology achieves the best classification performance, with an \(\:acc{u}_{y}\) of 99.52%, a \(\:pre{c}_{n}\) of 92.78%, a \(\:rec{a}_{l}\) of 90.43%, and an \(\:{F1}_{score}\) of 91.33%. By comparison, the D3QN model records an \(\:acc{u}_{y}\) of 98.90%, \(\:pre{c}_{n}\) of 92.08%, \(\:rec{a}_{l}\) of 89.79%, and \(\:{F1}_{score}\) of 90.70%. FLO follows with \(\:acc{u}_{y}\) of 98.40%, \(\:pre{c}_{n}\) of 91.58%, \(\:rec{a}_{l}\) of 89.14%, and \(\:{F1}_{score}\) of 90.01%. The IBFOA technique trails with an \(\:acc{u}_{y}\) of 97.77%, \(\:pre{c}_{n}\) of 90.78%, \(\:rec{a}_{l}\) of 88.37%, and \(\:{F1}_{score}\) of 89.39%. These results show that the full MDDoSFL-DRLFLO model outperforms its individual components across all key evaluation metrics. This consistent improvement supports the robustness and effectiveness of the MDDoSFL-DRLFLO model in handling complex intrusion patterns within the ToN-IoT dataset, making it well suited to real-time security applications.
Conclusion
This manuscript develops the MDDoSFL-DRLFLO technique. The proposed MDDoSFL-DRLFLO model presents a collaborative FL approach to recognize and classify DDoS attacks quickly. First, z-score normalization is applied to standardize the input data. Next, feature selection is performed using IBFOA. The D3QN approach is then employed for classification. Finally, the hyperparameters of the D3QN model are tuned using the FLO approach. A wide range of experiments is conducted to assess the performance of the MDDoSFL-DRLFLO method on the CICIDS 2017 and ToN-IoT datasets. The performance validation of the MDDoSFL-DRLFLO technique showed a superior accuracy of 99.52% over existing techniques under diverse evaluation measures. The limitations of the MDDoSFL-DRLFLO technique include its reliance on specific datasets, which may affect the model’s generalizability to other datasets with different characteristics. Furthermore, noise or data quality issues could affect the model’s performance, as it may not fully handle noisy or incomplete data. The study also does not account for real-time adaptation in dynamic environments, where the data distribution might change over time. Another limitation is the computational complexity, which could become prohibitive for massive datasets or more complex scenarios. Future work could explore incorporating transfer learning (TL) models to improve generalization across diverse domains. Additionally, investigating the application of the model in real-time systems could help adapt it for dynamic, evolving environments. Further algorithm optimization to reduce computational costs while maintaining accuracy could make it more scalable for large-scale implementations.
Data availability
The data that support the findings of this study are openly available in the Kaggle repository at https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset and https://www.kaggle.com/datasets/dhoogla/cictoniot. The implementation code supporting the findings of this study is available at the following link: https://github.com/zgenz1537-code/DDoSCyberAttackFL.
References
Zhang, H., Hao, J. & Li, X. A method for deploying distributed denial of service attack defense strategies on edge servers using reinforcement learning. IEEE Access. 8, 78482–78491 (2020).
Li, K., Zhou, H., Tu, Z., Wang, W. & Zhang, H. Distributed network intrusion detection system in satellite-terrestrial integrated networks using federated learning. IEEE Access. 8, 214852–214865 (2020).
Li, B. et al. DeepFed: federated deep learning for intrusion detection in industrial cyber–physical systems. IEEE Trans. Industr. Inf. 17 (8), 5615–5624 (2020).
Cao, D., Chang, S., Lin, Z., Liu, G. & Sun, D. Understanding distributed poisoning attack in federated learning. In 2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS). 233–239. (IEEE, 2019).
Rahman, S. A., Tout, H., Talhi, C. & Mourad, A. Internet of things intrusion detection: Centralized, on-device, or federated learning? IEEE Netw. 34 (6), 310–317 (2020).
Doku, R., Rawat, D. B. & Liu, C. Towards federated learning approach to determine data relevance in big data. In 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI). 184–192. (IEEE, 2019).
Lu, Y., Huang, X., Zhang, K., Maharjan, S. & Zhang, Y. Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles. IEEE Trans. Veh. Technol. 69 (4), 4298–4311 (2020).
Jere, M. S., Farnan, T. & Koushanfar, F. A taxonomy of attacks on federated learning. IEEE Secur. Priv. 19 (2), 20–28 (2020).
Phan, T. V. & Park, M. Efficient distributed denial-of-service attack defense in SDN-based cloud. IEEE Access. 7, 18701–18714 (2019).
Praveen, S. P. et al. Investigating the efficacy of deep reinforcement learning models in detecting and mitigating cyber-attacks: A novel approach. J. Cybersecur. Inform. Manag. 14(1), 96–113 (2024).
Majeed, U., Hassan, S. S., Han, Z. & Hong, C. S. DAO-FL: Enabling Decentralized Input and Output Verification in Federated Learning with Decentralized Autonomous Organizations. (Authorea Preprints, 2024).
Ataiefard, F. & Hemmati, H. Gray-box adversarial attack of deep reinforcement learning-based trading agents. In 2023 International Conference on Machine Learning and Applications (ICMLA). 675–682. (IEEE, 2023).
Saveetha, D., Maragatham, G., Ponnusamy, V. & Zdravković, N. An Integrated Federated Machine Learning and Blockchain Framework with Optimal Miner Selection for Reliable DDOS Attack Detection. (IEEE Access, 2024).
Ahanger, T. A., Tariq, U., Dahan, F., Chaudhry, S. A. & Malik, Y. Securing IoT devices running PureOS from ransomware attacks: Leveraging hybrid machine learning techniques. Mathematics 11(11), 2481 (2023).
Khoa, T. V. et al. Real-time cyberattack detection with collaborative learning for blockchain networks. In 2024 IEEE Wireless Communications and Networking Conference (WCNC). 1–6. (IEEE, 2024).
Aljrees, T., Kumar, A., Singh, K. U. & Singh, T. Enhancing IoT security through a green and sustainable federated learning platform: Leveraging efficient encryption and the Quondam signature algorithm. Sensors 23(19), 8090 (2023).
Ismail, S., Moudoud, H., Dawoud, D. & Reza, H. Blockchain-Based Zero Trust Supply Chain Security Integrated with Deep Reinforcement Learning. (2024).
Wang, N. et al. Hybrid KD-NFT: A multi-layered NFT assisted robust knowledge distillation framework for internet of things. J. Inf. Secur. Appl. 75, 103483 (2023).
Najar, A. A. & Naik, S. M. Cyber-secure SDN: A CNN-based approach for efficient detection and mitigation of DDoS attacks. Comput. Secur. 139, 103716 (2024).
Afraji, D. M. A. A., Lloret, J. & Peñalver, L. Deep learning-driven defense strategies for mitigating DDoS attacks in cloud computing environments. Cyber Secur. Appl., 3, 100085 (2025).
Najar, A. A. A Robust DDoS intrusion detection system using convolutional neural network. Comput. Electr. Eng. 117, 109277 (2024).
Dilshad, M., Syed, M. H. & Rehman, S. Efficient distributed denial of service attack detection in internet of vehicles using Gini index feature selection and federated learning. Future Internet 17(1), 9 (2025).
Najar, A. A., Sugali, M. N., Lone, F. R. & Nazir, A. A novel CNN-based approach for detection and classification of DDoS attacks. Concurrency Computation: Pract. Experience. 36 (19), e8157 (2024).
Popli, M. S., Singh, R. P., Popli, N. K. & Mamun, M. A federated learning framework for enhanced data security and cyber intrusion detection in distributed network of underwater drones. (IEEE Access, 2025).
Najar, A. A. & Manohar Naik, S. AE-CIAM: A hybrid AI-enabled framework for low-rate DDoS attack detection in cloud computing. Cluster Comput. 28(2), 103 (2025).
Nomikos, N. et al. A Distributed trustable framework for AI-aided anomaly detection. Electronics 14(3), 410 (2025).
Najar, A. A. & Manohar Naik, S. DDoS attack detection using MLP and random forest algorithms. Int. J. Inform. Technol. 14 (5), 2317–2327 (2022).
Shaha, P. et al. A prevalent model-based on machine learning for identifying DRDoS attacks through features optimization technique. Stat. Optim. Inform. Comput. 13 (1), 409–433 (2025).
Saheed, Y. K., Omole, A. I. & Sabit, M. O. GA-mADAM-IIoT: A new lightweight threats detection in the industrial IoT via genetic algorithm with attention mechanism and LSTM on multivariate time series sensor data. Sens. Int. 6, 100297 (2025).
Saheed, Y. K. & Misra, S. A voting Gray Wolf optimizer-based ensemble learning models for intrusion detection in the internet of things. Int. J. Inf. Secur. 23 (3), 1557–1581 (2024).
Saheed, Y. K. & Misra, S. CPS-IoT-PPDNN: A new explainable privacy preserving DNN for resilient anomaly detection in cyber-physical systems-enabled IoT networks. Chaos Solit. Fract. 191, 115939 (2025).
Saheed, Y. K., Abdulganiyu, O. H., Majikumna, K. U., Mustapha, M. & Workneh, A. D. ResNet50-1D-CNN: A new lightweight resNet50-One-dimensional convolution neural network transfer learning-based approach for improved intrusion detection in cyber-physical systems. Int. J. Crit. Infrastruct. Protect. 45, 100674 (2024).
Oscar, F. et al. Cybersecurity approaches to IoT platforms in E-Healthcare systems: Artificial intelligence application. In AI-Driven Healthcare Cybersecurity and Privacy. 89–124. (IGI Global Scientific Publishing, 2025).
Saheed, Y. K. & Chukwuere, J. E. Xaiensembletl-iov: A new explainable artificial intelligence ensemble transfer learning for zero-day botnet attack detection in the internet of vehicles. Results Eng. 24, 103171 (2024).
Manoj, T., Makkithaya, K. & Narendra, V. G. A Blockchain-assisted trusted federated learning for smart agriculture. SN Comput. Sci. 6(3), 221 (2025).
Saheed, Y. K., Misra, S. & Chockalingam, S. Autoencoder via DCNN and LSTM models for intrusion detection in industrial control systems of critical infrastructures. In 2023 IEEE/ACM 4th International Workshop on Engineering and Cybersecurity of Critical Systems (EnCyCriS). 9–16. (IEEE, 2023).
Domb, M., Balaji, C. G., Menaka, S., Gayathri, A. & Joshi, S. Securing 5G networks by mitigating cybersecurity risks for transformative applications. Int. J. Interact. Mob. Technologies, 19(11), 227–255 (2025).
Saini, H. et al. A hybrid machine learning model with self-improved optimization algorithm for trust and privacy preservation in cloud environment. J. Cloud Comput. 13(1), 157 (2024).
Ikram, A. & Aslam, W. Enhancing intercropping yield predictability using optimally driven feedback neural network and loss functions. (IEEE Access, 2024).
Zaini, F. A., Sulaima, M. F., Razak, I. A. W. A., Othman, M. L. & Mokhlis, H. Improved bacterial foraging optimization algorithm with machine learning-driven short-term electricity load forecasting: A case study in Peninsular Malaysia. Algorithms 17(11), 510 (2024).
Feng, X., Han, J., Zhang, R., Xu, S. & Xia, H. Security defense strategy algorithm for Internet of Things based on deep reinforcement learning. High-Confid. Comput. 4(1), 100167 (2024).
Falahah, I. A. et al. Frilled lizard optimization: A novel bio-inspired optimizer for solving engineering applications. Comput. Mater. Contin. 79 (3) (2024).
https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset
Yang, H., Xu, J., Xiao, Y. & Hu, L. SPE-ACGAN: A resampling approach for class imbalance problem in network intrusion detection systems. Electronics 12(15), 3323 (2023).
Tareq, I., Elbagoury, B. M., El-Regaily, S. & El-Horbaty, E. S. M. Analysis of ton-iot, unw-nb15, and edge-iiot datasets using dl in cybersecurity for iot. Appl. Sci. 12(19), 9572 (2022).
Acknowledgements
This Project was funded by KAU Endowment (WAQF) at King Abdulaziz University, Jeddah, Saudi Arabia, under grant no. (WAQF: 188-145- 2024). The authors, therefore, acknowledge with thanks WAQF and the Deanship of Scientific Research (DSR) for technical and financial support.
Author information
Contributions
Mahmoud Ragab: Conceptualization, methodology development, formal analysis, project administrator. Bandar Alghamdi: Formal analysis, methodology development, validation, writing. Almuhannad S. Alorfi: Formal analysis, investigation, validation, visualization. Louai A. Maghrabi: Methodology development, experiment, formal analysis, writing. Diaa Hamed: Methodology, investigation, review and editing. Amal Alharbi: Conceptualization, software, experiment, investigation, visualization. Abdullah AL-Malaise AL-Ghamdi: Supervision, discussion, review and editing. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Maghrabi, L.A., Ragab, M., Alghamdi, B. et al. Mitigating distributed denial of service-based cyberattack in federated computing framework using deep reinforcement learning with frilled lizard algorithm. Sci Rep 15, 40197 (2025). https://doi.org/10.1038/s41598-025-23899-8
DOI: https://doi.org/10.1038/s41598-025-23899-8