Abstract
Cybersecurity is a vital part of modern technological development, and strengthened cybersecurity ensures that data remain safe. Cyberattacks such as computer malware, denial-of-service (DoS) attacks, and unauthorized access cause severe damage and economic losses in large-scale systems. Cybersecurity aims to reduce the risk of malicious attacks on computers, software, and networks. Novel techniques based on emerging artificial intelligence (AI) have been adopted to achieve this. Machine learning (ML) is usually regarded as a sub-branch of AI that is closely linked to data mining, computational statistics, and data science (DS) and mainly concentrates on enabling computers to learn from data. Federated learning (FL) is an ML paradigm that helps address issues such as security, data privacy, and access rights. This study proposes a Self-Attention Mechanism-Driven Federated Learning for Secure Cyberattack Detection with Crocodile Optimization Algorithm (SAMFL-SCDCOA) methodology. The main objective of the SAMFL-SCDCOA methodology is to provide an effective method for preventing cyberattacks in real time using FL and advanced optimization algorithms. Initially, Z-score normalization is used to scale and standardize the data to improve analysis consistency and accuracy. Furthermore, the feature selection (FS) process uses the crocodile optimization algorithm (COA). The proposed SAMFL-SCDCOA approach employs the gated recurrent unit with self-attention (GRU-SA) model for cybersecurity classification. Finally, the improved pelican optimization algorithm (IPOA) tunes the hyperparameter values of the GRU-SA model, leading to enhanced classification performance. A wide range of experiments was conducted to validate the performance of the SAMFL-SCDCOA technique on the CICIDS-2017 dataset. The comparison study showed that the SAMFL-SCDCOA technique achieved a superior result of 99.04% over existing models.
Introduction
With increasing dependence on the Internet of Things (IoT) and digitalization, security incidents such as zero-day attacks, malware attacks, DoS, phishing, and social engineering have increased exponentially1. These threats aim to collect consumer data such as passwords or credit card numbers and to distribute the data without the consumer’s authorization. Malware is software that can damage systems and data2. It targets individuals, companies, governments, and organizations operating military or civil infrastructure, all of which risk losing valuable data and reputation. For example, in 2010, fewer than fifty million unique malware executables were known to the security community3. Attacks and cybercrime may cause destructive economic losses for both organizations and people. Cybersecurity is a group of processes and technologies that safeguard networks, programs, computers, and data from threats, unauthorized access, or damage4. Furthermore, cybersecurity protects computer systems and their data from harmful disruption or damage. Cybersecurity is undergoing considerable changes in its technology and processes against the background of DS, with computing driving these changes; ML, a vital part of AI, can play a crucial role in extracting insights from data5.
ML can substantially change the cybersecurity environment, and DS has led to a novel scientific paradigm. Advanced ML is receiving growing attention because it can extract complex and useful models from massive datasets held at a central location6. Even though various companies use highly trained ML approaches to deliver insights at low computational cost, many confidentiality and privacy concerns remain unaddressed. FL is regarded as a promising concept for ML. FL is an ML paradigm that helps address concerns such as security, access rights, and data privacy7. Conceptually, FL allows diverse devices to learn a shared ML model collaboratively without sharing their data with a centralized hub. Compared with centralized learning, in which each device must upload its data to a server for storage and training, FL can mitigate confidentiality problems, decrease latency, and increase reliability and scalability8. These benefits arise because the learning and computing tasks are distributed across many devices with edge-computing capability, with a coordinating server responsible only for broadcasting and model aggregation. FL is used in various areas such as wireless and mobile networking, vehicle communication, and healthcare9. Several FL-based applications have recently been proposed for autonomous driving, healthcare, keyword spotting, and mobile applications. FL has been developed into a scalable production framework for autonomous learning and ML across various devices and domains10. The key role of FL is to safeguard privacy and promote efficiency when such data are analyzed. Figure 1 depicts the structure of FL.
Figure 1. Architecture of FL.
This study proposes a Self-Attention Mechanism-Driven Federated Learning for Secure Cyberattack Detection with Crocodile Optimization Algorithm (SAMFL-SCDCOA) methodology. The main objective of the SAMFL-SCDCOA methodology is to provide an effective method for preventing cyberattacks in real time using FL and advanced optimization algorithms. Initially, Z-score normalization is used to scale and standardize the data to improve analysis consistency and accuracy. Furthermore, the feature selection (FS) process uses the crocodile optimization algorithm (COA). The proposed SAMFL-SCDCOA approach employs the gated recurrent unit with self-attention (GRU-SA) model for cybersecurity classification. Finally, the improved pelican optimization algorithm (IPOA) tunes the hyperparameter values of the GRU-SA model, leading to enhanced classification performance. A wide range of experiments was conducted to validate the performance of the SAMFL-SCDCOA technique on the CICIDS-2017 dataset. The key contributions of the SAMFL-SCDCOA technique are listed below.
- The SAMFL-SCDCOA model utilizes Z-score normalization to pre-process the data, ensuring standardized input features. This approach improves the accuracy and consistency of the data by removing biases. It assists in improving the overall performance of the cybersecurity classification task.
- The SAMFL-SCDCOA approach employs the COA technique for feature selection, concentrating on the most relevant features. Eliminating irrelevant or redundant data enhances the model’s efficiency, significantly improving its accuracy and processing speed.
- The SAMFL-SCDCOA methodology implements the GRU-SA model for cybersecurity classification, incorporating GRU with self-attention mechanisms. This incorporation allows the model to effectively capture long-term dependencies and highlight critical features, enhancing its capability to classify cybersecurity threats more accurately and efficiently.
- The SAMFL-SCDCOA method utilizes the IPOA technique to fine-tune the hyperparameters of the GRU-SA model. This optimization confirms that the model operates at peak efficiency, improving its classification accuracy. By adjusting key parameters, IPOA enables the model to attain superior performance in cybersecurity tasks.
- The SAMFL-SCDCOA model integrates COA for feature selection and IPOA for hyperparameter tuning within a GRU-SA framework, giving a novel approach to improving cybersecurity classification. This integration ensures the selection of the most relevant features while optimizing the model’s parameters for optimum efficiency. The novelty is that COA and IPOA can be used to improve the model’s capability to handle complex cybersecurity tasks effectively.
Review of literature
Awan et al.11 proposed an innovative FL framework, SecEdge, to improve real-world cybersecurity in mobile IoT settings. The SecEdge framework combines transformer-based techniques for handling long-range dependencies and graph neural networks (GNN) for relational data modelling with FL to reduce latency and safeguard data privacy. Folino et al.12 proposed a scalable vertical FL (SVFL) framework intended to handle classification tasks in cybersecurity. SVFL integrates vertical FL (VFL) with a scalable computing architecture, allowing effective analysis of large-scale and sensitive cybersecurity data while preserving data privacy. The framework is adaptable to multiple use cases, extendable to growing data volumes, and robust in adversarial and dynamic settings. The VFL model also provides a good trade-off between performance and privacy. Devarajan et al.13 developed an FL method that safeguards data confidentiality while allowing decentralized training across multiple AVs of an advanced object detection model, Yolov7-E. Grad-CAM visualization is applied to explain the FL models, and, for better interpretability, semantic information is combined with object detection. This analysis also improves the security of the FL technique. Bukhari et al.14 examined a deep hybrid learning method in an asynchronous FL framework to enhance cyber-threat recognition while guaranteeing strong data confidentiality. The integration of GRU, LSTM, and CNN models presents an effective solution for rapidly recognizing anomalies in IIoT sensor traffic. The model operates asynchronously, so data remain local, which enhances security and avoids the need for extensive node synchronization. Wang et al.15 developed an IoT device trust management method based on blockchain (BC) and FL, leveraging FL to evaluate device reputation ratings while protecting their confidentiality. Decentralization is achieved by employing BC rather than a central server in FL. Moreover, a weighted aggregation method based on device features is presented to attain a more accurate global model through weighted aggregation of the local models in FL.
In16, an FL-based CTI framework that depends on information fusion (FL-CTIF) is proposed for safeguarding IIoT settings. This work develops an extensive cyberattack database with upgraded FS employing IF to enhance the precision of cyber-threat recognition. Cao et al.17 introduce AP-CFL, a clustered FL model that integrates affinity propagation to dynamically identify the cluster structure of clients without the need to pre-define the number of clusters. AP-CFL uses the mean absolute differences of pairwise cosine similarity to cluster clients effectively according to similarities in their data distributions. Moreover, a robust approach is presented to handle partial client participation using a date and time importance index. Doriguzzi-Corin and Siracusa18 introduced an adaptive FL approach to DDoS attack detection (FLAD). Using a recent database of DDoS attacks, the authors show that FLAD outperforms advanced FL models in terms of convergence accuracy. Kathole et al.19 propose the Solar Energy Prediction in IoT systems based on an optimized Complex-Valued Spatio-Temporal Graph Convolutional Neural Network (SEP-CVSGCNN-IoT) technique, which uses the Data-Adaptive Gaussian Average Filtering (DAGAF) model for data pre-processing, the Nutcracker Optimization (NCO) method for feature selection, and the Dipper Throated Optimization Algorithm (DTOA) to optimize the CVSGCNN weights for accurate solar energy prediction. Selvarajan et al.20 introduce a Cyber-Physical Systems (CPS) framework to reduce intrusions by distinguishing true and false data samples. The framework mitigates data loss using a learning technique with generators and discriminators. Gupta et al.21 propose a distributed IoT attack detection system utilizing CNN and FL, optimized by Siberian Tiger Optimization (STO), achieving high accuracy, recall, and precision despite latency challenges.
Kathole et al.22 present a secure federated cloud storage system utilizing hybrid attribute-based encryption (ABE) and permissioned BC for data confidentiality. It incorporates the Hybrid Mexican Axolotl with the Energy Valley Optimizer (HMO-EVO) technique for optimal encryption key generation and uses FL with the Multi-scale Bi-Long Short-Term Memory and Gated Recurrent Unit (MBiLSTM-GRU) method for accurate disease prediction. Hossain et al.23 propose the Marine Radar Security with Personalized FL-based Intrusion Detection System (MRS-PFIDS) methodology to detect cyberattacks while ensuring data privacy using CNN and TensorFlow Federated in a non-IID data setting. Rabie et al.24 develop a security framework for IoT using decisive red fox (DRF) optimization and descriptive back propagated radial basis function (DBRF) classification to improve intrusion detection accuracy, speed, and error reduction. Ullah et al.25 introduce the SecNet-FL-based Intrusion Detection System (FLIDS), a BC-based FL model for collaborative cyberattack detection in the Internet of Vehicles (IoV), using TOP-K node selection, Synthetic Minority Over-sampling Technique-Edited Nearest Neighbors (SMOTE-ENN), and Transformer networks to improve privacy, security, and accuracy. Prashanth et al.26 use optimal feature selection methods to improve IDS by choosing the best features from raw sensor data for improved classification. Sun et al.27 propose the personalized federated cross-learning framework (pFedCross) for intrusion detection, addressing imbalanced and heterogeneous data distributions through a collaborative cross-aggregation algorithm and a gradient approximation α-fairness approach, enhancing model accuracy and mitigating loss in IoT security. Selvarajan et al.28 propose an improved ML model for attack prediction and classification in SCADA systems, utilizing mean-shift clustering for data grouping, genetically seeded flora optimization for feature selection, and Boltzmann ML for classification.
The existing studies on FL-based security frameworks for IoT and cyberattack detection have various limitations. Many studies depend on centralized approaches or concentrate on specific attack types, restricting their scalability and adaptability across various IoT environments. Additionally, the combination of FL with advanced ML techniques has not been fully optimized for handling imbalanced and non-IID data distributions. There is a lack of comprehensive evaluation of these models in dynamic and adversarial environments. Moreover, the models often do not fully address real-time requirements and latency issues, which are significant for time-sensitive IoT applications. Finally, while SMOTE-ENN improves data balance, its efficiency on large-scale, heterogeneous datasets remains underexplored.
Proposed methodology
This paper develops a novel SAMFL-SCDCOA model. The main objective of the model is to provide an effective method for preventing cyberattacks in real time using FL and advanced optimization models. The SAMFL-SCDCOA technique accomplishes this through four stages: Z-score normalization, a COA-based FS process, attack classification using GRU-SA, and IPOA-based parameter optimization. Figure 2 depicts the workflow of the SAMFL-SCDCOA technique.
Figure 2. Workflow of SAMFL-SCDCOA model.
Data normalization: Z-score
In the first stage, the data are normalized using Z-score normalization, which scales and standardizes the data to improve consistency and accuracy for analysis29. This method is chosen for pre-processing because it standardizes the data, giving each feature a mean of zero and a standard deviation (SD) of one. This ensures that features on different scales do not disproportionately influence the model’s performance. By transforming the data onto a consistent scale, Z-score normalization helps the model converge faster and enhances the stability of optimization algorithms. Unlike min–max scaling, Z-score normalization is less sensitive to outliers, making it a suitable choice for datasets that may contain extreme values. This technique benefits models like GRU-SA, where the input data can have varied distributions. Overall, Z-score normalization is effective in improving model accuracy and consistency.
Z-score normalization, also known as standardization, rescales numerical data so that each feature has a mean of zero and an SD of one; it changes the scale of a feature but not the shape of its distribution. In cybersecurity, it is vital for intrusion detection systems (IDS) and anomaly recognition, as it ensures that features measured on different scales do not excessively influence ML approaches. Transforming raw data onto a uniform scale also helps to reveal outliers, which can indicate potential cyberattacks such as malware activity or network intrusions. This step improves the performance of cybersecurity techniques by enhancing training stability, data consistency, and detection accuracy.
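The standardization step described above can be sketched as follows. This is a minimal illustration, assuming the features are held in a NumPy array with one row per network flow; the function name `zscore_normalize`, the `eps` guard for constant columns, and the toy values are illustrative assumptions rather than part of the original pipeline.

```python
import numpy as np

def zscore_normalize(X, eps=1e-12):
    """Standardize each column of X to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant columns have std == 0; divide by 1 instead so they map to zeros.
    safe_std = np.where(std < eps, 1.0, std)
    return (X - mean) / safe_std, mean, safe_std

# Toy flow records: [duration, packet count, bytes]; the values are made up.
flows = np.array([[0.5, 10, 1200.0],
                  [2.0, 40, 90000.0],
                  [1.0, 25, 4500.0],
                  [0.1, 5, 300.0]])
Z, mu, sigma = zscore_normalize(flows)
print(Z.mean(axis=0), Z.std(axis=0))
```

In practice, the mean and SD would be estimated on the training split only and reused, via the returned `mu` and `sigma`, to transform validation and test data.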
FS Process: COA
Then, the FS process is performed by COA30. This technique is chosen because it can effectively explore complex and high-dimensional search spaces. COA is inspired by the hunting behaviour of crocodiles, which helps it avoid local optima and approach globally optimal solutions. Unlike conventional feature selection techniques, such as recursive feature elimination or random forests (RF), COA offers a more effective and robust search mechanism for large and noisy datasets. It also needs fewer computational resources than exhaustive search methods, making it scalable to large datasets. COA is effective in choosing the most relevant features, enhancing model accuracy, and mitigating overfitting. Its adaptability and efficiency make it a suitable choice for feature selection in complex tasks like cybersecurity classification. Figure 3 illustrates the flow of the COA method.
Figure 3. Overall flow of the COA method.
The COA is a population-based optimization method inspired by the crocodile’s predatory behaviour. It is a nature-inspired meta-heuristic for solving complex optimization problems in many areas. The primary aim of COA is to imitate crocodile hunting, which is known for its precision and efficiency in finding and capturing prey. To represent promising solutions to an optimization problem, the model initializes a set of candidate solutions inside the specified search area; every candidate lies within the permitted range of parameters or configurations. The iterative optimization process then mimics the crocodile’s natural interactions and movements, traversing the search region and approaching an ideal solution by combining exploitation and exploration strategies. At every iteration, COA evaluates the quality of each candidate solution with a pre-determined objective function, which measures how well a solution satisfies the requirements or objectives of the optimization problem. Candidates with higher fitness values are more promising and are thus more likely to be retained for further search. COA’s exploitation and exploration tactics are modelled after the crocodile’s hunting behaviour.
where \(\overrightarrow {ux}\) and \(\overrightarrow {uy}\) are adjustable parameters, \(A_{b}\) is the position of the individual in the region with the highest odour concentration, \(A\) is the current position of the slime mould, \(A_{A}\) and \(A_{B}\) are two individuals selected at random from the slime mould population, and \(M\) is the mass of the slime mould.
The upper limit of \(p\) is given as follows:
Here, \(GF\) denotes the best fitness obtained over the iterations, and \(P(i)\) refers to the fitness of the individuals ranked in the first part of the population. Equation (3) describes the vibration parameter.
The vibration parameter \(uy\) and the fitness weight \(w\) balance exploitation and exploration. SmellIndex provides the sorted sequence of fitness values (descending for maximization problems, ascending for minimization problems), as established by Eq. (6).
Equation (1) indicates that the best location \(A_{b}\) found so far is used to update the position of search agent \(A\). The location of the individual is adjusted by changing the values of \(uy\) and \(w\). Individuals can form search vectors at any angle, meaning that they can explore the solution space in any direction, by selecting two random individuals from the slime mould population. Therefore, the model is able to find the optimum solution. This mirrors the circular way in which slime mould approaches food from all angles, and the concept extends naturally to a hyper-dimensional space. Equation (5) models the positive and negative feedback between the slime mould’s vein width and the food concentration. Meanwhile, the uncertainty of venous contraction is modelled by the term \(r\). The logarithm is used to damp the rate at which the numerical values change, so that the contraction frequency does not vary too much. This imitates how slime mould changes its search pattern according to food quality.
Wrap food: The region weight is larger when more food is nearby and smaller when less food is nearby, in which case attention is directed to other regions. Equation (6), which is derived from this principle, gives the mathematical formulation for updating the location of the slime mould when wrapping food.
Here, \(WA\) and \(XA\) denote the upper and lower limits of the search space, whereas \(rand\) and \(r\) denote random values in [0, 1].
Crocodiles behave aggressively to catch and exploit nearby targets once they encounter promising prey. In the same vein, COA favours solutions that show promise or lie in higher-fitness regions of the search area in order to intensify the search there. Nevertheless, COA also has an exploration component to preserve diversity and prevent premature convergence to sub-optimal solutions. Occasionally, crocodiles venture into new areas of their environment to discover better hunting grounds or undepleted prey. Similarly, COA injects randomness and variation into the optimization procedure, allowing the model to steer clear of local optima and discover new solutions. The set of candidate solutions is iteratively adjusted by the COA movement and update mechanism, taking the fitness values into account. Solutions with higher fitness have a higher probability of surviving and passing to the next generation, while diversity is introduced through mutation and crossover. The optimization continues in cycles until a stopping criterion is met, such as reaching the maximum iteration count, achieving an adequate solution quality, or converging to a steady solution. Drawing inspiration from crocodiles’ hunting behaviour, COA thus provides a robust and adaptable optimization framework. It is suited to a range of optimization problems because it can navigate complex search areas and converge to near-optimum solutions by combining exploitation and exploration strategies.
The fitness function (FF) reflects both the classifier’s accuracy and the number of selected features. It maximizes classification accuracy while minimizing the size of the selected feature subset.
Here, \(ErrorRate\) denotes the classifier’s error rate when using the selected features. It is computed as the ratio of incorrect classifications to the total number of classifications made and therefore lies between 0 and 1. \(\#SF\) is the number of selected features, and \(\#All\_F\) is the total number of features in the original data. \(\alpha\) controls the relative importance of classification quality and subset length.
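A fitness function of this kind is commonly written as a weighted sum, \(Fitness = \alpha \cdot ErrorRate + (1 - \alpha ) \cdot \#SF/\#All\_F\), with lower values being better. The sketch below assumes that form; the function name `fs_fitness`, the default \(\alpha = 0.9\), and the example numbers are illustrative assumptions, not values taken from this study.

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Weighted-sum fitness for wrapper feature selection (lower is better).

    Assumed form: alpha * ErrorRate + (1 - alpha) * (#SF / #All_F).
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

# Two hypothetical subsets with the same error rate: the smaller one scores better.
small_subset = fs_fitness(error_rate=0.02, n_selected=20, n_total=80)
large_subset = fs_fitness(error_rate=0.02, n_selected=40, n_total=80)
print(small_subset, large_subset)
```

With \(\alpha\) close to one, the error rate dominates and the subset size mainly breaks ties between equally accurate subsets.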
Attack classification: GRU-SA model
For cybersecurity classification, the proposed SAMFL-SCDCOA approach uses the GRU-SA model31. This approach is chosen for its capability to capture long-term dependencies in sequential data, which is crucial for cybersecurity tasks that involve time-series data, such as network traffic analysis. The GRU handles sequences with fewer computational resources than conventional LSTM models, making it appropriate for real-time applications. The addition of SA improves the model’s capability to concentrate on significant features, enhancing its performance in detecting subtle patterns indicative of cyberattacks. This hybrid approach allows for more accurate classification, particularly in intricate and dynamic attack scenarios. Compared to simpler models like feedforward neural networks (FNNs) or decision trees (DTs), GRU-SA offers a better balance of computational efficiency and classification accuracy, making it well suited to cybersecurity applications. Its capability to learn from past data and highlight key features is particularly beneficial for dynamic and evolving attack vectors. Figure 4 exhibits the structure of the GRU-SA model.
Figure 4. GRU-SA architecture.
The attention mechanism learns the importance weights of the input features during training, so that the output is a weighted average of the inputs that reflects how much attention the model pays to each input in the time series. Its primary aim is to extract the essential information from the input features, helping the model capture the dynamic characteristics of the data stream. The main benefit of this mechanism is its capability to concentrate on particular characteristics, which reduces bottlenecks in data processing. Furthermore, by establishing a direct connection between the decoder and encoder, this mechanism considerably alleviates the vanishing-gradient problem. It involves handling the input information, computing and distributing attention scores, and generating the final output. The steps are performed sequentially: first, the attention scores are computed; then, the weights are derived; and lastly, the context vectors are formed. For alignment, this method uses the encoder hidden layers (HLs) \({h}_{j}\) and the preceding decoder output \({S}_{t-1}\) to compute the score \({e}_{t,j}\), which measures the degree of alignment between each element of the input sequence and the output at position \(t.\)
As presented in Eq. (8), the attention scores are obtained through a function commonly implemented with a feedforward NN. Applying the softmax function to the attention scores then yields the weights \({a}_{t,j}\), as defined in Eq. (9). The context vector \({c}_{t}\) is computed as the weighted sum of all encoder HLs, as shown in Eq. (10).
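For reference, the three steps associated with Eqs. (8) to (10) can be written in the widely used additive-attention form shown below. The softmax and weighted-sum steps follow directly from the description; the specific scoring function, the parameters \(v_{a}\), \(W_{a}\), \(U_{a}\), and the sequence length \(T_{x}\) are assumptions introduced here for illustration.

```latex
\begin{aligned}
e_{t,j} &= v_{a}^{\top}\tanh\left(W_{a} S_{t-1} + U_{a} h_{j}\right) \\
a_{t,j} &= \frac{\exp\left(e_{t,j}\right)}{\sum_{k=1}^{T_{x}} \exp\left(e_{t,k}\right)} \\
c_{t}   &= \sum_{j=1}^{T_{x}} a_{t,j}\, h_{j}
\end{aligned}
```

Because the weights \(a_{t,j}\) are non-negative and sum to one over \(j\), the context vector \(c_{t}\) is a convex combination of the encoder HLs.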
The design of this mechanism reduces computational memory overhead and alleviates disturbance from noisy and redundant data. The GRU is an enhanced RNN derived from the LSTM model, tailored to address the challenges conventional RNNs face when processing long sequences with long-range dependencies. Moreover, its reduced structural complexity lowers the number of trainable parameters and improves operational efficiency. The computation of the GRU HL at time \(t\) is as follows:
In the above equations, \(r_{t}\) and \(z_{t}\) denote the reset and update gates, respectively, which determine to what extent \(h_{t}\) is updated and how much of \(h_{t - 1}\) is retained. \(\tilde{h}_{t}\) refers to the candidate HL, computed from the present input \(x_{t}\) and the HL \(h_{t - 1}\) of the preceding time step. \(h_{t}\) stands for the final HL, obtained as the weighted combination of \(h_{t - 1}\) and \(\tilde{h}_{t}\) governed by the update gate \(z_{t}\). \(\sigma\) denotes the sigmoid function, ⨀ denotes element-wise multiplication, and \(W_{z}\), \(U_{z}\), \(W_{r}\), \(U_{r}\), \(W_{h}\), and \(U_{h}\) are the weight matrices of the update gate, the reset gate, and the candidate HL.
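A single GRU time step consistent with the gate definitions above can be sketched in NumPy as follows. The bias terms `b_z`, `b_r`, `b_h`, the random initialization, and the convention \(h_{t} = (1 - z_{t}) ⨀ h_{t-1} + z_{t} ⨀ \tilde{h}_{t}\) are assumptions made for this illustration (some formulations swap the roles of \(z_{t}\) and \(1 - z_{t}\)); the sketch is not the trained GRU-SA model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU update; p holds W_*, U_* and (assumed) bias vectors b_*."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])  # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])  # reset gate
    # Candidate state: the reset gate scales how much of h_prev is used.
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])
    # Assumed convention: z_t weights the candidate, (1 - z_t) the old state.
    return (1.0 - z_t) * h_prev + z_t * h_tilde

def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases, for illustration only."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("z", "r", "h"):
        p[f"W_{gate}"] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p[f"U_{gate}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    return p

params = init_params(n_in=4, n_hidden=3)
h = np.zeros(3)
sequence = np.random.default_rng(1).normal(size=(5, 4))  # 5 steps, 4 features
for x_t in sequence:
    h = gru_step(x_t, h, params)
print(h)
```

Since each new state is a convex combination of the previous state and a `tanh` output, the hidden values stay within (-1, 1) when the initial state is zero.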
Parameter optimizer: IPOA technique
Finally, the IPOA tunes the hyperparameter values of the GRU-SA approach, resulting in improved classification performance32. This technique is chosen because it can effectively explore the solution space and fine-tune hyperparameters for improved model performance. IPOA is particularly effective at avoiding local minima, a common challenge in optimization tasks, by employing an improved search mechanism inspired by the pelican’s behaviour. Compared to conventional optimization techniques like Grid Search or Genetic Algorithms (GAs), IPOA gives faster convergence with more accurate results, making it appropriate for real-time cybersecurity applications. The technique’s capability to balance exploration and exploitation helps find good hyperparameters without excessive computational cost. Moreover, the flexibility and robustness of the IPOA method allow it to handle the complex and high-dimensional search spaces typical of ML models, improving the GRU-SA model’s classification performance. Its computational efficiency and global search capability make it well suited to fine-tuning parameters in cybersecurity systems. Figure 5 demonstrates the steps involved in the IPOA technique.
Figure 5. Steps involved in the IPOA approach.
Although the conventional POA is a capable meta-heuristic, it suffers from low solution precision, slow convergence, and a lack of stability, and it can easily fall into a local optimum, which makes it difficult to meet the requirements of complex power-system optimization and scheduling problems with many objectives, non-linearity, and numerous constraints. To improve the precision and speed of the model, this paper enhances the POA by introducing a non-linear weighting factor, the sparrow alert mechanism, and the Cauchy variation approach, which improve the model’s solution precision and keep it from falling into a local optimum. Accordingly, this study presents an IPOA that fuses the Cauchy variation strategy with the sparrow alert mechanism. This section describes the IPOA’s principles and main improvement strategies.
Tent mapping‐based population initialization
Chaotic maps are characterized by randomness, ergodicity, and regularity, and the chaotic sequences they generate can spread the individuals of the initial population well across the search space. Consequently, Tent mapping is introduced in the IPOA’s initialization process to improve the initial population’s coverage of the search space.
where \(x_{i}\) refers to the location of the \(i\)th pelican; \(l_{b}\) and \(u_{b}\) denote the lower and upper bounds of the variables; \(z_{i}\) signifies the chaotic sequence; and \(\alpha\) is a constant typically set to 0.3.
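The initialization described above can be sketched as follows, assuming the skew Tent map \(z_{i+1} = z_{i}/\alpha\) for \(z_{i} < \alpha\) and \(z_{i+1} = (1 - z_{i})/(1 - \alpha)\) otherwise, followed by the linear mapping \(x_{i} = l_{b} + (u_{b} - l_{b}) z_{i}\). This exact form of the map, the per-dimension random starting values, and the example hyperparameter bounds are assumptions for illustration.

```python
import numpy as np

def tent_sequence(n, alpha=0.3, z0=0.37):
    """Generate n values of the (assumed) skew Tent map, all within [0, 1]."""
    z = np.empty(n)
    z[0] = z0
    for i in range(1, n):
        prev = z[i - 1]
        z[i] = prev / alpha if prev < alpha else (1.0 - prev) / (1.0 - alpha)
    return z

def tent_init_population(n_pelicans, lb, ub, alpha=0.3, seed=0):
    """Map one chaotic sequence per dimension onto the box [lb, ub]."""
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    rng = np.random.default_rng(seed)
    # A different starting value per dimension avoids identical columns.
    starts = rng.uniform(0.05, 0.95, size=lb.size)
    Z = np.column_stack([tent_sequence(n_pelicans, alpha, z0) for z0 in starts])
    return lb + (ub - lb) * Z

# Hypothetical search box: [learning rate, number of hidden units].
lower, upper = [0.001, 16.0], [0.1, 256.0]
population = tent_init_population(n_pelicans=10, lb=lower, ub=upper)
print(population.shape)
```

Each row of `population` is one pelican, that is, one candidate hyperparameter setting inside the search bounds.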
Presenting a non-linear weighting feature in the exploration stage
The balance between global search and local optimization capability is the leading factor influencing the speed and precision of a heuristic model. Because the update of an individual pelican's location is closely tied to its present location, a non-linear inertia weighting factor \(\omega\) is applied to tune the relationship between the location update and the information carried by the present location. As the optimization proceeds, the value of \(\omega\) increases, the present pelican location has a stronger influence on the update of the best individual location, and the search range of the model narrows, which assists the model in finding the best solution.
Here, \(t\) is the current iteration count and \(T\) is the maximum iteration count.
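To make the role of \(\omega\) concrete, the sketch below uses a purely hypothetical schedule, \(\omega(t) = \omega_{min} + (\omega_{max} - \omega_{min}) \sin(\pi t / 2T)\), which grows non-linearly with \(t\) as described above. The schedule, the bounds 0.4 and 0.9, and the placement of \(\omega\) as a multiplier on the present location in a standard POA exploration move are all assumptions; the paper's own weighting formula may differ.

```python
import math
import random


def nonlinear_weight(t, T, w_min=0.4, w_max=0.9):
    """Hypothetical non-linear inertia weight that rises from w_min to w_max."""
    # Sine ramp: fast growth early, flattening near the final iteration.
    return w_min + (w_max - w_min) * math.sin(math.pi * t / (2.0 * T))


def weighted_exploration_step(x, prey, f_x, f_prey, t, T, rng=random):
    """One exploration move with the weight applied to the present location."""
    w = nonlinear_weight(t, T)
    intensity = rng.choice((1, 2))  # random factor I of the standard POA
    new_x = []
    for x_j, p_j in zip(x, prey):
        r = rng.random()
        if f_prey < f_x:
            # The prey has better fitness: move toward it.
            new_x.append(w * x_j + r * (p_j - intensity * x_j))
        else:
            # The prey has worse fitness: move away from it.
            new_x.append(w * x_j + r * (x_j - p_j))
    return new_x


# Example: one move at iteration 10 of 100 (minimization assumed).
candidate = weighted_exploration_step(
    x=[0.5, -1.2], prey=[0.1, 0.3], f_x=2.0, f_prey=1.5, t=10, T=100
)
```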
Presenting the Cauchy variation approach in the development stage
The Cauchy variation approach is introduced in the development stage of the pelican model. When the fitness value of a pelican is greater than the population's average fitness value, the present positions of the pelicans are considered to be closely grouped. At this point, the Cauchy variation is applied to improve the diversity of the pelicans, and the pelican position is updated accordingly. The distribution function of the Cauchy variation is as shown:
whose equivalent density function is
The location-updated equation is
Here, \(g\) is the scale parameter, which takes a value greater than zero, and \(p\) is a uniformly distributed random number in \(\left[ {0,1} \right]\).
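For reference, a zero-centred Cauchy distribution with scale \(g\) has the standard distribution function \(F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan(x/g)\) and density \(f(x) = \frac{1}{\pi}\frac{g}{g^{2} + x^{2}}\), so a Cauchy variate can be drawn from a uniform \(p\) by the inverse transform \(x = g\tan(\pi(p - \tfrac{1}{2}))\). The sketch below applies such a variate as a multiplicative perturbation of the position, which is one common form in the literature and is an assumption here; the helper names are hypothetical.

```python
import math
import random


def cauchy_cdf(x, g=1.0):
    """Distribution function of a zero-centred Cauchy variable with scale g."""
    return 0.5 + math.atan(x / g) / math.pi


def cauchy_sample(p, g=1.0):
    """Inverse-transform draw: maps a uniform p in (0, 1) to a Cauchy variate."""
    return g * math.tan(math.pi * (p - 0.5))


def cauchy_mutate(position, fitness, mean_fitness, g=1.0, rng=random):
    """Perturb a pelican only when its fitness exceeds the population mean."""
    if fitness <= mean_fitness:
        return list(position)
    mutated = []
    for x in position:
        # Clamp p away from 0 and 1, where the tangent diverges.
        p = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
        # Assumed multiplicative form: x_new = x + x * Cauchy(0, g).
        mutated.append(x + x * cauchy_sample(p, g))
    return mutated


# Example: a pelican whose fitness is above the mean gets perturbed.
new_position = cauchy_mutate([0.8, -0.4], fitness=1.7, mean_fitness=1.2)
```

The heavy tails of the Cauchy distribution occasionally produce large jumps, which is what lets a grouped population escape a local optimum.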
Sparrow alert mechanism
The sparrow alert mechanism is introduced into the POA's development stage, which gives the POA a faster convergence speed. Pelicans at the edge of the group move rapidly to a safer region to attain a better location once they sense danger, whereas pelicans in the middle of the group walk randomly to move closer to other pelicans. The location-updated equation is presented below:
Here, \(X_{best,j}^{t}\) is the \(j\)th dimension of the global optimum location at iteration \(t\); \(\beta\) is a step-size control parameter drawn from a normal distribution with mean zero and variance one; \(K\) is a random number in the interval \([-1, 1]\); \(\varepsilon\) is a small non-zero constant on the order of \(10^{-50}\); \(f_{i}\) is the fitness value of the present individual; \(f_{g}\) is the global best fitness value in the present iteration; and \(f_{w}\) is the global worst fitness value in the present iteration. The IPOA technique derives a fitness function (FF) to attain a better solution for the classifier. Here, the classification error rate, which is to be minimized, serves as the FF. Its mathematical formulation is given in Eq. (23).
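A minimal sketch of this step follows. It assumes the vigilance update of the standard sparrow search algorithm, which uses the same symbols: \(X_{best} + \beta \left| X - X_{best} \right|\) when \(f_{i} > f_{g}\), and \(X + K \left| X - X_{worst} \right| / ((f_{i} - f_{w}) + \varepsilon)\) when \(f_{i} = f_{g}\). It also assumes that the FF is the percentage of misclassified samples. Neither form is confirmed by the text above, and the function names are hypothetical.

```python
import random

EPSILON = 1e-50  # small constant that avoids division by zero


def sparrow_alert_update(position, best, worst, f_i, f_g, f_w, rng=random):
    """Vigilance move of the standard sparrow search algorithm (assumed form)."""
    new_position = []
    for x, x_best, x_worst in zip(position, best, worst):
        if f_i > f_g:
            # Edge of the group: jump toward the global best location.
            beta = rng.gauss(0.0, 1.0)
            new_position.append(x_best + beta * abs(x - x_best))
        else:
            # At the best fitness: random walk scaled by distance to the worst.
            k = rng.uniform(-1.0, 1.0)
            new_position.append(
                x + k * abs(x - x_worst) / ((f_i - f_w) + EPSILON)
            )
    return new_position


def error_rate_fitness(y_true, y_pred):
    """Assumed FF: percentage of misclassified samples (lower is better)."""
    wrong = sum(1 for a, b in zip(y_true, y_pred) if a != b)
    return 100.0 * wrong / len(y_true)


# Example: one misclassified sample out of four gives an FF of 25.0.
ff = error_rate_fitness([0, 1, 1, 0], [0, 1, 0, 0])
moved = sparrow_alert_update(
    position=[0.2, 0.9], best=[0.0, 1.0], worst=[2.0, -1.0],
    f_i=ff, f_g=10.0, f_w=40.0,
)
```

Drawing \(\beta\) and \(K\) once per dimension, as done here, is a design choice; drawing them once per individual is equally common.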
Performance validation
The performance validation of the SAMFL-SCDCOA technique is examined under the CICIDS-2017 dataset33. This dataset contains 10,973 records under five classes, as depicted in Table 1. It has 78 attributes, but only 30 attributes are selected.
Figure 6 presents the classifier outcomes of the SAMFL-SCDCOA approach on the CICIDS-2017 dataset. Figure 6a, b shows the confusion matrices, with correct recognition and classification of every class under 70% TRPH and 30% TSPH. Figure 6c shows the PR curves, indicating superior performance for each class. Similarly, Fig. 6d shows the ROC curves, indicating proficient outcomes for the different class labels.
CICIDS-2017 dataset (a, b) confusion matrix, (c, d) PR and ROC curves.
Table 2 and Fig. 7 report the cyberattack detection results of the SAMFL-SCDCOA technique on the CICIDS-2017 dataset. The results show that the SAMFL-SCDCOA technique classifies all the different classes accurately. Under 70% TRPH, the proposed SAMFL-SCDCOA technique attains an average \(acc{u}_{y}\) of 99.04%, \(pre{c}_{n}\) of 97.45%, \(rec{a}_{l}\) of 97.55%, \({F}_{score}\) of 97.49%, and \({AUC}_{score}\) of 98.47%. Moreover, under 30% TSPH, the proposed SAMFL-SCDCOA model attains an average \(acc{u}_{y}\) of 99.14%, \(pre{c}_{n}\) of 97.68%, \(rec{a}_{l}\) of 97.85%, \({F}_{score}\) of 97.76%, and \({AUC}_{score}\) of 98.66%.
Average of SAMFL-SCDCOA technique under CICIDS-2017 dataset.
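Averages of this kind are typically obtained from a multi-class confusion matrix. The sketch below assumes one-vs-rest counts per class followed by an unweighted (macro) mean, with rows as true classes and columns as predicted classes. The averaging scheme is an assumption, since it is not stated here, and the matrix in the example is illustrative rather than taken from the reported results.

```python
def macro_metrics(cm):
    """Macro-averaged one-vs-rest metrics (in %) from a square confusion matrix.

    cm[r][c] counts samples of true class r predicted as class c (assumed layout).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    sums = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f_score": 0.0}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[r][k] for r in range(n)) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (
            2 * precision * recall / (precision + recall)
            if precision + recall else 0.0
        )
        sums["accuracy"] += (tp + tn) / total
        sums["precision"] += precision
        sums["recall"] += recall
        sums["f_score"] += f_score
    # Unweighted mean over classes, expressed as a percentage.
    return {name: 100.0 * value / n for name, value in sums.items()}


# Illustrative two-class matrix (not data from the paper).
scores = macro_metrics([[8, 2], [1, 9]])
```

Under this scheme the per-class accuracy counts true negatives, which is why the average accuracy can sit well above the average recall on imbalanced data.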
Figure 8 illustrates the training (TRA) and validation (VAL) \(acc{u}_{y}\) analysis of the SAMFL-SCDCOA methodology on the CICIDS-2017 dataset. The \(acc{u}_{y}\) values are calculated across 0–25 epochs. The figure shows that both the TRA and VAL \(acc{u}_{y}\) curves exhibit an increasing tendency, indicating that the SAMFL-SCDCOA approach improves steadily across iterations.
\(Acc{u}_{y}\) curve of SAMFL-SCDCOA technique at CICIDS-2017 dataset.
Figure 9 shows the TRA loss (TRALOS) and VAL loss (VALLOS) curves of the SAMFL-SCDCOA approach on the CICIDS-2017 dataset. The loss values are computed across 0–25 epochs. Both the TRALOS and VALLOS values follow a decreasing trend, indicating the capability of the SAMFL-SCDCOA approach to balance the trade-off between generalization and data fitting.
Loss analysis of SAMFL-SCDCOA technique under the CICIDS-2017 dataset.
Table 3 and Fig. 10 present the comparative outcomes of the SAMFL-SCDCOA technique on the CICIDS-2017 dataset against the existing approaches24,25,34,35,36. The results show that the RF, DT, KNN, AdaBoost, DBN-KELM, RNN, and Gradient Boosting methods report lower performance. By contrast, the SAMFL-SCDCOA methodology reports the best performance, with maximum \(pre{c}_{n}\), \(rec{a}_{l},\) \(acc{u}_{y},\) and \({F}_{score}\) of 97.68%, 97.85%, 99.14%, and 97.76%, respectively.
Comparative analysis of SAMFL-SCDCOA methodology on the CICIDS-2017 dataset.
Table 4 and Fig. 11 present the computational time (CT) analysis of the SAMFL-SCDCOA technique compared with existing methods. Among the existing models, AdaBoost exhibits the fastest CT of 6.95 s, followed by SMOTE-ENN at 7.74 s. The SAMFL-SCDCOA model outperforms all of them with the lowest CT of 5.07 s, illustrating its efficiency in processing the dataset. Other models such as RF, DT, and K-Nearest Neighbors (KNN) take more time, ranging from 11.48 to 12.74 s. This suggests that the SAMFL-SCDCOA model processes the dataset faster while maintaining performance, making it appropriate for real-time intrusion detection scenarios.
CT evaluation of SAMFL-SCDCOA methodology with existing models on the CICIDS-2017 dataset.
Also, the proposed SAMFL-SCDCOA technique is examined under the UNSW-NB15 dataset37. This dataset contains 13,009 records under seven classes, as depicted in Table 5. There are 48 features in total, but only 33 features are selected.
Figure 12 presents the classifier outcomes of the SAMFL-SCDCOA model on the UNSW-NB15 dataset. Figure 12a, b displays the confusion matrices, with accurate classification of every class label under 70% TRPH and 30% TSPH. Figure 12c shows the PR curves, indicating superior performance for each class label. Similarly, Fig. 12d shows the ROC curves, indicating strong results for the different classes.
UNSW-NB15 dataset (a, b) confusion matrix, (c, d) PR and ROC curves.
Table 6 and Fig. 13 report the cyberattack detection results of the SAMFL-SCDCOA methodology on the UNSW-NB15 dataset. The results show that the SAMFL-SCDCOA methodology classifies all the different class labels accurately. Under 70% TRPH, the proposed SAMFL-SCDCOA methodology attains an average \(acc{u}_{y}\) of 99.00%, \(pre{c}_{n}\) of 95.50%, \(rec{a}_{l}\) of 88.74%, \({F}_{score}\) of 90.83%, and \({AUC}_{score}\) of 94.07%. In addition, under 30% TSPH, the proposed SAMFL-SCDCOA approach attains an average \(acc{u}_{y}\) of 99.04%, \(pre{c}_{n}\) of 95.13%, \(rec{a}_{l}\) of 89.77%, \({F}_{score}\) of 91.69%, and \({AUC}_{score}\) of 94.59%.
Average of SAMFL-SCDCOA model at UNSW-NB15 dataset.
Figure 14 illustrates the TRA and VAL \(acc{u}_{y}\) analysis of the SAMFL-SCDCOA methodology on the UNSW-NB15 dataset. The \(acc{u}_{y}\) values are calculated across 0–30 epochs. The figure shows that both the TRA and VAL \(acc{u}_{y}\) curves exhibit a rising trend, indicating that the SAMFL-SCDCOA technique improves steadily across iterations.
\(Acc{u}_{y}\) graph of SAMFL-SCDCOA model under UNSW-NB15 dataset.
Figure 15 shows the TRALOS and VALLOS analysis of the SAMFL-SCDCOA approach on the UNSW-NB15 dataset. The loss values are computed across 0–30 epochs. Both the TRALOS and VALLOS values follow a diminishing trend, indicating the capability of the SAMFL-SCDCOA methodology to balance the trade-off between data fitting and generalization.
Loss graph of SAMFL-SCDCOA method at UNSW-NB15 dataset.
Table 7 and Fig. 16 present the comparative outcomes of the SAMFL-SCDCOA technique on the UNSW-NB15 dataset against the existing methods22,23,34,35,36. The results show that the KNN, MLP, CNN, SVM, LSTM, DE-VIT, and DBN models report lower performance. By contrast, the SAMFL-SCDCOA approach reports the best performance, with \(pre{c}_{n}\), \(rec{a}_{l},\) \(acc{u}_{y},\) and \({F}_{score}\) of 95.13%, 89.77%, 99.04%, and 91.69%, respectively.
Comparative analysis of SAMFL-SCDCOA model with existing approaches under the UNSW-NB15 dataset.
Table 8 and Fig. 17 present the CT evaluation of the SAMFL-SCDCOA technique against existing methods. The SAMFL-SCDCOA model achieves the fastest CT at 7.88 s, outperforming the other methods. The DBN approach follows closely with 8.07 s, while the MLP and LSTM methods take 9.12 and 11.81 s, respectively. The KNN and MBiLSTM-GRU approaches exhibit higher CTs of 19.77 and 19.07 s, respectively. This highlights the efficiency of the SAMFL-SCDCOA model in processing the UNSW-NB15 dataset, making it well suited to real-time applications requiring rapid intrusion detection.
Comparative analysis of SAMFL-SCDCOA model with existing approaches under the UNSW-NB15 dataset.
Conclusion
In this paper, a novel SAMFL-SCDCOA methodology is developed. The main objective of the SAMFL-SCDCOA methodology is to provide an effective method for preventing cyberattacks in real time using FL and advanced optimization models. First, Z-score normalization is used to scale and standardize the data, improving the consistency and accuracy of the analysis. Then, the FS process is carried out using the COA technique. The proposed SAMFL-SCDCOA approach employs a GRU-SA model for cybersecurity classification. Finally, the IPOA optimally adjusts the hyperparameter values of the GRU-SA model, resulting in improved classification performance. A wide range of experiments was conducted to validate the performance of the SAMFL-SCDCOA technique on the CICIDS-2017 dataset. The comparison study of the SAMFL-SCDCOA technique showed a superior accuracy of 99.04% over existing models. The limitations of the SAMFL-SCDCOA technique include its reliance on publicly available datasets, which may not fully capture the complexity and variability of real-world cyberattacks. Furthermore, the model has not been evaluated in real-time or enterprise-level environments, so its scalability and latency under heavy network traffic remain untested. There is also no detailed analysis of deployment challenges, such as hardware needs and integration with existing cybersecurity infrastructures. Future work should test the model in real-world settings, improve scalability and latency, and extend its applicability to a broader range of cyberattack types across diverse industries.
Data availability
The data supporting this study’s findings are openly available in the Kaggle repository at https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset and https://www.kaggle.com/datasets/mrwellsdavid/unsw-nb15/data33,37.
References
Ghimire, B. & Rawat, D. B. Recent advances on federated learning for cybersecurity and cybersecurity for federated learning for internet of things. IEEE Internet Things J. 9(11), 8229–8249 (2022).
Alazab, M. et al. Federated learning for cybersecurity: Concepts, challenges, and future directions. IEEE Trans. Ind. Inform. 18(5), 3501–3509 (2021).
Ferrag, M. A., Friha, O., Hamouda, D., Maglaras, L. & Janicke, H. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access 10, 40281–40306 (2022).
Khramtsova, E., Hammerschmidt, C., Lagraa, S. & State, R. Federated learning for cyber security: SOC collaboration for malicious URL detection. In 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS) 1316–1321 (IEEE, 2020).
Agrawal, S. et al. Federated learning for intrusion detection system: Concepts, challenges and future directions. Comput. Commun. 195, 346–361 (2022).
Zhang, T. et al. Federated learning for internet of things. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems 413–419 (ACM, 2021).
Al Mallah, R., Badu-Marfo, G. & Farooq, B. Cybersecurity threats in connected and automated vehicles based federated learning systems. In 2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops) 13–18 (IEEE, 2021).
Campos, E. M. et al. Evaluating federated learning for intrusion detection in Internet of Things: Review and challenges. Comput. Netw. 203, 108661 (2022).
Gosselin, R., Vieu, L., Loukil, F. & Benoit, A. Privacy and security in federated learning: A survey. Appl. Sci. 12(19), 9901 (2022).
Han, Y., El-Hasnony, I. M. & Cai, W. Dragonfly algorithm with gated recurrent unit for cybersecurity in social networking. J. Cybersecur. Inf. Manag. 2, 75–88 (2021).
Awan, K. A. et al. SecEdge: A novel deep learning framework for real-time cybersecurity in mobile IoT environments. Heliyon 11(1) (2025).
Folino, F., Folino, G., Pisani, F. S., Sabatino, P. & Pontieri, L. A scalable vertical federated learning framework for analytics in the cybersecurity domain. In 2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) 245–252 (IEEE, 2024).
Devarajan, G. G. et al. Explainable federated learning based secure and transparent object detection model for autonomous vehicles. IEEE Trans. Consum. Electron. (2025).
Bukhari, S. M. S. et al. Enhancing cybersecurity in Edge IIoT networks: An asynchronous federated learning approach with a deep hybrid detection model. Internet Things 101252 (2024).
Wang, L., Li, Y. & Zuo, L. Trust management for IoT devices based on federated learning and blockchain. J. Supercomput. 81(1), 1–31 (2025).
Salim, M. M., El Azzaoui, A., Deng, X. & Park, J. H. FL-CTIF: A federated learning based CTI framework based on information fusion for secure IIoT. Inf. Fusion 102, 102074 (2024).
Cao, Y., Ma, J., He, Z. & Li, Y. AP-CFL: Clustered federated learning through dynamic clustering and adaptive participation in heterogeneous IoT. IEEE Internet Things J. (2025).
Doriguzzi-Corin, R. & Siracusa, D. FLAD: Adaptive federated learning for DDoS attack detection. Comput. Secur. 137, 103597 (2024).
Selvarajan, S. et al. Diagnostic behavior analysis of profuse data intrusions in cyber physical systems using adversarial learning techniques. Sci. Rep. 15(1), 7287 (2025).
Kathole, A. B., Jadhav, D., Vhatkar, K. N., Amol, S. & Gandhewar, N. Solar energy prediction in IoT system based optimized complex-valued spatio-temporal graph convolutional neural network. Knowl.-Based Syst. 304, 112400 (2024).
Gupta, B. B. et al. Distributed optimization for IoT attack detection using federated learning and Siberian Tiger optimizer. ICT Express (2025).
Kathole, A. B. et al. Secure federated cloud storage protection strategy using hybrid heuristic attribute-based encryption with permissioned blockchain. IEEE Access (2024).
Hossain, M. A., Hossain, M. D., Choupani, R. & Doǧdu, E. MRS-PFIDS: Federated learning driven detection of network intrusions in maritime radar systems. Int. J. Inf. Secur. 24(2), 1–19 (2025).
Rabie, O. B. J. et al. A novel IoT intrusion detection framework using decisive red fox optimization and descriptive back propagated radial basis function models. Sci. Rep. 14(1), 386 (2024).
Ullah, I., Deng, X., Pei, X., Mushtaq, H. & Khan, Z. Securing internet of vehicles: a blockchain-based federated learning approach for enhanced intrusion detection. Clust. Comput. 28(4), 256 (2025).
Prashanth, S. K., Shitharth, S., Praveen Kumar, B., Subedha, V. & Sangeetha, K. Optimal feature selection based on evolutionary algorithm for intrusion detection. SN Comput. Sci. 3(6), 439 (2022).
Sun, S., Zhou, L., Wang, Z. & Han, L. Robust intrusion detection based on personalized federated learning for IoT environment. Comput. Secur. 104442 (2025).
Selvarajan, S., Shaik, M., Ameerjohn, S. & Kannan, S. Mining of intrusion attack in SCADA network using clustering and genetically seeded flora-based optimal classification algorithm. IET Inf. Secur. 14(1), 1–11 (2020).
Peycheva, D., Li, L., Fewtrell, M., Silverwood, R. & Hardy, R. Mediation of the effect of prenatal maternal smoking on time to natural menopause in daughters by birthweight-for-gestational-age z-score and breastfeeding duration: Analysis of two UK birth cohorts born in 1958 and 1970. BMC Womens Health 25(1), 32 (2025).
Aparna, S. & Padmathilagam, V. Design and implementation of enhanced Maximum power point tracking for renewable applications using crocodile optimization Algorithm. Intell. Decis. Technol. 18724981241305884 (2025).
Bai, L., Zhao, R., Lin, S., Chai, Z. & Wang, X. Stress prediction for slopes based on the VMD-DBO-GRU-A model (2025).
Zou, H. et al. Optimal scheduling of multi-energy complementary systems based on an improved pelican algorithm. Energies 18(2), 365 (2025).
https://www.kaggle.com/datasets/chethuhn/network-intrusion-dataset
Kim, Y., Kim, J. & Kim, D. Hi-MLIC: Hierarchical multilayer lightweight intrusion classification for various intrusion scenarios. IEEE Access (2024).
He, K., Zhang, W., Zong, X. & Lian, L. Network intrusion detection based on feature image and deformable vision transformer classification. IEEE Access 12, 44335–44350 (2024).
Kim, T. & Pak, W. Early detection of network intrusions using a GAN-based one-class classifier. IEEE Access 10, 119357–119367 (2022).
Acknowledgments
The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through Large Research Project under grant number RGP2/224/46. Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R330), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. Researchers Supporting Project number (RSPD2025R830), King Saud University, Riyadh, Saudi Arabia. The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA for funding this research work through the project number “NBU-FFR-2025-2847-04”. The authors are thankful to the Deanship of Graduate Studies and Scientific Research at University of Bisha for supporting this work through the Fast-Track Research Support Program.
Contributions
Manal Abdullah Alohali: Conceptualization, methodology development, experiment, formal analysis, investigation, writing. Hatim Dafaalla: Formal analysis, investigation, validation, visualization, writing. Mohammed Baihan: Formal analysis, review and editing. Sultan Alahmari: Methodology, investigation. Othman Alrusaini: Review and editing. Ali Alqazzaz: Discussion, review and editing. Hanadi Alkhudhayr: Discussion, review and editing. Achraf Ben Miled: Conceptualization, methodology development, investigation, supervision, review and editing. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
This article contains no studies with human participants performed by any authors.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Alohali, M.A., Dafaalla, H., Baihan, M. et al. Leveraging self attention driven gated recurrent unit with crocodile optimization algorithm for cyberattack detection using federated learning framework. Sci Rep 15, 23805 (2025). https://doi.org/10.1038/s41598-025-99452-4
DOI: https://doi.org/10.1038/s41598-025-99452-4