Abstract
The rapid expansion of Internet of Things (IoT) and Industrial Internet of Things (IIoT) networks has significantly increased the vulnerabilities of critical infrastructures to cyberattacks, posing substantial risks to both security and operational integrity. As these networks continue to grow, traditional intrusion detection systems (IDS) often fail to handle the massive volume, diversity, and sophistication of emerging threats, necessitating the development of more advanced solutions. This study introduces TACNet, a novel deep learning framework designed to enhance intrusion detection in IoT and IIoT environments. The primary objective of this work is to develop a robust model that not only detects a wide range of cyber threats but also adapts to the dynamic nature of these networks. The proposed TACNet architecture combines multi-scale Convolutional Neural Networks (CNN) for feature extraction at various granularities, Long Short-Term Memory (LSTM) networks to capture temporal dependencies in sequential network traffic, and temporal attention mechanisms to focus the model’s learning on the most informative time steps and features. This hybrid approach effectively addresses the challenges of both spatial and temporal data in network traffic, significantly improving model accuracy and interpretability. Experimental results demonstrate the effectiveness of TACNet, achieving accuracy rates ranging from 98.56% to 99.98% on diverse datasets, including CICIDS 2018, DNN-EdgeIIoT, CIC IoT-DIAD 2024, TabularIoTAttack-2024, and N-BaIoT. These findings highlight the superior performance of TACNet compared to traditional machine learning-based models, positioning it as a powerful solution for real-time intrusion detection in IoT and IIoT networks.
Introduction
This section introduces the research context, highlighting the growing significance of IoT and IIoT networks, the challenges in securing these systems, and the motivation for developing an advanced intrusion detection framework.
Background and motivation
The Internet of Things (IoT) and Industrial Internet of Things (IIoT) represent transformative technologies that interconnect a vast array of devices, enabling smarter operations across various sectors such as healthcare, manufacturing, transportation, and smart cities1,2,3. IoT refers to the network of everyday objects embedded with sensors and software that collect and exchange data, while IIoT focuses specifically on the industrial sector, integrating devices and machinery to optimize processes and improve efficiency4,5,6. However, as the adoption of IoT and IIoT devices grows, these systems become increasingly vulnerable to cyberattacks due to the sheer volume of interconnected devices, the diversity of communication protocols, and the lack of standardized security frameworks7,8. The lack of robust security measures in many IoT/IIoT deployments leaves them exposed to a variety of threats, including data breaches, denial of service attacks, and manipulation of critical systems. Consequently, Intrusion Detection Systems (IDS) have become indispensable for identifying and mitigating such threats in real-time9,10. By continuously monitoring network traffic and detecting anomalous behaviors, IDS are vital for safeguarding the integrity, confidentiality, and availability of IoT and IIoT networks, ensuring their operational reliability and resilience against cyber threats.
Problem statement
Traditional Intrusion Detection Systems (IDS) have long been the cornerstone of network security, but they face significant limitations when applied to the dynamic and complex environments of IoT and IIoT networks11,12. One of the primary shortcomings of traditional IDS is their inability to effectively handle the enormous volume of data generated by the vast number of interconnected devices13,14. These systems often struggle with the sheer scale and high-dimensional nature of network traffic, leading to delays in detection and analysis15. Additionally, traditional IDS are typically ill-equipped to detect novel and sophisticated attacks, as they often rely on predefined rules or signature-based methods, which fail to identify emerging threats or zero-day attacks16,17. Furthermore, traditional IDS face challenges in addressing class imbalance in network traffic, where benign activities vastly outnumber malicious ones, leading to high false-negative rates and a lack of sensitivity to rare but critical attacks18,19. This results in a limited ability to accurately detect and respond to evolving threats in real-time. To bridge these gaps, there is a pressing need for more advanced IDS techniques that leverage modern machine learning and deep learning approaches. Current research and industry practices are still evolving, and existing models often fall short in adapting to the continuously changing nature of cyber threats in IoT and IIoT networks, necessitating the development of more robust, scalable, and adaptive solutions.
Objective and scope of the research
The primary objective of this research is to develop a robust and adaptive Intrusion Detection System capable of effectively detecting a wide range of cyber threats in IoT and IIoT networks. This study proposes TACNet, a novel deep learning framework that combines multi-scale Convolutional Neural Networks (CNN) for feature extraction, Long Short-Term Memory (LSTM) networks for capturing temporal dependencies, and temporal attention mechanisms to focus the model on the most relevant time steps and features within network traffic data. By integrating these components, TACNet aims to address the challenges of both spatial and temporal data in network traffic, enhancing detection accuracy, interpretability, and robustness to new attack patterns. The scope of this research includes evaluating the proposed TACNet model on several diverse datasets, such as CICIDS 2018, DNN-EdgeIIoT, CIC IoT-DIAD 2024, TabularIoTAttack-2024, and N-BaIoT, to assess its performance across different IoT and IIoT environments. Through extensive experimentation, this study seeks to demonstrate TACNet’s superiority over traditional IDS approaches, highlighting its potential as a powerful solution for real-time intrusion detection in the face of evolving cyber threats.
Contributions of the research
This paper presents a comprehensive deep learning framework for intrusion detection in IoT and IIoT networks. The proposed model, TACNet, integrates advanced techniques to address the unique challenges of these environments, offering significant improvements in both accuracy and scalability. The key contributions of this research are as follows:
-
We developed a novel deep learning-based intrusion detection framework, TACNet, which combines multi-scale Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and temporal attention mechanisms.
-
We integrated CNNs for effective feature extraction across multiple temporal scales, improving the model’s ability to capture complex patterns in network traffic.
-
We implemented a temporal attention layer to prioritize the most informative time steps, enhancing the model’s focus on critical patterns in sequential data.
-
We employed class weight computation to balance the influence of underrepresented attack types, ensuring more accurate detection of rare attacks.
-
We conducted a comprehensive evaluation of TACNet on multiple benchmark datasets, including CICIDS 2018, DNN-EdgeIIoT, CIC IoT-DIAD 2024, TabularIoTAttack-2024, and N-BaIoT, validating its performance across diverse IoT and IIoT scenarios.
Structure of the paper
This paper is structured as follows: Sect. 2 presents a literature review of existing intrusion detection techniques in IoT and IIoT networks. Section 3 details the methodology behind the proposed TACNet framework, including its architecture and components. Section 4 discusses the experimental results and provides an analysis of TACNet’s performance. Finally, Sect. 5 concludes the paper and outlines potential directions for future research.
Related works
The field of Intrusion Detection Systems has evolved significantly over the years, with numerous approaches developed to address the growing complexity of IoT and IIoT networks. This section provides an overview of the existing IDS techniques and their limitations, particularly in the context of IoT and IIoT environments.
Conventional intrusion detection systems
Conventional IDS methods are broadly classified into two categories: signature-based and anomaly-based approaches. Signature-based IDS detect intrusions by matching network traffic patterns against known attack signatures. Einy et al.20 proposed a hybrid intrusion detection system combining signature-based and anomaly-based detection methods using fuzzy logic and metaheuristic feature selection. The authors employed Suricata IDS/IPS for signature detection and a neural network to optimize feature selection, achieving a detection accuracy of 96.11% across various attack types like SQL injection and DDoS. While the approach showed high accuracy, it faced limitations in terms of false positives, scalability for larger networks, and computational efficiency, particularly when dealing with high traffic or complex IoT environments. Sheikh et al. proposed a signature-based IDS tailored for the IoT environment, utilizing a pattern matching algorithm to detect known attacks from the NSL KDD dataset. The system achieved relatively low false-positive rates and effectively detected DoS and probe attacks, optimizing performance with DNA sequence encoding. However, the approach faces several limitations, including limited generalization to dynamic real-world IoT environments, scalability issues for high-traffic networks, and challenges with detecting novel attacks21. Bhavsar et al.22 proposed an advanced anomaly-based IDS for IoT networks using a Pearson Correlation Coefficient - Convolutional Neural Network (PCC-CNN) model, combining feature extraction with deep learning techniques to address high false-positive rates in traditional methods. The model achieved impressive results, outperforming traditional machine learning methods with a 99.89% detection accuracy on benchmark datasets like NSL-KDD and CICIDS-2017. 
However, the study highlights limitations in handling classes with few attack instances and does not address the computational efficiency of the model, which could pose challenges for real-time, large-scale IoT deployment. In general, signature-based IDS, while highly effective at identifying known attacks, struggle to detect novel or previously unseen threats. Anomaly-based IDS, on the other hand, build a model of normal network behavior and flag deviations as potential intrusions. These systems offer greater flexibility in detecting new types of attacks but often suffer from high false-positive rates, especially in dynamic environments like IoT and IIoT, where network behavior is inherently variable and diverse.
Machine learning in IDS
Machine learning (ML) has become a cornerstone in modern IDS, enabling systems to automatically learn patterns of normal and malicious behavior. Techniques such as decision trees, Support Vector Machines (SVM), XGBoost, and random forests have been widely applied to IDS tasks23,24,25. These models are capable of learning complex relationships within the data and can adapt to new attack patterns. Walling and Lodh26 propose an AN-SFS technique to improve IDS for IoT networks. By dynamically adjusting the feature selection process based on local statistical properties, AN-SFS enhances detection accuracy, achieving 99.3% accuracy on NSL-KDD and 97.5% on UNSW-NB15 datasets. The authors combine this technique with Random Forest for effective feature selection, outperforming traditional methods like ANOVA and Pearson correlation. Kantharaju et al.27 present a machine learning-based IDS for IoT networks, utilizing a Self-Attention Progressive Generative Adversarial Network (SAPGAN) to enhance detection accuracy. The framework incorporates data preprocessing, feature selection using the modified War Strategy Optimization Algorithm (WSOA), and focuses on improving attack classification for security threats. However, ML models often face challenges when dealing with large and imbalanced datasets, a common issue in IoT and IIoT networks where benign traffic vastly outnumbers malicious traffic. This class imbalance can lead to biased predictions and poor detection performance for rare but critical attacks.
Deep learning in IDS
Deep learning approaches have emerged as powerful tools for IDS, particularly in handling the sequential and high-dimensional nature of IoT and IIoT network traffic28. Banaamah and Ahmad29 proposed using deep learning models, such as CNN, LSTM, and GRU, for intrusion detection in IoT networks. They found that the LSTM model outperformed both CNN and GRU in terms of accuracy. While deep learning showed promise in handling large datasets and complex attack patterns, the paper highlighted limitations including scalability issues, high computational costs, and the need for substantial labeled data for training. Dahou et al.30 proposed an intrusion detection system (IDS) for IoT networks combining CNNs for feature extraction with the Modified Reptile Search Algorithm (RSA) for feature selection. Evaluated on multiple datasets like KDDCup-99 and BoT-IoT, the model showed competitive performance, achieving high accuracy, precision, recall, and F1-score. However, the study acknowledged limitations of RSA, including its tendency for premature convergence and the need for problem-specific knowledge to optimize its effectiveness in larger, more complex datasets. Madhu et al.31 propose a Device-based Intrusion Detection System (DIDS) that focuses on identifying network intrusions through the prediction of unknown attacks, using deep learning techniques. Their approach aims to handle computational overhead, improve throughput, and reduce false alarm rates. They evaluate their model against standard algorithms, reporting a detection accuracy of 99%. However, the research highlights some limitations, particularly in handling unknown attacks and scalability in larger networks. Besides these, deep learning models for IDS in IoT are limited by scalability issues, difficulty in detecting novel attacks, reliance on large labeled datasets, and limited suitability for dynamic and resource-constrained environments.
Nandanwar and Katarya32 present the AttackNet model, which combines CNN and GRU for botnet attack detection in IIoT environments. This model demonstrates strong performance, achieving an accuracy of 99.75% on the N_BaIoT dataset, surpassing state-of-the-art techniques. Similarly, the paper33 proposes a hybrid CNN-BILSTM model with transfer learning for botnet attack detection. It achieves a testing accuracy of 99.52% and outperforms existing models, especially on the N_BaIoT dataset. Furthermore, the paper34 introduces Cyber-Sentinet, which integrates SHAP for explainability and demonstrates a solid performance (97.46% accuracy) in protecting cyber-physical systems. Another notable contribution comes from the paper35, which reviews IoT security solutions and suggests a combination of deep learning techniques to enhance security. These studies contribute valuable insights but often lack a comprehensive hybrid approach that effectively handles both spatial and temporal attack patterns across diverse datasets. TACNet differs by integrating multi-scale CNNs, LSTMs, and attention mechanisms, providing a more robust and scalable solution for intrusion detection.
Attention mechanisms in IDS
Recent advancements in IDS have incorporated attention mechanisms to improve the model’s ability to focus on the most informative features and time steps in network traffic data36,37,38. Attention mechanisms allow the model to weigh the importance of different input features or temporal steps, enabling more accurate predictions. In IDS, this has proven useful for identifying subtle attack patterns that may be masked by the overwhelming volume of normal traffic. By directing the model’s focus to relevant parts of the data, attention mechanisms enhance detection accuracy, especially in cases where attacks are not easily distinguishable from benign behavior.
Some other approaches like collaborative learning, federated learning, and dynamic feature aggregation are also applied for intrusion detection. For instance, Khan et al.39 introduce a collaborative Spiking Recurrent Unit (SRU) network aimed at reducing communication overhead and improving feature explainability. While this method addresses efficiency, it does not fully tackle the complexity of real-time attack detection in large-scale IoT systems, which is where TACNet differentiates itself through the integration of multi-scale CNNs and temporal attention mechanisms. Similarly, the study in40 proposes a federated learning-based boosting framework to enhance detection in distributed IoT environments. However, it does not specifically address class imbalance or the detection of rare, stealthy attacks, which TACNet mitigates through class weight computation and stratified sampling. Additionally, the work in41 focuses on federated reinforcement learning for improving IoMT security and privacy but does not capture the dynamic nature of attack patterns in IoT networks. TACNet’s combination of multi-scale CNNs, LSTMs, and attention mechanisms allows it to better handle evolving attack scenarios across diverse datasets, thus offering a more comprehensive solution for real-time intrusion detection in IoT and IIoT environments. These recent studies contribute valuable insights, but TACNet’s ability to address both spatial and temporal data challenges positions it as a more robust solution in the context of evolving cyber threats.
Research gap
Despite significant progress in IDS, there remains a critical gap in existing approaches when it comes to efficiently handling both spatial and temporal patterns in IoT traffic. While traditional methods and machine learning models can capture some aspects of network behavior, they often fail to address the complex, time-varying nature of IoT and IIoT data. Additionally, there is a need for more interpretable and scalable models that can be deployed on resource-constrained edge devices without compromising performance. This research aims to bridge these gaps by developing a hybrid deep learning-based IDS that integrates multi-scale CNNs, LSTM networks, and attention mechanisms, addressing the unique challenges of IoT and IIoT environments. Table 1 summarizes the limitations of traditional IDS, machine learning models, existing deep learning models, and attention-based deep learning models, and how TACNet addresses these limitations through its advanced architecture.
Methodology
This section presents the detailed methodology of the proposed TACNet framework, including its architecture, data preprocessing steps, feature extraction, model components, training strategies, and evaluation metrics. The architectural diagram of the proposed TACNet is shown in Fig. 1.
Fig. 2: Amount of data for different datasets used in this research after preprocessing: (a) DNN-EdgeIIoT, (b) CICIDS 2018, (c) TabularIoTAttack-2024, (d) N-BaIoT, (e) CIC IoT-DIAD 2024.
Dataset
The proposed approach is evaluated using five different datasets: CICIDS 2018, DNN-EdgeIIoT, CIC IoT-DIAD 2024, TabularIoTAttack-2024, and N-BaIoT.
The DNN-EdgeIIoT dataset42, comprising 223,912 records and 62 features, provides a comprehensive representation of various IoT and IIoT network attacks. It includes attack types such as DDoS attacks (UDP, ICMP, TCP, HTTP), SQL injection, password attacks, vulnerability scanners, XSS, MITM, and ransomware, among others. The Normal class accounts for the majority of the data, with 161,564 instances, while attack types like DDoS_UDP (12,157 instances), DDoS_ICMP (11,644 instances), and SQL_injection (5,120 instances) are also prominently represented. The dataset’s diversity in attack types, including port scanning, backdoor, and ransomware, makes it an ideal resource for evaluating the proposed intrusion detection system, enabling it to handle a wide range of network security threats in real-world IoT environments.
The CICIDS2018 dataset43 consists of 208,534 records and 80 features, representing a variety of attack types in IoT and IIoT networks. The dataset includes attack labels such as Benign (165,189 instances), FTP-BruteForce (19,336 instances), SSH-Bruteforce (18,759 instances), and DoS (5,250 instances). This dataset provides a balanced mixture of normal and attack traffic, with the majority of instances belonging to the Benign class, while attacks such as FTP-BruteForce, SSH-Bruteforce, and Denial of Service (DoS) are also well-represented. The CICIDS2018 dataset is widely used for testing intrusion detection systems, offering a comprehensive and diverse set of cyberattack patterns for evaluating the proposed model’s ability to classify and detect various network intrusions.
The N-BaIoT dataset44 consists of 989,230 rows and 116 features, capturing a variety of network traffic for IoT environments. The dataset contains several types of attacks, with the most significant being Mirai (652,100 instances) and Gafgyt (287,582 instances). The Benign class represents 49,548 instances, indicating normal network traffic. The N-BaIoT dataset is particularly valuable for testing intrusion detection systems, as it includes well-known IoT botnet attacks like Mirai and Gafgyt, which are commonly used in DDoS and botnet-based attacks. This diverse dataset helps in evaluating the effectiveness of intrusion detection systems in recognizing both normal traffic and a wide range of IoT-specific attack patterns.
The CIC IoT-DIAD 2024 dataset45 consists of 1,671,681 rows and 85 features, capturing a wide range of network traffic in IoT environments. It includes several attack types such as DoS (891,727 instances), DDoS (504,597 instances), Benign (183,595 instances), Spoofing (72,996 instances), SQLInjection (6,600 instances), Mirai (5,170 instances), BruteForce (3,619 instances), and XSS (3,377 instances). The dataset provides a balanced mix of normal and attack traffic, with DoS and DDoS attacks representing the majority of the attack instances. This dataset is highly valuable for evaluating intrusion detection systems due to its large size, variety of IoT-specific attacks, and the presence of common attack techniques like Mirai, SQL injection, and XSS, making it suitable for comprehensive testing of IoT security solutions.
The TabularIoTAttack-2024 dataset46 consists of multiple attack types. The dataset includes Benign samples (86,525 instances, 61.96%), which make up the largest portion, followed by Denial of Service (DoS) attacks (20,198 samples, 14.46%), Brute Force (18,151 samples, 13.00%), and Man-in-the-Middle (MITM) attacks (14,768 samples, 10.58%). This imbalance is typical in real-world IoT attack data, where benign traffic dominates. The diverse set of attack types enables the dataset to simulate realistic IoT network traffic and provides a challenging environment for model training and evaluation.
Data preprocessing
The preprocessing starts with the handling of missing values. Handling missing values in a dataset is a critical step in ensuring the quality and consistency of the data before applying machine learning models. In real-world network traffic data, such as IoT and IIoT, missing values can occur due to a variety of reasons, such as sensor failures, incomplete logs, or data corruption during transmission. In the case of numerical columns, missing values are imputed using the median value of the respective feature. The median is chosen over the mean because it is more robust to outliers, which are often present in network traffic data due to the sporadic nature of certain attacks. The median is defined as the middle value in a sorted list of numbers and is calculated as:

$$\text{median}\left(x\right)=\begin{cases}{x}_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd}\\ \frac{1}{2}\left({x}_{\left(\frac{n}{2}\right)}+{x}_{\left(\frac{n}{2}+1\right)}\right), & \text{if } n \text{ is even}\end{cases}$$

Where, x is the set of numerical values in the column, n is the total number of values in the column, and \(\:{x}_{\left(k\right)}\) is the value at the \(\:{k}^{th}\) position after sorting.
This approach ensures that missing values are replaced with a value that is representative of the central tendency of the data, while mitigating the influence of extreme outliers.
For categorical columns, missing values are replaced with the mode (most frequent value) of the column. The mode is the value that appears most frequently in the data and is a reasonable choice for filling missing categorical data, as it maintains the most common class distribution. The mode is computed as:

$$\text{mode}\left(y\right)=\underset{{y}_{i}\in Y}{\arg\max}\:count\left({y}_{i}\right)$$

Where, y represents the set of categorical values in the column, Y is the set of all possible categories in the column, and \(\:count\left({y}_{i}\right)\) denotes the frequency of category \(\:{y}_{i}\).
This ensures that the missing categorical data is replaced by the most likely or common value, helping maintain the distribution and relationships between the classes.
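A minimal sketch of these two imputation rules, using only the Python standard library (the packet-size and protocol columns are hypothetical, not drawn from the datasets):

```python
from statistics import median

def impute_numeric(values):
    """Replace None entries with the median of the observed values.

    The median is robust to the extreme outliers that bursty attack
    traffic produces, which is why it is preferred over the mean here.
    """
    observed = [v for v in values if v is not None]
    med = median(observed)
    return [med if v is None else v for v in values]

def impute_categorical(values):
    """Replace None entries with the most frequent (mode) category."""
    observed = [v for v in values if v is not None]
    mode = max(set(observed), key=observed.count)
    return [mode if v is None else v for v in values]

# Hypothetical feature columns with gaps
pkt_sizes = impute_numeric([40, 60, None, 1500, 52])       # -> [40, 60, 56.0, 1500, 52]
protocols = impute_categorical(["tcp", "udp", None, "tcp"])  # -> ["tcp", "udp", "tcp", "tcp"]
```

In a full pipeline the same rules would be applied column-wise to each feature, but the per-column logic is exactly what is shown here.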
After handling missing values, the next step is to prepare the data for deep learning by normalizing and standardizing the features. This ensures that all features are on a similar scale, preventing features with larger numerical ranges from dominating the model’s learning process.
Normalization rescales numerical values so that they lie within a specific range, typically between 0 and 1. This is particularly useful when the data includes features with different units or scales. The normalization process is defined by the following formula:

$${x}_{norm}=\frac{x-{x}_{\text{min}}}{{x}_{\text{max}}-{x}_{\text{min}}}$$

Where, x is the original value of the feature, \(\:{x}_{\text{min}\:}\)and \(\:{x}_{\text{max}\:}\)are the minimum and maximum values of the feature, respectively, and \(\:{x}_{norm}\) is the normalized value.
This approach ensures that all features are bounded between 0 and 1, allowing them to contribute equally to the model without one feature disproportionately affecting the training process.
Standardization, or Z-score normalization, transforms features to have zero mean and unit variance. Standardization is particularly important when features have different units (e.g., packet size in bytes vs. flow duration in seconds). The formula for standardization is as follows:

$$z=\frac{x-\mu }{\sigma }$$

Where, x is the original feature value, \(\:\mu\:\) is the mean of the feature, \(\:\sigma\:\) is the standard deviation of the feature, and z is the standardized value.
By standardizing the features, the resulting distribution of each feature has a mean of 0 and a standard deviation of 1. This ensures that each feature is weighted equally during training, particularly important when using deep learning approaches.
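A minimal sketch of both scalings, using the Python standard library on hypothetical flow-duration values:

```python
from statistics import mean, pstdev

def min_max_normalize(xs):
    """Min-max normalization: x_norm = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Z-score standardization: z = (x - mean) / population std."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

# Hypothetical flow-duration values (seconds)
durations = [2.0, 4.0, 6.0, 8.0]
normalized = min_max_normalize(durations)   # all values now in [0, 1]
z_scores = standardize(durations)           # zero mean, unit variance
```

In practice the scaling parameters (min/max or mean/std) are fit on the training split only and then reused on the test split, so that no test-set information leaks into training.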
In order for machine learning models to process categorical data, these features need to be encoded into numeric values. The most common method for encoding categorical features is Label Encoding. It assigns each unique category in a categorical feature a corresponding integer. The transformation is represented as follows:

$$encoded\left({y}_{i}\right)=i,\quad i\in \left\{0,1,\dots ,K-1\right\}$$

Where, \(\:{y}_{i}\) represents a unique category in the categorical feature y, K is the number of unique categories, and \(\:encoded\:\left({y}_{i}\right)\) is the integer assigned to category \(\:{y}_{i}\).
This ensures that the categorical features are converted into numerical values that the model can process.
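The encoding step can be sketched in a few lines; the example labels are hypothetical, and the sorted ordering is one deterministic convention (scikit-learn's `LabelEncoder` uses the same sorted ordering):

```python
def label_encode(values):
    """Assign each unique category an integer (sorted for determinism)."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

# Hypothetical attack labels from a traffic capture
labels = ["Benign", "DoS", "Benign", "Mirai"]
encoded, mapping = label_encode(labels)   # encoded -> [0, 1, 0, 2]
```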
In network traffic data, especially in intrusion detection systems, the class distribution is often imbalanced, with benign traffic vastly outnumbering attack traffic. This imbalance can lead to poor model performance, particularly in detecting rare but critical attacks. To mitigate this issue, we apply stratified sampling and class weight computation during training.
To adjust for the class imbalance, we compute class weights inversely proportional to the frequency of each class. This gives higher importance to minority classes during training. The formula for computing class weights is:

$${w}_{c}=\frac{N}{C\cdot {n}_{c}}$$

Where, \(\:{w}_{c}\) is the weight assigned to class c, N is the total number of samples in the dataset, C is the number of classes, and \(\:{n}_{c}\) is the number of samples in class c.
This weighting scheme ensures that the model places greater emphasis on underrepresented attack classes, improving detection sensitivity for rare attacks. Figure 2 illustrates the distribution of data for each dataset used in the research after preprocessing, highlighting the diversity and scale of the datasets across different attack types and network environments.
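The class-weight computation can be sketched as follows. The sketch mirrors scikit-learn's "balanced" heuristic, w_c = N / (C · n_c); the exact normalizing constant is an assumption here, since the text specifies only that weights are inversely proportional to class frequency. The 90/10 split is illustrative.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights (scikit-learn 'balanced'-style).

    Rare classes receive weights > 1, common classes weights < 1, so the
    loss places greater emphasis on underrepresented attack types.
    """
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * k) for cls, k in counts.items()}

# Hypothetical 90/10 benign-vs-attack split
y = ["Benign"] * 90 + ["Mirai"] * 10
weights = class_weights(y)   # rare "Mirai" class gets the larger weight
```

During training these weights are typically passed to the loss function (e.g. via `class_weight` in Keras `Model.fit`), so each sample's loss is scaled by the weight of its class.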
Fig. 1: Architectural diagram of the proposed TACNet approach.
Feature extraction using multi-scale CNN
Feature extraction is a crucial step in intrusion detection systems, particularly when dealing with network traffic data. In the TACNet architecture, CNNs are used to extract spatial features from the raw network traffic data. The multi-scale approach allows the model to capture information at varying granularities, enabling it to detect both fine-grained and more complex patterns in the data.
The multi-scale CNN employs convolutional layers with different kernel sizes, typically 3, 5, and 7. These kernels enable the model to capture spatial features at different scales or granularities, ensuring that both fine-grained local patterns and broader, higher-level patterns are learned from the data. The convolution operation at a given layer is defined by the following equation:

$${y}_{i,j}={\left(x*w\right)}_{i,j}=\sum _{m=1}^{M}\sum _{n=1}^{N}{x}_{i+m-1,\:j+n-1}\cdot {w}_{m,n}$$

Where, \(\:{y}_{i,\:j}\) is the output feature map at position (i, j), x is the input data (network traffic features), w is the convolution kernel (filter), M and N are the dimensions of the kernel, and (∗) represents the convolution operation.
This process is applied for each kernel size (3, 5, and 7), generating different feature maps that capture information at varying scales. Each kernel focuses on a different spatial range within the data, ensuring that the model can identify both fine details (e.g., small traffic patterns) and broader characteristics (e.g., network flow characteristics).
By using multiple kernels with different sizes, the network can extract features at varying spatial resolutions. For example: Kernel size 3 captures fine-grained features, focusing on local patterns in the network traffic. Kernel size 5 captures medium-scale patterns that may span larger regions of the traffic data. Kernel size 7 captures even broader patterns, useful for detecting large-scale anomalies or network behavior that involves a larger context.
The outputs of these individual convolutions are then concatenated to form a unified feature map that contains spatial features learned at different granularities. The concatenation of the different feature maps can be mathematically expressed as:

$$f={f}_{3}\:|\:{f}_{5}\:|\:{f}_{7}$$
Where, \(\:{f}_{3},{f}_{5}\) and \(\:{f}_{7}\) are the feature maps generated by the 3 × 3, 5 × 5, and 7 × 7 convolutional kernels, respectively. The symbol ∣ denotes the concatenation of the feature maps along the depth (channel) axis.
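The multi-scale idea can be sketched with a toy 1-D convolution. The averaging kernels below are illustrative placeholders for learned filters, and the "valid" cross-correlation follows the usual deep learning convention; the 10-step input stands in for a traffic window.

```python
def conv1d(x, kernel):
    """'Valid' 1-D convolution (cross-correlation, the deep learning convention)."""
    k = len(kernel)
    return [sum(x[i + m] * kernel[m] for m in range(k))
            for i in range(len(x) - k + 1)]

def multi_scale_features(x):
    """Concatenate feature maps from kernels of size 3, 5, and 7."""
    maps = []
    for k in (3, 5, 7):
        kernel = [1.0 / k] * k            # placeholder for a learned filter
        maps.extend(conv1d(x, kernel))    # concatenation of the three maps
    return maps

# Toy 10-step feature sequence standing in for a traffic window
traffic = [float(v) for v in range(10)]
features = multi_scale_features(traffic)
# map lengths: (10-3+1) + (10-5+1) + (10-7+1) = 8 + 6 + 4 = 18 features
```

Note how the smaller kernel produces more (finer-grained) outputs while the larger kernel summarizes broader context, which is exactly the multi-resolution trade-off the concatenation preserves.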
Once the feature maps are generated, Batch Normalization (BN) is applied to stabilize training and speed up convergence. BN is particularly useful in deep networks to mitigate issues such as internal covariate shift, where the distribution of activations changes during training. This ensures that the network is more stable and can learn more efficiently. Batch normalization is applied as follows:

$${\widehat{x}}_{i}=\frac{{x}_{i}-{\mu }_{B}}{\sqrt{{\sigma }_{B}^{2}+\epsilon }},\qquad {y}_{i}=\gamma \:{\widehat{x}}_{i}+\beta $$

Where, \(\:{\widehat{x}}_{i}\) is the normalized input, \(\:{x}_{i}\) is the input feature (before normalization), \(\:{\mu\:}_{B}\) is the mean of the mini-batch, \(\:{\sigma\:}_{B}^{2}\) is the variance of the mini-batch, \(\:\epsilon\:\) is a small constant added to prevent division by zero, \(\:\gamma\:\) is a scaling factor, and \(\:\beta\:\) is a shifting factor.
Batch Normalization helps in accelerating the training process by reducing the sensitivity to initialization and maintaining a stable distribution of the input data across layers. The inclusion of \(\:\gamma\:\) and \(\:\beta\:\) allows the network to learn the optimal scaling and shifting of the normalized values.
To prevent overfitting, particularly when working with large datasets such as IoT and IIoT network traffic, Dropout is applied during training. Dropout randomly sets a fraction of the input units to zero at each update during training, effectively preventing the model from becoming overly reliant on any specific feature and improving generalization. The formulation for dropout is given by:
\(\:{y}_{i}={r}_{i}\:{x}_{i},\:\:{r}_{i}\sim\text{Bernoulli}\left(1-p\right)\)
Where, \(\:{y}_{i}\) is the output of the neuron after applying dropout, \(\:{x}_{i}\) is the input to the neuron, and \(\:p\) is the dropout rate (the probability of setting the neuron to zero).
Dropout ensures that the model does not overfit by forcing it to learn redundant representations, which improves its generalization ability. This process helps in making the model robust and adaptable to unseen network traffic patterns during inference.
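The two regularization steps above can be sketched in NumPy. This is a minimal illustration of the math only, not the Keras `BatchNormalization`/`Dropout` layers TACNet actually uses; the batch shape, dropout rate, and parameter values are arbitrary.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero units with probability p, rescale by 1/(1-p)."""
    if not training:
        return x          # at inference, dropout is a no-op
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)

rng = np.random.default_rng(42)
batch = rng.standard_normal((1024, 8)) * 3.0 + 5.0   # shifted, scaled features
normed = batch_norm(batch)          # per-feature zero mean, unit variance
dropped = dropout(normed, p=0.3, rng=rng)
```

The `1/(1-p)` rescaling keeps the expected activation unchanged, so no adjustment is needed at inference time.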
Modeling temporal dependencies
LSTM networks are a specialized type of Recurrent Neural Networks (RNNs) designed to model sequential data by capturing long-term dependencies, making them highly effective in processing time-series data like network traffic. In the TACNet architecture, LSTM layers are integrated after the multi-scale convolutional layers to model the temporal dependencies inherent in network traffic data. LSTM is particularly useful in this context because it addresses the vanishing gradient problem that traditional RNNs struggle with, allowing the network to learn long-term relationships between sequential events.
The LSTM cell is designed to maintain a memory of previous inputs over time through a series of gates, allowing it to remember or forget certain information. The mathematical formulation of an LSTM cell consists of three primary gates: the forget gate, the input gate, and the output gate. These gates are responsible for regulating the flow of information within the cell.
The forget gate \(\:\left({f}_{t}\right)\) determines what proportion of the previous memory cell \(\:{C}_{t-1}\) should be carried forward to the next time step:
\(\:{f}_{t}=\sigma\:\left({W}_{f}\left[{h}_{t-1},{x}_{t}\right]+{b}_{f}\right)\)
Where, \(\:\sigma\:\) is the sigmoid activation function, \(\:{W}_{f}\) is the weight matrix for the forget gate, \(\:{h}_{t-1}\) is the previous hidden state, \(\:{x}_{t}\) is the current input, \(\:{b}_{f}\) is the bias term for the forget gate.
The input gate \(\:\left({i}_{t}\right)\) controls how much of the current input \(\:{x}_{t}\) should be added to the memory cell:
\(\:{i}_{t}=\sigma\:\left({W}_{i}\left[{h}_{t-1},{x}_{t}\right]+{b}_{i}\right)\)
The candidate memory cell \(\:{\stackrel{\sim}{C}}_{t}\) is created by passing the input through a tanh function to generate a set of new candidate values for the memory cell:
\(\:{\stackrel{\sim}{C}}_{t}=\text{tanh}\left({W}_{C}\left[{h}_{t-1},{x}_{t}\right]+{b}_{C}\right)\)
The memory cell update \(\:\left({C}_{t}\right)\) is then a combination of the previous memory cell and the new candidate values, controlled by the forget and input gates:
\(\:{C}_{t}={f}_{t}\odot\:{C}_{t-1}+{i}_{t}\odot\:{\stackrel{\sim}{C}}_{t}\)
The output gate \(\:{o}_{t}\) controls how much of the memory cell should be output as the current hidden state \(\:{h}_{t}\):
\(\:{o}_{t}=\sigma\:\left({W}_{o}\left[{h}_{t-1},{x}_{t}\right]+{b}_{o}\right)\)
Finally, the hidden state \(\:{h}_{t}\) is updated using the memory cell \(\:{C}_{t}\):
\(\:{h}_{t}={o}_{t}\odot\:\text{tanh}\left({C}_{t}\right)\)
Where, \(\:{W}_{f},{W}_{i},{W}_{C},{W}_{o}\) are weight matrices for the forget, input, candidate memory, and output gates, respectively, \(\:{b}_{f},{b}_{i},{b}_{C},{b}_{o}\) are the bias terms for each gate, \(\:\sigma\:\) is the sigmoid function, tanh is the hyperbolic tangent function, and \(\:{h}_{t}\) is the output hidden state at time step \(\:t\).
These equations ensure that the LSTM network can selectively retain or forget information over long sequences, allowing it to capture the temporal dependencies and complex patterns inherent in sequential network traffic data.
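The gate equations above can be executed directly as a single NumPy step. This is a didactic sketch with tiny random weights, not TACNet's Keras LSTM layer; the dimensions are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step following the gate equations.

    Each W[g] has shape (units, units + input_dim) and acts on [h_{t-1}; x_t].
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde       # memory-cell update
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(C_t)                 # hidden state
    return h_t, C_t

rng = np.random.default_rng(0)
units, input_dim = 4, 3
W = {g: rng.standard_normal((units, units + input_dim)) * 0.1 for g in "fiCo"}
b = {g: np.zeros(units) for g in "fiCo"}
h, C = np.zeros(units), np.zeros(units)
for x_t in rng.standard_normal((5, input_dim)):   # run over a short sequence
    h, C = lstm_step(x_t, h, C, W, b)
```

Because \(h_t = o_t \odot \tanh(C_t)\) with \(o_t \in (0,1)\), the hidden state is always bounded in \((-1, 1)\), one reason LSTMs train more stably than plain RNNs.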
The LSTM layer in TACNet plays a vital role in capturing temporal dependencies in network traffic, which is crucial for understanding long-term relationships between events in the sequence of network flows. Network traffic is inherently sequential, with attacks often developing over time or exhibiting patterns that evolve gradually. For example, a DoS attack may initially appear as a low-volume flood of traffic but may intensify over time, and an attack such as FTP-BruteForce may exhibit a series of attempts spread across time.
LSTM allows the model to retain information across time steps, making it capable of detecting these evolving attack patterns. In sequential learning, the model takes into account previous time steps to make predictions about the current step, thereby providing a deeper understanding of attack progressions and the temporal nature of network behaviors.
The ability of LSTM to remember long-term dependencies ensures that TACNet can effectively detect advanced persistent threats (APT) or attacks that evolve over an extended period, which would otherwise be missed by traditional methods. This capability is particularly important in IoT and IIoT environments, where attacks often unfold in a series of steps, requiring the model to maintain a memory of past events to recognize the ongoing threat.
The LSTM layer effectively processes the sequential input from the CNN feature maps, passing on the extracted temporal features while learning complex patterns that span across multiple time steps. This enables TACNet to identify attacks that might be hidden in short-term fluctuations but emerge as significant over a longer time horizon.
By combining these temporal capabilities with the spatial feature extraction of CNNs, TACNet is well-equipped to handle the dynamic and evolving nature of network traffic in intrusion detection tasks.
Temporal and channel attention mechanisms
In the TACNet architecture, both temporal and channel attention mechanisms are integrated to improve the model’s ability to focus on relevant time steps and features. The temporal attention is applied after the LSTM layer, assigning attention weights to each time step based on its relevance to the overall prediction. Channel attention, on the other hand, operates on the feature maps generated by the multi-scale CNN layers. It assigns attention weights to each channel by aggregating spatial information through Global Average Pooling and applying a sigmoid function to generate attention scores for each channel. These attention scores are then used to re-weight the feature maps, ensuring that the most important features are prioritized during the classification process. This dual attention mechanism enhances the model’s ability to capture both temporal dependencies and spatial patterns efficiently.
The temporal attention mechanism enables the model to focus on specific time steps that are most important for detecting an attack, effectively capturing the temporal dynamics of network traffic. In sequential data, not all time steps are equally informative, and the model can benefit from being able to assign varying levels of attention to different moments in the sequence.
The temporal attention mechanism works by applying an attention weight to each time step in the sequence, based on its relevance to the overall prediction. The weight for each time step is computed by applying a softmax function over the attention scores. The attention scores are learned via a trainable layer that computes a scalar value for each time step, reflecting its importance in the sequence.
The temporal attention score \(\:{\alpha\:}_{t}\) for each time step \(\:t\) in the sequence can be computed as:
\(\:{\alpha\:}_{t}=\frac{\text{exp}\left(f\left({h}_{t}\right)\right)}{\sum\:_{k=1}^{T}\text{exp}\left(f\left({h}_{k}\right)\right)}\)
Where, \(\:{h}_{t}\) is the hidden state at time step \(\:t\) produced by the previous layer, f(·) is a learnable function (usually a dense layer) that computes an attention score for each time step, and \(\:{\alpha\:}_{t}\) is the attention weight assigned to time step \(\:t\). The softmax function ensures that the attention weights sum to 1: \(\:\sum\:_{t=1}^{T}{\alpha\:}_{t}=1\).
Once the attention scores are computed, the weighted output of the network for each time step is obtained by multiplying the attention score \(\:{\alpha\:}_{t}\) by the corresponding hidden state \(\:{h}_{t}\):
\(\:{\stackrel{\sim}{h}}_{t}={\alpha\:}_{t}{h}_{t}\)
The weighted sequence of hidden states is then passed forward in the network, allowing the model to focus more on the most relevant time steps while reducing the impact of less informative time steps.
The temporal attention mechanism thus enables the model to prioritize important moments in the network traffic sequence, particularly those that signify anomalies or attacks, which can evolve or escalate over time.
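A minimal sketch of this mechanism, assuming a single linear scoring layer as described above (the weights and hidden states are random placeholders, not trained parameters):

```python
import numpy as np

def temporal_attention(H, w, b=0.0):
    """Score each time step with a linear layer, softmax over time, re-weight.

    H: (T, d) hidden states from the LSTM; w: (d,) learnable score weights.
    """
    scores = H @ w + b                        # one scalar score per time step
    scores = scores - scores.max()            # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax: sums to 1
    weighted = alpha[:, None] * H             # alpha_t * h_t for each step
    return alpha, weighted

rng = np.random.default_rng(1)
H = rng.standard_normal((10, 8))              # 10 time steps, 8 hidden units
alpha, weighted = temporal_attention(H, rng.standard_normal(8))
```

In training, gradients flowing through `alpha` teach the scoring weights `w` which time steps (e.g., the escalation phase of an attack) matter most for the final prediction.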
In addition to temporal attention, channel attention is applied to assign different weights to the features (channels) extracted by the convolutional layers. This mechanism ensures that the model focuses on the most relevant features, improving its ability to distinguish between benign and malicious traffic.
The channel attention mechanism works by calculating an attention weight for each feature channel. The mechanism first computes a global descriptor of the feature map across the time dimension by aggregating the spatial information using pooling operations. Then, a learnable transformation assigns attention weights to each channel, based on its importance in the detection task.
The channel attention mechanism involves the following steps:
i) Global Average Pooling: For each channel \(\:c\), we compute the global average pooling of the feature map \(\:{F}_{c}\) across the time dimension:
\(\:gap\left({F}_{c}\right)=\frac{1}{T}\sum\:_{t=1}^{T}{F}_{c,t}\)
Where, \(\:{F}_{c,t}\) is the value of the feature map \(\:{F}_{c}\) at time step \(\:t\), \(\:T\) is the total number of time steps, and \(\:gap\left({F}_{c}\right)\) is the aggregated representation of channel \(\:c\).
ii) Attention Score Calculation: The pooled feature vector is passed through a dense layer with a sigmoid activation to generate an attention score for each channel:
\(\:score\left({F}_{c}\right)=sigmoid\left({W}_{c}\:gap\left({F}_{c}\right)+{b}_{c}\right)\)
Where, \(\:{W}_{c}\) is the weight matrix for the channel attention layer, \(\:{b}_{c}\) is the bias term, and sigmoid is the sigmoid activation function that produces attention scores in the range [0, 1].
iii) Feature Re-weighting: The attention scores \(\:score\left({F}_{c}\right)\) are used to re-weight the feature map \(\:{F}_{c}\) so that more important features receive higher weights:
\(\:{F}_{c}^{{\prime\:}}=score\left({F}_{c}\right)\cdot\:{F}_{c}\)
Where, \(\:{F}_{c}^{{\prime\:}}\) is the output feature map for channel \(\:c\) after applying channel attention, and \(\:score\left({F}_{c}\right)\) is the attention weight assigned to channel \(\:c\).
The channel attention mechanism effectively learns which features (or channels) are most important for the classification task, improving the model’s ability to focus on relevant patterns in the network traffic data.
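The three steps can be combined in a short NumPy sketch (the weight matrix, bias, and feature map are illustrative placeholders for the learned parameters):

```python
import numpy as np

def channel_attention(F, Wc, bc):
    """GAP over time, sigmoid attention score per channel, re-weight channels.

    F: (T, C) feature map; Wc: (C, C) attention weights; bc: (C,) bias.
    """
    gap = F.mean(axis=0)                              # global average pooling
    scores = 1.0 / (1.0 + np.exp(-(Wc @ gap + bc)))   # sigmoid scores in (0, 1)
    return scores, F * scores[None, :]                # broadcast re-weighting

rng = np.random.default_rng(2)
F = rng.standard_normal((16, 6))    # 16 time steps, 6 feature channels
scores, F_prime = channel_attention(F, rng.standard_normal((6, 6)), np.zeros(6))
```

Note that re-weighting preserves the feature-map shape; channels judged unimportant are merely attenuated toward zero, not removed.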
Hybrid model integration
The TACNet architecture is a hybrid deep learning framework that integrates multiple advanced techniques to address the complex challenges posed by IoT and IIoT network traffic. The model combines multi-scale Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and attention mechanisms into a unified architecture. This hybrid approach enables TACNet to effectively capture both spatial and temporal features within the network traffic data. By leveraging CNNs, the model can extract local and global spatial patterns, while LSTMs model the sequential, time-varying nature of the data. The attention mechanisms further enhance the model by allowing it to focus on the most relevant features and time steps, improving its ability to detect attacks with precision and reducing false positives. Table 2 presents the various layers in TACNet, their output shapes, the number of parameters, and the layer connections. It provides a clear overview of how the network is structured, from the initial input layer to the final output layer, showcasing the multi-scale convolutional, LSTM, attention mechanisms, and other critical components in the model architecture.
The architecture begins with multi-scale CNN layers, where different kernel sizes (e.g., 3 × 3, 5 × 5, and 7 × 7) are used to capture various levels of spatial features. The CNN layers help to identify patterns in the raw network traffic data, such as packet size, flow duration, and protocol type, by applying filters that vary in size. These spatial features are then passed through LSTM layers, which capture temporal dependencies in the data. The LSTM layers allow the model to recognize sequential patterns in network behavior, such as attacks that evolve over time or those that involve multiple events spread across time.
To further enhance the model’s performance, temporal and channel attention mechanisms are incorporated. The temporal attention mechanism helps the model prioritize important time steps in the sequence, ensuring that the most critical moments, such as the escalation of an attack, receive more focus. Similarly, the channel attention mechanism assigns weights to each feature, helping the model determine which features are most relevant for classification. Together, these components enable TACNet to effectively detect both short-term anomalies and long-term attack patterns.
The data flow within TACNet is designed to ensure that the model can learn both spatial and temporal patterns from the IoT and IIoT network traffic. Initially, the raw network traffic data, which includes various features such as flow duration, packet sizes, and protocols, is fed into the network. The data then flows through the multi-scale CNN layers, where different filters of varying kernel sizes (3 × 3, 5 × 5, and 7 × 7) are applied to extract spatial features at different granularities. The use of multiple kernel sizes allows TACNet to capture both fine-grained and broader network behaviors, ensuring that it can detect both local and global patterns.
Next, the feature maps produced by the CNN layers are passed into the LSTM layers. These LSTM layers are essential for capturing the temporal dependencies of network traffic. Since network traffic is sequential in nature, the LSTM helps the model to understand how network behaviors evolve over time, which is crucial for detecting attacks that unfold progressively. The LSTM layers retain important information from previous time steps, allowing TACNet to model long-term attack patterns that may not be immediately obvious from individual data points.
After the LSTM layers, the data is passed through the temporal and channel attention mechanisms. The temporal attention mechanism assigns attention weights to each time step, enabling the model to focus more on critical moments that are indicative of an ongoing attack. The channel attention mechanism assigns weights to different features, allowing the model to prioritize important network characteristics, such as certain packet sizes or flow patterns, that are most relevant for detecting attacks. By adjusting the focus on both time steps and features, the attention mechanisms enhance the model’s ability to filter out irrelevant data and improve its overall detection performance.
Finally, the data passes through a fully connected dense layer, where a GlobalMaxPooling1D operation is applied to reduce the dimensionality of the feature maps. This pooled representation is then passed to another dense layer, followed by a softmax output layer, which classifies the input as either Benign or one of several attack types, such as DoS, FTP-BruteForce, or SSH-Bruteforce. This output layer provides the final classification, determining whether the network traffic is normal or indicative of an intrusion.
Model training and optimization
The training strategy for the TACNet model is designed to ensure robust learning, effective optimization, and high generalization performance on unseen data. To accomplish this, the Adam optimizer is used, which combines the benefits of momentum-based gradient descent and adaptive learning rates. Adam is particularly effective for training deep learning models, as it adjusts the learning rate for each parameter, ensuring faster convergence and reducing the risk of getting stuck in local minima. The update rule for the Adam optimizer is based on the following equations:
\(\:{m}_{t}={\beta\:}_{1}{m}_{t-1}+\left(1-{\beta\:}_{1}\right){\nabla\:}_{\theta\:}J\left(\theta\:\right)\)
\(\:{v}_{t}={\beta\:}_{2}{v}_{t-1}+\left(1-{\beta\:}_{2}\right){\left({\nabla\:}_{\theta\:}J\left(\theta\:\right)\right)}^{2}\)
\(\:{\widehat{m}}_{t}=\frac{{m}_{t}}{1-{\beta\:}_{1}^{t}},\:\:{\widehat{v}}_{t}=\frac{{v}_{t}}{1-{\beta\:}_{2}^{t}}\)
\(\:{\theta\:}_{t}={\theta\:}_{t-1}-\frac{\eta\:}{\sqrt{{\widehat{v}}_{t}}+\epsilon\:}{\widehat{m}}_{t}\)
Where, \(\:{m}_{t}\) and \(\:{v}_{t}\) are the first and second moment estimates (momentum and velocity), \(\:{\beta\:}_{1}\) and \(\:{\beta\:}_{2}\) are decay rates for the moment estimates, \(\:{\nabla\:}_{\theta\:}J\left(\theta\:\right)\) is the gradient of the loss function with respect to the model parameters, \(\:\eta\:\) is the learning rate, \(\:\epsilon\:\) is a small constant added to prevent division by zero, \(\:{\widehat{m}}_{t}\) and \(\:{\widehat{v}}_{t}\) are the bias-corrected moment estimates, and \(\:{\theta\:}_{t}\) represents the model parameters at time step \(\:t\).
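The update equations can be exercised on a toy one-parameter problem. This is a didactic sketch; in practice TACNet uses Keras's built-in Adam optimizer, and the learning rate and iteration count below are arbitrary.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam parameter update (t is the 1-based step index)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize J(theta) = theta**2, whose gradient is 2*theta
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
```

Because the step is normalized by \(\sqrt{\widehat{v}_t}\), the effective step size stays close to the learning rate regardless of the raw gradient magnitude.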
For the loss function, sparse categorical cross-entropy is used, which is appropriate for multi-class classification problems where the labels are integer-encoded (i.e., not one-hot encoded). This loss function computes the negative log probability of the correct class for each data point. The sparse categorical cross-entropy loss for a single data point can be defined as:
\(\:L=-\text{log}\left({p}_{{y}_{i}}\right)\)
Where, \(\:{p}_{{y}_{i}}\) is the predicted probability for the true class \(\:{y}_{i}\) of the data point.
The use of sparse categorical cross-entropy is particularly useful in this research since the dataset involves multiple classes (benign and various types of attacks) and the model’s goal is to classify the input network traffic into one of these categories.
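A minimal sketch of this loss on integer-encoded labels (the probability values are hand-picked for illustration, not model outputs):

```python
import numpy as np

def sparse_cce(probs, y):
    """Mean negative log-probability of the true integer-encoded class.

    probs: (N, K) predicted class probabilities; y: (N,) integer labels.
    """
    return -np.mean(np.log(probs[np.arange(len(y)), y]))

probs = np.array([[0.9, 0.05, 0.05],   # confident, correct (true label 0)
                  [0.2, 0.7, 0.1]])    # fairly confident, correct (label 1)
loss = sparse_cce(probs, np.array([0, 1]))
```

Integer labels avoid materializing one-hot vectors, which matters when the dataset has millions of flows and many attack classes.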
A critical challenge in intrusion detection systems is class imbalance, where benign traffic vastly outnumbers attack traffic. This imbalance can lead to biased model training, where the model may predict the majority class (benign traffic) well but fail to detect minority classes, such as rare attack types. To address this issue, class weighting is applied during training.
Class weights are computed inversely proportional to the frequency of each class in the dataset, ensuring that the model places more importance on detecting rare but critical attacks. The weight for each class is computed as:
\(\:{w}_{c}=\frac{N}{{n}_{c}}\)
Where, \(\:{w}_{c}\) is the weight for class c, \(\:N\) is the total number of samples in the dataset, and \(\:{n}_{c}\) is the number of samples in class \(\:c\).
These class weights are then used to adjust the loss function during training, so that the model penalizes misclassifications of minority classes more heavily. This allows the model to pay more attention to the underrepresented attack classes and improve its detection accuracy for those attacks.
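The weighting scheme can be sketched as follows. The label counts are hypothetical; note also that Scikit-learn's "balanced" mode additionally divides by the number of classes, which rescales all weights by a constant factor without changing their ratios.

```python
from collections import Counter

def class_weights(labels):
    """w_c = N / n_c: weights inversely proportional to class frequency."""
    counts = Counter(labels)
    N = len(labels)
    return {c: N / n for c, n in counts.items()}

# Hypothetical imbalanced traffic sample
labels = ["Benign"] * 900 + ["DoS"] * 80 + ["SSH-Bruteforce"] * 20
weights = class_weights(labels)
# Rare classes receive proportionally larger weights:
# Benign ~1.11, DoS 12.5, SSH-Bruteforce 50.0
```

Passed to Keras via `model.fit(..., class_weight=weights)` (with integer class keys), these factors scale each sample's loss so misclassifying a rare attack costs the model far more than misclassifying benign traffic.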
To prevent overfitting, which is a common issue when training deep learning models on small or imbalanced datasets, early stopping is employed. Early stopping halts training when the model’s performance on the validation set does not improve after a specified number of epochs. This prevents the model from continuing to learn from the noise in the training data, thus improving its generalization ability. The stopping criterion is based on monitoring the validation loss, and training is halted if the validation loss does not improve for a specified number of epochs, known as the patience parameter.
The approach for monitoring early stopping is:
\(\:\text{stop at epoch}\:t\:\:\text{if}\:\:val\_{loss}_{j}\ge\:\underset{i<j}{\text{min}}\:val\_{loss}_{i}\:\:\text{for all}\:j\in\:\left(t-k,\:t\right]\)
Where, \(\:val\_{loss}_{t}\) is the validation loss at epoch \(\:t\), and k is the patience parameter, which determines the number of epochs without improvement before stopping the training.
In addition to early stopping, model checkpointing is used to save the best-performing model during training. The model’s weights are saved whenever the validation loss improves, ensuring that the best model (i.e., the one with the lowest validation loss) is preserved and can be used for testing and deployment. This ensures that the model with the most optimal generalization ability is retained, preventing overfitting to the training data.
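The interplay of patience-based stopping and best-epoch checkpointing can be sketched in plain Python (the validation-loss sequence is made up for illustration; Keras's `EarlyStopping` and `ModelCheckpoint` callbacks implement the same logic on real training runs):

```python
def early_stop_epoch(val_losses, patience):
    """Return (epoch at which training halts, index of the best epoch).

    Training halts once `patience` epochs pass with no improvement over
    the best validation loss seen so far.
    """
    best_loss, best_epoch = float("inf"), -1
    for t, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, t    # checkpoint: save best weights
        elif t - best_epoch >= patience:
            return t, best_epoch               # patience exhausted: stop
    return len(val_losses) - 1, best_epoch     # ran through all epochs

losses = [0.9, 0.6, 0.5, 0.52, 0.51, 0.55, 0.58, 0.4]
stop, best = early_stop_epoch(losses, patience=3)
# Stops at epoch 5; the checkpointed model is from epoch 2 (loss 0.5).
```

Note the trade-off visible in the toy sequence: the late improvement at epoch 7 is never seen, which is exactly the cost the patience parameter controls.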
Algorithm 1 presents the TACNet framework for intrusion detection. The model integrates multi-scale CNNs for spatial feature extraction, LSTM layers to capture temporal dependencies, and temporal and channel attention mechanisms to focus on the most informative features and time steps. Training is performed using weighted sparse categorical cross-entropy to address class imbalance, with the Adam optimizer and early stopping to ensure robust convergence. This design allows TACNet to efficiently learn complex spatial-temporal patterns in network traffic, achieving high accuracy and robustness across multiple datasets.
TACNet: Temporal Attention Convolutional Network for IoT/IIoT Intrusion Detection.
The values for batch size (1024), dropout (0.3), learning rate, and LSTM units (128) were selected through a systematic hyperparameter tuning approach, involving grid search and random search methods to identify the most effective configurations for experiments. This approach was based on performance metrics across several validation datasets to optimize model accuracy and avoid overfitting.
Evaluation metrics
Evaluating the performance of TACNet is crucial to understanding how well the model detects network intrusions and distinguishes between benign traffic and various types of attacks. The effectiveness of an intrusion detection system (IDS) is often assessed using multiple evaluation metrics, as each provides insight into a different aspect of the model’s performance. In this research, the performance of TACNet is evaluated using accuracy, precision, recall, F1-score, confusion matrices, and others.
Accuracy is one of the most commonly used evaluation metrics in classification problems. It measures the overall proportion of correct predictions made by the model. However, accuracy alone may not provide sufficient insights when dealing with imbalanced datasets, where the majority class (benign traffic) may dominate the results.
\(\:Accuracy=\frac{TP+TN}{TP+TN+FP+FN}\)
Where, TP (True Positives): Number of correct predictions for the attack class. TN (True Negatives): Number of correct predictions for the benign class. FP (False Positives): Number of incorrect predictions where benign traffic is incorrectly classified as an attack. FN (False Negatives): Number of incorrect predictions where attacks are incorrectly classified as benign traffic.
Precision is an important metric for understanding how many of the instances predicted as positive (attacks) were actually positive. It measures the ability of the model to correctly identify attack traffic among all the instances it classified as attacks. High precision indicates that the model is very selective and does not label benign traffic as attacks. The formula for precision is:
\(\:Precision=\frac{TP}{TP+FP}\)
Recall, also known as sensitivity or true positive rate, measures how well the model identifies all actual attack instances. High recall means the model detects most attacks, although tuning for high recall may come at the cost of more false positives (i.e., benign traffic misclassified as attacks). The formula for recall is:
\(\:Recall=\frac{TP}{TP+FN}\)
The F1-score is the harmonic mean of precision and recall. It provides a balanced measure of the model’s performance, especially in the context of imbalanced datasets where both false positives and false negatives are critical. The F1-score helps to strike a balance between precision and recall, giving an overall indication of the model’s effectiveness. The formula for F1-score is:
\(\:F1=2\times\:\frac{Precision\:\times\:Recall}{Precision+Recall}\)
The F1-score takes both false positives and false negatives into account, providing a more nuanced view of performance when there is an imbalance in the class distribution.
A confusion matrix is a powerful tool that provides a detailed breakdown of the model’s performance by showing the true positives, true negatives, false positives, and false negatives. The confusion matrix enables a more granular analysis of how well the model is distinguishing between different classes (benign traffic and various attack types).
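All of the metrics above follow directly from the four confusion-matrix counts, as this short sketch shows (the counts are hypothetical, chosen only to illustrate the arithmetic):

```python
def prf(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Hypothetical per-class counts for one attack type
p, r, f1, acc = prf(tp=95, fp=5, fn=10, tn=890)
# precision 0.95, recall ~0.905, F1 ~0.927, accuracy 0.985
```

The example also illustrates why accuracy alone misleads on imbalanced data: accuracy is 0.985 even though one attack in ten is missed (recall ~0.905).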
Additional metrics, including the AUC-ROC curve and training versus validation accuracy, precision, and recall, are also reported to provide a comprehensive evaluation of the model’s performance and its ability to generalize to unseen data.
Experimental results and analysis
This section presents the results of the experiments conducted to evaluate the performance of the TACNet model, providing a comprehensive analysis across various benchmark datasets and comparing TACNet’s performance against baseline models and other state-of-the-art intrusion detection systems (IDS).
Experimental setup
The TACNet model was implemented and evaluated in a Google Colab environment, utilizing CPU hardware acceleration and Python 3 for the entire process. Key libraries such as TensorFlow and Keras were used to build and train the model, leveraging their powerful APIs for defining the architecture and optimizing the network. Essential deep learning components like Conv1D, LSTM, Dense, and Dropout were utilized for feature extraction, modeling temporal dependencies, and preventing overfitting. Scikit-learn was used for preprocessing tasks such as Label Encoding, class weighting, and train-test splitting, as well as for evaluating the model using metrics like accuracy, precision, recall, and F1-score. NumPy and Pandas were employed for data manipulation and preparation, including handling missing values, normalizing numerical features, and aggregating data. Matplotlib and Seaborn were used for visualizing the performance of the model, including training and validation curves, confusion matrices, and other evaluation metrics. The training process involved using the Adam optimizer with sparse categorical cross-entropy as the loss function, and callbacks like EarlyStopping and ModelCheckpoint were implemented to prevent overfitting and save the best-performing model. The model was trained for 20 epochs with a batch size of 1024, ensuring robust learning while managing large-scale datasets. The use of Google Colab’s cloud-based environment enabled efficient execution of the experiments, making it possible to handle large datasets like CICIDS 2018, DNN-EdgeIIoT, CIC IoT-DIAD 2024, TabularIoTAttack-2024, and N-BaIoT, while providing an accessible and reproducible platform for model training and evaluation.
In the experimental setup, the dataset was split into 80% training and 20% testing using the train_test_split function from Scikit-learn; during training, a further 20% of the training data was reserved for validation to monitor the model’s performance on unseen data and prevent overfitting.
Experimental results
The classification report shown in Table 3 provides detailed metrics for each class, including precision, recall, F1-score for CICIDS 2018 dataset. The results demonstrate that TACNet achieves outstanding performance across all classes. For example, the Benign class achieves a precision of 0.9999, recall of 0.9998, and an F1-score of 0.9998, indicating very high accuracy in predicting benign traffic. Similarly, the attack classes (DoS, FTP-BruteForce, and SSH-Bruteforce) also exhibit high precision, recall, and F1-scores, with the model achieving near-perfect results, especially for FTP-BruteForce and SSH-Bruteforce, where the recall is 1.0000.
This classification report confirms that TACNet is highly effective in distinguishing between benign and malicious traffic, even for rare attack types. The balanced performance across different attack types suggests that TACNet’s ability to handle class imbalance and evolving attack patterns is a significant strength, further demonstrating its potential for deployment in real-world IoT and IIoT environments.
The confusion matrix shown in Fig. 3 presents the detailed classification results for the test data. It displays the true and predicted labels for each class: Benign, DoS, FTP-BruteForce, and SSH-Bruteforce. The matrix indicates that the TACNet model performs exceptionally well in detecting both the benign and attack classes. For instance, the Benign class is predicted accurately with only a few misclassifications, while attack types such as DoS, FTP-BruteForce, and SSH-Bruteforce are also correctly identified, with minimal false positives and false negatives.
Confusion matrix of test data for CICIDS 2018 dataset.
The Accuracy vs. Loss plot, shown in Fig. 4, illustrates the training and validation accuracy and training and validation loss over the course of 20 epochs. As the plot demonstrates, the model quickly reaches high accuracy on the training set, with the training accuracy surpassing 99% within the first few epochs. The validation accuracy follows a similar trend, steadily increasing and approaching 99% by the 20th epoch.
Accuracy vs. Loss plot over the epochs for CICIDS 2018 dataset.
The Precision, Recall, and F1 Score curves shown in Fig. 5 depict the model’s performance on both training and validation data over the course of 20 epochs. The training precision, recall, and F1 score reach near-perfect values early in the training process, while the validation precision, recall, and F1 score also follow a similar trend, indicating that the model generalizes well to unseen data. The curves demonstrate that TACNet maintains high performance across these important metrics, with minimal fluctuations, showcasing its ability to learn from both benign and attack data efficiently. This is a positive outcome, indicating that the model effectively captures the key patterns in the network traffic without overfitting.
Precision, recall, and F1 score over the epochs for training and validation scores.
The Multi-Class ROC Curve in Fig. 6 presents the Receiver Operating Characteristic (ROC) for each class in the dataset: Benign, DoS, FTP-BruteForce, and SSH-Bruteforce. The AUC (Area Under the Curve) for each class is shown to be 1.0000, indicating that the model distinguishes each class (benign or attack) perfectly. The ROC curve, which plots the True Positive Rate (TPR) against the False Positive Rate (FPR), clearly shows that TACNet performs exceptionally well in terms of class discrimination, achieving perfect classification across all classes. This is a strong indicator of the model’s ability to handle multi-class classification tasks effectively.
Multi-class ROC curve for the test data.
The Precision, Recall, and F1 Score for each class (Benign, DoS, FTP-BruteForce, SSH-Bruteforce) are shown in Fig. 7 using a bar chart. This visual representation highlights the model’s consistent performance across all classes, with all metrics maintaining high values near 1. The precision, recall, and F1-score for each class (Benign, DoS, FTP-BruteForce, SSH-Bruteforce) are nearly identical, indicating that TACNet performs equally well at correctly identifying true positives (precision), capturing all relevant instances (recall), and balancing both (F1 score). The uniformity across all metrics further demonstrates that the model can effectively handle both common and rare attack types, ensuring high-quality predictions without significant trade-offs between precision and recall.
Precision, recall, and F1 score bar chart for the test data.
The Accuracy vs. Loss plot for the DNN-EdgeIIoT dataset in Fig. 8 shows the training and validation accuracy along with the training and validation loss over the course of 10 epochs. The plot demonstrates that the model quickly reaches high accuracy on the training set, with training accuracy stabilizing at close to 1. The validation accuracy also increases rapidly but remains slightly lower than the training accuracy, indicating some overfitting. However, the loss curves show a clear reduction in both training and validation loss, with the validation loss remaining slightly higher, suggesting the model generalizes well to the unseen data, though a small gap indicates room for improvement in generalization.
Accuracy vs. Loss plot over the Epochs for DNN-EdgeIIoT dataset.
The Accuracy vs. Loss plot for the N-BaIoT dataset, shown in Fig. 9, indicates that TACNet achieves near-perfect training accuracy very early in the training process, with the validation accuracy closely following. Both the training and validation loss decrease rapidly, especially within the first few epochs, indicating that the model quickly learns to distinguish between benign and attack traffic. The validation loss stabilizes at a low value, suggesting that TACNet generalizes well to unseen data without significant overfitting. The accuracy curves for both training and validation stay close to 1.0, emphasizing the model’s ability to make accurate predictions even on a diverse dataset like N-BaIoT.
Accuracy vs. Loss plot over the epochs for N-BaIoT dataset.
Figure 10 shows the accuracy and loss plots over 20 epochs for the TabularIoTAttack-2024 dataset. The accuracy plot demonstrates that the model achieves high accuracy, reaching nearly 99.98% during the training phase, with a slight gap between the training and validation accuracy curves, indicating minimal overfitting. In the loss plot, the train loss decreases sharply after the initial epochs and stabilizes near 0, while the validation loss follows a similar trend, maintaining a low value throughout the training, which reflects the model’s effective learning and generalization capability. This indicates that the model efficiently learns the patterns in the dataset without overfitting, achieving optimal performance across both training and validation phases.
Accuracy vs. Loss plot over the epochs for TabularIoTAttack-2024 dataset.
Comparative analysis
Figure 11 provides a comparative analysis of the training and validation performance metrics (accuracy, loss, precision, and recall) for multiple models: CNN, RNN, LSTM, and the TACNet approach. The CICIDS 2018 dataset has been utilized to present the comparison. The plots showcase how these models evolve over 20 epochs, with the x-axis representing the number of epochs and the y-axis displaying respective metric values.
Comparison of training and validation performance metrics across common deep learning and TACNet models.
The CNN, RNN, and LSTM models share the same input shape of (X_train.shape[1], 1) and use the Adam optimizer with a learning rate of 0.001. The loss function is Categorical Crossentropy with label smoothing of 0.1, and the batch size is set to 128, with 20 epochs for training. Each model uses ReLU activation for hidden layers and Softmax for the output layer. For the CNN model, the Conv1D layer has 64 filters with a kernel size of 3, followed by max pooling with a size of 2. Dropout of 0.3 is applied after each dense layer, which contains 64 neurons. The RNN model consists of two LSTM layers, with 64 units in the first and 32 units in the second. Dropout of 0.3 is applied after each LSTM layer, and the dense layer has 64 neurons. The LSTM model features a single LSTM layer with 64 units and dropout of 0.3 after the LSTM layer, with the dense layer also having 64 neurons.
In terms of accuracy, TACNet consistently outperforms the other models, reaching 99.98% accuracy by epoch 2 and maintaining this level throughout training. While CNN, RNN, and LSTM models also exhibit solid performance, they fail to achieve the same high levels of accuracy, with TACNet showing a clear advantage. Similarly, precision and recall metrics further highlight the strength of TACNet, as it achieves significantly higher values compared to the CNN, RNN, and LSTM models. This indicates TACNet’s superior ability to detect rare attack types with minimal false positives, making it more effective in IoT and IIoT environments.
The loss plots reveal that TACNet converges much faster, maintaining a consistently low validation loss, demonstrating its efficient learning and minimal overfitting. The LSTM and RNN models show some improvements in loss reduction over CNN, but none reach the performance of TACNet. TACNet’s advanced architecture, which combines multi-scale CNNs for spatial feature extraction and LSTM layers for capturing temporal dependencies, contributes to this rapid and stable convergence.
Finally, the precision and recall plots underscore TACNet’s robustness in identifying both common and rare attack types with high accuracy and low false positive rates. This is due to TACNet’s ability to adapt to dynamic network traffic patterns, leveraging both CNN and LSTM components along with attention mechanisms for more focused learning. By using class weighting to address data imbalances and early stopping to prevent overfitting, TACNet achieves superior performance, confirming its potential as an effective solution for real-time intrusion detection in IoT and IIoT networks.
The proposed approach uses class weighting as the primary technique to handle the class imbalance reported in Table 4. Compared with alternatives such as SMOTE, ADASYN, and Focal Loss, it delivers highly competitive results while remaining simple: higher weights are assigned to the minority classes, ensuring balanced attention across attack classes without introducing synthetic data or complex hyperparameter tuning. SMOTE and ADASYN, although effective at generating synthetic samples, risk overfitting and noise in the synthetic data, which can undermine performance. Focal Loss excels at improving precision but requires careful tuning, and the performance gain may not justify the added complexity. Class weighting therefore balances simplicity and effectiveness, making it well suited to intrusion detection in IoT and IIoT environments, where high accuracy and robust handling of imbalanced datasets are essential.
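As an illustration of this strategy, per-class weights can be derived with scikit-learn’s `compute_class_weight`; the label counts below are hypothetical, not taken from any of our datasets:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced label vector: 900 benign, 80 DoS, 20 MITM samples
y_train = np.array([0] * 900 + [1] * 80 + [2] * 20)

classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))
# "balanced" assigns n_samples / (n_classes * count_c) to class c,
# so the rarest class receives the largest weight.
```

In Keras, the resulting dictionary is passed via the `class_weight` argument of `model.fit`, scaling each sample’s loss contribution by its class weight.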
Table 5 presents the performance of the proposed TACNet model across five different datasets: DNN-EdgeIIoT, N-BaIoT, CICIDS 2018, CIC IoT-DIAD 2024, and TabularIoTAttack-2024. The model’s performance is evaluated based on four key metrics: accuracy, precision, recall, and F1-score. For the DNN-EdgeIIoT dataset, with 223,912 rows and 62 features, TACNet achieved an impressive accuracy of 98.56%, with precision and recall at 99.08% and 98.56%, respectively, resulting in a strong F1-score of 98.70%. For the N-BaIoT dataset, consisting of 989,230 rows and 116 features, TACNet achieved near-perfect performance with 99.94% in accuracy, precision, recall, and F1-score across 3 classes. Similarly, the CICIDS 2018 dataset, with 208,534 rows and 80 features, demonstrated TACNet’s ability to perform exceptionally well, reaching 99.98% for accuracy, precision, recall, and F1-score across 4 classes. The CIC IoT-DIAD 2024 dataset also showed strong results with 99.97% in all performance metrics across 8 classes. Finally, the TabularIoTAttack-2024 dataset with 139,642 rows and 85 features achieved 99.74% in accuracy, precision, recall, and F1-score across 4 classes, confirming TACNet’s robustness across various IoT and IIoT environments.
These results highlight the versatility and robustness of the TACNet model in handling different datasets with varying feature dimensions and class distributions. The model’s consistent, outstanding performance across all datasets underscores its ability to effectively detect intrusions in real-world applications.
Table 6 presents a comparative analysis of the performance of the proposed TACNet model against various existing intrusion detection systems (IDS) and deep learning models. The table includes key metrics such as accuracy, precision, and recall across different models and datasets. Notably, the TACNet model outperforms all other approaches in all the metrics across five datasets, demonstrating its superior ability to accurately detect intrusions.
To justify the selection of kernel sizes 3, 5, and 7 in the multi-scale convolutional block, we conducted an additional ablation study evaluating alternative kernel combinations, including (1, 3, 5), (2, 4, 6), and fixed single-kernel architectures. Results showed that the 3–5–7 configuration consistently produced the highest F1-score and accuracy across all datasets (CICIDS2018, Edge-IIoT, DIAD-2024, TabularIoTAttack-2024, and N-BaIoT). Smaller kernels (1, 3, 5) captured fine-grained features but failed to model broader temporal patterns, while even-sized kernels (2, 4, 6) generated less stable gradients and reduced generalization performance. Larger odd kernels demonstrated better receptive-field diversity, enabling TACNet to capture short-, mid-, and long-range dependencies more effectively.
In comparison, traditional models like DNN, CNN, and LSTM typically achieve lower accuracy and precision values, with DNN models reaching around 90.79% accuracy and hybrid models around 92.90%. Other approaches like DL-BiLSTM and PCC-CNN also perform well but do not surpass the TACNet model in precision, recall, and overall detection effectiveness. The results from TACNet suggest that it significantly enhances the detection capability over traditional machine learning and deep learning-based models, making it a highly effective solution for intrusion detection in IoT and IIoT environments. This performance superiority confirms the model’s robustness and generalizability, making it a strong approach for real-time, large-scale network intrusion detection applications.
Adversarial attacks and statistical analysis
FGSM (Fast Gradient Sign Method) and PGD (Projected Gradient Descent) are two prominent adversarial attack techniques used to test the robustness of our proposed TACNet approach. FGSM generates adversarial examples by perturbing the input in the direction of the gradient of the loss function with respect to the input. The perturbation magnitude is controlled by a hyperparameter epsilon (ε), which makes it fast and efficient but generally less potent compared to iterative attacks. In contrast, PGD is an iterative and more powerful attack that applies small perturbations over multiple steps, improving its ability to deceive models, making it a stronger attack method than FGSM.
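Both attacks can be sketched in a few lines of NumPy; the toy logistic model, its weights, and the step sizes below are illustrative stand-ins, not the TACNet configuration used in our experiments:

```python
import numpy as np

def fgsm(x, grad, eps):
    """FGSM: one step of size eps in the sign direction of dLoss/dx."""
    return x + eps * np.sign(grad)

def pgd(x, grad_fn, eps, alpha, steps):
    """PGD: repeated small sign-gradient steps, projected back into the
    L-infinity ball of radius eps around the original input x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection step
    return x_adv

# Toy binary logistic model: the cross-entropy gradient w.r.t. the
# input is (sigmoid(w.x) - y) * w, available in closed form.
w = np.array([0.5, -1.2, 0.8])
y = 1.0

def grad_fn(x):
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * w

x = np.array([1.0, 2.0, -0.5])
x_fgsm = fgsm(x, grad_fn(x), eps=0.1)
x_pgd = pgd(x, grad_fn, eps=0.1, alpha=0.02, steps=10)
```

The projection (`np.clip`) is what distinguishes PGD from simply iterating FGSM: every intermediate point stays within the ε-bounded perturbation budget.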
In our implementation, the FGSM and PGD attacks were used to evaluate the robustness of the model on the CICIDS 2018 dataset. The FGSM attack was applied using a gradient-based approach, where the adversarial examples were generated by perturbing the inputs based on the computed gradients. Similarly, the PGD attack was implemented iteratively, where the model’s input was repeatedly updated through the gradient descent method, ensuring the perturbations remained within a defined limit. The adversarial examples were then evaluated by the model, and the resulting precision, recall, accuracy, and F1-score metrics were calculated.
The results in Table 7 show that FGSM and PGD attacks caused a decrease in the model’s performance. The model achieved 0.9765 for all metrics under the FGSM attack, demonstrating strong resilience. However, the PGD attack caused a slight performance drop to 0.9650 for the same metrics, indicating that the model was more vulnerable to iterative, stronger attacks. The robustness of the model under FGSM and PGD emphasizes its effectiveness, but the slight degradation with PGD points to potential areas for improvement, such as exploring adversarial training methods to enhance resilience against more sophisticated adversarial techniques.
Table 8 summarizes the inference time and memory footprint of the model when evaluated on the CICIDS 2018 and TabularIoTAttack-2024 datasets. For the CICIDS 2018 dataset, the model took 82.0134 s for inference with a memory footprint of 0.5843 MB and a peak memory usage of 14.2532 MB. On the TabularIoTAttack-2024 dataset, the model demonstrated faster inference with a time of 59.3844 s, a lower memory footprint of 0.4104 MB, and a peak memory usage of 10.1482 MB.
These inference times are not reflective of per-sample inference costs, which would typically be much lower in a real-time, streaming traffic scenario. The reported values include overhead from handling large-scale data in a batch mode.
The results indicate that the model performs efficiently on both datasets in terms of inference time and memory consumption. Although the TabularIoTAttack-2024 dataset resulted in faster inference and lower memory usage, the performance on the CICIDS 2018 dataset still showed a reasonable trade-off between accuracy and computational efficiency. These metrics provide valuable insight into the model’s real-time deployment potential, particularly in IoT environments where resource efficiency is a critical factor.
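As a minimal sketch of how such timing and memory figures can be collected in Python, using the standard library’s `time` and `tracemalloc` modules (the `predict` function is a hypothetical stand-in for the trained model, not TACNet itself):

```python
import time
import tracemalloc

def predict(batch):
    # Stand-in for model.predict(); the real model is not reproduced here.
    return [sum(row) for row in batch]

# Toy batch: 1000 flows with 80 features each
data = [[float(i + j) for j in range(80)] for i in range(1000)]

tracemalloc.start()
t0 = time.perf_counter()
preds = predict(data)
inference_s = time.perf_counter() - t0
current_b, peak_b = tracemalloc.get_traced_memory()
tracemalloc.stop()

footprint_mb = current_b / 1024 ** 2  # memory still held after inference
peak_mb = peak_b / 1024 ** 2          # highest allocation during inference
```

The two `tracemalloc` figures correspond to the “memory footprint” and “peak memory usage” columns of Table 8 under this assumed measurement setup.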
Table 9 presents the results of both the Paired t-test and Wilcoxon Signed-Rank Test used to compare the performance metrics (precision, recall, F1-score, and accuracy) between CNN, LSTM, RNN, and TACNet models.
The statistical Paired t-test evaluates whether the differences in the metrics are statistically significant. The TACNet model outperforms CNN, RNN, and LSTM across all metrics (precision, recall, F1-score, and accuracy). For accuracy, the t-statistic between CNN and TACNet shows a significant difference with a p-value of 0.0011, indicating that TACNet is significantly superior. Similarly, TACNet demonstrates better results in precision, recall, and F1-score, with p-values well below the 0.05 threshold, showing its consistent dominance over the other models.
The Wilcoxon Signed-Rank Test, being non-parametric, corroborates the findings of the t-test by evaluating differences in the medians of the paired samples. The w-statistic for TACNet against CNN, RNN, and LSTM further supports its superior performance across all metrics. The results, particularly the p-values of 0.0000 for CNN vs. TACNet and 0.0050 for RNN vs. TACNet, suggest that TACNet outperforms the other models, with a significant edge in accuracy and overall performance.
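Both tests can be reproduced with SciPy; the per-fold accuracies below are illustrative placeholders rather than our measured scores:

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold accuracies for a baseline model and TACNet (8 folds)
baseline = np.array([0.961, 0.958, 0.963, 0.960, 0.959, 0.962, 0.957, 0.960])
tacnet   = np.array([0.9996, 0.9998, 0.9997, 0.9998, 0.9996, 0.9997, 0.9998, 0.9997])

t_stat, t_p = stats.ttest_rel(baseline, tacnet)  # paired t-test on means
w_stat, w_p = stats.wilcoxon(baseline, tacnet)   # non-parametric signed-rank test
```

Because the Wilcoxon test is rank-based, its exact two-sided p-value with eight consistently signed differences is 2/2⁸ ≈ 0.0078, which is why very small reported p-values (e.g., 0.0000 after rounding) require enough paired samples.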
The statistical analysis indicates that TACNet clearly outperforms CNN, RNN, and LSTM across all key performance metrics. The model achieves significantly higher accuracy, precision, recall, and F1-score, underscoring its robustness and ability to handle complex patterns in IoT and IIoT network traffic. TACNet’s combination of multi-scale CNNs, LSTM layers, and attention mechanisms makes it highly effective for intrusion detection, even in the presence of challenging network traffic. The results demonstrate the model’s superiority in both traditional and adversarial scenarios, reinforcing TACNet as the most powerful solution for real-time intrusion detection in dynamic IoT and IIoT environments.
Figure 12 presents an explanation of the model’s prediction for the TabularIoTAttack-2024 dataset using the LIME (Local Interpretable Model-agnostic Explanations) method. The output shows the model’s decision-making process for a specific instance in the test dataset, where it predicts “Benign” as the attack type with a confidence of 1.0, and assigns near-zero probabilities for other classes such as “Brute Force”, “DoS”, and “MITM”.
LIME explanation of model prediction for the TabularIoTAttack-2024 dataset, showing feature contributions for the “Benign” attack classification.
The LIME explainer provides visual insights into how the model made this prediction. The bars represent the feature contributions, with the features being plotted in decreasing order of importance. The feature values on the right-hand side indicate the specific values for each feature that contributed to the prediction. Notably, Feature 6 (with a value of 36961) stands out as the most significant feature in influencing the model’s classification decision. Features like Feature 19 and Feature 80 also play essential roles in determining the prediction.
This explanation is crucial for understanding how individual features influence the model’s decision and enhances the transparency of the TACNet approach. In real-world deployment, such interpretability is vital for model validation, ensuring that the model does not make decisions based on irrelevant or biased factors. This also emphasizes TACNet’s effectiveness in providing a highly accurate prediction with interpretability, which is essential for real-time intrusion detection systems in IoT environments.
The proposed TACNet model incorporates multi-scale Convolutional Neural Networks, Long Short-Term Memory layers, and attention mechanisms, making it computationally complex. The multi-scale CNN employs kernel sizes of 3, 5, and 7 to extract features at varying granularities, while the LSTM layer captures temporal dependencies. The temporal and channel attention mechanisms add further complexity by weighting important features and time steps, respectively. Each of these components introduces significant computational load due to the matrix multiplications and convolution operations. The use of LSTM, which stores long-term dependencies, increases the overall complexity, especially considering the large input sequences in the dataset.
The time complexity of the model is mainly determined by the operations in the CNN and LSTM layers. The CNN layer’s time complexity is \(O(n \cdot k^{2})\), where \(n\) is the number of input samples and \(k\) is the size of the convolution kernel. The LSTM layer’s time complexity is \(O(T \cdot U^{2})\), where \(T\) is the sequence length and \(U\) is the number of units in the LSTM. The overall time complexity for a forward pass through the model is therefore \(O(n \cdot k^{2} + T \cdot U^{2})\), with additional overhead from the attention mechanisms. As the model incorporates both CNN and LSTM layers, its time complexity grows with the size of the input data and the depth of the network.
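As a hedged numerical illustration, with assumed values \(n = 10^{5}\), \(k = 7\), \(T = 100\), and \(U = 64\) (chosen for illustration, not taken from our configuration):

```latex
O(n \cdot k^{2} + T \cdot U^{2}):\qquad
n k^{2} = 10^{5} \cdot 49 = 4.9 \times 10^{6},
\qquad
T U^{2} = 100 \cdot 64^{2} \approx 4.1 \times 10^{5}
```

so under these assumed values the convolutional term dominates the forward-pass cost by roughly an order of magnitude.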
The memory complexity of the TACNet model is impacted by the size of the input data, the number of layers, and the type of operations performed. The CNN layers, with their multiple kernel sizes, require storing activations for each layer, resulting in memory complexity of \(O(n \cdot k^{2})\). LSTM layers, which capture temporal dependencies, introduce memory complexity of \(O(T \cdot U)\), where \(T\) is the sequence length and \(U\) is the number of LSTM units. The attention mechanisms further add memory usage, with a complexity of \(O(T \cdot F)\), where \(F\) is the number of features at each time step. Thus, the overall memory complexity of the TACNet model is \(O(n \cdot k^{2} + T \cdot U + T \cdot F)\), which becomes significant for large datasets and long sequences.
Training the TACNet model requires substantial computational resources due to its deep architecture. The batch size used is 1024, which increases both memory requirements and training time. Additionally, training over 10 epochs, combined with early stopping and class weight adjustments, further increases the model’s training time. The use of class weights helps to address class imbalance, adding a small computational overhead. The model’s training time is further affected by the size of the training dataset and the number of epochs, making it resource-intensive.
The inference time of the model is a critical consideration, especially for real-time applications. The inference time is measured after training, and for the CICIDS 2018 dataset, it took 82.01 s, while for the TabularIoTAttack-2024 dataset, it took 59.38 s. The model’s memory footprint during inference also varies, with the CICIDS 2018 dataset requiring 0.58 MB of memory, peaking at 14.25 MB, and the TabularIoTAttack-2024 dataset using 0.41 MB of memory, with a peak of 10.15 MB. These values reflect the substantial computational requirements of the TACNet model during inference, particularly when processing large datasets.
Ablation study
To assess the contribution of the multi-scale CNN in the TACNet model, we examined the performance of the model with and without this component. The multi-scale CNN utilizes different kernel sizes to capture spatial features at varying granularities. This allows the model to extract both fine-grained and broader patterns in network traffic data, which is crucial for identifying diverse attack types and benign patterns. Our experiments showed that when the multi-scale CNN was included, the model exhibited a significant improvement in accuracy, precision, and recall compared to models without this component. The ability to capture features at different scales provides TACNet with a more comprehensive understanding of the spatial characteristics in IoT/IIoT traffic, improving its detection capabilities, particularly for complex attack patterns that may manifest at different temporal or spatial scales.
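A minimal NumPy sketch of the multi-scale idea, using single-channel averaging kernels chosen purely for illustration (the real block learns its filters and uses many channels):

```python
import numpy as np

def conv1d_same(x, kernel):
    """'Same'-padded single-channel 1-D convolution."""
    k = len(kernel)
    xp = np.pad(x, k // 2)
    return np.array([xp[i:i + k] @ kernel for i in range(len(x))])

def multi_scale_block(x, kernels):
    """Concatenate feature maps produced by kernels of different sizes,
    mirroring the 3/5/7 multi-scale extraction described above."""
    return np.concatenate([conv1d_same(x, k) for k in kernels])

x = np.random.default_rng(0).standard_normal(100)  # one flow of 100 features
kernels = [np.ones(s) / s for s in (3, 5, 7)]      # toy averaging filters
features = multi_scale_block(x, kernels)           # one map per kernel size
```

Each kernel size sees a different receptive field over the same input, which is what lets the block capture fine-grained and broader patterns simultaneously.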
The LSTM and attention mechanisms are essential components of TACNet that contribute significantly to modeling temporal dependencies and enhancing feature-specific attention. The LSTM layer helps capture long-term temporal dependencies in sequential data, which is vital for identifying attacks that evolve over time. In our ablation study, when LSTM was removed from the model, there was a noticeable decline in detection accuracy, especially for attacks that develop over multiple time steps.
To better contextualize TACNet’s computational costs, we conducted a comparison of TACNet’s training time, inference time, and memory footprint against standalone CNN and LSTM models. The results show that while TACNet’s inference time (59–82 s) is higher than standalone models due to its multi-scale CNN and LSTM components, these models individually achieve lower accuracy and fail to capture both spatial and temporal dependencies as effectively as TACNet.
For example, standalone CNN models exhibit faster inference but perform at a reduced accuracy with 95.67%, while LSTM models show improved temporal dependency capture but at the cost of longer training times and less effective spatial feature extraction. TACNet’s more complex architecture ensures a better accuracy-efficiency tradeoff, as shown by its consistently high accuracy across multiple datasets, with a slight increase in computational overhead.
As for memory usage, TACNet’s memory footprint (0.4–0.6 MB) and peak memory (10–14 MB) are comparable to simpler CNN and LSTM models, but slightly higher due to the additional complexity introduced by multi-scale feature extraction and the LSTM attention mechanisms. However, these values remain manageable for deployment in edge devices and resource-constrained environments.
Additionally, the attention mechanisms, including both temporal and channel attention, help prioritize the most relevant time steps and features during training, ensuring that the model focuses on the most informative parts of the data. When these attention layers were removed, the model’s precision and recall dropped, particularly for rare attacks where certain features and time steps are more critical than others. The attention mechanism effectively allows the model to weigh important features more heavily, leading to improved performance in classifying attacks accurately and reducing false positives and false negatives.
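The temporal-attention weighting can be sketched as follows; the score vector here is random for illustration, whereas in TACNet it is learned during training:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def temporal_attention(h, w):
    """Score each time step of hidden states h (T, F) with a vector w (F,),
    normalize the scores with softmax, and pool to a single context vector."""
    alpha = softmax(h @ w)   # (T,) attention weights, sum to 1
    return alpha @ h, alpha  # context vector (F,), weights (T,)

rng = np.random.default_rng(1)
h = rng.standard_normal((10, 4))  # 10 time steps, 4 features (toy values)
w = rng.standard_normal(4)        # illustrative, untrained score vector
context, alpha = temporal_attention(h, w)
```

Because the weights sum to one, time steps with higher scores contribute more to the pooled context vector, which is exactly the behavior whose removal degraded precision and recall in the ablation.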
The TACNet model has demonstrated significant strengths across several aspects. One of the most notable strengths is its ability to achieve high accuracy and robustness in detecting a wide range of attack types, even in the presence of class imbalance. TACNet combines multi-scale CNN for feature extraction, LSTM for modeling temporal dependencies, and attention mechanisms to focus on the most relevant features and time steps. This hybrid approach enables the model to learn both spatial and temporal patterns from network traffic data, making it highly effective at detecting a variety of attacks in IoT and IIoT environments. Furthermore, the model excels in generalization, maintaining high performance across multiple benchmark datasets such as CICIDS 2018, DNN-EdgeIIoT, CIC IoT-DIAD 2024, TabularIoTAttack-2024, and N-BaIoT. TACNet’s ability to handle evolving attacks and rare attack types is a significant advantage over traditional machine learning models, making it a promising solution for real-time intrusion detection systems.
Conclusion
In this research, we proposed TACNet, a novel deep learning framework for intrusion detection in IoT and IIoT networks. Our approach integrates multi-scale CNN, LSTM networks, and temporal and channel attention mechanisms to effectively capture both spatial and temporal patterns in network traffic data. Through comprehensive evaluation on benchmark datasets such as CICIDS 2018, DNN-EdgeIIoT, CIC IoT-DIAD 2024, TabularIoTAttack-2024, and N-BaIoT, we demonstrated that TACNet outperforms traditional intrusion detection systems and existing deep learning models in terms of accuracy, precision, recall, F1-score, and other metrics. The inclusion of multi-scale feature extraction, temporal modeling, and attention mechanisms has proven to enhance the model’s ability to detect both common and rare attacks in real-time. The results underscore TACNet’s potential as a robust, scalable solution for intrusion detection in dynamic and evolving IoT environments. Finally, TACNet offers a significant advancement in intrusion detection, combining state-of-the-art deep learning techniques with practical solutions for the complex nature of IoT and IIoT security.
The future direction of this research lies in enhancing TACNet’s scalability and efficiency for real-time deployment in large-scale IoT and IIoT environments, where computational resources may be limited. Further work could focus on optimizing the model’s performance through techniques such as model pruning, quantization, and the integration of edge computing to reduce latency. Additionally, incorporating federated learning could enable decentralized model training, ensuring privacy and security of sensitive data while continuously adapting the model to evolving attack patterns across distributed devices. These advancements will help TACNet maintain its robustness and effectiveness in real-world, dynamic environments.
Moreover, recent studies beyond the IDS domain highlight additional security and privacy risks that remain largely unaddressed in our current design. For instance, Rethinking Membership Inference Attacks Against Transfer Learning64 demonstrates that even when only a “student” model is exposed, adversaries can infer whether specific data records were used during training, underscoring a potential privacy-leakage risk when deploying deep-learning-based models. Similarly, works such as VULSEYE: Detect Smart Contract Vulnerabilities via Stateful Directed Graybox Fuzzing65 and WAFBooster: Automatic Boosting of WAF Security Against Mutated Malicious Payloads66 illustrate that security-critical systems, such as smart contracts and web application firewalls, face attacks that evolve with stateful or adaptive strategies, which challenge static or purely pattern-based defenses. These findings suggest that, for real-world deployment of IDS in IoT/IIoT networks, TACNet could be vulnerable to advanced adversarial or privacy attacks, or to adversaries that evolve over time. Future work could focus on evaluating the privacy and adversarial robustness of our framework under membership inference attacks, adaptive adversarial payloads, or evolving attack patterns. Integrating defense mechanisms such as differential privacy, adversarial training, dynamic re-training, or continual learning would also be an essential area of further investigation to mitigate these potential risks.
Data availability
The datasets utilized in this research are publicly available. All the experimental code, including the implementation of the proposed TACNet model, training scripts, and evaluation procedures, has been uploaded to GitHub for replication purposes and can be accessed at https://github.com/Alamgir-JUST/TACNet.
References
Abdeljaber, H. A. M. et al. A novel ensemble learning approach for enhanced IoT attack detection: redefining security paradigms in connected systems. Human-centric Comput. Inform. Sci. 16. https://doi.org/10.22967/HCIS.2026.16.001 (2026).
Afrin, S. et al. Industrial internet of things: Implementations, challenges, and potential solutions across various industries. Comput. Ind. 170, 104317. https://doi.org/10.1016/j.compind.2025.104317 (2025).
Hossain, M. A. & Islam, M. S. Ensuring network security with a robust intrusion detection system using ensemble-based machine learning. Array 19, 100306. https://doi.org/10.1016/j.array.2023.100306 (2023).
Ishtiaq, W. et al. CST-AFNet: A dual attention-based deep learning framework for intrusion detection in IoT networks. Array 27, 100501. https://doi.org/10.1016/j.array.2025.100501 (2025).
Ullah, I. et al. Integration of data science with the intelligent IoT (IIoT): current challenges and future perspectives. Digit. Commun. Networks. 11 (2), 280–298. https://doi.org/10.1016/j.dcan.2024.02.007 (2025).
Hossain, M. A. & Islam, M. S. A novel hybrid feature selection and ensemble-based machine learning approach for botnet detection. Sci. Rep. 13 (1), 21207. https://doi.org/10.1038/s41598-023-48230-1 (2023).
Ahmed, S. F. et al. Forensics and security issues in the Internet of Things. Wireless Netw. 31 (4), 3431–3466. https://doi.org/10.1007/s11276-025-03942-2 (2025).
Hossain, M. S., Hossain, M. A. & Islam, M. S. I-MPaFS: enhancing EDoS attack detection in cloud computing through a data-driven approach. J. Cloud Comp. 13 (1), 151. https://doi.org/10.1186/s13677-024-00699-5 (2024).
Hossain, M. A. Deep Q-learning intrusion detection system (DQ-IDS): A novel reinforcement learning approach for adaptive and self-learning cybersecurity. ICT Express. https://doi.org/10.1016/j.icte.2025.05.007 (2025).
Saif, S., Hossain, M. A. & Islam, M. S. IoT security fortification: Enhancing cyber threat detection through feature selection and advanced machine learning, in 1st International Conference on Innovative Engineering Sciences and Technological Research (ICIESTR), Muscat, Oman: IEEE, 1–6. https://doi.org/10.1109/ICIESTR60916.2024.10798181 (2024).
Manivannan, R. & Senthilkumar, S. Intrusion detection system for network security using novel adaptive recurrent neural network-Based Fox optimizer concept. Int. J. Comput. Intell. Syst. 18 (1), 37. https://doi.org/10.1007/s44196-025-00767-x (2025).
Hossain, M. A. et al. Machine learning-based network intrusion detection for big and imbalanced data using oversampling, stacking feature embedding and feature extraction. J. Big Data. 11 (1), 33. https://doi.org/10.1186/s40537-024-00886-w (2024).
Jemili, F., Jouini, K. & Korbaa, O. Detecting unknown intrusions from large heterogeneous data through ensemble learning. Intell. Syst. Appl. 25, 200465. https://doi.org/10.1016/j.iswa.2024.200465 (2025).
Hossain, M. A. Deep learning-based intrusion detection for IoT networks: a scalable and efficient approach. EURASIP J. Inf. Secur. 1, 28. https://doi.org/10.1186/s13635-025-00202-w (2025).
Sheikh, M. N. A. et al. SDN-Enabled IoT based transport layer DDoS attacks detection using RNNs. CMC 0 (0), 1–10. https://doi.org/10.32604/cmc.2025.065850 (2025).
Touré, A., Imine, Y., Semnont, A., Delot, T. & Gallais, A. A framework for detecting zero-day exploits in network flows. Comput. Netw. 248, 110476. https://doi.org/10.1016/j.comnet.2024.110476 (2024).
Hossain, M. A. FED-GEM-CN: A federated dual-CNN architecture with contrastive cross-attention for maritime radar intrusion detection. Array 27, 100456. https://doi.org/10.1016/j.array.2025.100456 (2025).
Le, T. T. H., Shin, Y., Kim, M. & Kim, H. Towards unbalanced multiclass intrusion detection with hybrid sampling methods and ensemble classification. Appl. Soft Comput. 157, 111517. https://doi.org/10.1016/j.asoc.2024.111517 (2024).
Ahmed, U. et al. Signature-based intrusion detection using machine learning and deep learning approaches empowered with fuzzy clustering. Sci. Rep. 15 (1), 1726. https://doi.org/10.1038/s41598-025-85866-7 (2025).
Einy, S., Oz, C. & Navaei, Y. D. The anomaly- and signature-based IDS for network security using hybrid inference systems. Math. Probl. Eng. 1–10. https://doi.org/10.1155/2021/6639714 (2021).
Sheikh, N. U., Rahman, H., Vikram, S. & AlQahtani, H. A lightweight signature-based IDS for IoT environment. Preprint at arXiv:1811.04582. https://doi.org/10.48550/arXiv.1811.04582 (2018).
Bhavsar, M., Roy, K., Kelly, J. & Olusola, O. Anomaly-based intrusion detection system for IoT application. Discov Internet Things. 3 (1), 5. https://doi.org/10.1007/s43926-023-00034-5 (2023).
Xu, B., Sun, L., Mao, X., Ding, R. & Liu, C. IoT intrusion detection system based on machine learning. Electronics 12 (20), 4289. https://doi.org/10.3390/electronics12204289 (2023).
Chen, Y. W., Sheu, J. P., Kuo, Y. C. & Van Cuong, N. Design and Implementation of IoT DDoS Attacks Detection System based on Machine Learning, in 2020 European Conference on Networks and Communications (EuCNC), Dubrovnik, Croatia: IEEE, 122–127. (2020). https://doi.org/10.1109/EuCNC48522.2020.9200909
Walling, S. & Lodh, S. Network intrusion detection system for IoT security using machine learning and statistical based hybrid feature selection. Secur. Priv. 7 (6), e429. https://doi.org/10.1002/spy2.429 (2024).
Walling, S. & Lodh, S. Enhancing IoT intrusion detection through machine learning with AN-SFS: a novel approach to high performing adaptive feature selection. Discov Internet Things. 4 (1), 16. https://doi.org/10.1007/s43926-024-00074-5 (2024).
Kantharaju, V. et al. Machine learning based intrusion detection framework for detecting security attacks in internet of things. Sci. Rep. 14 (1), 30275. https://doi.org/10.1038/s41598-024-81535-3 (2024).
Asgharzadeh, H., Ghaffari, A., Masdari, M. & Gharehchopogh, F. S. An intrusion detection system on the Internet of Things using deep learning and multi-objective enhanced gorilla troops optimizer. J. Bionic Eng. 21 (5), 2658–2684. https://doi.org/10.1007/s42235-024-00575-7 (2024).
Banaamah, A. M. & Ahmad, I. Intrusion detection in IoT using deep learning. Sensors 22 (21), 8417. https://doi.org/10.3390/s22218417 (2022).
Dahou, A. et al. Intrusion detection system for IoT based on deep learning and modified reptile search algorithm. Comput. Intell. Neurosci. 2022, 1–15. https://doi.org/10.1155/2022/6473507 (2022).
Madhu, B., Venu Gopala Chari, M., Vankdothu, R., Silivery, A. K. & Aerranagula, V. Intrusion detection models for IOT networks via deep learning approaches. Measurement: Sens. 25, 100641. https://doi.org/10.1016/j.measen.2022.100641 (2023).
Nandanwar, H. & Katarya, R. Deep learning enabled intrusion detection system for industrial IOT environment. Expert Syst. Appl. 249, 123808. https://doi.org/10.1016/j.eswa.2024.123808 (2024).
Nandanwar, H. & Katarya, R. TL-BILSTM iot: transfer learning model for prediction of intrusion detection system in IoT environment. Int. J. Inf. Secur. 23 (2), 1251–1277. https://doi.org/10.1007/s10207-023-00787-8 (2024).
Nandanwar, H. & Katarya, R. Securing industry 5.0: an explainable deep learning model for intrusion detection in cyber-physical systems. Comput. Electr. Eng. 123, 110161. https://doi.org/10.1016/j.compeleceng.2025.110161 (2025).
Kauhsik, B., Nandanwar, H. & Katarya, R. IoT security: A deep learning-based approach for intrusion detection and prevention. In International Conference on Evolutionary Algorithms and Soft Computing Techniques (EASCT), Bengaluru, India, 1–7 (IEEE, 2023). https://doi.org/10.1109/EASCT59475.2023.10392490
Abdelhamid, S., Hegazy, I., Aref, M. & Roushdy, M. Attention-Driven transfer learning model for improved IoT intrusion detection. BDCC 8 (9), 116. https://doi.org/10.3390/bdcc8090116 (2024).
Phalaagae, P., Zungeru, A. M., Yahya, A., Sigweni, B. & Rajalakshmi, S. A hybrid CNN-LSTM model with attention mechanism for improved intrusion detection in wireless IoT sensor networks. IEEE Access. 13, 57322–57341. https://doi.org/10.1109/ACCESS.2025.3555861 (2025).
Khan, I. A. et al. A novel collaborative SRU network with dynamic behaviour Aggregation, reduced communication overhead and explainable features. IEEE J. Biomed. Health Inf. 28 (6), 3228–3235. https://doi.org/10.1109/JBHI.2024.3352013 (2024).
Khan, I. A., Pi, D., Kamal, S., Alsuhaibani, M. & Alshammari, B. M. Federated-Boosting: A distributed and dynamic Boosting-Powered Cyber-Attack detection scheme for security and privacy of consumer IoT. IEEE Trans. Consumer Electron. 1–1. https://doi.org/10.1109/TCE.2024.3499942 (2024).
Khan, I. A. et al. Fed-Inforce-Fusion: A federated reinforcement-based fusion model for security and privacy protection of IoMT networks against cyber-attacks. Inform. Fusion. 101, 102002. https://doi.org/10.1016/j.inffus.2023.102002 (2024).
Ferrag, M. A., Friha, O., Hamouda, D., Maglaras, L. & Janicke, H. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications: Centralized and federated learning. IEEE Dataport. https://doi.org/10.21227/MBC1-1H68
Sharafaldin, I., Habibi Lashkari, A. & Ghorbani, A. A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization:, in Proceedings of the 4th International Conference on Information Systems Security and Privacy, Funchal, Madeira, Portugal: SCITEPRESS - Science and Technology Publications, 108–116. (2018). https://doi.org/10.5220/0006639801080116
Meidan, Y. et al. N-BaIoT—Network-Based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Comput. 17 (3), 12–22. https://doi.org/10.1109/MPRV.2018.03367731 (2018).
Rabbani, M. et al. Device identification and anomaly detection in IoT environments. IEEE Internet Things J. 12 (10), 13625–13643. https://doi.org/10.1109/JIOT.2024.3522863 (2025).
Sasi, T., Lashkari, A. H., Lu, R., Xiong, P. & Iqbal, S. An efficient self attention-based 1D-CNN-LSTM network for IoT attack detection and identification using network traffic. J. Inform. Intell. https://doi.org/10.1016/j.jiixd.2024.09.001 (2024).
Awajan, A. A novel deep learning-based intrusion detection system for IoT networks. Computers 12 (2), 34. https://doi.org/10.3390/computers12020034 (2023).
Alharbi, A., Alosaimi, W., Alyami, H., Rauf, H. T. & Damaševičius, R. Botnet attack detection using local global best Bat algorithm for industrial internet of things. Electronics 10 (11), 1341. https://doi.org/10.3390/electronics10111341 (2021).
Wardana, A. A., Kołaczek, G., Warzyński, A. & Sukarno, P. Ensemble averaging deep neural network for botnet detection in heterogeneous internet of things devices. Sci. Rep. 14 (1), 3878. https://doi.org/10.1038/s41598-024-54438-6 (2024).
Alzahrani, M. Y. & Bamhdi, A. M. Hybrid deep-learning model to detect botnet attacks over internet of things environments. Soft Comput. 26 (16), 7721–7735. https://doi.org/10.1007/s00500-022-06750-4 (2022).
Hezam, A. A., Mostafa, S. A., Ramli, A. A., Mahdin, H. & Khalaf, B. A. Deep learning approach for detecting botnet attacks in IoT environment of multiple and heterogeneous sensors. In Advances in Cyber Security (eds Abdullah, N., Manickam, S. & Anbar, M.), Communications in Computer and Information Science, vol. 1487, 317–328 (Springer, Singapore, 2021). https://doi.org/10.1007/978-981-16-8059-5_19
Hizal, S., Cavusoglu, U. & Akgun, D. A novel deep learning-based intrusion detection system for IoT DDoS security. Internet Things. 28, 101336. https://doi.org/10.1016/j.iot.2024.101336 (2024).
Abbas, S. et al. Evaluating deep learning variants for cyber-attacks detection and multi-class classification in IoT networks. PeerJ Comput. Sci. 10, e1793. https://doi.org/10.7717/peerj-cs.1793 (2024).
Elnakib, O., Shaaban, E., Mahmoud, M. & Emara, K. EIDM: deep learning model for IoT intrusion detection systems. J. Supercomput. 79 (12), 13241–13261. https://doi.org/10.1007/s11227-023-05197-0 (2023).
Neto, E. C. P. et al. CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment. Sensors 23 (13), 5941. https://doi.org/10.3390/s23135941 (2023).
Altunay, H. C. & Albayrak, Z. A hybrid CNN + LSTM-based intrusion detection system for industrial IoT networks. Eng. Sci. Technol. Int. J. 38, 101322. https://doi.org/10.1016/j.jestch.2022.101322 (2023).
Vishwakarma, M. & Kesswani, N. A deep neural network based real-time intrusion detection system for IoT. Decis. Analytics J. 5, 100142. https://doi.org/10.1016/j.dajour.2022.100142 (2022).
Pampapathi, B. M., Guptha, M. N. & Hema, M. S. Towards an effective deep learning-based intrusion detection system in the internet of things. Telematics Inf. Rep. 7, 100009. https://doi.org/10.1016/j.teler.2022.100009 (2022).
Chaganti, R., Suliman, W., Ravi, V. & Dua, A. Deep learning approach for SDN-Enabled intrusion detection system in IoT networks. Information 14 (1), 41. https://doi.org/10.3390/info14010041 (2023).
Khan, M. A. et al. A deep Learning-Based intrusion detection system for MQTT enabled IoT. Sensors 21, 7016. https://doi.org/10.3390/s21217016 (2021).
Afraji, D. M. A. A., Lloret, J. & Peñalver, L. An integrated hybrid deep learning framework for intrusion detection in IoT and IIoT networks using CNN-LSTM-GRU architecture. Computation 13 (9), 222. https://doi.org/10.3390/computation13090222 (2025).
Yang, K., Wang, J. & Li, M. An improved intrusion detection method for IIoT using attention mechanisms, BiGRU, and Inception-CNN. Sci. Rep. 14 (1), 19339. https://doi.org/10.1038/s41598-024-70094-2 (2024).
Anwer, R. W., Abrar, M., Ullah, M., Salam, A. & Ullah, F. Advanced intrusion detection in the industrial internet of things using federated learning and LSTM models. Ad Hoc Netw. 178, 103991. https://doi.org/10.1016/j.adhoc.2025.103991 (2025).
Wu, C. et al. Rethinking membership inference attacks against transfer learning. IEEE Trans. Inf. Forensic Secur. 19, 6441–6454. https://doi.org/10.1109/TIFS.2024.3413592 (2024).
Liang, R. et al. Vulseye: detect smart contract vulnerabilities via stateful directed graybox fuzzing. IEEE Trans. Inf. Forensic Secur. 20, 2157–2170. https://doi.org/10.1109/TIFS.2025.3537827 (2025).
Wu, C. et al. WAFBooster: Automatic boosting of WAF security against mutated malicious payloads. IEEE Trans. Dependable Secure Comput. 22 (2), 1118–1131. https://doi.org/10.1109/TDSC.2024.3429271 (2025).
Acknowledgements
The authors would like to express their sincere gratitude to Skill Morph Research Lab., Skill Morph, Dhaka, Bangladesh for their valuable support and guidance throughout this research. Their insights and resources were instrumental in the successful development and evaluation of the proposed TACNet framework.
Author information
Authors and Affiliations
Contributions
Kingkar Prosad Ghosh: Conceptualization, Data Preprocessing, Methodology, Model Evaluation, Comparative Analysis, Writing – Review & Editing. Mehedi Hasan: Data Preprocessing, Methodology, Hyperparameter Tuning, Writing – Review & Editing. Md. Towhidul Islam Robin: Conceptualization, Model Development, Experimental Setup, Writing – Review & Editing. Md. Alamgir Hossain: Conceptualization, Model Architecture Design, Data Preprocessing, Methodology, Visualization, Writing – Original Draft, Writing – Review & Editing. Md. Samiul Islam: Conceptualization, Hyperparameter Tuning, Validation, Experimental Analysis, Proofreading of the Manuscript, Supervision.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ghosh, K.P., Hasan, M., Robin, M.T.I. et al. A novel deep learning framework with temporal attention convolutional networks for intrusion detection in IoT and IIoT networks. Sci Rep 15, 44624 (2025). https://doi.org/10.1038/s41598-025-32697-1