KronNet a lightweight Kronecker enhanced feed forward neural network for efficient IoT intrusion detection

Ullah, Saeed; Wu, Junsheng; Kamal, Mian Muhammad; Saudagar, Abdul Khader Jilani

doi:10.1038/s41598-025-08921-3


Article
Open access
Published: 01 July 2025


Scientific Reports volume 15, Article number: 20850 (2025)


Abstract

The rapid expansion of Internet of Things (IoT) networks necessitates efficient intrusion detection systems (IDS) capable of operating within the stringent resource constraints of IoT devices. This study introduces KronNet, a lightweight feed-forward neural network enhanced with Kronecker product operations, designed for real-time IoT intrusion detection. KronNet leverages Gaussian Mixture Model (GMM)-based oversampling and a hybrid loss function combining Focal Loss and Cross-Entropy with adaptive class weighting to address class imbalance, ensuring robust detection across diverse attack types. Evaluated on the CICIoT2023 and BoT-IoT datasets, KronNet achieves exceptional performance, with accuracies of 99.01% and 99.91%, weighted F1-scores of 99.01% and 99.91%, and low false positive rates of 0.03% and 0.01%, respectively. The model operates with minimal computational overhead, utilizing 5,074 parameters (19.82 KB) for CICIoT2023 and 4,703 parameters (18.37 KB) for BoT-IoT, with inference times of 0.209 ms and 0.208 ms. Post-quantization, memory usage reduces to 4.96 KB and 4.59 KB, with negligible accuracy degradation (0.06% and 0.01% loss). Compared to state-of-the-art models, KronNet demonstrates up to 15,829× lower FLOPS and 12,010× faster inference, making it a highly efficient solution for edge deployment in resource-constrained IoT environments. This work advances IoT cybersecurity by delivering a scalable, accurate, and lightweight IDS capable of real-time threat detection.


Introduction

The Internet of Things (IoT) has experienced an unprecedented surge in recent years, fundamentally reshaping the landscape of global connectivity. Forecasts indicate that the number of IoT devices worldwide will surpass 41 billion by 2030 as smart technology becomes embedded in homes, industries, and critical infrastructure¹. This growth has brought considerable convenience and efficiency, but it has also created severe cybersecurity threats at an alarming rate. The resource constraints of IoT devices leave them vulnerable to DoS, DDoS, and other attacks², and the majority of IoT devices that manage personal or operational data are deployed without advanced security features³. Malicious actors exploit these weaknesses to conscript devices into botnets for distributed attacks, to mine cryptocurrency illicitly, and to mount ransomware attacks that hold devices hostage. These threats endanger personal privacy and pose major risks to network stability and integrity⁴.

Intrusion Detection Systems (IDSs) function as a primary defensive mechanism to monitor IoT networks for detecting malicious activity⁴. The traditional IDS methods depend on manual rule creation and attack signature development, which require extensive expertise and time-consuming processes. The massive volume of IoT traffic together with modern cyberattack complexity and adaptability make traditional security methods ineffective⁵.

Deep Learning (DL) has emerged as a transformative solution, offering advanced capabilities to bolster IDS performance. Techniques such as Deep Belief Networks (DBNs)⁶, Convolutional Neural Networks (CNNs)⁷, and Recurrent Neural Networks (RNNs)⁸ excel at extracting intricate patterns from raw network data, enabling precise anomaly detection and adaptability to evolving threats⁹. However, these sophisticated DL models are impractical for deployment on resource-constrained IoT devices, where processing power and storage are limited¹⁰. There is therefore a need for a DL framework that detects sophisticated multi-variant cyber threats with high accuracy and speed in resource-constrained IoT environments¹¹. DL-based IDSs also face a widespread problem: class imbalance. Model performance becomes skewed because benign traffic occurs much more frequently than rare attack types, resulting in elevated false-negative rates for critical minority classes¹²,¹³. The imbalance leads models to prioritize accuracy on common traffic patterns while neglecting rare yet dangerous attack vectors¹⁴.

The problem has been addressed through various strategies, which include data augmentation methods to enhance minority class representation¹⁵, algorithmic adjustments to reduce learning biases¹⁶, ensemble approaches to leverage multiple model capabilities¹⁷, and specialized evaluation metrics to improve performance assessment¹⁸. The implementation of these methods to improve rare attack detection often leads to decreased precision in majority classes, which undermines system reliability. The trade-off presents a significant challenge for IoT systems because resource efficiency and consistent performance are essential non-negotiable factors¹⁹.

Our research is motivated by the need for an IDS that detects rare attack types in IoT networks without sacrificing accuracy on common traffic patterns, exploiting the specific distributional properties of different attack categories. We aim to develop lightweight, efficient DL solutions designed specifically for the computational limits of IoT environments. To this end, we propose KronNet, a new lightweight framework that embeds Kronecker product operations into a simplified feed-forward neural network architecture. KronNet optimizes feature representation and computational efficiency, thus addressing both class imbalance and resource constraints in a single framework.

The primary contributions of this article are as follows:

1. We propose KronNet, delivering exceptional detection accuracy (99.01% on CICIOT2023 with 5,074 parameters and 99.91% on BOT-IOT with 4,703 parameters), validated across diverse attack scenarios on these benchmark datasets.
2. We implement Gaussian Mixture Model (GMM)-based oversampling, achieving a 32.15% F1-score improvement for minority classes, ensuring robust detection without synthetic bias.
3. We utilize a hybrid loss function blending Focal Loss and Cross-Entropy, enhanced by adaptive class weighting, to prioritize rare attacks while maintaining high precision across all classes.
4. We demonstrate outstanding efficiency, with inference times of 0.209 ms (CICIOT2023) and 0.208 ms (BOT-IOT), and post-quantization memory footprints of 4.96 KB and 4.59 KB, positioning KronNet as an ideal candidate for real-time IoT edge deployment.

The paper is structured as follows: section “Related work” reviews related work, section “Preliminaries” outlines preliminaries, section “Proposed methodology” details our methodology, section “Experiments and results” presents comprehensive experimental results, section “Discussion” discusses findings across datasets, and section “Conclusion and future work” offers conclusions.

Related work

Intrusion detection systems are essential for IoT network security, playing a crucial role in detecting malicious activities. With the proliferation of IoT devices and increasingly sophisticated attacks, there is a growing need for robust yet efficient IDS solutions. Deep learning has emerged as a promising approach to enhance detection effectiveness, though it faces unique challenges in IoT environments. In this section, we provide a concise overview of recent advances in deep learning-based intrusion detection systems for IoT, focusing on class imbalance and resource constraints.

Deep learning-based IDS for IoT

Recent research has demonstrated deep learning’s effectiveness in IoT intrusion detection. Numerous studies leverage deep learning techniques to develop sophisticated anomaly detection systems across various applications²⁰. These deep learning frameworks are employed in both individual and hybrid modes. Hybrid approaches combine supervised and unsupervised learning to leverage labeled data while discovering patterns in unlabeled data²¹. Narayan et al.²² proposed HDLBID, achieving 99.6% accuracy with CNN-BiLSTM and SMOTE-Tomek balancing, though neglecting energy efficiency, which is crucial for IoT deployments. Feng et al.²³ introduced a two-level DDoS detection framework combining Rényi entropy with DCNN-LSTM models, but its computational requirements limit IoT applicability. Himanshu et al.²⁴ deployed CNN, BiLSTM, and a transfer learning approach, training and evaluating the detection model on the N-BaIoT dataset. Racherla et al.²⁵ demonstrated lightweight LSTM networks with 96.8% accuracy and a 1.49-second response time, though Raspberry Pi deployment revealed significant computational bottlenecks. Iram et al.²⁶ employed ConvLSTM2D with CUDA optimization for multi-vector IIoT threat detection, but the approach faces computational complexity challenges and accuracy-speed trade-offs.

Kirubavathi and Nair²⁷ demonstrated hybrid deep learning potential with 98.45% accuracy using CNN-RNN, though high computational demands limit IoT practicality. Similarly, Khanday et al.²⁸ achieved 98-99% accuracy in DDoS detection using ANN and LSTM models but lacked quantification of computational overhead and resource utilization metrics critical for IoT deployment.

Class imbalance in IoT intrusion detection

Class imbalance significantly challenges IoT intrusion detection. Caihong et al.²⁹ addressed this with S2CGAN-IDS, enhancing minority attack detection while maintaining majority class accuracy, though lacking energy consumption analysis. Altaie and Hoomod³⁰ proposed a two-phase CNN-LSTM model achieving 98.78% accuracy on UNSW-NB15 with lower computational requirements than standalone architectures. Qaddos et al.³¹ presented a hybrid CNN-GRU model with FW-SMOTE that achieves over 99% accuracy but requires significant computational resources, which strains IoT device constraints.

Traditional machine learning approaches have also been explored for addressing imbalance. Amgbara et al.³² investigated lightweight ML models for personal IoT security, finding they achieve reasonable detection accuracy with minimal computational overhead but struggle with complex attack patterns. Farooqi et al.³³ achieved 99.99% accuracy using Decision Trees with feature selection, demonstrating viability for lightweight approaches, though limited in detecting sophisticated attack patterns.

Resource efficiency in IoT security

Resource efficiency remains critical for IoT deployment. Wang et al.³⁴ achieved 99.44% accuracy with only 18.1 KB of memory using knowledge distillation, while Idrissi et al.³⁵ developed DL-HIDS with a memory requirement of 2.704 KB. Himanshu et al.³⁶ introduced AttackNet, an adaptive CNN-GRU model that achieved 99.75% accuracy for IIoT botnet detection and classification, outperforming state-of-the-art methods on the N-BaIoT benchmark. Traditional ML approaches offer computational efficiency, with Azimjonov and Kim³⁷ achieving 99.81% accuracy using just 6 features with Linear SVMs, and Alwaisi et al.³⁸ demonstrating TinyML effectiveness with 96.9% accuracy using Decision Trees. K. Malik et al.³⁹ proposed a lightweight one-class KNN achieving 98-99% F1-score with minimal computational overhead through a 72% feature space reduction.

Jouhari and Guizani⁴⁰ developed a lightweight CNN-BiLSTM model achieving 96.91% accuracy with only 7,841 trainable parameters, though inference times of 3.8 s may be problematic for time-sensitive applications. Otokwala et al.⁴¹ combined feature selection with deep autoencoders, requiring only 2 KB of memory with a 0.30 s inference time; this represents significant progress in model miniaturization, though it lacks evaluation against advanced adversarial attacks. Himanshu et al.⁴² proposed an Explainable AI (XAI) approach that enhances IDS transparency through interpretable decision-making and improves trust in cybersecurity systems, although its use of SHAP for explainability is resource-intensive.

In another approach focusing on computational efficiency, Azimjonov and Kim⁴³ achieved 92.69% accuracy using Stochastic Gradient Descent Classifier with feature selection. While computationally efficient, these traditional approaches lack capability for automatically learning complex attack patterns, highlighting the need for optimized deep learning architectures that balance sophisticated threat detection with resource constraints.

Optimization and energy efficiency

Feature selection and model optimization effectively reduce computational overhead. Özer et al.⁴⁴ proposed optimal feature pairs from the Bot-IoT dataset, achieving 90% accuracy with minimal computational requirements, though overlooking deep learning possibilities. Himanshu et al.⁴⁵ reported a model that achieves 98.1% IDS accuracy with low latency and robust security against MITM, Sybil, and double-spending attacks. Yan et al.⁴⁶ introduced DGConv-IDS, reducing parameters from 37,121 to 10,785 while maintaining 99.83% accuracy on the CICIoT2023 dataset, though their similarity measurement method introduces computational overhead that limits real-time applicability.

Energy efficiency, often overlooked, was analyzed by Tekin et al.⁴⁷, who demonstrated that traditional models achieve competitive accuracy with significantly lower energy requirements than neural networks. However, their study focuses primarily on traditional ML algorithms, overlooking modern lightweight deep learning architectures. Anusha et al.⁴⁸ achieved a 30% energy reduction and 25% better resource utilization through AI/ML integration, though without specific application to intrusion detection. Malik et al.⁴⁹ proposed a Cu-(ConvLSTM2D-BLSTM) hybrid framework achieving 96-99% accuracy across IoT devices, but it requires time-efficiency optimization under high request loads. Wang et al.⁵⁰ proposed DL-BiLSTM with dynamic quantization, achieving 99.67% accuracy while maintaining a small model size of 28.3 KB, though not fully addressing model optimization during training.

Advanced optimization approaches include the ALO-CNN framework of Gupta et al.⁵¹, which combines Random Forest feature selection with Ant Lion Optimization and achieves 97% accuracy, and the federated learning approach of Danquah et al.⁵², which reduces computational complexity by 87.34% while achieving 99.93% accuracy. While these approaches effectively address computational efficiency, they lack consideration of real-time deployment constraints on resource-limited IoT devices.

Despite significant advances in lightweight intrusion detection for IoT, several critical gaps remain. First, most existing approaches achieving high accuracy do so at the cost of computational efficiency, with limited evaluation on actual IoT hardware. Second, approaches addressing class imbalance often introduce additional computational overhead during training or inference. Finally, there is insufficient attention to energy consumption metrics in evaluating model suitability for IoT deployment. Alternative approaches are therefore needed to address these gaps in IDS for IoT systems.

Preliminaries

In this section, we outline the workflow of the proposed KronNet framework and the foundational techniques used in its development. The framework is proposed to address the challenges of class imbalance and resource constraints in IoT IDS, achieving high detection accuracy with low computational cost. Figure 1 illustrates the phase-wise workflow of the proposed model, which describes the process from data preparation to model training and evaluation in four distinct phases: Data Preprocessing (Phase-1), Class Balancing (Phase-2), Data Preparation (Phase-3), and KronNet Training and Evaluation (Phase-4).

IDS datasets

We apply the proposed framework to two recent IoT-specific datasets, CICIOT2023 and BOT-IOT. The datasets are chosen for their applicability to current IoT network traffic and varied attack profiles. The following are the descriptions of the datasets:

CICIOT2023 dataset

The CICIOT2023 dataset, introduced by Neto et al. (2023), is a comprehensive IoT traffic dataset for intrusion detection research⁵³. Work that focuses on specific attacks while overlooking others is limited in detection scope⁵⁴. After balancing, the dataset contains 1,237,396 instances with 41 features across 34 attack classes, which include DDoS attacks such as DDoS-TCP_Flood and DDoS-UDP_Flood, DoS attacks such as DoS-SYN_Flood, and other threats including MITM-ArpSpoofing, SqlInjection, and XSS. Because it reflects real-world IoT network behavior, the dataset is suitable for evaluating IDS in IoT environments.

BOT-IOT dataset

The BOT-IOT dataset, created by Koroniotis et al. (2019), is a common resource for IoT security research⁵⁵. The balanced dataset contains 11,362,571 instances with 37 features across 11 attack classes, including DDoS (DDH, DDT), reconnaissance attacks (Service_Scan, OS_Fingerprint), and additional attack types such as Data_Exfiltration and Keylogging. The dataset is valuable because it focuses on botnet attacks against IoT networks and captures authentic attack scenarios.

Remark 1

The CICIOT2023 dataset provides attack classes in 34 categories and represents the current IoT threat environment because it was created recently. The BOT-IOT dataset provides 11.36 million instances of data from botnet attacks while being slightly older than CICIOT2023 and serves to enhance the evaluation with botnet-specific attack data. The proposed framework will be evaluated for its performance across different attack types and class distributions using these two datasets because NSL-KDD lacks IoT-specific traffic characteristics⁵⁶.

Data preprocessing

The preprocessing phase makes sure the datasets are ready for deep learning model training. It consists of the following tasks:

Data Cleaning and Handling Missing Values: The combined dataset (combined_df) is cleaned by filling missing values with zeros (combined_df.fillna(0)), and skewness is removed through a uniform quantile transformation, so that no data points are lost and consistency is maintained.
Label Encoding: Categorical labels are encoded using a label encoder (label_encoder = {label: idx for idx, label in enumerate(class_names)}), converting class names (e.g., DDoS-TCP_Flood, Normal) into numerical indices for model compatibility.
Data Normalization: Features are normalized using the standardization formula in equation 1:
$$\begin{aligned} X_{\textrm{normalized}}=\left( X-\textrm{mean}\right) /\textrm{std} \end{aligned}$$
(1)
where X is the feature matrix and mean and std are computed along each feature dimension (np.nanmean(X, axis=0), np.nanstd(X, axis=0)). This step scales features to zero mean and unit variance, mitigating the impact of varying feature scales on model training.
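The label encoding and standardization steps can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the helper names encode_labels and standardize, the eps guard for constant columns, and the toy values are assumptions, and the quantile transformation step is omitted.

```python
import numpy as np

def encode_labels(labels, class_names):
    """Map class names to integer indices in the order given by class_names."""
    label_encoder = {label: idx for idx, label in enumerate(class_names)}
    return np.array([label_encoder[label] for label in labels]), label_encoder

def standardize(X, eps=1e-12):
    """Fill missing values with zero, then apply Eq. (1) per feature column."""
    X = np.asarray(X, dtype=np.float64)
    X = np.where(np.isnan(X), 0.0, X)          # same effect as fillna(0)
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    # eps guard (assumption): leave constant columns unscaled instead of dividing by 0
    return (X - mean) / np.where(std < eps, 1.0, std)

# Toy data (hypothetical values): three flows, two features, one missing entry.
X_raw = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 400.0]])
y_raw = ["Normal", "DDoS-TCP_Flood", "Normal"]

X_norm = standardize(X_raw)
y_idx, label_encoder = encode_labels(y_raw, ["DDoS-TCP_Flood", "Normal"])
```

After this step every column of X_norm has zero mean and unit standard deviation, and y_idx holds the integer class indices.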

Class balancing techniques

Class imbalance is a significant challenge in IDS datasets, where benign traffic often dominates attack samples, leading to biased models. To address this, we employ:

GMM Oversampling: The gmm_oversampling function uses Gaussian Mixture Models (GMM) to generate synthetic samples for minority classes (GaussianMixture(n_components=1)). For each minority class, a GMM is fitted to the class data and new instances are sampled from it until the class matches the majority class size (majority_size = max(class_counts.values())). This results in balanced datasets with shapes (1,237,396, 41) for CICIOT2023 (34 classes) and (11,362,571, 37) for BOT-IOT (11 classes). Unlike traditional methods like SMOTE, GMM-based oversampling captures the underlying data distribution more effectively, improving the quality of synthetic samples⁵⁷.

Data preparation

This phase prepares the balanced dataset for model training:

Dataset Splitting: The data is split into training, validation, and test sets using a 70-15-15 ratio (train_test_split with test_size=0.3, followed by a 50-50 split of the held-out 30% into validation and test sets). Stratified sampling ensures proportional class representation across splits (stratify=y).
DataLoader Creation: The prepare_data function creates PyTorch DataLoader objects (TensorDataset, DataLoader) with a batch size of 512, enabling efficient batch processing during training (batch_size=512, shuffle=True).
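The 70-15-15 stratified split can be sketched with scikit-learn as below. The function name split_70_15_15, the seed value, and the toy data are assumptions for illustration; the DataLoader wrapping is not shown.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_70_15_15(X, y, seed=42):
    """Stratified 70/15/15 split: hold out 30%, then halve the held-out part."""
    X_train, X_hold, y_train, y_hold = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Toy data (hypothetical): 1,000 samples, 41 features, 4 balanced classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 41))
y = np.repeat(np.arange(4), 250)

train, val, test = split_70_15_15(X, y)
```

Stratifying the second split as well keeps the class proportions of the validation and test sets aligned with those of the training set.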

Kronecker-enhanced neural networks

The KronNet model leverages feed-forward neural networks with Kronecker product enhancements to achieve lightweight and accurate intrusion detection:

Feed-forward neural architecture: The model employs a lightweight feed-forward architecture⁵⁸ with configurable hidden dimensions ([64, 32]) to balance computational efficiency and representational capacity. Each layer consists of linear transformations followed by ReLU activations and a dropout rate of 0.05 for regularization, supporting the lightweight design suitable for IoT environments.
Kronecker Product Enhancement: A key innovation in our approach is the Kronecker product operation (kronecker_product) that enhances feature interactions by transforming hidden representations into a matrix form and applying a learnable Kronecker matrix (kronecker_left, matmul(x_matrix, kronecker_left)). The technique cuts down the number of parameters while still being able to capture complex feature relationships⁵⁹ which makes it possible to model traffic patterns with fewer parameters than traditional architectures.
Hybrid Loss Function: The training process uses a hybrid loss combining Focal Loss and Cross-Entropy Loss (0.7 * focal_loss + 0.3 * ce_loss). Focal Loss (FocalLoss(gamma=2.0)) focuses on hard-to-classify samples, mitigating the effect of class imbalance, while Cross-Entropy Loss ensures overall classification accuracy.
Quantization: After training, dynamic quantization reduces memory for edge deployment while maintaining accuracy.
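As a worked check of the reported model sizes, the following sketch counts parameters for one plausible layer layout: two hidden layers [64, 32], a linear projection to a 2 × 2 matrix, the 2 × 2 learnable Kronecker matrix, and a linear classifier. The layout and the helper names are assumptions, as is the reading of the memory figures as 4 bytes per parameter before quantization and 1 byte after.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def kronnet_param_count(input_dim, num_classes, hidden_dims=(64, 32), kronecker_size=2):
    """Parameter count under the assumed layout described in the lead-in."""
    k2 = kronecker_size * kronecker_size
    total, prev = 0, input_dim
    for h in hidden_dims:
        total += linear_params(prev, h)      # hidden layer
        prev = h
    total += linear_params(prev, k2)         # projection to a k x k matrix
    total += k2                              # learnable k x k Kronecker matrix
    total += linear_params(k2, num_classes)  # classifier
    return total

def size_kb(n_params, bytes_per_param):
    return n_params * bytes_per_param / 1024

for name, d, c in [("CICIoT2023", 41, 34), ("BoT-IoT", 37, 11)]:
    n = kronnet_param_count(d, c)
    print(name, n, round(size_kb(n, 4), 2), round(size_kb(n, 1), 2))
# prints:
# CICIoT2023 5074 19.82 4.96
# BoT-IoT 4703 18.37 4.59
```

Under these assumptions the counts agree with the 5,074 and 4,703 parameters, the 19.82 KB and 18.37 KB footprints, and the 4.96 KB and 4.59 KB post-quantization sizes reported for the two datasets.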

Proposed methodology

The KronNet model presented in Fig. 2 is a lightweight Kronecker-enhanced feed-forward neural architecture for intrusion detection in IoT environments. It combines feed-forward layers with Kronecker product operations to obtain high detection accuracy while keeping resource usage low, which makes it suitable for resource-limited IoT devices that need both detection performance and computational efficiency. Building on section “Preliminaries”, this section explains the model architecture, the training process, and the algorithmic workflow for intrusion detection. The model is further optimized through quantization for practical deployment.

Balancing through GMM-based oversampling

The CICIOT2023 and BOT-IOT datasets exhibit severe class imbalance, which KronNet addresses through GMM-based oversampling before training. The gmm_oversampling function implements this method, creating synthetic minority class samples that balance the dataset for better feature learning across all classes.

Given a dataset with features $X \in \mathbb {R}^{n \times d}$, where n is the number of samples and d is the feature dimension (41 for CICIOT2023, 37 for BOT-IOT), and labels $y \in \{0,1,\ldots ,C-1\}$, where C is the number of classes (34 for CICIOT2023, 11 for BOT-IOT), the process is as follows:

Compute the class counts $\{n_c\}_{c=0}^{C-1}$, where $n_c = |\{i: y_i = c\}|$, and identify the majority class size $n_{\max } = \max _{c} n_c$.

For each class $c \in \{0, 1, \ldots , C-1\}$:

Extract samples $X_c = \{x_i: y_i = c\}$, with $|X_c| = n_c$.
If $n_c < n_{\max }$ and $n_c > 0$, fit a GMM with k components ($k = 1$, n_components=1) on $X_c$: $\textrm{GMM}_c = \textrm{GaussianMixture}(X_c, k, \mathrm {random\_state} = 42)$
Generate $n_{\textrm{synthetic}} = n_{\max } - n_c$ synthetic samples: $X_{\textrm{synthetic}}, \_ = \textrm{GMM}_c.\textrm{sample}(n_{\textrm{synthetic}})$
Combine the original and synthetic samples: $X_c^{\prime } = X_c \cup X_{\textrm{synthetic}}$, with corresponding labels $y_c^{\prime } = [c] \times (n_c + n_{\textrm{synthetic}})$.

Aggregate all classes to form the balanced dataset $(X_{\textrm{balanced}}, y_{\textrm{balanced}})$, where each class has $n_{\max }$ samples.

Model architecture

The KronNet model is designed to balance detection performance with computational efficiency. Its architecture, implemented in the provided code, consists of the following components:

Input Layer: The model accepts input dimensions corresponding to the dataset features—41 for CICIOT2023 and 37 for BOT-IOT (input_dim = X.shape[1]).
Feed-Forward Hidden Layers: The core of the model comprises two hidden layers with dimensions [64, 32] (hidden_dims = [64, 32]). These layers extract hierarchical features from the input data, progressively reducing dimensionality while preserving critical patterns for classification. Each layer consists of:
- A linear transformation (nn.Linear) to project the input to the specified hidden dimension.
- A ReLU activation function (nn.ReLU) that introduces nonlinearity to enhance feature representation.
- A dropout layer (nn.Dropout(0.05)) to prevent overfitting by randomly dropping 5% of the units during training.
Kronecker product enhancement: A key innovation in KronNet is the Kronecker product-inspired operation, illustrated in Figure 3, which improves feature interactions while keeping the architecture lightweight and the computational overhead minimal. The hidden representation $h \in \mathbb {R}^d$ (where d here denotes the output dimension of the last hidden layer) is projected to a $2 \times 2$ matrix via a linear layer (self.to_kronecker), resulting in $X_{matrix} \in \mathbb {R}^{(b \times 2 \times 2)}$, where b is the batch size and kronecker_size = 2. This matrix is computed as in Eq. (2):
$$\begin{aligned} X_{matrix} = \text {reshape}(\text {Linear}(h), (b, 2, 2)) \end{aligned}$$
(2)
Instead of a traditional Kronecker product, the operation performs a matrix multiplication with a learnable Kronecker matrix $K_{left} \in \mathbb {R}^{(2 \times 2)}$ (kronecker_left), initialized as $K_{left} \sim \mathcal {N}(0, 1/\text {kronecker\_dim})$. The resulting output is presented in Eq. (3):
$$\begin{aligned} \text {Kron}_{out} = \text {matmul}(X_{matrix}, K_{left}) \end{aligned}$$
(3)
where $\text {Kron}_{out} \in \mathbb {R}^{(b \times 2 \times 2)}$. This output is then flattened to a vector $\text {Kron}_{out} \in \mathbb {R}^{(b \times 4)}$, capturing complex feature interactions with minimal parameters. The flattened vector is passed to the classifier. This approach significantly reduces the parameter count compared to traditional architectures, which often require orders of magnitude more parameters to model similar interaction patterns, making KronNet highly efficient for resource-constrained IoT devices.
Output layer: The linear layer (self.classifier) maps the Kronecker output to one raw score per class, 34 for CICIOT2023 and 11 for BOT-IOT (num_classes = len(class_names)). During evaluation, these raw scores are converted to probabilities by applying softmax (F.softmax(outputs, dim=1)).
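The components described above can be assembled into a PyTorch module along the following lines. This is a sketch under stated assumptions, not the authors' code: the attribute name features, the constructor signature, and the initialization scale (variance 1/kronecker_size, where the text writes 1/kronecker_dim) are guesses; to_kronecker, kronecker_left, and classifier follow the names used in the text.

```python
import torch
import torch.nn as nn

class KronNet(nn.Module):
    """Feed-forward network with a Kronecker-style 2 x 2 interaction stage (sketch)."""

    def __init__(self, input_dim, num_classes, hidden_dims=(64, 32),
                 kronecker_size=2, dropout=0.05):
        super().__init__()
        layers, prev = [], input_dim
        for h in hidden_dims:
            layers += [nn.Linear(prev, h), nn.ReLU(), nn.Dropout(dropout)]
            prev = h
        self.features = nn.Sequential(*layers)
        self.kronecker_size = kronecker_size
        k2 = kronecker_size ** 2
        self.to_kronecker = nn.Linear(prev, k2)
        # Assumed init: N(0, 1/kronecker_size); the exact variance is not confirmed.
        self.kronecker_left = nn.Parameter(
            torch.randn(kronecker_size, kronecker_size) / kronecker_size ** 0.5)
        self.classifier = nn.Linear(k2, num_classes)

    def forward(self, x):
        h = self.features(x)
        b, k = h.size(0), self.kronecker_size
        x_matrix = self.to_kronecker(h).view(b, k, k)            # Eq. (2)
        kron_out = torch.matmul(x_matrix, self.kronecker_left)   # Eq. (3)
        return self.classifier(kron_out.reshape(b, -1))          # raw class scores

model = KronNet(input_dim=41, num_classes=34)
n_params = sum(p.numel() for p in model.parameters())
logits = model(torch.randn(8, 41))
```

With these settings the module has 5,074 trainable parameters for the CICIOT2023 configuration, which agrees with the reported count, and produces one row of 34 raw scores per input sample.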

Training process

The training of KronNet is performed in two stages to address class imbalance and optimize performance:

Initial training (5 Epochs): The first stage is to train for 5 epochs (initial_epochs=5) to identify the underperforming classes. The model is evaluated on the validation set using the identify_underperforming_classes function, which computes F1-scores for each class. The classes with F1-scores less than 0.7 are marked and their corresponding class weights are updated (class_weights[idx] *= (1.0 + weight_factor)). This way, the model pays more attention to minority classes in the next training.
Full training (Up to 50 Epochs): The second stage continues training with the adjusted class weights for up to 50 epochs (max_epochs=50). The model uses a hybrid loss function combining Focal Loss and Cross-Entropy Loss (0.7 * focal_loss + 0.3 * ce_loss):
Focal Loss (FocalLoss(gamma=2.0)): It focuses on hard to classify samples by down-weighting easy examples, reducing the effect of class imbalance⁶⁰.
Cross-Entropy Loss (nn.CrossEntropyLoss): It is used to optimize the overall classification accuracy.
The training process uses the Adam optimizer (lr=0.001, weight_decay=1e-5) with a StepLR scheduler (step_size=10, gamma=0.8) to adjust the learning rate. Early stopping is applied with a patience of 5 epochs to prevent overfitting, as seen in the BOT-IOT training, which stopped after 13 epochs.
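The 0.7/0.3 blend of Focal Loss and Cross-Entropy can be illustrated numerically with NumPy. This is a sketch of the loss value only, not the training code: how class weights enter each term and the plain per-batch mean are assumptions, and the function names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def hybrid_loss(logits, targets, class_weights=None, gamma=2.0, alpha=0.7):
    """alpha * focal + (1 - alpha) * cross-entropy, averaged over the batch."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets)
    n, c = logits.shape
    w = np.ones(c) if class_weights is None else np.asarray(class_weights, dtype=float)
    log_pt = log_softmax(logits)[np.arange(n), targets]  # log-probability of true class
    pt = np.exp(log_pt)
    ce_i = -w[targets] * log_pt                          # weighted cross-entropy per sample
    focal_i = (1.0 - pt) ** gamma * ce_i                 # down-weights easy samples
    return alpha * focal_i.mean() + (1.0 - alpha) * ce_i.mean()

# Uniform scores over 4 classes: p_t = 0.25 for every sample.
loss = hybrid_loss(np.zeros((3, 4)), [0, 1, 2])
```

For this uniform case the cross-entropy term is log 4 and the focal term is 0.75² · log 4, so the blended value is 0.69375 · log 4. Setting gamma to 0 removes the focal modulation and recovers plain cross-entropy.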

This two-stage approach, combined with the GMM-based oversampling technique discussed in section “Balancing through GMM-based oversampling”, provides a comprehensive solution to the class imbalance problem in IoT intrusion detection. The dynamic adjustment of class weights during training ensures that the model maintains high detection rates for both common and rare attack patterns.
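The class-weight adjustment after the initial epochs can be sketched as follows. The per-class F1 computation and the 0.7 threshold follow the text; the value weight_factor=0.5, the helper names, and the toy predictions are assumptions, since the source does not state them.

```python
import numpy as np

def per_class_f1(y_true, y_pred, num_classes):
    """F1-score of each class computed from true/false positives and negatives."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1 = np.zeros(num_classes)
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom > 0 else 0.0
    return f1

def adjust_class_weights(class_weights, f1_scores, threshold=0.7, weight_factor=0.5):
    """Scale the weight of every class whose F1 falls below the threshold."""
    weights = np.asarray(class_weights, dtype=float).copy()
    weak = np.flatnonzero(f1_scores < threshold)
    weights[weak] *= (1.0 + weight_factor)   # hypothetical weight_factor value
    return weights, weak

# Toy validation predictions (hypothetical): class 2 is detected poorly.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2]

f1 = per_class_f1(y_true, y_pred, num_classes=3)
new_weights, weak = adjust_class_weights(np.ones(3), f1)
```

In this toy case only class 2 (F1 of 2/3) falls below 0.7, so its weight grows from 1.0 to 1.5 while the other weights stay unchanged.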

Algorithm for intrusion detection

Algorithm 1: KronNet Intrusion Detection Approach

Input: Preprocessed and balanced dataset $D'$, DataLoaders for training, validation, and test sets

Description:

Step 1 - Model Initialization:

Initialize KronNet with input_dim (41 for CICIOT2023, 37 for BOT-IOT), hidden_dims=[64, 32], num_classes (34 for CICIOT2023, 11 for BOT-IOT), and kronecker_size=2.
Move the model to the appropriate device (model.to(device)), utilizing CUDA if available (device = 'cuda').

Step 2 - Two-Stage Training:

Initial Training:

Train for 5 epochs using the hybrid loss function (0.7 * focal_loss + 0.3 * ce_loss).
Evaluate on the validation set to identify underperforming classes (F1 < 0.7).
Adjust class weights dynamically to focus on minority classes.

Full Training:

Continue training with updated class weights for up to 50 epochs.
Use Adam optimizer (lr=0.001, weight_decay=1e-5) and StepLR scheduler (step_size=10, gamma=0.8).
Apply early stopping with patience of 5 epochs.

Step 3 - Evaluation:

Evaluate the trained model on the test set (calculate_metrics):

Compute metrics including accuracy, weighted precision, recall, F1-score, ROC-AUC, TPR, and FPR, together with resource usage (e.g., parameters, FLOPs, memory, inference time) and the impact of quantization.
Generate a classification report, confusion matrix, ROC curves, class-wise distribution, and t-SNE plots.

Step 4 - Quantization:

Quantize the model using dynamic quantization (torch.quantization.quantize_dynamic, dtype=torch.qint8) to reduce its computational footprint.
Save the quantized model and metadata (torch.save(save_dict, "iot_KronNet_quantized.pth")).

Output: Classification output $y \in \{c_1, c_2, \ldots , c_m\}$, where $m=34$ for CICIOT2023 and $m=11$ for BOT-IOT, along with performance metrics.
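The learning-rate schedule and early-stopping rule used in Step 2 can be illustrated in plain Python. This is a sketch of the control logic only: the EarlyStopping class, the choice of validation loss as the monitored quantity, and the loss values are assumptions.

```python
def step_lr(epoch, base_lr=0.001, step_size=10, gamma=0.8):
    """Learning rate in effect during a 0-based epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)

class EarlyStopping:
    """Stop once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Hypothetical validation losses: improvement stalls after the 6th epoch.
val_losses = [0.9, 0.5, 0.3, 0.2, 0.15, 0.12] + [0.12] * 10
stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With step_size=10 and gamma=0.8 the learning rate is 0.001 for epochs 0 to 9, 0.0008 for epochs 10 to 19, and 0.00064 for epochs 20 to 29. In the toy run above, five consecutive epochs without improvement end training at epoch 11, well before the 50-epoch cap.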

Hyperparameters

Table 1 lists the hyperparameters tuned for the KronNet model.

Table 1 Hyperparameters.
