Abstract
The rapid expansion of cloud-based Internet of Things (IoT) systems has intensified security challenges due to the large-scale transmission of sensitive data from resource-constrained devices to cloud infrastructures. Conventional cryptographic techniques often impose high computational and memory overhead. Consequently, there is a critical need for security frameworks that balance strong data protection with efficient resource utilization while supporting intelligent threat detection. This study proposes an integrated security framework that combines lightweight and hybrid cryptographic algorithms with machine learning (ML) models to secure IoT data transmission in cloud-based environments. Four encryption techniques, XOR, ChaCha20, AES, and a hybrid AES–RSA scheme, are systematically evaluated in terms of memory consumption, CPU usage, and overall resource efficiency using the Overall Resource Consumption Score (ORCS). Secure data transmission is simulated using the MQTT protocol, while ML-based intrusion detection is performed using Random Forest (RF), XGBoost, CatBoost, and ensemble classifiers. Experiments are conducted on two real-world IoT network traffic datasets, MQTTEEB-D and CIC IoT 2023. On the MQTTEEB-D dataset, the hybrid AES–RSA scheme achieved a low memory usage of 0.126 KB per traffic record with an ORCS of 0.56, while the voting ensemble classifier attained the highest detection accuracy of 92.68%. On the CIC IoT 2023 dataset, comprising 605,839 test records, the hybrid AES–RSA method required 0.374 KB per traffic record and achieved an ORCS of 0.5425, whereas the voting ensemble model achieved an accuracy of 81.09%. The findings demonstrate that hybrid cryptography provides an effective balance between security and efficiency for cloud-based IoT systems, while ensemble ML models significantly enhance intrusion detection performance.
Introduction
The rapid proliferation of the Internet of Things (IoT) and the expansion of cloud computing have significantly transformed various industries, enabling enhanced efficiency, scalability, and connectivity1,2. However, this technological advancement brings with it critical security challenges, particularly in ensuring the confidentiality, integrity, and authenticity of data transmitted between IoT devices and cloud environments3,4. The sensitivity of the data generated by IoT devices, ranging from healthcare information to personal data, demands robust security measures to mitigate the risk of cyberattacks. As such, cryptography has emerged as a fundamental tool in safeguarding IoT systems against these threats, ensuring the privacy and integrity of data during transmission and storage5.
In IoT ecosystems, where devices are often constrained by limited computational resources, traditional cryptographic algorithms designed for powerful computing environments are not suitable due to their high memory and processing power requirements6. Lightweight cryptography provides an approach to encryption that ensures the security of IoT systems while minimizing the impact on the devices’ limited resources, such as memory and CPU power7. These constraints present a significant challenge, as encryption is necessary for securing data before transmission over potentially insecure networks.
Despite significant advancements in IoT infrastructure, securing large-scale IoT systems remains a major challenge due to the heterogeneity of devices, constrained computational resources, and continuous data transmission to cloud environments8. Modern IoT infrastructures generate massive volumes of sensitive data that are frequently transmitted over public or semi-trusted networks, making them attractive targets for cyberattacks. While cloud platforms offer scalability and computational power, they also increase the attack surface, particularly when data encryption and intrusion detection mechanisms are not optimized for resource-limited IoT devices9. Therefore, there is a growing need for security solutions that are specifically designed for cloud-based IoT infrastructures, ensuring strong data protection while maintaining low memory and CPU overhead.
Existing IoT security approaches focus on either cryptography or intrusion detection, but not both. Traditional cryptographic algorithms are often too resource-intensive for lightweight IoT devices. Meanwhile, machine learning–based intrusion detection systems require access to unencrypted data, compromising confidentiality. Many studies use small datasets and fail to assess encryption’s impact on memory and CPU, or consider secure transmission protocols like MQTT10. This highlights a gap in comprehensive frameworks that evaluate encryption, secure cloud transmission, and intelligent attack detection under realistic IoT constraints.
In cloud-based IoT networks, data generated by IoT devices is often transmitted to the cloud for processing, storage, and analysis. Therefore, ensuring the confidentiality and integrity of data while it is being transmitted between IoT devices and the cloud is a critical issue11. Furthermore, the integration of ML algorithms into IoT environments can enhance security by enabling intelligent monitoring of network traffic, detecting anomalies, and identifying potential threats. However, the performance of these ML models is directly influenced by the efficiency of the underlying cryptographic algorithms. This study aims to address the lack of unified security frameworks in cloud-based IoT environments by proposing a framework that integrates lightweight and hybrid cryptographic algorithms (XOR, ChaCha20, AES, and AES-RSA) with ensemble machine learning models, and by assessing their performance in terms of resource utilization (memory and CPU consumption) and security effectiveness.
In the first part of this research, a comprehensive analysis is conducted on the MQTTEEB-D and CIC IoT 2023 datasets, which contain real-world IoT network traffic data12,13. Encryption is the first phase of this study; ML models such as Random Forest (RF), eXtreme Gradient Boosting (XGBoost), CatBoost, and ensemble models are then integrated to mitigate potential security threats in IoT networks. Developing effective malware detection techniques in IoT networks is important to ensure the privacy and security of IoT devices, networks, and end users14,15. Additionally, the study explores the integration of the MQTT protocol for secure data transmission from IoT devices to the cloud. MQTT is a lightweight messaging protocol commonly used in IoT applications, offering efficient and reliable communication between devices16,17.
This work presents an end-to-end security framework that simultaneously integrates lightweight and hybrid encryption, secure cloud transmission via MQTT, and ensemble machine learning-based intrusion detection within a single pipeline. Specifically, the contributions of this study are as follows.
- A composite metric, the Overall Resource Consumption Score (ORCS), is introduced to jointly quantify memory and CPU trade-offs across four encryption algorithms (XOR, ChaCha20, AES, and hybrid AES–RSA), enabling objective, resource-aware algorithm selection for constrained IoT devices.
- The MQTT protocol is incorporated as a realistic cloud transmission layer applied after encryption, closely simulating real-world IoT deployment conditions.
- The ensemble ML classifiers are trained and evaluated on encrypted IoT traffic rather than raw network data, reflecting the operational reality in which detection systems must function alongside active encryption.
- Experiments are conducted on two real-world IoT datasets, MQTTEEB-D and CIC IoT 2023, confirming that the proposed framework generalizes across different IoT traffic scenarios.
The rest of the paper is structured as follows: Sect. 2 provides a review of the related work. Section 3 outlines the methodology employed in the study. Section 4 presents the results of the research and the discussion. Finally, Sect. 5 concludes the paper by summarizing the key contributions and suggesting directions for future research.
Related work
The idea of hybrid cryptography, which combines encryption methods with AI, is very important for cloud-based IoT environments. AES and ChaCha20 were selected because both are publicly standardized: AES is specified by NIST in FIPS PUB 197, and ChaCha20 is specified by the IETF in RFC 8439, which supports their robustness and trustworthiness in modern cryptographic applications, including IoT18. For example, the study in19 proposed a healthcare IoT framework that uses fog computing and a hybrid mathematical model combining Elliptic Curve Cryptography (ECC) and Proxy Re-encryption (PR) with the Enhanced Salp Swarm Algorithm (ESSA). The framework reportedly cuts processing time from 60 ms to 18 ms and moves the reported reliability measure from 25% to 3%.
On the other hand, using ML to manage cryptographic keys is a big step forward for IoT security. K. Karimunda et al.20 introduced a security framework for MQTT-based IoT device communication. Their hybrid approach combines elliptic curve cryptography, which ensures message confidentiality through encryption, with ANNs for anomaly detection and classification. The framework achieved an accuracy of 90.38% in detecting and classifying anomalies on the MQTTset dataset. Its limitation is the lack of a specific dataset and of an encryption method suitable for embedding in IoT devices.
H. Nagarajan et al.21 proposed an AI-driven cryptographic framework to enhance security in smart cloud environments. The framework combined symmetric, asymmetric, and homomorphic encryption methods to counter emerging risks and ensure data integrity, confidentiality, and operational efficiency. AI models are trained to detect threats in real time, identify anomalies, and address new vulnerabilities. The AI-driven framework achieved an accuracy of 93.8% and an encryption throughput of 520.7 operations per second.
B. Duc Manh et al.22 proposed a privacy-preserving model that combines AI with homomorphic encryption. AI-driven detection is placed at blockchain nodes; data from these nodes is encrypted using homomorphic encryption before being sent to the cloud, where a DNN is trained on it. The proposed method achieved detection accuracy nearly identical to unencrypted approaches, with a gap of approximately 0.01. However, homomorphic encryption generally introduces significant computational overhead compared to operations on unencrypted data.
Darshan Ingle and Divyanka Ingle23 proposed a novel model called BC-Trans Network that ensures a robust and tamper-proof authentication mechanism. Fully homomorphic encryption was applied to the CSE-CIC-IDS2018 dataset together with a transformer model. The model achieved an accuracy of 99.25%, a precision of 99.53%, a recall of 99.32%, and an F1 score of 99.59%, with a detection time of 225.3 s, but detection was limited to binary classification of normal versus abnormal traffic.
T. Aljrees et al.24 proposed a paradigm that combines efficient data encryption, the quondam signature algorithm, and federated learning to enhance IoT security. The proposed scheme optimizes time complexity through a synergy of offline phase computations and online phase signature generation. The execution time of the proposed encryption algorithm is 0.034 s. The limitations of this study are that it did not address scalability problems in IoT networks or the practical challenges of implementing encryption and data transmission across diverse IoT environments.
S. Selvarajan et al.25 introduced a model that performs data authentication and attack prevention using a lightweight blockchain algorithm, and attack classification using an AI mechanism. It uses proof-of-work consensus to ensure privacy and a sprinter neural network to predict and classify attacks on the NSL-KDD, DS2OS, BOT-IoT, and UNSW-NB15 datasets. The proposed AILBSM framework reduced execution time, achieving a processing time of 0.6 s, and reached an overall classification accuracy of 99.8%. Its limitation is that encryption at the IoT device level remains unaddressed.
M. Elkhodr et al.26 proposed a model that combines AI-driven orchestration with advanced cryptographic techniques. The model combined classical and post-quantum algorithms with digital twins using a Markov model and a hash-based signature scheme. Simulation results showed processing impact under 0.05%, memory usage under 0.1%, and threat detection rates between 85% and 99%. Quantum-resistant cryptography was not practically implemented in the simulations due to the absence of mature quantum-computing simulation tools.
M. Jarin et al.27 proposed the use of Elliptic Curve Cryptography (ECC) for encrypting cloud data and transmission, employing the NSL-KDD dataset and ML models such as LightGBM and Random Forest (RF). For binary classification tasks, LightGBM achieved an accuracy of 98.71%, while RF achieved 98.51%. A limitation of the study is that it only evaluates binary classification tasks, without incorporating IoT-embedded algorithms or cloud simulations.
R. Yuvarani and R. Mahaveerakannan28 employed a hybrid cryptographic scheme combining symmetric (AES, Blowfish, Twofish) and asymmetric (ECC, RSA) encryption algorithms to secure IoT cloud banking environments. The proposed algorithm is evaluated through experimental simulations on an IoT cloud banking environment, with analysis against various attacks. The approach achieved a 25% improvement in encryption throughput and a 30% reduction in computational overhead versus standalone algorithms, but it involved no AI training.
N. Kashyap et al.29 implemented a hybrid cryptosystem (ECC + AES) on a Raspberry Pi for secure IoT data transmission to the cloud. The methodology combined ECC for secure key exchange with AES for fast encryption, and uploaded files to AWS S3 buckets. They achieved a faster encryption mechanism compared to previous algorithms. Testing appears focused only on the Raspberry Pi; generalizability to other IoT devices is unclear.
K. S. Prasad et al.30 introduced the CASAE-POADMA methodology, which integrates attention-based stacked autoencoders (ASAE) with a Pelican Optimization Algorithm (POA) for the detection and mitigation of cyberattacks. The results, validated on benchmark datasets, demonstrate 99.50% accuracy in detecting and mitigating attacks. However, the validation is limited to these datasets, and there is no real-world IoT network testing included. Table 1 presents the literature summary with key details on the cryptographic algorithms and AI techniques of the related studies.
The main gap addressed by this research is evidenced in the related work: prior studies exhibit at least one of three critical limitations: (a) encryption algorithms evaluated in isolation without realistic transmission simulation, e.g.,27,29, leading to overestimated performance; (b) ML-based detection systems operating on unencrypted data, e.g.,21,24, which compromises confidentiality in real deployments; or (c) evaluation on small, single datasets without cross-dataset validation, as in23, limiting generalizability. The present work directly addresses all three limitations simultaneously.
As shown in Table 2, no prior work simultaneously addresses all five dimensions of the proposed framework. This multi-dimensional integration constitutes the core research contribution of this study.
Methodology
This section outlines the methodology followed in this study to secure IoT data in cloud environments using cryptographic algorithms and ML. The proposed approach involves several key steps, as shown in Fig. 1. The study utilizes two primary IoT datasets, MQTTEEB-D and CIC IoT 2023. The data is preprocessed by cleaning and handling any missing values to ensure the quality of the data. The selection of cryptographic algorithms and ML models in this study follows a principled design rationale grounded in three criteria: (1) computational suitability for resource-constrained IoT devices, (2) cryptographic strength as recognized by public standards bodies such as NIST, and (3) representativeness across a spectrum from lightweight to hybrid approaches. XOR was selected as a minimal-overhead baseline representing ultra-lightweight encryption. ChaCha20 was selected as an IETF-standardized stream cipher (RFC 8439) proven effective in TLS for low-power environments. AES was included as the dominant symmetric standard (FIPS PUB 197). Hybrid AES-RSA was included to evaluate the trade-off of combining symmetric efficiency with asymmetric key protection. The encrypted data is transmitted to the cloud using the MQTT protocol, a lightweight messaging protocol ideal for IoT applications. Similarly, RF, XGBoost, and CatBoost were selected as complementary ensemble approaches with demonstrated strength in network intrusion detection. The study also evaluates the memory and CPU usage of the encryption algorithms to assess their efficiency in IoT environments. The resource consumption is measured using metrics such as ORCS, which combines memory and CPU usage into a single value.
The architecture proposed in this study is motivated by a critical gap in the existing literature: most prior works either evaluate encryption algorithms in isolation without considering real transmission protocols, or apply ML-based intrusion detection on unencrypted traffic without accounting for the confidentiality requirements of operational IoT deployments.
Fig. 1. The proposed methodology for enhancing the encryption algorithms of cloud-based IoT with machine learning.
Research assumptions
The methodology is based on the following explicit assumptions: (1) IoT devices are assumed to be capable of executing symmetric encryption at the packet level, with memory budgets below 1 MB per operation, consistent with mid-range constrained devices such as ESP32 and Arduino Mega. (2) Network transmission is assumed to occur over a public broker (test.mosquitto.org, port 1883) without TLS transport-layer security, so that the encryption layer evaluated in this study represents the sole confidentiality mechanism; this reflects a realistic worst-case IoT deployment scenario. (3) Feature distributions in the MQTTEEB-D and CIC IoT 2023 datasets are assumed to be representative of real-world MQTT-based IoT traffic, consistent with the original dataset publications12,13. (4) CPU and memory measurements are assumed to reflect single-process execution without background load interference, which is controlled by repeating each measurement three times and averaging the results.
Dataset description
In this study, two recent cybersecurity datasets for IoT applications were used. Their records were encrypted and then sent, in simulation, to a cloud environment hosting previously trained ML models.
MQTTEEB-D dataset
The MQTTEEB-D dataset is produced at the International University of Rabat (UIR), Morocco, to improve IDS for Message Queuing Telemetry Transport-based IoT networks12. The data is used for enabling meticulous observation of network traffic anomalies, including DoS, Slow DoS targeting IoT environments (SlowITe), Malformed data injection, brute force, and MQTT publish flooding. It contains 222,813 rows and 13 features, including timestamp, TCP flags, TCP time delta, TCP length, etc., as shown in Table 3.
CIC IoT 2023 dataset
The CIC IoT 2023 dataset is a large-scale benchmark for IoT intrusion detection systems. It was collected from a network of 105 devices13. There are 46 million records covering normal traffic and 33 different attacks that belong to seven attack categories: DDoS, DoS, reconnaissance, web-based, brute-force, spoofing, and the Mirai botnet. There are 46 labeled network flow features in the dataset, provided in both raw PCAP and pre-extracted CSV formats. In this study, the proposed methods were applied to all seven attack categories, but only on a subset of the records because of the dataset's size. Table 4 provides an overall description of the dataset for all types of attacks and the subset used in this study. Approximately three million records were used, which matches the capacity of the hardware available for this study. This subset is still large for the purpose of comparing the encryption algorithms applied before transmission to the cloud.
Dataset preprocessing and feature selection
This section discusses the preprocessing and feature selection processes applied to the two datasets used in the study. The preprocessing process includes data cleaning, handling null values, converting data to appropriate formats, and identifying the most relevant features using ML techniques.
Initially, the null values within each column were checked, and appropriate actions were then taken to handle them31. In the MQTTEEB-D dataset, columns containing values in text format, such as `mqtt_conflags` and `mqtt_hdrflags`, were converted from decimal text format to integer format. The `timestamp` column was converted to the `datetime` type. In the CIC IoT 2023 dataset, the infinite values in the `Rate` column were replaced with the average value. The `Attack_Type` column, which contains attack types as text values, was encoded into integers using the LabelEncoder from the scikit-learn library.
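The two CIC IoT 2023 steps described above can be sketched without any third-party dependency. This is a minimal illustration, not the authors' code: the helper names and the sample values are hypothetical, and the label encoder simply mimics the sorted-class behaviour of scikit-learn's LabelEncoder.

```python
import math


def replace_non_finite_with_mean(values):
    """Replace non-finite entries (inf, -inf, NaN) with the mean of the finite ones."""
    finite = [v for v in values if math.isfinite(v)]
    mean = sum(finite) / len(finite)
    return [v if math.isfinite(v) else mean for v in values]


def label_encode(labels):
    """Map each distinct label to an integer, numbering the labels in sorted order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Hypothetical sample values standing in for the `Rate` and `Attack_Type` columns.
rate = [10.0, float("inf"), 30.0]
attack_type = ["DDoS", "Benign", "DDoS"]

clean_rate = replace_non_finite_with_mean(rate)
codes, classes = label_encode(attack_type)
```

On a full dataset the same operations would normally be done column-wise with a dataframe library; the pure-Python form is used here only to keep the sketch self-contained.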
For feature selection, the RF classifier model was applied to extract the importance of features. This model allows for the identification of the most influential features in classification and prediction. After training the model on the data, feature importance was extracted using the model's `feature_importances_` attribute32.
The top 10 and 30 features were displayed based on their ranking in the MQTTEEB dataset and the CIC IoT 2023 dataset, respectively, as shown in Figs. 2 and 3.
Fig. 2. The most important features selected by the RF classifier on the MQTTEEB dataset.
Fig. 3. The most important features selected by the RF classifier on the CIC IoT 2023 dataset.
Dataset encryption
This section of the study addresses dataset encryption as a necessary measure to protect sensitive information in network and application data related to the IoT. Data encryption involves converting information into a format that can only be read by authorized individuals or systems, using appropriate encryption algorithms to ensure data confidentiality and security before transmission to the cloud environment33.
In the MQTTEEB-D dataset, features suitable for encryption were identified based on their sensitivity, as shown in Table 5. These features contain sensitive information that could reveal details about the communication state or the protocols used for communication between devices.
For the CIC IoT 2023 dataset, the appropriate features for encryption were also identified, as shown in Table 6.
The encryption algorithms used vary depending on security and speed requirements, as well as the resources available in IoT environments. In this study, a range of algorithms were used to secure data, providing a balance between security and efficiency. These algorithms include XOR, ChaCha20, and AES, as well as a hybrid algorithm combining AES and RSA, which offers both high security and efficiency.
Light IoT encrypt (XOR)
The XOR algorithm was used to encrypt data in this study. XOR is one of the simplest encryption algorithms and can be used in low-power environments such as IoT devices. XOR is applied to data using a fixed key to encrypt the information34.
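As a minimal sketch of fixed-key XOR encryption, the function below XORs each data byte with a repeating key. The key bytes and the sample row are hypothetical; the paper does not state the key it used.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"\x5a\xa5"        # hypothetical fixed key
row = b"tcp_len=42"      # hypothetical serialized feature value

ciphertext = xor_bytes(row, key)
recovered = xor_bytes(ciphertext, key)  # XOR with the same key undoes the encryption
```

Because XOR is its own inverse, one function serves for both encryption and decryption, which is what makes the scheme so cheap on constrained devices.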
ChaCha20 encryption algorithm
ChaCha20 is a modern, high-performance stream cipher that encrypts data using an Add-Rotate-XOR (ARX) design. It is widely used for its speed and strong security, particularly in applications like the TLS protocol for securing internet communications35. ChaCha20 uses a 256-bit (32-byte) key and a 128-bit (16-byte) nonce. In software on low-power hardware without dedicated AES acceleration, ChaCha20 is typically faster than traditional algorithms like AES while offering comparable security.
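To make the Add-Rotate-XOR idea concrete, the sketch below implements only the ChaCha quarter round, the building block that the full cipher applies repeatedly to its 16-word state. It is an illustration of the ARX structure, not a complete or production cipher; the input words are the quarter-round test vector from RFC 8439, Sect. 2.1.1.

```python
MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a: int, b: int, c: int, d: int):
    """One ChaCha quarter round: four add / XOR / rotate steps."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
# → (0xEA2A92F4, 0xCB1CF8CE, 0x4581472E, 0x5881C4BB)
```

Only additions, rotations, and XORs appear, which is why the cipher runs efficiently on microcontrollers that lack hardware AES support.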
AES
AES is a symmetric encryption algorithm used for securing data, meaning it uses the same key to encrypt and decrypt information36. It works by processing data in 128-bit blocks and uses a key size of 128, 192, or 256 bits. AES is highly efficient and widely used globally for applications ranging from file and device encryption to secure communication and wireless networks.
A Hybrid AES and RSA algorithm
The hybrid algorithm used combines AES and RSA encryption. AES is used to encrypt the actual data due to its efficiency in fast encryption. The AES key is then encrypted using RSA to ensure the key’s security. This solution offers a balance between speed and security and is effective in IoT environments that may contain devices with limited resources37.
Initially, a random AES key is generated and used to encrypt the data in CBC mode. Next, an RSA key pair is generated, with the public key used to encrypt the AES key. Upon receiving the encrypted data, the AES key is decrypted using the private key. This method ensures data security by combining AES for data encryption and RSA for AES key protection. Algorithm 1 shows the flow diagram of the hybrid encryption algorithm using AES and RSA encryption.
Algorithm 1. A hybrid AES and RSA algorithm.
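The hybrid flow described above can be sketched with the third-party `cryptography` package. This is an illustrative sketch under stated assumptions rather than the authors' implementation: AES-256 in CBC mode and RSA-2048 with OAEP follow the parameters named in the paper, while PKCS7 padding, SHA-256 inside OAEP, and all function names are assumptions made for the example.

```python
import os

from cryptography.hazmat.primitives import hashes, padding as sym_padding
from cryptography.hazmat.primitives.asymmetric import padding as asym_padding, rsa
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# Assumption: OAEP with SHA-256; the paper names OAEP but not the hash.
OAEP = asym_padding.OAEP(
    mgf=asym_padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def hybrid_encrypt(plaintext: bytes, public_key):
    """Encrypt data with a fresh AES-256-CBC key, then wrap that key with RSA."""
    aes_key = os.urandom(32)   # random AES session key
    iv = os.urandom(16)        # CBC initialization vector
    padder = sym_padding.PKCS7(128).padder()  # assumption: PKCS7 block padding
    padded = padder.update(plaintext) + padder.finalize()
    encryptor = Cipher(algorithms.AES(aes_key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()
    wrapped_key = public_key.encrypt(aes_key, OAEP)
    return wrapped_key, iv, ciphertext


def hybrid_decrypt(wrapped_key: bytes, iv: bytes, ciphertext: bytes, private_key) -> bytes:
    """Unwrap the AES key with the RSA private key, then decrypt the data."""
    aes_key = private_key.decrypt(wrapped_key, OAEP)
    decryptor = Cipher(algorithms.AES(aes_key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = sym_padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()


private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
message = b"tcp_flags=0x018;mqtt_msgtype=3"  # hypothetical serialized row

wrapped_key, iv, ciphertext = hybrid_encrypt(message, private_key.public_key())
recovered = hybrid_decrypt(wrapped_key, iv, ciphertext, private_key)
```

In a deployment the RSA key pair would be generated once on the receiving side, so the device only performs the cheap AES step and one RSA public-key operation per session key.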
The framework optimizes lightweight and hybrid encryption for IoT resource constraints through a tiered selection strategy guided by the ORCS metric. Moving from XOR to hybrid AES-RSA increases memory-per-row by only 2.79× on MQTTEEB-D (0.0452 → 0.126 KB) while gaining asymmetric key protection and forward secrecy through ephemeral AES session key generation. This security gain justifies the resource cost for any deployment handling sensitive IoT data. The specific trade-offs across all four algorithms are quantified in Sect. 4 via ORCS, enabling objective resource-aware algorithm selection without exceeding the memory budgets of mid-range IoT devices.
Encryption algorithms hyperparameters
The key hyperparameters for the encryption algorithms used in this study are outlined in Table 7. Each algorithm has specific configurations, such as key sizes, operation modes, padding schemes, and nonce values, that affect both the security and efficiency of data encryption. By specifying these hyperparameters, we ensure that the encryption methods are applied consistently and can be reproduced in future studies or implementations.
Dataset splitting
The datasets used in the study were divided into training and testing sets using the traditional 80/20 ratio: 80% of the data was used for training, while 20% was allocated for testing, with the aim of measuring the model's performance. Table 8 shows the division of the MQTTEEB-D dataset, with the data distributed according to traffic type. The table includes legitimate traffic and various types of attacks, such as DoS and DDoS attacks, along with the number of records for each type and the number of records allocated for training and testing.
Table 9 shows the breakdown of the CIC IoT 2023 dataset, which contains a wide range of attack types such as DDoS, Brute Force, Mirai, and others. This dataset is also divided into 80% for training and 20% for testing. The table details the number of records for each attack type and the number of records allocated to the training and testing sets.
The 80/20 train-test split was adopted as a standard partition ratio widely validated in the intrusion detection literature38. Stratified splitting was applied to preserve the attack-type class distribution in both partitions, as shown in Tables 8 and 9, ensuring that minority attack classes remain represented in the test set and preventing optimistic bias in accuracy estimation.
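The stratified 80/20 partition can be illustrated with a dependency-free sketch that splits each class separately so that class proportions carry over to both partitions. The function name, the seed, and the toy label counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy label column: 50 legitimate, 30 DoS, and 20 brute-force records.
labels = ["legitimate"] * 50 + ["dos"] * 30 + ["bruteforce"] * 20
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
```

With scikit-learn, the same effect is obtained by passing the label column to the `stratify` argument of `train_test_split`.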
AI models
This study reviews a range of AI models used to analyze data and classify attacks in IoT environments. RF, XGBoost, CatBoost, and ensemble voting classifier models, along with an ensemble bagging classifier, were employed to enhance prediction accuracy.
Random forest
A random forest is an ensemble learning method that combines multiple decision trees to create a more accurate and stable model for tasks like classification and regression39. It works by training each decision tree on a random subset of the data, and each tree makes its own prediction. For classification, the final prediction is determined by a majority vote over the individual tree predictions.
XGBoost classifier
XGBoost, or eXtreme Gradient Boosting, is an open-source machine learning library for creating efficient gradient-boosted decision trees for classification and regression40. It is known for its speed, scalability, and accuracy, and it improves upon standard gradient boosting by using parallel processing, regularization, and optimized tree-building methods.
CatBoost classifier
CatBoost, short for categorical boosting, is an open-source gradient boosting algorithm developed by Yandex41. It is designed to handle a wide range of machine learning tasks, including classification, regression, and ranking, with a particular strength in dealing with categorical features.
Ensemble voting classifier
Ensemble voting is an ML technique where multiple models are trained on the same data, and their individual predictions are combined to make a final prediction42. This method leverages the “wisdom of crowds” to improve accuracy by averaging out errors from individual models and is used for both classification and regression problems43. The final prediction is determined by a voting scheme, with the most common being hard voting and soft voting. The voting classifier model was used as an ensemble learning tool to improve classification accuracy by combining the three previous ML models. This model is based on the idea of combining several separate models to classify data with soft voting.
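Soft voting can be sketched as averaging the class-probability vectors produced by the individual models and choosing the class with the highest mean. The probability values below are hypothetical and chosen so that soft voting and hard voting disagree.

```python
def soft_vote(probabilities_per_model):
    """Average per-class probabilities across models; return (winning class, averages)."""
    n_models = len(probabilities_per_model)
    n_classes = len(probabilities_per_model[0])
    averaged = [
        sum(p[c] for p in probabilities_per_model) / n_models
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Hypothetical class probabilities for one sample from the three base models.
rf_proba = [0.90, 0.10, 0.00]
xgb_proba = [0.40, 0.60, 0.00]
cat_proba = [0.45, 0.55, 0.00]

winner, averaged = soft_vote([rf_proba, xgb_proba, cat_proba])
# Hard voting would pick class 1 (two of three models prefer it);
# soft voting picks class 0 because the first model is far more confident.
```

In scikit-learn this behaviour corresponds to `VotingClassifier` with `voting="soft"`.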
Ensemble bagging classifier
Bagging, or bootstrap aggregating, is an ensemble learning technique that improves ML model stability and accuracy by training multiple models on different subsets of the training data44,45. It works by creating these subsets through bootstrap sampling, training a model on each subset, and then combining their predictions through majority voting for classification to reduce variance and prevent overfitting. In these experiments, RF is used as the base classifier and passed to the bagging classifier.
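The bootstrap-then-vote mechanism can be shown in a few lines. To keep the sketch short, a one-dimensional nearest-neighbour rule is swapped in as the base learner in place of the RF used in the study; the data points, function names, and seed are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def nearest_neighbor_predict(train_rows, x):
    """Base learner (stand-in for RF): label of the closest training point."""
    return min(train_rows, key=lambda row: abs(row[0] - x))[1]


def bagging_predict(rows, x, n_estimators=25, seed=0):
    """Fit one base learner per bootstrap sample and take the majority vote."""
    rng = random.Random(seed)
    votes = Counter(
        nearest_neighbor_predict(bootstrap_sample(rows, rng), x)
        for _ in range(n_estimators)
    )
    return votes.most_common(1)[0][0]


# Toy one-feature dataset: (feature value, label).
rows = [
    (0.1, "normal"), (0.2, "normal"), (0.3, "normal"),
    (0.9, "attack"), (1.0, "attack"), (1.1, "attack"),
]
high = bagging_predict(rows, 0.95)
low = bagging_predict(rows, 0.15)
```

Each estimator sees a slightly different resampled dataset, so individual errors tend to cancel in the vote, which is the variance reduction that bagging aims for.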
ML models hyperparameters
The hyperparameters of the ML models were evaluated and fine-tuned with specific settings to optimize performance. These hyperparameters, such as the number of estimators, learning rate, depth, and sample size, play a crucial role in controlling model complexity, improving accuracy, and ensuring robust performance across different tasks. By detailing these hyperparameters, we provide a transparent view of the model configuration and ensure reproducibility of the study, as shown in Table 10.
Sending data via Cloud-MQTT
MQTT was used as a means of transferring data between devices and cloud servers46. MQTT is a lightweight and efficient protocol used in IoT systems, enabling easy message sending and receiving between interconnected devices on a network47,48. MQTT was used to transfer encrypted data between two or more devices, ensuring data security during transmission across networks.
The connection was established using the paho.mqtt.client library, which is compatible with the MQTT protocol and supports client-broker connections. In this experiment, test.mosquitto.org was used as the MQTT broker with port 1883, the default port for unencrypted MQTT. Encrypted data was published via MQTT to the designated IoT data topic using the client.publish() function. Figure 4 shows the flow chart diagram for sending IoT data, encryption process, receiving, and predicting in the Cloud-MQTT.
Fig. 4. The flow chart diagram of the proposed method for sending encrypted IoT data, connecting to the MQTT cloud, receiving and decrypting, AI predicting, and analyzing the network traffic.
The MQTT transmission was configured with the following parameters: QoS level 0 (at-most-once delivery) was used to reflect the low-latency, best-effort delivery characteristic of typical IoT sensor data. The payload for each MQTT message consisted of a single encrypted row serialized as a byte string. The broker test.mosquitto.org was used for simulation purposes; it is a publicly available test broker that does not guarantee message persistence, which is appropriate for experimental evaluation but should be replaced with a secured private broker in production deployments.
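A possible shape for the publishing step is sketched below. The broker, port, and QoS level come from the description above; the topic name and the JSON/base64 framing of the payload are assumptions, and the sketch uses the `paho.mqtt.publish.single` convenience helper rather than the `client.publish()` call used in the study.

```python
import base64
import json

BROKER_HOST = "test.mosquitto.org"  # public test broker named in the paper
BROKER_PORT = 1883                  # default port for unencrypted MQTT
TOPIC = "iot/encrypted-rows"        # hypothetical topic name


def build_payload(row_id: int, ciphertext: bytes) -> bytes:
    """Serialize one encrypted row as a byte string (assumed JSON + base64 framing)."""
    body = {"id": row_id, "data": base64.b64encode(ciphertext).decode("ascii")}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def parse_payload(payload: bytes):
    """Inverse of build_payload, as the cloud-side subscriber would apply it."""
    body = json.loads(payload.decode("utf-8"))
    return body["id"], base64.b64decode(body["data"])


def publish_row(row_id: int, ciphertext: bytes) -> None:
    """Send one encrypted row with QoS 0 (at-most-once delivery)."""
    # Imported here so the serialization helpers work without paho-mqtt installed.
    import paho.mqtt.publish as publish

    publish.single(
        TOPIC,
        payload=build_payload(row_id, ciphertext),
        qos=0,
        hostname=BROKER_HOST,
        port=BROKER_PORT,
    )


payload = build_payload(7, b"\x00\xffencrypted-bytes")
row_id, data = parse_payload(payload)
```

Calling `publish_row(7, ciphertext)` would open a connection, publish one message, and disconnect; a long-running device would more likely keep a single client connection open and call `client.publish()` per row.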
When integrated with the hybrid AES-RSA scheme, MQTT communication gains three security properties: (1) payload confidentiality through AES-256-CBC encryption, (2) key integrity through RSA-2048-OAEP protection of the AES session key, and (3) forward secrecy through ephemeral AES key generation per session.
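The hybrid construction named above (AES-256-CBC for the payload, RSA-2048-OAEP for the session key) can be illustrated with the following sketch. It assumes the third-party `cryptography` package, PKCS7 padding, and SHA-256 inside OAEP; the source does not state which library or padding parameters were used, and the function names are hypothetical.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives import padding as sym_padding
from cryptography.hazmat.primitives.asymmetric import padding as asym_padding
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# OAEP parameters (SHA-256 is an assumption, not taken from the source).
OAEP = asym_padding.OAEP(
    mgf=asym_padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def new_session(public_key):
    """Create a random 32-byte AES key and wrap it once with RSA-OAEP."""
    aes_key = os.urandom(32)
    return aes_key, public_key.encrypt(aes_key, OAEP)


def encrypt_row(aes_key: bytes, row: bytes) -> bytes:
    """AES-256-CBC with a fresh random IV; the IV is prepended to the ciphertext."""
    iv = os.urandom(16)
    padder = sym_padding.PKCS7(128).padder()
    padded = padder.update(row) + padder.finalize()
    encryptor = Cipher(algorithms.AES(aes_key), modes.CBC(iv)).encryptor()
    return iv + encryptor.update(padded) + encryptor.finalize()


def decrypt_row(private_key, wrapped_key: bytes, blob: bytes) -> bytes:
    """Unwrap the AES key with RSA, then reverse the CBC encryption."""
    aes_key = private_key.decrypt(wrapped_key, OAEP)
    iv, ciphertext = blob[:16], blob[16:]
    decryptor = Cipher(algorithms.AES(aes_key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = sym_padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

In this sketch the RSA operation runs once per session in `new_session`, while `encrypt_row` runs once per traffic record, which mirrors the per-session versus per-row cost split discussed for the hybrid scheme.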
Evaluation
The evaluation of the proposed AI models aims to measure their performance accuracy using a set of indicators and metrics, as in our previous work49. These metrics contribute to a proper understanding of pattern recognition.
Additionally, the study evaluated encryption algorithms used to ensure data security during transmission over cloud networks, particularly in IoT environments, focusing on measuring memory consumption and CPU usage.
Performance metrics
The performance of the AI models is measured using common metrics: the confusion matrix, accuracy, recall, precision, and F1-score.
The confusion matrix is a table showing correct and incorrect predictions, broken down into True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
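Accuracy: The proportion of all predictions that are correct, determined by the following formula:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Recall: The proportion of actual positive cases that are correctly identified, which is important when false negatives are costly.
$$\text{Recall} = \frac{TP}{TP + FN}$$
Precision: The proportion of positive predictions that are correct, which is important when false positives are costly.
$$\text{Precision} = \frac{TP}{TP + FP}$$
F1-Score: The harmonic mean of precision and recall, useful for imbalanced datasets.
$$\text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$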
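As a worked illustration of these four metrics, the sketch below computes them from the confusion-matrix counts of a single class. The function name and the example counts are hypothetical; for the multi-class datasets used in this study, the same computation would be applied per class and then averaged, and the averaging scheme is not specified here.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall; zero when both are zero.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only (not taken from the study's tables).
example = classification_metrics(tp=8, tn=5, fp=2, fn=5)
```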
Encryption algorithms metrics for resource consumption
A set of encryption algorithms was implemented to secure data before it was sent to the cloud. The goal of using these algorithms was to ensure data security during transmission across the network, especially in IoT environments that may have limited resources. However, the importance of these algorithms extends beyond data security; resource consumption during encryption must also be measured.
The primary metrics used to assess resource consumption are memory and CPU usage, which were measured programmatically. The memory_usage() function of the memory_profiler library was used to measure memory consumption before and after encryption, and the difference between the maximum and minimum memory consumption was calculated to determine each algorithm's impact on memory usage. For CPU usage, the initial value is read with psutil.cpu_percent() before encryption is executed, the encryption is then performed using the proposed algorithms described in Sect. 3.3, and the final CPU usage is read to calculate the difference between the usage before and after the process50.
Overall Resource Consumption Score (ORCS).
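Memory and CPU usage are measured in different units (MiB for memory and percentage for CPU), so the values must be brought onto the same scale51. This ensures that both parameters have a comparable effect on the total resource use. Each usage value is converted to a scale from 0 to 1, with 0 representing the lowest usage and 1 representing the highest, by using the following formula:
$$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
where X is the measured memory or CPU usage of an algorithm, and X_min and X_max are the lowest and highest values observed among the compared algorithms.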
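After both memory and CPU utilization are normalized, the two metrics are combined into a single ORCS, computed as a weighted average52,53:
$$\text{ORCS} = w_M \cdot M_{\text{norm}} + w_C \cdot C_{\text{norm}}, \qquad w_M + w_C = 1$$
where M_norm and C_norm are the normalized memory and CPU usage, and w_M and w_C are their respective weights.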
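The normalization and weighted-average steps can be sketched as follows. This is an illustrative sketch only: the equal weights of 0.5 are an assumption (the weights are not stated in this section), the function names are hypothetical, and the input values are made up rather than taken from Tables 11 or 13.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; 0 is the lowest usage and 1 the highest."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All algorithms used the same amount; no spread to normalize.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def orcs(memory: Sequence[float], cpu: Sequence[float],
         w_mem: float = 0.5, w_cpu: float = 0.5) -> List[float]:
    """Weighted average of normalized memory and CPU usage per algorithm."""
    mem_n, cpu_n = min_max(memory), min_max(cpu)
    return [w_mem * m + w_cpu * c for m, c in zip(mem_n, cpu_n)]


# Made-up measurements for three hypothetical algorithms.
scores = orcs(memory=[2.0, 4.0, 6.0], cpu=[30.0, 10.0, 20.0])
```

Because min-max scaling is relative to the set of algorithms being compared, a score computed this way changes whenever an algorithm is added to or removed from the comparison.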
To ensure measurement reliability and reproducibility, resource consumption was recorded as follows: (1) Prior to each encryption run, a garbage collection call (gc.collect()) was invoked to clear residual memory allocations. (2) CPU percentage was sampled using psutil.cpu_percent() with an interval of 0.1 s before and immediately after the encryption call, with the CPU delta computed as the difference. (3) Memory was measured using the memory_profiler library’s memory_usage() function with a sampling interval of 0.01 s, capturing peak memory during the encryption process. (4) Each measurement was repeated three times per algorithm per dataset, and the mean value was reported. (5) All experiments were conducted on the hardware described in Sect. 3.7 with no concurrent processes running to minimize measurement noise.
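The measurement procedure can be approximated with the standard library as shown below. Note the substitutions: this sketch uses `tracemalloc` for peak memory and `time.process_time()` for CPU time as stand-ins for the `memory_profiler` and `psutil` calls named in the text, so its numbers are not comparable with the reported MiB and CPU-percentage values. The `measure` helper and the toy `xor_encrypt` function are hypothetical.

```python
import gc
import time
import tracemalloc
from statistics import mean
from typing import Callable, Dict


def measure(encrypt: Callable[[bytes], bytes], data: bytes,
            repeats: int = 3) -> Dict[str, float]:
    """Run `encrypt` several times and report mean peak memory and CPU time."""
    peaks_kib, cpu_seconds = [], []
    for _ in range(repeats):
        gc.collect()                      # clear residual allocations first
        tracemalloc.start()
        start = time.process_time()
        encrypt(data)
        cpu_seconds.append(time.process_time() - start)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks_kib.append(peak / 1024)     # bytes to KiB
    return {"peak_kib": mean(peaks_kib), "cpu_s": mean(cpu_seconds)}


def xor_encrypt(data: bytes, key: int = 0x5A) -> bytes:
    """Toy single-byte XOR cipher, used here only as a workload."""
    return bytes(b ^ key for b in data)


result = measure(xor_encrypt, bytes(range(256)) * 100)
```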
The McNemar’s statistical significance testing
To assess whether the accuracy differences between classifiers are statistically significant rather than attributable to random variation in test set partitioning, McNemar’s test was applied. McNemar’s test is a non-parametric statistical test designed specifically for comparing two classifiers on the same test set, making it the standard method for paired classifier comparison in machine learning evaluation54.
For two classifiers A and B evaluated on the same n test instances, a 2 × 2 contingency table is constructed.
| | B Correct | B Wrong |
|---|---|---|
| A Correct | a | b |
| A Wrong | c | d |
Here, a is the number of instances correctly classified by both A and B, b is the number correctly classified by A but not B, c is the number correctly classified by B but not A, and d is the number misclassified by both A and B.
Only the discordant cells b and c are informative for the test. The null hypothesis H₀ states that both classifiers have the same error rate, i.e., b = c. The McNemar test statistic with continuity correction (Yates’ correction) is computed as:
$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$$
Under H₀, this statistic follows a chi-squared distribution with one degree of freedom (χ²₁). The p-value is obtained as:
$$p = 1 - F(\chi^2; 1)$$
where F is the cumulative distribution function of the chi-squared distribution. The null hypothesis is rejected at significance level α = 0.05, when p < 0.05, indicating that the observed accuracy difference between the two classifiers is statistically significant. In this study, the voting ensemble classifier is designated as the reference classifier and compared against all other classifiers on each dataset separately.
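The test can be sketched with the Python standard library alone, using the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(√(χ²/2)). The function names are hypothetical. As a consistency check, feeding in the χ² values reported in this study (5.9550 and 3.5840) gives p-values that agree with the reported 0.0147 and 0.0583 to within rounding.

```python
import math
from typing import Sequence, Tuple


def mcnemar_statistic(b: int, c: int) -> float:
    """Continuity-corrected McNemar statistic from the discordant counts."""
    if b + c == 0:
        return 0.0  # no discordant pairs: no evidence of a difference
    return (abs(b - c) - 1) ** 2 / (b + c)


def chi2_1df_p_value(statistic: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def mcnemar(correct_a: Sequence[bool],
            correct_b: Sequence[bool]) -> Tuple[float, float]:
    """Compare two classifiers given per-instance correctness flags."""
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    stat = mcnemar_statistic(b, c)
    return stat, chi2_1df_p_value(stat)
```

The inputs are boolean vectors marking whether each classifier predicted each test instance correctly, so the same test set must be used for both classifiers.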
Results and discussion
This section presents the evaluation of the proposed ML models’ performance and cryptographic algorithms applied to the MQTTEEB and CIC IoT 2023 datasets. The study utilized a combination of AI models to detect and classify various attack types in IoT networks. Additionally, the impact of different encryption algorithms on data security and resource usage was assessed to understand how they influence the overall performance of IoT systems. The results indicate that the combination of ML with encryption algorithms contributes to improving both the detection accuracy and the security of IoT systems. This section also evaluates the computational efficiency of various encryption algorithms, such as XOR, ChaCha20, AES, and the hybrid AES-RSA, in terms of memory consumption and CPU usage, shedding light on their suitability for resource-constrained IoT environments.
The results of AI models and cryptographic analysis on the MQTTEEB dataset
Analysis of memory and CPU usage for cryptographic Algorithms
The memory and CPU usage of the different encryption algorithms were evaluated on the test set of 44,514 rows and 10 columns, as shown in Table 11. Resource consumption was measured in terms of total memory usage, memory usage per traffic record (per row), CPU usage, and the combined ORCS.
The memory consumption disparity between XOR (1.97 MiB) and AES (23.80 MiB) on MQTTEEB-D, a factor of approximately 12×, can be attributed to fundamental differences in algorithmic architecture. XOR operates as a stateless bitwise operation with O(1) memory complexity per byte, requiring no internal state beyond the key. AES-CBC, by contrast, maintains chaining state across 128-bit block boundaries, generates and stores an initialization vector (IV) per encrypted message, and applies the four round transformations (SubBytes, ShiftRows, MixColumns, and AddRoundKey) over multiple rounds, each requiring intermediate buffer allocation. This may explain why AES memory scales super-linearly with dataset size: comparing MQTTEEB-D (23.80 MiB for 44,514 rows) with CIC IoT 2023 (857.23 MiB for 605,839 rows) yields memory-per-row values of 0.547 KB and 1.441 KB respectively, a 2.6× increase in per-row cost as dataset size grows, which suggests non-trivial memory fragmentation overhead at scale.
ML performance for training and testing on the cloud
The performance of five different ML models, including RF, XGBoost, CatBoost, voting ensemble, and bagging classifier, was evaluated using the MQTTEEB dataset after sending the encrypted IoT traffic to the cloud, as shown in Table 12.
The superiority of the voting ensemble (92.68%) over individual classifiers RF (92.40%), XGBoost, and CatBoost on MQTTEEB-D is consistent with ensemble learning theory: by combining the probability outputs of three diverse base learners through soft voting, the ensemble averages out individual classifier errors, particularly on borderline instances near decision boundaries. The marginal accuracy gap between the voting ensemble and Bagging Classifier (92.68% vs. 92.44%) suggests that the diversity introduced by combining heterogeneous classifiers (RF + XGBoost + CatBoost) provides only a modest advantage over homogeneous bagging (multiple RF instances) on this dataset, likely because MQTTEEB-D’s relatively small feature space (10 selected features) limits the diversity potential of heterogeneous ensembles.
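Soft voting as described here averages the class-probability vectors of the base learners and predicts the class with the highest mean probability. The following sketch shows that mechanism in plain Python with made-up probabilities; it does not reproduce the study's trained RF, XGBoost, and CatBoost models, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_vote(probabilities: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average per-class probabilities across classifiers and pick the argmax."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [
        sum(model[k] for model in probabilities) / n_models
        for k in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Made-up outputs for one borderline instance with two classes.
# The first model leans toward class 1, the other two toward class 0.
label, mean_probs = soft_vote([
    [0.45, 0.55],
    [0.70, 0.30],
    [0.60, 0.40],
])
```

In this example the single dissenting model is outvoted because the other two are more confident, which is the error-averaging effect the paragraph refers to.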
Examining the confusion matrices in Fig. 5, the dominant true positive class (21,049 for voting ensemble) corresponds to the largest class in the dataset (legitimate traffic), confirming that high overall accuracy is partially driven by class imbalance. The false negative patterns, instances of attacks misclassified as legitimate, are more diagnostically informative: the confusion matrix shows that SlowITe and Malformed data injection attacks account for the majority of misclassifications, which is attributable to their behavioral similarity to legitimate traffic patterns in terms of TCP timing and flag distributions. This finding suggests that future work should focus feature engineering specifically on temporal inter-packet gap features to better discriminate slow-rate attacks.
The confusion matrix of the ML models’ prediction on the MQTTEEB dataset: (a) Bagging classifier model, and (b) Voting ensemble classifier.
The results of AI models and cryptographic analysis on the CIC IoT 2023 dataset
Analysis of memory and CPU usage for cryptographic Algorithms
The second experiment uses the CIC IoT 2023 dataset, with 605,839 rows and 30 columns, as a test set to evaluate the four proposed encryption algorithms. As observed in Table 13, the XOR-based lightweight IoT encryption algorithm is the least demanding in terms of memory, consuming only 89.46 MiB for the entire dataset, which translates to 0.150 KB per row, and shows minimal CPU usage (4.49%). However, its security is weaker than that of the other algorithms. ChaCha20 offers better security but requires more memory (413.51 MiB, 0.696 KB per row) and shows a slightly reduced CPU usage (−2.09%). AES consumes the most memory (857.23 MiB, 1.441 KB per row), making it less suitable for resource-constrained environments. The hybrid AES-RSA scheme’s relatively low memory footprint (5.53 MiB on MQTTEEB-D, 221.21 MiB on CIC IoT 2023) despite its cryptographic complexity is explained by its architectural design: RSA operations are applied only to the AES session key (a fixed 32-byte object), not to the data payload itself. RSA’s computationally expensive modular exponentiation is therefore performed once per session rather than once per row, and its memory cost is amortized across the entire dataset. This finding has a direct practical implication: hybrid AES-RSA is more memory-scalable than standalone AES for large IoT traffic volumes, contrary to the intuitive assumption that adding asymmetric encryption always increases overhead.
When comparing these algorithms based on the ORCS, which normalizes both memory and CPU usage into a single metric, the hybrid AES-RSA algorithm performs well with a score of 0.5425. This suggests it provides a reasonable compromise between security and resource efficiency, making it suitable for more robust IoT systems that can handle higher resource consumption.
ML performance for training and testing on the cloud
The performance of the proposed ML models on the CIC IoT 2023 dataset is presented in Table 14. Among the models, the voting ensemble achieves the highest accuracy and recall at 81.09%, followed closely by XGBoost with an accuracy and recall of 80.94%. The voting ensemble also performs well in precision (81.76%) and F1-score (80.76%). RF shows slightly lower results, with accuracy at 80.35%, precision at 80.92%, and an F1-score of 80.04%. CatBoost achieves the lowest F1-score (80.01%), with accuracy at 80.39% and precision at 81.09%, but it still performs competitively. Overall, the results suggest that the voting ensemble and XGBoost provide the best balance of accuracy, recall, and precision.
The drop in accuracy from 92.68% on MQTTEEB-D to 81.09% on CIC IoT 2023, observed across all classifiers, is analytically significant and attributable to three compounding factors rather than model weakness. First, dataset complexity: CIC IoT 2023 contains 33 distinct attack types across 7 categories, compared to 5 attack types in MQTTEEB-D. The larger class space increases the probability of inter-class confusion, particularly among structurally similar attack families such as DDoS subtypes and DoS variants. Second, encrypted feature degradation: the CIC IoT 2023 dataset uses 30 selected features (vs. 10 in MQTTEEB-D), meaning more features underwent encryption transformation. The byte-to-float re-encoding of encrypted values introduces information loss that is proportionally larger when more features are encrypted, as correlation structures between features are disrupted by independent encryption of each column. Third, class imbalance severity: as shown in Table 8, the CIC IoT 2023 subset contains highly imbalanced attack distributions. Mirai botnet traffic constitutes a disproportionate share of records, causing classifiers to optimize for majority classes at the expense of rare attack detection. This is reflected in the gap between accuracy (81.09%) and F1-score (80.76%) for the voting ensemble, indicating that minority class recall is lower than overall accuracy suggests. Crucially, despite these challenges, the fact that all classifiers maintain accuracy above 80% on encrypted traffic from 605,839 test records demonstrates that the proposed framework’s encryption layer does not catastrophically degrade ML detection capability, a key validation of the framework’s practical viability.
The confusion matrices shown in Fig. 6 display the performance of the best two ensemble classifiers, bagging and voting models, on the CIC IoT 2023 dataset. Both models were assessed based on their predictions across various attack categories, including DDoS, DoS, reconnaissance, spoofing, Mirai, and web-based attacks.
The confusion matrix of the ML models’ prediction on the CIC IoT 2023 dataset: (a) Bagging classifier model, and (b) Voting ensemble classifier.
A cross-dataset comparison reveals a consistent pattern: the relative ranking of classifiers is preserved across both datasets (the voting ensemble consistently ranks first or second), which suggests that the performance differences are driven by dataset characteristics rather than model instability. This rank consistency across structurally different datasets (one with 13 features and 222 K records, the other with 46 features and 46 M records) provides empirical evidence that the proposed framework generalizes beyond a single benchmark, directly addressing a common limitation in prior IoT security literature where single-dataset evaluation limits generalizability claims.
Memory and CPU consumption discussion
The results suggest that for IoT environments, which typically have strict limitations on both memory and processing power, XOR is the most suitable algorithm in terms of resource consumption. However, when higher security is required, ChaCha20 provides a good balance, offering better security than XOR without a significant increase in resource consumption. AES and hybrid AES-RSA, on the other hand, are best suited to more powerful systems, where both memory and CPU resources are more plentiful, as shown in Fig. 7. The memory consumption per row and the ORCS metrics clearly show that XOR is the most resource-efficient, making it ideal for systems with limited capabilities. ChaCha20 and hybrid AES-RSA offer a good balance between security and resource consumption, making them suitable for systems that require high security with reasonable resource efficiency.
These algorithms can be used in scenarios where security is the top priority and the trade-offs in resource usage are acceptable. On IoT devices, they may still be suitable for embedding and for encrypting transmitted data row by row, because the size of each traffic record is commensurate with the resources of such devices. Alternatively, the data can be transmitted in batches, or processing can be offloaded to more powerful devices, as in edge computing.
Comparison of encryption algorithms’ performance on IoT and cloud data, displaying memory usage, memory per traffic, and ORCS for the encrypted algorithms.
From a practical deployment perspective, the results suggest a tiered recommendation for IoT system designers. For ultra-constrained devices (memory < 100 KB, e.g., Arduino Uno class), XOR encryption combined with the RF classifier provides the minimum viable security configuration with negligible resource overhead. For mid-range devices (memory 256 KB to 1 MB, e.g., ESP32 class), ChaCha20 with the voting ensemble provides significantly stronger cryptographic security at a 2× memory cost over XOR, with equivalent detection accuracy. For gateway-level devices or edge nodes with more than 4 MB of memory, hybrid AES-RSA with the voting ensemble represents the optimal configuration, providing asymmetric key protection, forward secrecy, and the highest detection accuracy, at an ORCS of 0.56, which remains within acceptable bounds for gateway-class hardware.
Regarding resilience against common IoT attacks: at the cryptographic layer, AES-CBC random IV generation prevents replay attacks since identical plaintexts produce distinct ciphertexts across sessions, and RSA-OAEP padding prevents chosen-ciphertext attacks. At the ML detection layer, the ensemble classifiers were trained on datasets explicitly containing flooding (DDoS, MQTT publish flooding), spoofing (DNS and ARP spoofing in CIC IoT 2023), and brute force attacks, achieving detection accuracies of 92.68% and 81.09% respectively. Formal adversarial testing against man-in-the-middle attacks at the network layer represents an important direction for future work.
Scalability analysis
The two experimental datasets differ in test set size by a factor of 13.6× (44,514 vs. 605,839 rows), enabling a quantitative scalability analysis of the encryption algorithms, as shown in Table 15. The scaling factor for ChaCha20 (7.12×) differs markedly from the dataset size ratio (13.6×), suggesting that ChaCha20’s nonce management overhead scales non-linearly with record count, an important finding for large-scale IoT deployments not previously reported in the literature. Hybrid AES-RSA exhibits the most favorable scaling behavior among security-grade algorithms (2.97× memory growth for 13.6× data growth), further supporting its selection as the recommended algorithm for production IoT systems.
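The ratios quoted in this analysis can be checked with a few lines of arithmetic. The sketch below uses only figures stated elsewhere in this paper (test-set sizes and per-row memory in KB). Reading the 2.97× factor as growth in per-row memory (0.126 KB to 0.374 KB) is an interpretation on our part, since Table 15 is not reproduced here; the variable names are hypothetical.

```python
# Test-set sizes reported for the two datasets.
ROWS = {"MQTTEEB-D": 44_514, "CIC IoT 2023": 605_839}

# Per-row memory in KB as (MQTTEEB-D, CIC IoT 2023), taken from the text.
PER_ROW_KB = {
    "AES": (0.547, 1.441),
    "Hybrid AES-RSA": (0.126, 0.374),
}

# How many times larger the second test set is than the first.
size_ratio = ROWS["CIC IoT 2023"] / ROWS["MQTTEEB-D"]

# Growth of the per-row memory cost between the two datasets.
per_row_growth = {
    name: large / small for name, (small, large) in PER_ROW_KB.items()
}

for name, growth in per_row_growth.items():
    print(f"{name}: per-row memory grows {growth:.2f}x "
          f"for a {size_ratio:.1f}x larger test set")
```

A per-row growth factor of exactly 1× would correspond to total memory scaling linearly with the number of rows, so values above 1× indicate super-linear growth in total memory under this reading.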
For deployments involving thousands of concurrent IoT devices, the hybrid AES-RSA architecture scales horizontally: each device maintains its own ephemeral AES session key encrypted with the cloud’s RSA public key, meaning RSA operations execute exactly once per device session regardless of data volume. The per-device overhead therefore remains constant, and cloud-side decryption scales linearly with active sessions rather than traffic volume. Key management in production would require a lightweight Public Key Infrastructure (PKI) appropriate for IoT environments, which adds operational complexity beyond this study’s scope.
Statistical significance analysis
MQTTEEB-D dataset (n = 44,514)
McNemar’s test results on MQTTEEB-D reveal a nuanced pattern of statistical significance as shown in Table 16. The voting ensemble (VE) significantly outperforms XGBoost (χ²=5.9550, p = 0.0147) and CatBoost (χ²=40.4419, p < 0.0001) at the α = 0.05 level, confirming that these accuracy improvements are not due to chance. However, the differences against RF (χ²=2.3153, p = 0.1281) and Bagging (χ²=1.6965, p = 0.1927) do not reach statistical significance, indicating that the voting ensemble, RF, and Bagging classifier perform at statistically equivalent levels on this dataset. This finding refines the interpretation of the accuracy rankings in Table 12: while the voting ensemble achieves the highest numerical accuracy (92.68%), its practical advantage is statistically meaningful only over XGBoost and CatBoost. The equivalence with RF and Bagging suggests that on the relatively compact MQTTEEB-D feature space (10 features), ensemble diversity between heterogeneous classifiers is insufficient to produce a statistically separable advantage over simpler ensemble methods.
CIC IoT 2023 dataset (n = 605,839)
On the CIC IoT 2023 dataset, McNemar’s test (Table 17) shows that the voting ensemble significantly outperforms RF (χ²=85.9905, p < 0.0001) and CatBoost (χ²=77.0351, p < 0.0001), both with very large test statistics driven by the large test set (n = 605,839). The comparison against XGBoost, however, does not reach significance (χ²=3.5840, p = 0.0583), with the p-value marginally exceeding the α = 0.05 threshold. This result indicates that on complex multi-class encrypted IoT traffic, XGBoost and the voting ensemble are statistically equivalent in detection performance, and the voting ensemble’s marginal numerical advantage (81.09% vs. 80.94%, Δ = 0.15%) does not constitute a practically meaningful difference. The large χ² values for the RF and CatBoost comparisons (85.99 and 77.04 respectively) reflect the statistical power afforded by the large test set: even modest accuracy differences of about 0.7% become highly significant when evaluated over 600 K+ instances. This should be interpreted as evidence of a consistent directional advantage rather than a large practical effect.
Comparison of the proposed work with related previous studies
This section provides a comparison between the proposed framework and existing studies on IoT security, focusing on cryptographic algorithms and AI models used for threat detection as shown in Table 18. In comparison, the proposed framework integrates various cryptographic algorithms with a voting ensemble AI model. This approach achieved 92.68% accuracy and 0.126 KB memory usage per traffic, showcasing a good balance between security and resource efficiency, particularly suitable for IoT environments. The results suggest that the proposed work provides a highly efficient and effective solution for securing IoT networks while maintaining low resource consumption.
Limitations and future work
Although the study achieved good results in both encryption and machine learning training, some limitations should be considered. One of the main limitations is the reliance on pre-existing IoT datasets, which may not fully reflect the real-world environment of edge devices. The energy consumption of the encryption algorithms and models was accounted for using data already collected from IoT sensors, rather than measured on deployed IoT devices directly. In the future, we aim to integrate encryption algorithms directly into IoT devices and transfer the encrypted data to the cloud for training, which will enhance the system’s ability to handle live data and provide better performance in real-world applications.
Additionally, we plan to implement higher-level authentication algorithms in the cloud, such as consensus mechanisms55, to improve security and transparency in data handling. We also intend to explore the use of federated learning in cloud environments56, allowing for training on distributed data without the need to transfer sensitive data from IoT devices, thereby enhancing security and improving efficiency. Moreover, this study does not evaluate network-layer metrics such as latency, throughput, or packet loss under concurrent multi-device conditions, as the focus is on resource usage and detection accuracy. Resilience to man-in-the-middle attacks and large-scale key management are left for future work.
Conclusion
This study addressed a clearly identified gap in IoT security research, the lack of a unified framework that simultaneously integrates encryption, secure cloud transmission, and ML-based intrusion detection under realistic resource constraints. The proposed framework successfully bridged this gap by evaluating four encryption algorithms via the novel ORCS metric, simulating real-world data transmission through MQTT, and training ensemble ML classifiers on encrypted IoT traffic across two large-scale real-world datasets. The hybrid AES–RSA scheme demonstrated the best balance between security and resource efficiency, achieving an ORCS of 0.56 and memory usage of only 0.126 KB per traffic on MQTTEEB-D, while the voting ensemble classifier attained the highest intrusion detection accuracy of 92.68% on MQTTEEB-D and 81.09% on CIC IoT 2023. These results confirm that combining hybrid cryptography with ensemble ML models provides an effective and efficient security solution for cloud-based IoT systems. The primary limitation of this study is the reliance on pre-collected datasets rather than live IoT hardware deployment, meaning that on-device energy consumption was not directly measured. Future work will focus on embedding the encryption algorithms directly into IoT hardware to validate real-world energy metrics, exploring federated learning to enable privacy-preserving distributed model training, integrating cloud-side consensus authentication mechanisms, and applying class-imbalance mitigation techniques to further reduce false positive rates in intrusion detection.
Data availability
The datasets are publicly accessible at the following links: [https://data.mendeley.com/datasets/jfttfjn6tr/1] and [https://www.unb.ca/cic/datasets/iotdataset-2023.html].
References
Zeadally, S., Das, A. K. & Sklavos, N. Cryptographic technologies and protocol standards for Internet of Things. Internet of Things 14, 100075. https://doi.org/10.1016/j.iot.2019.100075 (2021).
Nazir, A. et al. Collaborative threat intelligence: Enhancing IoT security through blockchain and machine learning integration. Journal of King Saud University - Computer and Information Sciences 36(2), 101939. https://doi.org/10.1016/j.jksuci.2024.101939 (2024).
Nadji, B. Data Security, Integrity, and Protection. In Data, Security, and Trust in Smart Cities 59–83 (2024). https://doi.org/10.1007/978-3-031-61117-9_4.
Tawfik, M., Al-Zidi, N. M., Alsellami, B., Al-Hejri, A. M. & Nimbhore, S. Internet of Things-Based Middleware Against Cyber-Attacks on Smart Homes using Software-Defined Networking and Deep Learning. In 2021 2nd International Conference on Computational Methods in Science & Technology (ICCMST) 7–13 (IEEE, 2021). https://doi.org/10.1109/ICCMST54943.2021.00014
Mousavi, S. K., Ghaffari, A., Besharat, S. & Afshari, H. Security of internet of things based on cryptographic algorithms: A survey. Wireless Networks 27(2), 1515–1555. https://doi.org/10.1007/s11276-020-02535-5 (2021).
Pandey, S. & Bhushan, B. Recent lightweight cryptography (LWC) based security advances for resource-constrained IoT networks. Wirel. Networks 30(4), 2987–3026. https://doi.org/10.1007/s11276-024-03714-4 (2024).
Thakor, V. A., Razzaque, M. A. & Khandaker, M. R. A. Lightweight cryptography algorithms for resource-constrained IoT devices: A review, comparison and research opportunities. IEEE Access 9, 28177–28193. https://doi.org/10.1109/ACCESS.2021.3052867 (2021).
Islam, M. S. et al. Blockchain-enabled cybersecurity provision for scalable heterogeneous network: A comprehensive survey. Computer Modeling in Engineering & Sciences 138(1), 43–123. https://doi.org/10.32604/cmes.2023.028687 (2024).
Cheikh, I., Roy, S., Sabir, E. & Aouami, R. Energy, scalability, data and security in massive IoT: Current landscape and future directions. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2026.3655796 (2026).
Cherbal, S., Zier, A., Hebal, S., Louail, L. & Annane, B. Security in internet of things: A review on approaches based on blockchain, machine learning, cryptography, and quantum computing. J. Supercomput. 80(3), 3738–3816. https://doi.org/10.1007/s11227-023-05616-2 (2024).
Singh, N., Buyya, R. & Kim, H. Securing cloud-based Internet of Things: Challenges and mitigations. Sensors 25(1), 79. https://doi.org/10.3390/s25010079 (2024).
Aqachtoul, M., et al. “MQTTEEB-D: A real-world iot cybersecurity dataset for AI-# in MQTT networks,” Mendeley Data. V1, https://doi.org/10.17632/jfttfjn6tr.1 (2025).
Neto, E. C. et al. CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment. Sensors 23(13), 5941. https://doi.org/10.3390/s23135941 (2023).
Nazir, A. et al. Advancing IoT security: A systematic review of machine learning approaches for the detection of IoT botnets. J. King Saud Univ. - Comput. Inf. Sci. 35(10), 101820. https://doi.org/10.1016/j.jksuci.2023.101820 (2023).
Alnafesh, K. H., Al-Hejri, A. M. & Jagtap, D. S. Exploring the Potential of Ensemble and XAI for Effective IoT Multiclass Malware Detection. In 4th International Conference on Automation, Computing and Renewable Systems (ICACRS) 149–155 (IEEE, 2025). https://doi.org/10.1109/ICACRS67045.2025.11324159
Alluhaidan, A. S. D. & Prabu, P. End-to-End Encryption in Resource-Constrained IoT Device. IEEE Access 11, 70040–70051. https://doi.org/10.1109/ACCESS.2023.3292829 (2023).
Al-Hejri, A. M., Al-Zidi, N. M., Tawfik, M., Albakhrani, A. & Sable, A. H. A Facilitation System for Arabic Foreigners in India Using the Web and Android System. In 8th International Conference on Advanced Computing and Communication Systems (ICACCS) 239–245 (IEEE, 2022). https://doi.org/10.1109/ICACCS54159.2022.9785022
Preethi, R. et al. Balancing Security and Energy Efficiency for IoT Devices Using Hybrid Cryptographic Schemes Merging Traditional and Post-Quantum Cryptography. In IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS) 372–377 (IEEE, 2025). https://doi.org/10.1109/I2CACIS65476.2025.11101274
Corthis, B., Ramesh, G., García-Torres, M. & Ruíz, R. Effective identification and authentication of healthcare IoT using fog computing with hybrid cryptographic algorithm. Symmetry 16(6), 726. https://doi.org/10.3390/sym16060726 (2024).
Karimunda, K. et al. Machine Learning-Assisted Cryptographic Security: A Novel ECC-ANN Framework for MQTT-Based IoT Device Communication. Computation 13(10), 227. https://doi.org/10.3390/computation13100227 (2025).
Nagarajan,Alagarsundaram, H., Sitaraman, S. R., Gattupalli, K., Bhavana Harish, V. S. & Gollavilli and B. J, AI-Driven Cryptographic Framework for Smart Industrial Cloud Security. In 8th International Conference on Computing Methodologies and Communication (ICCMC) 1823–1829 (IEEE, 2025). https://doi.org/10.1109/ICCMC65190.2025.11140686
Duc Manh, B. et al. Privacy-preserving cyberattack detection in blockchain-based IoT systems using AI and homomorphic encryption. IEEE Internet Things J. 12(11), 16478–16492. https://doi.org/10.1109/JIOT.2025.3535792 (2025).
Ingle, D. & Ingle, D. An enhanced blockchain based security and attack detection using transformer in IOT-Cloud network. J. Adv. Res. Appl. Sci. Eng. Technol. 31(2), 142–156. https://doi.org/10.37934/araset.31.2.142156 (2023).
Aljrees, T., Kumar, A., Singh, K. U. & Singh, T. Enhancing IoT security through a green and sustainable federated learning platform: Leveraging efficient encryption and the quondam signature algorithm. Sensors 23(19), 8090. https://doi.org/10.3390/s23198090 (2023).
Selvarajan, S. et al. An artificial intelligence lightweight blockchain security model for security and privacy in IIoT systems. J. Cloud Comput. 12(1), 38. https://doi.org/10.1186/s13677-023-00412-y (2023).
Elkhodr, M. An AI-driven framework for integrated security and privacy in Internet of Things using quantum-resistant blockchain. Future Internet 17(6), 246. https://doi.org/10.3390/fi17060246 (2025).
Jarin, M., Mishu, M. H., Dipu, A. J. M. R. H. & Mostafizur Rahaman, A. S. M. A Lightweight Solution to Intrusion Detection and Non-intrusive Data Encryption, 235–247. https://doi.org/10.1007/978-981-99-5881-8_19 (2023).
Yuvarani, R. & Mahaveerakannan, R. Enhanching Data Secure IoT Based Cloud Architecture for Banking Transactions Systems with Expertise in Cryptographic Algorithms. In 6th International Conference on Data Intelligence and Cognitive Informatics (ICDICI), 413–421 (IEEE, 2025). https://doi.org/10.1109/ICDICI66477.2025.11134903
Kashyap, N., Sinha, S. & Kansal, V. Security of IoT Device and its Data Transmission on AWS Cloud by Using Hybrid Cryptosystem of ECC and AES. Int. J. Comput. Exp. Sci. Eng. 11(1). https://doi.org/10.22399/ijcesen.838 (2025).
Prasad, K. S. et al. Augmenting cybersecurity through attention based stacked autoencoder with optimization algorithm for detection and mitigation of attacks on IoT assisted networks. Sci. Rep. 14(1), 30833. https://doi.org/10.1038/s41598-024-81162-y (2024).
García, S., Luengo, J. & Herrera, F. Data Preprocessing in Data Mining. In Intelligent Systems Reference Library, Vol. 72 (Springer International Publishing, 2015). https://doi.org/10.1007/978-3-319-10247-4.
Alduailij, M. et al. Machine-learning-based DDoS attack detection using mutual information and random forest feature importance method. Symmetry 14(6), 1095. https://doi.org/10.3390/sym14061095 (2022).
Qasem, R. M. A., Thorat, S. B. & Motiram, B. M. A New Approach to Key Generation Using 3D Chaotic Systems in Hybrid Lightweight IoT Based Cloud Security Algorithms. Int. J. Innov. Res. Technol. 12. https://doi.org/10.64643/IJIRTV12I8-191503-459 (2026).
Singh, S., Sharma, K., Moon, S. Y. & Park, J. H. Advanced lightweight encryption algorithms for IoT devices: Survey, challenges and solutions. J. Ambient Intell. Humaniz. Comput. 15(2), 1625–1642. https://doi.org/10.1007/s12652-017-0494-4 (2024).
Upadhyay, Sudhakar, N. K. & Kumar, V. Efficient Cryptographic Configurations and Lightweight Communication Protocols for Secure Smart Home Systems. In 3rd International Conference on Communication, Security, and Artificial Intelligence (ICCSAI), 657–662 (IEEE, 2025). https://doi.org/10.1109/ICCSAI64074.2025.11063770
Nechvatal, J. et al. Report on the development of the Advanced Encryption Standard (AES). J. Res. Natl. Inst. Stand. Technol. 106(3), 511. https://doi.org/10.6028/jres.106.023 (2001).
Qasem, R. M. A., Thorat, S. B. & Motiram, B. M. A New Approach to Key Generation Using 3D Chaotic Systems in Hybrid Lightweight IoT Based Cloud Security Algorithms. Int. J. Innov. Res. Technol. 12. https://doi.org/10.64643/IJIRTV12I8-191503-459 (2026).
Kamal, H. & Mashaly, M. Advanced hybrid transformer-CNN deep learning model for effective intrusion detection systems with class imbalance mitigation using resampling techniques. Future Internet 16(12), 481. https://doi.org/10.3390/fi16120481 (2024).
Biau, G. & Scornet, E. A random forest guided tour. TEST 25(2), 197–227. https://doi.org/10.1007/s11749-016-0481-7 (2016).
Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794 (ACM, 2016). https://doi.org/10.1145/2939672.2939785
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 31 (2018).
Alsulami, A. G. et al. Predicting tourism growth in Saudi Arabia with machine learning models for vision 2030 perspective. Sci. Rep. 16(1), 2556. https://doi.org/10.1038/s41598-025-32509-6 (2026).
Dietterich, T. G. Ensemble Methods in Machine Learning. In Multiple Classifier Systems 1–15 (Springer, 2000). https://doi.org/10.1007/3-540-45014-9_1
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H. & Herrera, F. A review on ensembles for the class imbalance problem: Bagging-, Boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42(4), 463–484. https://doi.org/10.1109/TSMCC.2011.2161285 (2012).
Nazir, A. et al. Empirical evaluation of ensemble learning and hybrid CNN-LSTM for IoT threat detection on heterogeneous datasets. J. Supercomput. 81(6), 775. https://doi.org/10.1007/s11227-025-07255-1 (2025).
Singh, M., Rajan, M. A., Shivraj, V. L. & Balamuralidhar, P. Secure MQTT for Internet of Things (IoT). In Fifth International Conference on Communication Systems and Network Technologies, 746–751 (IEEE, 2015). https://doi.org/10.1109/CSNT.2015.16
Kaganurmath, S., Cholli, N. G. & Anala, M. R. Integrating MQTT protocol to improve the performance using dynamic lightweight key sharing protocol for secure IoT communications. Cluster Comput. 28(16), 1068. https://doi.org/10.1007/s10586-025-05689-z (2025).
Al-Zidi, N. M. et al. Smart System for Real-Time Remote Patient Monitoring Based on Internet of Things. In 2021 2nd International Conference on Computational Methods in Science & Technology (ICCMST), 1–6 (IEEE, 2021). https://doi.org/10.1109/ICCMST54943.2021.00013
Al-Maamari, M. R., Ramteke, R., Al-Hejri, A. M. & Alshamrani, S. S. Integrating CNN and transformer architectures for superior Arabic printed and handwriting characters classification. Sci. Rep. 15(1), 29936. https://doi.org/10.1038/s41598-025-12045-z (2025).
Ads, E. psutil documentation. Accessed:10, 2025. [Online]. Available: https://psutil.readthedocs.io/en/latest/
Pereira, G. C. C. F. et al. Performance evaluation of cryptographic algorithms over IoT platforms and operating systems. Secur. Commun. Networks 2017, 1–16. https://doi.org/10.1155/2017/2046735 (2017).
Maitra, S., Richards, D., Abdelgawad, A. & Yelamarthi, K. Performance Evaluation of IoT Encryption Algorithms: Memory, Timing, and Energy. In 2019 IEEE Sensors Applications Symposium (SAS), 1–6 (IEEE, 2019). https://doi.org/10.1109/SAS.2019.8706017
Aslan, B., Yavuzer Aslan, F. & Sakallı, M. T. Energy consumption analysis of lightweight cryptographic algorithms that can be used in the security of Internet of Things applications. Secur. Commun. Networks 2020, 1–15. https://doi.org/10.1155/2020/8837671 (2020).
Kavzoglu, T. Object-Oriented Random Forest for High Resolution Land Cover Mapping Using Quickbird-2 Imagery. In Handbook of Neural Computation 607–619 (Elsevier, 2017). https://doi.org/10.1016/B978-0-12-811318-9.00033-8.
Yu, L., He, M., Liang, H., Xiong, L. & Liu, Y. A blockchain-based authentication and authorization scheme for distributed mobile cloud computing services. Sensors 23(3), 1264. https://doi.org/10.3390/s23031264 (2023).
Al-Hejri, A. M. et al. A hybrid explainable federated-based vision transformer framework for breast cancer prediction via risk factors. Sci. Rep. 15(1), 18453. https://doi.org/10.1038/s41598-025-96527-0 (2025).
Acknowledgements
The authors would like to acknowledge the Deanship of Graduate Studies and Scientific Research, Taif University, for supporting this work.
Author information
Contributions
• Mohammed Ali Qasem: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Resources; Software; Validation; Visualization; Project administration; Writing – original draft.
• Bokare Madhav Motirame: Conceptualization; Methodology; Software; Data curation; Formal analysis; Writing – original draft; Writing – review & editing; Validation.
• Suryakant Thorat: Conceptualization; Methodology; Software; Data curation; Formal analysis; Writing – original draft; Writing – review & editing; Validation.
• Aymen M. Al-Hejri: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Resources; Software; Validation; Visualization; Writing – review & editing.
• Sultan S. Alshamrani: Formal analysis; Investigation; Resources; Validation; Visualization; Writing – original draft.
• Kaled M Alshmrany: Formal analysis; Investigation; Resources; Validation; Visualization; Writing – original draft.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Qasem, M.A., Motiram, B.M., Thorat, S. et al. Enhancement of cryptography algorithms for security of cloud-based IoT with machine learning models. Sci Rep 16, 10972 (2026). https://doi.org/10.1038/s41598-026-45938-8
DOI: https://doi.org/10.1038/s41598-026-45938-8










