Abstract
The Internet of Medical Things (IoMT) refers to the network of interconnected medical devices and applications that has become essential to contemporary healthcare. However, its dependence on interconnected ecosystems exposes it to serious cybersecurity risks, including data breaches, privacy violations, and malicious attacks that threaten patient safety and system integrity. This paper addresses these issues with a new intrusion detection model, BiGRU/RBWK, which combines Bidirectional Gated Recurrent Units (BiGRU) with a Refined Black-winged Kite (RBWK) optimization algorithm. RBWK optimizes the hyperparameters of the BiGRU model, yielding faster convergence and better classification performance. The model relies on a preprocessing pipeline that includes Recursive Feature Elimination (RFE) with Support Vector Machines (SVM), which curbs overfitting and reduces computational cost. Experiments on two public datasets, WUSTL-EHMS-2020 and ECU-IoHT, show that the BiGRU/RBWK model achieves higher accuracy (95.6% and 93.8%, respectively), precision, recall, F1-score, and AUC-ROC than the compared methods, including Random Forest, LSTM, CNN, Autoencoder, and CNN-LSTM hybrid models. The proposed framework has high discrimination capability and is suitable for real-time use in resource-constrained IoMT settings.
Introduction
Background
In recent years, the Internet of Things (IoT) has evolved from a promising but largely overlooked technology into a pervasive reality1. Smart homes and factories, wearables, and smart sensors in transportation have taken digitalization to a new level2. Yet few sectors rely on IoT as much as healthcare3, where the technology affects not only the speed and convenience of operations but also human lives, through its use in medical equipment. The totality of these devices and the systems that connect them is referred to as the Internet of Medical Things (IoMT)4.
This technology is increasingly used and spans sensors, data processing software, and dedicated infrastructure5. In modern healthcare, IoMT helps provide better patient care while optimizing clinical processes and financial indicators6.
IoMT links medical devices and applications with healthcare information technology7, enabling the communication and exchange of health data. The result is a network of interconnected medical devices that operate without direct human-to-human or human-to-computer interaction8. IoMT is one of the biggest enablers of the shift from reactive healthcare models to proactive ones that support prevention, better patient outcomes, and fewer hospital readmissions9.
The use of these devices has grown rapidly in the past few years because they deliver continuous and accurate health data10. Wearable devices, for instance, track a patient’s vital signs, such as heart rate, blood pressure, and blood glucose level, while smart hospital devices exchange information with one another to make workflows more efficient11. Figure 1 shows how IoMT security risks affect healthcare systems.
Moreover, IoMT fosters telemedicine, allowing physicians to diagnose and treat patients remotely, a service especially needed in underserved or remote populations12. As the world’s population grows and ages, chronic diseases become more prevalent, and demand for health services rises sharply, the role of IoMT in helping providers manage this demand becomes increasingly important13.
On the other hand, integrating IoMT into healthcare systems creates new complexities and challenges that must be managed to guarantee the safety, reliability, and privacy of sensitive medical information14. The primary contributions of this research are summarized as follows:
-
Hybrid deep learning and metaheuristic optimization scheme: A new intrusion detection model, BiGRU/RBWK, combines Bidirectional Gated Recurrent Units (BiGRU) with the Refined Black-winged Kite (RBWK) optimization algorithm to improve convergence speed and classification accuracy for IoMT security.
-
Enhanced cyberthreat detection through temporal feature learning: The BiGRU architecture captures long-term dependencies in sequential IoMT network traffic data to distinguish normal from attack patterns.
-
Development of an Adaptive Hyperparameter Tuning Mechanism: Dynamic optimization of BiGRU hyperparameters with RBWK enhances model performance and generalization stability across various IoMT scenarios.
-
Robust preprocessing and feature selection pipeline: A full preprocessing pipeline with median imputation, z-score-based outlier capping, random under-sampling (RUS), and recursive feature elimination (RFE) with SVM was applied to reduce the risk of overfitting, minimize computational time, and mitigate the effects of class imbalance.
-
Applied to real IoMT datasets: The proposed model was experimentally validated on two publicly available IoMT datasets (WUSTL-EHMS-2020 and ECU-IoHT), achieving classification accuracies of 95.6 and 93.8%, respectively.
-
Leading-edge results: In terms of accuracy, precision, recall, F1-score, and AUC-ROC, the BiGRU/RBWK model outperformed existing methods such as Random Forest, LSTM, CNN, Autoencoder, and CNN-LSTM hybrid models.
-
Insight into the feasibility of real-time deployment: An analysis of computational requirements and inference latency shows that the model can be deployed on resource-constrained IoMT edge devices.
Problem statement
Despite its numerous benefits, the widespread adoption of IoMT systems poses considerable security risks for healthcare organizations. The lack of data authentication mechanisms is one of the major threats to IoMT networks, leaving them exposed to various cyberthreats15. If exploited, these vulnerabilities could lead to data tampering, unauthorized access, and denial-of-service (DoS) attacks, with potentially serious implications for patients and healthcare providers16.
Breaches of patient data, which endanger patient privacy, are one of the main dangers to IoMT systems: sensitive health information could end up in unscrupulous hands. Such breaches not only violate ethical guidelines but also cost healthcare organizations billions of dollars in regulatory fines and reputational damage17.
Additionally, compromised IoMT assets can endanger patients’ lives: data modified by an attacker can lead to incorrect diagnoses or treatment18. Even a slight change an attacker makes to a pacemaker reading or to the amount of insulin a pump dispenses could be life-threatening19.
More broadly, unsecured IoMT systems progressively undermine trust in healthcare systems, making patients hesitant to adopt digital health solutions. This has delayed IoMT’s potential to transform healthcare delivery. Hence, a key challenge is to design an efficient defense against cyber threats that protects IoMT infrastructures and preserves the integrity, confidentiality, and availability of medical data.
The proposed solution addresses each of these problems. Recursive Feature Elimination with SVM (RFE-SVM) supports data authentication by selecting and ranking the characteristics that best distinguish normal from malicious traffic, thereby increasing the integrity and reliability of IoMT data20. BiGRU’s deep temporal modeling detects anomalous patterns in real time, countering sophisticated threats such as data injection and manipulation that lead to data breaches. In addition, the model’s high true positive rate (TPR) and low false positive rate (FPR) in ROC analysis support accurate, timely recognition of DoS attacks, helping prevent service interruption and preserve system availability. By providing high classification accuracy and robustness against adversarial perturbations, the framework helps restore stakeholder confidence and curb the erosion of trust in IoMT systems.
Research objectives
To overcome these challenges, this study presents a novel AI-powered method for the security of the IoMT framework in the smart healthcare domain. More precisely, the goals of this study are:
A novel model to protect IoMT data based on improved deep learning
Propose a deep learning architecture suited to detecting cyberthreats and authenticating IoMT data. Because IoMT datasets are typically sequential, the available data are preprocessed before the BiGRU is applied.
Identify cyberthreats and authenticate IoMT data
Improve the proposed model’s ability to detect malicious activity on IoMT networks and to verify the legitimacy of transmitted data. By learning to distinguish normal from atypical patterns, the model protects sensitive health information.
Use metaheuristic algorithms to enhance model performance
Use a metaheuristic optimization technique, specifically the Refined Black-winged Kite (RBWK) algorithm, to optimize the hyperparameters of the BiGRU model. The optimized model is trained to generalize well and is intended for deployment in real-world environments, including resource-constrained ones.
Contributions
The main contributions of this paper are as follows. (i) A new hybrid deep learning model, BiGRU/RBWK, which combines Bidirectional Gated Recurrent Units (BiGRU) with the Refined Black-winged Kite (RBWK) optimization algorithm, is introduced to improve the identification of cyberthreats in IoMT systems. The model exploits the temporal modeling strength of BiGRU and the global search performance of RBWK to achieve better classification and faster convergence. (ii) A feature selection algorithm based on Recursive Feature Elimination (RFE) coupled with Support Vector Machines (SVM) is presented to select the most relevant features for intrusion detection, saving computation time and reducing the risk of overfitting. (iii) The proposed model is tested on two publicly accessible IoMT datasets, WUSTL-EHMS-2020 and ECU-IoHT, demonstrating its applicability in various IoMT settings with classification accuracies of 95.6% and 93.8%, respectively. (iv) A comparative analysis with state-of-the-art approaches shows that BiGRU/RBWK performs best in accuracy, precision, recall, F1-score, and AUC-ROC. (v) The computational demands and inference time of the model are examined, indicating that real-time deployment on resource-constrained IoMT edge devices is practical.
Related works
The area of securing IoMT systems has received significant attention in recent years due to the increasing deployment of connected medical devices and the sophistication of cyberthreats. To confront the specific security concerns raised in the context of IoMT environments, scientists have examined a variety of methodologies, from classical cryptography tools to contemporary ML and DL models.
Encryption and authentication protocols provide high-level security primitives, but they neither detect advanced attacks nor mitigate threats in real time on resource-constrained IoMT devices. Machine learning-based solutions, by contrast, have shown promising results; deep learning architectures in particular can effectively classify and identify cyberthreats in IoMT data streams21.
Awotunde et al.22 analyzed the research problems of implementing AIoMT systems and the suitability of AIoMT-based systems for healthcare, and proposed an AIoMT-based architecture for real-time patient monitoring and diagnosis. The model’s sensitivity, F-score, accuracy, precision, and specificity were assessed on a dataset of cytology images. The results indicated that the AI model is a promising algorithm for disease diagnosis within an IoMT-based system, achieving a diagnosis accuracy of 99.5%. Because such techniques considerably reduce human intervention in medical practice, forecasting, diagnosis, screening, treatment, and prescription in healthcare have all greatly improved.
Manickam et al.23 assessed the crucial role of AI in enhancing the functionality of IoMT and point-of-care (POC) devices in advanced medical fields, including cancer recognition, diabetes treatment, and cardiac analysis. They also discussed the significance of AI in enabling sophisticated robotic surgery for biomedical applications. The paper examined the role of AI in enhancing the detection precision, functionality, decision-making capacity, and risk assessment of IoMT devices, and covered the technical and engineering difficulties and opportunities of AI-based, cloud-integrated, personalized IoMT devices for building effective biomedical systems suited to next-generation intelligent healthcare.
Dahan et al.24 collected data from the patient’s body through sensing devices. The data were sent via a gateway or WiFi and stored in an IoMT cloud repository, then retrieved and preprocessed. High-dimensional linear discriminant analysis (LDA) was used to extract features from the preprocessed data, and a reconfigurable multi-objective cuckoo search algorithm (CSA) was used to choose the best features. A hybrid ResNet-18 and GoogLeNet classifier (HRGC) then predicted whether the data were abnormal or normal, after which a decision was made on whether to notify hospitals and medical staff. The participant data were stored online for future use if the predicted outcomes met expectations. Finally, a performance analysis was conducted to confirm the effectiveness of the proposed mechanism.
Abdulraheem et al.25 studied the Artificial Intelligence of Medical Things (AIoMT), which is used to improve the dependability and efficiency of healthcare systems. AIoMT handles patient contacts, records, and other clinical records, as well as devices containing clinical data. Although widely available and inexpensive sensing technologies have the potential to shift healthcare from reactive treatment to preventive services, safety and confidentiality remain frequent concerns, and IoMT system administration and security are complex. Abdulraheem et al. investigated many techniques for protecting AIoMT devices and preserving data privacy in order to resolve the security concerns that impede AIoMT deployment.
Chen et al.26 established a symmetric encryption and decryption protocol for securing biosignal and medical image data in support of disease detection. They proposed a key generator for symmetric cryptography that combines a Bell inequality with a chaotic map to generate 256 unordered, non-repeating secret keys within the key space. The encryptor and decryptor were then trained for biosignal and image security using a machine learning-based methodology. A case study then classified medical images after secure data transmission, using a convolutional neural network (CNN)-based classifier for AI-assisted breast carcinoma recognition. For biosignal security, raw data were additionally collected from millimeter-wave (mm-Wave) radar sensing firmware for vital sign recognition. Experimental results on breathing signals, mammography images, and cardiac signals demonstrated the usefulness and viability of the proposed encryption, decryption, and AI-assisted recognition techniques.
These models, however, tend to suffer from overfitting, considerable computational cost, and poorly tuned hyperparameters, which limits their use in practical scenarios.
Table 1 illustrates how the proposed work advances the state of the art by offering a more robust, scalable, and adaptive solution for securing IoMT systems.
In addition, using metaheuristic optimization algorithms to enhance such models is an emerging research direction, although few works have explored it so far. Two recent examples follow.
In a 2024 article, the authors proposed an intrusion detection model for IoMT security that combines machine learning classifiers with particle swarm optimization (PSO), a metaheuristic optimization strategy27. The technique improves cyberattack detection accuracy in IoMT systems through better hyperparameter and feature selection, reaching an accuracy of more than 89% on publicly available datasets. This enhances real-time detection and mitigation of attacks such as DDoS and man-in-the-middle in IoMT networks, increasing the safety and dependability of smart healthcare devices.
Another recent article, a review published in the Journal of Computational Design and Engineering (April 2024), explores the state of the art in using metaheuristic algorithms for telemedicine and telehealth security, with the aim of securing the encrypted transmission of clinical data. It identifies applications of metaheuristic algorithms such as Water Wave Optimization and Salp Swarm Intelligence for optimizing clustering in health monitoring and for generating strong encryption keys that secure patient ECG data. Such methods are reported to be highly effective at securing sensitive health data in IoMT-based smart health systems, particularly in the post-pandemic period28.
By examining existing IoMT security mechanisms in detail, including their merits, weaknesses, limitations, and the gaps that call for further state-of-the-art AI-based solutions like the one presented in this paper, this literature review provides an overview of the current landscape.
Data preparation
Dataset description
To validate the proposed model, two public datasets were used: WUSTL-EHMS-202029 and ECU-IoHT30. The two datasets represent different IoMT application scenarios, and this diversity across healthcare applications strengthens the robustness of the evaluation.
WUSTL-EHMS-2020
The WUSTL-EHMS-2020 dataset was created to address the absence of public datasets that combine network flow metrics with patient biometric data collected on a real testbed. It was generated with a real-time Enhanced Healthcare Monitoring System (EHMS) testbed composed of four main components: medical sensors, a gateway, network infrastructure, and a control system for visualization, as shown in Fig. 2.
Data are transferred from sensors attached to patients to a gateway, from which they flow through a switch and router to a server for visualization. This architecture is vulnerable because attackers can eavesdrop on and capture data in transit. One prevalent attack type is data injection, in which packets are modified as they travel through the network, compromising data integrity. The dataset consists of 16,318 samples, as indicated in Table 2.
The dataset size is 4.4 MB. It contains a thorough description of both normal and attack instances and can therefore be used to reliably evaluate intrusion detection systems in the healthcare domain.
ECU-IoHT
ECU-IoHT (Edith Cowan University Internet of Health Things) is a dataset of simulated attacks designed to identify vulnerable scenarios in a specific healthcare environment. It covers many different attacks in order to expose a range of vulnerabilities, enabling the healthcare security community to study attack behaviors and develop more robust defenses. Figure 3 presents the ECU-IoHT testbed.
A new anomaly detection framework was proposed using the ECU-IoHT dataset, which is unique in this area and can be used to evaluate anomaly and manipulation detection algorithms; it revealed that, among the investigated methods, nearest-neighbor-based methods performed better than clustering, statistical, and kernel-based methods at detecting cyberattacks. Table 3 describes the ECU-IoHT dataset.
Using both datasets allows the proposed model to be verified under different conditions, which supports its generalization and validity.
The two datasets were selected for this study because of their high relevance to real-world IoMT environments and their representation of both normal and attack scenarios in healthcare systems. The WUSTL-EHMS-2020 dataset was designed to simulate an Enhanced Healthcare Monitoring System, capturing both network flow metrics and patient biometric data on a realistic testbed that replicates hospital environments in which medical sensors communicate with centralized monitoring systems.
This makes it well suited to evaluating intrusion detection models aimed specifically at IoMT, where data integrity and timely threat detection are critical concerns. In contrast, the ECU-IoHT dataset provides a range of cyberattack types, including ARP spoofing, DoS attacks, port scanning, and Smurf attacks, in a controlled IoHT environment, allowing a thorough examination of the model’s ability to detect both common and more sophisticated threats.
Both datasets reflect heterogeneous IoMT use cases, including wearable health-monitoring devices and networked clinical equipment, which helps ensure that our approach generalizes across various smart healthcare applications. Furthermore, because they are open datasets, they support reproducibility and comparative studies, both of which are conducive to research on secure IoMT systems.
Preprocessing
Data preprocessing is an important step to ensure that the data are of adequate quality and suitable for ML or DL models. This section describes the preprocessing of the IoMT datasets, which comprises encoding of categorical features, missing value treatment, class imbalance treatment, and outlier treatment. Figure 4 shows the preprocessing stages.
We also outline potential avenues for future work on strengthening the robustness of the preprocessing pipeline.
Encoding categorical features
IoMT datasets typically include network features, such as source or destination addresses and protocols, that take categorical (non-numeric) values. To make these features usable as input to machine learning or deep learning models, we apply label encoding to convert them into a numerical format.
In label encoding, each distinct category is represented by an integer, so every feature with discrete values is converted to integer values. For example, the feature “protocol”, with values TCP, UDP, and ICMP, is encoded as TCP (0), UDP (1), and ICMP (2).
Denote the set of unique categories in a categorical feature by \(C = \{c_1, c_2, \dots, c_k\}\). The label encoding function \(f: C \to \mathbb{Z}\) assigns a distinct integer to each category \(c_i\); with zero-based codes, as in the protocol example, \(f(c_i) = i - 1\).
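A minimal Python sketch of label encoding, assuming codes are assigned in order of first appearance so that the protocol example above yields TCP = 0, UDP = 1, ICMP = 2. The function name `label_encode` is hypothetical. Note that scikit-learn's `LabelEncoder` sorts categories alphabetically and would instead give ICMP = 0, TCP = 1, UDP = 2.

```python
def label_encode(values):
    """Map each distinct category to an integer in order of first appearance.

    Returns the encoded list and the category-to-integer mapping.
    """
    mapping = {}
    encoded = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)  # next unused integer code
        encoded.append(mapping[v])
    return encoded, mapping


protocols = ["TCP", "UDP", "ICMP", "TCP", "UDP"]
codes, mapping = label_encode(protocols)
print(codes)    # [0, 1, 2, 0, 1]
print(mapping)  # {'TCP': 0, 'UDP': 1, 'ICMP': 2}
```

Keeping the mapping lets the same codes be applied to the test split, so a category never receives different integers in training and evaluation.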
Missing value treatment
Missing values can skew the data and introduce bias, thus decreasing model performance. To resolve this issue, missing values are replaced with the median of the respective column, which keeps the data consistent without adding artificial variability.
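For median imputation, assume a feature \(X\) with missing values, and let \(x_{(1)} \le x_{(2)} \le \dots \le x_{(n')}\) be its \(n'\) observed (non-missing) values in sorted order. The median \(m\) is calculated as:

\[
m =
\begin{cases}
x_{\left(\frac{n'+1}{2}\right)}, & n' \text{ odd},\\[4pt]
\dfrac{x_{\left(\frac{n'}{2}\right)} + x_{\left(\frac{n'}{2}+1\right)}}{2}, & n' \text{ even},
\end{cases}
\]

and all missing values in \(X\) are replaced with \(m\).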
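The median imputation rule can be sketched with Python's standard `statistics` module. Representing missing entries as `None` and the helper name `median_impute` are assumptions for illustration; the observed values match the Packet Size column of the worked example later in this section (the position of the missing entry is arbitrary).

```python
import statistics


def median_impute(column):
    """Replace None entries with the median of the observed (non-missing) values."""
    observed = [x for x in column if x is not None]
    # statistics.median sorts the data and averages the two middle values when
    # the number of observations is even, matching the piecewise formula above.
    m = statistics.median(observed)
    return [m if x is None else x for x in column], m


packet_size = [120, 150, None, 180, 250]
filled, m = median_impute(packet_size)
print(m)       # 165.0
print(filled)  # [120, 150, 165.0, 180, 250]
```

In practice the median should be computed on the training split only and reused for the test split, so that no test information leaks into preprocessing.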
Class imbalance treatment
In IoMT datasets, class imbalance is a widespread problem, where the minority class, which encompasses malicious activities, is heavily underrepresented against the majority class of normal activities. This study uses RUS (Random Under-Sampling) for treating the class imbalance.
In RUS, the majority class is down-sampled by randomly choosing a subset of its instances while keeping all instances of the minority class, which balances the class distribution during training. Let \(N_m\) and \(N_M\) denote the numbers of samples in the minority and majority classes, and let \(r\) be the undersampling ratio, so that \(N_M' = r\,N_M\) majority samples are retained. Here \(r = N_m / N_M\), which gives

\[
N_M' = r\,N_M = N_m,
\]

i.e., a random subset of \(N_M' = N_m\) instances is selected from the majority class so that both classes have the same number of instances.
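The following Python sketch implements RUS as described: every class is down-sampled without replacement to \(N_m\), the size of the smallest class, so all minority instances are kept. The helper name `random_under_sample`, the fixed seed, and the toy feature values are assumptions; the 3:2 class split mirrors the worked example later in this section.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Down-sample every class, without replacement, to the size of the smallest class."""
    rng = random.Random(seed)
    rows_by_class = {}
    for row, label in zip(X, y):
        rows_by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in rows_by_class.values())  # N_m
    X_out, y_out = [], []
    for label, rows in rows_by_class.items():
        # For the minority class rng.sample returns all rows (in shuffled order);
        # for the majority class it draws a random subset of N_M' = N_m rows.
        kept = rng.sample(rows, n_min)
        X_out.extend(kept)
        y_out.extend([label] * n_min)
    return X_out, y_out


# Toy data with a 3:2 class split (feature values are made up).
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = ["Normal", "Malicious", "Normal", "Malicious", "Normal"]
X_bal, y_bal = random_under_sample(X, y)
print(Counter(y_bal))  # two samples per class
```

Fixing the seed makes the discarded majority samples reproducible across runs, which is part of the reproducibility argument for RUS given below.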
For WUSTL-EHMS-2020 and ECU-IoHT, the majority class (the normal samples) is fairly homogeneous and redundant, so RUS can be applied without excessive risk to generalization. Moreover, the retained data remain sufficient to represent normal behavior and distinguish it from attack patterns. RUS handles the imbalance reproducibly and keeps training stable, whereas SMOTE introduces synthetic noise and overfitting risks, and methods such as ADASYN or hybrid resampling add computational cost. In our early explorations we also tried other methods, but RUS gave the best compromise between performance and computational cost for these datasets.
Outlier treatment
Outliers can have an undue effect on model training and lead to suboptimal performance. We use z-score normalization for outlier treatment: an instance is an outlier if its z-value exceeds a user-defined threshold \(\theta\) in absolute value (for example, \(\theta = 3\)), and data validity is preserved by clipping such values to the nearest valid boundary. The z-score of a value \(x\) of a feature is

\[
z = \frac{x - \mu}{\sigma},
\]

where \(\mu\) and \(\sigma\) represent the mean and the standard deviation of the feature, respectively. If \(|z| > \theta\), the value is set to

\[
x' = \mu + \operatorname{sign}(z)\,\theta\,\sigma,
\]

i.e., to \(\mu + \theta\sigma\) or \(\mu - \theta\sigma\).
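A minimal sketch of the capping rule above, using only the standard library. It assumes the population standard deviation (`statistics.pstdev`), since the text does not state which convention is used, and the helper name `cap_outliers` is hypothetical. The toy column is chosen so the arithmetic is easy to check by hand: mean 19, standard deviation 27, so the value 100 has \(z = 3\) exactly.

```python
import statistics


def cap_outliers(values, theta=3.0):
    """Clip values with |z| > theta to mu +/- theta * sigma."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return list(values)  # constant feature: z is undefined, nothing to cap
    lower, upper = mu - theta * sigma, mu + theta * sigma
    capped = []
    for x in values:
        z = (x - mu) / sigma
        if z > theta:
            capped.append(upper)
        elif z < -theta:
            capped.append(lower)
        else:
            capped.append(x)
    return capped


data = [10] * 9 + [100]               # mean 19, population std 27, z(100) = 3.0
print(cap_outliers(data))             # unchanged: |z| = 3.0 does not exceed theta = 3
print(cap_outliers(data, theta=2.0))  # 100 is capped at 19 + 2 * 27 = 73.0
```

Because the rule uses a strict inequality, a value sitting exactly at \(|z| = \theta\) is left untouched, as the first call shows.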
Data splitting
The preprocessed dataset is divided into training and testing subsets in an 85:15 ratio. This split is common in machine learning studies as a compromise between model generalization and rigorous validation, especially for moderately sized datasets such as WUSTL-EHMS-2020 and ECU-IoHT. Stratified sampling is used to preserve the class distribution in both subsets, which minimizes biases arising from class imbalance. This design lets the BiGRU/RBWK model learn diverse IoMT scenarios and yields unbiased performance estimates on unseen data.
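A sketch of a stratified 85:15 split in plain Python: indices are grouped by class, shuffled, and 15% of each class goes to the test set, so both subsets keep the original class proportions. The helper name `stratified_split`, the seed, and the toy data are assumptions; with scikit-learn, `train_test_split(X, y, test_size=0.15, stratify=y)` achieves the same effect.

```python
import random
from collections import defaultdict


def stratified_split(X, y, test_frac=0.15, seed=42):
    """Return (X_train, y_train), (X_test, y_test) with per-class proportions preserved."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for i, label in enumerate(y):
        indices_by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_frac)  # this class's share of the test set
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def take(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return take(train_idx), take(test_idx)


# Toy imbalanced data: 80 normal (0) and 20 attack (1) records.
X = [[i] for i in range(100)]
y = [0] * 80 + [1] * 20
(X_tr, y_tr), (X_te, y_te) = stratified_split(X, y)
print(len(y_tr), len(y_te))          # 85 15
print(y_te.count(0), y_te.count(1))  # 12 3
```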
Reshaping and feature selection
Once the data have been preprocessed, feature selection is used to keep only the most relevant features, and the selected features are formatted into an array suitable for the proposed optimized BiGRU model. The following worked example illustrates the preprocessing stages on a raw IoMT dataset with 5 samples (Table 4).
The preprocessing stages are applied step by step as follows.
-
(1)
Encoding categorical features
Two encodings are applied: source addresses are converted to numerical IDs, and protocols are label-encoded (TCP = 0, UDP = 1, ICMP = 2) (see Table 5).
-
(2)
Missing value treatment
Missing values in Feature 3 (Packet Size) are replaced with the column median. The non-missing values of Feature 3 are [120, 150, 180, 250], so the median is (150 + 180) / 2 = 165, which replaces the missing entry (see Table 6).
Next, Random Under-Sampling (RUS) is applied to balance the classes. The original class distribution is Normal: 3 samples and Malicious: 2 samples; since the minority class (Malicious) has 2 samples, the majority class (Normal) is randomly under-sampled to 2 samples as well (see Table 7).
-
(3)
Outlier treatment
Z-Score Normalization is used to detect and cap outliers. The z-scores have been calculated for Feature 3 (Packet Size) and Feature 4 (Flow Duration).
For Feature 3 (Packet Size):
For Feature 4 (Flow Duration):
With \(\theta = 3\), values whose z-scores exceed the threshold in absolute value are capped; the outlier treatment is illustrated in Table 8.
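As can be observed from Table 7, no value exceeds the threshold, so no adjustment is needed. This is expected for so small a sample: with only four samples, \(|z|\) cannot exceed \(\sqrt{3} \approx 1.73\), whichever standard deviation convention is used.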
-
(4)
Data splitting
As mentioned before, the data has been split into training and test sets (85:15 ratio). The samples have been randomly assigned to both groups.
-
(5)
Feature selection & reshaping
The relevant features (e.g., Feature 3 and Feature 4) are selected and reshaped as input to the BiGRU model. The final input format is:
-
Training Input: [120, 500], [150, 600]
-
Validation Input: [165, 800]
-
Test Input: [250, 900]
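The text does not specify the tensor layout fed to the BiGRU. A common convention for recurrent layers is a 3-D array of shape (samples, timesteps, features); under that assumption, the sketch below reshapes the selected features either as one record per sequence (`timesteps=1`) or into sliding windows of consecutive records. The helper `to_sequences` is hypothetical, and how window labels would be assigned (for example, the label of the last record) is a further design choice not covered here.

```python
import numpy as np


def to_sequences(X, timesteps=1):
    """Turn a (samples, features) matrix into (windows, timesteps, features).

    Each window holds `timesteps` consecutive records; timesteps=1 treats every
    record as a length-1 sequence.
    """
    X = np.asarray(X, dtype=np.float32)
    n_windows = X.shape[0] - timesteps + 1
    if n_windows < 1:
        raise ValueError("need at least `timesteps` records")
    return np.stack([X[i:i + timesteps] for i in range(n_windows)])


# Selected features (Feature 3, Feature 4) from the worked example above.
records = [[120, 500], [150, 600], [165, 800], [250, 900]]
print(to_sequences(records[:2]).shape)           # (2, 1, 2): training input, one step each
print(to_sequences(records, timesteps=2).shape)  # (3, 2, 2): overlapping windows of 2
```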
Feature selection
This study uses Recursive Feature Elimination (RFE) with a Support Vector Machine (SVM) for evaluating feature importance. This methodology allows for finding the most relevant features to classify IoMT datasets concerning cyberthreats in a computationally cheap and interpretable manner.
Overview of recursive feature elimination (RFE)
Recursive Feature Elimination (RFE) is a feature selection technique that recursively removes the least important features according to model coefficients or feature importance scores. Let \(X \in \mathbb{R}^{n \times d}\) be a feature matrix, where \(n\) is the number of samples and \(d\) is the number of features, and let \(y \in \mathbb{R}^{n}\) be the target vector. The objective is then to choose a subset of \(k\) features (\(k < d\)) that yields the best classifier performance.
Step 1) Initial Training.
Fit the SVM model on the full dataset \((X, y)\). The decision function of the linear SVM is

\[
f(x) = w^{\top} x + b,
\]

where \(w \in \mathbb{R}^{d}\) is the weight vector and \(b\) is the bias term.
Step 2) Feature importance ranking.
For each feature \(j\), define its importance score with respect to the final prediction as

\[
I_j = |w_j|.
\]

The higher \(I_j\), the more important feature \(j\) is; the feature with the lowest \(I_j\) is the least useful.
Step 3) Feature Elimination.
Eliminate the feature with the lowest importance score from \(\:X\), resulting in a reduced feature set \(\:{X}^{{\prime\:}}\in\:{\mathbb{R}}^{n\times\:\left(d-1\right)}\).
Step 4) Repeat steps 1–3 until \(\:k\) features are left. The model is retrained on the reduced feature set at every iteration and importance scores are updated.
Step 5) Final subset: the \(k\) features that remain after the last iteration, ranked by their importance scores, form the selected subset.
The proposed RFE-SVM algorithm can be summarized as follows:
For example, consider the following dataset (see Table 9).
For this example, an SVM is first trained on the full dataset. Suppose the learned weight vector is \(w = [0.8, -0.5, 0.3, 0.1]\). The feature-importance scores are then \(I_1 = |0.8| = 0.8\), \(I_2 = |-0.5| = 0.5\), \(I_3 = |0.3| = 0.3\), and \(I_4 = |0.1| = 0.1\).
The least important feature (Feature 4) is therefore removed, and the SVM is retrained on the reduced dataset \(X' = [\text{Feature 1}, \text{Feature 2}, \text{Feature 3}]\). This process is repeated until \(k\) features remain.
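The RFE-SVM loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's implementation: the linear SVM is trained by stochastic subgradient descent on the L2-regularized hinge loss (an assumed stand-in for whatever solver was actually used), and `train_linear_svm`, `least_important`, and `rfe_svm` are hypothetical names. `least_important` applies the ranking rule \(I_j = |w_j|\) and reproduces the worked example, where \(w = [0.8, -0.5, 0.3, 0.1]\) marks Feature 4 (index 3) for removal.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=100, seed=0):
    """Linear SVM via SGD on the L2-regularized hinge loss; labels y in {-1, +1}.

    Returns the weight vector w and bias b of f(x) = w^T x + b.
    """
    rng = random.Random(seed)
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            violated = y[i] * score < 1  # sample inside the margin or misclassified
            for j in range(d):
                grad = lam * w[j] - (y[i] * X[i][j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * y[i]
    return w, b


def least_important(w):
    """Index of the feature with the smallest importance score I_j = |w_j|."""
    scores = [abs(wj) for wj in w]
    return scores.index(min(scores))


def rfe_svm(X, y, k):
    """Recursive Feature Elimination: retrain, drop the weakest feature, repeat until k remain.

    Returns the original column indices of the k surviving features.
    """
    remaining = list(range(len(X[0])))
    while len(remaining) > k:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)  # Step 1: retrain on the current subset
        del remaining[least_important(w)]  # Steps 2-3: rank and eliminate
    return remaining
```

In practice, scikit-learn expresses the same procedure as `RFE(SVC(kernel="linear"), n_features_to_select=k)`; the hand-written loop is shown only to make each step explicit.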
Gated recurrent unit and recurrent neural network (RNN)
RNNs have been used to process sequential data in diverse fields31. The input sequence is denoted \(x = (x_1, \dots, x_T)\), and the output and hidden vectors, \(y\) and \(h\), are computed by the following equations:
where \(\Phi\) is the element-wise sigmoid function, \(W\) is the input-to-hidden weight matrix, \(b\) is the hidden bias, \(V\) is the hidden-to-output weight matrix, and \(c\) is the output bias. The traditional RNN is illustrated in Fig. 5.
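As a sketch of the recurrence above, the following plain-Python forward pass assumes the standard Elman form \(h_t = \Phi(W x_t + U h_{t-1} + b)\), \(y_t = V h_t + c\) with \(h_0 = 0\). The recurrent hidden-to-hidden matrix \(U\) is an assumption, since the text names only \(W\), \(V\), \(b\), and \(c\).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Matrix-vector product for plain Python lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_forward(xs, W, U, V, b, c):
    """Run a simple (Elman) RNN over xs = [x_1, ..., x_T].

    h_t = sigmoid(W x_t + U h_{t-1} + b),  y_t = V h_t + c,  with h_0 = 0.
    Returns the list of outputs y_t and the final hidden state h_T.
    """
    h = [0.0] * len(b)
    outputs = []
    for x in xs:
        pre = [wx + uh + bi for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]
        h = [sigmoid(z) for z in pre]
        outputs.append([vh + ci for vh, ci in zip(matvec(V, h), c)])
    return outputs, h
```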
Long-term dependencies in data are difficult to learn because of exploding and vanishing gradients, and numerous studies have sought solutions to this issue. In 2014, Cho et al.32 introduced the GRU, a simplified variant of the LSTM (Long Short-Term Memory), to mitigate the vanishing-gradient problem in RNNs and enhance long-term learning. The GRU has two gates, an update gate and a reset gate, and combines the new input with the previous hidden state through a candidate hidden state.
The update gate controls how much of the previous hidden state is preserved and how much of the candidate state, which carries the current input, is admitted. The reset gate determines how much of the previous hidden state is discarded when the candidate state is formed. Figure 6 provides a detailed illustration of the GRU architecture.
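A single GRU step can be sketched as follows. The sketch assumes the gating convention of Cho et al., in which the update gate \(z\) weights the previous state, \(h_t = z \odot h_{t-1} + (1 - z) \odot \tilde{h}_t\), and the reset gate scales \(h_{t-1}\) before it enters the candidate state; some libraries swap the roles of \(z\) and \(1 - z\). The parameter names (`Wz`, `Uz`, `bz`, and so on) are hypothetical.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def affine(W, x, U, h, b):
    """Compute W x + U h + b for plain Python lists."""
    return [sum(w * xi for w, xi in zip(Wr, x)) + sum(u * hi for u, hi in zip(Ur, h)) + bi
            for Wr, Ur, bi in zip(W, U, b)]


def gru_step(x, h, p):
    """One GRU step; p holds the weights W_*, U_* and biases b_* for z, r and the candidate."""
    z = sigmoid(affine(p["Wz"], x, p["Uz"], h, p["bz"]))  # update gate
    r = sigmoid(affine(p["Wr"], x, p["Ur"], h, p["br"]))  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]            # part of h_{t-1} kept for the candidate
    h_tilde = tanh(affine(p["Wh"], x, p["Uh"], reset_h, p["bh"]))  # candidate state
    # z preserves part of the old state, (1 - z) admits the candidate
    return [zi * hi + (1 - zi) * hti for zi, hi, hti in zip(z, h, h_tilde)]
```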
Bidirectional gated recurrent unit
This subsection explains the proposed method for determining sentence polarity (sentiment analysis) using the Bi-GRU. The structure of the architecture is shown in Fig. 7.
A standard RNN processes a sequence in a single direction, from beginning to end, so the state at each position carries no information about later positions, which can result in information loss. The Bidirectional Recurrent Neural Network addresses this issue by using two RNNs, one reading the sequence forward and the other backward.
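The bidirectional idea can be sketched independently of the cell type: one cell reads the sequence left to right, another reads it right to left, and their hidden states are concatenated at each time step. `toy_cell` below is a hypothetical stand-in for a GRU step that keeps the example short; in a BiGRU each direction would use its own GRU parameters.

```python
import math
from typing import Callable, List

Vector = List[float]
Step = Callable[[Vector, Vector], Vector]  # (x_t, h_prev) -> h_t


def bidirectional(xs: List[Vector], fwd: Step, bwd: Step, hidden: int) -> List[Vector]:
    """Return [h_fwd_t ; h_bwd_t] for every time step t."""
    h = [0.0] * hidden
    forward_states = []
    for x in xs:  # left to right
        h = fwd(x, h)
        forward_states.append(h)
    h = [0.0] * hidden
    backward_states = []
    for x in reversed(xs):  # right to left
        h = bwd(x, h)
        backward_states.append(h)
    backward_states.reverse()  # align backward states with the original time order
    return [fw + bw for fw, bw in zip(forward_states, backward_states)]


def toy_cell(x: Vector, h: Vector) -> Vector:
    """Hypothetical stand-in for a GRU cell: h_t = tanh(x_t + h_{t-1}), element-wise."""
    return [math.tanh(xi + hi) for xi, hi in zip(x, h)]
```

With the input `[[1.0], [0.0], [0.0]]`, the backward half of the first time step already equals `tanh(1.0)`, whereas a forward-only RNN would need to read the whole sequence to carry that information to the end.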
The input is denoted \(\mathcal{D} = [\ell_1, \ell_2, \dots, \ell_{m-1}, \ell_m, \ell_{m+1}, \dots, \ell_n]\), and the main focus here is identifying the polarity of the sentence \(\ell_m\) by integrating positional information. The sentences are modified to include positional tokens and polarity, as represented by the following formula:
where \(\ell\) denotes each word in the sentence, \(\ell_m\) denotes the sentence, and \(\beta\) denotes the positional token.
The training process uses corpus statistics, namely the word co-occurrence matrix, to find similarities among words.
Moreover, \(x_i \in \mathbb{R}^{k}\) and \(t_i \in \mathbb{R}^{k}\) are the \(k\)-dimensional word vector and aspect vector corresponding to the \(i\)-th word of the sentence. The sentence and aspect representations are computed as follows:
where \(h\) is the sentence length and \(v\) is the maximum aspect length. The words of the sentence \(x_i\) and of the aspect \(t_i\) are represented by embedding vectors \((w_1, w_2, \dots, w_f)\). These low-dimensional aspect and sentence vectors are passed through the proposed model to the hidden layer.
Word weights are produced from the row-wise maximum of the aspect representation and the row-wise mean of the context-sentence representation, and a subtraction operation is then built on these quantities. The hidden-layer aspect and context representations are given below:
where \(\gamma\) and \(\rho\) are the aspect and sentence outputs, respectively. The output obtained from the subtraction for the final context sentence is denoted \(\eta\), and the aspect output is denoted \(\tau\); the two are concatenated row-wise and passed through a softmax, denoted \(\varphi\). A feed-forward layer maps \(\varphi\) onto the target sentiments with \(S\) categories. The dropout rate was 0.5.
where \(b\) and \(W\) are the bias and weight, respectively. The probability of each sentiment-polarity label is computed in Eq. (25), and the polarity of the parallel input is determined with the same process.
Model training
During training, backpropagation with a cross-entropy loss is used, and L2 regularization is applied to the network to prevent overfitting. Training minimizes the following loss function:
where \(i\) indexes sentences and \(j\) indexes classes, \(y_{i}^{j}\) is the true aspect-level polarity of sentence \(i\), \(\widehat{y}_{i}^{j}\) is the predicted aspect-level polarity, \(\lambda\) is the weight of the L2 regularizer, and \(\theta\) denotes the trainable parameters.
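A minimal sketch of this objective follows, assuming the common form \(\mathcal{L} = -\sum_i \sum_j y_i^j \log \widehat{y}_i^j + \lambda \lVert \theta \rVert^2\) with softmax outputs and one-hot targets. Whether the paper's equation averages over sentences or scales the penalty by \(1/2\) is not stated, so those choices are assumptions, and the function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def l2_regularized_cross_entropy(logits_batch, targets, params, lam):
    """Sum over sentences i and classes j of -y_i^j log(yhat_i^j), plus lam * ||theta||^2.

    logits_batch: one logit vector per sentence; targets: integer class labels
    (equivalent to one-hot y_i^j); params: flat list of trainable weights theta.
    """
    ce = 0.0
    for logits, label in zip(logits_batch, targets):
        probs = softmax(logits)
        ce -= math.log(probs[label])  # only the true class has y_i^j = 1
    l2 = lam * sum(p * p for p in params)
    return ce + l2
```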
Refined black-winged kite (RBWK) algorithm
This section describes the proposed RBWK, a nature-inspired optimizer. The black-winged kite is a small bird of prey with a white underside and a bluish-grey upper body. Two notable traits of this bird are its migratory habits and its hunting behavior.
Black-winged kites are highly effective hunters with strong hovering abilities, and they feed on reptiles, small mammals, insects, and birds. The black-winged kite algorithm is inspired by these migration behaviors and hunting skills.
Algorithm: black-winged kite algorithm
Algorithm 2 shows the pseudocode of the refined black-winged kite algorithm.
Mathematical model and algorithm
This subsection develops RBWK as a simple but efficient optimizer, focusing on its attack and migration stages. The pseudocode details how the BKA operates, outlining the steps needed to tackle a given problem and improving the results through iterations and adjustments. Population initialization in RBWK begins by generating a set of random solutions. The positions of the individuals are represented as follows:
where \(BK_{ij}\) is the position of the \(i\)-th individual in dimension \(j\), \(dim\) is the dimension of the problem, and \(pop\) is the number of candidate solutions. The position of each individual is calculated as follows:
\[ BK_{i} = BK_{lb} + rand \times \left(BK_{ub} - BK_{lb}\right) \]
where \(rand\) is a random value between 0 and 1, \(i\) ranges from 1 to \(pop\), and \(BK_{lb}\) and \(BK_{ub}\) are the lower and upper bounds of the individuals, respectively.
RBWK selects the candidate with the best fitness value as the leader, denoted \(X_L\). For a minimization problem with fitness function \(f\), the initial leader is the individual with the lowest fitness value:
\[ X_{L} = BK_{i^{*}}, \qquad i^{*} = \arg\min_{1 \le i \le pop} f\left(BK_{i}\right) \]
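Initialization and leader selection can be sketched as follows, assuming a minimization problem and per-dimension bounds given as lists. The function names are hypothetical.

```python
import random


def init_population(pop, dim, lb, ub, rng):
    """BK_ij = BK_lb_j + rand * (BK_ub_j - BK_lb_j), with rand ~ U(0, 1) drawn per entry."""
    return [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)] for _ in range(pop)]


def select_leader(population, fitness):
    """Return (X_L, f(X_L)): the individual with the lowest fitness (minimization)."""
    scores = [fitness(ind) for ind in population]
    best = min(range(len(population)), key=scores.__getitem__)
    return population[best], scores[best]
```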
Black-winged kites prey on small mammals and insects in grasslands: they hover silently to observe their targets, adjusting their tail and wing angles to the wind speed, and then dive swiftly to strike. The method uses these attack behaviors for global search.
When pursuing a target, the kite moves very quickly. The attack behavior is modeled as follows:
where \(y_{t}^{i,j}\) and \(y_{t+1}^{i,j}\) are the positions of the \(i\)-th candidate in the \(j\)-th dimension at iterations \(t\) and \(t+1\), respectively; \(p\) is a constant set to 0.9; \(r\) is a random number between 0 and 1; \(t\) is the current iteration; and \(T\) is the total number of iterations.
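The sketch below assumes the attack rule of the original Black-winged Kite Algorithm: \(y_{t+1}^{i,j} = y_t^{i,j} + n(1 + \sin r)\,y_t^{i,j}\) if \(p < r\), and \(y_{t+1}^{i,j} = y_t^{i,j} + n(2r - 1)\,y_t^{i,j}\) otherwise, with \(n = 0.05\,e^{-2(t/T)^2}\). The refined RBWK may differ, and drawing one \(r\) per individual is also an assumption; `attack_step` is a hypothetical name.

```python
import math
import random


def attack_step(y, t, T, rng, p=0.9):
    """Attack-stage update of one individual y (list of floats) at iteration t of T.

    Assumed rule: n = 0.05 * exp(-2 (t/T)^2); r ~ U(0, 1);
    if p < r: y + n (1 + sin r) y, else: y + n (2r - 1) y.
    """
    n = 0.05 * math.exp(-2.0 * (t / T) ** 2)  # step scale shrinks as t approaches T
    r = rng.random()
    if p < r:
        factor = n * (1.0 + math.sin(r))  # larger, dive-like move (taken with prob. 1 - p)
    else:
        factor = n * (2.0 * r - 1.0)      # small symmetric perturbation
    return [yj + factor * yj for yj in y]
```

Because the step is proportional to \(y_t^{i,j}\), a coordinate that is exactly zero does not move in this stage; larger jumps then come from the migration stage.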
Behavior of migration
Migration is influenced by environmental factors such as food availability and climate. To adapt to seasonal changes, many birds migrate to southern regions in winter to find better habitats and resources. Migration is typically directed by leaders, whose navigational skills are crucial to the success of the group.
If the fitness of the current individual is better than that of a randomly selected individual, the current individual continues to guide the movement until the goal is reached. The migration of the individuals is modeled as follows:
where \(L_{t}^{j}\) is the leader in dimension \(j\) at iteration \(t\), and \(y_{t}^{i,j}\) and \(y_{t+1}^{i,j}\) are the positions of the \(i\)-th individual in dimension \(j\) at iterations \(t\) and \(t+1\), respectively.
\(F_{i}\) is the fitness value of the current position of the \(i\)-th individual in dimension \(j\) at iteration \(t\), \(F_{ri}\) is the fitness value of a randomly selected position in the \(j\)-th dimension at the \(t\)-th iteration, and \(C(0,1)\) denotes a Cauchy mutation.
The one-dimensional Cauchy distribution is a continuous probability distribution characterized by two parameters, a scale \(\delta\) and a location \(\mu\). Its probability density function is
\[ f(x; \delta, \mu) = \frac{1}{\pi}\,\frac{\delta}{\delta^{2} + (x - \mu)^{2}}, \qquad -\infty < x < \infty \]
When \(\delta = 1\) and \(\mu = 0\), the density takes its standard form:
\[ f(x) = \frac{1}{\pi}\,\frac{1}{x^{2} + 1} \]
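Cauchy sampling and the migration update can be sketched as follows. \(C(0,1)\) is drawn by inverse-CDF sampling, \(\tan(\pi(u - 0.5))\) with \(u \sim U(0,1)\). The migration rule is assumed to follow the original Black-winged Kite Algorithm: \(y_{t+1}^{i,j} = y_t^{i,j} + C(0,1)(y_t^{i,j} - L_t^j)\) if \(F_i < F_{ri}\), and \(y_{t+1}^{i,j} = y_t^{i,j} + C(0,1)(L_t^j - m\,y_t^{i,j})\) otherwise, with \(m = 2\sin(r + \pi/2)\). Drawing a fresh Cauchy value per dimension is also an assumption, and the function names are hypothetical.

```python
import math
import random


def cauchy_pdf(x, delta=1.0, mu=0.0):
    """f(x; delta, mu) = (1/pi) * delta / (delta^2 + (x - mu)^2)."""
    return delta / (math.pi * (delta ** 2 + (x - mu) ** 2))


def standard_cauchy(rng):
    """Draw C(0, 1) by inverse-CDF sampling: tan(pi * (u - 0.5)), u ~ U(0, 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def migration_step(y, leader, f_i, f_ri, rng):
    """Migration update of individual y relative to the leader L (minimization).

    Assumed rule, with m = 2 sin(r + pi/2):
      if F_i < F_ri: y + C(0,1) * (y - L)
      else:          y + C(0,1) * (L - m * y)
    """
    r = rng.random()
    m = 2.0 * math.sin(r + math.pi / 2.0)
    if f_i < f_ri:
        return [yj + standard_cauchy(rng) * (yj - lj) for yj, lj in zip(y, leader)]
    return [yj + standard_cauchy(rng) * (lj - m * yj) for yj, lj in zip(y, leader)]
```

The heavy tails of the Cauchy distribution occasionally produce very large jumps, which is what helps individuals leave local optima.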
Finding the optimal solution requires both exploring the solution space broadly and exploiting promising regions, so an appropriate balance between exploration and exploitation is needed. To prevent the algorithm from converging prematurely and to help it find the best solution, the ratio of exploitation to exploration should be tuned appropriately at this stage.
The algorithm uses the parameter \(p\) to switch between the different attack behaviors, improving the balance between exploration and exploitation.
Refined version
The basic BWK can be used in various optimization applications because it effectively adjusts its search in pursuit of global optima. However, like other metaheuristic algorithms, BWK has drawbacks and room for improvement.
BWK is particularly sensitive to its parameter settings, which can greatly limit its performance, and it can become trapped in local optima on highly complex optimization problems.
To overcome these limitations, this work presents a refined version of BWK (RBWK) that adds components to increase the convergence speed and improve robustness. The main refinements in RBWK are as follows.
- (A) Hybridization with the particle swarm optimization algorithm.
RBWK combines the original BWK with the Particle Swarm Optimizer (PSO) to improve exploitation, which increases the efficiency of the designed algorithm. The mathematical formulation of this refinement is as follows:
where \(w\) is the inertia weight, \(y_{t}^{best}\) is the best global solution, and \(V_{i}(t)\) is the particle velocity.
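The PSO component can be sketched with the standard velocity update, \(V_i(t+1) = w V_i(t) + c_1 r_1 (pbest_i - y_i) + c_2 r_2 (y_t^{best} - y_i)\) and \(y_i(t+1) = y_i(t) + V_i(t+1)\). The personal-best term \(pbest_i\), the coefficients \(c_1\) and \(c_2\), and the default values below are assumptions, since the text names only \(w\), \(y_t^{best}\), and \(V_i(t)\); `pso_refine` is a hypothetical name.

```python
import random


def pso_refine(y, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One PSO-style velocity/position update used as an exploitation step.

    V_i(t+1) = w V_i(t) + c1 r1 (pbest_i - y_i) + c2 r2 (y_best - y_i)
    y_i(t+1) = y_i(t) + V_i(t+1)
    """
    new_v = []
    for yj, vj, pj, gj in zip(y, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()  # fresh random weights per dimension
        new_v.append(w * vj + c1 * r1 * (pj - yj) + c2 * r2 * (gj - yj))
    new_y = [yj + vj for yj, vj in zip(y, new_v)]
    return new_y, new_v
```

When a particle already sits at both its personal best and the global best with zero velocity, the update leaves it unchanged, so in a hybrid the BWK stages remain responsible for moving it elsewhere.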
- (B) Multi-population approach.
In the multi-population scheme, RBWK evolves multiple populations of solutions concurrently. This increases the diversity of the solutions the algorithm can discover and helps it escape local minima. Mathematically, the multi-population scheme can be defined as:
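A multi-population scheme can be sketched as several sub-populations that evolve independently and periodically share their best solution. To keep the example self-contained, the inner update is a simple Gaussian perturbation with greedy replacement rather than the BWK attack and migration rules, and the migration interval, population sizes, and `sphere` test objective are hypothetical choices.

```python
import random


def sphere(x):
    """Toy objective (stand-in for a validation loss): sum of squares."""
    return sum(v * v for v in x)


def multi_population_search(fitness, dim, lb, ub, n_pops=3, pop_size=10,
                            iters=100, migrate_every=10, seed=0):
    """Evolve several sub-populations independently and periodically share the best.

    Returns (best_fitness, best_position) over all sub-populations.
    """
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lb), ub) for v in x]

    pops = [[[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(pop_size)]
            for _ in range(n_pops)]
    scores = [[fitness(ind) for ind in pop] for pop in pops]

    def global_best():
        return min((f, x) for pop, sc in zip(pops, scores) for x, f in zip(pop, sc))

    for t in range(1, iters + 1):
        sigma = 0.1 * (ub - lb) * (1 - t / iters) + 1e-6  # shrinking step size
        for pop, sc in zip(pops, scores):
            for i, ind in enumerate(pop):
                cand = clip([v + rng.gauss(0.0, sigma) for v in ind])
                f = fitness(cand)
                if f < sc[i]:  # greedy: keep only improvements
                    pop[i], sc[i] = cand, f
        if t % migrate_every == 0:
            # share the global best: it replaces the worst member of every sub-population
            best_f, best_x = global_best()
            for pop, sc in zip(pops, scores):
                worst = max(range(pop_size), key=sc.__getitem__)
                pop[worst], sc[worst] = list(best_x), best_f
    return global_best()
```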
RBWK is used to select the optimal hyperparameters of the BiGRU by minimizing its loss function.
Results and discussions
Computational requirements and inference latency analysis
We carried out a detailed analysis of the computational requirements and inference latency of the BiGRU/RBWK model to evaluate its practical suitability for real-time intrusion detection in resource-constrained IoMT environments. The testbed simulated edge-level IoMT devices with limited processing power and memory, using a Raspberry Pi 4 Model B (4 GB RAM, quad-core ARM Cortex-A72 @ 1.5 GHz) and an NVIDIA Jetson Nano (4 GB RAM, quad-core ARM Cortex-A57 @ 1.4 GHz). The average inference time was about 12.7 ms per sample on the Raspberry Pi and 8.3 ms on the Jetson Nano.
Overall, these results show that the model is fast enough to detect threats in near real time. The model is compact, with about 280,000 trainable parameters, and its memory overhead during inference is low (peaking below 200 MB). In addition, using RBWK for hyperparameter optimization improves model performance while reducing training time and computational burden compared with traditional grid or random search. These results support deploying the proposed framework on resource-constrained IoMT devices while maintaining detection accuracy and response time, making it feasible for real-time cybersecurity applications in smart healthcare systems.
The proposed BiGRU/RBWK method is compared with five state-of-the-art methods, namely Random Forest33, LSTM34, CNN35, Autoencoder36, and a CNN-LSTM hybrid34, at three levels: preprocessing, feature selection, and final classification performance. Each method is discussed in detail, and its advantages and disadvantages are summarized in the tables and text.
The model’s performance was evaluated using a wide range of metrics, which together give a thorough picture of its capabilities. The metrics employed are defined as follows.
In addition, the AUC-ROC is a key measure of the performance of binary classification models. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various thresholds, and the area under it indicates how well a model separates two classes, such as positive and negative outcomes. Here, \(F_P\), \(T_P\), \(F_N\), and \(T_N\) denote the numbers of false positives, true positives, false negatives, and true negatives, respectively.
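The standard definitions of these metrics in terms of \(T_P\), \(F_P\), \(F_N\), and \(T_N\) can be sketched as follows; the paper's metric equations are assumed to follow these conventional forms, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall (TPR), F1 and FPR from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also the True Positive Rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "fpr": fpr}
```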
Receiver operating characteristic (ROC) analysis
The Receiver Operating Characteristic (ROC) curve is a common tool for evaluating binary classification models. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at different decision thresholds, showing the trade-off between sensitivity and specificity.
The Area Under the Curve (AUC-ROC) measures the overall performance of the classifier; a value closer to one indicates stronger discrimination capability. In IoMT system security, the ROC curve helps evaluate how well a model distinguishes normal (benign) activity from malicious activity while keeping false positive and false negative rates low. To analyze the performance of the BiGRU/RBWK model, a fine-grained set of threshold values was used to construct the ROC curve in detail. The FPR and TPR at every threshold are shown in Fig. 8.
where FPR and TPR denote the False Positive Rate and True Positive Rate, respectively:
\[ \mathrm{TPR} = \frac{T_P}{T_P + F_N}, \qquad \mathrm{FPR} = \frac{F_P}{F_P + T_N} \]
As can be observed, FPR and TPR are both 0.000 at a threshold of 0.00, which corresponds to no sample being classified as positive. As the threshold increases, the model first achieves a high TPR at a low FPR, after which both TPR and FPR rise more substantially. At a threshold of 0.50, the TPR is 0.900 with an FPR of only 0.130, a good balance between sensitivity and specificity.
This balance matters in real-world use cases such as IoMT security, where reducing both false alerts and missed detections is essential. As the threshold approaches 1.00, the TPR reaches 1.000, but at the cost of an FPR that also reaches 1.000.
The ROC curve therefore rises steeply over the first thresholds (e.g., 0.00–0.30) and stays close to the top-left corner (0, 1), meaning that the model can reach a high TPR while keeping the FPR low, which supports its effectiveness for practical deployment.
According to this analysis, the AUC-ROC of the BiGRU/RBWK model is approximately 0.974, indicating a strong ability to discriminate between benign and malicious traffic. This performance helps the model limit both false positives and false negatives when detecting cyberthreats in IoMT systems.
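For reference, a ROC curve and its AUC can be computed from classifier scores as sketched below. The sketch uses the common convention that a sample is predicted positive when its score is at least the threshold, so the sweep starts with no positives at (0, 0) and ends at (1, 1). The function names are hypothetical, and the AUC is approximated with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR); a sample is predicted positive when score >= threshold.

    The sweep starts above every score (nothing positive) and then lowers the
    threshold through each distinct score; labels are 1 (positive) or 0 (negative).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For scores `[0.9, 0.4, 0.6, 0.1]` with labels `[1, 1, 0, 0]`, the AUC is 0.75, which equals the fraction of positive-negative pairs in which the positive sample receives the higher score.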
Confusion matrix analysis
All performance metrics are computed from the confusion matrix, which summarizes the accuracy of a classification model. It tabulates the model’s predictions against the true labels of the test set, which helps reveal where the model succeeds and where it fails. Rows correspond to the true class and columns to the predicted class; diagonal elements represent correct predictions, while off-diagonal entries are misclassifications. Figure 9 shows the confusion matrices for the WUSTL-EHMS-2020 and ECU-IoHT datasets.
The results show that in the binary classification task on WUSTL-EHMS-2020, normal samples form the majority class (87.5%) and attack samples the minority class (12.5%). The model correctly classifies 13,850 normal samples and produces few false positives (186), which is important in healthcare technology because it avoids unnecessary alerts when no attack is present. However, the relatively large number of false negatives (422) indicates that attack detection still needs improvement.
For the multi-class ECU-IoHT dataset, which has five categories (No Attack/Normal, ARP Spoofing, DoS Attack, Nmap PortScan, and Smurf Attack) with significant class imbalance, the model performs relatively well on larger classes such as No Attack/Normal (4,500 TP) and Smurf Attack (1,550 TP out of 1,558), but its performance drops on smaller, less frequent attack types such as DoS Attack, where only 110 of 128 samples are detected.
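Expressed as per-class recall (the fraction of each class's test samples that are correctly detected), the counts reported above give:

```latex
\mathrm{Recall}_{\text{Smurf}} = \frac{1550}{1558} \approx 0.995,
\qquad
\mathrm{Recall}_{\text{DoS}} = \frac{110}{128} \approx 0.859
```

so roughly one DoS sample in seven is missed, compared with fewer than one Smurf sample in a hundred.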
Overall, the analysis highlights the need for high accuracy across all classes to keep IoMT systems secure, and it suggests using techniques such as oversampling or cost-sensitive learning to improve the detection rate of minority classes.
Preprocessing performance comparison
Preprocessing is essential for data quality and, in turn, for model performance. In this study, we compare the preprocessing techniques used with the various models. To preprocess the raw data, the proposed BiGRU/RBWK model applies categorical-feature encoding, missing-value treatment, class-imbalance treatment, and outlier treatment. The preprocessing comparison is shown in Table 10.
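For the class-imbalance step, the study reports using Random Under-Sampling (RUS). A minimal sketch, assuming that every class is reduced to the size of the smallest one, is shown below; the function name and fixed seed are illustrative.

```python
import random
from collections import defaultdict


def random_under_sample(X, y, seed=0):
    """Randomly drop samples from larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))  # sample without replacement
    keep.sort()  # preserve the original ordering of the retained samples
    return [X[i] for i in keep], [y[i] for i in keep]
```

Under-sampling discards majority-class data, which is one reason very small or highly imbalanced datasets remain challenging.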
As shown in Table 10, all BiGRU variants use the same preprocessing pipeline, so performance differences can be attributed to the optimization strategy. The proposed BiGRU/RBWK achieves the lowest computational complexity and the shortest training time (36.8 min), owing to its multi-population structure and dynamic exploration-exploitation balance, which accelerate convergence. In contrast, GA-based tuning suffers from a high computational load due to crossover and mutation operations, while PSO, though fast, often converges prematurely. Among the metaheuristic-tuned models, RBWK is the most efficient, reducing training time by 12.9–37.4% relative to the others.
Furthermore, RBWK’s integration with PSO enhances exploitation without sacrificing exploration, allowing faster convergence to optimal hyperparameters (e.g., learning rate, sequence length, hidden units). This efficiency makes it particularly suitable for resource-constrained IoMT environments where rapid model retraining is essential.
Compared to non-metaheuristic models, the BiGRU/RBWK framework maintains consistency in preprocessing while significantly outperforming them in convergence speed and stability. Traditional models like Random Forest and CNN rely on simpler or ad-hoc preprocessing, often leading to data distortion (e.g., mean imputation) or loss (e.g., dropping instances), which undermines performance on sensitive medical data.
Feature selection comparison
Tuning the BiGRU hyperparameters with different metaheuristics also allows feature selection to be compared, because the search mechanisms assign different degrees of relevance to features. Although all BiGRU variants use RFE-SVM for feature selection, the different optimizers lead to differences in the final selected sets and in ranking stability, depending on how well they avoid local optima and maintain diversity (see Table 11).
*FSI: measured across 10 runs as the Jaccard similarity of the top-k selected features (range 0–1; higher means more stable).
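A sketch of this stability index follows. The note does not specify how the per-run feature sets are combined, so averaging the Jaccard similarity over all pairs of runs is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two feature sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def feature_stability_index(selections):
    """Mean pairwise Jaccard similarity of the top-k feature sets from repeated runs."""
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```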
The proposed BiGRU/RBWK achieves the highest feature stability (\(FSI = 0.91\)) while selecting only 40 features, indicating a compact but highly discriminative feature set. We attribute this to the multi-population strategy in RBWK’s design, which explores diverse regions of the hyperparameter space and allows the most relevant features to be identified consistently across runs.
GA and PSO show only moderate stability, which we attribute to genetic drift and premature convergence, respectively, whereas GWO, HHO, and MPA are more consistent but still below RBWK. It should be stressed that the LSTM and CNN baselines rely on filtering or manual criteria that lack adaptivity, resulting in higher dimensionality (60–70 features) and lower interpretability.
Although the AE-based selection is efficient with 30 features, it relies on reconstruction error, which is a valid criterion but gives no direct indication of classification performance, so some crucial discriminative features may be treated as inconsequential. Meanwhile, wrapper methods (as used with the CNN-LSTM baseline) are more computationally expensive and, because their search trajectories vary widely, yield unstable selections.
More importantly, RFE-SVM in BiGRU/RBWK is interpretable: it shows clinicians and security analysts which network or biometric features, such as flow duration, packet size, and protocol type, were most indicative of attacks. This transparency is essential in healthcare, where model-derived decisions must be auditable and trustworthy.
Hence, coupling RBWK with RFE-SVM not only boosts model performance but also makes the selected feature set reliable, compact, and trustworthy, which are crucial properties for real-world IoMT deployment.
Security results comparison
This section compares the overall security results of the proposed BiGRU/RBWK model with those of the other advanced models in detecting cyberthreats across different datasets. Table 12 reports the comparison results.
The proposed BiGRU/RBWK model outperforms all compared methods in final classification performance. On WUSTL-EHMS-2020, it reaches an accuracy of 95.6%, exceeding Random Forest (87.3%), LSTM (89.4%), CNN (88.1%), Autoencoder (85.7%), and the CNN-LSTM hybrid (92.1%). On ECU-IoHT, our method also performs best, with an accuracy of 93.8%.
The model also achieved consistently high precision, recall, F1-score, and AUC-ROC, indicating that it detects potential cyberthreats accurately while avoiding false positives and false negatives. By contrast, Random Forest performs poorly on sequential data, LSTM tends to overfit, CNN lacks temporal modeling ability, Autoencoders cannot identify threat types, and the CNN-LSTM hybrid has a high computational cost. By combining a bidirectional GRU with RBWK-based optimization, our solution performs solidly across datasets.
Because the BiGRU/RBWK model must handle class imbalance and multi-class classification, the F1-score is especially suitable for assessing its performance. As the harmonic mean of precision and recall, it captures the trade-off between them, which is essential for imbalanced datasets in which the minority class (e.g., cyberattacks) is underrepresented. In such cases, a high F1-score indicates that the model detects positive cases successfully without producing too many false positives. Moreover, in multi-class classification, the F1-score can be computed per class and averaged across classes to give a complete picture of the model’s overall performance, so that less common attack types are not neglected. This makes the F1-score a crucial indicator of the strength and fidelity of cybersecurity models in real-world IoMT settings.
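Macro-averaging can be sketched from a confusion matrix (rows are true classes, columns are predicted classes) as follows. Per-class F1 is computed as \(2T_P / (2T_P + F_P + F_N)\), which is equivalent to the harmonic mean of precision and recall; the function name is hypothetical.

```python
def macro_f1(confusion):
    """Macro-averaged F1 from a square confusion matrix (rows = true, columns = predicted)."""
    k = len(confusion)
    f1s = []
    for c in range(k):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(confusion[c]) - tp                       # truly c, predicted another class
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / k  # every class weighs the same, regardless of its size
```

For an imbalanced matrix such as `[[95, 0], [5, 0]]`, accuracy is 0.95, yet `macro_f1` returns about 0.49 because the minority class is never detected.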
Bar charts provide a simple way to compare the performance of multiple methods on multiple metrics. In this section, we compare six methods (including the proposed BiGRU/RBWK model) on the WUSTL-EHMS-2020 and ECU-IoHT datasets in terms of key metrics: Accuracy, Precision, Recall, F1-Score, and AUC-ROC. Figure 10 shows the performance comparison.
These bar charts visually highlight the effectiveness of the BiGRU/RBWK model in tackling both imbalanced and complex IoMT datasets.
To justify the choice of comparative models in our experimental evaluation, each model was selected based on its relevance, performance record, and applicability to intrusion detection tasks in IoMT environments. Random Forest serves as a strong classification baseline owing to its robustness on imbalanced datasets, and its interpretability was another reason for its selection.
LSTM was selected for its effectiveness in modeling sequential data, which matches the temporal nature of network traffic in IoMT systems. CNN was included for its ability to extract features from structured data, to test whether spatial patterns in packet-level features can be exploited for threat detection.
Autoencoders are popular unsupervised models used primarily for anomaly detection without labeled attack classes. The CNN-LSTM hybrid combines the strengths of spatial and temporal feature learning, which is particularly relevant to complex IoMT traffic analysis. Together, these paradigms, ranging from traditional machine learning to advanced deep learning, ensure comprehensive benchmarking of the proposed BiGRU/RBWK framework against existing state-of-the-art methods.
Security analysis against adversarial attacks
This section presents an empirical security analysis of the model’s robustness against typical evasion-style adversarial perturbations of the kind used in real cyberattacks. Input perturbations are generated with the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) on the WUSTL-EHMS-2020 and ECU-IoHT datasets. Table 13 reports the robustness evaluation under adversarial perturbations.
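FGSM and PGD can be sketched on a simple differentiable surrogate. For illustration only, the sketch uses a logistic-regression model \(p = \sigma(w^{\top}x + b)\), whose binary cross-entropy has the input gradient \((p - y)\,w\); attacking the BiGRU itself would require gradients from an automatic-differentiation framework. The weights and function names are hypothetical. FGSM moves each input feature by \(\varepsilon\) in the direction of the sign of the loss gradient, while PGD repeats smaller steps of size \(\alpha\) and projects back onto the \(L_\infty\) ball of radius \(\varepsilon\) around the original input.

```python
import math


def logistic_loss_grad_x(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x for p = sigmoid(w.x + b)."""
    p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(grad_x L)."""
    g = logistic_loss_grad_x(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(x, y, w, b, eps, alpha, steps):
    """Projected Gradient Descent attack: repeated loss-increasing sign steps of size
    alpha, each followed by projection onto the L-infinity ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(steps):
        x_adv = fgsm(x_adv, y, w, b, alpha)
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv
```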
As shown in Table 13, across the applied adversarial perturbation levels (ε = 0.01 to ε = 0.1), the BiGRU/RBWK model retains high classification accuracy, reaching 93.2% on WUSTL-EHMS-2020 and 91.5% on ECU-IoHT at ε = 0.1.
This shows that the model retains strong detection capability against malicious attempts to evade detection through data manipulation. The RFE-SVM-based feature selection contributes to this robustness because irrelevant and noisy features, which an adversary could otherwise exploit, are removed. Taken together, these results indicate that the framework is sound enough not only to detect existing threats but also to resist common adversarial machine-learning evasion techniques.
Paired t-test analysis for model comparison
A paired t-test was performed to determine whether the mean improvements of BiGRU/RBWK over baseline models such as Random Forest, LSTM, CNN, Autoencoder, and CNN-LSTM are statistically significant. The test assumes that the paired differences in scores (e.g., accuracy) across ten independent runs are normally distributed (see Fig. 11).
The resulting p-values (< 0.001) for accuracy, precision, recall, and F1-score on both datasets are highly significant, leading to rejection of the null hypothesis and confirming that the gains are systematic rather than random. For instance, on the WUSTL-EHMS-2020 dataset, BiGRU/RBWK improved accuracy over the CNN-LSTM hybrid by 3.5 percentage points (95.6% vs. 92.1%), with a 95% confidence interval of [3.2, 3.8] and p < 0.0001.
The same trend can be seen on ECU-IoHT, where BiGRU/RBWK outperforms LSTM by 7.1 percentage points in accuracy (93.8% vs. 86.7%), with p = 0.0002. These results provide strong evidence that the proposed model performs consistently and reliably across trials. The high t-statistics (e.g., t = 8.76 for accuracy on WUSTL-EHMS-2020) further show that the improvement is large relative to the run-to-run variability, indicating that the BiGRU/RBWK framework provides a strong and statistically validated enhancement in intrusion detection capability for IoMT systems.
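The paired t statistic and confidence interval can be sketched with Python's standard library. The default critical value 2.262 assumes 10 runs (9 degrees of freedom) and a two-sided 95% interval; in practice, `scipy.stats.ttest_rel` also returns the p-value. The function name is hypothetical.

```python
import math
import statistics


def paired_t(scores_a, scores_b, t_crit=2.262):
    """Paired t statistic and confidence interval for mean(a - b).

    t_crit is the two-sided 95% critical value of Student's t; 2.262 assumes
    n = 10 runs (df = 9). Use the matching value for other numbers of runs.
    """
    d = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(d)
    mean_d = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(n)  # sample std of the differences / sqrt(n)
    t = mean_d / se
    return t, (mean_d - t_crit * se, mean_d + t_crit * se)
```

Because \(t\) divides the mean difference by its standard error, a large \(t\) reflects an improvement that is large relative to run-to-run variability, not only a large improvement in absolute terms.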
Wilcoxon signed-rank test for non-normal performance distributions
Because the performance distributions were skewed and failed the Shapiro-Wilk normality test (p < 0.05), we complemented the parametric t-test with the Wilcoxon signed-rank test, a non-parametric test for paired samples that does not require the differences to be normally distributed.
The test checks whether the median difference in performance between BiGRU/RBWK and each baseline differs significantly from zero; the results support the superiority of the proposed model in every comparison. Figure 12 shows the distribution of the F1-score across 10 runs.
For the ECU-IoHT dataset, the Wilcoxon p-values on the F1-score were 0.002 for BiGRU/RBWK against Random Forest, 0.001 against LSTM, and 0.003 against Autoencoder. The test statistic W, the smaller of the two signed-rank sums, was generally low (e.g., W = 5 for BiGRU/RBWK vs. CNN-LSTM), indicating that few and small paired differences favoured the baseline, so most differences favoured the proposed model.
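To make the role of W concrete, the sketch below computes an exact two-sided Wilcoxon signed-rank test in plain Python by enumerating all 2^n sign patterns, which is cheap for ten runs (1,024 patterns). It drops zero differences and gives tied magnitudes their average rank. The F1-scores are hypothetical, not the paper's per-run values, and the function names are illustrative; `scipy.stats.wilcoxon` provides a library implementation.

```python
from itertools import product

def signed_ranks(diffs):
    """Drop zero differences, then rank |d| ascending, giving tied values their average rank."""
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    return nonzero, ranks

def wilcoxon_exact(proposed, baseline):
    """Exact two-sided Wilcoxon signed-rank test; returns (W, p) with W = min(W+, W-)."""
    nonzero, ranks = signed_ranks([a - b for a, b in zip(proposed, baseline)])
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    total = sum(ranks)
    w = min(w_plus, total - w_plus)
    # Under H0 every rank is equally likely to carry either sign: enumerate all 2^n patterns.
    extreme = 0
    for signs in product((False, True), repeat=len(ranks)):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(s, total - s) <= w + 1e-9:
            extreme += 1
    return w, extreme / 2 ** len(ranks)

# Hypothetical F1-scores (%) over ten runs; NOT the per-run values behind the paper's results.
proposed = [93.1, 92.8, 93.5, 91.0, 93.9, 92.6, 94.0, 93.3, 92.9, 93.7]
baseline = [90.0, 90.4, 89.9, 91.2, 90.1, 90.5, 89.6, 90.3, 90.2, 89.8]

w, p = wilcoxon_exact(proposed, baseline)
print(f"W = {w}, p = {p:.5f}")   # W = 1.0, p = 0.00391
```

Here only the smallest difference favours the baseline, so W = 1. With ten paired runs and no zero differences, the exact two-sided p-value can be no smaller than 2/2^10 ≈ 0.002, reached when every difference has the same sign.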
These results strengthen the case that the performance improvements are not only statistically significant but also robust to the distributional assumptions of the test and consistent across baseline model types. The agreement between the parametric and non-parametric tests lends further credence to these conclusions and supports the applicability of BiGRU/RBWK to a range of IoMT security scenarios.
Limitations
Although the proposed BiGRU/RBWK model has shown considerable promise in detecting cyberthreats in IoMT systems, the study has several limitations. First, the model depends on labeled datasets for training, which in many real-world healthcare environments are unavailable or difficult to acquire because of privacy constraints and the evolving nature of cyberattacks.
The model's performance also depends heavily on the quality and representativeness of the labeled data, and any biases or gaps in the training sets would affect its ability to generalize. Although preprocessing steps such as Random Under-Sampling (RUS) and Recursive Feature Elimination (RFE) were employed to address class imbalance and the risk of overfitting, the model may still struggle with very small or highly imbalanced datasets. These constraints may also reduce detection accuracy, particularly for minority attack classes. Future work will explore semi-supervised and self-supervised learning strategies to reduce dependence on labeled data, as well as advanced augmentation techniques to diversify the data and improve robustness against overfitting.
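The RUS configuration is not specified in this section. As a minimal sketch, assuming the common strategy of randomly discarding rows from every class until each class matches the size of the smallest one (typically applied to the training split only), it can be written in plain Python as follows. The function name and toy data are hypothetical; imbalanced-learn's `RandomUnderSampler` offers a comparable library implementation.

```python
import random
from collections import Counter, defaultdict

def random_under_sample(X, y, seed=0):
    """Randomly keep only as many rows of each class as the smallest class has."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for i, label in enumerate(y):
        rows_by_class[label].append(i)
    n_min = min(len(rows) for rows in rows_by_class.values())
    keep = []
    for rows in rows_by_class.values():
        keep.extend(rng.sample(rows, n_min))   # sample without replacement
    keep.sort()   # restore the original row order
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical imbalanced training split: 90 benign rows, 10 attack rows.
X = [[i, i % 7] for i in range(100)]
y = ["attack" if i % 10 == 0 else "benign" for i in range(100)]

X_bal, y_bal = random_under_sample(X, y)
print(Counter(y_bal))
```

Because the discarded majority-class rows never reach the model, RUS trades information for balance, which is why very small datasets remain difficult even after resampling.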
Conclusions
This study introduced BiGRU/RBWK, a deep learning model for improving the security of IoMT in smart healthcare settings. The model combines Bidirectional Gated Recurrent Units with the Refined Black-winged Kite optimization algorithm to detect cyberthreats and support data authentication. Deep Feature Synthesis, combined with Recursive Feature Elimination using an SVM, provides a strong preprocessing pipeline that reduces overfitting and computational complexity, making the model suitable for real-world applications. On two public datasets, WUSTL-EHMS-2020 and ECU-IoHT, BiGRU/RBWK achieved state-of-the-art results, outperforming the baselines in accuracy, precision, recall, F1-score, and AUC-ROC. Its high AUC-ROC scores (0.974 and 0.958) indicate strong discrimination between benign and malicious traffic, so high detection rates can be achieved at low false-positive rates, which highlights the model's effectiveness in securing IoMT systems. In addition, combining RFE with SVM improves model interpretability and generalizability, both of which are essential for real-time cybersecurity applications. Future work will investigate integrating federated learning with explainable AI to further strengthen privacy protection and model transparency, and to keep the framework scalable and adaptable across diverse IoMT settings.
Data availability
The datasets are available at https://github.com/CSCRC-SCREED/ECU-IoHT?tab=readme-ov-file (ECU-IoHT) and https://www.cse.wustl.edu/~jain/ehms/index.html (WUSTL-EHMS-2020).
References
Rani, S., Kataria, A., Kumar, S. & Tiwari, P. Federated learning for secure IoMT-applications in smart healthcare systems: A comprehensive review. Knowl. Based Syst. 274, 110658 (2023).
Saheed, Y. K. & Arowolo, M. O. Efficient cyber attack detection on the internet of medical things-smart environment based on deep recurrent neural network and machine learning algorithms. IEEE Access 9, 161546–161554 (2021).
Samanta, S., Sarkar, A. & Kumari, S. An IoMT data security framework with hyperledger fabric for smart cities. Int. J. Inform. Technol. 16(8), 4875–4886 (2024).
Zhao, Y. et al. Highly sensitive, wearable piezoresistive methylcellulose/chitosan@MXene aerogel sensor array for real-time monitoring of physiological signals of pilots. Sci. China Mater. 68(2), 542–551 (2025).
Bojjagani, S., Brabin, D., Kumar, K., Sharma, N. K. & Batta, U. Secure privacy-enhanced fast authentication and key management for IoMT-enabled smart healthcare systems. Computing 106(7), 2427–2458 (2024).
Alzubi, J. A., Alzubi, O. A., Singh, A. & Ramachandran, M. Cloud-IIoT-based electronic health record privacy-preserving by CNN and blockchain-enabled federated learning. IEEE Trans. Industr. Inf. 19(1), 1080–1087 (2022).
Saheed, Y. K. & Chukwuere, J. E. CPS-IIoT-P2Attention: explainable privacy-preserving with scaled dot-product attention in cyber physical system-industrial IoT network. IEEE Access 13, 81118–81142 (2025).
Ali, Z. et al. A lightweight and secure authentication scheme for remote monitoring of patients in IoMT. IEEE Access 12, 73004–73020 (2024).
Wazid, M., Nautiyal, S., Das, A. K., Shetty, S. & Islam, S. H. A secure authenticated healthcare data analysis mechanism in IoMT-Enabled healthcare. Secur. Priv. 8(1), e468 (2025).
Alzubi, O. A., Alzubi, J. A., Shankar, K. & Gupta, D. Blockchain and artificial intelligence enabled privacy-preserving medical data transmission in internet of things. Trans. Emerg. Telecommun. Technol. 32(12), e4360 (2021).
Alzubi, J. A., Alzubi, O. A., Qiqieh, I. & Singh, A. A blended deep learning intrusion detection framework for consumable edge-centric IoMT industry. IEEE Trans. Consum. Electron. 70(1), 2049–2057 (2024).
Saheed, Y. K., Omole, A. I. & Sabit, M. O. GA-mADAM-IIoT: A new lightweight threats detection in the industrial IoT via genetic algorithm with attention mechanism and LSTM on multivariate time series sensor data. Sens. Int. 6, 100297 (2025).
Alzubi, O. A. A deep learning-based Frechet and Dirichlet model for intrusion detection in IWSN. J. Intell. Fuzzy Syst. 42(2), 873–883 (2022).
Xu, G. et al. Anonymity-enhanced sequential multi-signer ring signature for secure medical data sharing in IoMT. IEEE Trans. Inf. Forensics Secur. 20, 5647–5662 (2025).
Saheed, Y. K. & Misra, S. CPS-IoT-PPDNN: A new explainable privacy preserving DNN for resilient anomaly detection in cyber-physical systems-enabled IoT networks. Chaos Solitons Fractals 191, 115939 (2025).
Qiqieh, I., Alzubi, J. & Alzubi, O. DNA cryptography based security framework for health-cloud data. Computing 107(1), 35 (2025).
Alzubi, O. A., Alzubi, J. A., Qiqieh, I. & Al-Zoubi, A. M. An IoT intrusion detection approach based on salp swarm and artificial neural network. Int. J. Netw. Manag. 35(1), e2296 (2025).
Saheed, Y. K. & Chukwuere, J. E. XAIEnsembleTL-IoV: A new explainable artificial intelligence ensemble transfer learning for zero-day botnet attack detection in the internet of vehicles. Results Eng. 24, 103171 (2024).
Saheed, Y. K., Abdulganiyu, O. H. & Ait Tchakoucht, T. Modified genetic algorithm and fine-tuned long short-term memory network for intrusion detection in the internet of things networks with edge capabilities. Appl. Soft Comput. 155, 111434 (2024).
Ding, Z. et al. A supervised explainable machine learning model for perioperative neurocognitive disorder in Liver-Transplantation patients and external validation on the medical information Mart for intensive care IV database: retrospective study. J. Med. Internet Res. 27, e55046 (2025).
Li, J. et al. Outlier detection using iterative adaptive mini-minimum spanning tree generation with applications on medical data. Front. Physiol. 14, 1233341 (2023).
Awotunde, J. B., Folorunso, S. O., Ajagbe, S. A., Garg, J. & Ajamu, G. J. AiIoMT: IoMT-based system-enabled artificial intelligence for enhanced smart healthcare systems. in Machine Learning for Critical Internet of Medical Things 229–254 (2022).
Manickam, P. et al. Artificial intelligence (AI) and internet of medical things (IoMT) assisted biomedical systems for intelligent healthcare. Biosensors 12(8), 562 (2022).
Dahan, F. et al. A smart IoMT based architecture for E-healthcare patient monitoring system using artificial intelligence algorithms. Front. Physiol. 14, 1125952 (2023).
Abdulraheem, M. et al. Artificial intelligence of medical things for medical information systems privacy and security. in Handbook of Security and Privacy of AI-Enabled Healthcare Systems and Internet of Medical Things 63–96 (CRC, 2024).
Chen, P. Y. et al. Information security and artificial Intelligence–Assisted diagnosis in an internet of medical thing system (IoMTS). IEEE Access 12, 9757–9775 (2024).
Dakic, P. et al. Intrusion detection using metaheuristic optimization within IoT/IIoT systems and software of autonomous vehicles. Sci. Rep. 14(1), 22884 (2024).
Lameesa, A., Hoque, M., Alam, M. S. B., Ahmed, S. F. & Gandomi, A. H. Role of metaheuristic algorithms in healthcare: A comprehensive investigation across clinical diagnosis, medical imaging, operations management, and public health. J. Comput. Des. Eng. 11(3), 223–247 (2024).
Hady, A. G., Salman, T., Unal, D. & Jain, R. WUSTL EHMS 2020 Dataset for Internet of Medical Things (IoMT) Cybersecurity Research. Available: https://www.cse.wustl.edu/~jain/ehms/ftp/wustl-ehms-2020_with_attacks_categories.csv
ECU-IoHT: A dataset for analyzing cyberattacks in Internet of Health Things. Available: https://github.com/CSCRC-SCREED/ECU-IoHT/
Arras, L., Montavon, G., Müller, K. R. & Samek, W. Explaining recurrent neural network predictions in sentiment analysis, arXiv preprint arXiv:1706.07206 (2017).
Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv preprint arXiv:1412.3555 (2014).
Hossain, M. T. et al. Cyberattacks classification on internet of medical things using information gain feature selection and machine learning. in 2024 Advances in Science and Engineering Technology International Conferences (ASET) 1–10 (IEEE, 2024).
Faruqui, N. et al. SafetyMed: A novel IoMT intrusion detection system using CNN-LSTM hybridization. Electronics 12(17), 3541 (2023).
Olawale, O. P. & Ebadinezhad, S. The detection of abnormal behavior in healthcare IoT using IDS, CNN, and SVM. in Mobile Computing and Sustainable Informatics: Proceedings of ICMCSI 2023 375–394 (Springer, 2023).
Abdiwi, F. G. Hybrid machine learning and blockchain technology for early detection of cyberattacks in healthcare systems. Int. J. Saf. Secur. Eng. 14, 6 (2024).
Author information
Contributions
X.L. wrote the main manuscript text. X.L. reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, X. Towards advanced AI-based solutions for securing IoMT in smart health information systems. Sci Rep 16, 2079 (2026). https://doi.org/10.1038/s41598-025-31850-0
DOI: https://doi.org/10.1038/s41598-025-31850-0