Introduction

In today's digital landscape, the growing complexity and volume of cyberattacks have made cybersecurity a top priority across many industries. Conventional security methods, once effective in safeguarding networks and data, now struggle to identify and mitigate advanced cyber threats [1]. Cyberattacks have become increasingly varied, ranging from phishing, distributed denial-of-service (DDoS), and ransomware to advanced persistent attacks that evade traditional security mechanisms [2]. The rapid growth of these attacks has exposed substantial limitations in classical detection and mitigation methods, which often depend on human oversight and static rules. Thus, there is a vital need for scalable, adaptive, and effective solutions that can handle the scale and sophistication of modern cyberattacks [3]. Cybersecurity increasingly employs artificial intelligence (AI) models to defend against and mitigate potential cyberattacks. However, AI methods suffer from a lack of trust and from their black-box nature. How well domain specialists and end users can understand and trust the behaviour of these techniques is a main factor affecting their effective adoption in cyber threat hunting [4]. Stakeholders demand more explainability and transparency, even when these black-box methods produce accurate predictions. In cyber threat hunting, where professionals need considerably more information from a model than a simple binary result for their investigation, explanations support the output of the underlying AI method. Recent approaches have been introduced to help users understand the inner workings of AI techniques and their results [5]. Consequently, explainable AI (XAI) emerged as a set of methods and processes that allow users to understand and trust the results produced by AI models. It is used to describe how AI methods work, their expected impact, and their possible biases [6].
Recently, blockchain (BC) has been widely applied in several domains to support data privacy and trust, allowing users to connect and share information while maintaining integrity and transparency. It has the potential to improve system security by providing data authentication, integrity, and threat intelligence (TI). Cyber threat detection and BC can work together to promote integrity and transparency and to protect against cyber threats. XAI is vital in cyber threat hunting for establishing appropriate trust and for managing threats effectively [7]. Machine learning (ML) methods are generally treated as black boxes and therefore raise trust issues. Trust can be added by applying explainable methods on top of them. The explainable techniques are placed after the ML stage, since they refine the results of that stage and add several kinds of trust in them [8]. Because a model can produce incorrect results, which an attacker may exploit, the security analyst needs stronger evidence to recognize such cases. Under these conditions, an XAI method is preferable. Among XAI methods, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) are effective and widely used to explain predictions [9]. Over the last decade, the use of AI in cybersecurity has grown extensively, with applications ranging from intrusion detection systems (IDS) to malware analysis and user behaviour analytics. Although effective, conventional AI methods lack the interpretability needed in critical fields such as cybersecurity. This limitation has stimulated research into XAI, which aims to make AI techniques understandable without sacrificing performance [10].

This manuscript presents a novel Two-Tier Optimization Algorithms for Cyberthreat Detection and Mitigation Using Explainable Artificial Intelligence with Recurrent Neural Networks (TTOCDM-XAIRNN) methodology. The main aim of the TTOCDM-XAIRNN framework is to improve the detection and mitigation of cyber threats in dynamic environments. BC technology is used for secure inter-cluster data transmission. The TTOCDM-XAIRNN model first applies data preprocessing with linear scaling normalization (LSN) to scale the input features for improved model performance. The pelican optimization algorithm (POA) is then employed for dimensionality reduction to identify the most relevant data attributes. Next, the hybrid attention-based long short-term memory and bidirectional gated recurrent unit (A-LSTM-BiGRU) technique is used for cyber threat detection. The earthworm optimization algorithm (EOA) is applied to tune the hyperparameters of the detection model. Finally, XAI with SHAP provides transparent insights into model decisions, supporting a clear understanding of the threat mitigation process. The TTOCDM-XAIRNN approach is evaluated through a wide range of simulation studies on the NSL-KDD and CICIDS 2017 datasets.

  • The TTOCDM-XAIRNN model applies LSN to standardize feature values across the dataset, ensuring uniformity and improving learning dynamics. This preprocessing step improves training stability and accelerates convergence. It is crucial in preparing high-quality input data for the downstream detection model.

  • The TTOCDM-XAIRNN method uses the POA technique to reduce the high-dimensional feature space, improve computational efficiency, and reduce noise. This step selects the most relevant attributes, sharpening the model's focus on critical threat indicators. Removing redundant or less informative features improves detection performance.

  • The TTOCDM-XAIRNN methodology integrates a hybrid A-LSTM-BiGRU model that effectively combines attention-based LSTM with bidirectional GRU to capture temporal dependencies and contextual data. This fusion improves the detection of complex cyber threats by utilizing the complementary strengths of both networks. It improves accuracy and robustness in detecting malicious activities over time.

  • The TTOCDM-XAIRNN approach implements the EOA model to fine-tune hyperparameters, ensuring an optimal configuration for learning. This process improves the model’s accuracy by preventing overfitting and improving generalization to unseen data. It plays a significant role in maximizing overall detection performance.

  • The TTOCDM-XAIRNN model incorporates XAI through SHAP values to present clear and transparent insights into its decision-making process. This enhances interpretability, allowing users to comprehend feature impacts on predictions. It builds trust and supports informed decision-making in cyber threat analysis.

  • The TTOCDM-XAIRNN model is novel in integrating multiple advanced techniques, namely LSN, POA, hybrid A-LSTM-BiGRU, EOA, and SHAP, into a unified intelligent framework. This integration combines data preprocessing, optimization, deep learning (DL), and explainability to attain superior cyber threat detection. It targets both high performance and transparency, addressing gaps in existing research, and offers a comprehensive solution for accurate and interpretable threat mitigation.

Related works

Mohitkar and Lakshmi [11] studied XAI for transparent cyber-risk assessment and decision-making and explored how incorporating XAI improves the effectiveness and transparency of cyber-risk assessments. The paper highlights that conventional techniques often lack transparency, which leads to trust and accountability problems in cybersecurity contexts. By using XAI models, the work establishes improved decision-making procedures that enable organizations to handle vulnerabilities and cyberattacks better. Prity et al. [12] investigated the effectiveness of ML in cyberattack recognition, aiming to classify benign and malicious objects in digital environments. The work evaluated four ML models. It also used k-fold cross-validation to split the dataset and manually tuned the hyperparameters to improve model performance while minimizing variance and bias. Greene and Burton [13] used a qualitative method, comprising a comprehensive literature survey, to examine internal needs and address ethical considerations such as privacy, fairness, and bias. The work examines the effect of organizational culture on the adoption and effectiveness of AI-based solutions. It highlights the importance of end-user trust in AI-driven security alerts and the need for cultural adaptation and organizational readiness for effective AI implementation in system security. Biswas et al. [14] developed a hybrid framework using XAI models. The first stage estimates the probability of specialized phishers in a population of related attackers with different capabilities. The second stage analyses the likelihood of phishing threats against a firm regardless of whether it invests in IT security and adopts regulatory measures. Next, it classifies genuine and phishing URLs using several ML-based classifiers. Andrae et al. [15] introduced a framework that incorporates ML models as a data-driven method to evaluate cyber risks quantitatively. Additionally, multiple ML techniques, especially supervised ML, are developed. Overall, that chapter contributes to using ML to create standardized and robust cyber risk management methods that could be valuable to practitioners in diverse industries.
Bahadoripour et al. [16] developed a deep federated multimodal method for cyber threat recognition in industrial control system (ICS) settings. The method comprises three main components for training the cyberattack recognition model. To improve interpretability, the SHAP approach is applied to give insights into model outcomes, assisting decision-making by cybersecurity specialists. Zangana and Mustafa [17] presented advanced AI-driven models for image denoising, highlighting the role of DL, convolutional neural networks (CNN), and generative techniques in improving image clarity. By exploiting AI, these approaches adaptively reduce noise while preserving crucial image attributes, enhancing efficiency and reliability in digital forensic procedures. The study includes a comparative analysis of conventional versus AI-based denoising methods, measuring their efficacy and applicability in forensic and cybersecurity settings.

Aditya et al. [18] integrated XAI with an ordinary differential deep recurrent unit neural network (OD-DRUNN) to build a proactive defence mechanism for organizational cybersecurity. The input data is preprocessed, and then the features are extracted. After feature extraction, user behaviours and traffic patterns are examined using the minimum parameterized muller spanning tree (MPMST) method. Afterwards, the obtained user behaviours, traffic patterns, and extracted features are given as input to the OD-DRUNN classifier, and the output is presented with XAI. Markkandeyan et al. [19] proposed a hybrid DL model incorporating adaptive TensorFlow, improved particle swarm optimization (IPSO), and enhanced LSTM (E-LSTM) for accurate malware and software piracy detection in IoT environments. Kumari [20] proposed the next-gen IoT security using polar codes-based cryptography for malware defence through quantum self-attention neural network (NGIT-PCBC-MD-QSANN) technique, which integrates advanced preprocessing (MAACF), feature extraction (SCDT), and cryptographic key optimization (MOFDOA), with QSANN used for detection and PCBC for security enhancement. Karthic and Kumar [21] developed an IDS using an enhanced conditional random field (CRF) for feature selection and an optimized hybrid deep neural network (OHDNN) integrating CNN and LSTM, with parameters tuned by adaptive golden eagle optimization (AGEO) to improve detection accuracy in wireless networks. Ahamed and Karim [22] developed a cascaded IDS (CIDS) combining a one-class support vector machine (OC-SVM) model for anomaly detection and a decision tree (DT)-based signature IDS (SIDS) for accurate classification of known and zero-day attacks. Kumar and Kumar [23] proposed a hybrid of grey wolf and lion optimization with random forest (GWO-LOARF), integrated with an ML-based IDS.
Sundaram et al. [24] presented an effective IDS using a novel ant lion optimizer (ALO) combined with a gated recurrent unit (GRU) model to detect attacks in wireless sensor networks (WSN). Suhana, Karthic, and Yuvaraj [25] proposed an effective IDS using blended linear discriminant analysis (BLDA) for feature extraction and a random forest (RF) classifier for accurate intrusion detection in networks. Paul et al. [26] improved cybersecurity in deep web environments using federated learning (FL) and graph-based analysis with a hybrid web crawler and ontology-based scoring for accurate threat detection and data protection. Karthic, Manoj Kumar, and Senthil Prakash [27] presented an enhanced IDS using GWO for feature selection and LSTM networks for accurate attack classification. Sundaram et al. [28] proposed an efficient IDS for IoT networks using combined feature selection with recursive feature elimination (RFE) and information gain (IG), followed by a cascaded LSTM for accurate attack classification.

Despite advances in IDS and cybersecurity using ML, XAI, and hybrid models, limitations remain. Several models concentrate on specific datasets or attack types and lack adaptability across diverse and evolving cyber threats. Existing techniques also suffer from high false-positive rates, model complexity, and insufficient handling of zero-day attacks. Feature selection methods sometimes fail to balance accuracy and computational efficiency, particularly in resource-constrained IoT and WSN environments. Furthermore, the transparency and interpretability of some DL models remain underexplored, which affects trust and practical deployment. The research gap lies in developing robust, scalable, and interpretable IDS frameworks that address dynamic threats effectively while optimizing performance across heterogeneous data sources and environments.

Proposed method

This study develops a novel TTOCDM-XAIRNN methodology. The main intention of the TTOCDM-XAIRNN methodology is to improve the detection and mitigation of cyber threats in dynamic environments. To attain that, the proposed TTOCDM-XAIRNN model contains various stages, such as BC technology, data preprocessing, feature selection, classification, hyperparameter tuning, and XAI. The complete workflow of the TTOCDM-XAIRNN model is given in Fig. 1.

Fig. 1. Overall workflow of the TTOCDM-XAIRNN model.

BC technology

BC is used for secure inter-cluster data transmission. In general, a BC is a chain of blocks. Each block contains four fields: the transaction data (for example, Bitcoin or Ethereum transactions), the hash value of the current block, the timestamp, and the hash of the preceding block. BC acts as a distributed, shared digital ledger that stores transaction data at multiple nodes. Hence, when an attacker attempts to alter information, the attempt is difficult because each block holds the cryptographic hash of the previous block. All transactions are secured using cryptographic hash values verified by the miners. Each miner holds an identical copy of the complete ledger, which contains the blocks of all transactions. Decentralized storage is another resource offered by BC: large volumes of data can be stored and linked from the current block to the previous block using smart contract code. SiacoinDB, Swarm, LitecoinDB, BigchainDB, Interplanetary File System (IPFS), MoneroDB, and others are currently used as decentralized databases.
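The hash linking described above can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and a JSON serialization of the block fields; neither choice is stated in the text, and the field names and example transactions are hypothetical. It omits mining, consensus, and smart contracts and shows only why altering one block invalidates the chain.

```python
import hashlib
import json
import time


def block_hash(transactions, timestamp, prev_hash):
    """SHA-256 digest over the block contents and the predecessor's hash."""
    payload = json.dumps(
        {"tx": transactions, "ts": timestamp, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(transactions, prev_hash, timestamp=None):
    """Build a block holding the four fields named in the text."""
    ts = time.time() if timestamp is None else timestamp
    return {
        "transactions": transactions,
        "timestamp": ts,
        "prev_hash": prev_hash,
        "hash": block_hash(transactions, ts, prev_hash),
    }


def chain_is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected = block_hash(
            block["transactions"], block["timestamp"], block["prev_hash"]
        )
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Hypothetical inter-cluster messages used as transaction payloads.
genesis = make_block(["cluster A -> cluster B: flow summary"], "0" * 64, timestamp=0.0)
second = make_block(["cluster B -> cluster C: alert"], genesis["hash"], timestamp=1.0)
chain = [genesis, second]
```

Changing the transactions of `genesis` without recomputing its hash makes `chain_is_valid` return `False`, and recomputing the hash would break the `prev_hash` link stored in `second`.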

Data preprocessing

At first, the presented TTOCDM-XAIRNN model applies data preprocessing with LSN to scale the input features for improved model performance [29]. This technique was chosen because of its capability to handle the noisy, high-dimensional data common in cyber threat datasets. It preserves the relative structure of each feature, which maintains feature quality and limits information loss. It improves model robustness and detection accuracy by keeping relevant data characteristics comparable across features. Additionally, LSN adapts well to dynamic and heterogeneous IoT environments and is preferred here over alternatives such as principal component analysis (PCA) or simple normalization, which may overlook subtle but critical discrepancies in data. Its capability to improve downstream learning tasks makes it a suitable choice for preprocessing in the cyber threat detection process.

LSN is a data preprocessing technique that rescales feature values to the range [0, 1]. In cyber threat recognition, LSN regularizes raw input data, such as system logs or network traffic features, so that every feature contributes comparably to the model's performance. Scaling the data reduces the bias caused by differing feature ranges, permitting more precise recognition of subtle cyberattack patterns. It also improves the efficacy of ML systems by improving convergence during training. In cybersecurity applications, LSN ensures that the model treats features consistently, which leads to more trustworthy threat recognition and mitigation.
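As a minimal sketch of this step, the code below assumes LSN is the usual min-max formula, x' = (x - min) / (max - min), applied per feature column; the text states only the target range [0, 1], so the formula, the handling of constant columns, and the example feature values are assumptions.

```python
def linear_scaling_normalize(rows):
    """Scale each column of a list-of-rows table to [0, 1] (min-max scaling)."""
    n_cols = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_cols)]
    maxs = [max(r[j] for r in rows) for j in range(n_cols)]
    scaled = []
    for r in rows:
        scaled.append([
            # Constant columns carry no information; map them to 0.0 (an assumption).
            0.0 if maxs[j] == mins[j] else (r[j] - mins[j]) / (maxs[j] - mins[j])
            for j in range(n_cols)
        ])
    return scaled


# Hypothetical traffic features: duration (s), bytes sent, failed logins.
raw = [[2.0, 1500.0, 0.0], [10.0, 300.0, 0.0], [6.0, 900.0, 0.0]]
normalized = linear_scaling_normalize(raw)
```

In practice the minimum and maximum would be computed on the training split only and reused for test data, so that no information leaks from the evaluation set.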

POA-based feature selection process

For dimensionality reduction, the POA is used to identify the most relevant data attributes [30]. This algorithm is chosen for its capability to explore large and complex search spaces to find the most appropriate features for intrusion detection. Compared to conventional filter or wrapper techniques, POA uses a bio-inspired metaheuristic strategy that balances exploration and exploitation, reducing the risk of becoming trapped in local optima. This yields a better subset of features, improving model accuracy and reducing computational overhead. Additionally, its adaptability to dynamic datasets and its faster convergence rate make it particularly appropriate for real-time IoT environments, where it is reported to outperform other heuristic algorithms such as particle swarm optimization (PSO) or the genetic algorithm (GA) in both speed and precision.

The pelican is a large bird that captures and swallows prey using its long beak and large throat pouch. Pelicans occasionally eat crabs, frogs, and turtles, and when extremely hungry they will also eat shellfish. They usually hunt in groups. By spreading their wings over the water surface, they drive fish into shallow water, where the fish can be caught easily. When a pelican catches a fish, a large amount of water enters its beak, so it tilts its head forward to expel the excess water before swallowing. This intelligent hunting behaviour has made pelicans expert predators. Modelling this strategy provides the basic idea for the POA design.

Stage 1: Initialization.

POA is a population-based method in which each population member represents a candidate solution. Each member proposes values for the variables of the optimization problem according to its position in the search space. Members are initialized randomly between the lower and upper bounds using Eq. (1).

$$u_{i,j} = t_j + rand \cdot \left(x_j - t_j\right), \quad i = 1, 2, \dots, O, \quad j = 1, 2, \dots, q$$
(1)

Here, \(u_{i,j}\) denotes the value of the \(j\)th variable specified by the \(i\)th candidate solution, \(O\) is the population size, \(q\) is the number of problem variables, \(rand\) is a random number in the interval \([0, 1]\), and \(t_j\) and \(x_j\) are the lower and upper bounds of the \(j\)th problem variable.

The pelican population in POA is represented by the population matrix in Eq. (2). Each row represents a candidate solution, and each column holds the proposed values for one problem variable.

$$U = \left[\begin{array}{c} U_1 \\ \vdots \\ U_i \\ \vdots \\ U_O \end{array}\right]_{O \times q} = \left[\begin{array}{ccccc} u_{1,1} & \dots & u_{1,j} & \dots & u_{1,q} \\ \vdots & \ddots & \vdots & & \vdots \\ u_{i,1} & \dots & u_{i,j} & \dots & u_{i,q} \\ \vdots & & \vdots & \ddots & \vdots \\ u_{O,1} & \dots & u_{O,j} & \dots & u_{O,q} \end{array}\right]_{O \times q}$$
(2)

Here, \(U\) is the pelican population matrix and \(U_i\) is the \(i\)th pelican. Initialization is followed by random generation.
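The initialization in Eqs. (1) and (2) can be sketched in a few lines. The bounds, population size, and seed below are hypothetical values chosen for illustration, and the population matrix is held as a plain list of rows.

```python
import random


def init_population(pop_size, lower, upper, rng):
    """Eq. (1): u[i][j] = t_j + rand * (x_j - t_j), with rand drawn from [0, 1)."""
    return [
        [t + rng.random() * (x - t) for t, x in zip(lower, upper)]
        for _ in range(pop_size)
    ]


rng = random.Random(0)        # fixed seed only to make the sketch repeatable
lower = [0.0, -1.0, 10.0]     # t_j, hypothetical lower bounds
upper = [1.0, 1.0, 20.0]      # x_j, hypothetical upper bounds
U = init_population(5, lower, upper, rng)   # O = 5 rows, q = 3 columns
```

Each row of `U` corresponds to one pelican \(U_i\), and each column to one problem variable, matching the layout of Eq. (2).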

Stage 2: Random generation.

After initialization, the input parameters are generated randomly so that the outcome does not depend on a fixed hyperparameter setting. The next stage evaluates the fitness function (FF) used to optimize the weighted parameter.

Stage 3: Assessment of FF.

The initialized values yield a random candidate solution. The FF is evaluated for this solution in order to optimize the weighted parameter \(K\), as formulated in Eq. (3).

$$FF = \text{optimizing}\left[K\right]$$
(3)

The exploration stage, given in Eq. (4) below, follows the FF assessment.

Stage 4: Exploration Stage (Moving towards Prey).

In the first phase, the pelicans identify the location of the prey and move towards it. Modelling this behaviour lets the algorithm scan the search space and discover different regions of it. An important feature of POA is that the location of the prey is generated randomly in the search space. Equation (4) models this movement towards the prey.

$$u_{i,j}^{R_1} = \left\{\begin{array}{ll} u_{i,j} + rand \cdot \left(r_j - D \cdot u_{i,j}\right), & S_r < S_i; \\ u_{i,j} + rand \cdot \left(u_{i,j} - r_j\right), & \text{else,} \end{array}\right.$$
(4)

Here, \(u_{i,j}^{R_1}\) is the new position of the \(i\)th pelican in the \(j\)th dimension based on phase 1, \(D\) is a random number equal to 1 or 2, \(r_j\) is the location of the prey in the \(j\)th dimension, \(S_r\) is the objective function value of the prey, and \(S_i\) is the objective function value of the \(i\)th pelican.

The new position of a pelican is accepted only if it improves the objective function value, which in this minimization formulation means a lower value. This type of update, called effective updating, prevents the algorithm from moving to non-optimal regions. It is modelled in Eq. (5).

$$U_i = \left\{\begin{array}{ll} U_i^{R_1}, & S_i^{R_1} < S_i; \\ U_i, & \text{else,} \end{array}\right.$$
(5)

Here, \(U_i^{R_1}\) is the new position of the \(i\)th pelican and \(S_i^{R_1}\) is its objective function value based on phase 1.
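A minimal sketch of the exploration move in Eq. (4) with the acceptance rule of Eq. (5) is given below. The `sphere` objective is a hypothetical stand-in for the real fitness function, and drawing a fresh `rand` per dimension while drawing `D` once per pelican is an assumption, since the text does not specify either.

```python
import random


def sphere(u):
    """Hypothetical objective to minimize; stands in for the fitness function."""
    return sum(v * v for v in u)


def exploration_step(pelican, prey, objective, rng):
    """Eqs. (4)-(5): move relative to the prey, keep the move only if it improves."""
    s_i, s_r = objective(pelican), objective(prey)
    D = rng.choice([1, 2])                  # random integer equal to 1 or 2
    candidate = []
    for u_ij, r_j in zip(pelican, prey):
        rand = rng.random()
        if s_r < s_i:                       # prey is better: move towards it
            candidate.append(u_ij + rand * (r_j - D * u_ij))
        else:                               # prey is worse: move away from it
            candidate.append(u_ij + rand * (u_ij - r_j))
    # Effective updating: accept only a strictly lower objective value.
    return candidate if objective(candidate) < s_i else pelican


rng = random.Random(1)
pelican = [3.0, -2.0]
prey = [0.5, 0.5]
new_pelican = exploration_step(pelican, prey, sphere, rng)
```

Because of the acceptance test, the returned position is never worse than the input position, whichever branch of Eq. (4) was taken.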

Stage 5: Exploitation Stage (Winging on Water Surface).

In the second phase, the pelicans spread their wings on the water surface to drive the fish upwards and then collect the prey in their throat pouch. Modelling this behaviour makes the POA converge to better positions in the hunting area and increases its local search and exploitation capability. The algorithm examines points in the neighbourhood of each pelican's position to converge to a better solution. Equation (6) models this hunting behaviour.

$$u_{i,j}^{R_2} = u_{i,j} + P \left(1 - \frac{l}{L}\right) \cdot \left(2 \cdot rand - 1\right) \cdot u_{i,j}$$
(6)

Here, \(u_{i,j}^{R_2}\) is the new position of the \(i\)th pelican in the \(j\)th dimension based on phase 2, \(P\) is a constant equal to 0.2, \(P\left(1 - \frac{l}{L}\right)\) is the neighbourhood radius of \(u_{i,j}\), \(l\) is the iteration counter, and \(L\) is the maximum number of iterations.

Effective updating is again applied to accept or reject the new pelican position, as modelled in Eq. (7).

$$U_i = \left\{\begin{array}{ll} U_i^{R_2}, & S_i^{R_2} < S_i; \\ U_i, & \text{else,} \end{array}\right.$$
(7)

Here, \(U_i^{R_2}\) is the new position of the \(i\)th pelican and \(S_i^{R_2}\) is its objective function value based on phase 2.
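The local search of Eqs. (6) and (7) can be sketched as follows, using the constant 0.2 for \(P\) given in the text. The `sphere` objective, the starting position, and the iteration budget are hypothetical, and a fresh `rand` per dimension is an assumption.

```python
import random


def sphere(u):
    """Hypothetical objective to minimize; stands in for the fitness function."""
    return sum(v * v for v in u)


def exploitation_step(pelican, objective, l, L, rng, P=0.2):
    """Eqs. (6)-(7): sample inside a shrinking neighbourhood and accept greedily."""
    radius = P * (1.0 - l / L)              # neighbourhood radius shrinks as l -> L
    candidate = [u + radius * (2.0 * rng.random() - 1.0) * u for u in pelican]
    return candidate if objective(candidate) < objective(pelican) else pelican


rng = random.Random(2)
pelican = [3.0, -2.0]
L = 50                                      # hypothetical iteration budget
for l in range(1, L + 1):
    pelican = exploitation_step(pelican, sphere, l, L, rng)
```

The radius term makes early iterations search a wider neighbourhood and late iterations a narrower one; at `l == L` the radius is zero, so the position no longer changes.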

Stage 6: Termination Criteria.

The POA updates the weighted parameter \(K\) and repeats from Stage 3, incrementing the iteration counter (\(l = l + 1\)), until the termination condition is met. Finally, the POA returns the best solution found, with high accuracy and low computation time and error.

The FF employed in the POA is designed to balance the number of selected features in each solution (to be minimized) against the classification accuracy obtained with those features (to be maximized). Equation (8) gives the FF used to assess solutions.

$$Fitness = \alpha \gamma_R\left(D\right) + \beta \frac{\left|R\right|}{\left|C\right|}$$
(8)

Here, \(\gamma_R\left(D\right)\) denotes the classification error rate of a given classifier, \(\left|R\right|\) is the cardinality of the selected subset, \(\left|C\right|\) is the total number of features, and \(\alpha\) and \(\beta\) are two parameters that weight the importance of classification quality and subset length, with \(\alpha \in [0, 1]\) and \(\beta = 1 - \alpha\).
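Eq. (8) is easy to check numerically. The sketch below assumes \(\alpha = 0.9\), a hypothetical weight that is not given in the text, and uses made-up error rates and subset sizes.

```python
def feature_selection_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Eq. (8): alpha * error + beta * |R| / |C|, with beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)


# Two hypothetical subsets of a 40-feature dataset with the same error rate.
fit_small = feature_selection_fitness(error_rate=0.05, n_selected=10, n_total=40)
fit_large = feature_selection_fitness(error_rate=0.05, n_selected=30, n_total=40)
```

With equal error rates, the smaller subset receives the lower (better) fitness, which is the trade-off the FF is meant to encode.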

A-LSTM-BiGRU-based classification model

Furthermore, the hybrid A-LSTM-BiGRU technique is employed for cyber threat detection [31]. This technique is chosen for its ability to capture both temporal and contextual dependencies in sequential data, which is crucial for detecting advanced cyber threats. By integrating attention-based LSTM with bidirectional GRU, the model benefits from enhanced memory retention and bidirectional sequence learning, enabling it to detect patterns that simpler models such as CNN or standalone LSTM may miss. The attention mechanism (AM) enhances focus on critical input features, improving interpretability and precision. Compared to conventional DL methods, this hybrid structure is reported to achieve better accuracy, faster convergence, and improved detection of complex, evolving attack behaviours in IoT environments. Figure 2 represents the structure of the A-LSTM-BiGRU technique.

Fig. 2. Structure of the A-LSTM-BiGRU technique.

The proposed model combines multiple LSTM and Bi-GRU layers with an AM to enhance feature learning. It begins with two stacked LSTM layers, followed by two Bi-GRU layers. An attention layer is then applied, and the outputs are concatenated. Lastly, dense layers perform classification using a sigmoid activation function. The formulations below define the core processes within the LSTM, Bi-GRU, and attention layers. The initial LSTM layer is responsible for capturing temporal dependencies in the input data. It is formulated in Eqs. (9)-(12).

$$f_t = \sigma\left(w_f \cdot \left[h_{t-1}, x_t\right] + b_f\right)$$
(9)
$$i_t = \sigma\left(w_i \cdot \left[h_{t-1}, x_t\right] + b_i\right)$$
(10)
$$\tilde{C}_t = \tanh\left(w_C \cdot \left[h_{t-1}, x_t\right] + b_C\right)$$
(11)
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
(12)
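To make Eqs. (9)-(12) concrete, the sketch below evaluates them for a scalar hidden state and a scalar input, so that each weight vector acting on \([h_{t-1}, x_t]\) reduces to a pair of numbers. All weight values are hypothetical, and the output gate and hidden-state update of a full LSTM cell are left out because the text does not list them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_state(h_prev, c_prev, x_t, params):
    """Eqs. (9)-(12) for a scalar hidden state and a scalar input.

    params[name] = (w_h, w_x, b): weights on h_{t-1} and x_t, then the bias.
    """
    def affine(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b     # w . [h_{t-1}, x_t] + b

    f_t = sigmoid(affine("f"))                  # forget gate, Eq. (9)
    i_t = sigmoid(affine("i"))                  # input gate, Eq. (10)
    c_tilde = math.tanh(affine("C"))            # candidate cell state, Eq. (11)
    c_t = f_t * c_prev + i_t * c_tilde          # new cell state, Eq. (12)
    return f_t, i_t, c_tilde, c_t


# Hypothetical weights chosen only for illustration.
params = {"f": (0.5, 0.1, 0.0), "i": (0.3, -0.2, 0.1), "C": (0.8, 0.4, 0.0)}
f_t, i_t, c_tilde, c_t = lstm_cell_state(h_prev=0.2, c_prev=1.0, x_t=0.5, params=params)
```

The forget gate scales how much of the previous cell state survives, while the input gate scales how much of the candidate state is written, which is what lets the cell retain information over long sequences.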

The Bi-GRU layer is formulated in Eqs. (13)-(16), which improves the model's capability for capturing dependencies in both the forward and backward directions.

$$r_t = \sigma\left(w_r \cdot \left[h_{t-1}, x_t\right] + b_r\right)$$
(13)
$$z_t = \sigma\left(w_z \cdot \left[h_{t-1}, x_t\right] + b_z\right)$$
(14)
$$\tilde{h}_t = \tanh\left(w \cdot \left[r_t * h_{t-1}, x_t\right] + b\right)$$
(15)
$$h_t = \left(1 - z_t\right) * h_{t-1} + z_t * \tilde{h}_t$$
(16)
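The GRU update in Eqs. (13)-(16) and its bidirectional use can be sketched with scalar states. The parameters and the input sequence are hypothetical, the initial hidden state is assumed to be zero, and pairing the forward and backward states per time step is one common way to combine the two directions; the text does not state which combination is used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(h_prev, x_t, p):
    """Eqs. (13)-(16) for a scalar hidden state; p[name] = (w_h, w_x, b)."""
    r_t = sigmoid(p["r"][0] * h_prev + p["r"][1] * x_t + p["r"][2])   # reset gate
    z_t = sigmoid(p["z"][0] * h_prev + p["z"][1] * x_t + p["z"][2])   # update gate
    h_tilde = math.tanh(p["h"][0] * (r_t * h_prev) + p["h"][1] * x_t + p["h"][2])
    return (1.0 - z_t) * h_prev + z_t * h_tilde                       # Eq. (16)


def run_gru(xs, p):
    """Apply gru_step along a sequence, starting from a zero hidden state."""
    h, states = 0.0, []
    for x in xs:
        h = gru_step(h, x, p)
        states.append(h)
    return states


def bi_gru(xs, p_fwd, p_bwd):
    """Pair each forward state with the backward state for the same time step."""
    forward = run_gru(xs, p_fwd)
    backward = run_gru(xs[::-1], p_bwd)[::-1]
    return list(zip(forward, backward))


# Hypothetical parameters and input sequence.
p_fwd = {"r": (0.4, 0.3, 0.0), "z": (0.2, 0.5, 0.0), "h": (0.7, 0.6, 0.0)}
p_bwd = {"r": (0.1, 0.2, 0.0), "z": (0.3, 0.3, 0.0), "h": (0.5, 0.9, 0.0)}
states = bi_gru([0.1, 0.8, -0.4], p_fwd, p_bwd)
```

The backward pass reads the sequence from the last element to the first, so the state paired with time step \(t\) summarizes everything after \(t\), complementing the forward state that summarizes everything before it.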

Equations (17) and (18) define the attention layer, which helps the model concentrate on the most significant parts of an input sequence by assigning weights to the hidden states.

$$\alpha_t = \frac{\exp\left(e_t\right)}{\sum_{k=1}^{T} \exp\left(e_k\right)}, \quad e_t = \tanh\left(W_a \cdot h_t\right)$$
(17)
$$c = \sum_{t=1}^{T} \alpha_t h_t$$
(18)
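The attention weights and the context value of Eqs. (17) and (18) can be computed directly. The sketch assumes scalar hidden states and a scalar weight \(W_a\), both hypothetical simplifications, and subtracts the maximum score before exponentiating, which leaves the softmax unchanged but avoids overflow.

```python
import math


def attention_context(hidden_states, w_a):
    """Eqs. (17)-(18) with scalar hidden states and a scalar weight W_a."""
    scores = [math.tanh(w_a * h) for h in hidden_states]          # e_t
    peak = max(scores)                                            # for numerical stability
    exps = [math.exp(e - peak) for e in scores]
    total = sum(exps)
    alphas = [v / total for v in exps]                            # softmax weights alpha_t
    context = sum(a * h for a, h in zip(alphas, hidden_states))   # weighted sum of h_t
    return alphas, context


alphas, context = attention_context([0.1, 0.9, -0.3], w_a=1.5)
```

The weights sum to one, so the context is a convex combination of the hidden states, dominated by the time steps with the highest scores.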

The final layer produces the binary classification outcome for an input using a sigmoid activation function. It is formulated in Eq. (19).

$$y = \sigma\left(wx + b\right)$$
(19)

EOA-based hyperparameter tuning process

In addition, the EOA is employed to fine-tune the hyperparameters, ensuring the model's parameters are optimized for superior detection and mitigation capabilities [32]. This algorithm is chosen for its efficiency in balancing exploration and exploitation in high-dimensional search spaces, which is significant for optimizing DL methods in cyber threat detection. It mimics the natural behaviour of earthworms to update solutions adaptively, allowing it to escape local optima more efficiently than conventional methods such as grid or random search. It offers faster convergence and better precision when tuning critical parameters, enhancing model performance and generalization. Its lightweight and flexible nature makes it appropriate for dynamic IoT settings, where it is reported to outperform other metaheuristics such as GA and PSO in tuning efficiency and detection accuracy.

Inspired by the foraging behaviour of earthworms, the EOA is a nature-inspired optimization algorithm. It simulates the foraging and movement behaviour of earthworms, and each earthworm serves as a candidate solution in the search space. The algorithm converges towards the best solution by repeatedly updating the earthworm positions based on their current positions and the best position found in the population. Let \(\varTheta \in \mathbb{R}^{d}\) represent the hyperparameter vector to be tuned, where \(d\) denotes the dimensionality of the hyperparameter space. The aim is to find the optimal set of hyperparameters \(\varTheta^{*}\), which minimizes the loss function \(L\left(\varTheta\right)\) of the ML model.

Initialization: The initial population of earthworms, representing candidate solutions, is generated randomly:

$$\varTheta_i^{(0)} = \varTheta_{\text{min}} + r_i \left(\varTheta_{\text{max}} - \varTheta_{\text{min}}\right)$$
(20)

Here, \(\varTheta_i^{(0)}\) denotes the initial position of the \(i\)th earthworm, \(r_i\) denotes a randomly generated vector, and \(\varTheta_{\text{max}}\) and \(\varTheta_{\text{min}}\) represent the upper and lower limits of the hyperparameter space, respectively.

Fitness Assessment: The fitness of each earthworm is assessed from the loss function \(L\left(\varTheta_i\right)\) of the ML model trained with the hyperparameters \(\varTheta_i\):

$$Fitness\left(\varTheta_i\right) = -L\left(\varTheta_i\right)$$
(21)

Position Update: Each earthworm's position is updated according to its current position, the best-known position, and the movement behaviour modelled by EOA:

$$\varTheta_i^{(t+1)} = \varTheta_i^{(t)} + \alpha r_1 \left(\varTheta_{best} - \varTheta_i^{(t)}\right) + \beta r_2 \left(\varTheta_i^{(t)} - \varTheta_{mean}^{(t)}\right)$$
(22)

Here, \(\varTheta_i^{(t)}\) is the position of the \(i\)th earthworm at the \(t\)th iteration, \(\varTheta_{best}\) is the position of the best-performing earthworm, \(\varTheta_{mean}^{(t)}\) is the mean position of all earthworms, \(\alpha\) and \(\beta\) are learning factors that control the influence of the best position and the mean position, respectively, and \(r_1\) and \(r_2\) are randomly generated vectors.

Termination: The algorithm iterates until a stopping condition is met, such as reaching the maximum iteration count \(T_{\max}\) or the best position converging to within a small tolerance \(e\):

$$\left\|\varTheta_{best}^{(t+1)}-\varTheta_{best}^{(t)}\right\|<e\quad\text{or}\quad t>T_{\max}$$
(23)

At the end of the optimization procedure, the optimal set of hyperparameters \(\varTheta^{*}\) is obtained:

$$\varTheta^{*}=\varTheta_{best}$$
(24)

Application to Model Training: The optimized hyperparameters \(\varTheta^{*}\) are then used to train the final ML model on the reduced feature set \(X_{top}\) obtained from Stage 2:

$$\text{Trained Model}=\text{Train}\left(X_{top},\,y,\,\varTheta^{*}\right)$$
(25)
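
The search procedure in Eqs. (20) to (24) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `eoa_search`, the default values of `pop_size`, `alpha`, `beta`, `t_max`, and `tol`, the clipping of positions to the bounds, the elitist tracking of the best position, and the `patience` counter (which avoids stopping the first time the best position happens not to move) are all introduced here for illustration. The final training step of Eq. (25) is left to the caller.

```python
import random


def eoa_search(loss, lower, upper, pop_size=20, alpha=0.5, beta=0.1,
               t_max=100, tol=1e-6, patience=5, seed=0):
    """Sketch of Eqs. (20)-(24). `loss` maps a list of floats to a float.

    All default values, the bound clipping, and `patience` are assumptions.
    """
    rng = random.Random(seed)
    dim = len(lower)

    def clip(theta):
        # Keep every coordinate inside [lower, upper] (assumption).
        return [min(max(v, lo), hi) for v, lo, hi in zip(theta, lower, upper)]

    # Eq. (20): random initial positions inside the hyperparameter bounds.
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
           for _ in range(pop_size)]
    # Eq. (21): fitness is the negated loss, so larger is better.
    fit = [-loss(p) for p in pop]
    idx = max(range(pop_size), key=fit.__getitem__)
    best, best_fit = list(pop[idx]), fit[idx]

    stalled = 0
    for _ in range(t_max):  # second condition of Eq. (23): t > T_max
        mean = [sum(p[d] for p in pop) / pop_size for d in range(dim)]
        # Eq. (22): one term uses the best position, the other the mean.
        pop = [clip([p[d]
                     + alpha * rng.random() * (best[d] - p[d])
                     + beta * rng.random() * (p[d] - mean[d])
                     for d in range(dim)])
               for p in pop]
        fit = [-loss(p) for p in pop]
        idx = max(range(pop_size), key=fit.__getitem__)
        prev_best = best
        if fit[idx] > best_fit:
            best, best_fit = list(pop[idx]), fit[idx]
        # First condition of Eq. (23): the best position moved less than tol.
        shift = sum((a - b) ** 2 for a, b in zip(best, prev_best)) ** 0.5
        stalled = stalled + 1 if shift < tol else 0
        if stalled >= patience:
            break

    # Eq. (24): the best position found is returned as the optimum.
    return best, -best_fit


# Toy usage: minimise a quadratic "loss" over two hypothetical hyperparameters.
def toy_loss(theta):
    return sum((v - 1.0) ** 2 for v in theta)


best_theta, best_loss = eoa_search(toy_loss, lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

In practice, `loss` would train the detection model with the candidate hyperparameters and return its validation loss, which makes each fitness evaluation the dominant cost of the search.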

The EOA uses a fitness function (FF) to achieve improved classification performance. It assigns each candidate solution a positive value, with a lower value indicating a better candidate. Here, the FF is the classification error rate, which is to be minimized. It is computed as given in Eq. (26).

$$fitness\left(x_{i}\right)=ClassifierErrorRate\left(x_{i}\right)=\frac{\text{no. of misclassified instances}}{\text{total no. of instances}}\times 100$$
(26)
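
As a small illustration of Eq. (26), the error rate can be computed from true and predicted labels as below. The function name `classifier_error_rate` and the example labels are hypothetical and are not taken from the experiments.

```python
def classifier_error_rate(y_true, y_pred):
    """Eq. (26): percentage of instances whose predicted label is wrong."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    misclassified = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return misclassified / len(y_true) * 100


# Hypothetical labels: one of four instances is misclassified.
example_rate = classifier_error_rate([0, 1, 1, 0], [0, 1, 0, 0])
```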

SHAP-based XAI

Finally, XAI with SHAP presents transparent insights into model decisions, ensuring high performance and a clear understanding of the threat mitigation process. SHAP is employed to improve the interpretability of the model's predictions. SHAP assigns an importance score to every feature, which signifies its contribution to the final prediction33. This is vital in medical applications, where understanding the decision-making process is essential for validation. SHAP values are determined as follows:

$$\varphi_{i}=\sum_{S\subseteq N\setminus\{i\}}\frac{|S|!\,\left(|N|-|S|-1\right)!}{|N|!}\left[f\left(S\cup\{i\}\right)-f\left(S\right)\right]$$
(27)

Here, \(N\) is the set of all features, \(S\) is a feature subset that excludes feature \(i\), and \(f\left(S\right)\) denotes the model prediction based on the subset \(S\). This approach determines the marginal contribution of every feature, providing a robust explanation of how the model produces its predictions.
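
Eq. (27) can be evaluated exactly for a small number of features by enumerating every subset, as in the sketch below. The function name `shapley_values` and the additive toy model are assumptions made for illustration. The enumeration grows exponentially with the number of features, so it would not be practical for the full feature sets used here, where approximation methods are normally needed.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, n_features):
    """Exact Shapley values per Eq. (27).

    `f` takes a frozenset of feature indices and returns the model output
    when only those features are available.
    """
    n = n_features
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (|N| - |S| - 1)! / |N|! depends only on |S|.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to subset S.
                total += weight * (f(s | {i}) - f(s))
        phi.append(total)
    return phi


# Hypothetical additive model: each present feature adds a fixed amount.
weights = [2.0, -1.0, 0.5]


def toy_model(subset):
    return sum(weights[j] for j in subset)


phi = shapley_values(toy_model, 3)
```

For an additive model such as this one, each Shapley value equals the corresponding weight up to rounding, and the values always sum to the difference between the prediction with all features and the prediction with none.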

Experimental analysis

The TTOCDM-XAIRNN technique is evaluated experimentally on two datasets, NSLKDD34 and CICIDS 201735. These two datasets contain 100,000 samples under normal and anomaly classes, as detailed in Table 1.

The NSLKDD dataset originally contains 41 features, of which 25 relevant features are selected for intrusion detection in this study. These features are carefully chosen to retain essential information while reducing dimensionality and improving model performance and efficiency. The selected features from the NSLKDD dataset include:

duration, protocol_type, service, flag, src_bytes, dst_bytes, land, wrong_fragment, urgent, hot, num_failed_logins, logged_in, num_compromised, root_shell, su_attempted, num_root, num_file_creations, num_shells, num_access_files, num_outbound_cmds, is_host_login, is_guest_login, count, srv_count, serror_rate, srv_serror_rate, rerror_rate, srv_rerror_rate, same_srv_rate, diff_srv_rate, srv_diff_host_rate, dst_host_count, dst_host_srv_count, dst_host_same_srv_rate, dst_host_diff_srv_rate, dst_host_same_src_port_rate, dst_host_srv_diff_host_rate, dst_host_serror_rate, dst_host_srv_serror_rate, dst_host_rerror_rate, dst_host_srv_rerror_rate.

The CICIDS 2017 dataset initially comprises 78 features. For effective training and analysis, 51 informative and non-redundant features are chosen, focusing on capturing traffic patterns and anomalies indicative of intrusions. These selected features ensure comprehensive flow, packet, and statistical data coverage. The 51 chosen features include:

Flow Duration, Total Fwd Packets, Total Backward Packets, Total Length of Fwd Packets, Total Length of Bwd Packets, Fwd Packet Length Max, Fwd Packet Length Min, Fwd Packet Length Mean, Fwd Packet Length Std, Bwd Packet Length Max, Bwd Packet Length Min, Bwd Packet Length Mean, Bwd Packet Length Std, Flow Bytes/s, Flow Packets/s, Flow IAT Mean, Flow IAT Std, Flow IAT Max, Flow IAT Min, Fwd IAT Total, Fwd IAT Mean, Fwd IAT Std, Fwd IAT Max, Fwd IAT Min, Bwd IAT Total, Bwd IAT Mean, Bwd IAT Std, Bwd IAT Max, Bwd IAT Min, Fwd PSH Flags, Bwd PSH Flags, Fwd URG Flags, Bwd URG Flags, Fwd Header Length, Bwd Header Length, Min Packet Length, Max Packet Length, Packet Length Mean, Packet Length Std, Packet Length Variance, Average Packet Size, Subflow Fwd Packets, Subflow Fwd Bytes, Subflow Bwd Packets, Subflow Bwd Bytes, Init_Win_bytes_forward, Init_Win_bytes_backward, act_data_pkt_fwd, min_seg_size_forward, Active Mean, Idle Mean, Idle Std.

Table 1 Details of NSLKDD and CICIDS 2017 datasets.

Figure 3 displays the classifier results of the TTOCDM-XAIRNN method on the NSLKDD dataset. Figure 3a and b depict the confusion matrices, which show accurate classification of both classes under 70%TRPH and 30%TSPH. Figure 3c shows the PR curves, which indicate strong performance across all class labels. Finally, Fig. 3d shows the ROC curves, which indicate high ROC values for the different class labels.

Fig. 3 NSLKDD dataset: (a-b) confusion matrices under 70% TRPH and 30% TSPH and (c-d) PR and ROC curves.

The cyber threat detection results of the TTOCDM-XAIRNN model on the NSLKDD dataset are given in Table 2 and Fig. 4. The average values show that the TTOCDM-XAIRNN system detects both classes effectively. Under 70%TRPH, the TTOCDM-XAIRNN system attains an average \(acc{u}_{y}\) of 98.31%, \(pre{c}_{n}\) of 98.32%, \(rec{a}_{l}\) of 98.31%, \({F}_{means}\) of 98.31%, and \({G}_{measure}\) of 98.31%. Furthermore, under 30%TSPH, the TTOCDM-XAIRNN approach attains an average \(acc{u}_{y}\) of 98.34%, \(pre{c}_{n}\) of 98.34%, \(rec{a}_{l}\) of 98.34%, \({F}_{means}\) of 98.34%, and \({G}_{measure}\) of 98.34%.
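
For reference, class-averaged measures of this kind can be derived from a two-class confusion matrix as sketched below. The sketch assumes the usual definitions, with the F-score taken as the harmonic mean and the G-measure as the geometric mean of precision and recall, and it averages the per-class values with equal weight. The function name `binary_report` and the counts in the usage lines are hypothetical and are not the values behind Table 2.

```python
from math import sqrt


def binary_report(tp, fp, fn, tn):
    """Class-averaged metrics (in %) from a 2x2 confusion matrix.

    Assumes every class has at least one true and one predicted instance,
    so that no denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    per_class = []
    # Treat each class in turn as the "positive" class:
    # (correct hits, false alarms, misses).
    for hit, false_alarm, miss in ((tp, fp, fn), (tn, fn, fp)):
        precision = hit / (hit + false_alarm)
        recall = hit / (hit + miss)
        f_score = 2 * precision * recall / (precision + recall)
        g_measure = sqrt(precision * recall)
        per_class.append((precision, recall, f_score, g_measure))
    # Equal-weight average over the two classes.
    averaged = [sum(column) / 2 for column in zip(*per_class)]
    names = ("precision", "recall", "f_score", "g_measure")
    report = {"accuracy": 100 * accuracy}
    report.update({name: 100 * value for name, value in zip(names, averaged)})
    return report


# Hypothetical counts for a balanced test split.
report = binary_report(tp=90, fp=10, fn=10, tn=90)
```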

Table 2 Cyber threat detection of TTOCDM-XAIRNN method under the NSLKDD dataset.

Fig. 4 Average of TTOCDM-XAIRNN method on NSLKDD dataset.

Figure 5 depicts the training (TRA) and validation (VAL) \(acc{u}_{y}\) of the TTOCDM-XAIRNN technique on the NSLKDD dataset. The \(acc{u}_{y}\) values are computed over 0–25 epochs. Both the TRA and VAL \(acc{u}_{y}\) values show an increasing trend, indicating that the TTOCDM-XAIRNN approach improves steadily over successive epochs. In addition, the TRA and VAL \(acc{u}_{y}\) values remain close throughout the epochs, indicating limited overfitting and suggesting that the TTOCDM-XAIRNN approach predicts reliably on unseen samples.

Fig. 5 \(Acc{u}_{y}\) curve of TTOCDM-XAIRNN method on NSLKDD dataset.

Figure 6 shows the TRA loss (TRALOS) and VAL loss (VALLOS) curves of the TTOCDM-XAIRNN technique on the NSLKDD dataset. The loss values are computed over 0–25 epochs. The TRALOS and VALLOS values show a decreasing trend, which indicates that the TTOCDM-XAIRNN system balances the trade-off between data fitting and generalization. The steady decrease in loss indicates that the predictions of the TTOCDM-XAIRNN method are refined gradually during training.

Fig. 6 Loss curve of TTOCDM-XAIRNN method on NSLKDD dataset.

Table 3 and Fig. 7 present the comparative results of the TTOCDM-XAIRNN approach on the NSLKDD dataset19,20,36,37,38. The results show that the BiLSTM, AESMOTE, DAE-DNN, RNN-XGBoost, HC-DTTWSVM, WISARD, IPSO, E-LSTM, and QSANN models exhibited poorer performance. Meanwhile, the AE-RF technique attained closer values, with \(acc{u}_{y}\), \(pre{c}_{n}\), \(rec{a}_{l}\), and \({F}_{means}\) of 97.62%, 97.35%, 97.79%, and 97.3%, respectively. The proposed TTOCDM-XAIRNN technique achieved the highest \(acc{u}_{y}\), \(pre{c}_{n}\), \(rec{a}_{l}\), and \({F}_{means}\), each at 98.34%.

Table 3 Comparative analysis of TTOCDM-XAIRNN approach under the NSLKDD dataset19,20,36,37,38.

Fig. 7 Comparative analysis of TTOCDM-XAIRNN approach under the NSLKDD dataset.

Table 4 and Fig. 8 present the computational time (CT) evaluation of the TTOCDM-XAIRNN technique against the existing models. The QSANN method recorded a CT of 7.08 s, considerably faster than the RNN-XGBoost and IPSO models with 14.71 s and 12.72 s. The BiLSTM method had a CT of 10.94 s, while AESMOTE and DAE-DNN took 10.09 and 10.42 s, respectively. More efficient methods such as WISARD and E-LSTM achieved CTs of 8.03 and 8.26 s, while AE-RF and HC-DTTWSVM followed closely with 8.38 and 9.51 s. The TTOCDM-XAIRNN method recorded the lowest CT at 5.38 s, highlighting its computational efficiency.
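
To put these timings on a common scale, the relative saving of the proposed method over each baseline can be computed directly from the reported CT values, as in the short sketch below. The helper name `relative_saving` is hypothetical. By this calculation the proposed method needs about 24% less time than QSANN, the fastest baseline, and about 63% less than RNN-XGBoost, the slowest.

```python
# CT values in seconds, as reported above for the NSLKDD dataset.
ct_seconds = {
    "RNN-XGBoost": 14.71, "IPSO": 12.72, "BiLSTM": 10.94, "DAE-DNN": 10.42,
    "AESMOTE": 10.09, "HC-DTTWSVM": 9.51, "AE-RF": 8.38, "E-LSTM": 8.26,
    "WISARD": 8.03, "QSANN": 7.08,
}
proposed_ct = 5.38  # TTOCDM-XAIRNN


def relative_saving(baseline, candidate):
    """Percentage of the baseline time saved by the candidate."""
    return (baseline - candidate) / baseline * 100


savings = {name: round(relative_saving(ct, proposed_ct), 1)
           for name, ct in ct_seconds.items()}
```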

Table 4 CT evaluation of TTOCDM-XAIRNN technique under the NSLKDD dataset.

Fig. 8 CT evaluation of TTOCDM-XAIRNN technique under the NSLKDD dataset.

The ablation study of the TTOCDM-XAIRNN methodology is presented in Table 5 and Fig. 9. The LSN achieved an \(acc{u}_{y}\) of 95.82%, \(pre{c}_{n}\) of 95.41%, \(rec{a}_{l}\) of 95.90%, and \({F}_{means}\) of 95.63%. With POA, improvements are evident, with an \(acc{u}_{y}\) of 96.36%, \(pre{c}_{n}\) of 96.14%, \(rec{a}_{l}\) of 96.49%, and \({F}_{means}\) of 96.35%. The A-LSTM-BiGRU model without the EOA enhancement further improved the results, achieving an \(acc{u}_{y}\) of 97.09%, \(pre{c}_{n}\) of 96.92%, \(rec{a}_{l}\) of 97.01%, and \({F}_{means}\) of 96.92%. Integrating EOA into the A-LSTM-BiGRU method yielded additional gains, reaching an \(acc{u}_{y}\) of 97.61%, \(pre{c}_{n}\) of 97.68%, \(rec{a}_{l}\) of 97.72%, and \({F}_{means}\) of 97.58%. The best outcome is observed with the full TTOCDM-XAIRNN method, which delivered superior results across all metrics, attaining an \(acc{u}_{y}\), \(pre{c}_{n}\), \(rec{a}_{l}\), and \({F}_{means}\) of 98.34%, indicating a well-balanced and effective approach to detecting intrusions.

Table 5 Result analysis of the ablation study of TTOCDM-XAIRNN methodology under the NSLKDD dataset.

Fig. 9 Result analysis of the ablation study of TTOCDM-XAIRNN methodology under the NSLKDD dataset.

Figure 10 presents the classifier results of the TTOCDM-XAIRNN model on the CICIDS 2017 dataset. Figure 10a and b show the confusion matrices, which indicate accurate classification of both class labels under 70%TRPH and 30%TSPH. Figure 10c illustrates the PR curves, which indicate strong performance across all class labels. Finally, Fig. 10d shows the ROC curves, which reveal high ROC values for the different class labels.

Fig. 10 CICIDS 2017 dataset: (a-b) confusion matrices under 70% TRPH and 30% TSPH and (c-d) PR and ROC curves.

The cyber threat detection results of the TTOCDM-XAIRNN model on the CICIDS 2017 dataset are shown in Table 6 and Figs. 11 and 12. The results indicate that the TTOCDM-XAIRNN technique detects both classes efficiently. Under 70%TRPH, the TTOCDM-XAIRNN technique attains an \(acc{u}_{y}\) of 98.87%, \(pre{c}_{n}\) of 98.87%, \(rec{a}_{l}\) of 98.87%, \({F}_{means}\) of 98.87%, and \({G}_{measure}\) of 98.87%. Moreover, under 30%TSPH, the TTOCDM-XAIRNN technique reaches an \(acc{u}_{y}\) of 98.85%, \(pre{c}_{n}\) of 98.85%, \(rec{a}_{l}\) of 98.85%, \({F}_{means}\) of 98.85%, and \({G}_{measure}\) of 98.85%.

Table 6 Cyber threat detection of TTOCDM-XAIRNN method under the CICIDS 2017 dataset.

Fig. 11 Average of TTOCDM-XAIRNN method on CICIDS 2017 dataset under 70%TRPH.

Fig. 12 Average of TTOCDM-XAIRNN method on CICIDS 2017 dataset under 30%TSPH.

Figure 13 illustrates the TRA and VAL \(acc{u}_{y}\) of the TTOCDM-XAIRNN method on the CICIDS 2017 dataset. The \(acc{u}_{y}\) values are computed over 0–25 epochs. Both the TRA and VAL \(acc{u}_{y}\) values show an increasing trend, indicating that the TTOCDM-XAIRNN method improves steadily over successive epochs. Moreover, the TRA and VAL \(acc{u}_{y}\) values remain close throughout the epochs, indicating limited overfitting and suggesting that the TTOCDM-XAIRNN approach predicts reliably on unseen samples.

Figure 14 depicts the TRALOS and VALLOS curves of the TTOCDM-XAIRNN technique on the CICIDS 2017 dataset. The loss values are computed over 0–25 epochs. The TRALOS and VALLOS values show a decreasing trend, which indicates that the TTOCDM-XAIRNN approach balances the trade-off between data fitting and generalization. The steady decrease in loss indicates that the predictions of the TTOCDM-XAIRNN approach are refined gradually during training.

Fig. 13 \(Acc{u}_{y}\) curve of TTOCDM-XAIRNN technique on CICIDS 2017 dataset.

Fig. 14 Loss curve of TTOCDM-XAIRNN method on CICIDS 2017 dataset.

Table 7 and Fig. 15 present the comparison results of the TTOCDM-XAIRNN method on the CICIDS 2017 dataset22,23,36,37,38. The results show that the FURIA, Forest-PA, BLDSVM, BBPM, BLR, GSAE, LIB-SVM, OC-SVM, DT, and GWO-LOARF methodologies achieved lower performance. In contrast, the proposed TTOCDM-XAIRNN technique achieved the highest \(acc{u}_{y}\), \(pre{c}_{n}\), \(rec{a}_{l}\), and \({F}_{means}\), each at 98.87%.

Table 7 Comparative analysis of TTOCDM-XAIRNN approach under the CICIDS 2017 dataset22,23,36,37,38.

Fig. 15 Comparative analysis of TTOCDM-XAIRNN approach under the CICIDS 2017 dataset.

Table 8 and Fig. 16 present the CT evaluation of the TTOCDM-XAIRNN method against the existing approaches. The GWO-LOARF approach recorded the highest CT of 24.85 s, closely followed by the BLR model at 23.61 s and BLDSVM at 21.88 s. Forest-PA and FURIA showed moderate CTs of 19.03 and 18.73 s, and the GSAE method fell in a similar range with 18.51 s. Conventional models such as OC-SVM and DT performed better, with CTs of 14.99 and 11.74 s, respectively. The BBPM model outperformed many with a CT of 11.56 s, and LIB-SVM was more efficient still at 10.43 s. The TTOCDM-XAIRNN model recorded the lowest CT of 8.87 s, making it the most computationally efficient of the compared approaches.

Table 8 CT evaluation of TTOCDM-XAIRNN method under the CICIDS 2017 dataset.

Fig. 16 CT evaluation of TTOCDM-XAIRNN method under the CICIDS 2017 dataset.

The ablation study of the TTOCDM-XAIRNN approach is shown in Table 9 and Fig. 17. The LSN method achieved an \(acc{u}_{y}\) of 96.46%, \(pre{c}_{n}\) of 96.25%, \(rec{a}_{l}\) of 96.24%, and \({F}_{means}\) of 96.30%. POA showed a noticeable improvement, with an \(acc{u}_{y}\) of 97.19%, \(pre{c}_{n}\) of 96.88%, \(rec{a}_{l}\) of 97.02%, and \({F}_{means}\) of 96.86%. The A-LSTM-BiGRU model without EOA attained an \(acc{u}_{y}\) of 97.80%, \(pre{c}_{n}\) of 97.68%, \(rec{a}_{l}\) of 97.64%, and \({F}_{means}\) of 97.60%. When EOA was incorporated into the A-LSTM-BiGRU framework, the performance rose to an \(acc{u}_{y}\) of 98.36%, \(pre{c}_{n}\) of 98.24%, \(rec{a}_{l}\) of 98.28%, and \({F}_{means}\) of 98.24%. The TTOCDM-XAIRNN method attained the highest performance across all metrics. This consistent improvement highlights the contribution of each added component and underscores the robustness of the TTOCDM-XAIRNN model in handling complex network intrusion scenarios.

Table 9 Result analysis of the ablation study of TTOCDM-XAIRNN approach under the CICIDS 2017 dataset.

Fig. 17 Result analysis of the ablation study of TTOCDM-XAIRNN approach under the CICIDS 2017 dataset.

Conclusion

This study presents a novel TTOCDM-XAIRNN methodology. The main aim of the TTOCDM-XAIRNN methodology is to improve the detection and mitigation of cyber threats in dynamic environments. BC technology is used for secure inter-cluster data transmission. First, the presented TTOCDM-XAIRNN model applies data preprocessing with LSN to standardize the input features for improved model performance. The POA is then used to identify the most relevant data attributes for dimensionality reduction. Furthermore, the hybrid A-LSTM-BiGRU technique is employed for cyber threat detection. In addition, the EOA is used to fine-tune the hyperparameters, so that the model is optimized for detection and mitigation. Finally, XAI with SHAP presents transparent insights into model decisions, ensuring high performance and a clear understanding of the threat mitigation process. A wide range of simulation studies of the TTOCDM-XAIRNN approach was conducted on the NSLKDD and CICIDS 2017 datasets. The comparison study showed that the TTOCDM-XAIRNN approach achieved superior accuracy values of 98.34% and 98.87% on the NSLKDD and CICIDS 2017 datasets, respectively. The limitations of the TTOCDM-XAIRNN approach include restricted generalization to highly dynamic and unseen attack patterns due to dataset-specific training. The model is less efficient on resource-constrained edge devices, where computational efficiency is critical. Real-time detection under high traffic loads remains a challenge, potentially affecting latency. Additionally, the model may be vulnerable to evasion tactics. Interpretability for non-technical users is limited despite analytical tools. Future work may improve adaptability through continual learning, reduce dependency on labelled data via self-supervised methods, expand multilingual and multimodal threat detection capabilities, and validate performance in live, heterogeneous IoT environments.