Introduction

Disorders of the visual system are the leading cause of blindness and visual impairment, which can prevent people from performing housework and hinder their travel, work, studies, and participation in sports1. The global population is rising quickly, and more individuals are living well into old age, increasing the number of visually impaired and blind (VIB) people. Blind and elderly individuals face various difficulties in executing everyday actions, including environmental awareness, housework, travelling, work, and physical and sports activity, and they are further affected by cognitive decline and sensory loss2. This condition demands significant attention because the number of individuals with visual impairment is set to rise sharply in the coming decades. These numbers are especially significant in advanced countries, where progress in medical care has enabled longer lifespans. Various solutions have been proposed to address such difficulties, and assistive hardware and software technologies continue to advance to aid visually impaired individuals3. Applying an early fire recognition and notification method for VIB persons can decrease the number of victims, the extent of property damage, and, most significantly, the number of premature deaths. VIB individuals must take extra precautions regarding safety and health, particularly fire safety4. Some assistive gadgets offer improved protection to blind people compared to self-initiated life safety procedures. Efficient fire detection is especially important for VIB individuals and can help preserve a person’s life in a fire emergency. Early fire recognition is vital, as it directly influences human safety and the environment5. A fully automatic fire recognition and notification method was developed for BVI individuals to offer safety information during emergencies and for fire prevention6. Conventionally, fires have been recognized using sensor-based methods that identify changes in temperature or smoke in indoor surroundings. Nearly every fire detection method nowadays has built-in sensors, and thus such methods depend substantially on the spatial distribution and reliability of the sensors7. Covering large indoor or outdoor spaces is impractical for a sensor-based fire recognition method because sensors must be distributed densely and regularly; consequently, such methods have a higher false alarm rate8. AI-based fire recognition is the prevailing method to identify flames and notify building occupants in diverse indoor surroundings. It is highly distinctive, does not depend on shape, size, or colour, and is robust to lighting changes9. To overcome the shortcomings of machine learning (ML)-based fire detection, deep learning (DL)-based computer vision methods are applied where conventional outdoor or indoor fire detection methods fall short. Researchers and scientists select ML and DL methods because they provide better predictions than other methods10. Both families of methods enable automated feature extraction by learning complex features to obtain a more informative representation of the data.

This manuscript presents a Smart Fire Detection System for Assisting the Blind Using Attention Mechanism-Driven Recurrent Neural Network and Seahorse Optimizer Algorithm (SFDAB-ARNNSHO). The main intention of the SFDAB-ARNNSHO method is to detect and classify fire for blind people. To achieve this, the proposed SFDAB-ARNNSHO model performs image pre-processing using the Sobel filtering (SF) model to remove noise from the input data. Next, feature extraction fuses three methods: EfficientNetB7, CapsNet, and ShuffleNetV2. Furthermore, the SFDAB-ARNNSHO model performs fire detection and classification using a stacked two-layer bidirectional long short-term memory with attention mechanism (SBiLSTM-AM) technique. Finally, the parameter tuning of the SBiLSTM-AM method is accomplished by implementing the seahorse optimizer (SHO) technique. The simulation validation of the SFDAB-ARNNSHO methodology is examined on a fire detection dataset, and the outcomes are measured using various metrics. The key contributions of the SFDAB-ARNNSHO methodology are listed below.

  • The SFDAB-ARNNSHO model integrates SF-based pre-processing to enhance image quality by detecting edges, which significantly improves fire detection accuracy. This step supports better feature extraction by emphasizing crucial boundaries and structures in the images. By refining image details, the model attains more precise classification and detection of fire events.

  • The SFDAB-ARNNSHO method incorporates feature extraction from three powerful methods, EfficientNetB7, CapsNet, and ShuffleNetV2, utilizing their merits to improve feature learning. This integration allows the model to capture both global and local features effectively. By harnessing these techniques, the model attains a more robust and comprehensive understanding of the input data for fire detection.

  • The SFDAB-ARNNSHO approach employs the SBiLSTM-AM method to improve fire detection and classification. This architecture processes sequential data, capturing temporal dependencies and improving model performance. The attention mechanism additionally optimizes focus on critical features, ensuring accurate classification.

  • The SFDAB-ARNNSHO technique utilizes the SHO methodology to optimize the model’s performance by efficiently tuning hyperparameters. It also enhances the fire detection system’s accuracy and efficiency. By fine-tuning the parameters, SHO ensures that the model operates at its best performance for real-world applications.

  • The novelty of the SFDAB-ARNNSHO approach is in integrating advanced feature extraction methods, such as EfficientNetB7, CapsNet, and ShuffleNetV2, with advanced fire detection techniques. The incorporation of the SBiLSTM-AM model allows for enhanced sequential data processing. Additionally, implementing the SHO fine-tunes the model, improving the accuracy and efficiency of fire detection in complex environments.

Literature survey

Kumar et al.11 present an innovative method incorporating face detection technology and navigation abilities. Built on the Raspberry Pi single-board computer, this method offers an efficient and cost-effective solution. The face recognition part employs computer vision and advanced ML models, while the navigation feature combines fire detection, obstacle detection, panic detection, and face detection using a Raspberry Pi camera module and a fire sensor button. Singh et al.12 proposed an advanced blind stick incorporating an obstacle detector in front of the user, a fire detector for the surrounding environment, and IoT integration for conveying real-time information to a website. In13, a model was designed to support people who are visually impaired or blind. Blind people can attain individual mobility using a blind stick that helps them navigate, whereas persons with vision impairments often depend on external assistance, such as computer devices, trained canines, or human helpers. That work addressed these difficulties by employing ultrasonic sensors and a buzzer, with the ultrasonic sensors offering the user accurate guidance on object position. In14, a smart blind stick was designed to help visually impaired persons maintain their safety and mobility; it is a reasonably priced, cutting-edge gadget that combines several communication modules and sensors to help users spot and avoid risks. Tesfaye15 aimed to design and implement electronic travel support for visually impaired pedestrians, employing a GSM module, infrared (IR) technology, and ultrasonic sensors. The smart cane incorporates IR technology and ultrasonic sensors to identify obstacles utilizing IR signals and ultrasonic waves. In addition, this method is furnished with a GSM module, enabling it to send SMS notifications to selected contacts in case of emergencies; pressing an emergency button sends a message to a particular phone number. Qureshi et al.16 apply smart devices to make daily activities easier for every category of blind individuals; these smart devices can use image processing and AI to identify diverse objects, colours, and faces. Abuelmakarem et al.17 propose an intelligent cane for VIB individuals integrated with IoT, a smart gadget that promotes confidence and independence for VIB people.

In18, a smart stick concept is presented to offer smart electronic support to the visually impaired. The method utilizes a Voice Module and an Arduino UNO to deliver real-world assistance and artificial vision. The work mainly targets visually impaired people who cannot move around independently; the method contains ultrasonic sensors, and feedback is delivered by voice. The system-wide objective is to assist visually impaired individuals by delivering data about dynamic and static objects in their surroundings. Xie et al.19 propose an AIoT-integrated Digital Twin for real-time fire monitoring and forecasting in multi-floor buildings, using the AutoDecoder Long Short-term Memory Neural Network (ADLSTM-Fire) model to predict fire development. Tejani et al.20 present the 2-archive Multi-Objective Cuckoo Search (MOCS2arc) approach, which improves optimization by minimizing mass and compliance in truss structures and ZDT test functions. Dzeng, Fan, and Tian-Lin21 developed a dynamic-threshold collision alert system for construction equipment, improving accuracy by considering object types, orientation, and distances while reducing false alarms. Nonut et al.22 propose a metaheuristic-based method for system identification of small-scale fixed-wing UAVs, dividing the aerodynamic model into longitudinal and lateral dynamics. Aye et al.23 introduce a multi-fidelity, multi-objective surrogate-assisted optimization for airfoil shape, maximizing the lift-to-drag ratio under geometry constraints; it utilizes Computational Fluid Dynamics (CFD) and XFoil for the high- and low-fidelity simulations and improves the surrogate model with infill sampling. Duggi, Rafiei, and Salehi24 present benchmark datasets for three application types: DNN inference, ML inference, and video transcoding across diverse clouds. Cai et al.25 investigate the effect of nitrogen injection into a closed tunnel on fire-fighting effectiveness, concentrating on the impact on oxygen concentration, combustion, and smoke production. He et al.26 present a real-time, scale-adaptive visual tracking method based on Best Buddies Similarity (BBS), designed to handle nonrigid deformation and perspective changes in bionic robots for improved environmental perception and movement control. Kiamansouri27 emphasizes the critical role of programming in advancing renewable energy solutions, focusing on optimizing energy systems, enabling smart grids, and driving the transition to sustainable energy in Iran and globally.

Sun et al.28 review the development of the Underwater Vehicle-Manipulator System (UVMS), examining its challenges, limitations, and future directions. Sun et al.29 propose a service function chain (SFC) deployment optimization algorithm using breadth-first search (BFS) to find the shortest path and prioritize paths with fewer hops for deployment; performance is compared with the greedy and simulated annealing (G-SA) algorithms. Cui et al.30 present an autonomous navigation framework for intelligent wheelchairs, utilizing multi-sensor integration and hierarchical cost maps for precise path planning, obstacle avoidance, and real-time safety in dynamic outdoor environments. Wang et al.31 present a robotic teleoperation system integrating augmented reality (AR) and robotic arm operations to enable telekinetic control. Zhao et al.32 introduce Causal Intervention Visual Navigation (CIVN), incorporating deep reinforcement learning (DRL) with causal intervention through Causal Attention; this approach improves visual navigation by addressing confounding effects, raising representation quality, and reducing navigation issues. Qiao et al.33 examine the tourism experiences of wheelchair users, using embodiment theory to explore their experiences across three stages, body appearance, presence, and departure, and provide recommendations for enhancing accessibility for disabled tourists. He et al.34 introduce a Semantic-Group Textual Learning (SGTL) and Vision-guided Knowledge Transfer (VGKT) module, integrating semantic-based text grouping with vision-guided attention for extracting relevant textual features; the relational knowledge transfer adapts visual cues to improve textual representation, enhancing the alignment between vision and language. Fan, Lei, and Yang35 present a one-stage object detection framework using mixed data augmentation, a novel backbone enhancement strategy, and shape-aware loss to enhance detection accuracy, particularly for small and irregularly shaped targets. Zheng et al.36 studied the relationship between coverage criteria and DNN fairness across diverse models and datasets. Gu et al.37 introduce the Siamese Manhattan LSTM-SNP (SiMaLSTM-SNP) approach for sentence reordering (SR), integrating Word2vec, 10-layer Attention, and a Siamese LSTM-SNP structure; it utilizes multi-head self-attention to capture text associations and calculates the relatedness score with Manhattan distance. Ding et al.38 introduce DialogueINAB, a neural network model for emotion recognition in conversations based on attitude behaviour theory; it utilizes crossmodal transformers to simulate the interaction of interlocutors’ attitudes and speech behaviour for emotion detection.

The limitations of the existing studies include the reliance on conventional object detection and navigation methods, which may struggle with dynamic, complex environments, especially in scenarios with self-obstruction or varying object movements. While several approaches focus on specific domains such as fire detection, blind assistance, or wheelchair navigation, there is limited integration across these domains, hindering the development of a more holistic solution. Furthermore, although advanced sensors and ML methods have improved accuracy, the models often encounter real-time processing and scalability challenges, specifically in unpredictable real-world conditions. Additionally, fairness in DNN models remains a significant challenge, with limited exploration of the correlation between coverage criteria and fairness metrics. There is also a gap in applying causal interventions to address confounding effects in tasks such as visual navigation, and the integration of knowledge transfer techniques to improve model performance is underexplored. Lastly, incorporating renewable energy systems is critical, but the integration of smart grids and system optimization is still at an early stage and needs additional research to scale effectively.

The proposed methodology

This manuscript presents a novel SFDAB-ARNNSHO methodology. The main intention of the SFDAB-ARNNSHO methodology is to detect and classify fire for blind people. It encompasses four major steps: image pre-processing, a fusion of feature extractors, fire detection using SBiLSTM-AM, and an SHO-based parameter optimizer process. Figure 1 depicts the entire procedure of the SFDAB-ARNNSHO model.

Fig. 1 Overall process of SFDAB-ARNNSHO model.

Stage I: image pre-processing

Initially, the proposed SFDAB-ARNNSHO model performs the image pre-processing by utilizing SF to remove noise in the input data39. This model is chosen due to its simplicity and efficiency in edge detection. It emphasizes areas of rapid intensity change, which helps detect the boundaries of objects like fire, which often have sharp contrasts. Unlike more complex methods, SF is computationally efficient and can be easily integrated into real-time systems, making it ideal for applications like fire detection. Its capability to highlight edges assists in mitigating noise and improves the visibility of crucial features, resulting in improved detection accuracy. Furthermore, SF is less prone to overfitting than more complex filters, ensuring stable and reliable performance in diverse environments. These advantages make it an appropriate choice over other techniques, such as Gaussian or Laplacian filters, which may not be as effective in edge localization or as computationally efficient.

SF is a widely used image processing operator employed to detect edges by emphasizing regions of large intensity variation in an image. Smart fire recognition methods mainly classify fire-related features such as smoke edges, flame contours, or hotspots. The technique extracts vital data about latent fire hazards by applying SF to visual or thermal camera inputs. This filtered data improves the accuracy of recognition approaches, decreasing false positives and shortening response times. To help the blind, the system translates the processed data into accessible alerts, such as vibrations or audio signals, ensuring timely warnings. This combination of SF with adaptive outputs makes the system reliable and comprehensive for visually challenged individuals.
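A minimal sketch of this pre-processing step is given below, assuming OpenCV is available; the kernel size and the light Gaussian smoothing are illustrative choices rather than settings stated in this work. The Sobel gradient magnitude emphasizes edges such as flame contours before the frame is passed to the feature extractors.

```python
import cv2
import numpy as np

def sobel_preprocess(image_path, ksize=3):
    """Return a Sobel gradient-magnitude map that emphasizes edges (e.g., flame contours)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.GaussianBlur(gray, (3, 3), 0)              # light smoothing to suppress pixel noise
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=ksize)   # horizontal gradient
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=ksize)   # vertical gradient
    mag = np.sqrt(gx ** 2 + gy ** 2)                      # gradient magnitude
    return cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```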

Stage II: fusion of feature extraction

Furthermore, the fusion of feature extractors adopts three methods: EfficientNetB7, CapsNet, and ShuffleNetV2. This hybrid model provides robust feature extraction owing to the models’ complementary strengths. EfficientNetB7, known for its high accuracy and efficiency, utilizes a scaling method that optimizes the network’s depth and width, making it highly effective for complex feature extraction on massive datasets. CapsNet, with its capsule network architecture, excels at recognizing spatial hierarchies and preserving the relationships between parts of an object, which improves the robustness of the model to variations in the input data. ShuffleNetV2, designed for lightweight mobile applications, balances computational efficiency and high performance, making it ideal for real-time fire detection. Integrating these three methods allows the model to benefit from diverse features, ensuring more comprehensive and robust feature learning than any single method. This fusion allows the model to handle challenges such as scale variation, rotation, and noise, making it more adaptable and accurate in detecting fire in diverse environments.
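The fusion idea can be sketched as follows in PyTorch. The EfficientNetB7 and ShuffleNetV2 backbones come from torchvision; since torchvision ships no CapsNet, a small placeholder convolutional branch stands in for it, and the pooled feature vectors of the three branches are simply concatenated. This is an illustrative sketch under these assumptions, not the authors’ exact implementation.

```python
import torch
import torch.nn as nn
from torchvision import models   # recent torchvision (>= 0.13) for the weights= argument

class FusionExtractor(nn.Module):
    """Concatenate pooled features from EfficientNetB7, ShuffleNetV2, and a CapsNet stand-in."""
    def __init__(self):
        super().__init__()
        eff = models.efficientnet_b7(weights=None)
        shf = models.shufflenet_v2_x1_0(weights=None)
        self.eff_backbone = eff.features                          # conv feature maps (2560 ch)
        self.shf_backbone = nn.Sequential(*list(shf.children())[:-1])  # drop the fc head (1024 ch)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.caps_branch = nn.Sequential(                         # placeholder for the CapsNet branch
            nn.Conv2d(3, 64, kernel_size=9, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128))

    def forward(self, x):
        f1 = self.pool(self.eff_backbone(x)).flatten(1)           # (N, 2560)
        f2 = self.pool(self.shf_backbone(x)).flatten(1)           # (N, 1024)
        f3 = self.caps_branch(x)                                  # (N, 128)
        return torch.cat([f1, f2, f3], dim=1)                     # fused feature vector

fused = FusionExtractor()(torch.randn(2, 3, 224, 224))
print(fused.shape)   # torch.Size([2, 3712])
```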

EfficientNetB7 classifier

Owing to continual increases in network depth, width, and the number of convolution layers, deep CNN (DCNN) structures are typically overloaded, making the architecture computationally costly and compromising system efficiency40. There are trade-offs between network efficiency and accuracy. The authors presented the EfficientNet family, namely EfficientNetB0-B7, as a backbone architecture that consistently outperforms numerous DCNN-based architectures, such as Inception-V2, ResNet, Inception-V3, ResNet50, and DenseNet. This challenges the standard scaling methods used in previous studies, which arbitrarily increase the network’s width, resolution, and depth to improve generalizability. Compound scaling emerged from balancing the network dimensions of depth \(d\), resolution \(r\), and width \(w\) by scaling them with a compound coefficient, as represented in Eq. (1).

$$d=\alpha^{\phi},\quad w=\beta^{\phi},\quad r=\gamma^{\phi}$$
(1)

such that \(\alpha\cdot\beta^{2}\cdot\gamma^{2}\approx 2\) with \(\alpha\ge 1,\ \beta\ge 1,\ \gamma\ge 1\). The values of \(\alpha\), \(\beta\), and \(\gamma\) are determined by a grid search. The user-definable parameter describing how much additional computing resource is allocated to the network is denoted by \(\phi\). The FLOPs of a convolutional network are proportional to \(d\), \(w^{2}\), and \(r^{2}\): FLOPs double when the network depth is doubled, whereas they increase four-fold when the width or the resolution is doubled. The rise in FLOPs therefore follows the relation \((\alpha\cdot\beta^{2}\cdot\gamma^{2})^{\phi}\), so the total FLOPs increase by approximately \(2^{\phi}\) for new values. The EfficientNet architecture contains a stem block, followed by seven blocks and the final layers. Each block in EfficientNet comprises a different number of modules, and the module count rises when moving from EfficientNet-B0 to B7, so the variants differ in depth and parameter count. EfficientNet-B0 is the simplest version, with 237 layers and 5.3M parameters, while EfficientNet-B7 contains 813 layers and 66M parameters. The EfficientNet architecture uses MBConv layers related to MnasNet and MobileNet-V2. A normalization layer is already present within the stem, so no additional image normalization is required as a pre-processing step, and the network accepts input images in the 0-255 range. Five variants of pretrained EfficientNet, namely EfficientNet B0-B4, support the classification. The criteria for selecting an EfficientNet variant depend on several variables: dataset size, the resources available for the network parameters, assessment and model training, model depth, and batch size.
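For illustration, the compound-scaling relation of Eq. (1) can be evaluated numerically; the α = 1.2, β = 1.1, γ = 1.15 values below are those reported for the original EfficientNet grid search and are assumed here rather than taken from this paper.

```python
# Illustrative compound-scaling computation (Eq. 1); alpha/beta/gamma are assumed values.
alpha, beta, gamma = 1.2, 1.1, 1.15

def compound_scale(phi):
    d, w, r = alpha ** phi, beta ** phi, gamma ** phi   # depth, width, resolution multipliers
    return d, w, r

print(alpha * beta ** 2 * gamma ** 2)   # ~1.92, close to the constraint value 2
print(compound_scale(2))                # multipliers for a larger variant (phi = 2)
```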

CapsNet model

The CapsNet model contains four main parts: the initial caps layer, the standard convolutional layer, the input layer, and the output caps layer41. The number of output capsules in the output caps layer equals the number of condition categories, and the length of each output capsule characterizes the occurrence likelihood of the corresponding condition category. The condition category with the greatest likelihood of occurrence is the final identified category.

The input layer transforms the real-time raw acceleration signals into 2-D images, with the raw signals filling the image pixels in sequence. Provided the continuity of the raw signal is guaranteed, a 2-D arrangement of nonadjacent data helps the network extract signal associations across nonadjacent intervals. The standard convolutional layer is utilized to detect local combinations of characteristics in the original input and to map their appearance into feature maps. The initial caps layer comprises convolutional capsules that distribute the progressive appearance of signal patterns into low-level initial capsules. In the output layer, the dynamic routing model ensures that the initial capsules pass their data to the most related output capsule.

The CapsNet loss is stated as Eq. (2), while the loss of margin \(\:Los{s}_{k}\) for every output capsule:

$$\:Los{s}_{all}={\sum\:}_{k}Los{s}_{k}$$
(2)
$$Loss_{k}=C_{k}\max(0,\:m^{+}-\Vert\nu_{k}\Vert)^{2}+\lambda_{1}\left(1-C_{k}\right)\max(0,\:\Vert\nu_{k}\Vert-m^{-})^{2}$$
(3)

where \(\Vert\nu_{k}\Vert\) signifies the length of the output capsule vector. If condition category \(k\) exists, \(C_{k}\) equals 1, meaning that only the first term of the loss function is calculated for the corresponding \(k\)th capsule; for the other capsules, only the second term is calculated. Moreover, \(m^{+}\) and \(m^{-}\) are fixed at 0.9 and 0.1. This means that if \(C_{k}=1\), the loss for the \(k\)th output capsule should be zero when the network predicts the correct condition category \(k\) with a likelihood greater than 0.9, and non-zero when the likelihood is lower than 0.9. Likewise, if \(C_{k}=0\), the loss should be zero when the capsule predicts the incorrect condition category with a likelihood lower than 0.1 and non-zero when the likelihood is greater than 0.1. The down-weighting parameter \(\lambda_{1}\) is fixed to 0.5 by default.
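A compact sketch of this margin loss (Eqs. 2-3) in PyTorch is shown below; the tensor shapes and the example values are illustrative assumptions.

```python
import torch

def margin_loss(v_lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Eqs. (2)-(3): v_lengths are output-capsule lengths (N, K); targets are one-hot (N, K)."""
    pos = targets * torch.clamp(m_pos - v_lengths, min=0.0) ** 2
    neg = lam * (1.0 - targets) * torch.clamp(v_lengths - m_neg, min=0.0) ** 2
    return (pos + neg).sum(dim=1).mean()

# Example: two samples, two condition categories (fire / normal)
lengths = torch.tensor([[0.95, 0.05], [0.30, 0.80]])
onehot = torch.tensor([[1.0, 0.0], [1.0, 0.0]])
print(margin_loss(lengths, onehot))   # the second sample is penalized
```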

ShuffleNetV2

ShuffleNetV1 is an effective and lightweight CNN method designed for embedded devices42. It considerably decreases the model parameter count while preserving high precision through techniques such as grouped convolutions, channel shuffle, and depthwise separable convolutions. Building on the general mechanisms of ShuffleNetV1, ShuffleNetV2 is designed according to the following four principles: (i) keeping the input and output channel dimensions of convolution layers equal to maximize the speed of the model; (ii) using grouped convolutions carefully to avoid increasing memory access costs; (iii) decreasing the number of network branches to improve speed; and (iv) reducing element-wise tensor operations to cut time consumption. Figure 2 depicts the structure of ShuffleNetV2.

Fig. 2 Architecture of ShuffleNetV2.

ShuffleNetV2 is composed of basic units and down-sampling units. In the basic unit, the input features are split into two branches: the left branch passes directly through the network layers, whereas the right branch applies a sequence of \(1\times 1\) convolution, \(3\times 3\) depth-wise convolution, and \(1\times 1\) convolution, followed by channel concatenation. In the down-sampling unit, the right branch carries out a \(1\times 1\) convolution, a \(3\times 3\) depth-wise convolution with a stride of 2, and another \(1\times 1\) convolution to reduce spatial dimensions and extract features, while the left branch carries out a \(3\times 3\) depth-wise convolution with a stride of 2, followed by a \(1\times 1\) convolution. The outputs of the two branches are then concatenated and passed through a channel shuffle, allowing effective feature extraction while preserving information integrity and meaningfully reducing computational complexity. Although ShuffleNetV2-1.0 excels in computational cost, it faces challenges such as trade-offs between computational complexity and model size, insufficient multitask adaptability, limited feature expression, and complexity in tuning and training. Additional investigation and development are required to improve the performance of these models.
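The channel shuffle operation that lets the two branches exchange information can be sketched in PyTorch as follows; this is a minimal illustration of the operation rather than the full ShuffleNetV2 block.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    """Interleave channels across groups so the two branches exchange information."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(1, 2).contiguous()          # swap the group and channel axes
    return x.view(n, c, h, w)                   # flatten back to (N, C, H, W)

out = channel_shuffle(torch.randn(1, 8, 4, 4), groups=2)
```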

Stage III: fire detection using SBiLSTM-AM

In addition, the SFDAB-ARNNSHO model performs fire detection and classification using the SBiLSTM-AM technique43. This method is chosen because it captures spatial and temporal dependencies in sequential data, which is significant for comprehending fire patterns in video or time-series data. The bidirectional LSTM allows the model to process information in both the forward and backward directions, capturing context from past and future frames and improving its ability to detect fires accurately over time. The attention mechanism refines the model by concentrating on the most relevant features, enhancing classification performance and mitigating false positives. This is specifically valuable in fire detection, where quick and accurate identification of critical features is crucial. Additionally, SBiLSTM-AM effectively handles variable-length sequences and noise, making it more resilient than conventional methods such as CNNs, which may face difficulty with sequential dependencies. By incorporating context awareness and feature attention, SBiLSTM-AM provides a robust approach to fire detection, outperforming simpler models in handling dynamic and complex fire events. Figure 3 illustrates the structure of the SBiLSTM-AM method.

Fig. 3 Architecture of SBiLSTM-AM model.

LSTM is a variant of the RNN designed to capture longer-term dependencies and to relieve problems such as exploding or vanishing gradients encountered by standard RNNs. The developed method uses a two-layer stacked BiLSTM to capture forward and backward dependencies. The input sequence \(x_{1},x_{2},x_{3}\cdots x_{t}\) enters the first BiLSTM hidden layer (HL) at the \(t\)th time step. It is processed in two directions: the forward direction \((a_{1},a_{2},a_{3}\dots a_{t})\), which collects contextual data from every preceding time step, and the backward direction \((c_{1},c_{2},c_{3}\cdots c_{t})\), which collects data from every following time step. Finally, the outputs of the two directions are combined and fed to an attention layer. Every LSTM cell contains a memory cell \(C_{t}\) with three gates, termed the input gate \(i_{t}\), the output gate \(o_{t}\), and the forget gate \(f_{t}\). The relations among the distinct gates, the HL, and the memory cell are explained below for the first forward layer and its HL.

$$\:{i}_{t}^{a}=\sigma\:\left({U}_{i}^{a}{x}_{t}+{W}_{i}^{a}\text{a}\left(t-1\right)+{b}_{i}^{a}\right)$$
(4)
$$\:{f}_{t}^{a}=\sigma\:\left({U}_{f}^{a}{x}_{t}+{W}_{f}^{a}a\left(t-1\right)+{b}_{f}^{a}\right)$$
(5)
$$\:{o}_{t}^{a}=\sigma\:\left({U}_{o}^{a}{x}_{t}+{W}_{0}^{a}a\left(t-1\right)+{b}_{o}^{a}\right)$$
(6)
$$\:{U}_{t}^{a}=\text{t}\text{a}\text{n}\text{h}\left({U}_{u}^{a}{x}_{t}+{W}_{u}^{a}a\left(t-1\right)+{b}_{u}^{a}\right)$$
(7)
$$C_{t}^{a}=i_{t}^{a}*U_{t}^{a}+f_{t}^{a}*C{\left(t-1\right)}^{a}$$
(8)
$$\:{a}_{t}={o}_{t}^{a}*tanh{C}_{t}^{a}$$
(9)

The below-mentioned formulations are employed to calculate the HL \(\:{b}_{t}\) for the 2nd forward layer:

$$\:{i}_{t}^{b}=\sigma\:\left({U}_{i}^{b}{x}_{t}+{W}_{i}^{b}b\left(t-1\right)+{b}_{i}^{b}\right)$$
(10)
$$\:{f}_{t}^{b}=\sigma\:\left({U}_{f}^{b}{x}_{t}+{W}_{f}^{b}b\left(t-1\right)+{b}_{f}^{b}\right)$$
(11)
$$o_{t}^{b}=\sigma\left(U_{o}^{b}x_{t}+W_{o}^{b}b\left(t-1\right)+b_{o}^{b}\right)$$
(12)
$$\:{U}_{t}^{b}=\text{t}\text{a}\text{n}\text{h}\left({U}_{u}^{b}{x}_{t}+{W}_{u}^{b}b\left(t-1\right)+{b}_{u}^{b}\right)$$
(13)
$$C_{t}^{b}=i_{t}^{b}*U_{t}^{b}+f_{t}^{b}*C{\left(t-1\right)}^{b}$$
(14)
$$\:{b}_{t}={o}_{t}^{b}*\text{t}\text{a}\text{n}\text{h}{C}_{t}^{b}$$
(15)

The prescribed relations for 1st backward layer and the HL \(\:{c}_{t}\) are given as follows:

$$\:{i}_{t}^{c}=\sigma\:\left({U}_{i}^{c}{x}_{t}+{W}_{i}^{c}c\left(t-1\right)+{b}_{i}^{c}\right)$$
(16)
$$\:{f}_{t}^{c}=\sigma\:\left({U}_{f}^{c}{x}_{t}+{W}_{f}^{c}c\left(t-1\right)+{b}_{f}^{c}\right)$$
(17)
$$\:{o}_{t}^{c}=\sigma\:\left({U}_{o}^{c}{x}_{t}+{W}_{0}^{c}c\left(t-1\right)+{b}_{o}^{c}\right)$$
(18)
$$\:{U}_{t}^{c}=\:\text{t}\text{a}\text{n}\text{h}\:\left({U}_{u}^{c}{x}_{t}+{W}_{u}^{c}c\left(t-1\right)+{b}_{u}^{c}\right)$$
(19)
$$C_{t}^{c}=i_{t}^{c}*U_{t}^{c}+f_{t}^{c}*C{\left(t-1\right)}^{c}$$
(20)
$$\:{c}_{t}={o}_{t}^{c}*\text{t}\text{a}\text{n}\text{h}{C}_{t}^{c}$$
(21)

The correct relations for the 2nd backward layer and the HL \(\:{d}_{t}\) are below.

$$\:{i}_{t}^{d}=\sigma\:\left({U}_{i}^{d}{x}_{t}+{W}_{i}^{d}d\left(t-1\right)+{b}_{i}^{d}\right)$$
(22)
$$\:{f}_{t}^{d}=\sigma\:\left({U}_{f}^{d}{x}_{t}+{W}_{f}^{d}d\left(t-1\right)+{b}_{f}^{d}\right)$$
(23)
$$o_{t}^{d}=\sigma\left(U_{o}^{d}x_{t}+W_{o}^{d}d\left(t-1\right)+b_{o}^{d}\right)$$
(24)
$$\:{U}_{t}^{d}=\:\text{t}\text{a}\text{n}\text{h}\:\left({U}_{u}^{d}{x}_{t}+{W}_{u}^{d}d\left(t-1\right)+{b}_{u}^{d}\right)$$
(25)
$$C_{t}^{d}=i_{t}^{d}*U_{t}^{d}+f_{t}^{d}*C{\left(t-1\right)}^{d}$$
(26)
$$\:{d}_{t}={o}_{t}^{d}*\text{t}\text{a}\text{n}\text{h}{C}_{t}^{d}$$
(27)

In the present work, attention models are used to avoid some of the information loss introduced by pooling operators. An encoder assigns weights and values to the words of a phrase, converting the HLs of an entire sentence into a vector representation.

$$\alpha_{t}=\frac{\exp\left(V^{T}\cdot \bar{h}_{t}\right)}{\sum_{t}\exp\left(V^{T}\cdot \bar{h}_{t}\right)}$$
(28)
$$\:{S}_{Aw}={\sum\:}_{t}{\alpha\:}_{t}{h}_{t}$$
(29)

Here, \(h\) and \(\bar{h}\) denote the forward and backward HLs from the Bi-LSTM, and \(V\) denotes a trainable parameter. \(S_{Aw}\) represents the weighted average of the LSTM HLs.

$$\:\overrightarrow{{h}_{tLSTM}}=\overrightarrow{\left(LSTM\right)}\left({c}_{t}\right),t\in\:\left(1,\:m\right)\:$$
(30)
$$\:\overleftarrow{{h}_{tLSTM}}=\overleftarrow{\left(LSTM\right)}\left({c}_{t}\right),t\in\:\left(m,\:1\right)$$
(31)

An annotation for every word, \(W_{t}\), is obtained by concatenating the forward and backward directions, as in the formulation below.

$$\:{h}_{tLSTM}=\overrightarrow{{h}_{tLSTM}}\oplus\:\overleftarrow{{h}_{tLSTM}}\:$$
(32)

The attention technique is applied to \(h_{tLSTM}\), which allows the model to give more or less attention to different terms in the comment. To achieve this, the feature vector is updated by isolating the useful terms in the comment, as presented in Eq. (33).

$$\:{u}_{tLSTM}=\text{t}\text{a}\text{n}\text{h}\left(\left({W}_{wLSTM}\cdot\:{h}_{tLSTM}\right)+{b}_{wLSTM}\right)$$
(33)
$$\alpha_{tLSTM}=\frac{\exp\left(u_{tLSTM}^{T}\cdot u_{wLSTM}\right)}{\sum_{t}\exp\left(u_{tLSTM}^{T}\cdot u_{wLSTM}\right)}$$
(34)
$$\:{S}_{LSTM}={\sum\:}_{t}({\alpha\:}_{tLSTM}\cdot\:{h}_{tLSTM})$$
(35)
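A minimal PyTorch sketch of the stacked two-layer BiLSTM with the word-level attention of Eqs. (33)-(35) is given below; the hidden size, feature dimension, and number of classes are illustrative assumptions, not the settings used in this study.

```python
import torch
import torch.nn as nn

class SBiLSTMAttention(nn.Module):
    """Stacked two-layer BiLSTM followed by the attention of Eqs. (33)-(35)."""
    def __init__(self, feat_dim, hidden=128, n_classes=2):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, 2 * hidden)            # W_w, b_w in Eq. (33)
        self.context = nn.Parameter(torch.randn(2 * hidden))     # u_w in Eq. (34)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (N, T, feat_dim)
        h, _ = self.bilstm(x)                   # (N, T, 2*hidden)
        u = torch.tanh(self.proj(h))            # Eq. (33)
        alpha = torch.softmax(u.matmul(self.context), dim=1)   # Eq. (34), shape (N, T)
        s = (alpha.unsqueeze(-1) * h).sum(dim=1)                # Eq. (35), shape (N, 2*hidden)
        return self.classifier(s)

logits = SBiLSTMAttention(feat_dim=256)(torch.randn(4, 10, 256))   # 4 sequences of 10 steps
```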

Stage IV: SHO-based parameter optimizer

Eventually, the hyperparameter tuning of the SBiLSTM-AM method is accomplished by implementing the SHO model44. This methodology is chosen for its ability to effectively fine-tune hyperparameters and improve the performance of the fire detection model. SHO is motivated by natural phenomena, making it a unique and powerful global optimization technique that avoids local minima, a common issue with other methods like gradient descent. It is specifically beneficial for optimizing complex, multi-dimensional spaces with non-linearities, which is often the case in DL models. Unlike conventional optimization algorithms, SHO adapts dynamically, ensuring improved convergence to optimal solutions with fewer iterations. Its capability to balance exploration and exploitation during the optimization process allows it to find better-performing configurations without exhaustive search, enhancing the overall efficiency and accuracy of the model. This results in superior fire detection capabilities compared to conventional techniques, such as grid or random search, which may require more time and computational resources to achieve similar results. Figure 4 specifies the structure of the SHO methodology.

Fig. 4 Steps involved in the SHO method.

The SHO is designed to mimic several behaviours of the sea horse in the ocean, such as Brownian and spiral motions. These behaviours determine how a sea horse updates its location.

Population initialization

In SHO, the initial population is generated at random in the search space. Assuming that each sea horse represents a candidate solution within the problem search space, the mathematical formulation to initialize the sea horses is as follows:

$$Seahorses~=\left[ {\begin{array}{*{20}{l}} {x_{1}^{1}}& \cdots &{x_{1}^{{Dim}}} \\ \vdots & \ddots & \vdots \\ {x_{{pop}}^{1}}& \cdots &{x_{{pop}}^{{Dim}}} \end{array}} \right]$$
(36)

whereas the formulation for the \(\:i\:th\) individual is:

$$\:{X}_{i}=\left[{x}_{i}^{1},\:\dots\:,\:{x}_{i}^{Dim}\right]$$
(37)
$$\:{x}_{i}^{j}=rand\:\times\:\left(U{b}^{j}-L{b}^{j}\right)+L{b}^{j}$$
(38)

\(Ub\) and \(Lb\) represent the upper and lower limits; \(rand\) denotes a random number between 0 and 1; and \(i\) and \(j\) denote positive integers within \([1, pop]\) and \([1, Dim]\), respectively.
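A short numpy sketch of this initialization (Eqs. 36-38) follows; the population size, dimensionality, and bounds are illustrative hyperparameter ranges, not values taken from this paper.

```python
import numpy as np

def init_population(pop, dim, lb, ub, seed=0):
    """Eqs. (36)-(38): uniform random sea-horse positions within [lb, ub]."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    return rng.random((pop, dim)) * (ub - lb) + lb

# Hypothetical 4-dimensional hyperparameter search space
seahorses = init_population(pop=30, dim=4, lb=[0, 0, 0, 0], ub=[1, 512, 1e-2, 0.9])
```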

Seahorses motion behaviour

The sea horse uses Brownian and spiral motion to update its location. The spiral motion is mainly applied in the local exploitation stage, where the sea horse mimics a Lévy flight to determine the length of the movement step. To balance exploration and exploitation, \(r_{1}=0\) is employed as the dividing boundary: \(r_{1}>0\) triggers local exploitation directly, while \(r_{1}\le 0\) triggers global exploration.

(1) Spiral motion.

When the normally distributed random number \(r_{1}>0\), the sea horse moves toward the elite individual \(X_{elite}\) following the spiral motion, which concentrates on the local exploitation phase. A Lévy flight is applied to simulate the size of the movement step as the sea horse advances in spiral motion, so that large position changes occur with higher probability in the initial iterations, preventing excessive local exploitation.

$$X_{new}^{1}\left(t+1\right)=X_{i}\left(t\right)+Levy\cdot\left(\left(X_{elite}\left(t\right)-X_{i}\left(t\right)\right)\times x\times y\times z+X_{elite}\left(t\right)\right)$$
(39)
$$\:x=\rho\:\times\:cos\left(\theta\:\right)$$
(40)
$$\:y=\rho\:\times\:sin\left(\theta\:\right)$$
(41)
$$\:z=\rho\:\times\:\theta\:$$
(42)
$$\:\rho\:=\mu\:\times\:{e}^{\theta\:\nu\:}$$
(43)

whereas \(X_{elite}\) signifies the location of the elite individual, \(x, y,\) and \(z\) correspondingly characterize the three-dimensional components of the spiral motion, \(\theta\) is a random value within \(\left[0,2\pi\right]\), \(\rho\) denotes the stem length of the spiral, and the logarithmic spiral constants \(\mu\) and \(v\) are fixed to 0.05. The Lévy distribution function is stated as follows:

$$\:Levy=s\times\:\frac{\omega\:\times\:\sigma\:}{|k{|}^{\frac{1}{\lambda\:}}}$$
(44)

Here, \(s\) denotes a fixed constant of 0.01, \(\omega\) and \(k\) are random numbers between zero and one, and \(\lambda\) lies between 0 and 2; \(\sigma\) is computed as:

$$\sigma=\left(\frac{\Gamma\left(1+\lambda\right)\times\sin\left(\frac{\pi\lambda}{2}\right)}{\Gamma\left(\frac{1+\lambda}{2}\right)\times\lambda\times 2^{\frac{\lambda-1}{2}}}\right)$$
(45)
(2) Brownian motion.

When the normally distributed random number \(r_{1}\le 0\), the sea horse travels according to Brownian motion, concentrating on global exploration. Brownian motion is applied to model the size of the movement step so as to guarantee better exploration of the search region and to prevent falling into local optima. The equation is as follows:

$$X_{new}^{1}\left(t+1\right)=X_{i}\left(t\right)+rand\cdot l\cdot\beta_{t}\cdot\left(X_{i}\left(t\right)-\beta_{t}\cdot X_{elite}\left(t\right)\right)$$
(46)

Here, the parameter \(l\) is set to a constant value of 0.05. The random walk coefficient \(\beta_{t}\) is computed as follows:

$$\:{\beta\:}_{t}=\frac{1}{\sqrt{2\pi\:}}exp\left(-\frac{{x}^{2}}{2}\right)$$
(47)

The complete mathematical representation of the sea horse motion behaviour is stated as follows:

$$\:{X}_{new}^{1}(t+1)=\left\{\begin{array}{ll}{X}_{i}\left(t\right)+Levy\cdot\:\left(\right({X}_{elite}\left(t\right)-{X}_{i}\left(t\right))\times\:x\times\:y\times\:z+{X}_{elite}(t\left)\right)&\:{r}_{1}>0\\\:{X}_{i}\left(t\right)+rand\cdot\:l\cdot\:{\beta\:}_{t}\cdot\:\left({X}_{i}\left(t\right)-{\beta\:}_{t}\cdot\:{X}_{elite}\left(t\right)\right)&\:{r}_{1}\le\:0\end{array}\right.$$
(48)
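The motion behaviour of Eqs. (39)-(48) can be sketched in numpy as follows; the constants follow the values stated above (s = 0.01, l = 0.05, μ = v = 0.05), and the small epsilon guarding the Lévy denominator is an implementation detail added here.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def levy_step(dim, lam=1.5, s=0.01):
    """Eqs. (44)-(45); omega and k are drawn in [0, 1) as described in the text."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2)))
    omega, k = rng.random(dim), rng.random(dim)
    return s * omega * sigma / (np.abs(k) ** (1 / lam) + 1e-12)

def motion_update(X_i, X_elite, l=0.05, mu=0.05, v=0.05):
    """Eq. (48): spiral motion when r1 > 0, Brownian motion otherwise."""
    r1 = rng.standard_normal()
    if r1 > 0:                                    # local exploitation (Eqs. 39-43)
        theta = rng.uniform(0, 2 * math.pi)
        rho = mu * math.exp(theta * v)
        x, y, z = rho * math.cos(theta), rho * math.sin(theta), rho * theta
        return X_i + levy_step(X_i.size) * ((X_elite - X_i) * x * y * z + X_elite)
    beta = math.exp(-rng.standard_normal() ** 2 / 2) / math.sqrt(2 * math.pi)  # Eq. (47)
    return X_i + rng.random() * l * beta * (X_i - beta * X_elite)              # Eq. (46)
```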

Seahorses predation behavior

During the predation behaviour phase, the outcome of predation is determined by \(r_{2}\). If \(r_{2}>0.1\), the sea horse successfully chases and moves toward the food, representing the exploitation capability of SHO; otherwise, the predation fails and the sea horse continues to explore. The location update equation of the sea horses in the predation behaviour is as follows:

$$\:{X}_{new}^{2}\left(t+1\right)=\left\{\begin{array}{l}\alpha\:\:\left({X}_{elite}-rand\cdot\:{X}_{new}^{1}\left(t\right)\right)+\left(1-\alpha\:\right)\cdot\:{X}_{elite}\:\:\:\:\:\:if\:{r}_{2}>0.1\\\:\left(1-\alpha\:\right)\cdot\:\left(\:{X}_{new}^{1}\left(t\right)-rand.\:{X}_{elite}\right)+\alpha\:\:{X}_{new}^{1}\left(t\right)\:\:\:if{r}_{2}\le\:0.1\end{array}\right.$$
(49)
$$\:\alpha\:={\left(1-\frac{t}{T}\right)}^{\frac{2t}{T}}$$
(50)

Here, \(r_{2}\) and \(rand\) denote random numbers within [0,1]; \(X_{new}^{1}\left(t\right)\) signifies the new position of the sea horse after the \(t\)th iteration; \(t\) and \(T\) denote the current iteration and the maximum iteration count, respectively; and \(\alpha\) denotes the weighting of the influencing factor.
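A corresponding numpy sketch of the predation update (Eqs. 49-50) is shown below; it is an illustrative implementation under the definitions given above.

```python
import numpy as np

rng = np.random.default_rng(2)

def predation_update(X_new1, X_elite, t, T):
    """Eqs. (49)-(50): successful predation if r2 > 0.1, otherwise continue exploring."""
    r2, r = rng.random(), rng.random()
    alpha = (1 - t / T) ** (2 * t / T)            # Eq. (50)
    if r2 > 0.1:
        return alpha * (X_elite - r * X_new1) + (1 - alpha) * X_elite
    return (1 - alpha) * (X_new1 - r * X_elite) + alpha * X_new1
```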

Seahorses breeding behaviour

Based on the fitness values, the breeding behaviour separates the population into male and female groups. The best half of the individuals are designated as males (fathers) and the bottom half as females (mothers), which helps pass on the genetic features of the earlier generations.

$$\:{X}_{\iota\:}^{offspring}={r}_{3}\cdot\:{X}_{i}^{father}+\left(1-{r}_{3}\right)\cdot\:{X}_{\iota\:}^{mother}$$
(51)
$$\:fathers\:={X}_{sort}^{2}\left(1\::\frac{pop}{2}\right)$$
(52)
$$mothers=X_{sort}^{2}\left(\frac{pop}{2}+1\::\:pop\right)$$
(53)

Here, \(i\) signifies a positive integer within the interval \([1,\:pop/2]\), \(r_{3}\) represents a random number within [0,1], and \(X_{sort}^{2}\) indicates that \(X_{new}^{2}\) is sorted in order of fitness value.
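The breeding step (Eqs. 51-53) can be sketched as follows, assuming the population has already been sorted by fitness:

```python
import numpy as np

rng = np.random.default_rng(3)

def breeding(X_sorted):
    """Eqs. (51)-(53): the top half (fathers) breeds with the bottom half (mothers)."""
    half = len(X_sorted) // 2
    fathers, mothers = X_sorted[:half], X_sorted[half:2 * half]
    r3 = rng.random((half, 1))                    # per-offspring mixing coefficient
    return r3 * fathers + (1 - r3) * mothers
```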

The SHO method has the following benefits: (i) it has few adjustable parameters, which simplifies the execution of the model; (ii) a balance between exploration and exploitation is attained by using \(r_{1}=0\) as the dividing point between global exploration and local exploitation; and (iii) the distribution pattern within the breeding behaviour phase can inherit the genetic features of the earlier generation and yield improved sea horse individuals. The SHO method derives a fitness function (FF) to attain enhanced classification performance. It defines a positive number to signify the better performance of the candidate outcomes. Here, the reduction of the classifier error ratio is regarded as the FF.

$$\begin{gathered} fitness\left( {{x_i}} \right)=ClassifierErrorRate\left( {{x_i}} \right) \hfill \\ =\frac{{Misclassified~instance~counts}}{{Total~~instance~counts}} \times 100 \hfill \\ \end{gathered}$$
(54)
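A direct reading of Eq. (54) as code, used as the objective the SHO minimizes, could look like this:

```python
def fitness(y_true, y_pred):
    """Eq. (54): classification error rate used as the SHO fitness (lower is better)."""
    mis = sum(int(t != p) for t, p in zip(y_true, y_pred))
    return mis / len(y_true) * 100

print(fitness([1, 0, 1, 1], [1, 0, 0, 1]))   # 25.0
```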

Result analysis and discussion

The performance evaluation of the SFDAB-ARNNSHO technique is examined using the fire detection dataset45. The dataset contains 600 images in two classes, namely fire and normal, as shown in Table 1. Figure 5 portrays samples of fire and normal images.

Table 1 Details of database.
Fig. 5 Sample of (a) fire and (b) normal.

Figure 6 provides the set of confusion matrices attained by the SFDAB-ARNNSHO approach at different epochs. At 500 epochs, the SFDAB-ARNNSHO technique identified 97 samples as fire and 490 samples as normal. Similarly, at 1000 epochs, the SFDAB-ARNNSHO technique identified 99 samples as fire and 492 samples as normal. Then, at 1500 epochs, the SFDAB-ARNNSHO technique identified 100 samples as fire and 493 samples as normal. Besides, at 2000 epochs, the SFDAB-ARNNSHO technique identified 92 samples as fire and 479 samples as normal. Finally, at 3000 epochs, the SFDAB-ARNNSHO technique identified 98 samples as fire and 491 samples as normal.

Fig. 6 Confusion matrix of SFDAB-ARNNSHO methodology (a–f) Epochs 500–3000.

The fire detection results of the SFDAB-ARNNSHO technique under different epochs are reported in Table 2 and Fig. 7. The values show that the SFDAB-ARNNSHO method correctly identified fire and normal samples. At 500 epochs, the SFDAB-ARNNSHO method provides an average \(accu_{y}\) of 97.50%, \(prec_{n}\) of 95.02%, \(reca_{l}\) of 97.50%, \(F_{score}\) of 96.21%, and \(G_{Measure}\) of 96.23%. Besides, at 1000 epochs, the SFDAB-ARNNSHO method gives an average \(accu_{y}\) of 98.70%, \(prec_{n}\) of 96.16%, \(reca_{l}\) of 98.70%, \(F_{score}\) of 97.37%, and \(G_{Measure}\) of 97.40%. Moreover, at 1500 epochs, the SFDAB-ARNNSHO method presents an average \(accu_{y}\) of 99.30%, \(prec_{n}\) of 96.73%, \(reca_{l}\) of 99.30%, \(F_{score}\) of 97.96%, and \(G_{Measure}\) of 97.99%. At 2000 epochs, the SFDAB-ARNNSHO approach provides an average \(accu_{y}\) of 93.90%, \(prec_{n}\) of 89.89%, \(reca_{l}\) of 93.90%, \(F_{score}\) of 91.72%, and \(G_{Measure}\) of 91.81%. Finally, at 3000 epochs, the SFDAB-ARNNSHO approach offers an average \(accu_{y}\) of 98.10%, \(prec_{n}\) of 95.59%, \(reca_{l}\) of 98.10%, \(F_{score}\) of 96.79%, and \(G_{Measure}\) of 96.82%.

Table 2 Fire detection of SFDAB-ARNNSHO methodology under dissimilar epochs.
Fig. 7 Average outcome of SFDAB-ARNNSHO approach (a–f) Epochs 500–3000.

Figure 8 illustrates the training (TRA) \(accu_{y}\) and validation (VAL) \(accu_{y}\) outcomes of the SFDAB-ARNNSHO technique under different epochs. The \(accu_{y}\) analysis is computed across the range of 0-3000 epochs. The figure shows that the TRA and VAL \(accu_{y}\) values display an increasing trend, which indicates the capability of the SFDAB-ARNNSHO method to achieve superior outcomes across multiple iterations.

Fig. 8 \(Accu_{y}\) graph of SFDAB-ARNNSHO methodology (a–f) Epochs 500–3000.

Figure 9 shows the TRA loss (TRALOS) and VAL loss (VALLOS) analysis of the SFDAB-ARNNSHO approach under different epochs. The loss values are calculated across the range of 0-3000 epochs. The TRALOS and VALLOS values exhibit a diminishing trend, indicating the SFDAB-ARNNSHO technique’s capability to balance the trade-off between data fitting and generalization.

Fig. 9 Loss analysis of SFDAB-ARNNSHO methodology (a–f) Epochs 500–3000.

In Fig. 10, the PR curve analysis of the SFDAB-ARNNSHO methodology under different epochs clarifies its performance by plotting Precision against Recall for the two labels. The results show that the SFDAB-ARNNSHO technique consistently achieves better PR values across the different classes. This demonstrates the model’s ability to maintain a large share of true positive predictions among all positive predictions (precision) while capturing a large proportion of the actual positives (recall).

Fig. 10 PR analysis of SFDAB-ARNNSHO method (a–f) Epochs 500–3000.

Figure 11 examines the ROC curves of the SFDAB-ARNNSHO approach under different epochs. The results show that the SFDAB-ARNNSHO technique attains maximum ROC values across all classes, demonstrating a strong ability to discriminate between the class labels. This consistent tendency of high ROC values across the various classes indicates the capable performance of the SFDAB-ARNNSHO methodology in predicting class labels and highlights the robust nature of the classification method.

Fig. 11 ROC graph of SFDAB-ARNNSHO approach (a–f) Epochs 500–3000.

The comparative results of the SFDAB-ARNNSHO model with existing methodologies19,46,47,48 are demonstrated in Table 3 and Fig. 12. The simulation results show that the SFDAB-ARNNSHO approach attains better performance. In terms of \(accu_{y}\), the SFDAB-ARNNSHO approach has a higher \(accu_{y}\) of 99.30%, whereas the ConvNeXtTiny, ResNet152-V2, VGG19, NASNet-Large, DL-MFDSED, Bi-LSTM, Inception Time, Transformer, ADLSTM, artificial neural network (ANN), graph neural networks (GNNs), and generative adversarial network (GAN) approaches exhibit lower \(accu_{y}\) values of 90.08%, 95.56%, 97.46%, 96.29%, 98.17%, 99.08%, 98.25%, 98.97%, 96.36%, 98.05%, 96.89%, and 98.81%, respectively. Also, based on \(prec_{n}\), the SFDAB-ARNNSHO approach has a superior \(prec_{n}\) of 96.73%, whereas the ConvNeXtTiny, ResNet152-V2, VGG19, NASNet-Large, DL-MFDSED, Bi-LSTM, Inception Time, Transformer, ADLSTM, ANN, GNNs, and GAN techniques show lower \(prec_{n}\) values of 82.46%, 90.84%, 91.82%, 92.34%, 95.47%, 93.79%, 96.05%, 94.53%, 91.54%, 92.58%, 93.06%, and 96.02%, correspondingly. Moreover, for the \(F1_{score}\), the SFDAB-ARNNSHO approach has a maximal \(F1_{score}\) of 97.96%, whereas the ConvNeXtTiny, ResNet152-V2, VGG19, NASNet-Large, DL-MFDSED, Bi-LSTM, Inception Time, Transformer, ADLSTM, ANN, GNNs, and GAN techniques depict inferior \(F1_{score}\) values of 81.77%, 91.18%, 91.38%, 92.27%, 95.45%, 94.12%, 96.11%, 95.65%, 91.89%, 92.12%, 92.97%, and 96.20%, correspondingly.

Table 3 Comparative results of SFDAB-ARNNSHO technique with existing models19,46,47,48.
Fig. 12 Comparative outcome of SFDAB-ARNNSHO technique with existing models.

Table 4 and Fig. 13 present the comparative results of the SFDAB-ARNNSHO methodology in terms of execution time (ET). The table values indicate that the SFDAB-ARNNSHO approach achieves a favourable outcome. In terms of ET, the SFDAB-ARNNSHO approach records a low ET of 10.86 s, while the ConvNeXtTiny, ResNet152-V2, VGG19, NASNet-Large, DL-MFDSED, Bi-LSTM, Inception Time, Transformer, ADLSTM, ANN, GNNs, and GAN techniques record ET values of 16.27 s, 30.14 s, 21.03 s, 26.37 s, 18.08 s, 18.14 s, 37.83 s, 33.14 s, 20.11 s, 10.38 s, 11.98 s, and 19.30 s, correspondingly.

Table 4 ET outcome of SFDAB-ARNNSHO approach with existing techniques.
Fig. 13 ET outcome of SFDAB-ARNNSHO approach with existing methods.

Conclusion

In this manuscript, a novel SFDAB-ARNNSHO method is presented. The main intention of the SFDAB-ARNNSHO method is to detect and classify fire for blind people. To achieve this, the proposed SFDAB-ARNNSHO model involves an SF-based image pre-processing stage to remove noise from the input data. Furthermore, the fusion of feature extraction adopts three methods: EfficientNetB7, CapsNet, and ShuffleNetV2. In addition, the SFDAB-ARNNSHO model performs fire detection and classification using the SBiLSTM-AM technique. Finally, the parameter tuning of the SBiLSTM-AM method is executed by applying the SHO method. The simulation validation of the SFDAB-ARNNSHO methodology is examined on the fire detection dataset, and the outcomes are measured using various metrics. The performance validation of the SFDAB-ARNNSHO methodology portrayed a superior accuracy value of 99.30% over existing models under diverse measures. The limitations of the presented SFDAB-ARNNSHO methodology comprise challenges related to the robustness of fire detection in highly dynamic or noisy environments, where environmental factors such as lighting changes or smoke density may affect performance. Additionally, the model’s reliance on high-quality image data could pose challenges in real-world scenarios where data quality may vary. Another limitation is the computational complexity, which may impact real-time implementation on resource-constrained devices. The technique’s capability to generalize across diverse fire scenarios and environments also remains an area for enhancement. Future work can improve model robustness to varying conditions, optimize the model for faster processing, and explore integrating additional sensor data (e.g., thermal imaging) to improve detection accuracy. Further research can also explore the model’s scalability for broader applications, including integration with IoT devices for real-time monitoring and response.