Artificial intelligence-driven ensemble deep learning models for smart monitoring of indoor activities in IoT environment for people with disabilities

Arasi, Munya A.; AlEisa, Hussah Nasser; Alneil, Amani A.; Marzouk, Radwa

doi:10.1038/s41598-025-88450-1

Article
Open access
Published: 05 February 2025

Munya A. Arasi¹,
Hussah Nasser AlEisa²,
Amani A. Alneil³,⁴ &
…
Radwa Marzouk⁵

Scientific Reports volume 15, Article number: 4337 (2025)

Abstract

The number of people with disabilities who require healthcare is growing worldwide. Long-term care support includes nursing, complex medical, rehabilitation, and social assistance services. The cost is high, but advanced technologies can help reduce expenditure by ensuring efficient health services and improving quality of life. The transformative potential of the Internet of Things (IoT) extends to the nearly one billion people worldwide who live with disabilities. By incorporating smart devices and technologies, the IoT provides advanced solutions to many of the challenges faced by individuals with disabilities and promotes equality. Human activity recognition is the technical field that studies the classification of the actions or movements an individual performs, based on signals captured by smartphones or wearable sensors, or on images or video frames. Such methods are effective for activity detection, monitoring of vital functions, and tracking. Conventional machine learning and deep learning approaches can detect human activity effectively. This study designs and develops a metaheuristic optimization-driven ensemble model for smart monitoring of indoor activities for disabled persons (MOEM-SMIADP). The proposed MOEM-SMIADP model focuses on detecting and classifying indoor activities using IoT applications for physically challenged people. First, data preprocessing is performed using min–max normalization to convert the input data into a useful format. Next, the marine predator algorithm is employed for feature selection. For the detection of indoor activities, the proposed MOEM-SMIADP model uses an ensemble of three classifiers, namely the graph convolutional network model, the long short-term memory sequence-to-sequence (LSTM-seq2seq) method, and the convolutional autoencoder. Finally, hyperparameter tuning is performed by an improved coati optimization algorithm to enhance the classification results of the ensemble models. A wide range of experiments was conducted to validate the performance of the MOEM-SMIADP technique. The performance validation showed a superior accuracy of 99.07% over existing methods.

Introduction

Globally, more than one billion individuals, nearly 15% of the population, live with some form of disability, according to the World Health Organization (WHO). These conditions can appear in childhood or develop in older age, such as impaired hand function due to stroke¹. In daily life, people with disabilities often struggle to operate their home appliances. Conventional houses have therefore been transformed into smart homes to improve living standards for people with disabilities. In recent years, the purpose of IoT technology has been to enable interaction between devices without the need for human involvement². Today, IoT technology is integrated with home devices so that they can be controlled remotely over the internet. The IoT is a network of physical objects, or things, equipped with software, sensors, and other technology for connecting and sharing data with other devices or systems through the internet³. These devices include light switches that respond to on and off commands and thermostats that can adjust the indoor temperature to reduce energy consumption. Several authors have presented solutions that help disabled persons control devices remotely through the IoT using either the user's voice or a smartphone Graphical User Interface (GUI)⁴. Human Activity Recognition (HAR) is the field of science that studies the identification of actions or movements from signals captured by smartphones or wearable sensors, or from images or video frames⁵.

These activities are performed indoors and include sitting, walking, standing, and climbing stairs. It is also essential to identify where the activities take place. Computer technology has been used to interpret human movements through artificial vision. HAR has many applications, such as anti-terrorist security, surveillance, assistance, and lifelogging⁶. These methods have proved helpful for providing effective home care for disabled persons and for indoor tracking. The number of disabled persons and older adults is increasing, which creates a need to support people who are losing autonomy and want to keep living in their own homes with continuous real-time assistance⁷. Compared with outdoor localization, indoor localization is difficult because indoor communication channels vary considerably with their surroundings. This variation depends on several factors, such as construction materials, building structure, and room layout. An Indoor Positioning System (IPS) provides real-time localization of people or objects in enclosed spaces with diverse environments, using a network of receivers and transmitters⁸. HAR can be defined as the task of identifying and labelling activities with artificial intelligence (AI) from raw data collected from several sources. Advances in machine learning (ML) and deep learning (DL) have produced effective feature extractors for raw sensor data. The increasing global prevalence of disabilities highlights the need for effective solutions that improve the quality of life of affected individuals. People with physical or cognitive impairments often encounter challenges in managing everyday tasks, including controlling their home environment⁹. With the rise of connected devices and smart technologies, there is a growing opportunity to develop systems that offer autonomy and support to people with disabilities. Intelligent monitoring systems can facilitate enhanced home management, ensuring safety and independence. Integrating advanced DL methods within an IoT ecosystem can change how smart homes are designed, providing tailored solutions for diverse needs¹⁰.

This study designs and develops a Metaheuristic Optimization-Driven Ensemble Model for Smart Monitoring of Indoor Activities for Disabled Persons (MOEM-SMIADP). The proposed MOEM-SMIADP model focuses on detecting and classifying indoor activities using IoT applications for physically challenged people. First, data preprocessing is performed using min–max normalization to convert the input data into a useful format. Next, the marine predator algorithm (MPA) is employed for feature selection. For the detection of indoor activities, the proposed MOEM-SMIADP model uses an ensemble of three classifiers, namely the graph convolutional network (GCN) model, the long short-term memory sequence-to-sequence (LSTM-seq2seq) method, and the convolutional autoencoder (CAE). Finally, hyperparameter tuning is performed by an improved coati optimization algorithm (ICOA) to enhance the classification results of the ensemble models. A wide range of experiments was conducted to validate the performance of the MOEM-SMIADP technique. The key contributions of the MOEM-SMIADP technique are listed below.

The MOEM-SMIADP model utilizes min–max normalization to preprocess data by scaling it to a specific range, ensuring balanced feature contributions during learning. This approach improves subsequent algorithms’ performance and convergence speed, specifically in handling diverse feature scales effectively. It plays a significant role in improving the accuracy and stability of the ensemble classifiers.
The MOEM-SMIADP approach employs the MPA method to choose the most relevant features, effectively mitigating data dimensionality and eliminating noise or irrelevant information. This optimization-driven approach improves computational efficiency and ensures enhanced performance of subsequent classifiers. It significantly improves the model’s capability to focus on critical features, enhancing overall detection accuracy.
The MOEM-SMIADP method combines three classifiers to utilize diverse feature extraction capabilities. The GCN captures spatial and structural relationships, the LSTM-seq2seq handles temporal sequences and long-term dependencies in time-series data, and the CAE extracts latent features for complex pattern detection. This integration improves the accuracy and robustness of the model in indoor activity recognition.
The MOEM-SMIADP model utilizes an ICOA model for hyperparameter tuning, ensuring optimal configurations for the ensemble classifiers. This approach improves the model’s accuracy, performance, and efficiency by finding the optimal parameter settings. It plays a key role in improving the detection capability and robustness of the indoor activity recognition system.
The MOEM-SMIADP methodology integrates MPA-based feature selection, ICOA-based hyperparameter tuning, and an ensemble of three state-of-the-art classifiers (GCN, LSTM-seq2seq, and CAE) to address indoor activity detection. This combination of spatial, temporal, and latent feature extraction techniques with advanced optimization algorithms yields significant improvements in detection accuracy and efficiency compared to traditional methods.

Related works

In¹¹, a robust DL structure known as the Multiple Spectrogram Fusion Network (MSF-Net) is presented for coarse and fine activity recognition using Channel State Information (CSI). First, a dual-stream framework integrating the discrete wavelet transform (DWT) and the short-time Fourier transform is introduced to highlight abnormal information in the CSI data. Then, a Transformer is applied as the backbone to extract higher-level features efficiently. Berkani et al.¹² developed an intelligent method to monitor air quality and classify activities in indoor surroundings using a DL method based on a 1D Convolutional Neural Network (1D-CNN). This method incorporates six sensors to collect measurements that are then used to train a 1D-CNN for activity detection. The proposed method has a lightweight, edge-deployable design, making it suitable for real-world applications. Sun and Chen¹³ developed a novel asynchronous detection approach, the Rapid Response Elderly Safety Monitoring (RESAM) technique. Through a first-stage analysis of inertial sensor data using multi-class classifiers and Kernel Principal Component Analysis (KPCA), this method effectively reduces processing time and lowers the false negative rate (FNR). Then, decision-level data fusion is performed, integrating ResNet-based skeleton image analysis with the first-stage inertial sensor results. In¹⁴, data processing approaches suited to a non-invasive analysis of noisy indoor sound in edge environments were developed. To this end, MFCC-based and Mel-spectrogram-based methods for classifying sound environments are applied, and their performance is compared under different optimizations and preprocessing parameters. In¹⁵, a blockchain (BC) and IoT-based Assisted Living System (BIoT-ALS) using 6G communication is presented. The nodes in the proposed model use smart contracts to specify rules of interaction while working together to provide computing resources and storage. Kan et al.¹⁶ developed a method employing dual Kinect V2 sensors, supported by advanced ensemble learning models and an enhanced Transmission Control Protocol (TCP). A data-adaptive adjustment mechanism embedded in the localization results reduces self-occlusion in dynamic orientations, and the combination of the RF and bat models offers novel action detection approaches for complex scenarios. Srinivasan et al.¹⁷ developed an approach to enhance outdoor comfort by combining adaptive thermal apparel with RFR and IoT. Wearing inflexible conventional outdoor gear can be uncomfortable for people living in different parts of the world. This method introduces an intelligent garment that can change its thermal insulation in real time using data collected from IoT sensors. An RFR method was used to determine suitable thermal settings for the garment from the collected data, which included environmental factors such as temperature, wind speed, and humidity.

Manimaran et al.¹⁸ aimed to support the elderly by tracking their activities in both outdoor and indoor surroundings. A semi-supervised DL structure was introduced for better HAR outcomes, which efficiently uses the incorrectly labelled sensor information and fine-tunes the classifier learning method. Xiao et al.¹⁹ present a framework for activity recognition and health monitoring using smartphone accelerometer data, utilizing BiLSTM and Bayesian optimization to improve the performance and fine-tune the model. Shereef, Varghese, and Kamalraj²⁰ explore the role of IoT and cloud computing in healthcare, concentrating on their application in diagnosing sleep apnea while addressing benefits, challenges, and their potential to revolutionize sleep medicine. Rezaee et al.²¹ present an optimized BiLSTM model with Grey Wolf Optimizer (GWO) for real-time student activity classification and health monitoring using accelerometer data, validated on UCI-HAR and WISDM datasets. Anitha et al.²² enhance fall detection in the elderly by incorporating multiple sensors with AI and ML methods for real-time monitoring, accuracy, and adaptability while exploring future sensor integration for improved health tracking. Maddeh et al.²³ propose an ensemble DL model to detect a patient’s mobility state using sensors in a smartbed, distinguishing between sleeping, standing, sitting, walking, and emergency states for improved accuracy. Akhmetshin et al.²⁴ present EADL-FDC, using DL and evolutionary algorithms for fall detection, with SPA-Net for feature extraction, SOS for parameter selection, DBN for classification, and MFO for hyperparameter tuning. Namoun et al.²⁵ propose an ensemble meta-learning model to select the best IoT services for disabled students in education, considering their unique needs. 
Jawad et al.²⁶ developed a sustainable greenhouse model that optimizes energy consumption while ensuring optimal plant growth conditions using the Artificial Bee Colony (ABC) optimization technique and a fuzzy controller to regulate environmental factors. Kao et al.²⁷ explore drowning prevention technology using embedded systems, AI, and IoT for real-time monitoring and alerting. Computer vision and DL improve image recognition for identifying drowning situations, while IoT connectivity improves system intelligence and rescue efficiency. Yazici et al.²⁸ present an e-health framework utilizing IoT-based inertial, ECG, and video sensors for real-time monitoring of elderly and disabled individuals, employing edge computing for efficient data analysis and ensuring privacy by activating sensors only when needed.

The reviewed studies highlight significant IoT, AI, and DL improvements for healthcare, activity recognition, and assistive systems. However, limitations persist, comprising challenges in real-time processing, scalability, and adaptability to diverse environments and user needs. Many solutions face privacy concerns, high computational costs, and energy inefficiency, specifically in resource-constrained settings. Furthermore, integrating multi-sensor data and optimizing models for dynamic and heterogeneous conditions remain underexplored. Research gaps encompass the requirement for robust frameworks that ensure privacy, energy efficiency, and adaptability while giving seamless integration of advanced technologies like 6G, blockchain, and edge computing for scalable, real-time applications across broader use cases.

The proposed method

This study develops a MOEM-SMIADP model. The proposed model concentrates on detecting and classifying indoor activities using IoT applications for physically challenged people. It encompasses four steps: data normalization, MPA-based feature selection, an ensemble of classification models, and parameter selection using ICOA. Figure 1 illustrates the workflow of the MOEM-SMIADP model.

Min–max normalization

At first, the data preprocessing executes min–max normalization to convert input data into useful format²⁹. This is chosen because it can scale data to a fixed range, usually [0, 1] or [− 1, 1], ensuring uniform contribution from all features during the learning process. This technique is particularly effectual when features have varying scales, preventing dominant features from overshadowing smaller ones. Unlike standardization, which transforms data based on mean and standard deviation, min–max normalization preserves the data’s distribution and relationships, making it ideal for algorithms sensitive to an absolute scale, such as GCNs and LSTMs. Additionally, it improves the convergence speed of optimization algorithms, mitigating training time and improving stability. Its computational efficiency makes it well-suited for large datasets and real-time applications.

Min–max normalization is an effective data preprocessing method that scales feature values to a fixed range such as [0, 1] while preserving the relationships within the data. In smart monitoring of indoor activities using IoT applications, it ensures consistency across data gathered from numerous sensors, which may have different ranges or units. For disabled persons, accurate recognition of anomalies and activities is vital, and placing all sensor readings on a common scale can reduce the impact of scale differences between sensors, thereby improving data quality. It enhances the performance of ML techniques by delivering standardized inputs, allowing the models to distinguish subtle variations in activity patterns. This supports adaptive, reliable, and effective monitoring that meets individual requirements and promotes safety and independence.
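As an illustration of the scaling described above, the following is a minimal NumPy sketch of per-feature min–max normalization. The function name `min_max_normalize`, the handling of constant columns, and the toy `readings` array are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def min_max_normalize(X, feature_range=(0.0, 1.0)):
    """Scale each column (feature) of X to feature_range using its own min and max."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    col_min = X.min(axis=0)
    col_span = X.max(axis=0) - col_min
    # Assumption: a constant column has zero span, so it is mapped to `lo`
    # instead of triggering a division by zero.
    safe_span = np.where(col_span == 0, 1.0, col_span)
    return lo + (X - col_min) / safe_span * (hi - lo)

# Hypothetical sensor readings: two features with very different ranges.
readings = np.array([[1.0, 10.0],
                     [2.0, 20.0],
                     [3.0, 30.0]])
scaled = min_max_normalize(readings)
print(scaled)
```

Because the minimum and maximum are computed per column, a feature measured in large units does not dominate one measured in small units after scaling.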

Feature selection using MPA

Furthermore, the MPA performs the feature selection process³⁰. This technique is chosen because it can effectually identify the most relevant features while discarding noisy or redundant ones. MPA replicates the intelligent foraging strategies of marine predators, effectively balancing exploration and exploitation during optimization. Unlike conventional techniques like correlation-based or wrapper methods, MPA can handle high-dimensional data and intrinsic interactions between features. This mitigates computational overhead and enhances the model’s performance by focusing only on significant features. Its adaptability and robustness in diverse data scenarios make it particularly appropriate for improving classifier accuracy and ensuring improved generalization.

MPA is inspired by the foraging movements of marine predators. Predators often switch between two motion patterns: Brownian motion (BM), which consists of consecutive small moves within the same region and supports exploitation, and Lévy motion (LM), which consists of short moves followed by occasional long jumps and supports exploration.

Stage (1). Initialization: the search space is populated with initial solutions drawn randomly from a uniform distribution.

Stage (2). The prey matrix is updated in phase 1, which is characterized by a high velocity ratio. This update occurs in the first third of the iterations, when exploration dominates.

$$\overrightarrow {{Step_{i} }} = \overrightarrow {{R_{B} }} \otimes \left( {\overrightarrow {{Elite_{i} }} - \left( {\overrightarrow {{R_{B} }} \otimes \overrightarrow {{Prey_{i} }} } \right)} \right)$$

(1)

$$\overrightarrow {{Prey_{i} }} = \overrightarrow {{Prey_{i} }} + \left( {P \cdot \vec{R} \otimes \overrightarrow {{Step_{i} }} } \right)$$

(2)

Here, $\overrightarrow {{R_{B} }}$ denotes a vector of random numbers drawn from the normal distribution that represents BM, $P=0.5$, and $\vec{R}$ is a vector of uniform random numbers between 0 and 1.

Stage (3). Phase 2 is the transitional stage of the optimizer, in which the model shifts from exploration to exploitation. It occurs in the second third of the iterations. The first half of the population is updated using Eqs. (3) and (4).

$$\overrightarrow {{Step_{i} }} = \overrightarrow {{R_{L} }} \otimes \left( {\overrightarrow {{Elite_{i} }} - \left( {\overrightarrow {{R_{L} }} \otimes \overrightarrow {{Prey_{i} }} } \right)} \right)$$

(3)

$$\overrightarrow {{Prey_{i} }} = \overrightarrow {{Prey_{i} }} + \left( {P \cdot \vec{R} \otimes \overrightarrow {{Step_{i} }} } \right)$$

(4)

$\overrightarrow {{R_{L} }}$, a vector of random numbers drawn from the Lévy distribution that represents LM, is multiplied by the prey in the step equation. The second half of the population is updated using Eqs. (5) and (6).

$$\overrightarrow {{Step_{i} }} = \overrightarrow {{R_{B} }} \otimes \left( {\left( {\overrightarrow {{R_{B} }} \otimes \overrightarrow {{Elite_{i} }} } \right) - \overrightarrow {{Prey_{i} }} } \right)$$

(5)

$$\overrightarrow {{Prey_{i} }} = \overrightarrow {{Elite_{i} }} + \left( {P \cdot CF \otimes \overrightarrow {{Step_{i} }} } \right)$$

(6)

Note that multiplying $\overrightarrow {{R_{B} }}$ by the Elite matrix in Eq. (5) mimics BM. $CF$ is the parameter that controls the step size and is updated with Eq. (7).

$$CF = \left[ 1 - \left( {Iter} / {MaxIter} \right) \right]^{\left( 2\,{Iter} / {MaxIter} \right)}$$

(7)

Stage (4). Phase 3, the last third of the iterations, is the final phase of the optimization procedure. The predator moves using LM, and the prey matrix is updated using Eqs. (8) and (9).

$$\overrightarrow {{Step_{i} }} = \overrightarrow {{R_{L} }} \otimes \left( {\left( {\overrightarrow {{R_{L} }} \otimes \overrightarrow {{Elite_{i} }} } \right) - \overrightarrow {{Prey_{i} }} } \right)$$

(8)

$$\overrightarrow {{Prey_{i} }} = \overrightarrow {{Elite_{i} }} + \left( {P \cdot CF \otimes \overrightarrow {{Step_{i} }} } \right).$$

(9)

Multiplying $\overrightarrow {{R_{L} }}$ by the Elite matrix mimics the predator's movement in the Lévy strategy.

Stage (5). Finalization: the best solutions found are retained in the Elite matrix after every iteration. Once the maximum number of iterations is reached, the solution with the best fitness function (FF) value is returned.
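The stages above can be sketched in a few lines of NumPy. This is a simplified illustration rather than the implementation used in the study. It assumes the step-size control takes the form $CF = (1 - Iter/MaxIter)^{2\,Iter/MaxIter}$, generates Lévy steps with Mantegna's method (an assumption, since the text does not say how LM is sampled), builds the Elite matrix by replicating the best solution found so far, and omits any further refinements of MPA. The function names `levy_steps` and `mpa_minimize`, the bounds, and the sphere objective in the usage line are all hypothetical.

```python
import math
import numpy as np

def levy_steps(shape, rng, beta=1.5):
    """Levy-distributed steps via Mantegna's method (assumed sampler)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def mpa_minimize(objective, dim, n_agents=20, max_iter=90,
                 lb=-5.0, ub=5.0, P=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Stage (1): uniform random initialization of the prey matrix.
    prey = rng.uniform(lb, ub, (n_agents, dim))
    best_x, best_f = None, np.inf
    for it in range(max_iter):
        prey = np.clip(prey, lb, ub)
        fit = np.array([objective(p) for p in prey])
        i_best = int(np.argmin(fit))
        if fit[i_best] < best_f:
            best_f, best_x = float(fit[i_best]), prey[i_best].copy()
        elite = np.tile(best_x, (n_agents, 1))          # Elite matrix
        cf = (1 - it / max_iter) ** (2 * it / max_iter)  # Eq. (7), assumed form
        R = rng.random((n_agents, dim))
        RB = rng.normal(size=(n_agents, dim))            # Brownian steps
        RL = levy_steps((n_agents, dim), rng)            # Levy steps
        if it < max_iter / 3:
            # Stage (2): exploration, Eqs. (1)-(2).
            step = RB * (elite - RB * prey)
            prey = prey + P * R * step
        elif it < 2 * max_iter / 3:
            # Stage (3): first half uses Eqs. (3)-(4), second half Eqs. (5)-(6).
            half = n_agents // 2
            new_prey = np.empty_like(prey)
            step_a = RL[:half] * (elite[:half] - RL[:half] * prey[:half])
            new_prey[:half] = prey[:half] + P * R[:half] * step_a
            step_b = RB[half:] * (RB[half:] * elite[half:] - prey[half:])
            new_prey[half:] = elite[half:] + P * cf * step_b
            prey = new_prey
        else:
            # Stage (4): exploitation with Levy motion, Eqs. (8)-(9).
            step = RL * (RL * elite - prey)
            prey = elite + P * cf * step
    # Stage (5): return the best solution seen over all iterations.
    return best_x, best_f

best_x, best_f = mpa_minimize(lambda x: float(np.sum(x ** 2)), dim=3)
print(best_x, best_f)
```

For feature selection, each continuous position would still have to be mapped to a binary keep-or-drop mask before a classifier is evaluated; that mapping is not specified in this section and is therefore left out of the sketch.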

The FF reflects both the classifier accuracy and the number of selected features. It rewards high classification accuracy and a small selected-feature subset. The FF used to assess each solution is given in Eq. (10).

$$Fitness = \alpha *ErrorRate + \frac{{\left( {1 - \alpha } \right)*SF}}{{All_{F} }}$$

(10)

Here, $ErrorRate$ is the classification error rate obtained with the selected features. It is computed as the ratio of incorrect classifications to the total number of classifications and lies between 0 and 1. $SF$ is the number of selected features, and ${All}_{F}$ is the total number of features. $\alpha$ controls the relative importance of classification quality and subset length.
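A worked instance of Eq. (10) may help. The sketch below evaluates the fitness for one candidate feature subset; the weight `alpha=0.9` and the example numbers are assumptions chosen for illustration, since this section does not state the value of $\alpha$.

```python
def feature_selection_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Eq. (10): weighted sum of error rate and fraction of selected features."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total

# Hypothetical candidate: 10% error using 5 of 20 features.
score = feature_selection_fitness(error_rate=0.10, n_selected=5, n_total=20)
print(round(score, 3))  # 0.9 * 0.10 + 0.1 * (5 / 20) = 0.115
```

Lower values are better: with the same error rate, a subset that keeps fewer features receives a smaller fitness value.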

Indoor activities detection using ensemble models

For the detection of indoor activities, the proposed MOEM-SMIADP model applies an ensemble of three classifiers: the GCN model, the LSTM-seq2seq method, and the CAE classifier. This ensemble technique is chosen for its ability to capture diverse and complementary data features. The GCN excels at learning spatial and structural patterns, the LSTM-seq2seq effectively handles temporal sequences and long-term dependencies, and the CAE identifies complex latent features. By combining the strengths of each classifier, the ensemble aims to improve detection accuracy and robustness over single-model approaches. Compared with standalone methods, the ensemble approach reduces the risk of overfitting and adapts better to complex indoor activity patterns. Its versatility makes it particularly effective in handling multimodal and dynamic datasets.

GCN model

The GCN builds on the CNN and centers on the convolution operation³¹. It applies a mapping function to generate new node information by combining the data of the current node with that of its neighbours. The spatial GCN directly aggregates graph-structured data over nodes and edges, which significantly reduces the amount of computation, and it is now widely used in networks. Here, a GCN‐based model is presented that learns to determine the generator positions by extracting power grid graph data. Using the topological structure and bus attributes of the power grid, a graph representation $G$ of the power grid can be derived, where ${N}_{B}=\left\{1,\dots,{n}_{B}\right\}$ denotes the set of buses in $G$ and ${N}_{F}=\left\{1,\dots,{n}_{F}\right\}$ denotes the set of bus feature dimensions. Buses within the power grid are divided into two types according to the presence or absence of a generator. The bus attributes comprise load data and generator data related to the constraints. Information on transmission lines related to the constraints also needs to be considered. Hence, the capacity of the transmission lines between buses must be incorporated into the feature matrix. The feature matrix $X$ for the GCN is constructed as follows.

$$\left\{ {\begin{array}{*{20}l} {\begin{array}{*{20}l} {x_{j} = \left[ {x_{j}^{LF} ,x_{j}^{G} } \right]} \hfill \\ {x_{j}^{LF} = \left[ {p_{j,1}^{L} , \ldots ,p_{{j,n_{T} }}^{L} ,f_{j,1}^{L} , \ldots ,f_{{j,n_{B} }}^{L} } \right]} \hfill \\ {x_{j}^{G} = g_{j} \left[ {P_{i}^{max} ,P_{i}^{min} , \ldots ,T_{i}^{on} ,T_{i}^{off} } \right]} \hfill \\ \end{array} } \hfill & {\forall i \in N_{G} ,\forall j \in N_{B} } \hfill \\ \end{array} } \right.$$

(11)

$$X={\left[{x}_{1},\dots,{x}_{{n}_{B}}\right]}^T$$

(12)

When there is no transmission line between nodes $j$ and $k$, ${f}_{j,k}^{L}$ equals $0.$ ${g}_{j}$ is a binary variable indicating whether a generator is located at bus $j$: when a generator is present at node $j$, ${g}_{j}$ equals $1$; otherwise it equals $0.$

Because the features in the input data have different units of measurement, they cannot be computed with or compared directly. To address this, a normalization method is used to preprocess $X$. Importantly, the normalization is not applied to the complete dataset at once; it is performed separately for every feature. This improves training efficiency. The following equation is used for normalization.

$${\grave{x}} = \frac{{x - x_{min} }}{{x_{max} - x_{min} }}$$

(13)

The GCN input is ${\grave{X}}$, and the first hidden feature matrix ${H}_{1}$ is obtained after the first graph convolution layer (GCL). This is done by aggregating the features of neighbouring buses and then applying a linear transformation. After several convolution layers, the final output is obtained through the fully connected layer (FCL). Because the adjacency matrix $A$ is not normalized, multiplying by $A$ would change the scale of the feature vectors. To resolve this, $A$ and the identity matrix $I$ are added and then normalized by the node degree matrix $D.$

$${\grave{A}} = D^{{ - \frac{1}{2}}} \left( {A + I} \right)D^{{ - \frac{1}{2}}}$$

(14)

Then, the computation of every GCL is stated as follows:

$$H_{l + 1} = h\left( {H_{l} ,{\grave{A}}} \right) = \sigma \left( {{\grave{A}}H_{l} W_{l} } \right)$$

(15)

where $\sigma \left( \cdot \right)$ denotes the activation function and $W_{l}$ is the weight matrix of layer $l$. Figure 2 illustrates the architecture of the GCN model.
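The adjacency normalization and one graph convolution layer can be sketched in NumPy as follows. The sketch assumes the usual symmetric normalization in which both factors are $D^{-1/2}$ and $D$ is the degree matrix of $A + I$, and it uses ReLU as the activation $\sigma$; the toy three-node graph, the random features, and the random weights are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    A_hat = A + np.eye(A.shape[0])       # add self-loops
    deg = A_hat.sum(axis=1)              # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)      # deg >= 1, so no division by zero
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One graph convolution layer: sigma(A_norm H W), with ReLU as sigma (assumed)."""
    return np.maximum(A_norm @ H @ W, 0.0)

# Hypothetical path graph with three nodes: 0 - 1 - 2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
rng = np.random.default_rng(0)
X = rng.random((3, 5))                   # 3 nodes, 5 normalized features each
W0 = rng.normal(0.0, 0.1, (5, 4))        # untrained weights, 4 hidden units

A_norm = normalize_adjacency(A)
H1 = gcn_layer(A_norm, X, W0)
print(A_norm.round(3))
print(H1.shape)
```

With the symmetric form, the normalized matrix stays symmetric and each aggregation is a degree-weighted average of a node and its neighbours, so stacking layers does not inflate the feature scale.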

With the preprocessed ${\grave{X}}$ as the input to the GCN, a new ${n}_{T}$-dimensional feature vector is obtained for each bus through the GCL and FCL transformations, and the prediction is obtained as shown.

$$U^{p} = F\left( {G\left( {{\grave{X}}} \right)} \right)$$

(16)

where $G\left( \cdot \right)$ denotes the GCN process and $F\left( \cdot \right)$ denotes the extraction of the corresponding probability vectors.

The GCN is applied to learn the patterns of the binary variables, so the learning task corresponds to a multi-label binary classification task.

$$Loss = BCELoss\left( {U^{p} ,{\grave{U}}} \right) = \mathop \sum \limits_{{i \in N_{G} }} \mathop \sum \limits_{{t \in N_{T} }} \left[ { - \left( {u_{i,t}^{p} \cdot log\left( {{\grave{u}}_{i,t} } \right) + \left( {1 - u_{i,t}^{p} } \right) \cdot log\left( {1 - {\grave{u}}_{i,t} } \right)} \right)} \right]$$

(17)

Here, $Loss$ measures the discrepancy between the target and the predicted values.
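For concreteness, the following is a small pure-Python sketch of a summed binary cross-entropy of the kind used in Eq. (17). It follows the standard convention in which `targets` holds the 0/1 labels and `probs` holds predicted probabilities; which symbol of Eq. (17) plays which role is an assumption here, as are the function name `bce_loss`, the clipping constant `eps`, and the example values.

```python
import math

def bce_loss(targets, probs, eps=1e-12):
    """Binary cross-entropy summed over all (i, t) entries of two nested lists."""
    total = 0.0
    for row_t, row_p in zip(targets, probs):
        for t, p in zip(row_t, row_p):
            p = min(max(p, eps), 1.0 - eps)   # keep log() finite
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total

# Hypothetical labels and predictions for two units over two time steps.
targets = [[1, 0], [0, 1]]
probs = [[0.9, 0.1], [0.2, 0.7]]
print(bce_loss(targets, probs))
```

The second term uses $1 - t$, so a confident correct prediction contributes almost nothing to the loss, while a confident wrong prediction is penalized heavily.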

LSTM-seq2seq model

The LSTM-seq2seq framework consists of an encoder and a decoder³². In the encoder NN, the input sequence $\left\{ {x_{1} ,x_{2} , \ldots ,x_{T} } \right\}$ with $T$ time steps is read one time step at a time. At the end, the hidden layer ${h}_{T}$ yields a high-dimensional vector $D$ that encodes the information in the input sequence. The decoder NN takes the vector $D$ as input and produces the output sequence $\left\{ {y_{1} ,y_{2} , \ldots ,y_{T} } \right\}$ step by step in a loop. The computation in the LSTM-seq2seq framework is as follows:

$$h_{t} = \psi \left( {x_{t} ,h_{t - 1} } \right)$$

(18)

$$D = \phi \left( {h_{1} , \ldots ,h_{T} } \right)$$

(19)

$${\grave{h}}_{t} = \theta \left( {y_{t - 1} ,{\grave{h}}_{t - 1} ,D} \right)$$

(20)

where ${\grave{h}}_{t}$ and ${h}_{t}$ denote the hidden layer (HL) in the decoder and encoder at time step $t,$ respectively, and $\theta,$ $\psi,$ and $\phi$ denote non-linear activation functions.

Assume that $y^{m \times 1}$ is the monitored deformation data and $x^{m \times n}$ is the influencing-factor sequence data, where $m$ is the number of instances and $n$ is the number of influencing factors (for example, time, temperature, and water level). The two data sequences are first rearranged into $x^{{\left( {m/T} \right) \times n \times T}}$ and $y^{{\left( {m/T} \right) \times 1 \times T}} ,$ respectively, by time steps $T$. The final HL output ${h}_{T}$ is then passed to the decoder. The decoder output at every time step is given below:

$$y_{t} = \sigma \left( {{\grave{h}}_{t} } \right)$$

(21)

These outputs are concatenated in sequence to form the final fitted result, and the deformation $y$ is obtained after denormalization.
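The encoder and decoder recursions can be sketched as a forward pass in plain NumPy. This is only an untrained illustration under several assumptions: the context vector $D$ is taken to be the final encoder hidden state, the decoder output is passed through a linear readout before the sigmoid, and the windows are stored as (window, time step, feature) rather than in the layout written above. The names `lstm_step`, `init_params`, and `seq2seq_forward` and all sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps [x, h] to the four stacked gate pre-activations."""
    z = np.concatenate([x, h]) @ W + b
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def init_params(n_features, hidden, rng, scale=0.1):
    """Random, untrained weights for the encoder, decoder, and readout."""
    return {
        "W_enc": rng.normal(0.0, scale, (n_features + hidden, 4 * hidden)),
        "b_enc": np.zeros(4 * hidden),
        # Decoder input is [y_{t-1}, D], concatenated with the decoder state.
        "W_dec": rng.normal(0.0, scale, (1 + 2 * hidden, 4 * hidden)),
        "b_dec": np.zeros(4 * hidden),
        "w_out": rng.normal(0.0, scale, hidden),
        "b_out": 0.0,
    }

def seq2seq_forward(x_seq, params, out_steps):
    hidden = params["b_enc"].size // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in x_seq:                                   # encoder, Eq. (18)
        h, c = lstm_step(x_t, h, c, params["W_enc"], params["b_enc"])
    D = h                                               # context vector (assumed: final state)
    h_dec, c_dec = D.copy(), np.zeros(hidden)
    y_prev, outputs = np.zeros(1), []
    for _ in range(out_steps):                          # decoder, Eq. (20)
        dec_in = np.concatenate([y_prev, D])
        h_dec, c_dec = lstm_step(dec_in, h_dec, c_dec,
                                 params["W_dec"], params["b_dec"])
        y_t = sigmoid(params["w_out"] @ h_dec + params["b_out"])  # Eq. (21) with readout
        y_prev = np.atleast_1d(y_t)
        outputs.append(float(y_t))
    return np.array(outputs)

rng = np.random.default_rng(0)
m, n, T = 12, 3, 4
x = rng.random((m, n))                                  # m instances, n factors
windows = x.reshape(m // T, T, n)                       # split into windows of T steps
params = init_params(n_features=n, hidden=8, rng=rng)
y_hat = seq2seq_forward(windows[0], params, out_steps=T)
print(y_hat.shape)
```

Because the outputs pass through a sigmoid, they lie in the normalized range and would need to be denormalized before being compared with the original measurements.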

CAE classifier

The CAE seamlessly combines local convolutional networks with conventional autoencoders (AEs), adding a reconstruction step to the convolution operation³³. This reconstruction mapping is called convolutional decoding. Using the unsupervised greedy training that is characteristic of AEs, the parameters of both the encoder and decoder operations can be calculated. Here, $f$ denotes the convolutional encoder function, whereas $f^{\prime}$ denotes its decoder counterpart. The input consists of feature maps $x \in R^{{n \times l \times l}}$, obtained either from the input layer or from the previous layer. This input includes $n$ feature maps, each covering l × l pixels. The convolutional AE operation uses $m$ convolutional kernels, producing $m$ feature maps in the output layer. When the feature maps come from the input layer, $n$ is the number of input channels. If they come from a previous layer, $n$ is the number of output feature maps of that layer. The convolution kernels have size $d$ × $d$, with $d \le l.$

The parameter set $\theta = \left\{ {W,{\grave{W}},b,{\grave{b}}} \right\}$ describes the learnable parameters of the convolutional AE layer. Here, $b\in{R}^{m}$ and $W = \left\{ {w_{j} ,j = 1,2, \ldots ,m} \right\}$ are the convolutional encoding parameters. Each $w_{j} \in R^{n \times l \times l}$ can also be written as a vector $w_{j} \in R^{{nl^{2} }}$. Likewise, ${\grave{W}} = \left\{ {{\grave{w}}_{j} ,j = 1,2, \ldots ,m} \right\}$ and ${\grave{b}}$ are the parameters for the convolutional decoding, where ${\grave{b}} \in R^{{nl^{2} }}$ and every ${\grave{w}}_{j} \in R^{{1 \times nl^{2} }}$.

First, the input image passes through the encoder. In this stage, patches of size $d$ × $d$ pixels, denoted ${x}_{i}$ with $i = 1,2, \ldots ,p$, are extracted from the input image. Then, for each patch, the weights ${w}_{j}$ of the ${j}$th convolutional kernel are applied in the convolution operation. This yields the value of neuron ${o}_{ij}$, for $j = 1,2, \ldots ,m$, in the output layer:

$$o_{ij} = f\left( {x_{i} } \right) = \sigma \left( {w_{j} \cdot x_{i} + b} \right)$$

(22)

Here, $\sigma$ denotes a non-linear activation function; the ReLU activation function is applied in this study.

$$Relu\left( x \right) = \left\{ {\begin{array}{*{20}l} x \hfill & {x \ge 0} \hfill \\ 0 \hfill & {x < 0} \hfill \\ \end{array} } \right.$$

(23)

After this, the output ${o}_{ij}$ of the convolutional encoder undergoes decoding, where ${x}_{i}$ is reconstructed from ${o}_{ij}$ to yield ${\grave{x}}_{i}$:

$${\grave{x}}_{i} = f^{\prime}\left( {o_{ij} } \right) = \phi \left( {{\grave{w}}_{j} \cdot o_{ij} + {\grave{b}}} \right)$$

(24)

After the convolutional encoding and decoding operations, ${\grave{x}}_{i}$ is produced for every patch. The reconstruction process therefore yields $p$ patches, each of dimension $d$ × $d$. The cost function is defined as the MSE between the original patches of the input image ${x}_{i}$ $\left( {i = 1,2, \ldots ,p} \right)$ and the reconstructed patches ${\grave{x}}_{i}$ $\left( {i = 1,2, \ldots ,p} \right)$. The cost function is given in Eq. (25), and the reconstruction error is specified in Eq. (26).

$$J_{CAE} \left( \theta \right) = \frac{1}{p}\mathop \sum \limits_{i = 1}^{p} L_{CAE} \left[ {x_{i} ,{\grave{x}}_{i} } \right]$$

(25)

$$L_{CAE} \left[ {x_{i} ,{\grave{x}}_{i} } \right] = \left\| {x_{i} - {\grave{x}}_{i} } \right\|^{2} = \left\| {x_{i} - \phi \left( {\sigma \left( {x_{i} } \right)} \right)} \right\|^{2}$$

(26)

Using SGD, the errors and weights are refined iteratively, which optimizes the convolutional AE layer. After training, the optimized parameters yield the feature maps, which are forwarded to the following layers.

In this study, the CAE is constructed from several layers, each serving a specific function in encoding and decoding the input data. The method begins with the input layer, which receives the scalogram and passes it through consecutive convolution layers. These layers gradually reduce the image dimensionality while extracting the main features. After the encoder stage, the method moves to the decoder stage. The reconstructed output is essential for classifying and identifying different error states in the pumps.
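The encode, decode, and cost computations of Eqs. (22) to (26) can be illustrated on flattened patches. The sketch below is a simplified reading, not the authors' network: the function names are hypothetical, the decoder is assumed to sum the contributions of all $m$ kernels, and the decoder activation $\phi$ is assumed to be the identity.

```python
def relu(v):
    """Eq. (23): pass non-negative values through, clamp negatives to zero."""
    return v if v >= 0 else 0.0


def encode(x_i, W, b):
    """Eq. (22): o_ij = relu(w_j . x_i + b_j) for each of the m kernels."""
    return [relu(sum(w * x for w, x in zip(w_j, x_i)) + b_j)
            for w_j, b_j in zip(W, b)]


def decode(o_i, W_dec, b_dec):
    """Reconstruct a patch from its m activations (identity phi assumed)."""
    x_hat = list(b_dec)
    for o_ij, w_j in zip(o_i, W_dec):
        for k, w in enumerate(w_j):
            x_hat[k] += w * o_ij          # sum over kernels is an assumption
    return x_hat


def cae_cost(patches, W, b, W_dec, b_dec):
    """Eqs. (25)-(26): mean squared reconstruction error over p patches."""
    total = 0.0
    for x_i in patches:
        x_hat = decode(encode(x_i, W, b), W_dec, b_dec)
        total += sum((a - c) ** 2 for a, c in zip(x_i, x_hat))
    return total / len(patches)


# Two flattened 2-pixel patches and identity-like kernels (toy values).
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
cost = cae_cost([[1.0, 2.0], [0.5, 0.0]], identity, zeros, identity, zeros)
```

With identity kernels and non-negative inputs the reconstruction is exact, so `cost` is zero; an SGD loop would adjust the kernels to push this cost down on real data.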

ICOA-based parameter selection

Finally, hyperparameter tuning is performed by ICOA to enhance the classification outcomes of the ensemble models³⁴. This model was chosen for its ability to optimize hyperparameters in complex ML models. The ICOA technique improves the standard COA approach by incorporating adaptive mechanisms for better exploration and exploitation during the search process. Unlike grid or random search, ICOA navigates the search space dynamically, reducing computation time while achieving more accurate parameter tuning. Its robustness in avoiding local optima helps it reach well-tuned configurations for ensemble models, enhancing accuracy and stability. This method is particularly advantageous for high-dimensional parameter spaces, where conventional techniques often struggle with efficiency and precision.

COA is a recent metaheuristic optimizer. It is inspired by the behaviour of coatis, specifically their techniques for hunting and attacking iguanas while avoiding predators, and applies these behaviours to optimization problems.

Initialization

The initial members of the coati population are generated as follows:

$$x_{i,j} = lb_{j} + r \cdot \left( {ub_{j} - lb_{j} } \right),\quad i = 1,2, \ldots ,N,\;\;j = 1,2, \ldots ,m$$

(27)

Here, ${x}_{i,j}$ is the $j$th-dimension position of the $i$th coati, $r$ is a random real number in the interval $\left[ {0,1} \right],$ $u{b}_{j}$ and $l{b}_{j}$ are the upper and lower bounds in dimension $j$, respectively, $N$ is the size of the coati population, and $m$ is the number of dimensions.
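Eq. (27) amounts to uniform sampling inside the per-dimension bounds. A minimal sketch follows; the function name `init_population` and the example bounds are hypothetical and stand in for whatever hyperparameter ranges are being tuned.

```python
import random


def init_population(N, lb, ub, rng=random):
    """Eq. (27): x_ij = lb_j + r * (ub_j - lb_j) with r drawn from [0, 1)."""
    m = len(lb)
    return [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(m)]
            for _ in range(N)]


# Example: 8 coatis in a 3-dimensional search space (illustrative bounds).
lower = [0.0001, 16.0, 0.0]
upper = [0.1, 256.0, 0.5]
population = init_population(8, lower, upper, random.Random(42))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which helps when comparing this plain initialization with an enhanced one.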

Hunt and attack tactics

Half of the coatis climb the tree to hunt the iguana, and their position in the tree is given by:

$$\begin{aligned} x_{i,j}^{P1} & = x_{i,j} + r \cdot \left( {Iguana_{j} - I \cdot x_{i,j} } \right), \\ & \quad for\;i = 1,2, \ldots ,\left\lfloor \frac{N}{2} \right\rfloor \wedge j = 1,2, \ldots ,m \\ \end{aligned}$$

(28)

After the iguana falls to the ground, it lands at a random position, and the positions of the coatis on the ground are updated as follows:

$$\begin{aligned} & Iguana_{j}^{G} = lb_{j} + r \cdot \left( {ub_{j} - lb_{j} } \right),\quad j = 1,2, \ldots ,m, \\ & x_{i,j}^{P1} = \left\{ {\begin{array}{*{20}l} {x_{i,j} + r \cdot \left( {Iguana_{j}^{G} - I \cdot x_{i,j} } \right),} \hfill & {\quad F_{{Iguana^{G} }} < F_{i} ,} \hfill \\ {x_{i,j} + r \cdot \left( {x_{i,j} - Iguana_{j}^{G} } \right),} \hfill & {\quad else,} \hfill \\ \end{array} } \right. \\ & \quad \quad \quad for\;i = \left\lfloor \frac{N}{2} \right\rfloor + 1,\left\lfloor \frac{N}{2} \right\rfloor + 2, \ldots ,N \wedge j = 1,2, \ldots ,m \\ \end{aligned}$$

(29)

If the coati’s new position improves the value of the objective function, it is accepted; otherwise, the coati stays where it is. This process is referred to as the greedy rule and is expressed below:

$$x_{i,j}^{P1} = \left\{ {\begin{array}{*{20}l} {x_{i,j}^{P1} ,} \hfill & {\quad F_{i}^{P1} < F_{i}, } \hfill \\ {x_{i,j}, } \hfill & {\quad else.} \hfill \\ \end{array} } \right.$$

(30)

Here, ${x}_{i,j}^{P1}$ is the new position of the $i$th coati in dimension $j$, $r$ is a random real number in the range $\left[ {0,1} \right],$ $Iguana_{j}$ is the position of the iguana in dimension $j$, which corresponds to the position of the best member of the population, and $I$ is an integer selected at random from the set {1, 2}. $Iguana_{j}^{G}$ is the $j$th-dimension position of the randomly generated iguana on the ground, ${F}_{i}$ is the objective function value of the $i$th coati, and ${F}_{Iguana^{G}}$ is the objective function value of the iguana on the ground.
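One pass of the hunt-and-attack phase, Eqs. (28) to (30), can be sketched for a minimisation objective as follows. Several details are assumptions made for illustration rather than statements about the authors' code: $r$ is redrawn for every dimension, a fresh ground iguana is generated for each coati, candidates are clamped to the bounds, and the name `hunt_and_attack` is hypothetical.

```python
import random


def hunt_and_attack(pop, fitness, lb, ub, rng=random):
    """Phase 1 of COA for minimisation; returns (new_population, fitness_values)."""
    pop = [x[:] for x in pop]                       # do not mutate the caller's list
    N, m = len(pop), len(lb)
    F = [fitness(x) for x in pop]
    iguana = pop[min(range(N), key=F.__getitem__)][:]   # best member so far
    for i in range(N):
        I = rng.choice((1, 2))
        if i < N // 2:
            # Coatis in the tree move towards the iguana, Eq. (28).
            cand = [pop[i][j] + rng.random() * (iguana[j] - I * pop[i][j])
                    for j in range(m)]
        else:
            # Iguana dropped at a random ground position, Eq. (29).
            ground = [lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(m)]
            if fitness(ground) < F[i]:
                cand = [pop[i][j] + rng.random() * (ground[j] - I * pop[i][j])
                        for j in range(m)]
            else:
                cand = [pop[i][j] + rng.random() * (pop[i][j] - ground[j])
                        for j in range(m)]
        cand = [min(max(c, lb[j]), ub[j]) for j, c in enumerate(cand)]  # assumed clamp
        F_cand = fitness(cand)
        if F_cand < F[i]:                           # greedy rule, Eq. (30)
            pop[i], F[i] = cand, F_cand
    return pop, F
```

Because of the greedy rule, no coati ever ends a pass with a worse objective value than it started with, which is a convenient property to check when testing an implementation.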

Escape from predators

When a coati encounters a predator, it escapes to a random position near its current location, given by:

$$\begin{aligned} & lb_{j}^{local} = \frac{{lb_{j} }}{t},ub_{j}^{local} = \frac{{ub_{j} }}{t},\quad where\;t = 1,2, \ldots ,T. \\ & x_{i,j}^{P2} = x_{i,j} + \left( {1 - 2r} \right) \cdot \left( {lb_{j}^{local} + r \cdot \left( {ub_{j}^{local} - lb_{j}^{local} } \right)} \right), \\ & \quad \quad \quad i = 1,2, \ldots ,N,\;j = 1,2, \ldots ,m, \\ \end{aligned}$$

(31)

Here, $l{b}_{j}^{local}$ and $u{b}_{j}^{local}$ are the local lower and upper bounds of the $j$th dimension, respectively; $t$ is the current iteration number, while $T$ is the maximum iteration count. ${x}_{i,j}^{P2}$ is the new position of the $i$th coati in the $j$th dimension during the second phase.

The quality of the initial population is essential in a metaheuristic technique, strongly influencing both the convergence speed and the accuracy of the final solution. The traditional COA initializes the population randomly, which leads to a non-uniform spread of individuals. Here, population initialization is enhanced with a refractive opposite learning strategy, which improves the model’s performance by widening its search range.
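The escape move of Eq. (31) can be sketched as below. The helper names are hypothetical, and the sketch reuses a single draw of $r$ in both places where the printed formula shows $r$; an implementation could equally use two independent draws.

```python
import random


def local_bounds(lb, ub, t):
    """Shrink the global bounds by the iteration counter t (t = 1, 2, ..., T)."""
    return [v / t for v in lb], [v / t for v in ub]


def escape_move(x, lb, ub, t, rng=random):
    """Candidate second-phase position for one coati, following Eq. (31)."""
    lb_loc, ub_loc = local_bounds(lb, ub, t)
    new_x = []
    for j, x_j in enumerate(x):
        r = rng.random()                                   # same r used twice
        step = (1 - 2 * r) * (lb_loc[j] + r * (ub_loc[j] - lb_loc[j]))
        new_x.append(x_j + step)
    return new_x


# The step range narrows as the iteration counter grows.
early = escape_move([1.0, -1.0], [-10.0, -10.0], [10.0, 10.0], t=1)
late = escape_move([1.0, -1.0], [-10.0, -10.0], [10.0, 10.0], t=50)
```

Dividing the bounds by $t$ means that late iterations perturb a coati only slightly, so the phase behaves like a local search around the current position.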

The refractive index $n$ is determined from the standard refraction relationship, where $l$ and $l^{*}$ are the lengths of the incident and refracted rays, $x$ is the current solution, and $x^{*}$ is its refractive opposite:

$$n = \frac{\sin \alpha }{{\sin \beta }} = \frac{{l^{*} \left( {\left( {lb + ub} \right)/2 - x} \right)}}{{l\left( {x^{*} - \left( {lb + ub} \right)/2} \right)}}$$

(32)

Setting $k = l/l^{*}$ and $n = 1$ in the above formulation and extending it to the multi-dimensional space yields the refractive opposite solution $x_{i,j}^{*}$:

$$x_{i,j}^{*} = \frac{{lb_{j} + ub_{j} }}{2} + \frac{{lb_{j} + ub_{j} }}{2k} - \frac{{x_{i,j} }}{k}$$

(33)
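Eq. (33) can be evaluated directly once $k$ is fixed. The sketch below is illustrative only: the function name is hypothetical and the default `k = 1.0` is an assumed value, chosen because it reduces the formula to the classic opposite point $lb_{j} + ub_{j} - x_{i,j}$.

```python
def refractive_opposite(x, lb, ub, k=1.0):
    """Eq. (33): refractive opposite of a solution x within bounds [lb, ub]."""
    return [(lb[j] + ub[j]) / 2 + (lb[j] + ub[j]) / (2 * k) - x[j] / k
            for j in range(len(x))]


# With k = 1 the result mirrors x about the centre of the interval.
mirrored = refractive_opposite([2.0, 7.0], [0.0, 0.0], [10.0, 10.0])   # [8.0, 3.0]
scaled = refractive_opposite([2.0], [0.0], [10.0], k=2.0)              # [6.5]
```

A typical way to use this in initialization is to evaluate both each random individual and its refractive opposite and keep the better ones, although the selection rule is not spelled out in the text.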

In the COA exploitation stage, the coati updates its position according to the current individual optimum, which can cause premature convergence to a local optimum and reduces its global exploration ability. This study introduces a Levy flight strategy into the position update to address this limitation.

This technique produces randomized step lengths and widens the exploration area, which can increase the diversity of the coati population. The improved position update for the coati is given below:

$$\sigma = \left( {\frac{{\Gamma \left( {1 + \beta } \right)sin\left( {\frac{\pi \beta }{2}} \right)}}{{\Gamma \left( {\frac{1 + \beta }{2}} \right)\beta \cdot 2^{{\frac{\beta - 1}{2}}} }}} \right)$$

(34)

$$Levy\left( \beta \right) = 0.01 \cdot u \cdot \frac{{r_{5} }}{{v^{{\frac{1}{\beta }}} }}$$

(35)

$\varGamma$ denotes the standard Gamma function, $\beta$ is a randomly generated variable in the interval $\left[ {0,2} \right],$ ${r}_{5}$ is a random variable in the range $\left[ {0,1} \right]$, and $u$ and $v$ follow the normal distributions $u \sim N\left( {0,\sigma^{2} } \right)$ and $v \sim N\left( {0,1} \right)$, respectively. The updated position is expressed below:

$$x_{i,j}^{P2} = Levy\left( \beta \right) \cdot x_{i,j} + \left( {1 - 2r} \right) \cdot \left( {lb_{j}^{local} + r \cdot \left( {ub_{j}^{local} - lb_{j}^{local} } \right)} \right)$$

(36)
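The Levy-based update of Eqs. (34) to (36) can be sketched as follows, evaluating the expressions as they are printed in the text. Note that Mantegna's commonly used form raises the ratio in Eq. (34) to the power $1/\beta$; the sketch does not. Further assumptions: $\beta$ is passed as a parameter with an illustrative default of 1.5, $|v|$ is used in the denominator so that the fractional power stays real, and all function names are hypothetical.

```python
import math
import random


def levy_sigma(beta):
    """Scale parameter of Eq. (34), evaluated as printed."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return num / den


def levy_step(beta, rng=random):
    """Eq. (35): 0.01 * u * r5 / |v|^(1/beta), with u ~ N(0, sigma^2), v ~ N(0, 1)."""
    u = rng.gauss(0.0, levy_sigma(beta))     # second argument is the std. deviation
    v = rng.gauss(0.0, 1.0)
    r5 = rng.random()
    return 0.01 * u * r5 / abs(v) ** (1 / beta)   # abs() is an assumption


def levy_escape(x, lb, ub, t, beta=1.5, rng=random):
    """Eq. (36): Levy-scaled position plus the local random term of phase 2."""
    new_x = []
    for j, x_j in enumerate(x):
        r = rng.random()
        local = lb[j] / t + r * (ub[j] / t - lb[j] / t)
        new_x.append(levy_step(beta, rng) * x_j + (1 - 2 * r) * local)
    return new_x


candidate = levy_escape([1.0, -2.0], [-5.0, -5.0], [5.0, 5.0], t=3,
                        rng=random.Random(0))
```

Most Levy steps are small, but the heavy tail occasionally produces a long jump, which is the behaviour the text relies on to escape local optima.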

The Levy flight approach is introduced into the COA exploitation stage to improve its global exploration capability and reduce premature convergence. By making longer-distance jumps in the solution space, Levy flight produces a more varied population distribution, improving the model’s ability to escape local optima and find the global optimum efficiently.

Fitness selection is one of the key factors influencing the outcome of the ICOA approach. The hyperparameter range forms the solution encoding used to evaluate the quality of each candidate solution. Here, the ICOA approach treats accuracy as the main criterion for designing the fitness function (FF), expressed through the ratio $P$:

$$Fitness = max\left( P \right)$$

(37)

$$P = \frac{TP}{{TP + FP}}$$

(38)

where $TP$ is the number of true positives and $FP$ is the number of false positives.
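Eqs. (37) and (38) can be read as choosing the candidate whose ratio $P$ is largest. A minimal sketch, in which `select_best`, the candidate labels, and the counts are all hypothetical stand-ins for a real training and validation run:

```python
def precision(tp, fp):
    """Eq. (38): P = TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def select_best(candidates, count_tp_fp):
    """Eq. (37): return the candidate with the largest P and that P value."""
    scored = [(precision(*count_tp_fp(c)), c) for c in candidates]
    best_p, best_c = max(scored, key=lambda s: s[0])
    return best_c, best_p


# Made-up validation counts for three hyperparameter settings.
counts = {"setting_a": (80, 20), "setting_b": (90, 10), "setting_c": (70, 30)}
best, score = select_best(list(counts), counts.get)
```

In an optimizer written for minimisation, the same criterion would be supplied as `1 - P` or `-P`.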

Result analysis and discussion

The MOEM-SMIADP technique is validated by simulation on the HAR dataset³⁵. The dataset contains 10,100 records across six classes, as shown in Table 1. Of the 561 available features, only 285 are selected.

Table 1 Details of the HAR dataset.

Figure 3 presents the classifier results of the MOEM-SMIADP methodology on the HAR dataset. Figure 3a and b display the confusion matrices, showing correct recognition and classification of all classes under 70%TRPH and 30%TSPH. Figure 3c shows the PR analysis, indicating strong performance across all class labels. Figure 3d shows the ROC analysis, with high ROC values for the different classes.

Table 2 and Fig. 4 report the indoor activity detection results of the MOEM-SMIADP approach on the HAR dataset. The results show that the MOEM-SMIADP approach correctly discriminated the samples of each class. On 70%TRPH, the MOEM-SMIADP approach offers average $acc{u}_{y}$ of 98.61%, $pre{c}_{n}$ of 95.80%, $rec{a}_{l}$ of 95.75%, $F{1}_{score}$ of 95.77%, and ${G}_{Measure}$ of 95.77%. Similarly, on 30%TSPH, the MOEM-SMIADP model presents an average $acc{u}_{y}$ of 98.51%, $pre{c}_{n}$ of 95.58%, $rec{a}_{l}$ of 95.57%, $F{1}_{score}$ of 95.57%, and ${G}_{Measure}$ of 95.77%.
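Averages of this kind are typically derived from a multi-class confusion matrix. The sketch below shows one common way to do so and is not taken from the paper: it assumes one-vs-rest counts per class, a simple unweighted mean over classes, and a G-measure defined as the geometric mean of precision and recall. The function names and the toy matrix are hypothetical.

```python
import math


def per_class_metrics(cm):
    """Return (accuracy, precision, recall, F1, G-measure) for every class.

    cm[r][c] counts samples of true class r predicted as class c.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    rows = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        tn = total - tp - fn - fp
        acc = (tp + tn) / total
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        g = math.sqrt(prec * rec)            # assumed G-measure definition
        rows.append((acc, prec, rec, f1, g))
    return rows


def macro_average(rows):
    """Unweighted mean of each metric over the classes."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


# Toy two-class confusion matrix (made-up counts).
averages = macro_average(per_class_metrics([[8, 2], [1, 9]]))
```

Under the one-vs-rest convention, the averaged accuracy counts true negatives for every class, which is why it can sit well above the averaged precision and recall.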

Table 2 Indoor activity detection of MOEM-SMIADP model on HAR dataset.

Figure 5 illustrates the training (TRA) $acc{u}_{y}$ and validation (VAL) $acc{u}_{y}$ of the MOEM-SMIADP technique on the HAR dataset. The $acc{u}_{y}$ values are calculated over 0–50 epochs. The figure shows that the TRA and VAL $acc{u}_{y}$ values follow an increasing trend, indicating that the MOEM-SMIADP methodology improves steadily across iterations. Furthermore, the TRA and VAL $acc{u}_{y}$ values remain close across the epochs, which indicates little overfitting and suggests consistent prediction on unseen samples.

Figure 6 shows the TRA loss (TRALOS) and VAL loss (VALLOS) curves of the MOEM-SMIADP technique on the HAR dataset. The loss values are computed over 0–50 epochs. The TRALOS and VALLOS values follow a decreasing trend, indicating the capacity of the MOEM-SMIADP method to balance the tradeoff between data fitting and generalization.

Table 3 and Fig. 7 compare the outcomes of the MOEM-SMIADP method on the HAR dataset with those of the existing techniques. The outcomes show that the CNN, CNN-LSTM, Lightweight CNN, EDA-LSTM, WISNet, MLP, CNN-2D, and Optimized ResNet-34 methodologies reported lower performance. Meanwhile, the CNN-BiLSTM method attained closer outcomes. The MOEM-SMIADP approach reported the best performance, with higher $pre{c}_{n}$, $rec{a}_{l},$ $acc{u}_{y},$ and ${F1}_{score}$ of 95.80%, 95.75%, 98.61%, and 95.77%, respectively.

Table 3 Comparative analysis of the MOEM-SMIADP model on the HAR dataset.

Table 4 and Fig. 8 compare the MOEM-SMIADP method with the existing techniques on the HAR dataset in terms of processing time (PT). The MOEM-SMIADP method offers the minimal PT of 1.05 s, whereas the CNN, CNN-LSTM, Lightweight CNN, CNN-BiLSTM, EDA-LSTM, WISNet, MLP, CNN-2D, and Optimized ResNet-34 approaches require higher PT values of 2.53 s, 3.16 s, 5.06 s, 3.52 s, 6.33 s, 3.77 s, 4.52 s, 5.90 s, and 3.98 s, respectively.

Table 4 PT outcome of MOEM-SMIADP technique on HAR dataset.

Likewise, the performance of the MOEM-SMIADP technique is evaluated on the WISDM dataset³⁶. The dataset consists of 30,000 instances across six classes, as shown in Table 5. There are 5 features, of which only 3 are selected.

Table 5 Details of WISDM dataset.

Figure 9 shows the classifier performance of the MOEM-SMIADP method on the WISDM dataset. Figure 9a and b illustrate the confusion matrices, showing the classification and identification of all classes under 70%TRPH and 30%TSPH. Figure 9c shows the PR analysis, which indicates strong performance across all class labels. Finally, Fig. 9d shows the ROC analysis, with high ROC values for the different classes.

Table 6 and Fig. 10 report the indoor activity detection results of the MOEM-SMIADP approach on the WISDM dataset. The results indicate that the MOEM-SMIADP approach accurately differentiated the samples of all classes. Using 70%TRPH, the MOEM-SMIADP model provides average $acc{u}_{y}$ of 99.06%, $pre{c}_{n}$ of 97.20%, $rec{a}_{l}$ of 97.19%, $F{1}_{score}$ of 97.19%, and ${G}_{Measure}$ of 97.19%. Additionally, using 30%TSPH, the MOEM-SMIADP method delivers an average $acc{u}_{y}$ of 99.07%, $pre{c}_{n}$ of 97.22%, $rec{a}_{l}$ of 97.22%, $F{1}_{score}$ of 97.22%, and ${G}_{Measure}$ of 97.22%.

Table 6 Indoor activity detection of MOEM-SMIADP model on the WISDM dataset.

Figure 11 depicts the TRA $acc{u}_{y}$ and VAL $acc{u}_{y}$ of the MOEM-SMIADP technique on the WISDM dataset. The $acc{u}_{y}$ values are calculated over 0–50 epochs. The figure shows that the TRA and VAL $acc{u}_{y}$ values follow an increasing trend, indicating that the MOEM-SMIADP technique improves across iterations. In addition, the TRA and VAL $acc{u}_{y}$ values remain close throughout the epochs, indicating little overfitting and suggesting reliable prediction on unseen samples.

Figure 12 shows the TRALOS and VALLOS graph of the MOEM-SMIADP approach on the WISDM dataset. The loss values are computed across a range of 0–50 epochs. The values of TRALOS and VALLOS show a diminishing trend, which indicates the proficiency of the MOEM-SMIADP model in harmonizing a tradeoff between generalization and data fitting.

Table 7 and Fig. 13 compare the results of the MOEM-SMIADP method on the WISDM dataset with the existing techniques^37,38,39,40. The results indicate that the CNN-LSTM, CNN, CNN-BiLSTM, ALSTM-1D CNN, Mechanism-DL, WISNet, MLP, CNN-2D, and Optimized ResNet-34 models showed poorer performance. Meanwhile, the MOEM-SMIADP model achieved the best performance, with higher $pre{c}_{n}$, $rec{a}_{l},$ $acc{u}_{y},$ and ${F1}_{score}$ of 97.22%, 97.22%, 99.07%, and 97.22%, respectively.

Table 7 Comparative analysis of the MOEM-SMIADP model on the WISDM dataset^37,38,39,40.

Table 8 and Fig. 14 compare the MOEM-SMIADP approach with the existing techniques on the WISDM dataset in terms of execution time (ET). The MOEM-SMIADP approach presents the minimal ET of 4.57 s, while the CNN-LSTM, CNN, CNN-BiLSTM, ALSTM-1D CNN, Mechanism-DL, WISNet, MLP, CNN-2D, and Optimized ResNet-34 methodologies require higher ET values of 7.33 s, 7.45 s, 17.58 s, 11.36 s, 16.21 s, 9.50 s, 9.03 s, 17.15 s, and 9.35 s, respectively.

Table 8 ET outcome of MOEM-SMIADP technique on WISDM dataset.

Conclusion

In this study, a MOEM-SMIADP model is developed. The proposed MOEM-SMIADP model concentrates on detecting and classifying indoor activities using IoT applications for physically challenged people. It comprises four steps: data normalization, MPA-based feature selection, an ensemble of classification models, and parameter selection using ICOA. First, data preprocessing applies min–max normalization to convert the input data into a useful format. Next, the MPA is applied for feature selection. For the detection of indoor activities, the proposed MOEM-SMIADP model applies an ensemble of three classifiers: the GCN model, the LSTM-seq2seq method, and the CAE. Finally, hyperparameter tuning is performed by ICOA to enhance the classification outcomes of the ensemble models. A wide range of experiments was conducted to validate the performance of the MOEM-SMIADP technique. The performance validation of the MOEM-SMIADP technique showed a superior accuracy value of 99.07% over existing methods. A limitation of the MOEM-SMIADP technique is its dependence on specific data characteristics, which may not generalize well to all indoor environments or activity types. The model’s performance might degrade when faced with noisy or incomplete data, as it needs high-quality, well-labelled datasets for optimal results. Furthermore, the computational complexity of the ensemble approach may limit its scalability in real-time applications. Future work may focus on incorporating more robust data preprocessing techniques to handle noise, exploring transfer learning to adapt to diverse environments, and improving the efficiency of the model for deployment on resource-constrained devices. Further investigation into hybrid models incorporating multiple data modalities could also improve activity detection accuracy.

Data availability

The data that support the findings of this study are openly available at https://archive.ics.uci.edu/dataset/240/human+activity+recognition+using+smartphones and https://www.cis.fordham.edu/wisdm/dataset.php (references 35 and 36).

References

1. Abokhoza, R. & Jahmani, A. Towards retention in airline industry using neutrosophic DEMATEL method: Does social media marketing activities affect passengers’ retention. Int. J. Neutrosophic Sci. IJNS 21(2), 161–176 (2023).
2. Dhiman, C. & Vishwakarma, D. K. A review of state-of-the-art techniques for abnormal human activity recognition. Eng. Appl. Artif. Intell. 77, 21–45 (2019).
3. Gupta, N. et al. Human activity recognition in artificial intelligence framework: A narrative review. Artif. Intell. Rev. 55(6), 4755–4808 (2022).
4. Alotaibi, F. et al. Internet of Things-driven human activity recognition of elderly and disabled people using arithmetic optimization algorithm with LSTM autoencoder. J. Disabil. Res. 2(3), 136–146 (2023).
5. Perez, A. J., Siddiqui, F., Zeadally, S. & Lane, D. A review of IoT systems to enable independence for the elderly and disabled individuals. Internet Things 21, 100653 (2023).
6. Rakshanasri, S. L., Naren, J., Vithya, G., Akhil, S. & Kumar, D. A framework on health smart home using IoT and machine learning for disabled people. Int. J. Psychosoc. Rehabil. 24(2), 01–09 (2020).
7. Brik, B., Esseghir, M., Merghem-Boulahia, L. & Snoussi, H. An IoT-based deep learning approach to analyze indoor thermal comfort of disabled people. Build. Environ. 203, 108056 (2021).
8. Bibbò, L., Carotenuto, R. & Della Corte, F. An overview of indoor localization system for human activity recognition (HAR) in healthcare. Sensors 22(21), 8119 (2022).
9. Lentzas, A. & Vrakas, D. Non-intrusive human activity recognition and abnormal behavior detection on elderly people: A review. Artif. Intell. Rev. 53(3), 1975–2021 (2020).
10. Arias, E. J., Paz, L. M. A. & Chalacan, L. M. Multi-sensor data fusion for accurate human activity recognition with deep learning. Fusion Pract. Appl. 13(2), 62–72 (2023).
11. Chen, J., Xu, X., Wang, T., Jeon, G. & Camacho, D. An AIoT framework with multimodal frequency fusion for WiFi-based coarse and fine activity recognition. IEEE Internet Things J. 11, 39020–39029 (2024).
12. Berkani, M. R. A., Chouchane, A., Himeur, Y., Ouamane, A. & Amira, A. An intelligent edge-deployable indoor air quality monitoring and activity recognition approach. In 2023 6th International Conference on Signal Processing and Information Security (ICSPIS), 184–189 (IEEE, 2023).
13. Sun, H. & Chen, Y. A rapid response system for elderly safety monitoring using progressive hierarchical action recognition. IEEE Trans. Neural Syst. Rehabil. Eng. 32, 2134–2142 (2024).
14. Lee, C., Kang, H. M., Jeon, Y. & Kang, S. J. Ambient sound analysis for non-invasive indoor activity detection in edge computing environments. In 2023 IEEE Symposium on Computers and Communications (ISCC), 1–6 (IEEE, 2023).
15. Mohanaprakash, T. A., Kumar, D., Naveen, P. & Karuppiah, S. Cloud-Enabled Blockchain and IoT-Based Assisted Living System in 6G Networks: Enhancing Quality of Life and Privacy (2024).
16. Kan, R. et al. Indoor human action recognition based on dual kinect V2 and improved ensemble learning method. Sensors 23(21), 8921 (2023).
17. Srinivasan, S., Sridevi, V., Saravanan, K., Murugan, S., Srinivasan, C. & Muthulekshmi, M. Adaptive thermal clothing with IoT and random forest regression for dynamic outdoor comfort. In 2024 International Conference on Advances in Modern Age Technologies for Health and Engineering Science (AMATHE), 1–5 (IEEE, 2024).
18. Manimaran, M., Kumar, A. S., Natteshan, N. V. S., Baranitharan, K., Mahaveerakannan, R. & Sudhakar, K. Detecting the human activities of aging people using restricted Boltzmann machines with deep learning technique in IoT. In 2023 Third International Conference on Artificial Intelligence and Smart Energy (ICAIS), 105–110 (IEEE, 2023).
19. Xiao, L., Luo, K., Liu, J. & Foroughi, A. A hybrid deep approach to recognizing student activity and monitoring health physique based on accelerometer data from smartphones. Sci. Rep. 14(1), 14006 (2024).
20. Shereef, S., Varghese, N. & Kamalraj, R. Unlocking the power of data: Leveraging IoT and cloud for better sleep health. In Revolutionizing Healthcare Systems Through Cloud Computing and IoT, 179–204 (IGI Global, 2025).
21. Rezaee, K. An advanced deep learning structure for accurate student activity recognition and health monitoring using smartphone accelerometer data. Health Manag. Inf. Sci. 11, 85–97 (2024).
22. Anitha, A., Nandhini, N., Balakrishnan, K. & Perumal, T. Improving elder care: Vision-based wearable technology for fall recognition and prevention. In Smart Healthcare Systems (eds Bhambri, P. et al.) 304–317 (CRC Press, 2025).
23. Maddeh, M. et al. Ensemble learning-based smartbed system for enhanced patient care. J. Disabil. Res. 2(1), 26–34 (2023).
24. Akhmetshin, E., Nemtsev, A., Shichiyakh, R., Shakhov, D. & Dedkova, I. Evolutionary algorithm with deep learning based fall detection on Internet of Things environment. Fusion Pract. Appl. 14(2), 132–145 (2024).
25. Namoun, A. et al. Service selection using an ensemble meta-learning classifier for students with disabilities. Multimodal Technol. Interact. 7(5), 42 (2023).
26. Jawad, M. et al. Energy optimization and plant comfort management in smart greenhouses using the artificial bee colony algorithm. Sci. Rep. 15(1), 1752 (2025).
27. Kao, W. C., Fan, Y. L., Hsu, F. R., Shen, C. Y. & Liao, L. D. Next-generation swimming pool drowning prevention strategy integrating AI and IoT technologies. Heliyon 10(18), 1–15 (2024).
28. Yazici, A. et al. A smart e-health framework for monitoring the health of the elderly and disabled. Internet Things 24, 100971 (2023).
29. Shantal, M., Othman, Z. & Bakar, A. A. A novel approach for data feature weighting using correlation coefficients and min–max normalization. Symmetry 15(12), 2185 (2023).
30. Hattabi, I. et al. Enhanced power system stabilizer tuning using marine predator algorithm with comparative analysis and real time validation. Sci. Rep. 14(1), 28971 (2024).
31. Gao, L. et al. A topology-guided high-quality solution learning framework for security-constraint unit commitment based on graph convolutional network. Int. J. Electr. Power Energy Syst. 164, 110322 (2025).
32. Wang, L., Wang, J., Tong, D. & Wang, X. A novel long short-term memory seq2seq model with chaos-based optimization and attention mechanism for enhanced dam deformation prediction. Buildings 14(11), 3675 (2024).
33. Zaman, W., Ahmad, Z. & Kim, J. M. Fault diagnosis in centrifugal pumps: A dual-scalogram approach with convolution autoencoder and artificial neural network. Sensors 24(3), 851 (2024).
34. Gong, X. et al. Safety status prediction model of transmission tower based on improved coati optimization-based support vector machine. Buildings 14(12), 3815 (2024).
35. https://archive.ics.uci.edu/dataset/240/human+activity+recognition+using+smartphones.
36. https://www.cis.fordham.edu/wisdm/dataset.php.
37. Nafea, O., Abdul, W., Muhammad, G. & Alsulaiman, M. Sensor-based human activity recognition with spatio-temporal deep learning. Sensors 21(6), 2141 (2021).
38. Sharen, H. et al. WISNet: A deep neural network based human activity recognition system. Expert Syst. Appl. 258, 124999 (2024).
39. He, Z., Sun, Y. & Zhang, Z. Human activity recognition based on deep learning regardless of sensor orientation. Appl. Sci. 14(9), 3637 (2024).
40. Khan, I., Guerrieri, A., Serra, E. & Spezzano, G. A hybrid deep learning model for UWB radar-based human activity recognition. Internet Things 2024, 101458. https://doi.org/10.1016/j.iot.2024.101458 (2024).

Acknowledgements

The authors extend their appreciation to the King Salman Center for Disability Research for funding this work through Research Group no. KSRG-2024-426.

Author information

Authors and Affiliations

Department of Computer Science, Applied College at RijalAlmaa, King Khalid University, Abha, Saudi Arabia
Munya A. Arasi
Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
Hussah Nasser AlEisa
Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, AlKharj, Saudi Arabia
Amani A. Alneil
King Salman Center for Disability Research, 11614, Riyadh, Saudi Arabia
Amani A. Alneil
Department of Mathematics, Faculty of Science, Cairo University, Giza, 12613, Egypt
Radwa Marzouk

Contributions

Conceptualization: Munya A. Arasi, Hussah Nasser AlEisa. Data curation and Formal analysis: Amani A Alneil and Radwa Marzouk. Investigation and Methodology: Hussah Nasser AlEisa, Amani A Alneil and Radwa Marzouk. Project administration and Resources: Munya A. Arasi. Writing - original draft: Munya A. Arasi, Hussah Nasser AlEisa, Amani A Alneil and Radwa Marzouk. Validation and Visualization: Munya A. Arasi, Hussah Nasser AlEisa, Amani A Alneil and Radwa Marzouk. Writing - review and editing: Munya A. Arasi, Hussah Nasser AlEisa, Amani A Alneil and Radwa Marzouk. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Munya A. Arasi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Arasi, M.A., AlEisa, H.N., Alneil, A.A. et al. Artificial intelligence-driven ensemble deep learning models for smart monitoring of indoor activities in IoT environment for people with disabilities. Sci Rep 15, 4337 (2025). https://doi.org/10.1038/s41598-025-88450-1

Received: 13 December 2024
Accepted: 28 January 2025
Published: 05 February 2025
DOI: https://doi.org/10.1038/s41598-025-88450-1

This article is cited by

Ensemble of deep learning and IoT technologies for improved safety in smart indoor activity monitoring for visually impaired individuals
- Mesfer Al Duhayyim
Scientific Reports (2025)
Advanced internet of things enhanced activity recognition for disability people using deep learning model with nature-inspired optimization algorithms
- Mohammed Maray
Scientific Reports (2025)
