Introduction

Disability remains one of the significant societal challenges. Its residual effects appear as barriers in the form of mental or physical impairments that affect the development and participation of an individual1. As a result, intense efforts are made to eliminate such obstructions, since individuals with disabilities often rely on others to fulfil their needs. Artificial intelligence (AI) is a field of computer science that aims to devise intelligent computer systems displaying qualities associated with human intelligence, including learning, problem-solving, language comprehension, and decision-making2. One significant role of AI lies in natural language processing (NLP), which combines linguistic and computational methods to enable computers to support human-computer interaction (HCI) and human language understanding. Sentiment analysis (SA), text summarisation, machine translation, and speech recognition are among the research fields within NLP3. Emotions are commonly characterized into six kinds: sadness, joy, surprise, fear, disgust, and anger; further types, such as optimism and love, are also defined4. In contrast to facial expressions and speech, text sentences often convey emotion only indirectly. Due to this difficulty and vagueness, accurately identifying the emotions expressed in text is a challenging task. Determining the emotion of a given text is a complex process, as each word may have a distinct morphological style and meaning5.

Emotional interaction is a prevalent cognitive event in humans' everyday life, and precise emotion recognition underpins efficient human communication, decision-making, and interaction. With the growth of AI and big data, emotion recognition has become a standard research topic in both industry and academia6. As the most straightforward and fundamental medium, textual data arising from emotional communication is frequently employed to understand emotional states. Textual emotion recognition (TER) for people with disabilities is dedicated to automatically recognizing emotional categories in textual expressions, such as Angry, Sad, and Happy. TER is a more fine-grained analysis than SA and has attracted substantial interest from academic circles7. Humans identify emotions rapidly depending on their emotional states, whereas automated TER systems require computational approaches that must be continually created and enhanced to attain more precise emotion prediction. AI implemented in NLP leverages computational and linguistic methods to enable machines to comprehend phenomena such as emotions or sentiments from texts8. Thus, a primary objective is to examine thoughts, ideas, and opinions through polarity tasks encompassing both positive and negative perspectives. In the NLP domain, most TER analyses are performed at a coarse-grained sentiment level and are based on conventional machine learning (ML) methods9. Moreover, while existing surveys review advanced TER methodologies, deep learning (DL)-driven TER methods are often only concisely described without a detailed and thorough outline. DL models have since removed the reliance on manual feature extraction and achieved good results, frequently attaining state-of-the-art outcomes in TER tasks10.

This paper develops an Intelligent Emotion Recognition from Text Using a Hybrid Deep Learning Model and Word Embedding Process (IERT-HDLMWEP) model. The key contributions of this manuscript are mentioned below:

  • A unique IERT-HDLMWEP model is designed for a TER system for people with disabilities.

  • The text pre-processing step involves various standard stages designed to enhance analysis and reduce input data dimensionality, thereby supporting better feature extraction and improving overall model performance by filtering noise and standardizing input formats.

  • The IERT-HDLMWEP model adopts a hybrid feature representation by integrating Word2Vec, TF-IDF-CDW, and POS encoding to improve emotion detection in textual data. This integration captures semantic meaning, contextual weight, and syntactic structure, enriching feature quality. It enables the model to detect subtle emotional cues across varied linguistic expressions and improves classification robustness and generalization across diverse emotional contexts.

  • The hybrid C-BiG-A technique, integrating a CNN and BiGRU with an AM model, is utilized for the final classification. This architecture captures both local and long-range dependencies in the text, improving contextual understanding. The AM also refines its focus on emotion-relevant words, thereby improving interpretability. Overall, it enhances the model’s capability to distinguish complex emotional patterns in textual data.

  • This IERT-HDLMWEP methodology uniquely integrates convolutional and BiGRU with an AM to capture multi-level contextual features. By effectively incorporating spatial and temporal data, it improves emotion detection beyond conventional methods. The AM dynamically highlights crucial emotional cues, enhancing model focus and accuracy. This novel integration advances the handling of intrinsic linguistic patterns in emotional text classification.

Related works on TER

Kumar et al.11 implemented a multimodal emotion recognition method, Visual Spoken Textual Additive Net (VISTANet), for classifying emotions displayed by input images, text, and speech into separate types. A novel interpretability approach, K-Average Additive exPlanation (KAAP), is established, which identifies the significant textual, spoken, and visual features that lead to the prediction of a specific emotion type. VISTANet merges information from the text, speech, and image modalities, utilizing a hybrid of late and intermediate fusion, and autonomously adjusts the weights of the intermediate outputs by computing their weighted average. Di Luzio et al.12 explored explainability methods for binary deep neural network (DNN) frameworks in the context of emotion recognition via video analysis. The authors investigated the input features for binary classifiers that recognize emotions, utilizing facial behaviour analysis and an enhanced form of the Integrated Gradients explainability technique. Fu et al.13 proposed a spectral domain reconstruction graph neural network (SDR-GNN) model for incomplete multimodal learning in conversational emotion recognition. This model creates an utterance semantic interaction graph utilizing a sliding window that relies on context and speaker relations. Li et al.14 proposed a new emotion recognition system based on a curriculum learning (CL) approach (ERNetCL). This approach integrates a spatial encoder (SE), a temporal encoder (TE), and a CL loss. To mitigate the severe effects of emotional shifts and to progress from simple to complex samples, the CL concept is employed in the emotion recognition in conversation (ERC) task to continually refine the network's parameters. Kusal et al.15 presented a hybrid DL network based on a convolutional-recurrent architecture for detecting an individual's emotions from conversational text. The convolutional network extracts local dependencies and patterns and is intrinsically shift-invariant, while the recurrent network captures long-term relationships in sequential data. Feng et al.16 developed a multimodal speech emotion recognition technique that relies on multiscale MFCCs and a multiview AM, which can capture numerous audio emotional features and effectively merge emotion-specific features from the two feature types. Under various attention configurations and audio input states, the best emotion recognition precision is achieved by jointly leveraging four attention modules and three diverse scales of MFCCs. Zhang et al.17 proposed a novel technique for building emotion classification labels using language resources and density-based spatial clustering of applications with noise (DBSCAN). The method also incorporates the spatial- and frequency-domain features of emotional EEG signals and passes them to a serial network integrating a long short-term memory (LSTM) network and a convolutional neural network (CNN) for EEG emotion learning and identification. Omarov and Zhumanov18 proposed an innovative Bi-LSTM methodology for analyzing emotions in textual content. This technique leverages the strength of recurrent neural networks (RNNs) to capture both past and future context, providing a detailed interpretation of emotional content. By incorporating the backward and forward layers of the LSTM model, it efficiently learns the semantic representation of words and their dependencies within sentences.

Hicham and Nassera19 proposed stacked DL models integrating a robustly optimized BERT pretraining approach (RoBERTa) with the gated recurrent unit (GRU), LSTM, bidirectional GRU (BiGRU), and bidirectional LSTM (BiLSTM). These hybrid architectures are optimized using the adaptive moment estimation (Adam) optimizer. Mahajan, More, and Shah20 developed a novel multilabel dataset and evaluated various ML and DL techniques, comprising logistic regression (LR), support vector machine (SVM), naïve Bayes (NB), random forest (RF), LSTM, BiLSTM, GRU, and CNN, to capture both single and mixed emotional states. Zhu et al.21 developed a reliable medical question-answering system by utilizing knowledge embedding and a transformer-based architecture. The model also integrates a knowledge understanding layer and an answer generation layer to improve both the accuracy and ethical quality of responses. Khan et al.22 improved violence detection in surveillance videos by utilizing a two-stream DL model that integrates 3D convolutional networks with depth-wise convolutions. The model also utilizes RGB frame analysis with background suppression and optical flow to accurately capture violent actions while maintaining computational efficiency suitable for edge devices. Arumugam et al.23 proposed an Audio, Visual, and Text Emotions Fusion Network (AVTEFN) model that utilizes graph attention networks (GAT), a hybrid wav2vec 2.0 with CNN, and bidirectional encoder representations from transformers (BERT) combined with bidirectional gated recurrent units (Bi-GRU). Khan et al.24 proposed an advanced multimodal emotion recognition (MER) approach by utilizing a Joint Multi-Scale Multimodal Transformer (JMMT) with recursive cross-attention. Alyoubi and Alyoubi25 presented an optimized multimodal emotion recognition framework that integrates BERT/RoBERTa for text, wav2vec 2.0 for speech, and Residual Network (ResNet50)/Visual Geometry Group Network (VGG16) for facial expressions. The model also utilizes a transformer-based cross-modal attention mechanism and Shapley Additive Explanations (SHAP) to improve both classification accuracy and interpretability. Vani et al.26 introduced Text Fusion+, an integrated application that utilizes optical character recognition (OCR), NLP, and text-to-speech (TTS) technologies. The model also employs DL-based summarization and an NLP-driven question-answering module. Khan et al.27 presented a technique utilizing sequence learning methods, including LSTM networks and GRUs, together with their advanced forms such as bi-directional and multi-layer architectures, to enhance auditory emotion recognition. Ghous, Najam, and Jalal28 detected emotional states in individuals with cognitive disabilities by applying advanced ML methods. Initially, bandpass filtering (BF) and downsampling are used for pre-processing. The model also incorporates an adaptive automated feature selection and transformation (AAFST) technique with a multi-class SVM to improve the accuracy and reliability of emotion recognition. Patil et al.29 enabled early prediction of learning disabilities, specifically dyslexia, by employing handwritten text recognition and DNNs. Mishra et al.30 introduced a model utilizing MLP and transformer techniques such as GPT-4 and BERT.
The model performs sentiment analysis and emotion classification and is trained using the Adam optimizer to achieve high accuracy and generalization. Table 1 presents a comparative analysis of existing textual emotion recognition systems for individuals with disabilities.

Table 1 A comparative study of the reviewed techniques.

Despite their advancements, existing studies exhibit various limitations and research gaps. Multimodal approaches, such as VISTANet and SDR-GNN, are computationally intensive, which restricts their real-time applicability. Explainability methods, while improving interpretability, often concentrate on binary classification or limited modalities. Most models do not adequately address the complexity of mixed or multilabel emotions, particularly in conversational or code-mixed contexts. Additionally, few studies integrate curriculum learning or adaptive fusion mechanisms to improve model robustness. There is also a research gap in utilizing lightweight yet effective hybrid DL architectures for scalable emotion recognition. Addressing these gaps can improve generalizability and efficiency across diverse datasets and applications.

Research design and methodology

In this work, a new IERT-HDLMWEP model is developed for emotion identification using textual data. The proposed model aims to enhance the accuracy of a DL-based TER system in identifying and interpreting emotions in text, thereby improving communication support for individuals with disabilities. It encompasses text pre-processing, word embedding, and a hybrid classification method. Figure 1 indicates the complete process of the IERT-HDLMWEP model.

Fig. 1

Overview of the IERT-HDLMWEP model, comprising data preprocessing, word embedding, and a hybrid DL with AM. Evaluation metrics include accuracy, precision, recall, F-measure, and AUC score.

Text pre-processing method

Initially, the text pre-processing step involves several standard stages to enhance analysis and reduce the dimensionality of the input data. Information gathered from various resources, primarily social media, is often unstructured31. Raw data may be noisy and contain grammatical and spelling errors; therefore, texts need to be cleaned before examination. Considering that several words are insignificant and add nothing to the text (for example, special characters, prepositions, stop words, and punctuation), pre-processing is used to improve the analysis and reduce the input data dimensionality. The main steps in the complete process are listed below, followed by a minimal implementation sketch.

  • Substituting negative words. Negations are words like never, not yet, and no that invert the meaning of phrases or words. The SA aims to substitute the negation with an antonym; for instance, not good is substituted with bad, the opposite of good, so a sentence like “The car is not good” is converted into “The car is bad.” Still, particular negative words, like never, not, and no, and negative contractions, like doesn’t, mustn’t, and couldn’t, are frequently part of stop-word lists. Therefore, each negative word and contraction is first replaced with not, and the issue is then corrected after stop-word removal as part of the spelling corrections.

  • Lowercasing. This converts each character in the dataset to lowercase, in contrast to capitalizing proper nouns, names, and sentence-initial words. Capitalization is inconsistent on Twitter, as users often avoid capital letters, which creates a relaxed, conversational tone. Every letter in the text is therefore converted to the same case; the word ‘Ball’, for example, becomes ‘ball’.

  • Converting emoticons. Users frequently employ emoticons to express their emotions, thoughts, and feelings; converting each emoticon into its corresponding words therefore yields better outcomes.

  • Removing redundant information, comprising hashtags (#), additional spaces, punctuation, special characters ($, &, %, …), @usernames, URL references, stop words (e.g. ‘the’, ‘is’, ‘at’), non-ASCII characters, and numbers, which are discarded to preserve the consistency of English text encoding. Such information does not help predict the emotions expressed by users.

  • Expanding acronyms to their full forms using an acronym dictionary. Slang and abbreviations are informally written words often used on Twitter and should be restored to their original forms.

  • Changing words with repeated characters to their English roots. Users frequently elongate words with repeated letters (for example, ‘coooool’) to express their emotions.

  • PoS tagging. Constituent units of the text, such as nouns, verbs, adjectives, and adverbs, are recognized in this phase.

  • Tokenization. It deconstructs text into smaller textual units (for example, documents into sentences, and sentences into words).

  • Lemmatization. There are derivationally related families of words with similar meanings, like presidential, president, and presidency. This step aims to reduce derivational and inflectional forms of a word to a common base. It uses morphological and vocabulary analysis to remove inflectional endings and return the dictionary or base form of the word, i.e., the lemma. Applied to the word ‘saw’, lemmatization would return either ‘see’ or ‘saw’ according to whether the word was used as a verb or a noun. It reduces a word to a simpler form, comparable to the stem, but preserves word-related information such as PoS tags.
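The steps above can be combined into a compact pipeline. The following is a minimal sketch using NLTK; the cleaning rules, resource names, and stop-word handling are simplifying assumptions for illustration, not the exact implementation used in this work.

```python
# A minimal pre-processing sketch (illustrative, not the paper's exact pipeline).
# Assumes the standard NLTK punkt/stopwords/wordnet/tagger resources are installed.
import re
from nltk import pos_tag, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

NEGATIONS = {"not", "no", "never"}                         # kept despite stop-word removal
STOP = set(stopwords.words("english")) - NEGATIONS
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    text = text.lower()                                    # lowercasing
    text = re.sub(r"https?://\S+|@\w+|#", " ", text)       # URLs, @usernames, hashtag marks
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)             # 'coooool' -> 'cool'
    text = re.sub(r"[^a-z'\s]", " ", text)                 # punctuation, digits, non-ASCII
    tokens = [t for t in word_tokenize(text) if t not in STOP]
    # Lemmatize with a coarse PoS hint so 'saw' (verb) maps to 'see'
    return [lemmatizer.lemmatize(tok, "v" if tag.startswith("VB") else "n")
            for tok, tag in pos_tag(tokens)]

print(preprocess("The car is NOT good!! coooool http://x.co @user"))
```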

Word embedding-based Word2Vec approach

For the word embedding process, the IERT-HDLMWEP model employs a hybrid feature representation that combines Word2Vec, TF-IDF-CDW, and POS encoding for enhanced emotion detection in textual data32. Word2Vec is chosen for its superior capability to capture semantic relationships between words by representing them as vectors in a continuous space, which conventional bag-of-words models fail to do. Context-dependent word representations are effectively learned by the Word2Vec model, enabling it to comprehend subtle variations and similarities in language usage. This results in richer feature representations that enhance downstream tasks such as emotion detection. The model is appropriate for handling extensive textual datasets and is computationally efficient and scalable to large corpora. Its robustness stems from its ability to capture both syntactic and semantic information, outperforming simpler encoding techniques on complex language patterns. Overall, Word2Vec offers a balance of effectiveness and efficiency, making it a strong choice over conventional or more resource-intensive embedding methods.

Word2Vec

Word2Vec is a widely used static embedding model that encodes words as dense vectors in a fixed semantic space, where semantically related terms lie close to one another. It offers two training mechanisms: CBOW and Skip-gram. Skip-gram infers context words from a specified centre word, while CBOW forecasts the centre word by combining data from its neighbouring words. Here, the CBOW model is leveraged to develop the word embeddings. CBOW calculates the likelihood of the centre word \(w_{n}\), given its adjacent context words \(w_{c}\), depending on the embeddings of the surrounding words:

$$p\left(w_{n}\mid w_{c}\right)=\frac{\exp\left(w_{n}h_{n}\right)}{\sum_{w^{\prime}\in corpus}\exp\left(w^{\prime}h_{n}\right)}$$
(1)

Here, \(h_{n}\) denotes the average embedding vector of the surrounding context window, \(w_{n}\) is the centre word, and \(w_{c}\) represents the context words. Figure 2 illustrates the architecture of the Word2Vec model.

Fig. 2

Architecture of the Word2Vec method consisting of an input vector, a hidden layer with linear neurons, and an output layer with a softmax classifier.
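As a brief illustration, CBOW embeddings of this kind can be trained with the gensim library; the toy corpus and parameter values below are assumptions for demonstration only.

```python
# A minimal CBOW training sketch with gensim (sg=0 selects CBOW; sg=1 would select Skip-gram).
from gensim.models import Word2Vec

sentences = [["i", "feel", "so", "happy"], ["i", "feel", "very", "sad"]]  # toy pre-processed corpus
model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=0)
vec = model.wv["happy"]                        # 300-D dense embedding for 'happy'
print(model.wv.most_similar("happy", topn=2))  # nearest neighbours in the embedding space
```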

TF-IDF-CDW

TF-IDF is a widely endorsed term-weighting model in NLP intended for assessing the importance of words in specific documents. Term frequency \((TF)\) reflects how frequently a term \(t_{i}\) occurs within a document \(d_{j}\):

$$TF\left(t_{i},d_{j}\right)=\frac{n_{i,j}}{\left|d_{j}\right|}$$
(2)

Here, \(n_{i,j}\) represents the number of occurrences of \(t_{i}\) in document \(d_{j}\), and \(\left|d_{j}\right|\) refers to the total word count of \(d_{j}\). IDF assesses the rarity of the term across the overall corpus:

$$IDF\left(t_{i}\right)=\log\frac{\left|D\right|}{1+\left|\left\{d_{j}:t_{i}\in d_{j}\right\}\right|}$$
(3)

Here, \(\left|D\right|\) denotes the total number of documents in the corpus, and \(\left|\left\{d_{j}:t_{i}\in d_{j}\right\}\right|\) is the number of documents containing term \(t_{i}\). Multiplying \(TF\) and \(IDF\) yields the TF-IDF score, which depicts the importance of term \(t_{i}\) in document \(d_{j}\):

$$TF\text{-}IDF\left(t_{i},d_{j}\right)=\frac{n_{i,j}}{\left|d_{j}\right|}\times\log\frac{\left|D\right|}{1+\left|\left\{d_{j}:t_{i}\in d_{j}\right\}\right|}$$
(4)

Consequently, some terms that occur regularly within particular classes but rarely elsewhere can receive inappropriately low weights. To tackle this restriction, an improved metric named CDW is introduced:

$$CDW\left(t_{i}\right)=1+\frac{\sum_{c\in C}P\left(c\mid t_{i}\right)\log P\left(c\mid t_{i}\right)}{\log\left|C\right|+1}$$
(5)

Here, \(P\left(c\mid t_{i}\right)\) denotes the proportion of documents containing \(t_{i}\) that belong to class \(c\), and \(\left|C\right|\) is the total number of classes. This weighting is designed to down-weight terms that are evenly distributed across classes while highlighting those with category-specific preferences; CDW is thus particularly effective at recognizing terms strongly associated with specific classes and assigns them a higher weight. Finally, the TF-IDF-CDW score is calculated by multiplying the TF-IDF value with the corresponding CDW value, thus capturing both document- and category-level term significance:

$$TF\text{-}IDF\text{-}CDW\left(t_{i},d_{j}\right)=TF\left(t_{i},d_{j}\right)\cdot IDF\left(t_{i}\right)\cdot CDW\left(t_{i}\right)$$
(6)

This formulation enables the method to assign higher significance to terms that contribute more to class distinction, thereby enhancing classification performance and improving the quality of the semantic representation.
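A compact NumPy sketch of this weighting is given below; the function name and the count-matrix input format are illustrative assumptions rather than the paper's code.

```python
# A sketch of TF-IDF-CDW (Eqs. 2-6). `counts` is a (documents x terms) raw-count matrix
# and `labels` a NumPy array of per-document class ids; both are assumed inputs.
import numpy as np

def tf_idf_cdw(counts, labels):
    tf = counts / counts.sum(axis=1, keepdims=True)                 # Eq. (2)
    df = (counts > 0).sum(axis=0)                                   # documents containing each term
    idf = np.log(counts.shape[0] / (1 + df))                        # Eq. (3)
    classes = np.unique(labels)
    # P(c|t): share of the documents containing term t that belong to class c
    p = np.stack([(counts[labels == c] > 0).sum(axis=0)
                  for c in classes]) / np.maximum(df, 1)
    logp = np.log(p, where=p > 0, out=np.zeros_like(p, dtype=float))
    cdw = 1 + (p * logp).sum(axis=0) / (np.log(len(classes)) + 1)   # Eq. (5)
    return tf * idf * cdw                                           # Eqs. (4) and (6) combined
```

A term concentrated in one class has zero entropy, so its CDW stays at 1, while a term spread evenly across classes is down-weighted, matching the category-specific emphasis described above.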

POS encoding

POS acts as a basic syntactic indicator in language analysis, depicting the grammatical role of a word within a sentence. Integrating POS data can enhance the understanding of the grammatical framework and the semantic context. In text classification tasks in particular, POS tags such as verbs and nouns tend to contribute more to classifier outcomes. Therefore, integrating POS features enhances the syntactic awareness of the methodology and its classification performance.

A random vector-based POS encoding approach is adopted. In particular, every distinct POS class is assigned a random 10-D vector that is adjusted during training. The TF-IDF-CDW weight for every word, \(TIC_{1:n}=\{tic_{1},tic_{2},\cdots,tic_{n}\}\), is then calculated as described in the equations above, and the POS encoding vectors \(POS_{1:n}=\{pos_{1},pos_{2},\cdots,pos_{n}\}\) are derived using this method. Finally, the improved embedding vector \(V_{1:n}\) for every word is formed by scaling the Word2Vec vector \(pv_{i}\) with its corresponding TF-IDF-CDW weight \(tic_{i}\) and appending its 10-D POS encoding vector \(pos_{i}\), producing a 310-D representation:

$$V_{1:n}=\left\{pv_{1}\cdot tic_{1}+pos_{1},\;pv_{2}\cdot tic_{2}+pos_{2},\dots,\;pv_{n}\cdot tic_{n}+pos_{n}\right\}$$
(7)

The resulting improved word embedding vector reflects how the importance of the same word differs across diverse texts and additionally incorporates POS data. The constructed embedding matrix emphasizes key semantic components while reducing noise interference in the classification method.
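A sketch of how Eq. (7) can be assembled is shown next; `w2v`, `tic_weight`, and `pos_table` are hypothetical lookup structures introduced for illustration only.

```python
# Building the 310-D hybrid embedding of Eq. (7): the 300-D Word2Vec vector pv_i scaled
# by its TF-IDF-CDW weight tic_i, with the word's 10-D POS vector pos_i appended.
import numpy as np

rng = np.random.default_rng(0)
pos_table = {t: rng.normal(size=10) for t in ("NN", "VB", "JJ", "RB")}  # adjusted during training

def hybrid_embed(tokens, tags, w2v, tic_weight):
    rows = []
    for tok, tag in zip(tokens, tags):
        pv = w2v.get(tok, np.zeros(300))          # 300-D Word2Vec vector pv_i
        tic = tic_weight.get(tok, 1.0)            # TF-IDF-CDW scalar weight tic_i
        pos = pos_table.get(tag, np.zeros(10))    # 10-D POS encoding pos_i
        rows.append(np.concatenate([pv * tic, pos]))
    return np.stack(rows)                         # (n, 310) sentence embedding matrix
```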

Hybrid classification process

Finally, the hybrid C-BiG-A technique is employed for the classification process. This hybrid model is chosen for its ability to effectively capture both local patterns and long-range dependencies in textual data. The Bi-GRU model processes sequential data in both forward and backward directions, while the CNN excels at extracting spatial features and local n-gram patterns, capturing contextual relationships. The AM further enhances the model by selectively focusing on the most relevant parts of the input, thereby improving both interpretability and performance. This fusion model effectively balances complexity and efficiency, yielding superior emotion classification accuracy, particularly for complex and mixed emotions, compared to standalone models. Its architecture is well suited to handling varying text lengths and diverse linguistic structures, making it a robust choice over simpler or less adaptive models. Figure 3 shows the structure of the C-BiG-A technique.

Fig. 3

Framework of C-BiG-A technique consisting of input, convolutional, recurrent, attention, and dense layers resulting in the softmax-based output.

The CNN can effectively extract features and information33. By sliding the convolutional kernel across various input sequence positions, it can successfully capture the changing local features and patterns of the sequence data; the convolutional layer extracts local features from the time-series data using a sliding window. While the RNN has the benefit of extracting contextually related data, it may capture only a limited range of contextual information and suffers from the long-term dependency problem. It is therefore essential to introduce a gating mechanism into the RNN architecture to retain the required data; this not only offers an efficient solution to the problems of gradient explosion and gradient vanishing, but also partially addresses the difficulty of transferring information over longer distances. The commonly applied RNN frameworks with gate mechanisms are the LSTM and GRU structures, which possess strong temporal processing abilities. Unlike the LSTM, the GRU does not maintain a separate memory cell; its internal architecture has only two gates, which reduces the number of training parameters and results in faster training. The forward computation of the GRU is as follows:

$$r_{t}=\sigma\left(W_{r}\left[h_{t-1},x_{t}\right]+b_{r}\right)$$
(8)
$$z_{t}=\sigma\left(W_{z}\left[h_{t-1},x_{t}\right]+b_{z}\right)$$
(9)
$$\tilde{h}_{t}=\tanh\left(W_{h}\left[r_{t}*h_{t-1},x_{t}\right]+b_{h}\right)$$
(10)
$$h_{t}=\left(1-z_{t}\right)*h_{t-1}+z_{t}*\tilde{h}_{t}$$
(11)
$$y_{t}=softmax\left(W_{0}\cdot h_{t}+b_{0}\right)$$
(12)

where \(x_{t}\) denotes the input vector at the current time step; \(r_{t}\) and \(z_{t}\) are the reset and update gates; \(\tilde{h}_{t}\) is the candidate hidden state; \(\sigma\) is the sigmoid function; \(W_{r}\), \(W_{z}\), and \(W_{h}\) denote the weight matrices; and \(b_{r}\), \(b_{z}\), and \(b_{h}\) denote the bias vectors.
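A direct NumPy transcription of Eqs. (8)–(11) is sketched below, assuming \([h,x]\) denotes vector concatenation; the dimensions and random weights are purely illustrative.

```python
# One forward GRU step, following Eqs. (8)-(11) literally.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_r, W_z, W_h, b_r, b_z, b_h):
    hx = np.concatenate([h_prev, x_t])                                # [h_{t-1}, x_t]
    r = sigmoid(W_r @ hx + b_r)                                       # reset gate, Eq. (8)
    z = sigmoid(W_z @ hx + b_z)                                       # update gate, Eq. (9)
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]) + b_h)  # candidate state, Eq. (10)
    return (1 - z) * h_prev + z * h_tilde                             # new hidden state, Eq. (11)

d, h = 4, 3                                        # toy input and hidden sizes
rng = np.random.default_rng(0)
W = lambda: rng.normal(size=(h, h + d))
h_t = gru_step(rng.normal(size=d), np.zeros(h), W(), W(), W(),
               np.zeros(h), np.zeros(h), np.zeros(h))
```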

The Bi-GRU model can learn from both past and future information, so it learns the input sequence more comprehensively and avoids losing information when processing longer sequences. \(g_{t}^{\prime}\) and \(g_{t}\) denote the corresponding outputs of the backward and forward GRU layers at time \(t\), computed as follows:

$$g_{t}^{forward}=GRU\left(x_{t},h_{t-1}^{forward}\right)$$
(13)
$$g_{t}^{backward}=GRU\left(x_{t},h_{t+1}^{backward}\right)$$
(14)

The Bi-GRU output \(O_{t}\) is given by:

$$O_{t}=\overrightarrow{W}g_{t}+\overleftarrow{W}g_{t}^{\prime}+b_{t}$$
(15)

where \(\overleftarrow{W}\) and \(\overrightarrow{W}\) denote the weight matrices of the backward and forward GRU structures, respectively, and \(b_{t}\) denotes the bias vector of the output layer.

The model employs a linear generalized attention mechanism, which achieves near-linear growth in both time and space complexity and significantly reduces training time, particularly when processing longer sequences. The attention matrix \(A\in\mathbb{R}^{L\times L}\) is defined as:

$$A(i,j)=K\left(q_{i}^{T},k_{j}^{T}\right)$$
(16)

Here, \(q_{i}\) and \(k_{j}\) denote the \(i\)th and \(j\)th row vectors of the query \(Q\) and key \(K\), respectively. The kernel \(K\) is represented as:

$$K\left(x,y\right)=\mathbb{E}\left[\varphi\left(x\right)^{T}\varphi\left(y\right)\right]$$
(17)

Here, \(\varphi\left(x\right)\) signifies a feature mapping function. If \(Q^{\prime},K^{\prime}\in\mathbb{R}^{L\times p}\), their row vectors are given by \(\varphi\left(q_{i}^{T}\right)\) and \(\varphi\left(k_{j}^{T}\right)\), respectively. The efficient attention, based on this kernel description, is formulated as:

$$\widehat{Att}\left(Q,K,V\right)={\widehat{D}}^{-1}\left(BV\right)$$
(18)

where \(B=Q^{\prime}(K^{\prime})^{T}\) and \(\widehat{D}=diag\left(B1_{L}\right)\). Here, \(\widehat{Att}\) denotes the approximated attention, and the brackets specify the order of computation. Table 2 specifies the key hyperparameters of the C-BiG-A technique.
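The overall classifier can be sketched in PyTorch as below; the layer sizes are illustrative assumptions (Table 2 lists the actual hyperparameters), and \(\varphi(x)=\text{elu}(x)+1\) is used here as one possible feature map for the kernel attention of Eqs. (16)–(18).

```python
# A minimal sketch of the C-BiG-A pipeline: Conv1d -> BiGRU -> kernel-based linear
# attention -> dense softmax head. Sizes are assumptions, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_attention(q, k, v):
    qp, kp = F.elu(q) + 1, F.elu(k) + 1                           # positive feature maps φ(·)
    kv = torch.einsum("bld,ble->bde", kp, v)                      # B ~ K'^T V, computed first
    z = 1.0 / (torch.einsum("bld,bd->bl", qp, kp.sum(1)) + 1e-6)  # normalizer D^{-1}
    return torch.einsum("bld,bde,bl->ble", qp, kv, z)             # ≈ D^{-1}(Q'K'^T V), Eq. (18)

class CBiGA(nn.Module):
    def __init__(self, emb=310, ch=128, hid=128, n_classes=12):
        super().__init__()
        self.conv = nn.Conv1d(emb, ch, kernel_size=3, padding=1)  # local n-gram features
        self.bigru = nn.GRU(ch, hid, batch_first=True, bidirectional=True)
        self.qkv = nn.Linear(2 * hid, 6 * hid)                    # project to Q, K, V
        self.head = nn.Sequential(nn.Dropout(0.5), nn.Linear(2 * hid, n_classes))

    def forward(self, x):                                  # x: (batch, seq, emb)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.bigru(h)                               # (batch, seq, 2*hid)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        return self.head(linear_attention(q, k, v).mean(1))  # pooled class logits

print(CBiGA()(torch.randn(2, 40, 310)).shape)              # torch.Size([2, 12])
```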

Table 2 Key hyperparameters of the C-BiG-A model.

Experimental validation

The performance assessment of the IERT-HDLMWEP method is examined using the Emotion Detection from Text dataset34. The technique runs on Python 3.6.5 with an i5-8600k CPU, a 4 GB GPU, 16 GB RAM, a 250 GB SSD, and a 1 TB HDD, using a learning rate of 0.01, ReLU activation, 50 epochs, a dropout of 0.5, and a batch size of 5. The dataset consists of a total of 39,173 samples, categorized into 12 sentiments, as outlined in Table 3 below. Table 4 presents sample text.

Table 3 Details of the dataset.
Table 4 Sample text.

Figure 4 shows the confusion matrices produced by the IERT-HDLMWEP approach at 80:20 and 70:30 ratios of the training phase (TRPHE) to the testing phase (TSPHE). The results show that the IERT-HDLMWEP technique effectively detects and identifies each class.

Fig. 4

Confusion matrices of (a, c) TRPHE of 80% and 70% and (b, d) TSPHE of 20% and 30%.

Table 5 and Fig. 5 present the textual emotion detection outcomes of the IERT-HDLMWEP technique at 80:20. Under 80% TRPHE, the IERT-HDLMWEP technique attains an average \(accu_{y}\) of 99.64%, \(prec_{n}\) of 94.95%, \(reca_{l}\) of 89.19%, \(F_{Measure}\) of 91.22%, \(AUC_{Score}\) of 94.49%, and Kappa of 94.56%. On 20% TSPHE, the IERT-HDLMWEP model obtains an average \(accu_{y}\) of 99.67%, \(prec_{n}\) of 93.61%, \(reca_{l}\) of 88.54%, \(F_{Measure}\) of 89.97%, \(AUC_{Score}\) of 94.18%, and Kappa of 94.25%.

Table 5 Textual emotion detection outcome of IERT-HDLMWEP model under 80%:20%.

Table 6 and Fig. 6 portray the textual emotion detection outcomes of the IERT-HDLMWEP method at 70:30. Under 70% TRPHE, the IERT-HDLMWEP method attains an average \(accu_{y}\) of 99.31%, \(prec_{n}\) of 90.71%, \(reca_{l}\) of 80.81%, \(F_{Measure}\) of 82.83%, \(AUC_{Score}\) of 90.12%, and Kappa of 90.28%. Similarly, on 30% TSPHE, the IERT-HDLMWEP technique obtains an average \(accu_{y}\) of 99.38%, \(prec_{n}\) of 93.88%, \(reca_{l}\) of 80.87%, \(F_{Measure}\) of 82.93%, \(AUC_{Score}\) of 90.26%, and Kappa of 90.32%.

Fig. 5

Average values of the IERT-HDLMWEP model under 80%:20%.

Fig. 6

Average values of the IERT-HDLMWEP model under 70%:30%.

Table 6 Textual emotion detection outcome of IERT-HDLMWEP model under 70%:30%.

Figure 7 exemplifies the training (TRAIN) \(accu_{y}\) and validation (VALID) \(accu_{y}\) of the IERT-HDLMWEP approach at an 80:20 ratio over 25 epochs. Initially, both TRAIN and VALID \(accu_{y}\) rise quickly, representing efficient pattern learning from the data. In later epochs, the VALID \(accu_{y}\) slightly exceeds the TRAIN accuracy, signifying good generalization without over-fitting. As training advances, performance increases and the gap between TRAIN and VALID narrows. The close alignment of both curves during training implies that the model is well-regularised and generalizes well, demonstrating the approach's strong capability to learn and retain useful features across both seen and unseen data.

Fig. 7

\(Accu_{y}\) curve of the IERT-HDLMWEP model under 80:20.

Figure 8 demonstrates the TRAIN and VALID losses of the IERT-HDLMWEP technique at 80:20 over 25 epochs. Initially, both TRAIN and VALID losses are higher, showing that the method begins with a partial understanding of the data. As training evolves, both losses continually decrease, indicating that the technique is efficiently learning and optimizing its parameters. The close alignment between the TRAIN and VALID loss curves in training implies that the model hasn’t overfitted and retains good generalization to unseen data. This reliable and steady decrease in loss shows a well-trained, stable, and consistent DL model.

Fig. 8

Loss curve of the IERT-HDLMWEP model under 80:20.

In Fig. 9, the precision-recall (PR) inspection study of the IERT-HDLMWEP methodology on the 80:20 dataset provides insights into its performance by charting Precision against Recall for all classes. The figure illustrates that the IERT-HDLMWEP technique consistently achieves increased PR values across diverse classes, indicating its potential in maintaining a significant share of true positive predictions among all positive predictions (precision), while also capturing a substantial portion of actual positives (recall). The steady improvement in PR results across each class depicts the efficiency of the IERT-HDLMWEP technique during the classifier process.

In Fig. 10, the ROC analysis of the IERT-HDLMWEP technique is examined under an 80:20 ratio. The results indicate that the IERT-HDLMWEP method achieves elevated ROC values across all classes, demonstrating a significant ability to differentiate between class labels. This consistent pattern of increased values of ROC across numerous class labels indicates the effective results of the IERT-HDLMWEP method in class prediction, underscoring the robust nature of the classification process.

Fig. 9

PR curve of the IERT-HDLMWEP method under 80:20.

Fig. 10

ROC curve of the IERT-HDLMWEP method at 80:20.

Table 7 and Fig. 11 demonstrate the comparative analysis of the IERT-HDLMWEP method with current techniques under various metrics19,20,35,36. The outcomes show that the IERT-HDLMWEP model attained higher \(accu_{y}\), \(prec_{n}\), \(reca_{l}\), and \(F_{Measure}\) of 99.67%, 93.61%, 88.54%, and 89.97%, respectively, whereas the existing methodologies, namely RoBERTa, BiGRU, RF, U-Net, Improved DBN-SVM, Bi-GRU, XLNet, DAN, Bi-LSTM, and SVM, showed worse performance under these metrics.

Fig. 11

Comparative analysis of the IERT-HDLMWEP model with existing methods.

Table 7 Comparative analysis of the IERT-HDLMWEP model with existing methods19,20,35,36.

Table 8 and Fig. 12 compare the computational time (CT) of the IERT-HDLMWEP methodology with that of existing techniques. The IERT-HDLMWEP methodology presents a lower CT of 8.06 s, while the RoBERTa, BiGRU, RF, U-Net, Improved DBN-SVM, Bi-GRU, XLNet, DAN, Bi-LSTM, and SVM methodologies attained higher CTs of 16.99 s, 13.00 s, 17.90 s, 18.56 s, 21.82 s, 28.89 s, 15.36 s, 21.64 s, 12.59 s, and 10.43 s, respectively.

Fig. 12

CT outcome of IERT-HDLMWEP methodology with existing models.

Table 8 CT outcome of IERT-HDLMWEP methodology with existing models.

Table 9 and Fig. 13 present the error analysis of the IERT-HDLMWEP technique against existing methods; since these values represent error rates, lower is better. RoBERTa exhibited the highest overall errors, with an \(accu_{y}\) error of 10.25%, \(prec_{n}\) error of 13.19%, \(reca_{l}\) error of 13.27%, and \(F_{Measure}\) error of 17.07%. U-Net followed closely with errors of 11.03%, 13.80%, 13.95%, and 17.63%, respectively. Conventional models such as RF and SVM attained moderate error levels: RF showed an \(accu_{y}\) error of 1.87%, \(prec_{n}\) error of 6.97%, \(reca_{l}\) error of 17.86%, and \(F_{Measure}\) error of 17.84%, while SVM recorded 1.97%, 8.93%, 13.58%, and 15.67%. BiGRU and Bi-GRU obtained low \(accu_{y}\) errors of 5.21% and 2.61%, respectively, but relatively high recall errors above 17%, indicating a bias in detecting positives. XLNet recorded an \(accu_{y}\) error of 3.43%, \(prec_{n}\) error of 7.82%, \(reca_{l}\) error of 19.25%, and the highest \(F_{Measure}\) error of 18.78%. The IERT-HDLMWEP model attained the lowest errors overall, with an \(accu_{y}\) error of 0.33%, \(prec_{n}\) error of 6.39%, \(reca_{l}\) error of 11.46%, and \(F_{Measure}\) error of 10.03%. These results indicate that the proposed representations and architectural enhancements consistently reduce both precision and recall errors, resulting in more balanced and reliable classification performance.

Fig. 13

Error analysis of IERT-HDLMWEP technique with existing methods.

Table 9 Error analysis of IERT-HDLMWEP technique with existing methods.

Table 10 and Fig. 14 present the ablation study of the IERT-HDLMWEP approach, which illustrates the progressive improvement in performance as diverse components are integrated into the model. Starting with Word2Vec, the baseline model attained an \(accu_{y}\) of 97.06%, \(prec_{n}\) of 90.92%, \(reca_{l}\) of 85.81%, and \(F_{Measure}\) of 87.06%. The TF-IDF-CDW approach slightly improved performance, with \(accu_{y}\) of 97.81%, \(prec_{n}\) of 91.46%, \(reca_{l}\) of 86.57%, and \(F_{Measure}\) of 87.75%. Incorporating the PSE technique further enhanced the metrics to \(accu_{y}\) of 98.50%, \(prec_{n}\) of 92.23%, \(reca_{l}\) of 87.30%, and \(F_{Measure}\) of 88.51%. The C-BiG-A model significantly outperformed the previous variants, reaching \(accu_{y}\) of 99.03%, \(prec_{n}\) of 92.98%, \(reca_{l}\) of 87.80%, and \(F_{Measure}\) of 89.28%. The final IERT-HDLMWEP model attained the best results across all metrics, with an \(accu_{y}\) of 99.67%, \(prec_{n}\) of 93.61%, \(reca_{l}\) of 88.54%, and \(F_{Measure}\) of 89.97%, confirming the efficiency of the complete hybrid architecture.

Fig. 14

Ablation study-based comparative analysis of the IERT-HDLMWEP methodology.

Table 10 Ablation study-based comparative analysis of the IERT-HDLMWEP methodology.

Conclusion

In this study, a novel emotion detection model, named the IERT-HDLMWEP method, is developed to accurately identify and interpret emotions in text, thereby enhancing communication support for people with disabilities. First, the text pre-processing stage involves several standard steps to enhance analysis and minimize the dimensionality of the input data. For the word embedding process, the IERT-HDLMWEP method generates a hybrid feature representation by integrating pre-trained Word2Vec vectors weighted using TF-IDF-CDW and enriched with POS feature vectors to enhance emotion detection in textual data. Finally, the hybrid C-BiG-A technique is employed for the classification process. A comprehensive simulation was implemented to verify the performance of the IERT-HDLMWEP methodology, and the empirical results indicated that it improved over other recent techniques. The limitations of the IERT-HDLMWEP methodology include its dependence on general textual data. Although disability-specific data is not required due to the text-based nature of the approach, the present model may not capture all contextual variations relevant to specific user groups and may not generalize well to other languages or domains with diverse linguistic styles or informal expressions. Performance may also degrade on highly informal or noisy text, which is typical of some real-world communications. Future work should focus on enhancing the system's robustness to diverse linguistic styles and expanding its adaptability to various domains. Incorporating user feedback mechanisms could further personalize emotion recognition, and integrating this text-based model with other assistive technologies may improve overall support for people with intellectual disabilities. Finally, developing real-time processing capabilities and user-friendly interfaces will be crucial for practical deployment.