Table 1 A comparative study of the reviewed techniques.

From: Empowering people with intellectual disabilities using integrated deep learning architecture driven enhanced text-based emotion classification

| Author | Objective | Method | Dataset | Result |
|---|---|---|---|---|
| Kumar et al.11 | VISTANet, an interpretable, hybrid-fusion-based multimodal emotion detection method that categorizes an input comprising an image, its corresponding text, and speech into discrete emotion classes. | KAAP | IIT-R MMEmoRec dataset | Accuracy of 80.11% |
| Di Luzio et al.12 | A novel explainable AI model to recognize key facial movements and distinctive emotional states. | DNN | Extended Cohn-Kanade (CK+) dataset | – |
| Fu et al.13 | A novel framework for handling incomplete conversational information in multimodal emotion recognition in conversations (MERC); it considers higher-order modality and multi-frequency information and fully exploits semantic dependencies. | GRU and spectral domain reconstruction graph neural network (SDR-GNN) | IEMOCAP, CMU-MOSI, and CMU-MOSEI datasets | Runtime of 7.52 s for SDR-GNN-mini |
| Li et al.14 | ERNetCL, an innovative model that incorporates RNN and multi-head attention (MHA) techniques in a simplified design to capture spatial and temporal contextual information. | Temporal encoder (TE), spatial encoder (SE), and curriculum learning (CL) | MELD, IEMOCAP, EmoryNLP, and DailyDialog datasets | Weighted F1 of 66.31%, 69.73%, 39.71%, and 53.09% on the four datasets, respectively |
| Kusal et al.15 | A hybrid DL model based on a convolutional-recurrent network for identifying individuals' emotions from conversational text (see sketch below the table). | Neural network language model (NNLM), CNN, recurrent neural network (RNN) | Empathetic Dialogues dataset | Accuracy of 73.62% |
| Feng et al.16 | A multimodal technique that combines text and speech information to capture the full benefit of emotion-relevant data, through a multiscale MFCC and multiview attention mechanism (AM). | Multiscale MFCC and multi-head attention (MHA) | IEMOCAP and MSP-IMPROV datasets | WA of 0.754 and UA of 0.742 on IEMOCAP |
| Zhang et al.17 | An automated emotion analysis method that enables the machine to understand the emotional state conveyed by a person's EEG signals. | CNN and LSTM | DEAP dataset | Accuracy of 92.98% |
| Omarov and Zhumanov18 | An innovative Bi-LSTM technique for emotion analysis and identification in text, able to exploit both preceding and following context to enhance performance (see sketch below the table). | Bi-LSTM | Kaggle Emotion Detection dataset | Weighted average of 90% precision, 90% recall, and 90% F-score |
| Hicham and Nassera19 | To develop a stacked DL technique for efficient multilingual opinion mining (see sketch below the table). | RoBERTa-GRU, RoBERTa-LSTM, RoBERTa-BiGRU, RoBERTa-BiLSTM, Adam optimizer, AEDA, SMOTE, GPT | French, English, and Arabic corpora, evaluated with Cohen's kappa, ROC-AUC, accuracy, MCC, and k-fold cross-validation | High efficacy and improved classification performance |
| Mahajan, More, and Shah20 | To develop and evaluate models for recognizing single and mixed emotions in multilingual, code-mixed YouTube comments. | LR, SVM, NB, RF, LSTM, BiLSTM, GRU, CNN | 13,000 multilabel YouTube comments, evaluated with accuracy and F1-score | SVM achieved the highest accuracy |
| Zhu et al.21 | To develop an accurate and efficient medical question-answering system using advanced AI techniques. | Knowledge embedding, transformer-based architecture, knowledge understanding layer, answer generation layer | MCMLE and USMLE datasets | 82.92% on MCMLE and 64.02% on USMLE |
| Khan et al.22 | To develop an efficient DL method for accurate violence detection in surveillance videos using a two-stream approach. | Two-stream architecture, 3D convolution network, background suppression, optical flow analysis, depth-wise 3D convolutions | RWF2000 and RLVS datasets | Evaluated on accuracy and efficiency |
| Arumugam et al.23 | To develop a novel multimodal framework that integrates audio, visual, and text inputs for accurate emotion recognition. | AVTEFN, hybrid Wav2Vec 2.0 + CNN, BERT with Bi-GRU, attention-based fusion | Benchmark dataset | Accuracy of 98.7%, precision of 98.2%, recall of 97.2%, and F1-score of 97.49% |
| Khan et al.24 | To enhance multimodal emotion recognition (MER) by capturing inter- and intra-modal relationships using a joint transformer-based model (see the cross-modal attention sketch below the table). | Joint multimodal transformer (JMMT) | IEMOCAP and MELD | Evaluated on accuracy and F1-score |
| Alyoubi and Alyoubi25 | To develop an optimized transformer-based multimodal emotion recognition framework for accurate emotion classification. | BERT/RoBERTa for text, wav2vec 2.0 for speech, ResNet50/VGG16 for visuals, cross-modal attention, SHAP explainability | Multimodal EmotionLines Dataset (MELD) | Evaluated on accuracy and explainability |
| Vani et al.26 | To develop and evaluate Text Fusion+, combining advanced text analysis with audio output. | OCR, NLP, TTS, DL summarization, NLP-based Q&A module | Standard dataset | Evaluated on summarization accuracy and user accessibility |
| Khan et al.27 | To explore sequence-learning techniques for accurate auditory emotion recognition using advanced RNNs. | LSTM, GRU, bidirectional LSTM, deep/multilayer architectures | Emotion dataset | Evaluated on accuracy and model robustness |
| Ghous, Najam, and Jalal28 | To detect emotional states in individuals with cognitive disabilities using EEG data and advanced ML models. | BF, downsampling, AAFST, multi-class SVM | SEED-IV | Evaluated on emotion-detection accuracy |
| Patil et al.29 | To predict learning disabilities using handwritten text analysis and intelligent systems. | DNN, character confusion detection, pattern extraction techniques | IAM Handwriting dataset | Evaluated on accuracy and scalability |
| Mishra et al.30 | To develop a transformer-based NLP model to support mental health monitoring. | Transformer architecture, GPT-4 and BERT, Adam optimizer | English Twitter dataset | Accuracy of 94% |
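
Several of the reviewed methods share a small set of recurring architectural patterns. As a first illustration, the following is a minimal PyTorch sketch of a bidirectional LSTM text-emotion classifier in the spirit of Omarov and Zhumanov18: the concatenated final forward and backward hidden states give the classifier access to both preceding and following context. The vocabulary size, layer widths, and six-way label set are placeholder assumptions, not details taken from the cited work.

```python
import torch
import torch.nn as nn

class BiLSTMEmotionClassifier(nn.Module):
    """Bi-LSTM over token embeddings; the concatenated final
    forward/backward hidden states feed a linear classifier."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=64, num_classes=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded, 0-padded text
        embedded = self.embedding(token_ids)           # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(embedded)              # h_n: (2, batch, hidden_dim)
        context = torch.cat([h_n[0], h_n[1]], dim=-1)  # forward + backward states
        return self.classifier(context)                # (batch, num_classes) logits

# Smoke test on a random batch of 4 sequences of length 20
model = BiLSTMEmotionClassifier(vocab_size=10_000)
logits = model(torch.randint(1, 10_000, (4, 20)))
print(logits.shape)  # torch.Size([4, 6])
```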
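The convolutional-recurrent pairing used by Kusal et al.15 can be sketched the same way: a 1D convolution extracts local n-gram features from token embeddings, and an LSTM then models their order. The 32-way output loosely mirrors the label count commonly reported for Empathetic Dialogues, but every hyperparameter here is illustrative rather than drawn from the paper.

```python
import torch
import torch.nn as nn

class ConvRecurrentClassifier(nn.Module):
    """1D convolution extracts local n-gram features from token
    embeddings; an LSTM then models their sequential dependencies."""
    def __init__(self, vocab_size, embed_dim=128, conv_channels=64,
                 hidden_dim=64, num_classes=32):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(conv_channels, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq)
        x = torch.relu(self.conv(x)).transpose(1, 2)   # (batch, seq, channels)
        _, (h_n, _) = self.lstm(x)                     # final hidden state
        return self.classifier(h_n[-1])                # (batch, num_classes) logits

model = ConvRecurrentClassifier(vocab_size=10_000)
print(model(torch.randint(1, 10_000, (4, 20))).shape)  # torch.Size([4, 32])
```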
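The stacked transformer-plus-recurrent designs of Hicham and Nassera19 (RoBERTa-GRU, RoBERTa-BiLSTM, and variants) follow a common template: a pretrained encoder supplies contextual token embeddings, and a lightweight recurrent head classifies them. A minimal sketch using the Hugging Face transformers library follows; the encoder is frozen purely to keep the example light, and the three-way output and hidden sizes are assumptions, not the cited configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RobertaBiLSTMClassifier(nn.Module):
    """Frozen RoBERTa encoder provides contextual token embeddings;
    a Bi-LSTM head summarizes them for sentiment/emotion classification."""
    def __init__(self, num_classes=3, hidden_dim=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("roberta-base")
        for p in self.encoder.parameters():  # freeze: train the head only
            p.requires_grad = False
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        _, (h_n, _) = self.lstm(hidden)
        return self.classifier(torch.cat([h_n[0], h_n[1]], dim=-1))

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
batch = tokenizer(["I am thrilled today!", "This is disappointing."],
                  padding=True, return_tensors="pt")
model = RobertaBiLSTMClassifier()
print(model(batch["input_ids"], batch["attention_mask"]).shape)  # torch.Size([2, 3])
```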
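Finally, the cross-modal attention fusion that appears in various forms in Feng et al.16, Khan et al.24, and Alyoubi and Alyoubi25 can be approximated with PyTorch's nn.MultiheadAttention, letting text features attend over audio features before classification. The seven-way output loosely matches MELD's emotion labels; the dimensions, projection, and mean pooling are illustrative assumptions rather than any one paper's design.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Text features query audio features; the attended audio context
    is concatenated with pooled text features for classification."""
    def __init__(self, text_dim=128, audio_dim=128, num_heads=4, num_classes=7):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, text_dim)
        self.attn = nn.MultiheadAttention(embed_dim=text_dim,
                                          num_heads=num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * text_dim, num_classes)

    def forward(self, text_feats, audio_feats):
        # text_feats: (batch, T_text, text_dim); audio_feats: (batch, T_audio, audio_dim)
        audio = self.audio_proj(audio_feats)
        # queries come from text, keys/values from the projected audio
        attended, _ = self.attn(query=text_feats, key=audio, value=audio)
        fused = torch.cat([text_feats.mean(dim=1), attended.mean(dim=1)], dim=-1)
        return self.classifier(fused)  # (batch, num_classes) logits

fusion = CrossModalAttentionFusion()
out = fusion(torch.randn(4, 12, 128), torch.randn(4, 50, 128))
print(out.shape)  # torch.Size([4, 7])
```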