Table 1 A comparative study of the reviewed techniques.
| Author | Objective | Method | Dataset | Result |
|---|---|---|---|---|
| Kumar et al.11 | Presents VISTANet, an interpretable, hybrid-fusion-based multimodal emotion recognition method that classifies an input comprising an image, its corresponding text, and speech into discrete emotion classes. | KAAP | IIT-R MMEmoRec dataset | Accuracy of 80.11% |
| Di Luzio et al.12 | An explainable AI model to recognize key facial movements and the distinct emotions they convey. | DNN | Extended Cohn-Kanade (CK+) dataset | – |
| Fu et al.13 | A framework for handling incomplete conversational information in MERC that exploits higher-order modality and multi-frequency information and fully leverages semantic dependencies. | GRU and Spectral Domain Reconstruction Graph Neural Network (SDR-GNN) | IEMOCAP, CMU-MOSI, and CMU-MOSEI datasets | Runtime of SDR-GNN-mini is 7.52 s |
| Li et al.14 | ERNetCL, a model that combines RNN and multi-head attention (MHA) techniques in a simplified design to capture spatial and temporal contextual information. | TE, SE, and CL | MELD, IEMOCAP, EmoryNLP, and DailyDialog datasets | Weighted F1 of 66.31%, 69.73%, 39.71%, and 53.09% on the four datasets, respectively |
| Kusal et al.15 | Presents a hybrid DL model based on a convolutional-recurrent network to identify individuals' emotions from conversational text. | Neural Network Language Model (NNLM), CNN, Recurrent Neural Network (RNN) | Empathetic Dialogues dataset | Accuracy of 73.62% |
| Feng et al.16 | Establishes a multimodal technique that combines text and speech information to fully exploit emotion-relevant cues, applying a multiscale MFCC representation with a multi-view attention mechanism. | Multiscale MFCC and Multi-Head Attention (MHA) | IEMOCAP and MSP-IMPROV datasets | WA of 0.754 and UA of 0.742 on IEMOCAP |
| Zhang et al.17 | An automated emotion analysis method that enables a machine to infer the emotional state conveyed by a person's EEG signals. | CNN and LSTM | DEAP dataset | Accuracy of 92.98% |
| Omarov and Zhumanov18 | Proposes a Bi-LSTM technique for emotion analysis and detection in textual content, able to exploit both preceding and following context to improve performance. | Bi-LSTM | Kaggle Emotion Detection Dataset | Weighted-average precision, recall, and F-score of 90% |
| Hicham and Nassera19 | To develop a stacked DL technique for efficient multilingual opinion mining. | RoBERTa-GRU, RoBERTa-LSTM, RoBERTa-BiGRU, RoBERTa-BiLSTM, Adam optimizer, AEDA, SMOTE, GPT | French, English, and Arabic corpora | High efficacy and improved classification, evaluated with accuracy, Cohen's kappa, ROC-AUC, MCC, and k-fold cross-validation |
| Mahajan, More, and Shah20 | To develop and evaluate models for recognizing single and mixed emotions in multilingual, code-mixed YouTube comments. | LR, SVM, NB, RF, LSTM, BiLSTM, GRU, CNN | 13,000 multilabel YouTube comments | SVM achieved the highest accuracy (evaluated with accuracy and F1-score) |
| Zhu et al.21 | To develop an accurate and efficient medical question-answering system using advanced AI techniques. | Knowledge Embedding, Transformer-based Architecture, Knowledge Understanding Layer, Answer Generation Layer | MCMLE and USMLE datasets | 82.92% on MCMLE and 64.02% on USMLE |
| Khan et al.22 | To develop an efficient DL method for accurate violence detection in surveillance videos using a two-stream approach. | Two-Stream Architecture, 3D Convolution Network, Background Suppression, Optical Flow Analysis, Depth-Wise 3D Convolutions | RWF2000, RLVS | Accuracy, Efficiency |
| Arumugam et al.23 | To develop a novel multimodal framework integrating audio, visual, and text inputs for accurate emotion recognition. | AVTEFN, Hybrid Wav2Vec 2.0 + CNN, BERT with Bi-GRU, Attention-Based Fusion | Benchmark dataset | Accuracy of 98.7%, precision of 98.2%, recall of 97.2%, and F1-score of 97.49% |
| Khan et al.24 | To enhance multimodal emotion recognition by capturing inter- and intra-modal relationships using a joint transformer-based model. | MER, JMMT | IEMOCAP, MELD | Accuracy, F1-Score |
| Alyoubi and Alyoubi25 | To develop an optimized transformer-based multimodal emotion recognition framework for accurate emotion classification. | BERT/RoBERTa for Text, wav2vec 2.0 for Speech, ResNet50/VGG16 for Visuals, Cross-Modal Attention, SHAP Explainability | Multimodal EmotionLines (MELD) Dataset | Accuracy, Explainability |
| Vani et al.26 | To develop and evaluate Text Fusion+, combining advanced text analysis with audio output. | OCR, NLP, TTS, DL Summarization, NLP-based Q&A Module | Standard dataset | Summarization Accuracy, User Accessibility |
| Khan et al.27 | To explore sequence learning techniques for accurate auditory emotion recognition using advanced RNNs. | LSTM, GRU, Bidirectional LSTM, Deep/Multilayer Architectures | Emotion Dataset | Accuracy, Model Robustness |
| Ghous, Najam, and Jalal28 | To detect emotional states in individuals with cognitive disabilities using EEG data and advanced ML models. | BF, Downsampling, AAFST, Multi-Class SVM | SEED-IV | Accuracy, Emotion Detection |
| Patil et al.29 | To predict learning disabilities using handwritten text analysis and intelligent systems. | DNN, Character Confusion Detection, Pattern Extraction Techniques | IAM Handwriting Dataset | Accuracy, Scalability |
| Mishra et al.30 | To develop a transformer-based NLP model to support mental health monitoring. | Transformer Architecture, GPT-4 and BERT, Adam Optimizer | English Twitter Dataset | Accuracy of 94% |
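Several of the works summarized in Table 1 (e.g., Omarov and Zhumanov18, and the recurrent baselines of Mahajan, More, and Shah20 and Khan et al.27) rely on Bi-LSTM layers, which read a token sequence in both directions so that each position is informed by preceding and following context. The snippet below is only a minimal illustrative sketch of such a text emotion classifier in Keras; the toy corpus, label set, vocabulary size, and hyperparameters are assumptions made for illustration and are not taken from the cited papers.

```python
# Minimal, illustrative Bi-LSTM text emotion classifier (Keras).
# The toy data, label set, and hyperparameters are assumptions for
# illustration only; they do not reproduce any of the reviewed works.
import numpy as np
from tensorflow.keras import layers, models

# Toy corpus standing in for a conversational-text emotion dataset.
texts = ["i am so happy today", "this is terrible news", "i feel calm and relaxed"]
labels = np.array([0, 1, 2])  # e.g., 0 = joy, 1 = anger, 2 = neutral (assumed label set)

# Map words to integer ids and pad to a fixed sequence length.
vectorizer = layers.TextVectorization(max_tokens=10_000, output_sequence_length=32)
vectorizer.adapt(texts)
x = vectorizer(np.array(texts))

# Embedding -> Bi-LSTM (forward and backward context) -> softmax over emotion classes.
model = models.Sequential([
    layers.Embedding(input_dim=10_000, output_dim=64, mask_zero=True),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x, labels, epochs=3, verbose=0)  # tiny run, purely to show the training call
```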