Table 14 Comparison of the accuracy of the proposed SCoNN model with that of existing approaches for the RAVDESS dataset.

From: Stacked convolutional neural network for emotion recognition using multi feature speech analysis

| Authors/Year | Features Used | Methodology | Accuracy |
|---|---|---|---|
| Issa et al.16 | Mel Spectrogram, Chromagram, Spectral Contrast, MFCC, Tonnetz | 1-D CNN | 71.61% |
| Alnuaim et al.20 | MFCC, STFT, Mel Spectrogram | MLP | 81.00% |
| Andayani et al.21 | MFCC | LSTM-Transformer | 75.62% |
| Kakuba et al.23 | MFCC, Chromagram, Mel Spectrogram | ABMD | 85.89% |
| Dolka et al.14 | MFCC | ANN | 88.72% |
| Jahangir et al.24 | Spectral Contrast, Tonnetz, MFCCs, delta-MFCCs, delta-delta MFCCs | 1-D CNN | 90.60% |
| Li et al.40 | IMel Spectrogram, Mel Spectrogram | CNN-SSAE | 83.18% |
| Bhattacharya et al.41 | MFCC, Chroma, Tonnetz, Contrast, Mel Spectrogram | CNN | 90.86% |
| Khan et al.42 | - | DeepESN, Dilated CNN, Multi-Headed Attention Mechanism | 77.02% |
| Proposed Work | Mel Spectrogram | SCoNN | 90.63% |
| Proposed Work | MFCC | SCoNN | 91.51% |
| Proposed Work | Mel Spectrogram + MFCC | SCoNN | 93.30% |
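
The comparison in Table 14 can be checked with a few lines of Python. The sketch below copies the accuracy values from the table into two dictionaries and reports the best prior result and the margin by which the best proposed configuration exceeds it. The variable and function names are illustrative assumptions and do not come from the paper.

```python
# Accuracy values (%) copied from Table 14; all names here are illustrative only.
prior_work = {
    "Issa et al.": 71.61,
    "Alnuaim et al.": 81.00,
    "Andayani et al.": 75.62,
    "Kakuba et al.": 85.89,
    "Dolka et al.": 88.72,
    "Jahangir et al.": 90.60,
    "Li et al.": 83.18,
    "Bhattacharya et al.": 90.86,
    "Khan et al.": 77.02,
}
proposed = {
    "Mel Spectrogram": 90.63,
    "MFCC": 91.51,
    "Mel Spectrogram + MFCC": 93.30,
}


def margin_over_best_prior(prior, ours):
    """Return (best prior work, best proposed feature set, margin in percentage points)."""
    best_prior = max(prior, key=prior.get)
    best_ours = max(ours, key=ours.get)
    # Round to two decimals to avoid floating-point noise in the difference.
    margin = round(ours[best_ours] - prior[best_prior], 2)
    return best_prior, best_ours, margin


best_prior, best_ours, margin = margin_over_best_prior(prior_work, proposed)
print(f"{best_ours} exceeds {best_prior} by {margin} percentage points")
```

On these figures, the Mel Spectrogram-only configuration (90.63%) sits slightly below the best prior result (90.86%), while the MFCC-only and combined configurations exceed it.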