Table 14 Comparison of the accuracy of the proposed SCoNN model with that of existing approaches for the RAVDESS dataset.
From: Stacked convolutional neural network for emotion recognition using multi feature speech analysis
Authors/Year | Features Used | Methodology | Accuracy |
|---|---|---|---|
Issa et al.16 | Mel Spectrogram, Chromagram, Spectral Contrast, MFCC, Tonnetz | 1-D CNN | 71.61% |
Alnuaim et al.20 | MFCC, STFT, Mel Spectrogram | MLP | 81.00% |
Andayani et al.21 | MFCC | LSTM-Transformer | 75.62% |
Kakuba et al.23 | MFCC, Chromagram, Mel Spectrogram | ABMD | 85.89% |
Dolka et al.14 | MFCC | ANN | 88.72% |
Jahangir et al.24 | Spectral Contrast, Tonnetz, MFCCs, delta-MFCCs, delta-delta MFCCs | 1-D CNN | 90.60% |
Li et al.40 | IMel Spectrogram, Mel Spectrogram | CNN-SSAE | 83.18% |
Bhattacharya et al.41 | MFCC, Chroma, Tonnetz, Contrast, Mel Spectrogram | CNN | 90.86% |
Khan et al.42 | - | DeepESN, Dilated CNN, Multi-Headed Attention Mechanism | 77.02% |
Proposed Work | Mel Spectrogram | SCoNN | 90.63% |
Proposed Work | MFCC | SCoNN | 91.51% |
Proposed Work | Mel Spectrogram + MFCC | SCoNN | 93.30% |
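
The comparison in the table can be made explicit by computing how far each SCoNN configuration sits from the best previously reported accuracy. The snippet below is only an illustrative sketch: the accuracy values are copied from the table, while the dictionary names and the helper `margin_over_best` are hypothetical and not part of the original work.

```python
# Accuracies (%) on RAVDESS, copied from Table 14.
prior_work = {
    "Issa et al.": 71.61,
    "Alnuaim et al.": 81.00,
    "Andayani et al.": 75.62,
    "Kakuba et al.": 85.89,
    "Dolka et al.": 88.72,
    "Jahangir et al.": 90.60,
    "Li et al.": 83.18,
    "Bhattacharya et al.": 90.86,
    "Khan et al.": 77.02,
}

# SCoNN results from the same table, keyed by the features used.
proposed = {
    "Mel Spectrogram": 90.63,
    "MFCC": 91.51,
    "Mel Spectrogram + MFCC": 93.30,
}


def margin_over_best(prior, ours):
    """Return the best prior approach and each configuration's margin over it.

    Margins are in percentage points; a negative value means the
    configuration is below the best prior result.
    """
    best_name = max(prior, key=prior.get)
    best_acc = prior[best_name]
    margins = {feat: round(acc - best_acc, 2) for feat, acc in ours.items()}
    return best_name, margins


best_name, margins = margin_over_best(prior_work, proposed)
print(f"Best prior approach: {best_name} ({prior_work[best_name]:.2f}%)")
for feat, delta in margins.items():
    print(f"SCoNN with {feat}: {delta:+.2f} percentage points")
```

On these figures the combined Mel Spectrogram + MFCC configuration is the only one that clearly exceeds every listed approach; the Mel Spectrogram configuration alone falls marginally below the highest prior result.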