Table 15 Comparison of the accuracy of the proposed SCoNN model with that of existing approaches on the SAVEE dataset.
From: Stacked convolutional neural network for emotion recognition using multi feature speech analysis
Authors/Year | Features Used | Methodology | Accuracy |
|---|---|---|---|
Kakuba et al.23 | MFCC, Chromagram, Mel Spectrogram | ABMD | 93.75% |
Dolka et al.14 | MFCC | ANN | 86.80% |
Li et al.40 | Mel Spectrogram, Imel Spectrogram | CNN-SSAE | 88.96% |
Jahangir et al.24 | Spectral contrast, tonnetz, MFCCs, delta-MFCCs, delta-delta MFCCs, and chromagram | 1-D CNN | 93.75% |
Singh et al.43 | MFCC, pitch, ZCR, RMS | SVM | 77.38% |
Mishra et al.44 | MRVMMFCC, MRVMAE, MRVMPE | DNN | 83.40% |
Mountzouris et al.45 | MFCC | CNN + ATN | 74.00% |
Saeed et al.46 | MFCC, Mel Spectrogram, Chroma, Poly Feature | DNN | 90.00% |
Liu et al.47 | MFCC, Chroma frequency, ZCR, MFCC, Chroma, Mel Spectrogram, Spectral Centroid, Spectral Contrast | CNN-A-LSTM | 94.50% |
Li et al.48 | Log Mel Spectrogram | DeepCNN | 92.97% |
Proposed Work | Mel Spectrogram | SCoNN | 94.76% |
Proposed Work | MFCC | SCoNN | 91.43% |
Proposed Work | Combined | SCoNN | 95.00% |