Table 6 Comparison with existing work.
From: Speech emotion recognition with light weight deep neural ensemble model using hand crafted features
| Author | Technique | Features | Datasets | Accuracy (%) |
|---|---|---|---|---|
| Akinpelu et al. [57] | VGGNet | MFCC | RAVDESS | 86.25 |
| | | | TESS | 100 |
| | | | EmoDB | 96 |
| Ottoni et al. [58] | Meta-Learning | MFCC, RMSE, ZCR | RAVDESS | 97.01 |
| | | | SAVEE | 90.62 |
| | | | TESS | 100.00 |
| | | | CREMA-D | 83.28 |
| Jothimani et al. [59] | CNN1D | MFCC, RMSE, ZCR | RAVDESS | 92.60 |
| | | | SAVEE | 84.90 |
| | | | TESS | 99.60 |
| | | | CREMA-D | 89.90 |
| Jiang et al. [60] | Parallelized CRNN | Log Mel Spectrogram, Frame Level Features | EmoDB | 84.53 |
| | | | SAVEE | 59.40 |
| Mustaqeem et al. [69] | Bi-LSTM | Spatial Features | EmoDB | 85.57 |
| | | | RAVDESS | 77.02 |
| Wen et al. [64] | Transfer Learning | Log Mel Spectrogram | EmoDB | 84.14 |
| | | | SAVEE | 52.09 |
| Guizzo et al. [65] | Quaternion CNN | Real-valued spectrograms | EmoDB | 73.00 |
| | | | RAVDESS | 55.15 |
| | | | TESS | 99.76 |
| Meng et al. [66] | Bi-LSTM | 3-D Log-Mel spectrums | EmoDB | 84.99 |
| Kwon [67] | CNN | Spatial Features | EmoDB | 90.01 |
| Krishnan et al. [68] | LDA | Entropy Feature | TESS | 93.30 |
| Proposed method | Averaging ensemble | MFCC, RMSE, ZCR, Chroma | RAVDESS | 97.57 |
| | | | SAVEE | 98.43 |
| | | | TESS | 100 |
| | | | CREMA-D | 98.66 |
| | | | EmoDB | 98.60 |