Table 4 Classification accuracy (%) of the deep learning models on the RAVDESS dataset.
From: Stacked convolutional neural network for emotion recognition using multi feature speech analysis
| Feature Sets | CNN + LSTM | CNN | LSTM | RNN + LSTM |
|---|---|---|---|---|
| Combined (196) | 82.21 | 88.69 | 87.72 | 86.31 |
| MFCC (40) | 80.36 | 81.02 | 90.48 | 90.84 |
| LPC (13) | 77.68 | 73.36 | 60.79 | 59.67 |
| Mel Spectrogram (128) | 81.10 | 81.02 | 79.09 | 79.02 |
| Chroma & Others (15) | 33.63 | 32.07 | 30.88 | 29.91 |