Table 1 Performance contribution of individual modules within the proposed framework.
From: Deep spectrotemporal network based depression severity estimation from speech
Modules | AVEC2013 MAE | AVEC2013 RMSE | AVEC2014 MAE | AVEC2014 RMSE |
|---|---|---|---|---|
Holistic Features Based Spectral Stream with Transformer | 7.08 | 8.48 | 6.81 | 8.16 |
Multi-patch Local Features Based Spectral Stream with Transformer | 7.31 | 8.78 | 7.09 | 8.67 |
Proposed Spectral Stream with Transformer | 6.91 | 8.08 | 6.41 | 7.95 |
VLNEP Based Temporal Stream with Transformer | 6.57 | 7.76 | 6.21 | 7.62 |
Proposed Spectrotemporal Network | 5.860 | 7.109 | 5.78 | 6.918 |
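
The MAE and RMSE columns above are the usual mean absolute error and root mean square error between predicted and reference depression scores. As a minimal sketch of how such values are computed (the function names `mae` and `rmse` and the sample scores are hypothetical and are not taken from the paper or from the AVEC data):

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical reference and predicted scores, for illustration only.
true_scores = [10, 22, 5, 31]
predicted_scores = [14, 19, 9, 25]

print(f"MAE  = {mae(true_scores, predicted_scores):.2f}")
print(f"RMSE = {rmse(true_scores, predicted_scores):.2f}")
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, which is why the RMSE entries in the table are consistently higher than the corresponding MAE entries.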