Table 4 Comparison of depression severity estimation performance between the proposed deep spectrotemporal network and existing state-of-the-art methods on the AVEC2013 and AVEC2014 datasets.
From: Deep spectrotemporal network based depression severity estimation from speech
Modules | AVEC2013 MAE | AVEC2013 RMSE | AVEC2014 MAE | AVEC2014 RMSE |
|---|---|---|---|---|
Baseline (2013) [17] | 10.35 | 14.12 | - | - |
Baseline (2014) [18] | - | - | 10.03 | 12.56 |
MHH+PLS (2013) [27] | 9.14 | 11.19 | - | - |
Fisher Vector (2014) [28] | - | - | 9.74 | 11.51 |
PCA+PLS (2017) [29] | - | - | 8.07 | 10.28 |
Deep CNN (2018) [19] | 8.2 | 10.00 | 8.19 | 9.99 |
CNN+LSTM+DNN (2019) [30] | 7.48 | 9.79 | 8.02 | 9.66 |
Hybrid Network (2020) [32] | 7.38 | 9.65 | 7.94 | 9.57 |
STA Network (2020) [35] | 7.14 | 9.50 | 7.65 | 9.13 |
SR+SER (2021) [36] | 7.316 | 8.730 | 6.795 | 8.822 |
MFCCs+Spe+ADTP (2022) [20] | - | - | 7.26 | 9.27 |
STN (2022) [16] | 6.70 | 8.16 | 6.95 | 8.46 |
WavDepressionNet (2023) [12] | 6.14 | 8.20 | 6.60 | 8.61 |
WavMHSANet (2024) [37] | 6.98 | 9.09 | 6.96 | 8.85 |
SpectrumFormer (2025) [13] | 6.09 | 8.12 | 6.36 | 8.31 |
DSDD (2025) [26] | 6.09 | 8.27 | 6.22 | 8.14 |
Proposed Spectrotemporal Network | 5.860 | 7.109 | 5.78 | 6.918 |
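
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how MAE and RMSE are conventionally computed from ground-truth and predicted severity scores. The function names and the sample scores are hypothetical and for illustration only; they are not taken from the paper or from the AVEC datasets. The last helper shows how a relative error reduction can be read off the table, using the AVEC2014 RMSE of DSDD (8.14) and of the proposed network (6.918).

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def relative_reduction(reference, proposed):
    """Fraction by which `proposed` lowers the error relative to `reference`."""
    return (reference - proposed) / reference


# Hypothetical severity scores, NOT real AVEC data.
y_true = [10, 25, 3, 40]
y_pred = [12, 20, 6, 34]

print(f"MAE  = {mae(y_true, y_pred):.3f}")
print(f"RMSE = {rmse(y_true, y_pred):.3f}")

# Values copied from the table: AVEC2014 RMSE, DSDD vs. the proposed network.
print(f"Relative RMSE reduction = {relative_reduction(8.14, 6.918):.1%}")
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, which is why the RMSE columns are always at least as large as the corresponding MAE columns.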