Table 4 Comparison of depression severity estimation performance between the proposed deep spectrotemporal network and existing state-of-the-art methods on the AVEC2013 and AVEC2014 datasets.

From: Deep spectrotemporal network based depression severity estimation from speech

| Method | AVEC2013 MAE | AVEC2013 RMSE | AVEC2014 MAE | AVEC2014 RMSE |
|---|---|---|---|---|
| Baseline (2013) [17] | 10.35 | 14.12 | - | - |
| Baseline (2014) [18] | - | - | 10.03 | 12.56 |
| MHH+PLS (2013) [27] | 9.14 | 11.19 | - | - |
| Fisher Vector (2014) [28] | - | - | 9.74 | 11.51 |
| PCA+PLS (2017) [29] | - | - | 8.07 | 10.28 |
| Deep CNN (2018) [19] | 8.2 | 10.00 | 8.19 | 9.99 |
| CNN+LSTM+DNN (2019) [30] | 7.48 | 9.79 | 8.02 | 9.66 |
| Hybrid Network (2020) [32] | 7.38 | 9.65 | 7.94 | 9.57 |
| STA Network (2020) [35] | 7.14 | 9.50 | 7.65 | 9.13 |
| SR+SER (2021) [36] | 7.316 | 8.730 | 6.795 | 8.822 |
| MFCCs+Spe+ADTP (2022) [20] | - | - | 7.26 | 9.27 |
| STN (2022) [16] | 6.70 | 8.16 | 6.95 | 8.46 |
| WavDepressionNet (2023) [12] | 6.14 | 8.20 | 6.60 | 8.61 |
| WavMHSANet (2024) [37] | 6.98 | 9.09 | 6.96 | 8.85 |
| SpectrumFormer (2025) [13] | 6.09 | 8.12 | 6.36 | 8.31 |
| DSDD (2025) [26] | 6.09 | 8.27 | 6.22 | 8.14 |
| Proposed Spectrotemporal Network | 5.860 | 7.109 | 5.78 | 6.918 |
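
The table reports two error metrics, MAE and RMSE, where lower values indicate better severity estimates. As a reminder of how these are conventionally defined, the following is a minimal sketch and not the authors' evaluation code. The function names and the sample scores are hypothetical, chosen only for illustration, and it assumes one ground-truth severity score and one predicted score per recording.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average of |prediction - label|."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_sq)


# Hypothetical severity scores, for illustration only (not from the paper).
labels = [10, 20, 30, 5]
preds = [12, 17, 30, 9]

print(f"MAE  = {mae(labels, preds):.3f}")
print(f"RMSE = {rmse(labels, preds):.3f}")
```

Because RMSE squares each error before averaging, it penalizes large individual misses more heavily than MAE, which is why RMSE is never smaller than MAE in any row of the table.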