Table 1 Performance contribution of individual modules within the proposed framework.

From: Deep spectrotemporal network based depression severity estimation from speech

| Modules | AVEC2013 MAE | AVEC2013 RMSE | AVEC2014 MAE | AVEC2014 RMSE |
| --- | --- | --- | --- | --- |
| Holistic features Based Spectral Stream with Transformer | 7.08 | 8.48 | 6.81 | 8.16 |
| Multi-patch local features Based Spectral Stream with Transformer | 7.31 | 8.78 | 7.09 | 8.67 |
| Proposed Spectral Stream with Transformer | 6.91 | 8.08 | 6.41 | 7.95 |
| VLNEP Based Temporal Stream with Transformer | 6.57 | 7.76 | 6.21 | 7.62 |
| Proposed Spectrotemporal Network | 5.860 | 7.109 | 5.78 | 6.918 |
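
The table reports two error metrics, MAE (mean absolute error) and RMSE (root mean squared error), where lower values indicate better agreement between predicted and reference depression severity scores. As a minimal sketch of how these two metrics are conventionally computed, the following uses their standard definitions; the function names and the sample scores are illustrative assumptions and do not come from the paper or from the AVEC datasets.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |reference - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical reference and predicted severity scores (illustration only).
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 37]

print(f"MAE:  {mae(y_true, y_pred):.3f}")
print(f"RMSE: {rmse(y_true, y_pred):.3f}")
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, which is consistent with every RMSE value in the table being larger than the corresponding MAE.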