Figure 1

Schematic overview of the paper. The work has three phases: (a) pre-training for speaker embeddings using a large non-medical speech dataset collected from N different speakers, (b) depression analysis on longitudinal data using speaker embeddings extracted from the pre-trained models, and (c) depression detection and severity estimation using speaker embeddings extracted from the pre-trained models.