Fig. 3: Audio files undergo a series of processes to identify acoustic features.

Zoom allows the audio from each speaker to be saved to a separate file, here labeled Recording file 1 and Recording file 2. These files are then renamed S1 and S2 according to speaking order, with S1 designating the first speaker and S2 the second. During a pre-processing step, a step function is applied to identify valid speech segments. The resulting recordings are used to extract two types of acoustic features: low-level descriptors (LLDs) and higher-level ‘functional’ features, the latter of which represent global properties of a participant’s acoustic signal.
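The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes frame-level RMS energy as an example LLD, a simple energy threshold as the step function for marking valid speech, and mean/standard deviation as example functionals; the frame length, sampling rate, and threshold value are all hypothetical.

```python
import math
from statistics import mean, pstdev

FRAME_LEN = 400          # samples per frame (25 ms at 16 kHz; assumed)
ENERGY_THRESHOLD = 0.01  # step-function cutoff for valid speech (assumed)

def frame_rms(samples, frame_len=FRAME_LEN):
    """Example LLD: RMS energy of each non-overlapping frame."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def valid_frames(energies, threshold=ENERGY_THRESHOLD):
    """Step function: a frame counts as valid speech iff its energy
    exceeds the threshold; sub-threshold frames are discarded."""
    return [e for e in energies if e > threshold]

def functionals(lld_values):
    """Example 'functional' features: global summaries of an LLD
    contour over the whole recording."""
    return {"mean": mean(lld_values), "std": pstdev(lld_values)}

# Hypothetical S1 recording: 50 ms of silence followed by 50 ms of tone.
s1 = [0.0] * 800 + [0.5 * math.sin(2 * math.pi * 440 * n / 16000)
                    for n in range(800)]
energies = frame_rms(s1)          # LLD contour, one value per frame
speech = valid_frames(energies)   # pre-processing: silent frames dropped
summary = functionals(speech)     # global features for this speaker
```

In practice, toolkits such as openSMILE compute far richer LLD and functional sets, but the control flow (frame-level descriptors, validity filtering, then global summarization) follows the same shape.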