Table 4 The most stable features for predicting BvA and alogia from vocal acoustics.

From: Using machine learning of computerized vocal expression to measure blunted vocal affect and alogia

Feature name

How feature is computed

What feature means

Alogia

Unvoiced Segment Length: SD (StddevUnvoicedSegmentLength)

Standard deviation of the lengths of unvoiced segments

Captures the variability in pause length. This is potentially related to articulation rate and speech production, and conceptually critical to alogia.

Blunted vocal affect

Mel-Frequency Cepstral Coefficient 2: SD (mfcc2_sma3_stddevNorm)

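Variability over time (standard deviation normalized by the mean) of the second mel-frequency cepstral coefficient. MFCCs are computed frame by frame as the discrete cosine transform of the log energies in mel-spaced frequency bands.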

Captures variability over time in the global shape of the signal spectrum, using a short-term spectral representation on the nonlinear mel frequency scale. It broadly reflects global changes in the vocal tract, and the spectral-envelope information that MFCCs summarize is critical for speech recognition by humans and by automated systems. MFCC2 reflects finer spectral detail than MFCC1.

Harmonic Difference: H1 – A3 (logRelF0-H1-A3_sma3nz_amean)

Mean ratio of the energy of the first F0 harmonic (H1) to the energy of the strongest (highest-energy) harmonic in the third formant range (A3)

Ratio of the energy of the first F0 harmonic to that of the strongest harmonic near the third formant. It reflects the voice source (the vocal folds) rather than the vocal tract. A measure of "spectral tilt" (i.e., how steeply energy falls off from lower to higher frequencies), associated with breathy voice in men and with a lack of "creaky voice".

Both blunted vocal affect and alogia

Second Formant: M (F2frequency_sma3nz_amean)

Mean of the second formant (F2) frequency over voiced frames

Captures spectral shaping of the vocal signal: the average frequency of the second vocal-tract resonance produced by vowel articulation. The second formant typically reflects the front-to-back position of the tongue body.

  1. Acoustic features determined to be most stable using stability selection.
  2. BvA, blunted vocal affect.
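Each feature in the table is a summary statistic (a "functional") taken over frame-level measurements. The Python sketch below illustrates those summaries under stated assumptions; it is not the openSMILE implementation. It assumes per-frame values are already available (a voicing mask, an MFCC2 track, H1 and A3 harmonic amplitudes, an F2 track), a 10 ms frame step, population standard deviations, and our reading of the eGeMAPS name suffixes ("stddevNorm" as SD divided by the mean, "nz" as "non-zero, i.e., voiced frames only"). It skips the 3-frame moving-average smoothing implied by "sma3". All function names are hypothetical.

```python
import numpy as np

# Assumptions (not from the source table): per-frame values are already
# extracted, frames are 10 ms apart, and SDs are population SDs (ddof=0).
FRAME_STEP_S = 0.01


def run_lengths(voiced, want_voiced, step=FRAME_STEP_S):
    """Durations (s) of consecutive runs of frames whose voicing equals want_voiced."""
    lengths, run = [], 0
    for v in voiced:
        if bool(v) == want_voiced:
            run += 1
        elif run:
            lengths.append(run * step)
            run = 0
    if run:  # close a run that reaches the last frame
        lengths.append(run * step)
    return np.array(lengths)


def sd_unvoiced_segment_length(voiced, step=FRAME_STEP_S):
    """SD of unvoiced-segment durations (hypothetical analogue of StddevUnvoicedSegmentLength)."""
    # Leading and trailing unvoiced runs are counted here; openSMILE may differ.
    lengths = run_lengths(voiced, want_voiced=False, step=step)
    return float(lengths.std()) if lengths.size else 0.0


def stddev_norm(track):
    """Coefficient of variation, SD / mean (our reading of 'stddevNorm')."""
    track = np.asarray(track, dtype=float)
    return float(track.std() / track.mean())  # assumes a non-zero mean


def mean_nonzero(track):
    """Mean over non-zero (voiced) frames only (our reading of the 'nz' suffix)."""
    track = np.asarray(track, dtype=float)
    voiced = track[track != 0]
    return float(voiced.mean()) if voiced.size else 0.0


def h1_minus_a3_db(h1_amp, a3_amp):
    """Per-frame H1-A3 level difference in dB from harmonic amplitudes."""
    # 20*log10 of an amplitude ratio equals 10*log10 of the energy ratio.
    h1 = np.asarray(h1_amp, dtype=float)
    a3 = np.asarray(a3_amp, dtype=float)
    return 20.0 * np.log10(h1 / a3)


if __name__ == "__main__":
    mask = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]  # two pauses: 20 ms and 40 ms
    print(sd_unvoiced_segment_length(mask))  # approximately 0.01 s
    print(stddev_norm([2.0, 4.0]))  # SD 1.0 divided by mean 3.0
    print(mean_nonzero([0.0, 1500.0, 0.0, 1700.0]))  # mean F2 of voiced frames, 1600.0
    print(float(np.mean(h1_minus_a3_db([10.0, 100.0], [1.0, 1.0]))))  # about 30 dB
```

On real recordings these inputs would come from a feature extractor whose own voicing decisions and smoothing will make its values differ from this sketch; the point here is only what each functional summarizes.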