Table 8 An overview of acoustic features. For more details, see the cooperative voice analysis repository (COVAREP).

From: A systematic review on automated clinical depression diagnosis

Acoustic feature	Description
Source features	Features reflecting airflow from the lungs through the glottis (i.e., glottal features) or vocal fold vibrations (i.e., voice quality features), which is the sound source later filtered by the vocal tract following the source-filter theory of speech production.
Jitter (%)	Deviations in the consecutive lengths of the f₀ period, which suggests irregular and uneven vocal fold vibrations.
Shimmer (%)	The variation in the peak amplitudes of consecutive f₀ periods, which implies unevenness in voice loudness.
Tremor (Hz)	Frequency of the strongest low-frequency component modulating the fundamental frequency within a specified analysis range.
Harmonics-to-noise ratio (HNR) (dB)	Ratio between f₀ and noise components, which indirectly correlates with perceived aspiration.
Frequency disturbance ratio (FDR) (%)	The relative mean frequency variation computed over windows of five consecutive cycles (i.e., a five-point moving average).
Amplitude disturbance ratio (ADR) (%)	Relative mean amplitude value over a set of windows.
Quasi-open quotient (QOQ)	Ratio of the time the vocal folds are open to the duration of the glottal cycle. Functional dysphonias often reduce the QOQ range.
Normalized amplitude quotient (NAQ)	A measure that compares the amplitude between the highest and lowest points of the differentiated flow glottogram with the amplitude of the negative peak, normalized with respect to the period length. It can be used as an approximation of glottal adduction.
Peak slope	Slope of the regression line that is fit to log10 of the maxima of each frame.
Filter features	The resonant properties of the vocal and nasal tracts filter the sound source from the vocal folds: the filter attenuates certain frequencies and strengthens others by the shape of the vocal and nasal tracts.
F₁ mean (Hz)	First peak in the spectrum that results from a resonance of the human vocal tract.
F₂ mean (Hz)	Second peak in the spectrum that results from a resonance of the human vocal tract.
F₁ variability (Hz)	Measures of dispersion of F₁ (variance, standard deviation).
F₂ variability (Hz)	Measures of dispersion of F₂ (variance, standard deviation).
F₁ range (Hz)	Difference between the lowest and highest F₁ values.
Vowel space	F₁ and F₂ 2D space for the vowels.
Linear predictive coding (LPC) coefficients	Coefficients that best predict the value of the audio signal at the next time point from its values at the previous n time points; they are used to reconstruct filter properties.
Spectral features	Features characterizing the frequency distribution of the speech signal at a particular moment in time.
Mel-frequency cepstral coefficients (MFCCs)	The coefficients derived by analyzing the Mel-spectrum of the log-magnitude of an audio segment.
Prosodic features	Changes over longer segments of time, which are perceived in the rhythm, stress, and intonation of speech.
f₀ mean (Hz)	Fundamental frequency: lowest frequency of the speech signal, perceived as pitch (mean, median).
f₀ variability (Hz)	Measures of dispersion of f₀ (variance, standard deviation).
f₀ range (Hz)	Difference between the lowest and highest f₀.
Intensity (dB)	Acoustic intensity, i.e., the power carried by sound per unit area in a direction perpendicular to that area, expressed in decibels relative to a reference value; perceived as loudness.
Intensity variability (dB)	Measures of dispersion of intensity (variance, standard deviation).
Energy velocity	Mean-squared central difference across frames, which may correlate with motor coordination.
Maximum phonation time (s)	The longest time for which phonation of a vowel can be sustained in an upright position, after a deep breath, at a comfortable pitch and loudness; the mean of three attempts is taken.
Speech rate	Number of speech utterances per second over the duration of the speech sample (including pauses).
Articulation rate	Number of speech units per second throughout the speech sample (excluding pauses).
Time talking (s)	Sum of the duration of all speech segments.
Utterance duration mean (s)	Mean duration of utterances.
Pause duration mean (s)	Mean duration of pauses.
Pause variability (s)	Measures of dispersion of pause duration (variance, standard deviation).
Pause rate	Total length of pauses divided by the total length of speech (including pauses); as a ratio of two durations, it is dimensionless.
Pause total (s)	Total duration of pauses.
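
Several of the definitions above reduce to simple arithmetic once cycle lengths, peak amplitudes, or speech segment boundaries have been extracted. The following is a minimal Python sketch under stated assumptions, not the method used by COVAREP or by the reviewed studies. The function names and input formats are hypothetical. Jitter and shimmer are computed here in the common "local" form (mean absolute difference between consecutive cycles relative to the mean), and pauses are assumed to be the gaps between consecutive speech segments.

```python
from statistics import mean, pstdev


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive values, relative to
    their mean, in percent.

    Pass f0 period lengths for a jitter-like measure, or per-period peak
    amplitudes for a shimmer-like measure (assumed "local" variant).
    """
    if len(values) < 2:
        raise ValueError("need at least two consecutive cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def timing_features(segments):
    """Timing measures from sorted, non-overlapping (start, end) speech
    segments given in seconds.

    Assumption: a pause is the gap between two consecutive segments, so
    silence before the first and after the last segment is ignored.
    """
    if not segments:
        raise ValueError("need at least one speech segment")
    speech = [end - start for start, end in segments]
    pauses = [
        next_start - end
        for (_, end), (next_start, _) in zip(segments, segments[1:])
    ]
    time_talking = sum(speech)
    pause_total = sum(pauses)
    return {
        "time_talking": time_talking,
        "utterance_duration_mean": mean(speech),
        "pause_total": pause_total,
        "pause_duration_mean": mean(pauses) if pauses else 0.0,
        # Population standard deviation as one possible dispersion measure.
        "pause_variability_sd": pstdev(pauses) if pauses else 0.0,
        # Pauses divided by total speech length including pauses.
        "pause_rate": pause_total / (time_talking + pause_total),
    }


if __name__ == "__main__":
    periods = [0.0100, 0.0102, 0.0099, 0.0101]  # illustrative f0 periods (s)
    print("jitter (%):", local_perturbation_percent(periods))
    print(timing_features([(0.0, 1.0), (1.5, 2.5), (3.5, 4.0)]))
```

Other variants exist (for example, perturbation quotients averaged over several cycles, as in the FDR row), so the exact numbers depend on which definition a given toolkit implements.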
