Table 3 Description of selected data augmentation techniques commonly used in SELD

From: Environmental acoustic intelligence through sound event localization and detection: a review

Method

Description

SpecAugment [102]

Randomly masks blocks along the time and/or frequency dimensions of a spectrogram, akin to time-frequency masking or cutout [103], to improve model robustness against partial data loss.
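The masking step can be sketched in a few lines of NumPy. This is a minimal illustration of the time/frequency masking idea, not the reference implementation; the function name and mask-width parameters are illustrative:

```python
import numpy as np

def spec_augment(spec, rng, n_freq_masks=1, n_time_masks=1,
                 max_freq_width=8, max_time_width=20):
    """Zero out random frequency and time stripes of a (freq, time) spectrogram."""
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(n_freq_masks):
        w = int(rng.integers(0, max_freq_width + 1))
        f0 = int(rng.integers(0, max(1, n_freq - w + 1)))
        out[f0:f0 + w, :] = 0.0          # frequency mask
    for _ in range(n_time_masks):
        w = int(rng.integers(0, max_time_width + 1))
        t0 = int(rng.integers(0, max(1, n_time - w + 1)))
        out[:, t0:t0 + w] = 0.0          # time mask
    return out

rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 100))
aug = spec_augment(spec, rng)
```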

ACS (audio channel swapping) [74]

Applies directional transformations to multi-channel recordings (FOA or MIC) by swapping or rotating channels, effectively simulating up to eight different DOA configurations while preserving reverberation characteristics.
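For FOA input, one of these directional transformations is a 90° azimuth rotation, realized purely by channel swaps and sign flips. A minimal sketch for ACN-ordered FOA (W, Y, Z, X); the function name is illustrative, and the full set of eight configurations would also include reflections:

```python
import numpy as np

def foa_rotate_90(foa, k):
    """Rotate an ACN-ordered FOA signal (W, Y, Z, X) about the vertical axis
    by k * 90 degrees. Azimuth + 90 deg maps (X, Y) -> (-Y, X); W and Z are
    rotation-invariant, so reverberation structure is preserved."""
    w, y, z, x = foa
    for _ in range(k % 4):
        x, y = -y, x
    return np.stack([w, y, z, x])

rng = np.random.default_rng(7)
foa = rng.standard_normal((4, 16000))
rotated = foa_rotate_90(foa, 1)
```

Combining the four rotations with an azimuth reflection (Y -> -Y) yields the eight DOA configurations mentioned above.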

MCS (multi-channel simulation) [74]

Synthesizes new multi-channel audio by combining beamformed spectral features with spatial cues derived from covariance analysis, simulating novel spatial audio scenarios.
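A loose toy illustration of the idea, under strong simplifying assumptions (not the cited method): the "beamformed spectral features" are approximated by a channel average of one recording's STFT, and the "spatial cue" by the principal eigenvector of another recording's per-bin spatial covariance:

```python
import numpy as np

def mcs_toy(stft_a, stft_b):
    """Toy sketch: recolor recording A's beamformed spectrum with the dominant
    spatial cue of recording B. Inputs are complex STFTs of shape
    (channels, freq, time); the beamformer is a plain channel average."""
    beam = stft_a.mean(axis=0)                    # (freq, time) reference spectrum
    c, f, t = stft_b.shape
    out = np.empty_like(stft_b)
    for fi in range(f):
        x = stft_b[:, fi, :]                      # (channels, time) at this bin
        cov = x @ x.conj().T / t                  # spatial covariance matrix
        _, vecs = np.linalg.eigh(cov)
        v = vecs[:, -1]                           # principal spatial direction
        out[:, fi, :] = v[:, None] * beam[fi, :]  # impose B's cue on A's spectrum
    return out

rng = np.random.default_rng(1)
stft_a = rng.standard_normal((4, 8, 10)) + 1j * rng.standard_normal((4, 8, 10))
stft_b = rng.standard_normal((4, 8, 10)) + 1j * rng.standard_normal((4, 8, 10))
aug = mcs_toy(stft_a, stft_b)
```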

Mixup [109]

Linearly blends two monophonic audio waveforms and their labels using a random mixing coefficient. Commonly used to create synthetic polyphonic mixtures that improve robustness in densely polyphonic acoustic scenes.
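The blend is a convex combination with a coefficient drawn from a Beta distribution, applied identically to waveforms and labels. A minimal sketch (the `alpha` value is illustrative):

```python
import numpy as np

def mixup(x1, y1, x2, y2, rng, alpha=0.2):
    """Convex combination of two waveforms and their multi-hot label vectors."""
    lam = rng.beta(alpha, alpha)                  # mixing coefficient in (0, 1)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

rng = np.random.default_rng(6)
x1, x2 = rng.standard_normal((2, 16000))          # two 1 s mono clips at 16 kHz
y1 = np.array([1.0, 0.0])                         # class labels of each clip
y2 = np.array([0.0, 1.0])
mixed_x, mixed_y = mixup(x1, y1, x2, y2, rng)
```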

FilterAugment [104]

Simulates spectral coloration imparted by diverse acoustic environments by applying random gain offsets to randomly sized frequency bands.
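A minimal sketch of the band-wise gain idea, assuming a log-magnitude (dB) spectrogram; band-count and gain ranges are illustrative parameters, not the published defaults:

```python
import numpy as np

def filter_augment(spec_db, rng, n_bands=(3, 6), max_gain_db=6.0):
    """Apply a random piecewise-constant gain (in dB) over randomly sized
    frequency bands, mimicking varied acoustic coloration."""
    out = spec_db.copy()
    n_freq = out.shape[0]
    n = int(rng.integers(n_bands[0], n_bands[1] + 1))
    # Random band edges partitioning the full frequency axis.
    edges = np.sort(rng.choice(np.arange(1, n_freq), size=n - 1, replace=False))
    edges = np.concatenate(([0], edges, [n_freq]))
    for lo, hi in zip(edges[:-1], edges[1:]):
        out[lo:hi, :] += rng.uniform(-max_gain_db, max_gain_db)
    return out

rng = np.random.default_rng(5)
spec_db = rng.standard_normal((64, 100))
aug = filter_augment(spec_db, rng)
```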

Frequency Shifting [51]

Shifts a randomly selected frequency band upward or downward along the frequency axis, altering the spectral distribution while largely preserving the overall structure of the spectrogram.
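A simplified sketch that shifts the whole spectrogram by a random number of bins and zero-pads the bins that roll in (a band-limited variant would restrict the shift to a sub-band); the shift range is an illustrative parameter:

```python
import numpy as np

def freq_shift(spec, rng, max_shift=10):
    """Shift a (freq, time) spectrogram up or down by a random number of bins,
    zero-padding the rows that roll in at the edge."""
    s = int(rng.integers(-max_shift, max_shift + 1))
    out = np.zeros_like(spec)
    if s > 0:
        out[s:, :] = spec[:-s, :]     # shift upward
    elif s < 0:
        out[:s, :] = spec[-s:, :]     # shift downward
    else:
        out[:] = spec
    return out

rng = np.random.default_rng(4)
spec = rng.standard_normal((64, 100))
shifted = freq_shift(spec, rng)
```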

SpatialMixup [106]

Applies selective directional gains to multi-channel audio signals to simulate variations in directional loudness and suppression, increasing the spatial diversity of the data.
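A toy illustration of a directional gain, under simplifying assumptions (not the published transform): for ACN-ordered FOA, the component arriving from a random direction is extracted with first-order steering coefficients and boosted or attenuated by a random gain:

```python
import numpy as np

def spatial_mixup_toy(foa, rng, max_gain_db=6.0):
    """Toy sketch: boost or attenuate a random direction in an ACN-ordered
    FOA signal (W, Y, Z, X) to vary directional loudness."""
    az = rng.uniform(-np.pi, np.pi)
    el = rng.uniform(-np.pi / 2, np.pi / 2)
    # First-order steering coefficients toward (az, el).
    d = np.array([1.0, np.sin(az) * np.cos(el), np.sin(el),
                  np.cos(az) * np.cos(el)])
    g = 10.0 ** (rng.uniform(-max_gain_db, max_gain_db) / 20.0) - 1.0
    beam = (d @ foa) / (d @ d)          # field component along direction d
    return foa + g * np.outer(d, beam)  # re-inject it with a random gain

rng = np.random.default_rng(2)
foa = rng.standard_normal((4, 16000))
aug = spatial_mixup_toy(foa, rng)
```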

SpecMix [101]

Mixes two spectral feature representations (e.g., spectrograms) to form a new mixture, aiming to retain crucial frequency content from both sources.
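A minimal CutMix-style sketch of the idea: random frequency and time bands of one spectrogram are replaced with the other's, and labels are mixed by the replaced-area fraction. Band widths and function names are illustrative:

```python
import numpy as np

def spec_mix(spec1, y1, spec2, y2, rng, max_width=16):
    """Copy random frequency and time bands from spec2 into spec1 and mix
    labels in proportion to the area kept from each source."""
    out = spec1.copy()
    n_freq, n_time = out.shape
    mask = np.zeros((n_freq, n_time), dtype=bool)
    f0, fw = int(rng.integers(0, n_freq)), int(rng.integers(0, max_width + 1))
    t0, tw = int(rng.integers(0, n_time)), int(rng.integers(0, max_width + 1))
    mask[f0:f0 + fw, :] = True          # frequency band taken from spec2
    mask[:, t0:t0 + tw] = True          # time band taken from spec2
    out[mask] = spec2[mask]
    lam = 1.0 - mask.mean()             # fraction kept from sample 1
    return out, lam * y1 + (1.0 - lam) * y2

rng = np.random.default_rng(3)
spec1 = rng.standard_normal((64, 100))
spec2 = rng.standard_normal((64, 100))
mixed, label = spec_mix(spec1, 0.0, spec2, 1.0, rng)
```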