Table 3 Description of selected data augmentation techniques commonly used in SELD
From: Environmental acoustic intelligence through sound event localization and detection: a review
| Method | Description |
|---|---|
| SpecAugment [102] | Randomly masks blocks along the time and/or frequency dimensions of a spectrogram, akin to time-frequency masking or cutout [103], to improve model robustness against partial data loss. |
| ACS (audio channel swapping) [74] | Applies directional transformations to multi-channel recordings (FOA or MIC) by swapping or sign-flipping channels, effectively simulating up to eight different DOA configurations while preserving reverberation characteristics. |
| MCS [74] | Synthesizes new multi-channel audio by combining beamformed spectral features with spatial cues derived from covariance analysis, simulating novel spatial audio scenarios. |
| Mixup [109] | Blends two monophonic audio waveforms and their labels. It is generally used to create synthetic polyphonic mixtures, improving robustness in densely polyphonic acoustic scenes. |
| FilterAugment [104] | Simulates the spectral coloration imparted by diverse acoustic environments by applying random gain offsets to randomly sized frequency bands. |
| Frequency Shifting [51] | Shifts a randomly selected frequency band upward or downward, altering the spectral distribution while largely preserving the overall structure of the spectrogram. |
| SpatialMixup [106] | Applies selective directional gains to multi-channel audio signals to simulate variations in directional loudness and suppression, increasing the spatial diversity of the data. |
| SpecMix [101] | Mixes two spectral feature representations (e.g., spectrograms) to form a new mixture, aiming to retain crucial frequency content from both sources. |
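To make the masking idea behind SpecAugment concrete, the following is a minimal NumPy sketch of time/frequency masking; the function name, mask counts, and width limits are illustrative defaults, not values from the cited paper.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, num_time_masks=2,
                 max_freq_width=8, max_time_width=16, rng=None):
    """Zero out random frequency bands and time-frame blocks of a
    spectrogram (shape: freq_bins x time_frames), simulating partial
    data loss. Illustrative sketch, not the reference implementation."""
    rng = np.random.default_rng(rng)
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(num_freq_masks):
        w = int(rng.integers(0, max_freq_width + 1))
        f0 = int(rng.integers(0, max(1, n_freq - w + 1)))
        out[f0:f0 + w, :] = 0.0   # mask a horizontal (frequency) band
    for _ in range(num_time_masks):
        w = int(rng.integers(0, max_time_width + 1))
        t0 = int(rng.integers(0, max(1, n_time - w + 1)))
        out[:, t0:t0 + w] = 0.0   # mask a vertical (time) block
    return out
```

Because masks are applied to the feature representation, the same function works for log-mel or linear spectrograms alike.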
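For ACS on FOA input, two of the directional transformations can be sketched as channel sign flips, assuming ACN channel ordering (W, Y, Z, X): negating the Y channel reflects the azimuth and negating the Z channel reflects the elevation, with the DOA labels updated to match. The full scheme combines such reflections with channel swaps to reach eight configurations; the function name and label conventions below are illustrative.

```python
import numpy as np

def foa_reflect(audio, azi, ele, flip_azimuth=False, flip_elevation=False):
    """Reflect an FOA recording (channels ordered W, Y, Z, X) and its
    DOA labels (degrees). Y ~ sin(azi)cos(ele) and Z ~ sin(ele), so
    each reflection reduces to a single channel sign flip. Sketch of a
    subset of the ACS transformations, not the full eight-way scheme."""
    out = audio.copy()
    if flip_azimuth:        # azi -> -azi: only Y changes sign
        out[1] *= -1.0
        azi = -azi
    if flip_elevation:      # ele -> -ele: only Z changes sign
        out[2] *= -1.0
        ele = -ele
    return out, azi, ele
```

Since reflections only permute or negate whole channels, the room's reverberation pattern is preserved, which is the key appeal of this family of augmentations.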
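Mixup reduces to a convex combination of two waveforms and their label vectors, as in this minimal sketch; the beta-distribution parameter `alpha` and the function signature are illustrative assumptions.

```python
import numpy as np

def mixup(wave_a, label_a, wave_b, label_b, alpha=0.2, rng=None):
    """Blend two waveforms and their multi-hot label vectors with a
    mixing weight drawn from Beta(alpha, alpha). Illustrative sketch."""
    rng = np.random.default_rng(rng)
    lam = rng.beta(alpha, alpha)
    wave = lam * wave_a + (1.0 - lam) * wave_b
    label = lam * label_a + (1.0 - lam) * label_b
    return wave, label
```

When the two sources carry different event classes, the blended label is soft and polyphonic, which is what makes Mixup useful for densely overlapping scenes.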
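FilterAugment's band-wise gain idea can likewise be sketched in a few lines: the spectrogram's frequency axis is cut at random band edges and each band is scaled by a random gain, mimicking different room/microphone frequency responses. The band-count range and gain limit below are illustrative, not the paper's settings.

```python
import numpy as np

def filter_augment(spec, n_bands=(2, 5), gain_db=6.0, rng=None):
    """Apply a random per-band gain (in dB) to a spectrogram of shape
    (freq_bins, time_frames). Illustrative sketch of FilterAugment."""
    rng = np.random.default_rng(rng)
    n_freq = spec.shape[0]
    n = int(rng.integers(n_bands[0], n_bands[1] + 1))
    # pick n - 1 interior band edges, giving n contiguous bands
    edges = np.sort(rng.choice(np.arange(1, n_freq), size=n - 1,
                               replace=False))
    edges = np.concatenate(([0], edges, [n_freq]))
    out = spec.copy()
    for lo, hi in zip(edges[:-1], edges[1:]):
        gain = 10.0 ** (rng.uniform(-gain_db, gain_db) / 20.0)
        out[lo:hi, :] *= gain   # same gain for the whole band
    return out
```

Unlike masking, every time-frequency bin survives; only the spectral balance changes, so the augmentation is gentler on rare events.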