Table 3 Description of selected data augmentation techniques commonly used in SELD

From: Environmental acoustic intelligence through sound event localization and detection: a review

Method

Description

SpecAugment [102]

Randomly masks blocks along the time and/or frequency dimensions of a spectrogram, akin to time-frequency masking or cutout [103], to improve model robustness against partial data loss.
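The masking step can be sketched in a few lines of NumPy. This is a minimal illustration of the time/frequency masking idea, not the reference implementation; the function name and mask-width parameters are illustrative:

```python
import numpy as np

def spec_augment(spec, rng, n_freq_masks=1, n_time_masks=1,
                 max_freq_width=8, max_time_width=20):
    """Zero out random frequency and time stripes of a (freq, time) spectrogram."""
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(n_freq_masks):
        w = int(rng.integers(0, max_freq_width + 1))
        f0 = int(rng.integers(0, max(1, n_freq - w + 1)))
        out[f0:f0 + w, :] = 0.0          # frequency mask
    for _ in range(n_time_masks):
        w = int(rng.integers(0, max_time_width + 1))
        t0 = int(rng.integers(0, max(1, n_time - w + 1)))
        out[:, t0:t0 + w] = 0.0          # time mask
    return out

rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 100))
aug = spec_augment(spec, rng)
```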

ACS (audio channel swapping) [74]

Applies directional transformations to multi-channel recordings (FOA or MIC) by swapping or rotating channels, effectively simulating up to eight different DOA configurations while preserving reverberation characteristics.
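For FOA input, one of these directional transformations is a 90° azimuth rotation, realized purely by channel swaps and sign flips. A minimal sketch for ACN-ordered FOA (W, Y, Z, X); the function name is illustrative, and the full set of eight configurations would also include reflections:

```python
import numpy as np

def foa_rotate_90(foa, k):
    """Rotate an ACN-ordered FOA signal (W, Y, Z, X) about the vertical axis
    by k * 90 degrees. Azimuth + 90 deg maps (X, Y) -> (-Y, X); W and Z are
    rotation-invariant, so reverberation structure is preserved."""
    w, y, z, x = foa
    for _ in range(k % 4):
        x, y = -y, x
    return np.stack([w, y, z, x])

rng = np.random.default_rng(7)
foa = rng.standard_normal((4, 16000))
rotated = foa_rotate_90(foa, 1)
```

Combining the four rotations with an azimuth reflection (Y -> -Y) yields the eight DOA configurations mentioned above.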

MCS (multi-channel simulation) [74]

Synthesizes new multi-channel audio by combining beamformed spectral features with spatial cues derived from covariance analysis, simulating novel spatial audio scenarios.
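A loose toy illustration of the idea, under strong simplifying assumptions (not the cited method): the "beamformed spectral features" are approximated by a channel average of one recording's STFT, and the "spatial cue" by the principal eigenvector of another recording's per-bin spatial covariance:

```python
import numpy as np

def mcs_toy(stft_a, stft_b):
    """Toy sketch: recolor recording A's beamformed spectrum with the dominant
    spatial cue of recording B. Inputs are complex STFTs of shape
    (channels, freq, time); the beamformer is a plain channel average."""
    beam = stft_a.mean(axis=0)                    # (freq, time) reference spectrum
    c, f, t = stft_b.shape
    out = np.empty_like(stft_b)
    for fi in range(f):
        x = stft_b[:, fi, :]                      # (channels, time) at this bin
        cov = x @ x.conj().T / t                  # spatial covariance matrix
        _, vecs = np.linalg.eigh(cov)
        v = vecs[:, -1]                           # principal spatial direction
        out[:, fi, :] = v[:, None] * beam[fi, :]  # impose B's cue on A's spectrum
    return out

rng = np.random.default_rng(1)
stft_a = rng.standard_normal((4, 8, 10)) + 1j * rng.standard_normal((4, 8, 10))
stft_b = rng.standard_normal((4, 8, 10)) + 1j * rng.standard_normal((4, 8, 10))
aug = mcs_toy(stft_a, stft_b)
```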

Mixup [109]

Linearly blends two monophonic audio waveforms and their labels using a random mixing coefficient. Commonly used to create synthetic polyphonic mixtures that improve robustness in densely polyphonic acoustic scenes.
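The blend is a convex combination with a coefficient drawn from a Beta distribution, applied identically to waveforms and labels. A minimal sketch (the `alpha` value is illustrative):

```python
import numpy as np

def mixup(x1, y1, x2, y2, rng, alpha=0.2):
    """Convex combination of two waveforms and their multi-hot label vectors."""
    lam = rng.beta(alpha, alpha)                  # mixing coefficient in (0, 1)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

rng = np.random.default_rng(6)
x1, x2 = rng.standard_normal((2, 16000))          # two 1 s mono clips at 16 kHz
y1 = np.array([1.0, 0.0])                         # class labels of each clip
y2 = np.array([0.0, 1.0])
mixed_x, mixed_y = mixup(x1, y1, x2, y2, rng)
```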

FilterAugment [104]

Simulates spectral coloration imparted by diverse acoustic environments by applying random gain offsets to randomly sized frequency bands.
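A minimal sketch of the band-wise gain idea, assuming a log-magnitude (dB) spectrogram; band-count and gain ranges are illustrative parameters, not the published defaults:

```python
import numpy as np

def filter_augment(spec_db, rng, n_bands=(3, 6), max_gain_db=6.0):
    """Apply a random piecewise-constant gain (in dB) over randomly sized
    frequency bands, mimicking varied acoustic coloration."""
    out = spec_db.copy()
    n_freq = out.shape[0]
    n = int(rng.integers(n_bands[0], n_bands[1] + 1))
    # Random band edges partitioning the full frequency axis.
    edges = np.sort(rng.choice(np.arange(1, n_freq), size=n - 1, replace=False))
    edges = np.concatenate(([0], edges, [n_freq]))
    for lo, hi in zip(edges[:-1], edges[1:]):
        out[lo:hi, :] += rng.uniform(-max_gain_db, max_gain_db)
    return out

rng = np.random.default_rng(5)
spec_db = rng.standard_normal((64, 100))
aug = filter_augment(spec_db, rng)
```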

Frequency Shifting [51]

Shifts a randomly selected frequency band upward or downward along the frequency axis, altering the spectral distribution while largely preserving the overall structure of the spectrogram.
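A simplified sketch that shifts the whole spectrogram by a random number of bins and zero-pads the bins that roll in (a band-limited variant would restrict the shift to a sub-band); the shift range is an illustrative parameter:

```python
import numpy as np

def freq_shift(spec, rng, max_shift=10):
    """Shift a (freq, time) spectrogram up or down by a random number of bins,
    zero-padding the rows that roll in at the edge."""
    s = int(rng.integers(-max_shift, max_shift + 1))
    out = np.zeros_like(spec)
    if s > 0:
        out[s:, :] = spec[:-s, :]     # shift upward
    elif s < 0:
        out[:s, :] = spec[-s:, :]     # shift downward
    else:
        out[:] = spec
    return out

rng = np.random.default_rng(4)
spec = rng.standard_normal((64, 100))
shifted = freq_shift(spec, rng)
```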

SpatialMixup [106]

Applies selective directional gains to multi-channel audio signals to simulate variations in directional loudness and suppression, increasing the spatial diversity of the data.
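A toy illustration of a directional gain, under simplifying assumptions (not the published transform): for ACN-ordered FOA, the component arriving from a random direction is extracted with first-order steering coefficients and boosted or attenuated by a random gain:

```python
import numpy as np

def spatial_mixup_toy(foa, rng, max_gain_db=6.0):
    """Toy sketch: boost or attenuate a random direction in an ACN-ordered
    FOA signal (W, Y, Z, X) to vary directional loudness."""
    az = rng.uniform(-np.pi, np.pi)
    el = rng.uniform(-np.pi / 2, np.pi / 2)
    # First-order steering coefficients toward (az, el).
    d = np.array([1.0, np.sin(az) * np.cos(el), np.sin(el),
                  np.cos(az) * np.cos(el)])
    g = 10.0 ** (rng.uniform(-max_gain_db, max_gain_db) / 20.0) - 1.0
    beam = (d @ foa) / (d @ d)          # field component along direction d
    return foa + g * np.outer(d, beam)  # re-inject it with a random gain

rng = np.random.default_rng(2)
foa = rng.standard_normal((4, 16000))
aug = spatial_mixup_toy(foa, rng)
```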

SpecMix [101]

Mixes two spectral feature representations (e.g., spectrograms) to form a new mixture, aiming to retain crucial frequency content from both sources.
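A minimal CutMix-style sketch of the idea: random frequency and time bands of one spectrogram are replaced with the other's, and labels are mixed by the replaced-area fraction. Band widths and function names are illustrative:

```python
import numpy as np

def spec_mix(spec1, y1, spec2, y2, rng, max_width=16):
    """Copy random frequency and time bands from spec2 into spec1 and mix
    labels in proportion to the area kept from each source."""
    out = spec1.copy()
    n_freq, n_time = out.shape
    mask = np.zeros((n_freq, n_time), dtype=bool)
    f0, fw = int(rng.integers(0, n_freq)), int(rng.integers(0, max_width + 1))
    t0, tw = int(rng.integers(0, n_time)), int(rng.integers(0, max_width + 1))
    mask[f0:f0 + fw, :] = True          # frequency band taken from spec2
    mask[:, t0:t0 + tw] = True          # time band taken from spec2
    out[mask] = spec2[mask]
    lam = 1.0 - mask.mean()             # fraction kept from sample 1
    return out, lam * y1 + (1.0 - lam) * y2

rng = np.random.default_rng(3)
spec1 = rng.standard_normal((64, 100))
spec2 = rng.standard_normal((64, 100))
mixed, label = spec_mix(spec1, 0.0, spec2, 1.0, rng)
```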