Fig. 6: Comparison of SELD Output Representation Strategies.
From: Environmental acoustic intelligence through sound event localization and detection: a review

(Top Left) The conventional class-wise two-branch design [5] outputs separate SED probabilities and DOA coordinates for each class. (Top Right) The class-wise ACCDOA format [112] integrates detection and localization into a single 3-D vector per class, where the vector’s direction encodes the DOA and its magnitude encodes event activity. (Bottom Left) The track-wise two-branch design [113] replicates the SED and DOA outputs across multiple tracks to handle overlapping sound events of the same class. (Bottom Right) The multi-ACCDOA format [116] extends ACCDOA to multiple tracks, combining a unified representation with multi-track functionality. Active sound events are denoted by green boxes.
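
To make the ACCDOA idea concrete, the following minimal sketch decodes one time frame of class-wise ACCDOA output, and then the same output replicated over several tracks as in multi-ACCDOA. It is an illustration under stated assumptions, not the implementation used in the cited works: the function names, the activity threshold of 0.5, and the azimuth/elevation convention (azimuth measured in the x-y plane, elevation from that plane) are all hypothetical choices, and the sketch omits any merging of near-duplicate tracks.

```python
import math


def decode_accdoa(vectors, threshold=0.5):
    """Decode class-wise ACCDOA vectors for a single time frame.

    vectors: list of (x, y, z) tuples, one per sound class.
    threshold: assumed activity threshold on the vector magnitude.
    Returns {class_index: (azimuth_deg, elevation_deg)} for active classes.
    """
    active = {}
    for cls, (x, y, z) in enumerate(vectors):
        # The magnitude encodes event activity.
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > threshold:
            # The direction encodes the DOA (assumed angle convention).
            azimuth = math.degrees(math.atan2(y, x))
            elevation = math.degrees(math.asin(z / norm))
            active[cls] = (azimuth, elevation)
    return active


def decode_multi_accdoa(tracks, threshold=0.5):
    """Decode a multi-track output: one list of class-wise vectors per track.

    Returns one dict per track, so the same class may be active in
    several tracks at once (overlapping events of the same class).
    """
    return [decode_accdoa(track, threshold) for track in tracks]


if __name__ == "__main__":
    # Three classes: class 0 active, class 1 inactive, class 2 active.
    frame = [(0.0, 0.9, 0.0), (0.1, 0.0, 0.0), (0.0, 0.0, -0.8)]
    print(decode_accdoa(frame))

    # Two tracks, both reporting class 0 from different directions.
    two_tracks = [
        [(0.9, 0.0, 0.0), (0.0, 0.0, 0.0)],
        [(0.0, 0.9, 0.0), (0.0, 0.0, 0.0)],
    ]
    print(decode_multi_accdoa(two_tracks))
```

Because activity is read from the vector magnitude, no separate SED branch is needed. The multi-track variant simply applies the same rule to each track, which is what allows two simultaneous events of one class to be reported with different directions.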