Table 4: Overview of model architectures evaluated on the STARSS23 validation set
From: Environmental acoustic intelligence through sound event localization and detection: a review
| Approach | Format | Params (M) | Data (h) | Model Architecture(s) | \(\text{ER}_{\le 20^{\circ}}\,\downarrow\) | \(\text{F}_{\le 20^{\circ}}\,\uparrow\) | \(\text{LE}_{\text{CD}}\,\downarrow\) | \(\text{LR}_{\text{CD}}\,\uparrow\) | \(\mathcal{E}_{\text{SELD}}\,\downarrow\) |
|---|---|---|---|---|---|---|---|---|---|
| DCASE | | | | | | | | | |
| Wang et al.<sup>78</sup> | FOA | 63 | 192 | Sound Separation, ResNet-Conformer, Model Ensemble | 0.38 | 66.0% | 12.8° | 75.0% | 0.260 |
| Xue et al.<sup>79</sup> | FOA | 33 | - | ResNet-Conformer, MS-CAM | 0.44 | 54.2% | 13.9° | 67.9% | 0.324 |
| Hu et al.<sup>149</sup> | FOA | 85 | 49 | EINV2 | 0.48 | 47.3% | 16.1° | 62.6% | 0.368 |
| Kang et al.<sup>121</sup> | FOA | 202 | 212 | ResNet-Conformer, Model Ensemble | 0.43 | 55.8% | 15.9° | 71.5% | 0.311 |
| Kim and Ko<sup>122</sup> | FOA | 138 | 192 | ResNet, SqEx, Model Ensemble | 0.47 | 51.7% | 15.2° | 70.2% | 0.334 |
| Zhang et al.<sup>150</sup> | FOA | 104 | 128 | CNN-Conformer, Model Ensemble | 0.46 | 52.0% | 14.0° | 59.5% | 0.356 |
| Wu<sup>151</sup> | FOA | 85 | - | EINV2 | 0.54 | 41.1% | 22.3° | 62.3% | 0.407 |
| Shul et al.<sup>83</sup> | FOA | 2 | 192 | Divided Spectro-Temporal Attention | 0.49 | 42.7% | 16.7° | 55.2% | 0.401 |
| Kumar et al.<sup>152</sup> | FOA | 3 | 52 | CNN-Conformer | 0.39 | 56.0% | 20.3° | 63.0% | 0.328 |
| DCASE Baseline | FOA | 0.7 | 24 | SELDNet | 0.57 | 29.9% | 22.0° | 47.7% | 0.479 |
| DCASE Baseline | MIC | 0.7 | 24 | SELDNet | 0.62 | 27.8% | 27.0° | 44.3% | 0.512 |
| Non-DCASE | | | | | | | | | |
| Shul et al.<sup>87</sup> | FOA | - | 24 | CST-Former | 0.41 | 57.7% | 13.8° | 68.3% | 0.307 |
| Berghi et al.<sup>20</sup> | FOA | - | 272 | CNN-Conformer | 0.51 | 50.2% | 15.4° | 56.4% | 0.382 |
| Hu et al.<sup>89</sup> | FOA | 34.6 | - | HTS-AT, PSELDNet | 0.39 | 62.4% | 14.4° | 77.7% | 0.267 |
| Mu et al.<sup>86</sup> | FOA | 26.9 | - | Multi-Feature Fusion, EINV2 | 0.54 | 42.5% | 18.7° | 62.6% | 0.398 |
| Jiang et al.<sup>153</sup> | FOA | 63 | 192 | ResNet-Conformer | 0.42 | 57.0% | 14.3° | 67.0% | 0.310 |
| He et al.<sup>48</sup> | FOA | - | 24 | Pretrained SSAST | 0.49 | 44.4% | 18.8° | 62.1% | 0.382 |
| Zhang et al.<sup>81</sup> | FOA | 4 | 28.3 | CRNN10, AADA | 0.53 | 32.9% | 31.3° | 40.5% | 0.405 |
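The \(\mathcal{E}_{\text{SELD}}\) column aggregates the four preceding metrics. The Python below is a minimal sketch that recomputes it for a few rows of the table. It assumes the DCASE 2023 Task 3 definition \(\mathcal{E}_{\text{SELD}} = \tfrac{1}{4}\left[\text{ER}_{\le 20^{\circ}} + (1 - \text{F}_{\le 20^{\circ}}) + \text{LE}_{\text{CD}}/180^{\circ} + (1 - \text{LR}_{\text{CD}})\right]\). The names `SeldMetrics` and `seld_error` are hypothetical and are not taken from any library or from the cited works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeldMetrics:
    """Hypothetical container for the four per-row metrics in Table 4."""
    er_20: float      # location-aware error rate ER<=20deg (lower is better)
    f_20: float       # location-aware F-score F<=20deg as a fraction (66.0% -> 0.660)
    le_cd_deg: float  # class-dependent localization error LE_CD, in degrees
    lr_cd: float      # class-dependent localization recall LR_CD as a fraction


def seld_error(m: SeldMetrics) -> float:
    """Aggregate SELD error, assuming the DCASE 2023 Task 3 definition:
    the mean of ER, (1 - F), LE_CD / 180 deg, and (1 - LR_CD)."""
    # Dividing the angular error by 180 deg maps it onto [0, 1] like the other terms.
    return (m.er_20 + (1.0 - m.f_20) + m.le_cd_deg / 180.0 + (1.0 - m.lr_cd)) / 4.0


# A few rows transcribed from Table 4: (metrics, reported E_SELD).
ROWS = {
    "Wang et al. (ref. 78)": (SeldMetrics(0.38, 0.660, 12.8, 0.750), 0.260),
    "Hu et al. (ref. 89)": (SeldMetrics(0.39, 0.624, 14.4, 0.777), 0.267),
    "DCASE Baseline (FOA)": (SeldMetrics(0.57, 0.299, 22.0, 0.477), 0.479),
    "DCASE Baseline (MIC)": (SeldMetrics(0.62, 0.278, 27.0, 0.443), 0.512),
}

if __name__ == "__main__":
    for name, (metrics, reported) in ROWS.items():
        print(f"{name}: computed {seld_error(metrics):.3f}, reported {reported:.3f}")
```

Because every term is an error in \([0, 1]\) (F-score and recall enter as \(1 - x\)), a perfect system scores 0. Under this definition, a lower \(\mathcal{E}_{\text{SELD}}\) can come from better detection, better localization, or both.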