Table 4 Overview of model architectures evaluated on the STARSS23 validation set

From: Environmental acoustic intelligence through sound event localization and detection: a review

| Approach | Format | Params (M) | Data (h) | Model Architecture(s) | \({\text{ER}}_{\le 20^{\circ}}\,\downarrow\) | \({\text{F}}_{\le 20^{\circ}}\,\uparrow\) | \({\text{LE}}_{\text{CD}}\,\downarrow\) | \({\text{LR}}_{\text{CD}}\,\uparrow\) | \({\mathcal{E}}_{\text{SELD}}\,\downarrow\) |
|---|---|---|---|---|---|---|---|---|---|
| **DCASE** | | | | | | | | | |
| Wang et al.78 | FOA | 63 | 192 | Sound Separation, ResNet-Conformer, Model Ensemble | 0.38 | 66.0% | 12.8° | 75.0% | 0.260 |
| Xue et al.79 | FOA | 33 | - | ResNet-Conformer, MS-CAM | 0.44 | 54.2% | 13.9° | 67.9% | 0.324 |
| Hu et al.149 | FOA | 85 | 49 | EINV2 | 0.48 | 47.3% | 16.1° | 62.6% | 0.368 |
| Kang et al.121 | FOA | 202 | 212 | ResNet-Conformer, Model Ensemble | 0.43 | 55.8% | 15.9° | 71.5% | 0.311 |
| Kim and Ko122 | FOA | 138 | 192 | ResNet, SqEx, Model Ensemble | 0.47 | 51.7% | 15.2° | 70.2% | 0.334 |
| Zhang et al.150 | FOA | 104 | 128 | CNN-Conformer, Model Ensemble | 0.46 | 52.0% | 14.0° | 59.5% | 0.356 |
| Wu151 | FOA | 85 | - | EINV2 | 0.54 | 41.1% | 22.3° | 62.3% | 0.407 |
| Shul et al.83 | FOA | 2 | 192 | Divided Spectro-Temporal Attention | 0.49 | 42.7% | 16.7° | 55.2% | 0.401 |
| Kumar et al.152 | FOA | 3 | 52 | CNN-Conformer | 0.39 | 56.0% | 20.3° | 63.0% | 0.328 |
| DCASE Baseline | FOA | 0.7 | 24 | SELDNet | 0.57 | 29.9% | 22.0° | 47.7% | 0.479 |
| DCASE Baseline | MIC | 0.7 | 24 | SELDNet | 0.62 | 27.8% | 27.0° | 44.3% | 0.512 |
| **Non-DCASE** | | | | | | | | | |
| Shul et al.87 | FOA | - | 24 | CST-Former | 0.41 | 57.7% | 13.8° | 68.3% | 0.307 |
| Berghi et al.20 | FOA | - | 272 | CNN-Conformer | 0.51 | 50.2% | 15.4° | 56.4% | 0.382 |
| Hu et al.89 | FOA | 34.6 | - | HTS-AT, PSELDNet | 0.39 | 62.4% | 14.4° | 77.7% | 0.267 |
| Mu et al.86 | FOA | 26.9 | - | Multi-Feature Fusion, EINV2 | 0.54 | 42.5% | 18.7° | 62.6% | 0.398 |
| Jiang et al.153 | FOA | 63 | 192 | ResNet-Conformer | 0.42 | 57.0% | 14.3° | 67.0% | 0.310 |
| He et al.48 | FOA | - | 24 | Pretrained SSAST | 0.49 | 44.4% | 18.8° | 62.1% | 0.382 |
| Zhang et al.81 | FOA | 4 | 28.3 | CRNN10, AADA | 0.53 | 32.9% | 31.3° | 40.5% | 0.405 |

  1. This table summarizes selected systems, detailing their recording format, model complexity (parameter count), training-data volume, and key evaluation metrics. DCASE submissions are sorted by their challenge ranking, and only systems outperforming the baseline are listed.
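The \({\mathcal{E}}_{\text{SELD}}\) column can be reproduced from the four component metrics. A minimal sketch, assuming the standard DCASE aggregate (the macro-average of error rate, one minus the F-score, the localization error normalized by 180°, and one minus the localization recall; the function name is illustrative):

```python
def seld_error(er, f_score, le_deg, lr):
    """Aggregate SELD error: mean of the four component metrics,
    with the class-dependent localization error normalized by 180 degrees.
    Lower is better; er in [0, inf), f_score and lr in [0, 1], le_deg in degrees."""
    return (er + (1.0 - f_score) + le_deg / 180.0 + (1.0 - lr)) / 4.0

# Wang et al. (top DCASE row of the table): ER=0.38, F=66.0%, LE=12.8°, LR=75.0%
print(f"{seld_error(0.38, 0.660, 12.8, 0.750):.3f}")  # 0.260
```

Applied to the other rows, the same formula recovers their reported aggregates (e.g. Xue et al.: 0.324), which confirms how the ranking column is derived.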