Table 2. Comparison of AVDor with other state-of-the-art object detectors.

From: Multimodal learning audio-visual detection for obtaining object-level sound sources in Japanese-language teaching room

| Detector | mAP (%) | AVMR (%) |
|---|---|---|
| YOLOv5s | 43.74 | 38.70 |
| YOLOv6s | 46.60 | 40.23 |
| YOLOv8s | 48.02 | 40.64 |
| YOLOv8x | 54.27 | 47.76 |
| Faster R-CNN-r101 | 50.82 | 44.21 |
| Cascade R-CNN-×101 | 53.20 | 45.22 |
| Ours-r50 | 55.45 | 49.18 |
| Ours-r101 | 56.19 | 52.54 |

  1. r50, r101, and ×101 represent ResNet-50, ResNet-101 [15], and ResNeXt-101 [20], respectively.
  2. Significant values are in bold.