Scientific Reports

Table 2 Comparison of AVDor with other state-of-the-art object detectors.

From: Multimodal learning audio-visual detection for obtaining object-level sound sources in Japanese-language teaching room

Detector	mAP (%)	AVMR (%)
YOLOv5s	43.74	38.70
YOLOv6s	46.60	40.23
YOLOv8s	48.02	40.64
YOLOv8x	54.27	47.76
Faster R-CNN-r101	50.82	44.21
Cascade R-CNN-x101	53.20	45.22
Ours-r50	55.45	49.18
Ours-r101	56.19	52.54

r50, r101, and x101 denote the ResNet-50, ResNet-101 ¹⁵, and ResNeXt-101 ²⁰ backbones, respectively.
Significant values are in bold.
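
To make the margins in Table 2 easier to read, the following minimal sketch computes how far Ours-r101 is ahead of every other row on both metrics. The numbers are copied from the table; the names `results` and `margins` are hypothetical helpers introduced only for this illustration and are not part of the article.

```python
# Values copied from Table 2: detector -> (mAP %, AVMR %).
results = {
    "YOLOv5s": (43.74, 38.70),
    "YOLOv6s": (46.60, 40.23),
    "YOLOv8s": (48.02, 40.64),
    "YOLOv8x": (54.27, 47.76),
    "Faster R-CNN-r101": (50.82, 44.21),
    "Cascade R-CNN-x101": (53.20, 45.22),
    "Ours-r50": (55.45, 49.18),
    "Ours-r101": (56.19, 52.54),
}


def margins(reference, table):
    """Return the (mAP, AVMR) gap in percentage points between `reference` and each other row."""
    ref_map, ref_avmr = table[reference]
    return {
        name: (round(ref_map - m, 2), round(ref_avmr - a, 2))
        for name, (m, a) in table.items()
        if name != reference
    }


gaps = margins("Ours-r101", results)
for name, (d_map, d_avmr) in gaps.items():
    # Signed formatting shows at a glance whether the reference row is ahead.
    print(f"{name:<20} mAP {d_map:+.2f}  AVMR {d_avmr:+.2f}")
```

With these values, the closest competitor on both metrics is YOLOv8x, which trails Ours-r101 by 1.92 points of mAP and 4.78 points of AVMR.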
