Table 3 Comparison of TPAVI+ and TPAVI.

From: Multimodal learning audio-visual detection for obtaining object-level sound sources in Japanese-language teaching room

| Method | mAP (%) | AVMR (%) |
| --- | --- | --- |
| r50&A + V | 47.75 | 40.08 |
| r50&TPAVI | 53.78 | 48.88 |
| r50&TPAVI+ | 55.45 | 49.18 |
| r101&A + V | 51.33 | 44.89 |
| r101&TPAVI | 55.02 | 51.95 |
| r101&TPAVI+ | 56.19 | 52.54 |

  1. A + V denotes the addition of the audio feature and the visual feature.
  2. Significant values are in bold.
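
The size of each improvement can be read off the table by subtracting rows that share a backbone. The following is a minimal sketch of that arithmetic, not part of the original study: the values are copied from Table 3, while the dictionary `RESULTS`, the helper `gain`, and the key `"TPAVI+"` (used here for the rows labelled TPAVI with a trailing +) are hypothetical names introduced only for illustration.

```python
# Values copied from Table 3: (mAP %, AVMR %) per backbone and fusion method.
# The names RESULTS and gain are illustrative assumptions, not from the paper.
RESULTS = {
    "r50": {
        "A + V": (47.75, 40.08),
        "TPAVI": (53.78, 48.88),
        "TPAVI+": (55.45, 49.18),
    },
    "r101": {
        "A + V": (51.33, 44.89),
        "TPAVI": (55.02, 51.95),
        "TPAVI+": (56.19, 52.54),
    },
}


def gain(backbone, method, baseline):
    """Return (mAP gain, AVMR gain) in percentage points, rounded to 2 decimals."""
    m = RESULTS[backbone][method]
    b = RESULTS[backbone][baseline]
    return tuple(round(x - y, 2) for x, y in zip(m, b))


if __name__ == "__main__":
    # Print the gain of TPAVI+ over each of the other two methods, per backbone.
    for backbone in RESULTS:
        for baseline in ("A + V", "TPAVI"):
            d_map, d_avmr = gain(backbone, "TPAVI+", baseline)
            print(f"{backbone}: TPAVI+ vs {baseline}: "
                  f"mAP {d_map:+.2f}, AVMR {d_avmr:+.2f}")
```

Under these figures, TPAVI+ exceeds TPAVI by 1.67 (r50) and 1.17 (r101) percentage points in mAP, and by 0.30 and 0.59 points in AVMR; the margin over plain addition (A + V) is larger on both backbones.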