Table 4 Experiment results on text prompts.

(a) CLAP-HTS-AT (2.1M) performance of recognizing birds in the background. Higher is better.
Is this a sound of {} or frogs?	AP
Birds	0.54
Birds singing	0.63
Birds singing in the background	0.73
Birds singing *far* in the background	0.79
Supervised baseline AP	0.88
(b) CLAP-HTS-AT (2.1M) performance of recognizing gunshot sounds in tropical rain forest. Higher is better.
Is this a sound of {A} or {B}?	AP
A: Gunshots, B: Noise	0.36
A: Gunshots in the distance, B: Noise	0.57
A: Gunshots in the distance, B: Broken branches or noise	0.67
Supervised baseline AP	0.64
(c) CLAP-PANN (128K) performance of recognizing meerkat sounds using a 2-second window.
Is this a sound of {} or non-animal noise?	AP
Meerkats	0.56
Meerkats growling	0.68
Meerkats clucking	0.80
Meerkats clucking or growling	0.79
Growling	0.63
Clucking	0.82
Clucking or growling	0.78
Animals	0.85
Animals growling	0.82
Animals clucking	0.86
Animals clucking or growling	0.88
Supervised baseline AP	0.94
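
As a reading aid for the table, the following is a minimal sketch of how a prompt template is filled with a candidate phrase and how a score such as AP can be computed, assuming that "AP" denotes average precision over ranked clips. The helper names `fill_template` and `average_precision` are hypothetical, and the labels and scores are made-up illustrative values, not data from the experiments; the sketch does not call any CLAP model and is not the authors' evaluation code.

```python
from typing import Sequence

# Template taken from panel (a); "{}" is the slot for the candidate phrase.
TEMPLATE = "Is this a sound of {} or frogs?"


def fill_template(template: str, phrase: str) -> str:
    """Insert a candidate phrase into the "{}" slot of a prompt template."""
    return template.replace("{}", phrase)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k at which a positive clip appears.

    labels: 1 for a positive clip, 0 for a negative clip.
    scores: model score per clip; higher means "more likely positive".
    """
    # Rank clips from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    # With no positive clips, AP is undefined; return 0.0 by convention here.
    return sum(precisions) / len(precisions) if precisions else 0.0


prompt = fill_template(TEMPLATE, "Birds singing in the background")

# Illustrative (made-up) ground truth and scores for four clips.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)
print(prompt)
print(round(ap, 4))
```

In this toy ranking the two positive clips sit at ranks 1 and 3, so the result is the mean of 1/1 and 2/3. Each row of the table corresponds to repeating such an evaluation with a different phrase in the template slot.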
