Table 3 Average Precision (AP) comparisons of different source window sizes on BirdVox and Meerkat. Higher is better.

From: Multi-modal Language models in bioacoustics with zero-shot transfer: a case study

| Settings | Models | Rfcx-Bird, 10-sec window | Rfcx-Bird, 7-sec window | Meerkat, 2-sec window | Meerkat, 7-sec window |
|---|---|---|---|---|---|
| Supervised | ResNet-18 | 0.88 | 0.89 | 0.94 | 0.97 |
| Zero-Shot Transfer | CLAP-HTS-AT (450 K) | 0.70(↓) | 0.72(↓) | 0.81(↓) | 0.97(-) |
| Zero-Shot Transfer | CLAP-HTS-AT (2.1 M) | 0.79(↓) | 0.82(↓) | 0.87(↓) | 0.98(↑) |
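For readers unfamiliar with the metric, the following is a minimal sketch of how an Average Precision value such as those in the table can be computed from binary labels and model scores. It assumes the common non-interpolated definition (the mean of the precision at each rank where a positive example is retrieved). The source does not state which AP variant or tooling was used, so the function name `average_precision` and the toy inputs are illustrative assumptions, not the authors' evaluation code.

```python
def average_precision(labels, scores):
    """Non-interpolated AP (an assumed definition, not confirmed by the source).

    labels: iterable of 0/1 ground-truth values (1 = positive).
    scores: iterable of model confidences, higher = more likely positive.
    """
    # Rank examples from highest to lowest score. Ties keep input order
    # because Python's sort is stable.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    n_pos = sum(1 for _, y in ranked if y == 1)
    if n_pos == 0:
        raise ValueError("AP is undefined when there are no positive examples")

    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            # Precision at this rank = positives seen so far / items seen so far.
            precision_sum += hits / rank

    return precision_sum / n_pos


if __name__ == "__main__":
    # Hypothetical toy data: two positives ranked 1st and 3rd.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.1]
    print(round(average_precision(toy_labels, toy_scores), 4))
```

Under this definition a model that ranks every positive clip above every negative clip scores 1.0, which is why values close to 1 in the table indicate near-perfect ranking.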