Table 3 Average Precision (AP) comparisons of different source window sizes on BirdVox and Meerkat. Higher is better.
From: Multi-modal Language models in bioacoustics with zero-shot transfer: a case study
| Settings | Models | Rfcx-Bird (10-sec window) | Rfcx-Bird (7-sec window) | Meerkat (2-sec window) | Meerkat (7-sec window) |
|---|---|---|---|---|---|
| Supervised | ResNet-18 | 0.88 | 0.89 | 0.94 | 0.97 |
| Zero-Shot Transfer | CLAP-HTS-AT (450 K) | 0.70 (↓) | 0.72 (↓) | 0.81 (↓) | 0.97 (-) |
| Zero-Shot Transfer | CLAP-HTS-AT (2.1 M) | 0.79 (↓) | 0.82 (↓) | 0.87 (↓) | 0.98 (↑) |
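
For readers unfamiliar with the metric, the following is a minimal sketch of how an Average Precision score like those in the table can be computed from per-clip labels and model scores. It uses the standard non-interpolated definition (mean of precision at each rank where a positive clip appears). The exact implementation used in the paper is not stated here, so this is an assumption, and the function name `average_precision` and the toy data are hypothetical.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP: mean of precision@k over ranks k holding a positive."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Rank clips from highest to lowest score (ties keep their input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank

    if hits == 0:
        raise ValueError("AP is undefined when there are no positive labels")
    return precision_sum / hits


# Hypothetical toy example: 1 = target call present, 0 = absent.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(toy_labels, toy_scores), 3))  # 0.833
```

Library implementations may break score ties differently, so values can differ slightly from this sketch when many clips share the same score.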