Fig. 5: Frequency distribution of 3-second samples per anuran species.
From: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring

The long-tailed distribution is typical of real-world species diversity datasets. We split species into ‘common’, ‘frequent’, and ‘rare’ classes to assess how sample size affects species identification performance. The colored squares at the bottom of the histogram show the occurrence of each species at the different sites, with a different color for each site. The black and blue lines show the training and test set distributions, respectively, obtained with the split strategy.
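The frequency classes depend only on how many 3-second samples each species has. The following is a minimal Python sketch of that grouping, assuming per-sample species labels are available as a list. The function names and the `common_min` and `frequent_min` thresholds are hypothetical, because the caption does not state the cut-offs used to separate the classes. It also returns the species ranked by sample count, the order in which a long-tailed histogram is usually drawn.

```python
from collections import Counter


def frequency_class(n_samples: int, common_min: int, frequent_min: int) -> str:
    """Map a species' sample count to 'common', 'frequent', or 'rare'.

    common_min and frequent_min are hypothetical thresholds (not from the source).
    """
    if n_samples >= common_min:
        return "common"
    if n_samples >= frequent_min:
        return "frequent"
    return "rare"


def classify_species(sample_labels: list[str], common_min: int, frequent_min: int) -> dict[str, str]:
    """Count samples per species and assign each species a frequency class."""
    counts = Counter(sample_labels)
    return {sp: frequency_class(n, common_min, frequent_min) for sp, n in counts.items()}


def ranked_counts(sample_labels: list[str]) -> list[tuple[str, int]]:
    """Return (species, count) pairs from most to least sampled (histogram order)."""
    return Counter(sample_labels).most_common()
```

For example, with toy labels `["A"] * 12 + ["B"] * 5 + ["C"]` and illustrative thresholds `common_min=10`, `frequent_min=3`, species A is classed as common, B as frequent, and C as rare. Passing the thresholds explicitly keeps the sketch from implying any particular cut-off values.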