A Public Dataset of Annotated Orcinus orca Acoustic Signals for Detection and Ecotype Classification

Palmer, K. J.; Cummings, Emma; Dowd, Michael G.; Frasier, Kait; Frazao, Fabio; Harris, Alex; Houweling, April; Kanes, Jasper; Kirsebom, Oliver S.; Klinck, Holger; LeBlond, Holly; Laturnus, Lauren; Matkin, Craig; Murphy, Olivia; Myers, Hannah; Olsen, Dan; O’Neill, Caitlin; Padovese, Bruno; Pilkington, James; Quayle, Lucy; Vuibert, Amalis Riera; Trounce, Krista; Vagle, Svein; Veirs, Scott; Veirs, Val; Wladichuk, Jen; Wood, Jason; Yack, Tina; Yurk, Harald; Joy, Ruth

doi:10.1038/s41597-025-05281-5

Data Descriptor
Open access
Published: 03 July 2025

Scientific Data volume 12, Article number: 1137 (2025)

Abstract

Killer whales (Orcinus orca) exhibit significant ecological and genetic diversity, with three primary sympatric populations in the Northeast Pacific: Resident, Bigg’s (Transient), and Offshore. Each population is characterized by distinct foraging habits, social structures, and vocal repertoires, which complicate accurate monitoring and conservation efforts. This dataset, compiled from diverse sources, provides a comprehensive resource for the detection and classification of killer whale vocalizations. The dataset includes annotated acoustic recordings spanning 11 years from various locations in Alaska, British Columbia, and Washington, collected using multiple hydrophone systems. It addresses the challenge of differentiating killer whale calls from other marine species and environmental noise, including specific instances of confounding signals that may help enhance model robustness. Detailed annotations capture a diverse suite of vocalizations and their associated metadata, facilitating the development of advanced machine learning models for ecological monitoring. This curated dataset aims to improve the accuracy of killer whale detection algorithms, support conservation efforts, and advance our understanding of killer whale acoustic communication across different populations.

Background & Summary

Killer whales (Orcinus orca) are cosmopolitan, with distinct representatives found in every ocean. The killer whale lineage is complex and presently delineated into multiple ecotypes that are genetically distinct^1,2. In the Northeast Pacific, killer whales have diverged into genetically and culturally distinct lineages that overlap in distribution. These lineages presently include three sympatric forms (ecotypes): Resident, Transient, and Offshore killer whales^3,4,5. The known ecotypes are sympatric but socially isolated and do not interbreed^1,5.

Ecotypes are distinguished by genetic, morphological, behavioral, and acoustic traits^6,7. Within the Resident ecotype there are four levels of social structure. Matrilines form the basic social unit and are composed of the eldest female and her offspring. Groups of matrilines that share recent maternal heritage are referred to as pods, and clans represent groups of pods that share an acoustic dialect and distant maternal heritage⁸. The term community refers to pods and clans that regularly associate and interbreed but do not share matrilines or vocal similarity. Not all matrilines in a community necessarily share calls or dialects. There are three Resident killer whale communities in the Northeast Pacific: Southern Resident killer whales (SRKW), Northern Resident killer whales (NRKW), and Southern Alaskan Residents (SAR). Social association within the Transient ecotype is similarly matrilineal but less rigid, with adult males and females frequently splitting from the matrilineal group. As with the Resident ecotype, the Transient ecotype in the Northeast Pacific comprises several genetically and acoustically distinct communities, including the AT1 Transients, Gulf of Alaska Transients, and West Coast Transients (or Bigg’s killer whales)^9,10. Little is known of the social organization of Offshore killer whales¹¹.

Conservation status similarly varies between ecotypes and communities. The NRKW community, West Coast Transients, and the Offshore ecotype are considered “threatened” under the Species at Risk Act in Canada; SRKWs are “endangered”, and Offshore killer whales are a “species of special concern” under the same act^11,12. Meanwhile, the Alaska Resident community is not designated as depleted under the US Marine Mammal Protection Act nor as threatened or endangered under the Endangered Species Act¹³. The AT1 Transient population, by contrast, is functionally extinct, with only seven animals thought to be alive in 2020 and no births in the last thirty years.

Each ecotype and community faces different stressors. SRKW are vulnerable to extinction due to a lack of available food, physical and acoustic disturbance, and persistent pollution in their environment¹⁴, e.g. masking of foraging and communication signals by transiting vessels^15,16,17,18. There are significant and sustained efforts to improve the population status of SRKW, including reducing competition for salmon through fishing closures and noise reduction efforts in both US and Canadian waters^19,20. Critical Habitat designations, determined by years of visual and acoustic detections, inform these efforts^20,21.

Acoustic monitoring is an integral part of monitoring the behavior, habitat, and efficacy of conservation projects such as vessel slowdowns^19,22,23,24,25,26,27. However, passive acoustic methods typically generate large volumes of data which require automated processing to produce results within reasonable timeframes. A variety of generalized detection algorithms are available that work reasonably well as binary detectors of killer whale calls^28,29, and neural-network-based killer whale detectors have similarly been developed to detect SRKW calls^30,31. However, most of the existing detectors lack the ability to robustly distinguish between the highly variable killer whale calls and other signals in the same frequency band. While progress has been made in developing automated detection algorithms for killer whale vocalizations, there is considerable room for improving automated pipelines that discriminate between killer whale calls and other species' signals, as well as between ecotypes and clans of killer whales.

Killer whale vocalizations are generally grouped into three broad categories: echolocation clicks, whistles, and pulsed calls^32,33,34. Echolocation clicks are impulsive sounds used in feeding and navigation with the majority of the energy between 20 and 100 kHz^35,36. Rasps are less common and defined as a series of frequency modulated clicks that have been associated with foraging in other odontocetes³⁷. Whistles are narrow band signals that aid in close-range communication, generally spanning from 0.5 to 25 kHz, and may be involved in coordinating movements and maintaining group cohesion^38,39,40. Pulsed calls are broadband signals with energy from 0.5 to over 40 kHz⁴¹ and are the most common signal type used for communication by killer whales. They are composed of a series of pulses produced in such rapid succession as to sound tonal with multiple harmonics⁴². Pulsed calls form distinct, complex vocalizations (discrete calls) often characterized by a series of tonal elements that can have one or two overlapping fundamental frequencies^43,44,45 that vary in contour and amplitude over time³². Pulsed calls are primarily used for social communication within and between individuals and communities, serving functions in social cohesion, mating, travel, and foraging coordination^43,44,45 and conveying social and behavioral cues. It is possible to discriminate between ecotypes, clans, and sometimes pods or maternally related family groups by analyzing features of killer whale vocalizations. Resident killer whales produce calls in higher frequency ranges than Transient killer whales, with significantly higher minimum, peak, and median call frequencies^10,46,47. The Offshore killer whales produce calls with a higher minimum frequency than other ecotypes^47,48. Such differences contribute to the distinct vocal repertoires and form the motivation for harnessing the power of modern classification methodologies to make the most of acoustic surveys in both archived and near real-time settings.

Accurate machine learning models rely on extensive and well-curated labeled datasets in order to reliably detect and classify killer whales in underwater sound recordings^49,50. In acoustic ecology, the data used to train machine learning algorithms should ideally represent the full range of the animals’ vocalization repertoire, and those vocalizations should remain relatively static over time³⁴. Many machine learning applications in conservation are targeted at longitudinal datasets to assess changes in occupancy of species on the scale of years or decades^23,24,51,52,53. In species capable of cultural adaptation of their repertoires, including humpback whales (Megaptera novaeangliae) and killer whales, data for machine learning algorithms must then contain signals that were previously heard in the environment (e.g. antiquated song, and killer whale calls from now deceased animals). Furthermore, environmental factors including but not limited to background noise, instrument parameters, and sound propagation conditions can all influence how robust detection and classification algorithms are.

This work represents the largest curated dataset of audio and annotations to date as part of the 2026 Biennial Conference and Workshop on Detection, Classification, Localization, and Density Estimation of Marine Mammals using Passive Acoustics (DCLDE). Datasets associated with the DCLDE workshops have allowed for the continual development and evaluation of detection and classification algorithms for these challenging species. As part of the DCLDE, the goal of this dataset is to facilitate the construction and evaluation of detectors that are capable of 1) discriminating killer whale calls from other acoustically similar species and 2) discriminating between different Northeast Pacific ecotypes of killer whales.

Multiple groups have collaborated to produce over 225,000 bounding box acoustic annotations from 23 locations, encompassing 1.6 TB of audio data collected in the Northeast Pacific Ocean from Washington State to Southeast Alaska. These recordings, captured at depths ranging from 8 to 253 m, span a roughly ten-year period from May 2013 to April 2023. The dataset represents a diverse set of projects with varying deployment, processing, and annotation methodologies, contributed by a collaboration of industry partners, not-for-profits, universities, and governmental organizations (Tables 1, 2). Data records are organized by provider, with details on deployment, processing, and annotation methodologies outlined in the following sections. Additionally, we provide a uniform and collated *.csv file with ecotype-level classifications.

Table 1 Deployment summary for the data included in the detection and classification dataset. Provider indicates the group contributing the audio and annotation files. Lat and Lon are the instrument latitude and longitude in decimal degrees; note that field deployments for UAF data represent the approximate center of focal follow tracks. Fs is the audio sample rate. Annotation start and finish dates represent the first and last annotations included in the dataset. Dataset is the name of the deployment location used in the annotations table. Within the sample rate column, * indicates the presence of a low-pass filter rendering usable frequencies lower than the Nyquist frequency. See provider information for more details on audio processing.

Table 2 Summary of annotations for each contributor’s dataset.

Data Records

Audio, annotation and meta files (where available) have been archived and made available by the US National Centers for Environmental Information⁵⁴.

Data in the repository are organized into folders by provider. Under each provider, there are folders for Audio, Annotations, and Metadata (where applicable). The Audio folder contains all audio files contributed by the provider, organized by deployment location. Any additional information provided by the contributor, including hydrophone or deployment methods, additional resources for accessing mirrored copies of the data, or applicable reports describing the data collection methodologies, is stored in an optional ‘meta’ folder under the provider. To limit the size of the complete dataset, only audio files with annotations are included.

To aid in rapid usability we also provide a standardized annotation file collated across all providers (Annotations.csv). The collated annotation file includes standardized annotations from across all datasets with labels described in the Technical Validation section (Table 3).
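As a minimal sketch of how the collated file might be read, the example below parses a small in-memory excerpt with Python's standard csv module and keeps only confident killer whale rows. The column names Dataset, ClassSpecies, and KW_certain are mentioned in this descriptor; the time columns, the ‘KW’ label value, and the sample rows are hypothetical placeholders, so the real Annotations.csv should be checked against Table 3 before use.

```python
import csv
import io

# Hypothetical excerpt: the time columns, the 'KW' label, and all rows are
# illustrative assumptions, not values taken from the published file.
SAMPLE = """Dataset,ClassSpecies,KW_certain,FileBeginSec,FileEndSec
HaroStrait,KW,1,12.4,13.9
HaroStrait,KW,0,40.0,41.2
BarkleyCanyon,UndBio,,5.0,5.6
OrcasoundLab,Abiotic,,0.0,0.0
"""


def load_annotations(handle):
    """Read annotation rows from an open text handle into a list of dicts."""
    return list(csv.DictReader(handle))


def certain_killer_whale_rows(rows, kw_label="KW"):
    """Keep rows labeled as killer whale whose KW_certain flag is '1'."""
    return [
        row for row in rows
        if row["ClassSpecies"] == kw_label and row["KW_certain"] == "1"
    ]


rows = load_annotations(io.StringIO(SAMPLE))
certain = certain_killer_whale_rows(rows)
print(len(rows), len(certain))
```

To run this against the archived file, the StringIO handle would be swapped for open("Annotations.csv", newline=""), with the label values adjusted to those actually present.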

Table 3 Annotation file descriptors in the pre-cleaned dataset.

The original annotations often contain considerable information that is beyond the scope of the DCLDE challenge, including a variety of different labels for biologic and anthropogenic sounds and finer resolution on killer whale calls including matriline, pulsed call type, or other non-standard calls such as ‘buzzes’ or ‘rasps’. These annotation details may be of interest to those knowledgeable in the field of killer whale acoustics, hence their inclusion. However, as this information was not consistently collected across or within projects, it was not included in the combined Annotations.csv file. Additional information about the analysis procedure, where applicable, is stored in the ‘meta’ folder in each organization’s data along with any additional deployment information or relevant reports provided by the dataset authors.

Methods

In this study, we sought to build an “ecologically representative” dataset with comprehensive coverage of annotated audio signals spanning the entire vocal repertoire of the three ecotypes of killer whales in the Northeast Pacific Ocean: Resident, Transient, and Offshore killer whales. The dataset encompasses recordings sourced from a variety of geographical locations and varying recording conditions. A critical requirement for the dataset is its capability to facilitate the discrimination of target species vocalizations from those produced by other organisms within the survey area. In particular, humpback whale song units and whistles from other odontocetes, such as Pacific white-sided dolphins, are easily confused with killer whale pulsed calls. Effort was made to include anthropogenic noises such as ship propeller cavitation and other abiotic sounds that can sometimes confuse both humans and machine learning models. Therefore, the dataset includes specific instances of a variety of confounding signals to potentially enhance the robustness of any detection and classification algorithm developed with these data.

Building such a dataset is challenging and often cost prohibitive for a single organization. Thus, in this effort we have combined smaller annotated datasets from multiple commercial, non-profit, academic, and governmental organizations to build an ecologically representative annotation dataset. Much of the annotation effort was provided through the Humans and Algorithms Listening and Looking for Orcas (HALLO) project, which used a standardized annotation procedure included in the Supporting Document. Deployment information, where available, is presented in Table 1 and detection details are presented in Table 2. While every effort has been made to regularize metadata across the entirety of the dataset, this was not always possible. Rather than exclude data not meeting an arbitrary threshold, we provide as much detail as possible and leave the final decision on which datasets to include or exclude to the user’s discretion. The following sections provide details on the 1) Deployment, 2) Processing, and 3) Annotation procedures for the audio and annotations contributed by each provider. A brief description of the goals of each dataset is provided, as well as deployment information, any pre-processing algorithms used to re-sample the audio data or automatically detect cetaceans, and details on the annotation process including software used and audio settings where possible. We also indicate whether the provided annotations were strong (all calls annotated) or weak (some calls likely not annotated in the files), and for what Class/Species label.
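The strong versus weak distinction matters when building negative training examples. The sketch below illustrates one possible use, which is an assumption on our part rather than a procedure from this descriptor: in a file that is strongly annotated for a given class, the time not covered by any annotation of that class can be treated as a candidate negative segment. In weakly annotated files the same gaps may still contain unlabeled calls and should not be used this way. The function name and example times are hypothetical.

```python
def unannotated_gaps(annotations, file_duration):
    """Return (start, end) gaps in seconds not covered by any annotation.

    `annotations` is an iterable of (start, end) times in seconds for one
    class in one file. Overlapping annotations are merged implicitly.
    """
    gaps = []
    cursor = 0.0  # end time of the latest annotation seen so far
    for start, end in sorted(annotations):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < file_duration:
        gaps.append((cursor, file_duration))
    return gaps


# Hypothetical pulsed-call annotations in a 15 s strongly annotated file.
calls = [(2.0, 3.5), (3.0, 4.0), (10.0, 11.0)]
gaps = unannotated_gaps(calls, file_duration=15.0)
print(gaps)
```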

JASCO and the Vancouver Fraser Port Authority (JASCO/VFPA)

The Vancouver Fraser Port Authority (VFPA), in collaboration with JASCO Applied Sciences, collected data from two locations: Haro Strait and Boundary Pass. These data were part of the Enhancing Cetacean Habitat and Observation (ECHO) program, which aims to improve killer whale acoustic habitat through voluntary vessel speed reductions¹⁹.

Deployment

Two AMAR recorders were deployed between 210 and 251 m, directly adjacent to the southbound and northbound shipping lanes in Haro Strait (Table 1, Fig. 1). Instruments were deployed and recovered twice over the study length. The first deployment extended between July 6th and September 8th, 2017. Instruments were deployed, recovered and refurbished before being re-deployed at the same Haro Strait locations on September 8th and recovered on October 26th, 2017. AMARs from the Haro Strait locations sampled at 96 kHz.

AMAR recorders were deployed in Boundary Pass at 193 m depth, adjacent to the shipping lanes. Instruments collected data at two locations between September 2nd, 2018 and April 2nd, 2019. The AMARs sampled at 96 kHz.

Processing

For all deployments, likely marine mammal encounters were initially identified using custom detection and classification algorithms within PAMlab, a proprietary acoustics toolbox developed by JASCO Applied Sciences. Files in which marine mammal signals were detected were selected for human inspection and detailed annotation effort.

Annotation

Acoustic encounters (periods in which multiple animal sounds are detected) were manually annotated by expert analysts for the presence of killer whale calls following the HALLO protocol. Expert annotators used Raven Pro v 1.5 to identify killer whale calls and, where possible, classify calls to call type. Annotators also noted the presence of a variety of non-target calls and abiotic sounds including unknown signals, background noise, fish, and potential Pacific white-sided dolphins. Annotators were allowed to vary the spectrogram settings as needed in order to identify killer whale signals but generally used an FFT length of 2600 samples with 50% overlap (1300 samples), 20 dB amplification, a 20 s timescale and a 0 to 11 kHz frequency scale. The audio files have been fully annotated for the presence of killer whale pulsed calls and whistles. All other annotations, including killer whale clicks, abiotic sounds, and other biological sounds, are weak. Files are annotated to the call level.
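For readers who wish to reproduce comparable spectrograms, the sketch below converts an FFT length and overlap into nominal time and frequency resolution. It assumes the FFT length equals the analysis window length and reports the plain DFT bin spacing; the effective resolution also depends on the window function, which is not modeled here. The function name is hypothetical. Applied to the typical settings above at the 96 kHz AMAR sample rate, it gives a hop of about 13.5 ms and a bin spacing of about 36.9 Hz.

```python
def spectrogram_resolution(fft_length, overlap_fraction, sample_rate_hz):
    """Nominal spectrogram resolution for a given FFT length and overlap.

    Assumes the analysis window is `fft_length` samples long (no zero padding).
    """
    hop_samples = fft_length * (1.0 - overlap_fraction)
    return {
        "window_ms": 1000.0 * fft_length / sample_rate_hz,  # window duration
        "hop_ms": 1000.0 * hop_samples / sample_rate_hz,    # time step
        "bin_hz": sample_rate_hz / fft_length,              # DFT bin spacing
    }


# Typical HALLO settings at the 96 kHz AMAR sample rate.
haro = spectrogram_resolution(2600, 0.5, 96_000)

# A 2048-sample FFT with 50% overlap at 128 kHz, for comparison.
comparison = spectrogram_resolution(2048, 0.5, 128_000)

print(haro)
print(comparison)
```

The second call yields a hop of 8 ms and a bin spacing of 62.5 Hz, which matches the resolution reported for the SIMRES annotations elsewhere in this section.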

JASCO, Vancouver Fraser Port Authority, Ocean Networks Canada (JASCO/VFPA/ONC)

The Strait of Georgia underwater listening station (ULS) was a collaborative project between the Vancouver Fraser Port Authority, Transport Canada, Ocean Networks Canada and JASCO Applied Sciences. The deployment aimed to monitor noise along the northbound shipping lane en route to the ports along Vancouver’s coastline. Data from this hydrophone were formerly used by the ECHO Program for evaluating vessel noise emissions and marine mammal detections¹⁹.

Deployment

The Strait of Georgia ULS is situated on the seabed at approximately 170 m depth, near the northbound shipping lane in the Strait of Georgia. Synchronized data from four hydrophones were streamed to shore from September 23, 2015 to March 30, 2018. Data were streamed via the Victoria Experimental Network Under the Sea (VENUS), an observatory operated by Ocean Networks Canada. Data were sampled at 64 kHz (effective bandwidth 10 Hz to 32 kHz) until 2017, and at 128 kHz (effective bandwidth 10 Hz to 64 kHz) thereafter.

Processing

For all deployments, likely marine mammal encounters were initially identified using custom detection and classification algorithms within PAMlab, a proprietary acoustics toolbox developed by JASCO Applied Sciences. Files in which marine mammal signals were detected were selected for human inspection and detailed annotation effort.

Annotation

Acoustic encounters identified with PAMlab were manually annotated by expert analysts for the presence of killer whale calls following the HALLO protocol (HALLO Annotation Guidelines) using Raven Pro v 1.5, classifying calls to call type level where possible. Annotators changed spectrogram settings to increase the detectability of killer whale sounds but generally used an FFT length of 2600 samples with 50% overlap (1300 samples), 20 dB amplification, a 20 s timescale and a 0 to 11 kHz frequency scale. Annotators also noted the presence of a variety of non-target calls and abiotic sounds including unknown signals, background noise, fish, sonar, and potential Pacific white-sided dolphins. The audio files have been fully annotated for the presence of killer whale pulsed calls and whistles. All other annotations, including killer whale clicks, abiotic sounds, and other biological sounds, are weak. Files are annotated to the call level.

SMRU Consulting (SMRU)

SMRU Consulting, in collaboration with the Whale Museum on San Juan Island, has maintained a cabled hydrophone within SRKW core habitat for two decades. These data have also been used in evaluating the potential benefits of voluntary ship slowdowns¹⁹. Data are routinely evaluated for the presence of killer whales and humpback whales. The hydrophone location is also within visual range of the Lime Kiln Lighthouse, which houses volunteers trained for whale and dolphin identification. Data for the DCLDE coincided with periods of visually confirmed killer whale and humpback whale presence around the Lime Kiln hydrophone.

Deployment

The recording setup consists of a cabled Reson TC4032 hydrophone ~70 m from shore mounted to the seafloor at 23 m depth. Data were collected across several sequential deployments between November 6th, 2016 and September 13th, 2020. Data were digitized at 250 kHz sample rate, 16-bit resolution using a SMRU Consulting data acquisition board, recorded as .wav files and uploaded to a cloud-based system for long-term storage.

Processing

Audio data from the Lime Kiln hydrophone were processed for the presence of biological sounds with the PAMGuard whistle and moan detector⁵⁵.

Annotation

Acoustic encounters identified with PAMGuard were manually annotated by expert analysts for the presence of killer whale calls following the HALLO protocol (HALLO Annotation Guidelines) using Raven Pro v 1.5, classifying calls to call type level where possible. Annotators adjusted spectrogram settings to better distinguish calls from background noise but generally used a 2600-sample FFT with 50% overlap (1300 samples), 20 dB amplification, a 20 s timescale and a 0 to 11 kHz frequency scale. The audio files have been fully annotated for the presence of killer whale pulsed calls and whistles. All other annotations, including killer whale clicks, abiotic sounds, and other biological sounds, are weak. Files are annotated to the call level.

Ocean Networks Canada (ONC)

Ocean Networks Canada (ONC) contributed data from an underwater observatory in Canadian waters. The ONC observatory collects continuous oceanographic data for the benefit of science, society, and industry. The observatory nodes are equipped with calibrated hydrophones⁵⁶ to record long term data on changing ocean soundscapes, supporting research on noise and soniferous animals. Calibration information and other metadata are available on the Ocean Data Portal. Audio files from this provider are collated from three publicly available datasets (https://doi.org/10.34943/d644336d-eb3e-4bf0-b2ef-0cdf3d8bd0db, https://doi.org/10.34943/e03fd4fb-3029-4a40-9174-0bd3e4d99276, https://doi.org/10.34943/7bef925c-de7e-4e31-80d6-a78c71f9aec5).

Deployment

Acoustic data were collected using an Ocean Sonics SC2 recording system deployed on the Barkley Canyon Upper Slope platform of ONC’s Northeast Pacific Time-series Underwater Networked Experiments observatory. The hydrophone was mounted 1 m above the sea floor at 168 m depth. Audio files were recorded at 64 kHz with 24-bit resolution. Note that these data contain broadband clicks every 2 seconds produced by an Acoustic Doppler Current Profiler deployed 70 m away. Continuous noise at 12.5 kHz and 25 kHz from the power supply is also present. A 25.6 kHz anti-aliasing filter was applied during data collection and digitization, yielding reduced apparent sound intensities above 25.6 kHz.

Processing

No automatic processing software was used to identify acoustic encounters.

Annotation

PAMlab was used to manually identify marine mammal encounters and create bounding boxes around signals of interest. Spectrogram settings were allowed to vary to increase visibility but generally used a Hamming window, 16.25 ms time step, and 31.25 ms frame length, yielding a frequency resolution of 32 Hz. Audio data were completely annotated for the presence of killer whale pulsed calls, buzzes, and whistles. Data were incompletely annotated for the presence of killer whale echolocation clicks, humpback whales, and dolphins. All dolphin annotations in this dataset were ‘Lb|Lo’, indicating uncertainty between Pacific right whale dolphin (Lissodelphis borealis) and Pacific white-sided dolphin (Lagenorhynchus obliquidens), whose calls are not acoustically differentiable. All dolphin annotations are labeled as ‘UndBio’ in the ClassSpecies column of the combined Annotations.csv file.

The ‘Call.Type’ column in the original data contains information about the call type category. For killer whales this includes clicks (‘CK’), whistles (‘W’), and pulsed calls (‘BP’). For humpback whales this includes song (‘S’) and non-song (‘NS’) communication. Note that call type here refers to the category of call rather than the catalogue of SRKW calls. No uncertain killer whale or other marine mammal calls were included in this dataset.

Orcasound

Orcasound is a cooperative hydrophone network and an open-source software and hardware project. Orcasound audio and annotations were compiled from multiple recording efforts spanning from 2017 to 2020, from low-cost hydrophones. This public dataset includes nine labeling efforts with the ‘Pod.Cast’ annotation tool, an open-source web app developed by Microsoft Hackathon volunteers to efficiently analyze audio data to detect the presence of killer whale calls. Original audio recordings and annotations are accessible via Orcasound’s open labeled data bucket. The dataset is organized into annotation rounds that used audio data from various Orcasound locations with a range of signal-to-noise ratios for SRKW calls and background noise characteristics. Full details of Orcasound data are available on the GitHub repositories for these projects.

Deployment

The Orcasound data were gathered from three shallow (<10 m at low tide) sites in Washington State, USA. The Orcasound Lab on San Juan Island tested a wide variety of hydrophone elements, including HTI 99-MIN, Aquarian AS-1, and ITC1032 models, between September 27th, 2017 and September 7th, 2020. The Bush Point site on Whidbey Island deployed a single CRT26-08 between September 7th, 2020 and October 19th, 2020. At Port Townsend, LabCore-40 or CRT26-08 elements were deployed for one month starting September 8th, 2020. All hydrophones were deployed using bespoke, affordable live-streaming equipment (Raspberry Pi with the Pisound ADC HAT [24-bit resolution, stereo, at multiple sampling rates; max 192 kHz]) and open-source code that generates compressed, lossy audio segments in HLS format and uploads them to an open S3 bucket sponsored by Amazon. Hydrophones and recording systems for these projects have not been calibrated.

Processing

Audio data were collected and processed in a variety of formats. Data from Orcasound locations with an ‘OS’ or ‘1562344334’ prefix were sampled at 20 kHz, and a 9 kHz lowpass filter was applied. Files from the Orcasound Lab with an ‘rpi’ prefix were recorded in stereo and sampled at 48 kHz with a steep 16.5 kHz lowpass filter. The Bush Point and Port Townsend data were similarly recorded at 48 kHz with the same 16.5 kHz lowpass filter, although the Port Townsend data were recorded in stereo.
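Because several providers applied lowpass or anti-aliasing filters, the usable upper frequency of a file can sit below its Nyquist frequency (the * flag in Table 1). A minimal sketch of that bookkeeping follows. The function name is hypothetical, and treating the filter cutoff as a hard limit is a simplifying assumption, since real filters roll off gradually.

```python
def usable_upper_frequency_hz(sample_rate_hz, lowpass_cutoff_hz=None):
    """Upper usable frequency: the Nyquist frequency, or the lowpass cutoff
    if one was applied and it is lower."""
    nyquist_hz = sample_rate_hz / 2.0
    if lowpass_cutoff_hz is None:
        return nyquist_hz
    return min(nyquist_hz, lowpass_cutoff_hz)


# Orcasound 'rpi' files: 48 kHz sampling with a 16.5 kHz lowpass filter.
rpi_limit = usable_upper_frequency_hz(48_000, 16_500)

# ONC Barkley Canyon: 64 kHz sampling with a 25.6 kHz anti-aliasing filter.
onc_limit = usable_upper_frequency_hz(64_000, 25_600)

# A file with no additional filter is limited only by Nyquist.
unfiltered_limit = usable_upper_frequency_hz(96_000)

print(rpi_limit, onc_limit, unfiltered_limit)
```

A detector trained on spectrograms could use such a limit to choose a common frequency band shared across providers.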

Annotation

Candidate periods of SRKW activity were initially flagged by citizen scientists using live-streamed audio, then reviewed and annotated by expert analysts. The dataset consists of nine ‘Rounds’ from the Orcasound Pod.Cast project, where SRKW presence was pre-labeled using a high-recall classifier and crowd-sourced validation. Annotations are ‘SRKW’ (with start/end times) or ‘non-SRKW’ (labelled ‘Abiotic’, without time/frequency boundaries). Non-SRKW labels may include ship noise or other sounds but were not further validated. Users seeking to include these data in anthropogenic sound detectors should therefore validate these data further or exclude them from the final dataset. FFT parameters used to create bounding boxes were not retained as they varied across deployments. No uncertain killer whale calls were included in this dataset. Orcasound files are strongly annotated for killer whale pulsed calls and weakly annotated for other signals.

Scripps Institution of Oceanography (SIO)

Audio data were collected at two locations off the coast of Washington State as part of a long-term monitoring project between 2008 and 2013. Recordings were made using high-frequency acoustic recording packages (HARPs), autonomous underwater systems designed for long-term passive acoustic monitoring⁵⁷. These data consist of encounters included in previously published work^25,58.

Deployment

Two HARPs were deployed: one in 100 m depth nearshore (Cape Elizabeth) between June 17th, 2008 and January 17th, 2012, and one in 1400 m depth at an offshore (Quinault Canyon) location between January 27th, 2011 and June 30th, 2013. The HARPs sampled continuously at 200 kHz with 16-bit resolution. Data from this project represent the most southerly locations as well as the deepest deployment.

Processing

Original pulsed annotations described in Rice et al.²⁵ were identified using Triton software click detection algorithms. Files containing known killer whale calls were assigned to ecotype based on distinct tonal signals associated with known pods off Washington State.

Annotation

Files containing killer whale calls were completely annotated for the presence of pulsed calls, and weakly annotated for whistles. Humpback whale calls were added opportunistically, and examples of self-noise, tagged as abiotic signals, were included as these signals show structural similarities to biological signals. All annotations were created using Raven Pro v1.6 (FFT: 19.2 ms, 22.4 Hz). Only calls that could be confidently identified as killer whales or humpback whales are included in the annotation files. Killer whale ecotype classes were based on the original encounter labels⁵⁸. Though present in the encounters, echolocation clicks were not labeled during the annotation effort.

Saturna Island Marine Research and Education Society (SIMRES)

SIMRES maintains hydrophones within Boundary Pass as part of the BC Hydrophone Network. This network collaborates to enable quantification and monitoring of the ocean soundscape within SRKW habitat. Data were collected from two hydrophones located near the eastern peninsula of Saturna Island (East Point). Data and annotations represent periods when SRKW were both acoustically and visually detected within a few kilometers of the hydrophones.

Deployment

Acoustic recordings were made with an Ocean Sonics icListen high-frequency smart hydrophone (RB9-ETH) cabled to a shore station. The hydrophone is deployed near a commercial shipping lane in Boundary Pass and is approximately 120 m from shore at 18 m depth. Recordings were made June through October 2022. Data are continuously sampled at 128 kHz with 24-bit resolution but were decimated to 64 kHz in the files provided.

Processing

Audio files were decimated to 55 kHz and used a high pass filter with reduced apparent sound intensities above 50 kHz. Audio data were not pre-processed with any detection algorithms for this study.

Annotation

SRKW communication signals were annotated in Raven Pro v1.6 with a 2048-sample FFT length and 50% overlap, yielding a time and frequency resolution of 8 ms and 62.5 Hz. Bounding boxes demarcated the start and end time of the signal as well as the low and high frequency boundaries. Data were strongly annotated for the presence of killer whale pulsed calls, whistles, buzzes, and rasps. When possible, pulsed calls were further classified into the specific call types⁴⁴. Data were weakly annotated for clicks, with a single example marked in each audio file where present. Annotated signals were assigned a confidence rating of ‘low’, ‘medium’, or ‘high’ to specify the level of certainty provided by the annotator. All killer whale annotations were included in the combined annotation dataset regardless of quality. Annotations with ‘low’ confidence scores were labeled as ‘Uncertain’ or 0 in the KW_certain column of the combined Annotations.csv.
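The quoted resolution follows from the FFT length, the overlap, and the sample rate. The sketch below reproduces the arithmetic under the assumption that the spectrograms were computed on the 128 kHz audio; the function name and arguments are illustrative and are not part of Raven Pro.

```python
def spectrogram_resolution(sample_rate_hz, fft_length, overlap_fraction):
    """Return (time step in ms, frequency bin width in Hz) for an STFT."""
    # The hop is the number of new samples between consecutive frames.
    hop_samples = fft_length * (1.0 - overlap_fraction)
    time_step_ms = 1000.0 * hop_samples / sample_rate_hz
    # Bin width is the sample rate divided by the FFT length.
    bin_width_hz = sample_rate_hz / fft_length
    return time_step_ms, bin_width_hz


# Assumed input: 128 kHz audio, 2048-sample FFT, 50% overlap.
time_step_ms, bin_width_hz = spectrogram_resolution(128_000, 2048, 0.5)
print(time_step_ms, bin_width_hz)  # 8.0 62.5
```

With the same settings applied to 64 kHz audio, the function returns 16 ms and 31.25 Hz, so users recreating spectrograms from the provided files may need to adjust the FFT length to match the annotators' view.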

Fisheries & Oceans Canada (DFO)

Two groups within the Department of Fisheries and Oceans Canada (DFO) provided datasets to the DCLDE challenge: the Cetacean Research Program and the Whale Detection and Localization Program. Data processing methods were consistent across projects within each lab but varied slightly between labs. Exact hydrophone locations are not publicly available for any DFO hydrophone dataset. Instead, general location descriptors are provided (Fig. 1). The focus of the original analysis effort that resulted in these datasets was simply to identify which of the recording files contained killer whale calls for use in various habitat studies. The two DFO datasets are discussed below. Data from the DFO providers represent the only Northern Resident Killer Whale annotations and recordings in this dataset.

DFO Cetacean Research Program (DFO CRP)

Data from the Cetacean Research Program (DFO CRP) lab consisted of two deployments: one from an AURAL-M2 deployed on the continental shelf edge off the west coast of Vancouver Island, and another from a Sound Metrics SM2M hydrophone deployed off northern coastal British Columbia. Data include 375 days between May 18th, 2011 and May 24th, 2012 for west Vancouver Island, and 116 days between October 18th, 2013 and February 3rd, 2014 for northern coastal BC. Both hydrophones sampled at 16.384 kHz.

Deployment

Data were collected using two different hydrophones: an AURAL-M2 was moored at 114 m depth off the west coast of Vancouver Island and sampled audio at 16.4 kHz; an SM2M was moored at 35 m depth off the northern mainland coast of BC and sampled audio at 16 kHz with 16-bit resolution. Exact locations were not made available for this competition.

Processing

The raw audio recordings (.wav) were processed using the whistle and moan detector in PAMGuard version 1.12.08²⁸ with an FFT length of 512 samples, and 50% overlap (256 samples). The detector was configured with a high-pass filter of 800 Hz to limit the number of humpback whale detections and lessen the manual validation burden. The signal-to-noise detection threshold was set to 6 dB. All detections in the first two seconds of each five-minute file were excluded because the detection algorithm produced false detections within this period.

Annotation

All detections, including whistles and pulsed calls, were aurally and visually reviewed by expert annotators using PAMGuard, with the same spectrogram settings as the whistle and moan detector, and identified to species (for biotic sounds) or sound type (for abiotic sounds). Where applicable and as time allowed, detections were identified to species or ecotype level depending on the clarity of the call or acoustic encounter. In post-processing, detections with overlapping boundaries (start time, end time, low frequency, high frequency) were merged based on timing and annotation labels. The original annotation files representing un-merged detections are provided in the ‘original’ subfolder in the DFO_CRP Annotations folder. The PAMGuard whistle and moan detector detects individual contours, so separate components of the same call (i.e. harmonics or sidebands) frequently constitute separate detections if they meet the detector’s criteria; thus, not every detection represents a unique vocalization.
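The merging step can be pictured as an interval merge over detection boxes. The following is a minimal sketch and not the authors' R code: it assumes each detection is a tuple of (label, start_s, end_s, low_hz, high_hz), and that two detections are merged when they share a label and overlap in time. Both the tuple layout and the exact merge rule are assumptions made for illustration.

```python
def merge_detections(detections):
    """Merge same-label detections that overlap in time.

    Each detection is (label, start_s, end_s, low_hz, high_hz).
    A merged box spans the union of the times and frequencies.
    """
    merged = []
    # Sorting by label, then start time, places overlapping boxes next to each other.
    for label, start, end, low, high in sorted(detections):
        last = merged[-1] if merged else None
        if last and last[0] == label and start <= last[2]:
            # Extend the previous box to cover this detection.
            merged[-1] = (label, last[1], max(last[2], end),
                          min(last[3], low), max(last[4], high))
        else:
            merged.append((label, start, end, low, high))
    return merged


example = [
    ("KW", 1.0, 1.8, 1000.0, 1500.0),  # fundamental contour
    ("KW", 1.1, 1.9, 2000.0, 3000.0),  # harmonic of the same call
    ("KW", 5.0, 5.5, 900.0, 1400.0),   # separate call
    ("HW", 1.2, 2.0, 300.0, 700.0),    # different label, kept apart
]
for box in merge_detections(example):
    print(box)
```

In this example the two overlapping ‘KW’ contours collapse into one box spanning 1.0 to 1.9 s and 1000 to 3000 Hz, while the later call and the differently labelled detection remain separate.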

DFO Whale Detection and Localization Program (DFO WDLP)

DFO’s Whale Detection and Localization Program (DFO WDLP) worked in collaboration with DFO’s Acoustics Program to provide data from four deployment locations in Canadian waters. These included Carmanah Point, Swanson Channel, and two locations in the Strait of Georgia (SOG North and SOG South where north and south are in relation to each other).

Deployment

Four locations were chosen for the study area: Carmanah Point, Swanson Channel, and two sites in the southern region of the Canadian waters of the Strait of Georgia (SOG North and SOG South). As with all DFO data, the exact locations are not publicly available. A SoundTrap ST600 HF was used at Carmanah Point, while AMAR G4 hydrophones were used at the three other locations. There was one deployment at each of Carmanah and Swanson Channel, lasting between 3 and 5 months from September 2021 through June 2022. There were two deployments at each of SOG North and SOG South, each lasting between 1 day and 4 weeks. Audio data were continuously sampled at either 192 kHz for the SoundTrap or 256 kHz for the AMARs. AMARs had 24-bit resolution.

Processing

Audio recordings were processed with the Whistle and Moan Detector in PAMGuard version 2.02.03²⁸ for the presence of potential killer whale calls. Audio files were decimated within PAMGuard to 48 kHz, and a weak IIR Butterworth high-pass filter with a threshold of 2 kHz and an order of 1 was applied to reduce background noise in the lower frequency bands. The SNR detection threshold was set to 8 dB. Nominal sensitivities of −164.1 dB and −176.2 dB were used for the AMARs and SoundTrap, respectively. The Whistle and Moan Detector used a minimum frequency threshold of 200 Hz, a maximum frequency threshold of 24 kHz (the Nyquist frequency), and a minimum contour length of 15 time-slices (about 341 ms); all other detection settings were kept at their defaults. In the detector’s noise and thresholding tab, all boxes on the PAMGuard dashboard were selected except “Run Gaussian Kernel Smoothing”, and all input values were kept at their defaults. The FFT engine used with the detector was set to an FFT length of 2048, a hop size of 1024, and a Hann window function, with the same noise parameters as those used with the detector.
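The figure of about 341 ms is consistent with 15 consecutive FFT frames at the decimated sample rate when the full length of the final frame is counted. A brief sketch of that arithmetic follows; the function name is illustrative, and the frame-span interpretation is an assumption rather than a documented PAMGuard definition.

```python
def contour_span_ms(n_slices, fft_length, hop, sample_rate_hz):
    """Duration covered by n_slices consecutive FFT frames, in ms."""
    # The first n_slices - 1 frames each advance by one hop;
    # the last frame contributes its full length.
    span_samples = (n_slices - 1) * hop + fft_length
    return 1000.0 * span_samples / sample_rate_hz


span = contour_span_ms(15, fft_length=2048, hop=1024, sample_rate_hz=48_000)
print(round(span, 1))  # 341.3
```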

Annotation

All detections produced by the Whistle and Moan Detector were evaluated for the presence of killer whales and annotated as such using a custom PAMGuard plugin with the same spectrogram settings as the whistle and moan detector. Detected sounds included whistles and pulsed calls; echolocation clicks were not included as they typically do not trigger the detector due to their short length. As with the Cetacean Research Program’s dataset, detections were merged in post-processing to reduce the number of duplicated annotations. The original annotation files representing un-merged detections are provided in the ‘original’ subfolder in the DFO_WLDP Annotations folder. Killer whale calls not detected by the PAMGuard whistle and moan detector were not added to the annotations. Therefore, while these data likely contain the majority of the visually or aurally identifiable calls, the annotation labels are considered weak for all species. No effort was made to annotate killer whale clicks.

University of Alaska Fairbanks and North Gulf Oceanic Society (UAF)

Data contributed by the University of Alaska Fairbanks and North Gulf Oceanic Society are part of a long-term killer whale monitoring project in the Gulf of Alaska (Fig. 2). This includes recordings of the Southern Alaska Resident, Gulf of Alaska Transients, AT1 Transients, and Offshore killer whales from both stationary moorings and focal follows²³. Transient and Offshore killer whales were rarely encountered during vessel surveys, and Transient killer whales vocalize less often than Residents^59,60, making field recordings of these ecotypes difficult to obtain. We therefore also contributed killer whale recordings from moored hydrophones in the region, on which we detected Gulf of Alaska Transients, AT1 Transients, or Offshore killer whales. The metadata folder associated with these data contains three files. The Myers_DCLDE_2026_files.xls file was used to relate filenames, ecotypes, and locations in the original annotation files to the final annotations. It contains five headings: Filename, Ecotype, Population (i.e. community), Location, and recording time in UTC. Filename refers to SoundTrap audio file names containing the start time; UTC is the corrected start time. Location values are abbreviations for Hinchinbrook Entrance (HE), Kachemak Bay (KE), Montague Strait (MS), and Resurrection Bay (RS). These represent fixed hydrophone locations. Location values for the focal follows are labeled ‘field’ in the location column. Audio files are organized according to the instrument name, or ‘field’ for field recordings. Metadata for fixed instrument locations is contained in an external file (Hydrophone locations.xls). Information pertaining to the field recordings is in the attached report “20120114-N_Matkin_FY20_Annual_Report.pdf”. GPS tracks from the focal follows are not provided.

Deployment

Focal follows

Recordings of southern Alaska Residents were taken with a dipping hydrophone during vessel survey encounters in Prince William Sound and Kenai Fjords (Fig. 1) between May and October in 2019, 2020, and 2021. When killer whales were encountered, we photographically identified as many individuals present as possible. We then maneuvered the vessel approximately 500 m in front of the animals, shut off the engine, and collected a field recording. Recordings before June 16th, 2021 were made with a High-Tech, Inc. HTI-96-Min hydrophone deployed at approximately 8 to 10 m depth with a two-channel TASCAM DR100 portable digital recorder (sampling rate 24 kHz). Only the first channel was used. Recordings after June 16th, 2021 were made with an Ocean Instruments SoundTrap ST300 hydrophone (sampling rate 24 kHz, 16-bit resolution) deployed at 20–30 m depth (Table 1).

Moored hydrophones

Moored hydrophones were deployed in Hinchinbrook Entrance, Kachemak Bay, Montague Strait, and Resurrection Bay (Table 1). All moored hydrophones were Ocean Instruments SoundTrap ST300s, except for the hydrophone in Montague Strait in 2023, which was a model ST600. Hydrophones were deployed at depths of 25–42 m on primarily gravel and sand substrate and moored approximately 2 m above the seafloor. Hinchinbrook Entrance included one 2-week deployment. Kachemak Bay included short (1-day) and long (6-month) deployments between 2020 and 2022. Montague Strait included three deployments lasting between 1 day and 8.5 months between 2019 and 2023. Resurrection Bay had a single 3-month deployment between November 21st, 2019 and February 23rd, 2020. All moored hydrophones recorded at a 24 kHz sampling rate with 16-bit resolution and were duty cycled (primarily 5 min on, 10 min off) based on battery requirements.

Processing

All acoustic data from moored hydrophones were processed using the whistle and moan detector in the open-source software package PAMGuard v.1.15.17⁵⁵. Spectrograms were created with a 1024-sample FFT with 50% overlap. The Whistle and Moan detector identified tonal signals in the 700 to 12,000 Hz frequency range with a minimum length of 15 time slices and a minimum size of 30 pixels that met an 8 dB signal-to-noise ratio threshold.

Annotation

In recordings with killer whales, discrete pulsed calls were manually annotated by an expert acoustician in Raven Pro v.1.6.5 using spectrogram settings similar to those of the PAMGuard analysis. Bounding boxes were drawn around each call, noting the call start time, end time, low frequency, high frequency, and call length in selection tables for each audio file. Spectrogram settings were varied to allow for increased resolution across the annotation process. Recordings with at least three PAMGuard detections were visually and aurally checked by an expert acoustician and classified to the ecotype and/or community level. Gulf of Alaska Transients and AT1 Transients were identified using published call catalogues⁶⁰. Offshore killer whale detections were confirmed by J. Pilkington. A minority of recordings were excluded if they included multiple killer whale ecotypes or killer whale and humpback whale vocalizations in the same recording. Annotations for killer whale pulsed calls are strong; annotations for other signals should be considered weak.

Technical Validation

All annotations were created by expert analysts at their respective institutes based on a canonical catalogue of killer whale calls⁴⁴ and experience. As with all biological signals, the sound quality of the killer whale vocalizations and the certainty of the classification vary considerably with background noise, distance between the animal and the hydrophone, and propagation conditions. Furthermore, although manual validation of audio files represents the ‘gold standard’ in bioacoustics, inter-observer variation is common⁶¹.

Annotations are comprehensive but are not intended to be exhaustive. In no project was there a concerted effort to label every potential call and signal type in each audio file.

This dataset was collated for the purpose of building killer whale detection and ecotype classification algorithms. In doing so, users should consider both their intended applications and potential limitations. For instance, users will immediately note that sample rates and filter settings differ considerably between contributed datasets. Much of the effort in classifying killer whale ecotypes, communities, and clans has utilized lower frequency sound (< 12 kHz)²⁶. However, as seen in this dataset, killer whale vocalizations may have fundamental frequencies at or above 20 kHz. Whether or not the features present at higher frequencies represent useful information for discrimination is yet to be determined. Standardizing the data by decimating higher frequency audio will likely result in the loss of clicks and harmonic structure that may be informative to classification algorithms. Conversely, standardizing the data by excluding audio files sampled at lower rates will greatly restrict the available data, rendering it less ecologically representative.
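The trade-off can be made concrete by comparing Nyquist frequencies. The sketch below uses sample rates quoted in the provider descriptions above; the dictionary keys and the helper name are illustrative, and the 20 kHz value is taken from the fundamental-frequency remark rather than from any recommended threshold.

```python
def can_represent(sample_rate_hz, frequency_hz):
    """True if frequency_hz lies below the Nyquist frequency."""
    return frequency_hz < sample_rate_hz / 2.0


# Sample rates as quoted in the provider descriptions (Hz).
sample_rates = {
    "SIO": 200_000,
    "SIMRES (provided files)": 64_000,
    "UAF": 24_000,
    "DFO CRP": 16_384,
}

for provider, rate in sample_rates.items():
    # Print the Nyquist frequency and whether a 20 kHz component is retained.
    print(provider, rate / 2.0, can_represent(rate, 20_000))
```

Under these assumptions, a 20 kHz component is representable in the 200 kHz and 64 kHz recordings but not in the 24 kHz or 16.384 kHz recordings, which is why decimating to a common rate discards information that only some providers captured.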

No efforts have been made to normalize audio files across the providers to account for different gain and calibration settings between the various instruments and individual project goals. Where these data are available, they are provided in the ‘meta’ folder for each provider.

The data presented here represent a variety of annotation levels across projects. Efforts have been made to produce a collated dataset representing the lowest common denominator of these categories (ecotype). However, all killer whales produce a variety of calls and only a portion of those calls are known to be indicative of the different ecotypes or pods^43,44. The HALLO annotation protocol asked experts to identify both the pod (SRKW) and stereotyped call type³². Therefore, these annotations may be of use in appropriately augmenting datasets to balance not only ecotypes but stereotyped calls. Similarly, echolocation clicks in the sound files have not been consistently annotated but are included in the collated annotation csv file. As echolocation clicks can be diagnostic of species and potentially ecotype⁵⁸, further annotation of this dataset could feed into training or validation based on echolocation characteristics.

Data for this project represent a large collaboration of groups and institutions, and each dataset was processed in accordance with each group’s project goals, which ranged from species presence/absence to call-type production. Post-processing of the annotations was done to provide a uniform resource for machine learning algorithms. However, users should consider details from each deployment carefully to determine whether they wish to do any additional post-processing. For example, multiple annotations from the DFO datasets may represent different harmonics of the same call. As another example, data derived from ONC projects considered only pulsed calls; thus, unannotated whistles and echolocation clicks may be present in some files. See individual datasets above for details.

Users of these data should also note that all annotations were done by independent experts. As with all bioacoustics annotations, inter-observer variation will be present^62,63,64,65,66,67.

Code availability

The R code used to collate data and annotations is available here: https://doi.org/10.5281/zenodo.15743033.

References

Barrett-Lennard, L. G. & Ellis, G. M. Population Structure and Genetic Variability in Northeastern Pacific Killer Whales: Towards an Assessment of Population Viability. Can. Sci. Advis. Secretariat, Research Document 2001/065 (2001).
Morin, P. A. et al. Revised taxonomy of eastern North Pacific killer whales (Orcinus orca): Bigg’s and resident ecotypes deserve species status. R. Soc. Open Sci. 11, 231368 (2024).
Baird, R. W. & Stacey, P. J. Variation in saddle patch pigmentation in populations of killer whales (Orcinus orca) from British Columbia, Alaska, and Washington State. Can. J. Zool. 66, 2582–2585 (1988).
Balcomb, K. C. III & Bigg, M. A. Population biology of the three resident killer whale pods in Puget Sound and off southern Vancouver Island. Behav. Biol. Kill. Whales Alan R Liss N. Y. N. Y. 85–95 (1986).
Ford, J. K. et al. Dietary specialization in two sympatric populations of killer whales (Orcinus orca) in coastal British Columbia and adjacent waters. Can. J. Zool. 76, 1456–1471 (1998).
Ford, J. K. B. & Ellis, G. M. You Are What You Eat: Foraging Specializations and Their Influence on the Social Organization and Behavior of Killer Whales. in Primates and Cetaceans: Field Research and Conservation of Complex Mammalian Societies (eds. Yamagiwa, J. & Karczmarski, L.) 75–98 https://doi.org/10.1007/978-4-431-54523-1_4 (Springer Japan, Tokyo, 2014).
Whitehead, H. & Ford, J. K. B. Consequences of culturally-driven ecological specialization: Killer whales and beyond. J. Theor. Biol. 456, 279–294 (2018).
Ford, J. K. B. Killer Whales: Behavior, Social Organization, and Ecology of the Oceans’ Apex Predators. in Ethology and Behavioral Ecology of Odontocetes (ed. Würsig, B.) 239–259. https://doi.org/10.1007/978-3-030-16663-2_11 (Springer International Publishing, Cham, 2019).
Barrett-Lennard, L. G. Population structure and mating patterns of Killer Whales (Orcinus orca) as revealed by DNA analysis. https://doi.org/10.14288/1.0099652 (University of British Columbia, 2000).
Sharpe, D. L., Castellote, M., Wade, P. R. & Cornick, L. A. Call types of Bigg’s killer whales (Orcinus orca) in western Alaska: using vocal dialects to assess population structure. Bioacoustics 28, 74–99 (2019).
Ford, J. K. B., Stredulinsky, E. H., Ellis, G. M., Durban, J. W. & Pilkington, J. F. Offshore Killer Whales in Canadian Pacific Waters: Distribution, Seasonality, Foraging Ecology, Population Status and Potential for Recovery. DFO Can. Sci. Advis. Sec. Res. Doc. 2014/088. vii+55 p. https://publications.gc.ca/collections/collection_2014/mpo-dfo/Fs70-5-2014-088-eng.pdf (2014).
Fisheries and Oceans Canada. Action Plan for the Northern and Southern Resident Killer Whale (Orcinus Orca) in Canada. Species at Risk Act Action Plan Series. Fisheries and Oceans Canada, Ottawa. v + 33 pp. https://www.registrelep-sararegistry.gc.ca/virtual_sara/files/plans/Ap-ResidentKillerWhale-v00-2017Mar-Eng.pdf (2017).
NOAA. Alaska Marine Mammal Stock Assessments, 2022. KILLER WHALE (Orcinus Orca): Eastern North Pacific Alaska Resident Stock. (2023).
Lacy, R. C. et al. Evaluating anthropogenic threats to endangered killer whales to inform effective recovery plans. Sci. Rep. 7, 14119 (2017).
Burnham, R. E. & Vagle, S. Interference of Communication and Echolocation of Southern Resident Killer Whales. in The Effects of Noise on Aquatic Life: Principles and Practical Considerations (eds. Popper, A. N., Sisneros, J., Hawkins, A. D. & Thomsen, F.) 1–14. https://doi.org/10.1007/978-3-031-10417-6_22-1 (Springer International Publishing, Cham, 2023).
Stewart, J. D. et al. Traditional summer habitat use by Southern Resident killer whales in the Salish Sea is linked to Fraser River Chinook salmon returns. Mar. Mammal Sci. 39, 858–875 (2023).
Veirs, S., Veirs, V. & Wood, J. D. Ship noise extends to frequencies used for echolocation by endangered killer whales. PeerJ 4, e1657 (2016).
Williams, R. et al. Warning sign of an accelerating decline in critically endangered killer whales (Orcinus orca). Commun. Earth Environ. 5, 1–9 (2024).
Joy, R. et al. Potential Benefits of Vessel Slowdowns on Endangered Southern Resident Killer Whales. Front. Mar. Sci. 6 (2019).
Thornton, S. J. et al. Areas of Elevated Risk for Vessel-Related Physical and Acoustic Impacts in Southern Resident Killer Whale (Orcinus Orca) Critical Habitat. DFO Can. Sci. Advis. Sec. Res. Doc. 2022/058, vi+48 p. (2022).
Ford, J. K. B. et al. Habitats of Special Importance to Resident Killer Whales (Orcinus orca) off the West Coast of Canada. DFO Can. Sci. Advis. Sec. Res. Doc. 2017/035, viii+57 p. (2017).
Burnham, R. E., Palm, R. S., Duffus, D. A., Mouy, X. & Riera, A. The combined use of visual and acoustic data collection techniques for winter killer whale (Orcinus orca) observations. Glob. Ecol. Conserv. 8, 24–30 (2016).
Myers, H. J., Olsen, D. W., Matkin, C. O., Horstmann, L. A. & Konar, B. Passive acoustic monitoring of killer whales (Orcinus orca) reveals year-round distribution and residency patterns in the Gulf of Alaska. Sci. Rep. 11, 20284 (2021).
Pilkington, J. F. et al. Patterns of winter occurrence of three sympatric killer whale populations off eastern Vancouver Island, Canada, based on passive acoustic monitoring. Front. Mar. Sci. 10 (2023).
Rice, A. et al. Spatial and temporal occurrence of killer whale ecotypes off the outer coast of Washington State, USA. Mar. Ecol. Prog. Ser. 572, 255–268 (2017).
Riera, A., Pilkington, J. F., Ford, J. K. B., Stredulinsky, E. H. & Chapman, N. R. Passive acoustic monitoring off Vancouver Island reveals extensive use by at-risk Resident killer whale (Orcinus orca) populations. Endanger. Species Res. 39, 221–234 (2019).
Trounce, K. et al. The effects of vessel slowdowns on foraging habitat of the southern resident killer whales. Proc. Meet. Acoust. 37, 070009 (2020).
Gillespie, D., Caillat, M., Gordon, J. & White, P. Automatic detection and classification of odontocete whistles. J. Acoust. Soc. Am. 134, 2427–2437 (2013).
Helble, T. A., Ierley, G. R., D’Spain, G. L., Roch, M. A. & Hildebrand, J. A. A generalized power-law detection algorithm for humpback whale vocalizations. J. Acoust. Soc. Am. 131, 2682–2699 (2012).
Bergler, C. et al. ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning. Sci. Rep. 9, 10997 (2019).
Kirsebom, O. S. et al. MERIDIAN open-source software for deep learning-based acoustic data analysis. J. Acoust. Soc. Am. 151, A27 (2022).
Ford, J. K. A Catalogue of Underwater Calls Produced by Killer Whales (Orcinus Orca) in British Columbia. DFO Can. Data Rep. Fish. Aquat. Sci. 633:165 p (1987).
Janik, V. M. Chapter 4 Acoustic Communication in Delphinids. in Advances in the Study of Behavior vol. 40 123–157 (Academic Press, 2009).
Shiu, Y. et al. Deep neural networks for automated detection of marine mammal species. Sci. Rep. 10, 607 (2020).
Au, W. W. L., Ford, J. K. B., Horne, J. K. & Allman, K. A. N. Echolocation signals of free-ranging killer whales (Orcinus orca) and modeling of foraging for chinook salmon (Oncorhynchus tshawytscha). J. Acoust. Soc. Am. 115, 901–909 (2004).
Barrett-Lennard, L. G., Ford, J. K. B. & Heise, K. A. The mixed blessing of echolocation: differences in sonar use by fish-eating and mammal-eating killer whales. Anim. Behav. 51, 553–565 (1996).
Aguilar de Soto, N. et al. No shallow talk: Cryptic strategy in the vocal communication of Blainville’s beaked whales. Mar. Mammal Sci. 28, E75–E92 (2012).
Riesch, R., Ford, J. K. B. & Thomsen, F. Whistle sequences in wild killer whales (Orcinus orca). J. Acoust. Soc. Am. 124, 1822–1829 (2008).
Souhaut, M. & Shields, M. W. Stereotyped whistles in southern resident killer whales. PeerJ 9, e12085 (2021).
Thomsen, F., Franck, D. & Ford, J. K. B. Characteristics of whistles from the acoustic repertoire of resident killer whales (Orcinus orca) off Vancouver Island, British Columbia. J. Acoust. Soc. Am. 109, 1240–1246 (2001).
Miller, P. J. O. Diversity in sound pressure levels and estimated active space of resident killer whale vocalizations. J. Comp. Physiol. A 192, 449–459 (2006).
Rehn, N., Teichert, S. & Thomsen, F. Structural and temporal emission patterns of variable pulsed calls in free-ranging killer whales (Orcinus orca). Behaviour 1, 307–329 (2007).
Deecke, V. B., Barrett-Lennard, L. G., Spong, P. & Ford, J. K. B. The structure of stereotyped calls reflects kinship and social affiliation in resident killer whales (Orcinus orca). Naturwissenschaften 97, 513–518 (2010).
Ford, J. K. B. Vocal traditions among resident killer whales (Orcinus orca) in coastal waters of British Columbia. Can. J. Zool. 69, 1454–1483 (1991).
Yurk, H., Barrett-Lennard, L., Ford, J. K. B. & Matkin, C. O. Cultural transmission within maternal lineages: vocal clans in resident killer whales in southern Alaska. Anim. Behav. 63, 1103–1119 (2002).
Filatova, O. A. et al. Killer whale call frequency is similar across the oceans, but varies across sympatric ecotypes. J. Acoust. Soc. Am. 138, 251–257 (2015).
Foote, A. D. & Nystuen, J. A. Variation in call pitch among killer whale ecotypes. J. Acoust. Soc. Am. 123, 1747–1752 (2008).
Madrigal, B. C., Crance, J. L., Berchok, C. L. & Stimpert, A. K. Call repertoire and inferred ecotype presence of killer whales (Orcinus orca) recorded in the southeastern Chukchi Sea. J. Acoust. Soc. Am. 150, 145–158 (2021).
Gudivada, V. N., Apon, A. & Ding, J. Data Quality Considerations for Big Data and Machine Learning: Going Beyond Data Cleaning and Transformations. (2017).
Priestley, M., O’Donnell, F. & Simperl, E. A Survey of Data Quality Requirements That Matter in ML Development Pipelines. J. Data Inf. Qual. 15, 11:1–11:39 (2023).
Brookes, K. L., Bailey, H. & Thompson, P. M. Predictions from harbor porpoise habitat association models are confirmed by long-term passive acoustic monitoring. J. Acoust. Soc. Am. 134, 2523–2533 (2013).
Kotila, M. et al. Large-scale long-term passive-acoustic monitoring reveals spatio-temporal activity patterns of boreal bats. Ecography 2023, e06617 (2023).
Parijs, S. M. V. et al. Management and research applications of real-time and archival passive acoustic sensors over varying temporal and spatial scales. Mar. Ecol. Prog. Ser. 395, 21–36 (2009).
Palmer, K. & Joy, R. DCLDE 2026: Killer whale (Orcinus orca) ecotype and other species annotations for the Detection Classification Localization and Density Estimate (DCLDE) conference in 2026. NOAA National Centers for Environmental Information https://doi.org/10.25921/15EY-MH50 (2025).
Gillespie, D. et al. PAMGUARD: Semiautomated, open source software for real‐time acoustic detection and localization of cetaceans. J. Acoust. Soc. Am. 125, 2547 (2009).
Biffard, B., Morgan, M., Muzi, L., Dakin, T. & Buren, P. V. An Integrated Hydrophone Calibration System for Ocean Observing: ONC HydroCal. in OCEANS 2022, Hampton Roads 1–5, https://doi.org/10.1109/OCEANS47191.2022.9976955 (2022).
Wiggins, S. M. & Hildebrand, J. A. High-frequency Acoustic Recording Package (HARP) for broad-band, long-term marine mammal monitoring. in 2007 symposium on underwater technology and workshop on scientific use of submarine cables and related technologies 551–557 (IEEE, 2007).
Leu, A. A., Hildebrand, J. A., Rice, A., Baumann-Pickering, S. & Frasier, K. E. Echolocation click discrimination for three killer whale ecotypes in the Northeastern Pacific. J. Acoust. Soc. Am. 151, 3197–3206 (2022).
Deecke, V. B., Ford, J. K. B. & Slater, P. J. B. The vocal behaviour of mammal-eating killer whales: communicating with costly calls. Anim. Behav. 69, 395–405 (2005).
Saulitis, E. L., Matkin, C. O. & Fay, F. H. Vocal repertoire and acoustic behavior of the isolated AT1 killer whale subpopulation in southern Alaska. Can. J. Zool. 83, 1015–1029 (2005).
Leroy, E. C., Thomisch, K., Royer, J.-Y., Boebel, O. & Van Opzeeland, I. On the reliability of acoustic annotations and automatic detections of Antarctic blue whale calls under different acoustic conditions. J. Acoust. Soc. Am. 144, 740–754 (2018).
Martín-Morató, I. & Mesaros, A. What is the ground truth? Reliability of multi-annotator data for audio tagging. in 2021 29th European Signal Processing Conference (EUSIPCO) 76–80. https://doi.org/10.23919/EUSIPCO54536.2021.9616087 (2021).
Nguyen Hong Duc, P. et al. Assessing inter-annotator agreement from collaborative annotation campaign in marine bioacoustics. Ecol. Inform. 61, 101185 (2021).
van Osta, J. M., Dreis, B., Meyer, E., Grogan, L. F. & Castley, J. G. An active learning framework and assessment of inter-annotator agreement facilitate automated recogniser development for vocalisations of a rare species, the southern black-throated finch (Poephila cincta cincta). Ecol. Inform. 77, 102233 (2023).
Ocean Networks Canada Society. Upper Slope South Hydrophone Deployed 2013-05-11. Ocean Networks Canada Society https://doi.org/10.34943/D644336D-EB3E-4BF0-B2EF-0CDF3D8BD0DB (2013).
Ocean Networks Canada Society. Upper Slope South Hydrophone Deployed 2014-05-07. Ocean Networks Canada Society https://doi.org/10.34943/7BEF925C-DE7E-4E31-80D6-A78C71F9AEC5 (2014).
Ocean Networks Canada Society. Upper Slope South Hydrophone Deployed 2014-05-03. Ocean Networks Canada Society https://doi.org/10.34943/E03FD4FB-3029-4A40-9174-0BD3E4D99276 (2014).

Acknowledgements

Funding was provided by the Canada Nature Fund for Aquatic Species at Risk, to which we are grateful. We also thank Tom Denton, who facilitated storage on Google Cloud, and Charles Anderson and Carrie Wall-Bell of the National Centers for Environmental Data, who helped format these data for long-term storage.

Author information

Authors and Affiliations

Department of Environmental Science, Simon Fraser University, Burnaby, BC, Canada
K. J. Palmer, Emma Cummings, Alex Harris, Lauren Laturnus, Olivia Murphy, Bruno Padovese, Harald Yurk & Ruth Joy
Dept of Mathematics & Statistics, Dalhousie University, Halifax, NS, Canada
Michael G. Dowd
Scripps Institution of Oceanography, La Jolla, California, USA
Kait Frasier
Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
Fabio Frazao & Oliver S. Kirsebom
Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
April Houweling
JASCO Applied Sciences Ltd., Victoria, BC, Canada
April Houweling & Jen Wladichuk
Ocean Networks Canada, University of Victoria, Victoria, BC, Canada
Jasper Kanes
Open Ocean Robotics, Victoria, BC, Canada
Oliver S. Kirsebom
Lisa K Yang Center for Conservation Bioacoustics, Cornell University, Ithaca, NY, USA
Holger Klinck
Institute of Ocean Sciences, Fisheries and Oceans Canada, North Saanich, BC, Canada
Holly LeBlond
North Gulf Oceanic Society, Homer, Alaska, USA
Craig Matkin & Dan Olsen
Marine Mammal Institute, Oregon State University, Newport, Oregon, USA
Hannah Myers
College of Fisheries and Ocean Sciences, University of Alaska Fairbanks, Fairbanks, Alaska, USA
Hannah Myers
Pacific Science Enterprise Centre, Fisheries and Oceans Canada, West Vancouver, BC, Canada
Caitlin O’Neill, Lucy Quayle, Svein Vagle & Harald Yurk
Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, BC, Canada
James Pilkington
Department of Biology, University of Victoria, Victoria, BC, Canada
Amalis Riera Vuibert
Vancouver Fraser Port Authority, Vancouver, BC, Canada
Krista Trounce
Beam Reach, Seattle, Washington, USA
Scott Veirs & Val Veirs
SMRU Consulting, Friday Harbor, Washington, USA
Jason Wood & Tina Yack

Contributions

K.J. Palmer collated the final annotations dataset, managed data sharing agreements, produced the collated annotation files, drafted, edited and reviewed the manuscript, and annotated the SCRIPPS dataset for the presence of killer whales and humpback whales. Emma Cummings and Alex Harris were expert annotators for the JASCO and Vancouver Fraser Port Authority datasets. Kait Frasier provided data and annotations from Scripps Institution of Oceanography and participated in editing the manuscript. Mike Dowd, Fabio Frazao and Bruno Padovese contributed to the HALLO annotation protocol and participated in data curation throughout the process. April Houweling and Jennifer Wladichuk provided the JASCO data, were expert annotators, and participated in writing and editing the manuscript. Jasper Kanes provided data from Ocean Networks Canada and served as expert annotator for these datasets. Oliver S. Kirsebom provided editorial feedback and was involved in the initial inception of the project. Holger Klinck was involved in the project inception and provided financial support and editorial input. Holly LeBlond, Lucy Quayle, Caitlin O’Neill, Svein Vagle, and Harald Yurk provided data from the DFO Whale and Dolphin Detection and Localization group. Lucy Quayle served as the expert annotator. Lauren Laturnus served as an expert annotator on the SIMRES data, provided editorial feedback on the manuscript and created Fig. 1. Olivia Murphy served as an expert annotator on the SIMRES data. Hannah Myers, Craig Matkin, and Dan Olsen provided the University of Alaska Fairbanks data. Additionally, Hannah Myers served as expert annotator on the University of Alaska Fairbanks data and provided editorial feedback on the manuscript. J. Pilkington provided data and annotations from the Department of Fisheries and Oceans Cetacean Research Program and participated in writing and editing the manuscript. Amalis Riera Vuibert served as an expert annotator on the SMRU data. Krista Trounce provided data from the Vancouver Fraser Port Authority and provided editorial feedback on the manuscript. Scott and Val Veirs provided data from Orcasound hydrophones, served as expert annotators and provided critical feedback on the manuscript. Jason Wood provided data from SMRU Consulting and provided editorial feedback on the manuscript. Ruth Joy provided financial support for the project and critical feedback at all levels, and assisted in the writing process.

Corresponding author

Correspondence to K. J. Palmer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Palmer, K.J., Cummings, E., Dowd, M.G. et al. A Public Dataset of Annotated Orcinus orca Acoustic Signals for Detection and Ecotype Classification. Sci Data 12, 1137 (2025). https://doi.org/10.1038/s41597-025-05281-5

Received: 19 December 2024
Accepted: 27 May 2025
Published: 03 July 2025
Version of record: 03 July 2025
DOI: https://doi.org/10.1038/s41597-025-05281-5
