Abstract
Killer whales (Orcinus orca) exhibit significant ecological and genetic diversity, with three primary sympatric populations in the Northeast Pacific: Resident, Bigg’s (Transient), and Offshore. Each population is characterized by distinct foraging habits, social structures, and vocal repertoires, which complicate accurate monitoring and conservation efforts. This dataset, compiled from diverse sources, provides a comprehensive resource for the detection and classification of killer whale vocalizations. The dataset includes annotated acoustic recordings spanning 11 years from various locations in Alaska, British Columbia, and Washington, collected using multiple hydrophone systems. It addresses the challenge of differentiating killer whale calls from other marine species and environmental noise, including specific instances of confounding signals that may help enhance model robustness. Detailed annotations capture a diverse suite of vocalizations and their associated metadata, facilitating the development of advanced machine learning models for ecological monitoring. This curated dataset aims to improve the accuracy of killer whale detection algorithms, support conservation efforts, and advance our understanding of killer whale acoustic communication across different populations.
Background & Summary
Killer whales (Orcinus orca) are cosmopolitan, with distinct representatives found in every ocean. The killer whale lineage is complex and presently delineated into multiple ecotypes that are genetically distinct1,2. In the Northeast Pacific, killer whales have diverged into genetically and culturally distinct lineages that overlap in distribution. These lineages presently include three sympatric forms (ecotypes): Resident, Transient, and Offshore killer whales3,4,5. The known ecotypes are sympatric but socially isolated and do not interbreed1,5.
Ecotypes are distinguished by genetic, morphological, behavioral, and acoustic traits6,7. Within the Resident ecotype there are four levels of social structure. Matrilines form the basic social unit and are composed of the eldest female and her offspring. Groups of matrilines that share recent maternal heritage are referred to as pods, and clans represent groups of pods that share an acoustic dialect and distant maternal heritage8. The term community refers to pods and clans that regularly associate and interbreed but do not share matrilines or vocal similarity. Not all matrilines in a community necessarily share calls or dialects. There are three Resident killer whale communities in the Northeast Pacific: Southern Resident killer whales (SRKW), Northern Resident killer whales (NRKW), and Southern Alaskan Residents (SAR). Social association within the Transient ecotype is similarly matrilineal but less rigid, with adult males and females frequently splitting from the matrilineal group. As with the Resident ecotype, the Transient ecotype in the Northeast Pacific comprises several genetically and acoustically distinct communities, including the AT1 Transients, Gulf of Alaska Transients, and West Coast Transients (or Bigg’s killer whales)9,10. Little is known of the social organization of Offshore killer whales11.
Conservation status similarly varies between ecotypes and communities. The NRKW community and West Coast Transients are considered “threatened” under the Species at Risk Act in Canada; SRKWs are “endangered”, and Offshore killer whales are a “species of special concern” under the same act11,12. Meanwhile, the Alaska Resident community is not designated as depleted under the US Marine Mammal Protection Act nor as threatened or endangered under the Endangered Species Act13. The AT1 Transient population is functionally extinct, with only seven animals thought to be alive in 2020 and no births in the last thirty years.
Each ecotype and community faces different stressors. SRKW are vulnerable to extinction due to a lack of available food, physical and acoustic disturbance, and persistent pollution in their environment14, e.g. masking of foraging and communication signals by transiting vessels15,16,17,18. There are significant and sustained efforts to improve the population status of SRKW, including reducing competition for salmon through fishing closures and noise reduction efforts in both US and Canadian waters19,20. Critical Habitat designations, determined by years of visual and acoustic detections, inform these efforts20,21.
Acoustic monitoring is an integral part of monitoring the behavior, habitat, and efficacy of conservation projects such as vessel slowdowns19,22,23,24,25,26,27. However, passive acoustic methods typically generate large volumes of data which require automated processing to produce results within reasonable timeframes. A variety of generalized detection algorithms are available that work reasonably well as binary detectors of killer whale calls28,29 and neural network based killer whale detectors have similarly been developed to detect SRKW calls30,31. However, most of the existing detectors lack the ability to robustly distinguish between the highly variable killer whale calls and other signals in the same frequency band. While progress has been made in developing automated detection algorithms for killer whale vocalizations, there is considerable room for improving automated pipelines that discriminate between killer whale calls and other species signals, as well as between ecotypes and clans of killer whales.
All killer whale vocalizations can be grouped into three broad categories: echolocation clicks, whistles, and pulsed calls32,33,34. Echolocation clicks are impulsive sounds used in feeding and navigation with the majority of the energy between 20 and 100 kHz35,36. Rasps are less common and defined as a series of frequency modulated clicks that have been associated with foraging in other odontocetes37. Whistles are narrow band signals that aid in close-range communication, generally spanning from 0.5 to 25 kHz, and may be involved in coordinating movements and maintaining group cohesion38,39,40. Pulsed calls are broadband signals with energy from 0.5 to over 40 kHz41 and are the most common signal type used for communication by killer whales. They are composed of a series of pulses produced in such rapid succession as to sound tonal with multiple harmonics42. Pulsed calls form distinct, complex vocalizations (discrete calls) often characterized by a series of tonal elements that can have one or two overlapping fundamental frequencies43,44,45 that vary in contour and amplitude over time32. Pulsed calls are primarily used for social communication within and between individuals and communities, serving functions in social cohesion, mating, travel, and foraging coordination43,44,45 and conveying social and behavioral cues. It is possible to discriminate between ecotypes, clans, and sometimes pods or maternally related family groups by analyzing features of killer whale vocalizations. Resident killer whales produce calls in higher frequency ranges than Transient killer whales, with significantly higher minimum, peak, and median call frequencies10,46,47. Offshore killer whales produce calls with a higher minimum frequency than other ecotypes47,48.
Such differences contribute to the distinct vocal repertoires and form the motivation for harnessing the power of modern classification methodologies to make the most of acoustic surveys in both archived and near real-time settings.
Accurate machine learning models rely on extensive and well-curated labeled datasets in order to reliably detect and classify killer whales in underwater sound recordings49,50. In acoustic ecology, the data used to train machine learning algorithms should ideally represent the full range of the animals’ vocalization repertoire, and those vocalizations should remain relatively static over time34. Many machine learning applications in conservation are targeted at longitudinal datasets to assess changes in occupancy of species on the scale of years or decades23,24,51,52,53. In species capable of cultural adaptation of their repertoires, including humpback (Megaptera novaeangliae) and killer whales, data for machine learning algorithms must then contain signals that were previously heard in the environment (e.g. antiquated song, and killer whale calls from now deceased animals). Furthermore, environmental factors including but not limited to background noise, instrument parameters, and sound propagation conditions can all influence how robust detection and classification algorithms are.
This work represents the largest curated dataset of audio and annotations to date as part of the 2026 Biennial Conference and Workshop on Detection, Classification, Localization, and Density Estimation of Marine Mammals using Passive Acoustics (DCLDE). Datasets associated with the DCLDE workshops have allowed for the continual development and evaluation of detection and classification algorithms for these challenging species. As part of the DCLDE, the goal of this dataset is to facilitate the construction and evaluation of detectors that are capable of 1) discriminating killer whale calls from other acoustically similar species and 2) discriminating between different Northeast Pacific ecotypes of killer whales.
Multiple groups have collaborated to produce over 225,000 bounding box acoustic annotations from 23 locations, encompassing 1.6 TB of audio data collected in the Northeast Pacific Ocean from Washington State to Southeast Alaska. These recordings, captured at depths ranging from 8 to 253 m, span a nine-year period between May 2013 and April 2023. The dataset represents a diverse set of projects with varying deployment, processing, and annotation methodologies, contributed by a collaboration of industry partners, not-for-profits, universities, and governmental organizations (Tables 1, 2). Data records are organized by provider, with details on deployment, processing, and annotation methodologies outlined in the following sections. Additionally, we provide a uniform and collated *.csv file with ecotype-level classifications.
Data Records
Audio, annotation, and metadata files (where available) have been archived and made available by the US National Centers for Environmental Information54.
Data in the repository are organized into folders by provider. Under each provider, there are folders for Audio, Annotations, and Metadata (where applicable). The Audio folder contains all audio files contributed by the provider, organized by deployment location. Any additional information provided by the contributor, including hydrophone or deployment methods, additional resources for accessing mirrored copies of the data, or applicable reports describing the data collection methodologies, is stored in an optional ‘meta’ folder under the provider. To limit the size of the complete dataset, only audio files with annotations are included.
To aid in rapid usability we also provide a standardized annotation file collated across all providers (Annotations.csv). The collated annotation file includes standardized annotations from across all datasets, with labels described in the Technical Validation section (Table 3).
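As a minimal sketch of how the collated file might be used, the following Python snippet filters rows by class label and by the certainty flag. It assumes the ClassSpecies and KW_certain column names mentioned elsewhere in this paper, and that a KW_certain value of 0 marks an uncertain annotation. The in-memory sample rows, the FileName column, the ‘KW’ label, and the helper name select_rows are illustrative assumptions only and should be checked against Table 3 and the actual header of Annotations.csv.

```python
import csv
import io

# Hypothetical sample mimicking a few rows of Annotations.csv. Only the
# ClassSpecies and KW_certain column names come from the paper; the FileName
# column, the 'KW' label, and all values are placeholders.
SAMPLE = """FileName,ClassSpecies,KW_certain
a.wav,KW,1
a.wav,KW,0
b.wav,UndBio,
c.wav,Abiotic,
"""


def select_rows(handle, class_label, require_certain=True):
    """Return the rows whose ClassSpecies equals class_label.

    When require_certain is True, rows whose KW_certain field is '0'
    (assumed here to mean "uncertain") are dropped.
    """
    selected = []
    for row in csv.DictReader(handle):
        if row["ClassSpecies"] != class_label:
            continue
        if require_certain and row["KW_certain"].strip() == "0":
            continue
        selected.append(row)
    return selected


# With a real file, replace io.StringIO(SAMPLE) by open(path, newline="").
certain_kw = select_rows(io.StringIO(SAMPLE), "KW")
all_kw = select_rows(io.StringIO(SAMPLE), "KW", require_certain=False)
print(len(certain_kw), len(all_kw))
```

Using only the standard csv module keeps the sketch dependency-free; the same filter could equally be written with a data-frame library.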
The original annotations often contain considerable information that is beyond the scope of the DCLDE challenge, including a variety of different labels for biologic and anthropogenic sounds and finer resolution on killer whale calls including matriline, pulsed call type, or other non-standard calls such as ‘buzzes’ or ‘rasps’. These annotation details may be of interest to those knowledgeable in the field of killer whale acoustics, hence their inclusion. However, as this information was not consistently collected across or within projects, it was not included in the combined Annotations.csv file. Additional information about the analysis procedure, where applicable, is stored in the ‘meta’ folder in each organization’s data along with any additional deployment information or relevant reports provided by the dataset authors.
Methods
In this study, we sought to build an “ecologically representative” dataset with comprehensive coverage of annotated audio signals spanning the entire vocal repertoire of the three ecotypes of killer whales in the Northeast Pacific Ocean: Resident, Transient, and Offshore killer whales. The dataset encompasses recordings sourced from a variety of geographical locations and varying recording conditions. A critical requirement for the dataset is its capability to facilitate the discrimination of target species vocalizations from those produced by other organisms within the survey area. In particular, humpback whale song units and whistles from other odontocetes, such as Pacific white-sided dolphins, are easily confused with killer whale pulsed calls. Effort was made to include anthropogenic noises such as ship propeller cavitation and other abiotic sounds that can sometimes confuse both humans and machine learning models. Therefore, the dataset includes specific instances of a variety of confounding signals to potentially enhance the robustness of any detection and classification algorithm developed with these data.
Building such a dataset is challenging and often cost prohibitive for a single organization. Thus, in this effort we have combined smaller annotated datasets from multiple commercial, non-profit, academic, and governmental organizations to build an ecologically representative annotation dataset. Much of the annotation effort was provided through the Humans and Algorithms Listening and Looking for Orcas (HALLO) project, which used a standardized annotation procedure included in the Supporting Document. The following sections provide detailed information on the 1) Deployment, 2) Processing, and 3) Annotation procedures for each data contributor. Deployment information, where available, is presented in Table 1 and detection details are presented in Table 2. While every effort has been made to regularize metadata across the entirety of the dataset, this was not always possible. Rather than exclude data not meeting an arbitrary threshold, we provide as much detail as possible and leave the final decision on which datasets to include or exclude to the user’s discretion. For each provider, a brief description of the goals of the dataset is given, along with deployment information, any pre-processing algorithms used to re-sample the audio data or automatically detect cetaceans, and details on the annotation process including software used and audio settings where possible. We also indicate whether the provided annotations were strong (all calls annotated) or weak (some calls likely not annotated in the files), and for what Class/Species label.
JASCO and the Vancouver Fraser Port Authority (JASCO/VFPA)
The Vancouver Fraser Port Authority (VFPA), in collaboration with JASCO Applied Sciences, collected data from two locations: Haro Strait and Boundary Pass. These data were part of the Enhancing Cetacean Habitat and Observation (ECHO) program, which aims to improve killer whale acoustic habitat through voluntary vessel speed reductions19.
Deployment
Two AMAR recorders were deployed between 210 and 251 m, directly adjacent to the southbound and northbound shipping lanes in Haro Strait (Table 1, Fig. 1). Instruments were deployed and recovered twice over the study length. The first deployment extended between July 6th and September 8th 2017. Instruments were deployed, recovered and refurbished before being re-deployed at the same Haro Strait locations on September 8th and recovered on October 26th 2017. AMARs from the Haro Strait locations sampled at 96 kHz.
Fig. 1: Map of hydrophone locations off the coasts of British Columbia, Canada, as well as Washington and Alaska, United States, which contributed recordings to this dataset (inset). Diamonds indicate recording locations and red transparent circles indicate approximate hydrophone locations for DFO deployments. It should not be assumed that recording locations fall within these circles. Numbers correspond to the dataset providers. 1 – 3 Orcasound; 4 ONC; 5 – 6 DFO CRP; 7 – 10 DFO WDLP; 11 SIMRES; 12 – 13 SIO; 14 – 17 JASCO/VFPA; and 18 SMRU Consulting.
AMAR recorders were deployed in Boundary Pass at 193 m depth, adjacent to the shipping lanes. Instruments collected data at two locations between September 2nd, 2018 and April 2nd, 2019. The AMARs sampled at 96 kHz.
Processing
For all deployments, likely marine mammal encounters were initially identified using custom detection and classification algorithms within PAMlab, a proprietary acoustics toolbox developed by JASCO Applied Sciences. Files in which marine mammal signals were detected were selected for human inspection and detailed annotation.
Annotation
Acoustic encounters (periods in which multiple animal sounds are detected) were manually annotated by expert analysts for the presence of killer whale calls following the HALLO protocol. Expert annotators used Raven Pro v 1.5 to identify killer whale calls and, where possible, classify calls to call type. Annotators also noted the presence of a variety of non-target calls and abiotic sounds including unknown signals, background noise, fish, and potential Pacific white-sided dolphins. Annotators were allowed to vary the spectrogram settings as needed in order to identify killer whale signals but generally used an FFT length of 2600 samples with 50% overlap (1300 samples), 20 dB amplification, 20 s timescale and 0 to 11 kHz frequency scale. The audio files have been fully annotated for the presence of killer whale pulsed calls and whistles. All other annotations, including killer whale clicks, abiotic sounds, and other biological sounds, are weak. Files are annotated to the call level.
JASCO, Vancouver Fraser Port Authority, Ocean Networks Canada (JASCO/VFPA/ONC)
The Strait of Georgia underwater listening station (ULS) was a collaborative project between the Vancouver Fraser Port Authority, Transport Canada, Ocean Networks Canada, and JASCO Applied Sciences. The deployment aimed to monitor noise along the northbound shipping lane en route to the ports along Vancouver’s coastline. Data from this hydrophone were formerly used by the ECHO Program for evaluating vessel noise emissions and marine mammal detections19.
Deployment
The Strait of Georgia ULS is situated on the seabed at approximately 170 m depth, near the northbound shipping lane in the Strait of Georgia. Synchronized data from four hydrophones were streamed to shore from September 23, 2015 to March 30, 2018. Data were streamed via the Victoria Experimental Network Under the Sea (VENUS), an observatory operated by Ocean Networks Canada. Data were sampled at 64 kHz (effective bandwidth 10 Hz to 32 kHz) until 2017, and at 128 kHz (effective bandwidth 10 Hz to 64 kHz) thereafter.
Processing
For all deployments, likely marine mammal encounters were initially identified using custom detection and classification algorithms within PAMlab, a proprietary acoustics toolbox developed by JASCO Applied Sciences. Files in which marine mammal signals were detected were selected for human inspection and detailed annotation.
Annotation
Acoustic encounters identified with PAMlab were manually annotated by expert analysts for the presence of killer whale calls following the HALLO protocol (HALLO Annotation Guidelines) using Raven Pro v 1.5, classifying calls to call type level where possible. Annotators changed spectrogram settings to increase the detectability of killer whale sounds but generally used an FFT length of 2600 samples with 50% overlap (1300 samples), 20 dB amplification, 20 s timescale and 0 to 11 kHz frequency scale. Annotators also noted the presence of a variety of non-target calls and abiotic sounds including unknown signals, background noise, fish, sonar, and potential Pacific white-sided dolphins. The audio files have been fully annotated for the presence of killer whale pulsed calls and whistles. All other annotations, including killer whale clicks, abiotic sounds, and other biological sounds, are weak. Files are annotated to the call level.
SMRU Consulting (SMRU)
SMRU Consulting in collaboration with the Whale Museum on San Juan Island have maintained a cabled hydrophone within SRKW core habitat for two decades. These data have also been used in evaluating the potential benefits of voluntary ship slowdowns19. Data are routinely evaluated for the presence of killer whales and humpback whales. The hydrophone location is also within visual range of the Lime Kiln Lighthouse which houses volunteers trained for whale and dolphin identification. Data for the DCLDE coincided with periods of visually confirmed killer whale and humpback whale presence around the Lime Kiln hydrophone.
Deployment
The recording setup consists of a cabled Reson TC4032 hydrophone ~70 m from shore mounted to the seafloor at 23 m depth. Data were collected across several sequential deployments between November 6th, 2016 and September 13th, 2020. Data were digitized at 250 kHz sample rate, 16-bit resolution using a SMRU Consulting data acquisition board, recorded as .wav files and uploaded to a cloud-based system for long-term storage.
Processing
Audio data from the Lime Kiln hydrophone were processed for the presence of biological sounds with the PAMGuard whistle and moan detector55.
Annotation
Acoustic encounters identified with PAMGuard were manually annotated by expert analysts for the presence of killer whale calls following the HALLO protocol (HALLO Annotation Guidelines) using Raven Pro v 1.5, classifying calls to call type level where possible. Annotators adjusted spectrogram settings to better distinguish calls from background noise but generally used a 2600-sample FFT with 50% overlap (1300 samples), 20 dB amplification, 20 s timescale and 0 to 11 kHz frequency scale. The audio files have been fully annotated for the presence of killer whale pulsed calls and whistles. All other annotations, including killer whale clicks, abiotic sounds, and other biological sounds, are weak. Files are annotated to the call level.
Ocean Networks Canada (ONC)
Ocean Networks Canada (ONC) contributed data from an underwater observatory in Canadian waters. The ONC observatory collects continuous oceanographic data for the benefit of science, society, and industry. The observatory nodes are equipped with calibrated hydrophones56 to record long term data on changing ocean soundscapes, supporting research on noise and soniferous animals. Calibration information and other metadata are available on the Ocean Data Portal. Audio files from this provider are collated from three publicly available datasets (https://doi.org/10.34943/d644336d-eb3e-4bf0-b2ef-0cdf3d8bd0db, https://doi.org/10.34943/e03fd4fb-3029-4a40-9174-0bd3e4d99276, https://doi.org/10.34943/7bef925c-de7e-4e31-80d6-a78c71f9aec5).
Deployment
Acoustic data were collected using an Ocean Sonics SC2 recording system deployed on the Barkley Canyon Upper Slope platform of ONC’s Northeast Pacific Time-series Underwater Networked Experiments observatory. The hydrophone was mounted 1 m above the sea floor at 168 m depth. Audio files were recorded at 64 kHz with 24-bit resolution. Note that these data contain broadband clicks every 2 seconds produced by an Acoustic Doppler Current Profiler deployed 70 m away. Continuous noise at 12.5 kHz and 25 kHz from the power supply is also present. A 25.6 kHz anti-aliasing filter was applied during data collection and digitization, yielding reduced apparent sound intensities above 25.6 kHz.
Processing
No automatic processing software was used to identify acoustic encounters.
Annotation
PAMlab was used to manually identify marine mammal encounters and create bounding boxes around signals of interest. Spectrogram settings were allowed to vary to increase visibility but generally used a Hamming window with a 16.25 ms time step and 31.25 ms frame length, yielding a frequency resolution of 32 Hz. Audio data were completely annotated for the presence of killer whale pulsed calls, buzzes, and whistles. Data were incompletely annotated for the presence of killer whale echolocation clicks, humpback whales, and dolphins. All dolphin annotations in this dataset were ‘Lb|Lo’, indicating uncertainty between northern right whale dolphin (Lissodelphis borealis) and Pacific white-sided dolphin (Lagenorhynchus obliquidens), whose calls are not acoustically differentiable. All dolphin annotations are labeled as ‘UndBio’ in the ClassSpecies column of the combined Annotations.csv file.
The ‘Call.Type’ column in the original data contains information about the call type category. For killer whales this includes clicks (‘CK’), whistles (‘W’), and pulsed calls (‘BP’). For humpback whales this includes song (‘S’) and non-song (‘NS’) communication. Note that call type here refers to the category of call rather than the catalogue of SRKW calls. No uncertain killer whale or other marine mammal calls were included in these annotations.
Orcasound
Orcasound is a cooperative hydrophone network and an open-source software and hardware project. Orcasound audio and annotations were compiled from multiple recording efforts spanning from 2017 to 2020, from low-cost hydrophones. This public dataset includes nine labeling efforts with the ‘Pod.Cast’ annotation tool, an open-source web app developed by Microsoft Hackathon volunteers to efficiently analyze audio data to detect the presence of killer whale calls. Original audio recordings and annotations are accessible via Orcasound’s open labeled data bucket. The dataset is organized into annotation rounds that used audio data from various Orcasound locations with a range of signal-to-noise ratios for SRKW calls and background noise characteristics. Full details of Orcasound data are available on the GitHub repositories for these projects.
Deployment
The Orcasound data were gathered from three shallow (<10 m at low tide) sites in Washington State, USA. The Orcasound Lab on San Juan Island tested a wide variety of hydrophone elements, including HTI 99-MIN, Aquarian AS-1, and ITC1032 models, between September 27th, 2017 and September 7th, 2020. The Bush Point site on Whidbey Island deployed a single CRT26-08 between September 7th, 2020 and October 19th, 2020. At Port Townsend, LabCore-40 or CRT26-08 elements were deployed for one month starting September 8th, 2020. All hydrophones were deployed using bespoke, affordable live-streaming equipment (Raspberry Pi with the Pisound ADC HAT [24-bit resolution, stereo, at multiple sampling rates; max 192 kHz]) and open-source code that generates compressed, lossy audio segments in HLS format and uploads them to an open S3 bucket sponsored by Amazon. Hydrophones and recording systems for these projects have not been calibrated.
Processing
Audio data were collected and processed in a variety of formats. Data from Orcasound locations with an ‘OS’ or ‘1562344334’ prefix were sampled at 20 kHz and a 9 kHz lowpass filter was applied. Files from the Orcasound Lab with an ‘rpi’ prefix were recorded in stereo and sampled at 48 kHz with a steep 16.5 kHz lowpass filter. The Bush Point and Port Townsend data were similarly recorded at 48 kHz with the same 16.5 kHz lowpass filter, although the Port Townsend data were recorded in stereo.
Annotation
Candidate periods of SRKW activity were initially flagged by citizen scientists using live-streamed audio, then reviewed and annotated by expert analysts. The dataset consists of nine ‘Rounds’ from the Orcasound Pod.Cast project, where SRKW presence was pre-labeled using a high-recall classifier and crowd-sourced validation. Annotations are ‘SRKW’ (with start/end times) or ‘non-SRKW’ (labelled ‘Abiotic,’ without time/frequency boundaries). Non-SRKW labels may include ship noise or other sounds but were not further validated. Users seeking to include these data in anthropogenic sound detectors should therefore validate these data further or exclude them from the final dataset. FFT parameters used to create bounding boxes were not retained as they varied across deployments. No uncertain killer whale calls were included in this dataset. Orcasound files are strongly annotated for killer whale pulsed calls and weakly annotated for other signals.
Scripps Institution of Oceanography (SIO)
Audio data were collected at two locations off the coast of Washington State as part of a long-term monitoring project between 2008 and 2012. Recordings were made using high-frequency acoustic recording packages (HARPs), autonomous underwater systems designed for long-term passive acoustic monitoring57. These data consist of encounters included in previously published work25,58.
Deployment
Two HARPs were deployed: one in 100 m depth nearshore (Cape Elizabeth) between June 17th, 2008 and January 17th, 2012, and one in 1400 m depth at an offshore (Quinault Canyon) location between January 27th, 2011 and June 30th, 2013. The HARPs sampled continuously at 200 kHz with 16-bit resolution. Data from this project represent the most southerly locations as well as the deepest deployment.
Processing
Original pulsed annotations described in Rice et al.25 were identified using Triton software click detection algorithms. Files containing known killer whale calls were assigned to ecotype based on distinct tonal signals associated with known pods off Washington State.
Annotation
Files containing killer whale calls were completely annotated for the presence of pulsed calls, and weakly annotated for whistles. Humpback whale calls were added opportunistically, and examples of self-noise, tagged as abiotic signals, were included as these signals show structural similarities to biological signals. All annotations were created using Raven Pro v1.6 (FFT: 19.2 ms, 22.4 Hz). Only calls that could be confidently identified as killer whales or humpback whales are included in the annotation files. Killer whale ecotype classes were based on the original encounter labels58. Though present in the encounters, echolocation clicks were not labeled during the annotation effort.
Saturna Island Marine Research and Education Society (SIMRES)
SIMRES maintains hydrophones within Boundary Pass as part of the BC Hydrophone Network. This network collaborates to enable quantification and monitoring of the ocean soundscape within SRKW habitat. Data were collected from two hydrophones located near the eastern peninsula of Saturna Island (East Point). Data and annotations represent periods when SRKW were both acoustically and visually detected within a few kilometers of the hydrophones.
Deployment
Acoustic recordings were made with an Ocean Sonics icListen high-frequency smart hydrophone (RB9-ETH) cabled to a shore station. The hydrophone is deployed near a commercial shipping lane in Boundary Pass and is approximately 120 m from shore at 18 m depth. Recordings were made June through October 2022. Data are continuously sampled at 128 kHz with 24-bit resolution but were decimated to 64 kHz in the files provided.
Processing
Audio files were decimated to 55 kHz and a low-pass filter was applied, reducing apparent sound intensities above 50 kHz. Audio data were not pre-processed with any detection algorithms for this study.
Annotation
SRKW communication signals were annotated in Raven Pro v 1.6 with a 2048-sample FFT length and 50% overlap, yielding a time and frequency resolution of 8 ms and 62.5 Hz. Bounding boxes demarcated the start and end time of the signal as well as the low and high frequency boundaries. Data were strongly annotated for the presence of killer whale pulsed calls, whistles, buzzes, and rasps. When possible, pulsed calls were further classified into the specific call types44. Data were weakly annotated for clicks, with a single example marked in each audio file where present. Annotated signals were assigned a confidence rating of either ‘low’, ‘medium’, or ‘high’ to specify the level of certainty provided by the annotator. All killer whale annotations were included in the combined annotation dataset regardless of quality. Annotations with ‘low’ confidence scores were labeled as ‘Uncertain’ or 0 in the KW_certain column of the combined Annotations.csv.
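The relationship between FFT length, overlap, and the reported resolutions can be checked with a short calculation. The sketch below is illustrative only (the function name is ours, not part of Raven Pro); it takes the time resolution to be the hop between successive frames and the frequency resolution to be the FFT bin spacing. Under those assumptions, the 8 ms and 62.5 Hz values quoted above are reproduced when the native 128 kHz sampling rate is used.

```python
def spectrogram_resolution(sample_rate_hz, fft_length, overlap_fraction):
    """Return (time_step_s, frequency_bin_hz) for a short-time Fourier transform.

    time step        = hop size / sample rate, with hop = fft_length * (1 - overlap)
    frequency spacing = sample rate / fft_length
    """
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")
    hop_samples = fft_length * (1 - overlap_fraction)
    time_step_s = hop_samples / sample_rate_hz
    frequency_bin_hz = sample_rate_hz / fft_length
    return time_step_s, frequency_bin_hz


# 2048-sample FFT with 50% overlap at a 128 kHz sampling rate.
dt, df = spectrogram_resolution(128_000, 2048, 0.5)
print(f"{dt * 1000:.1f} ms, {df:.1f} Hz")  # prints "8.0 ms, 62.5 Hz"
```

The same function can be applied to the settings reported by other providers to compare spectrogram resolutions across recordings with different sampling rates.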
Fisheries & Oceans Canada (DFO)
Two groups within Fisheries and Oceans Canada (DFO) provided datasets to the DCLDE challenge: the Cetacean Research Program and the Whale Detection and Localization Program. Data processing methods were consistent across projects within each lab but varied slightly between labs. Exact hydrophone locations are not publicly available for any DFO hydrophone dataset. Instead, general location descriptors are provided (Fig. 1). The focus of the original analysis effort that resulted in these datasets was simply to identify which of the recording files contained killer whale calls for use in various habitat studies. The two DFO datasets are discussed below. Data from the DFO providers represent the only Northern Resident Killer Whale annotations and recordings in this dataset.
DFO Cetacean Research Program (DFO CRP)
Data from the Cetacean Research Program (DFO CRP) lab came from two deployments: an AURAL-M2 deployed on the continental shelf edge off the west coast of Vancouver Island, and a Sound Metrics SM2M hydrophone deployed off northern coastal British Columbia. Data include 375 days between May 18th, 2011 and May 24th, 2012 for west Vancouver Island, and 116 days between October 18th, 2013 and February 3rd, 2014 for northern coastal BC. Both hydrophones sampled at 16.384 kHz.
Deployment
Data were collected using two different hydrophones: an AURAL-M2 was moored at 114 m depth off the west coast of Vancouver Island and sampled audio at 16.4 kHz; an SM2M was moored at 35 m depth off the northern mainland coast of BC and sampled audio at 16 kHz with 16-bit resolution. Exact locations were not made available for this competition.
Processing
The raw audio recordings (.wav) were processed using the whistle and moan detector28 in PAMGuard version 1.12.08 with an FFT length of 512 samples and 50% overlap (256 samples). The detector was configured with a high-pass filter of 800 Hz to limit the number of humpback whale detections and lessen the manual validation burden. The signal-to-noise detection threshold was set to 6 dB. All detections in the first two seconds of each five-minute file were excluded because the detection algorithm produced false detections within this period.
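The exclusion of start-of-file detections can be sketched as a simple filter. This is a minimal illustration, not the code used by the data provider: the `Detection` class and its field names are hypothetical, and it assumes that a detection counts as falling "in the first two seconds" when its start time is earlier than 2 s into the file.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    file_name: str   # hypothetical field: source audio file
    start_s: float   # detection start, in seconds from the start of the file
    end_s: float     # detection end, in seconds from the start of the file


def drop_startup_detections(detections, guard_s=2.0):
    """Keep only detections that start at or after guard_s seconds into the file."""
    return [d for d in detections if d.start_s >= guard_s]


detections = [
    Detection("file_a.wav", 0.4, 0.9),     # inside the guard period, dropped
    Detection("file_a.wav", 1.9, 2.6),     # starts inside the guard period, dropped
    Detection("file_a.wav", 35.2, 36.0),   # kept
]
kept = drop_startup_detections(detections)
print(len(kept))
```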
Annotation
All detections, including whistles and pulsed calls, were aurally and visually reviewed by expert annotators using PAMGuard and identified to species (for biotic sounds) or sound type (for abiotic sounds) using the same spectrogram settings as the whistle and moan detector. Where applicable and as time allowed, detections were identified to species or ecotype level depending on the clarity of the call or acoustic encounter. In post-processing, overlapping detection boundaries (start time, end time, low frequency, high frequency) were merged based on timing and annotation labels. The original annotation files representing un-merged detections are provided in the ‘original’ subfolder in the DFO_CRP Annotations folder. The PAMGuard Whistle and Moan detector detects individual contours, so the harmonics or sidebands of a single call may each constitute a separate detection if they meet the detector’s criteria, which happens frequently. Thus, not every detection represents a unique vocalization.
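The merging of overlapping detections can be illustrated with a short sketch. The exact merging rule is not specified here, so the code below makes an assumption: detections with the same label whose time ranges overlap are combined into one bounding box spanning the union of their time and frequency limits. The `Box` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    start_s: float
    end_s: float
    low_hz: float
    high_hz: float
    label: str


def merge_overlapping(boxes):
    """Merge same-label boxes that overlap in time into one enclosing box."""
    merged = []
    # Sorting by label, then start time, lets a single sweep find overlaps.
    for box in sorted(boxes, key=lambda b: (b.label, b.start_s)):
        last = merged[-1] if merged else None
        if last is not None and last.label == box.label and box.start_s <= last.end_s:
            # Extend the previous box to enclose this one.
            last.end_s = max(last.end_s, box.end_s)
            last.low_hz = min(last.low_hz, box.low_hz)
            last.high_hz = max(last.high_hz, box.high_hz)
        else:
            # Copy so the caller's objects are not modified.
            merged.append(Box(box.start_s, box.end_s, box.low_hz, box.high_hz, box.label))
    return merged


boxes = [
    Box(1.0, 1.8, 1000.0, 1500.0, "KW"),   # three contours of one call
    Box(1.1, 1.9, 2000.0, 3000.0, "KW"),
    Box(1.2, 1.7, 3000.0, 4500.0, "KW"),
    Box(5.0, 5.5, 1200.0, 2400.0, "KW"),   # a separate, later call
    Box(1.3, 1.6, 300.0, 800.0, "HW"),     # different label, never merged with KW
]
merged = merge_overlapping(boxes)
print(len(merged))
```

Merging in this way reduces duplicated annotations but discards the separation between individual contours, which is why the un-merged files remain useful.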
DFO Whale Detection and Localization Program (DFO WDLP)
DFO’s Whale Detection and Localization Program (DFO WDLP) worked in collaboration with DFO’s Acoustics Program to provide data from four deployment locations in Canadian waters. These included Carmanah Point, Swanson Channel, and two locations in the Strait of Georgia (SOG North and SOG South where north and south are in relation to each other).
Deployment
Four locations were chosen for the study area: Carmanah Point, Swanson Channel, and two sites in the southern region of the Canadian waters of the Strait of Georgia (SOG North and SOG South). As with all DFO data, the exact locations are not publicly available. A SoundTrap ST600 HF was used at Carmanah Point, while AMAR G4 hydrophones were used at the three other locations. There was one deployment at each of Carmanah Point and Swanson Channel, each lasting between 3 and 5 months, from September 2021 through June 2022. There were two deployments at each of SOG North and SOG South, each lasting between 1 day and 4 weeks. Audio data were continuously sampled at either 192 kHz for the SoundTrap or 256 kHz for the AMARs. AMARs had 24-bit resolution.
Processing
Audio recordings were processed with the Whistle and Moan Detector28 in PAMGuard version 2.02.03 for the presence of potential killer whale calls. Audio files were decimated within PAMGuard to 48 kHz, and a weak IIR Butterworth high-pass filter with a cutoff of 2 kHz and an order of 1 was applied to reduce background noise in the lower frequency bands. The SNR detection threshold was set to 8 dB. Nominal sensitivities of −164.1 dB and −176.2 dB were used for the AMARs and SoundTrap, respectively. The Whistle and Moan Detector used a minimum frequency threshold of 200 Hz, a maximum frequency threshold of 24 kHz (the Nyquist frequency), and a minimum contour length of 15 time-slices (about 341 ms); all other detection settings were kept at their defaults. In the detector’s noise and thresholding tab, all boxes on the PAMGuard dashboard were selected except “Run Gaussian Kernel Smoothing”, and all input values were kept at their defaults. The FFT engine used with the detector had an FFT length of 2048, a hop size of 1024, and a Hann window function, with the same noise parameters as those used with the detector.
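The relationship between these settings can be checked with a few lines of arithmetic. The sketch below is illustrative only and the function name is hypothetical. It assumes that the quoted duration of about 341 ms measures the span from the start of the first FFT window to the end of the last; counting hops alone (15 × 1024 samples) would give 320 ms instead.

```python
def contour_span_ms(n_slices, fft_length, hop, sample_rate_hz):
    """Duration covered by n_slices consecutive FFT windows, in milliseconds."""
    # (n_slices - 1) hops separate the first and last window starts,
    # and the last window adds one full FFT length.
    span_samples = (n_slices - 1) * hop + fft_length
    return 1000.0 * span_samples / sample_rate_hz


decimated_rate_hz = 48_000
nyquist_hz = decimated_rate_hz / 2          # upper frequency limit after decimation
span_ms = contour_span_ms(15, 2048, 1024, decimated_rate_hz)
hops_only_ms = 1000.0 * 15 * 1024 / decimated_rate_hz

print(nyquist_hz, round(span_ms), round(hops_only_ms))
```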
Annotation
All detections produced by the Whistle and Moan Detector were evaluated for the presence of killer whales and annotated as such with a custom PAMGuard plugin, using the same spectrogram settings as the whistle and moan detector. Detected sounds included whistles and pulsed calls; echolocation clicks were not included as they typically do not trigger the detector due to their short length. As with the Cetacean Research Program’s dataset, detections were merged in post-processing to reduce the number of duplicated annotations. The original annotation files representing un-merged detections are provided in the ‘original’ subfolder in the DFO_WLDP Annotations folder. Killer whale calls not detected by the PAMGuard whistle and moan detector were not added to the annotations. Therefore, while these data likely contain the majority of the visually or aurally identifiable calls, the annotation labels are considered weak for all species. No effort was made to annotate killer whale clicks.
University of Alaska Fairbanks and North Gulf Oceanic Society (UAF)
Data contributed by the University of Alaska Fairbanks and North Gulf Oceanic Society are part of a long-term killer whale monitoring project in the Gulf of Alaska (Fig. 2). This includes recordings of the Southern Alaska Resident, Gulf of Alaska Transients, AT1 Transients, and Offshore killer whales from both stationary moorings and focal follows23. Transient and Offshore killer whales were rarely encountered during vessel surveys, and Transient killer whales vocalize less often than Residents59,60, making field recordings of these ecotypes difficult to obtain. We therefore also contributed killer whale recordings from moored hydrophones in the region, on which we detected Gulf of Alaska Transients, AT1 Transients, or Offshore killer whales. The metadata folder associated with these data contains three files. The Myers_DCLDE_2026_files.xls file was used to relate filenames, ecotypes, and locations in the original annotation files to the final annotations. It contains five headings: Filename, Ecotype, Population (i.e. community), Location, and recording time in UTC. Filename refers to SoundTrap audio file names containing the start time; UTC is the corrected start time. Location values are abbreviations for Hinchinbrook Entrance (HE), Kachemak Bay (KE), Montague Strait (MS), and Resurrection Bay (RS). These represent fixed hydrophone locations. Location values for the focal follows are labeled ‘field’ in the location column. Audio files are organized according to the instrument name, or ‘field’ for field recordings. Metadata for fixed instrument locations are contained in an external file (Hydrophone locations.xls). Information pertaining to the field recordings is in the attached report “20120114-N_Matkin_FY20_Annual_Report.pdf”. GPS tracks from the focal follows are not provided.
Deployment
Focal follows
Recordings of Southern Alaska Residents were taken with a dipping hydrophone during vessel survey encounters in Prince William Sound and Kenai Fjords (Fig. 1) between May and October in 2019, 2020, and 2021. When killer whales were encountered, we photographically identified as many of the individuals present as possible. We then maneuvered the vessel approximately 500 m in front of the animals, shut off the engine, and collected a field recording. Recordings before June 16th, 2021 were made with a High-Tech, Inc. HTI-96-Min hydrophone deployed at approximately 8 to 10 m depth with a two-channel TASCAM DR100 portable digital recorder (sampling rate 24 kHz). Only the first channel was used. Recordings after June 16th, 2021 were made with an Ocean Instruments SoundTrap ST300 hydrophone (sampling rate 24 kHz, 16-bit resolution) deployed at 20–30 m depth (Table 1).
Moored hydrophones
Moored hydrophones were deployed in Hinchinbrook Entrance, Kachemak Bay, Montague Strait, and Resurrection Bay (Table 1). All moored hydrophones were Ocean Instruments SoundTrap ST300s, except for the hydrophone in Montague Strait in 2023, which was a model ST600. Hydrophones were deployed at depths of 25–42 m on primarily gravel and sand substrate and moored approximately 2 m above the seafloor. Hinchinbrook Entrance included one 2-week deployment. Kachemak Bay included short (1-day) and long (6-month) deployments between 2020 and 2022. Montague Strait included three deployments lasting between 1 day and 8.5 months between 2019 and 2023. Resurrection Bay had a single 3-month deployment between November 21st, 2019 and February 23rd, 2020. All moored hydrophones recorded at a 24 kHz sampling rate with 16-bit resolution and were duty cycled (primarily 5 min on, 10 min off) based on battery requirements.
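Duty cycling means that the amount of audio recorded is well below the elapsed deployment time. As a rough illustration (a sketch only, with a hypothetical function name), the code below estimates recorded hours for the Resurrection Bay deployment, assuming the 5 min on, 10 min off schedule applied throughout. Because the schedule is described as "primarily" this pattern, the result is an approximation rather than the actual recording effort.

```python
from datetime import date


def recorded_hours(start, end, on_min=5, off_min=10):
    """Approximate hours of audio for a duty-cycled deployment between two dates."""
    elapsed_hours = (end - start).days * 24
    # Fraction of each cycle spent recording: on / (on + off).
    return elapsed_hours * on_min / (on_min + off_min)


# Resurrection Bay deployment dates as given in the text.
start, end = date(2019, 11, 21), date(2020, 2, 23)
elapsed_days = (end - start).days
print(elapsed_days, recorded_hours(start, end))
```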
Processing
All acoustic data from moored hydrophones were processed using the whistle and moan detector in the open-source software package PAMGuard55 v.1.15.17. Spectrograms were created with a 1024-sample FFT with 50% overlap. The Whistle and Moan detector identified tonal signals in the 700 to 12,000 Hz frequency range that had a minimum length of 15 time slices and a minimum size of 30 pixels, and that met an 8 dB signal-to-noise ratio threshold.
Annotation
In recordings with killer whales, discrete pulsed calls were manually annotated by an expert acoustician in Raven Pro v.1.6.5 using spectrogram settings similar to those of the PAMGuard analysis. Bounding boxes were drawn around each call, noting the call start time, end time, low frequency, high frequency, and call length in selection tables for each audio file. Spectrogram settings were varied to allow for increased resolution across the annotation process. Recordings with at least three PAMGuard detections were visually and aurally checked by an expert acoustician and classified to the ecotype and/or community level. Gulf of Alaska Transients and AT1 Transients were identified using published call catalogues60. Offshore killer whale detections were confirmed by J. Pilkington. A minority of recordings were excluded if they included multiple killer whale ecotypes or both killer whale and humpback whale vocalizations in the same recording. Annotations for killer whale pulsed calls are strong; annotations for other signals should be considered weak.
Technical Validation
All annotations were created by expert analysts at their respective institutions based on a canonical catalogue of killer whale calls44 and experience. As with all biological signals, the sound quality of the killer whale vocalizations and the certainty of the classification vary considerably based on background noise, distance between the animal and the hydrophone, and propagation conditions. Furthermore, although manual validation of audio files represents the ‘gold standard’ in bioacoustics, inter-observer variation is common61.
Annotations are comprehensive but are not intended to be exhaustive. In no project was there a concerted effort to label every potential call and signal type in each audio file.
This dataset was collated for the purpose of building killer whale detection and ecotype classification algorithms. In using it, users should consider both their intended applications and potential limitations. For instance, users will immediately note that sample rates and filter settings differ considerably between contributed datasets. Much of the effort in classifying killer whale ecotypes, communities, and clans has utilized lower frequency sound (< 12 kHz)26. However, as seen in this dataset, killer whale vocalizations may have fundamental frequencies at or above 20 kHz. Whether or not the features present at higher frequencies represent useful information for discrimination is yet to be determined. Standardizing the data by decimating higher frequency audio will likely result in the loss of clicks and harmonic structure that may be informative to classification algorithms. Conversely, standardizing the data by excluding audio files sampled at lower rates will greatly restrict the available data, rendering it less ecologically representative.
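The trade-off between decimating and excluding data can be made concrete with a small sketch. The code below is illustrative only: the dictionary keys are hypothetical labels rather than folder names, the rates are the nominal values quoted in the provider descriptions above (16.384 kHz is used for DFO CRP), and the rates of the distributed files should be confirmed from the file headers before any real analysis.

```python
# Nominal sample rates (Hz) quoted in the provider descriptions above.
# Assumption: keys are hypothetical labels, not names used in the dataset.
NOMINAL_RATES_HZ = {
    "SIMRES": 64_000,
    "DFO_CRP": 16_384,
    "DFO_WDLP_SoundTrap": 192_000,
    "DFO_WDLP_AMAR": 256_000,
    "UAF": 24_000,
}


def can_represent(sample_rate_hz, frequency_hz):
    """True if frequency_hz lies below the Nyquist frequency (half the sample rate)."""
    return frequency_hz < sample_rate_hz / 2


def standardization_plan(rates_hz, target_rate_hz):
    """Group providers by what a common target sample rate would require."""
    plan = {"decimate": [], "keep": [], "exclude": []}
    for name, rate in sorted(rates_hz.items()):
        if rate > target_rate_hz:
            plan["decimate"].append(name)   # higher-frequency content is lost
        elif rate == target_rate_hz:
            plan["keep"].append(name)
        else:
            plan["exclude"].append(name)    # cannot be brought up to the target
    return plan


# Which providers could represent a 20 kHz fundamental at all?
covers_20k = sorted(n for n, r in NOMINAL_RATES_HZ.items() if can_represent(r, 20_000))
print(covers_20k)
print(standardization_plan(NOMINAL_RATES_HZ, 24_000))
```

Under these assumed rates, only the higher-rate recordings could contain a 20 kHz fundamental, while a 24 kHz target would require decimating them and would still exclude the lowest-rate data.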
No efforts have been made to normalize audio files across the providers to account for different gain and calibration settings between the various instruments and individual project goals. Where these data are available, they are provided in the ‘meta’ folder for each provider.
The data presented here represent a variety of annotation levels across projects. Efforts have been made to produce a collated dataset representing the lowest common denominator of these categories (ecotype). However, all killer whales produce a variety of calls and only a portion of those calls are known to be indicative of the different ecotypes or pods43,44. The HALLO annotation protocol asked experts to identify both the pod (SRKW) and stereotyped call type32. Therefore, these annotations may be of use in appropriately augmenting datasets to balance not only ecotypes but stereotyped calls. Similarly, echolocation clicks in the sound files have not been consistently annotated but are included in the collated annotation csv file. As echolocation clicks can be diagnostic of species and potentially ecotype58, further annotation of this dataset could feed into training or validation based on echolocation characteristics.
Data for this project represent a large collaboration of groups and institutions, and each dataset was processed in accordance with each group’s project goals, which ranged from species presence/absence to call-type production. Post-processing of the annotations was done to provide a uniform resource for machine learning algorithms. However, users should consider details from each deployment carefully to determine whether they wish to do any additional post-processing. For example, multiple annotations from the DFO datasets may represent different harmonics of the same call. Alternatively, data derived from ONC projects considered only pulsed calls. Thus, unannotated whistles and echolocation clicks may be present in some files. See individual datasets above for details.
Users of these data should also note that all annotations were done by independent experts. As with all bioacoustics annotations, inter-observer variation will be present62,63,64,65,66,67.
Code availability
The R code used to collate data and annotations is available here: https://doi.org/10.5281/zenodo.15743033.
References
Barrett-Lennard, L. G. & Ellis, G. M. Population Structure and Genetic Variability in Northeastern Pacific Killer Whales: Towards an Assessment of Population Viability. Can. Sci. Advis. Secretariat, Research Document 2001/065 (2001).
Morin, P. A. et al. Revised taxonomy of eastern North Pacific killer whales (Orcinus orca): Bigg’s and resident ecotypes deserve species status. R. Soc. Open Sci. 11, 231368 (2024).
Baird, R. W. & Stacey, P. J. Variation in saddle patch pigmentation in populations of killer whales (Orcinus orca) from British Columbia, Alaska, and Washington State. Can. J. Zool. 66, 2582–2585 (1988).
Balcomb, K. C. III & Bigg, M. A. Population biology of the three resident killer whale pods in Puget Sound and off southern Vancouver Island. Behav. Biol. Kill. Whales Alan R Liss N. Y. N. Y. 85–95 (1986).
Ford, J. K. et al. Dietary specialization in two sympatric populations of killer whales (Orcinus orca) in coastal British Columbia and adjacent waters. Can. J. Zool. 76, 1456–1471 (1998).
Ford, J. K. B. & Ellis, G. M. You Are What You Eat: Foraging Specializations and Their Influence on the Social Organization and Behavior of Killer Whales. in Primates and Cetaceans: Field Research and Conservation of Complex Mammalian Societies (eds. Yamagiwa, J. & Karczmarski, L.) 75–98 https://doi.org/10.1007/978-4-431-54523-1_4 (Springer Japan, Tokyo, 2014).
Whitehead, H. & Ford, J. K. B. Consequences of culturally-driven ecological specialization: Killer whales and beyond. J. Theor. Biol. 456, 279–294 (2018).
Ford, J. K. B. Killer Whales: Behavior, Social Organization, and Ecology of the Oceans’ Apex Predators. in Ethology and Behavioral Ecology of Odontocetes (ed. Würsig, B.) 239–259. https://doi.org/10.1007/978-3-030-16663-2_11 (Springer International Publishing, Cham, 2019).
Barrett-Lennard, L. G. Population structure and mating patterns of Killer Whales (Orcinus orca) as revealed by DNA analysis. https://doi.org/10.14288/1.0099652 (University of British Columbia, 2000).
Sharpe, D. L., Castellote, M., Wade, P. R. & Cornick, L. A. Call types of Bigg’s killer whales (Orcinus orca) in western Alaska: using vocal dialects to assess population structure. Bioacoustics 28, 74–99 (2019).
Ford, J. K. B., Stredulinsky, E. H., Ellis, G. M., Durban, J. W. & Pilkington, J. F. Offshore Killer Whales in Canadian Pacific Waters: Distribution, Seasonality, Foraging Ecology, Population Status and Potential for Recovery. DFO Can. Sci. Advis. Sec. Res. Doc. 2014/088. vii+55 p. https://publications.gc.ca/collections/collection_2014/mpo-dfo/Fs70-5-2014-088-eng.pdf (2014).
Fisheries and Oceans Canada. Action Plan for the Northern and Southern Resident Killer Whale (Orcinus Orca) in Canada. Species at Risk Act Action Plan Series. Fisheries and Oceans Canada, Ottawa. v + 33 pp. https://www.registrelep-sararegistry.gc.ca/virtual_sara/files/plans/Ap-ResidentKillerWhale-v00-2017Mar-Eng.pdf (2017).
NOAA. Alaska Marine Mammal Stock Assessments, 2022. KILLER WHALE (Orcinus Orca): Eastern North Pacific Alaska Resident Stock. (2023).
Lacy, R. C. et al. Evaluating anthropogenic threats to endangered killer whales to inform effective recovery plans. Sci. Rep. 7, 14119 (2017).
Burnham, R. E. & Vagle, S. Interference of Communication and Echolocation of Southern Resident Killer Whales. in The Effects of Noise on Aquatic Life: Principles and Practical Considerations (eds. Popper, A. N., Sisneros, J., Hawkins, A. D. & Thomsen, F.) 1–14. https://doi.org/10.1007/978-3-031-10417-6_22-1 (Springer International Publishing, Cham, 2023).
Stewart, J. D. et al. Traditional summer habitat use by Southern Resident killer whales in the Salish Sea is linked to Fraser River Chinook salmon returns. Mar. Mammal Sci. 39, 858–875 (2023).
Veirs, S., Veirs, V. & Wood, J. D. Ship noise extends to frequencies used for echolocation by endangered killer whales. PeerJ 4, e1657 (2016).
Williams, R. et al. Warning sign of an accelerating decline in critically endangered killer whales (Orcinus orca). Commun. Earth Environ. 5, 1–9 (2024).
Joy, R. et al. Potential Benefits of Vessel Slowdowns on Endangered Southern Resident Killer Whales. Front. Mar. Sci. 6 (2019).
Thornton, S. J. et al. Areas of Elevated Risk for Vessel-Related Physical and Acoustic Impacts in Southern Resident Killer Whale (Orcinus Orca) Critical Habitat. DFO Can. Sci. Advis. Sec. Res. Doc. 2022/058, vi+48 p. (2022).
Ford, J. K. B. et al. Habitats of Special Importance to Resident Killer Whales (Orcinus orca) off the West Coast of Canada. DFO Can. Sci. Advis. Sec. Res. Doc. 2017/035, viii+57 p. (2017).
Burnham, R. E., Palm, R. S., Duffus, D. A., Mouy, X. & Riera, A. The combined use of visual and acoustic data collection techniques for winter killer whale (Orcinus orca) observations. Glob. Ecol. Conserv. 8, 24–30 (2016).
Myers, H. J., Olsen, D. W., Matkin, C. O., Horstmann, L. A. & Konar, B. Passive acoustic monitoring of killer whales (Orcinus orca) reveals year-round distribution and residency patterns in the Gulf of Alaska. Sci. Rep. 11, 20284 (2021).
Pilkington, J. F. et al. Patterns of winter occurrence of three sympatric killer whale populations off eastern Vancouver Island, Canada, based on passive acoustic monitoring. Front. Mar. Sci. 10 (2023).
Rice, A. et al. Spatial and temporal occurrence of killer whale ecotypes off the outer coast of Washington State, USA. Mar. Ecol. Prog. Ser. 572, 255–268 (2017).
Riera, A., Pilkington, J. F., Ford, J. K. B., Stredulinsky, E. H. & Chapman, N. R. Passive acoustic monitoring off Vancouver Island reveals extensive use by at-risk Resident killer whale (Orcinus orca) populations. Endanger. Species Res. 39, 221–234 (2019).
Trounce, K. et al. The effects of vessel slowdowns on foraging habitat of the southern resident killer whales. Proc. Meet. Acoust. 37, 070009 (2020).
Gillespie, D., Caillat, M., Gordon, J. & White, P. Automatic detection and classification of odontocete whistles. J. Acoust. Soc. Am. 134, 2427–2437 (2013).
Helble, T. A., Ierley, G. R., D’Spain, G. L., Roch, M. A. & Hildebrand, J. A. A generalized power-law detection algorithm for humpback whale vocalizations. J. Acoust. Soc. Am. 131, 2682–2699 (2012).
Bergler, C. et al. ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning. Sci. Rep. 9, 10997 (2019).
Kirsebom, O. S. et al. MERIDIAN open-source software for deep learning-based acoustic data analysis. J. Acoust. Soc. Am. 151, A27 (2022).
Ford, J. K. A Catalogue of Underwater Calls Produced by Killer Whales (Orcinus Orca) in British Columbia. DFO Can. Data Rep. Fish. Aquat. Sci. 633:165 p (1987).
Janik, V. M. Chapter 4 Acoustic Communication in Delphinids. in Advances in the Study of Behavior vol. 40 123–157 (Academic Press, 2009).
Shiu, Y. et al. Deep neural networks for automated detection of marine mammal species. Sci. Rep. 10, 607 (2020).
Au, W. W. L., Ford, J. K. B., Horne, J. K. & Allman, K. A. N. Echolocation signals of free-ranging killer whales (Orcinus orca) and modeling of foraging for chinook salmon (Oncorhynchus tshawytscha). J. Acoust. Soc. Am. 115, 901–909 (2004).
Barrett-Lennard, L. G., Ford, J. K. B. & Heise, K. A. The mixed blessing of echolocation: differences in sonar use by fish-eating and mammal-eating killer whales. Anim. Behav. 51, 553–565 (1996).
Aguilar de Soto, N. et al. No shallow talk: Cryptic strategy in the vocal communication of Blainville’s beaked whales. Mar. Mammal Sci. 28, E75–E92 (2012).
Riesch, R., Ford, J. K. B. & Thomsen, F. Whistle sequences in wild killer whales (Orcinus orca). J. Acoust. Soc. Am. 124, 1822–1829 (2008).
Souhaut, M. & Shields, M. W. Stereotyped whistles in southern resident killer whales. PeerJ 9, e12085 (2021).
Thomsen, F., Franck, D. & Ford, J. K. B. Characteristics of whistles from the acoustic repertoire of resident killer whales (Orcinus orca) off Vancouver Island, British Columbia. J. Acoust. Soc. Am. 109, 1240–1246 (2001).
Miller, P. J. O. Diversity in sound pressure levels and estimated active space of resident killer whale vocalizations. J. Comp. Physiol. A 192, 449–459 (2006).
Rehn, N., Teichert, S. & Thomsen, F. Structural and temporal emission patterns of variable pulsed calls in free-ranging killer whales (Orcinus orca). Behaviour 1, 307–329 (2007).
Deecke, V. B., Barrett-Lennard, L. G., Spong, P. & Ford, J. K. B. The structure of stereotyped calls reflects kinship and social affiliation in resident killer whales (Orcinus orca). Naturwissenschaften 97, 513–518 (2010).
Ford, J. K. B. Vocal traditions among resident killer whales (Orcinus orca) in coastal waters of British Columbia. Can. J. Zool. 69, 1454–1483 (1991).
Yurk, H., Barrett-Lennard, L., Ford, J. K. B. & Matkin, C. O. Cultural transmission within maternal lineages: vocal clans in resident killer whales in southern Alaska. Anim. Behav. 63, 1103–1119 (2002).
Filatova, O. A. et al. Killer whale call frequency is similar across the oceans, but varies across sympatric ecotypes. J. Acoust. Soc. Am. 138, 251–257 (2015).
Foote, A. D. & Nystuen, J. A. Variation in call pitch among killer whale ecotypes. J. Acoust. Soc. Am. 123, 1747–1752 (2008).
Madrigal, B. C., Crance, J. L., Berchok, C. L. & Stimpert, A. K. Call repertoire and inferred ecotype presence of killer whales (Orcinus orca) recorded in the southeastern Chukchi Sea. J. Acoust. Soc. Am. 150, 145–158 (2021).
Gudivada, V. N., Apon, A. & Ding, J. Data Quality Considerations for Big Data and Machine Learning: Going Beyond Data Cleaning and Transformations. (2017).
Priestley, M., O’Donnell, F. & Simperl, E. A Survey of Data Quality Requirements That Matter in ML Development Pipelines. J. Data Inf. Qual. 15, 11:1–11:39 (2023).
Brookes, K. L., Bailey, H. & Thompson, P. M. Predictions from harbor porpoise habitat association models are confirmed by long-term passive acoustic monitoring. J. Acoust. Soc. Am. 134, 2523–2533 (2013).
Kotila, M. et al. Large-scale long-term passive-acoustic monitoring reveals spatio-temporal activity patterns of boreal bats. Ecography 2023, e06617 (2023).
Parijs, S. M. V. et al. Management and research applications of real-time and archival passive acoustic sensors over varying temporal and spatial scales. Mar. Ecol. Prog. Ser. 395, 21–36 (2009).
Palmer, K. & Joy, R. DCLDE 2026: Killer whale (Orcinus orca) ecotype and other species annotations for the Detection Classification Localization and Density Estimate (DCLDE) conference in 2026. NOAA National Centers for Environmental Information https://doi.org/10.25921/15EY-MH50 (2025).
Gillespie, D. et al. PAMGUARD: Semiautomated, open source software for real‐time acoustic detection and localization of cetaceans. J. Acoust. Soc. Am. 125, 2547 (2009).
Biffard, B., Morgan, M., Muzi, L., Dakin, T. & Buren, P. V. An Integrated Hydrophone Calibration System for Ocean Observing: ONC HydroCal. in OCEANS 2022, Hampton Roads 1–5, https://doi.org/10.1109/OCEANS47191.2022.9976955 (2022).
Wiggins, S. M. & Hildebrand, J. A. High-frequency Acoustic Recording Package (HARP) for broad-band, long-term marine mammal monitoring. in 2007 symposium on underwater technology and workshop on scientific use of submarine cables and related technologies 551–557 (IEEE, 2007).
Leu, A. A., Hildebrand, J. A., Rice, A., Baumann-Pickering, S. & Frasier, K. E. Echolocation click discrimination for three killer whale ecotypes in the Northeastern Pacific. J. Acoust. Soc. Am. 151, 3197–3206 (2022).
Deecke, V. B., Ford, J. K. B. & Slater, P. J. B. The vocal behaviour of mammal-eating killer whales: communicating with costly calls. Anim. Behav. 69, 395–405 (2005).
Saulitis, E. L., Matkin, C. O. & Fay, F. H. Vocal repertoire and acoustic behavior of the isolated AT1 killer whale subpopulation in southern Alaska. Can. J. Zool. 83, 1015–1029 (2005).
Leroy, E. C., Thomisch, K., Royer, J.-Y., Boebel, O. & Van Opzeeland, I. On the reliability of acoustic annotations and automatic detections of Antarctic blue whale calls under different acoustic conditions. J. Acoust. Soc. Am. 144, 740–754 (2018).
Martín-Morató, I. & Mesaros, A. What is the ground truth? Reliability of multi-annotator data for audio tagging. in 2021 29th European Signal Processing Conference (EUSIPCO) 76–80. https://doi.org/10.23919/EUSIPCO54536.2021.9616087 (2021).
Nguyen Hong Duc, P. et al. Assessing inter-annotator agreement from collaborative annotation campaign in marine bioacoustics. Ecol. Inform. 61, 101185 (2021).
van Osta, J. M., Dreis, B., Meyer, E., Grogan, L. F. & Castley, J. G. An active learning framework and assessment of inter-annotator agreement facilitate automated recogniser development for vocalisations of a rare species, the southern black-throated finch (Poephila cincta cincta). Ecol. Inform. 77, 102233 (2023).
Ocean Networks Canada Society. Upper Slope South Hydrophone Deployed 2013-05-11. Ocean Networks Canada Society https://doi.org/10.34943/D644336D-EB3E-4BF0-B2EF-0CDF3D8BD0DB (2013).
Ocean Networks Canada Society. Upper Slope South Hydrophone Deployed 2014-05-07. Ocean Networks Canada Society https://doi.org/10.34943/7BEF925C-DE7E-4E31-80D6-A78C71F9AEC5 (2014).
Ocean Networks Canada Society. Upper Slope South Hydrophone Deployed 2014-05-03. Ocean Networks Canada Society https://doi.org/10.34943/E03FD4FB-3029-4A40-9174-0BD3E4D99276 (2014).
Acknowledgements
Funding has been provided by the Canada Nature Fund for Aquatic Species at Risk, to whom we are grateful. We are also grateful to Tom Denton, who has facilitated storage on Google Cloud, as well as Charles Anderson and Carrie Wall-Bell of the National Centers for Environmental Information, who have helped format these data for long-term storage.
Author information
Contributions
K.J. Palmer collated the final annotations dataset, managed data sharing agreements, produced the collated annotation files, drafted, edited and reviewed the manuscript, and annotated the SCRIPPS dataset for the presence of killer whales and humpback whales. Emma Cummings and Alex Harris were expert annotators for the JASCO and Vancouver Port Authority datasets. Kait Frasier provided data and annotations from Scripps Institution of Oceanography and participated in editing the manuscript. Mike Dowd, Fabio Frazao and Bruno Padovese contributed to the HALLO annotation protocol and participated in data curation throughout the process. April Houweling and Jennifer Wladichuk provided the JASCO data and were expert annotators. Additionally, they participated in writing and editing the manuscript. Jasper Kanes provided data from Ocean Networks Canada and served as expert annotator for these data sets. Oliver S. Kirsebom provided editorial feedback and was involved in the initial inception of the project. Holger Klinck was involved in the project inception and provided financial support and editorial input. Holly Leblond, Lucy Quayle, Caitlin O’Neill, Svein Vagle, and Harald Yurk provided data from the DFO Whale and Dolphin Detection and Localization group. Lucy Quayle served as the expert annotator. Lauren Laturnus served as an expert annotator on the SIMRES data, provided editorial feedback to the manuscript and created Fig. 1. Olivia Murphy served as an expert annotator on the SIMRES data. Hannah Myers, Craig Matkin, and Dan Olsen provided the University of Alaska Fairbanks data. Additionally, Hannah Myers served as expert annotator on the University of Alaska Fairbanks data and provided editorial feedback for the manuscript. J. Pilkington provided data and annotations from the Department of Fisheries and Oceans Cetacean Research Program and participated in writing and editing the manuscript. Amalis Riera Vuibert served as an expert annotator on the SMRU data.
Krista Trounce provided data from the Vancouver Fraser Port Authority and provided editorial feedback on the manuscript. Scott and Val Veirs provided data from Orcasound hydrophones, served as expert annotators and provided critical feedback to the manuscript. Jason Wood provided data from SMRU Consulting and provided editorial feedback on the manuscript. Ruth Joy provided financial support of the project, critical feedback at all levels, and assisted in the writing process.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Palmer, K.J., Cummings, E., Dowd, M.G. et al. A Public Dataset of Annotated Orcinus orca Acoustic Signals for Detection and Ecotype Classification. Sci Data 12, 1137 (2025). https://doi.org/10.1038/s41597-025-05281-5
DOI: https://doi.org/10.1038/s41597-025-05281-5