Abstract
Passive acoustic monitoring is essential for assessing the impact of anthropogenic noise on marine ecosystems and detecting vocalizing marine life. While acoustic event recorders are widely used to record odontocete echolocation due to their low power and memory demands, conventional detection algorithms are often unsuitable for analyzing datasets composed of complex pulse events. Here, we developed a hybrid analytical framework combining a rule-based filter with a random forest model to efficiently detect narrow-ridged finless porpoise (Neophocaena asiaeorientalis) click trains and vessel noise events using data from the pulse event recorder. The rule-based filter effectively reduced noise from raw data, achieving detection accuracy of almost 100% for click trains and 94% for vessel noise. However, among the events detected by this filter, 45% and 81% were actually false positives. The machine learning model improved classification accuracy to 97% and 99%, respectively. This model reduced the high false positive rates to 2.8% and 0.1%. This combined method offers a robust and efficient approach to processing pulse event recorder data, specifically for A-tag. It reduces manual workload, improves detection accuracy, and facilitates rapid assessment of vessel noise impacts, thereby supporting long-term ecological monitoring of small cetacean populations in diverse and noisy marine environments.
Introduction
The underwater environment contains various acoustic sources, commonly categorized as geophony (e.g., earthquakes and rainfall), biophony (e.g., vocalization of whales and fish), and anthrophony (e.g., vessel noise and pile driving sound)1,2. Analysis of underwater soundscape data enables biodiversity assessment and enhances understanding of species occurrence patterns through the detection of acoustic signals from cetaceans and soniferous fish3,4,5,6,7. Acoustic monitoring and analytical techniques are also increasingly used to evaluate the impact of anthropogenic sounds, such as vessel noise, on marine life8,9,10,11,12,13. Therefore, efficient and accurate methods for detecting these acoustic sources and assessing their ecological impacts are needed.
Passive acoustic monitoring is a powerful tool for detecting and measuring underwater acoustic signals5,14. Autonomous acoustic recorders enable continuous monitoring until battery or memory capacity is exhausted. Echolocation click trains produced by small odontocetes contain ultrasonic components up to 150 kHz12, necessitating high sampling frequencies, which pose challenges for long-term monitoring15. To address this issue, ultrasonic pulse event recorders such as F-POD16,17,18 and A-tag15 have been developed. The A-tag, specifically designed for long-term recording of odontocete click trains, has been widely adopted in odontocete monitoring19,20. Pulse event recorders require less memory capacity since they store the intensity, event time, and associated parameters such as band intensities without recording the entire waveform. Power spectrum characteristics of click trains show only limited differences among species12. Off- or on-axis effects significantly change the received spectrum shape21. Therefore, recording the full waveform is not essential for detecting small odontocetes. By omitting waveform recording, the A-tag achieves low power consumption and efficient memory storage.
Despite these benefits, a major challenge in analyzing pulse event recorder data is that conventional detection and identification algorithms are not applicable. To address this, Kimura et al.20 developed a rule-based filter to extract click trains from the A-tag data for Yangtze finless porpoise (Neophocaena asiaeorientalis asiaeorientalis) under relatively silent river conditions. The rule-based filter refers to a set of analysis codes that apply multiple criteria to time-series pulse event data to detect clusters of pulses corresponding to target events. However, many dolphin and porpoise species inhabit marine environments that are noisy due to biological sources, such as snapping shrimp.
To overcome these challenges associated with noisy marine environments, we developed a hybrid approach combining a rule-based filter and a machine learning model to extract target acoustic events and minimize false alarms in the pulse event recording data. We applied this method to identify click trains of narrow-ridged finless porpoise (Neophocaena asiaeorientalis) and vessel noise events in Japanese coastal waters, where intense biological and anthropogenic noise often masks target signals. Manual detection in such environments is often time-consuming. However, our approach enables efficient and automated detection, offering a practical solution for long-term monitoring in acoustically complex environments.
Materials and methods
Monitoring sites and periods
Data for developing the rule-based filter and machine learning model were collected in Mikawa Bay, Japan, while additional data were obtained from the Seto Inland Sea, Japan, to test the effectiveness of the developed algorithm (Fig. 1). The data were intermittently collected from October 2013 to December 2023 in Mikawa Bay (Table S1). Test data were collected from July 2021 to December 2023 in the Seto Inland Sea, a geographically separated region from Mikawa Bay (Fig. 1, Table S1). This area hosts a distinct finless porpoise population compared to that in Mikawa Bay22,23,24 and exhibits different background noise levels25. Incorporating such variation into the test data enables evaluation of the model’s robustness to minor differences in click train characteristics across sites, populations, and acoustic backgrounds25.
Locations of the two data collection areas. The data for training and validation of the machine learning model were recorded in Mikawa Bay. The data for testing the effectiveness of the developed model were recorded in the Seto Inland Sea. The area enclosed by the rectangle indicates the recorded area. Maps were created in QGIS 3.40.626 with administrative boundaries from GADM27 and coastline data from Natural Earth Data (https://www.naturalearthdata.com/).
All methods were non-invasive and carried out in accordance with relevant guidelines and regulations. The Kyoto University Animal Experiments Committee approved experiments (Inf-K15003, Inf-K16002, Inf-K17004, Inf-K18004, Inf-K19004, Inf-K20010, Inf-K21008, 1–202202, 1–202302). This study is reported in accordance with the ARRIVE guidelines.
Data acquisition instruments
We used A-tags15 (MMT, Saitama, Japan) as pulse event recorders designed to monitor high-frequency underwater acoustic signals. Each A-tag comprises a stereo hydrophone system (hereafter, hydrophones A and B), preamplifier with bandpass filter, CPU, flash memory, and two alkaline batteries. Hydrophones A and B have peak sensitivities at 70 and 130 kHz, respectively28,29 (Figure S1). Two configurations were used: a T-type with horizontally arranged hydrophones spaced 590 mm apart, and an I-type with vertically aligned hydrophones spaced 190 mm apart (Figure S1). Since no substantial difference was found in detecting target pulse events between these configurations, the same analysis procedures were applied to all data. The A-tag detects ultrasonic pulse events within the 55–235 kHz band. Only when the received pressure exceeds a predefined amplitude threshold does it record the time of detection of each pulse, the received sound pressure levels (SPL) at the two hydrophones, and the time difference of arrival between the two hydrophones. Unlike other event recorders, which classify detected signals onboard and store only the processed results16,17,18, the A-tag preserves raw pulse event parameters without onboard classification. Each pulse event was stored with a minimum time resolution of 0.5 ms, with the detection threshold set at 139 dB re 1 µPa. The time difference of arrival between the two hydrophones was measured at a resolution of 0.25 µs and stored in association with the time of detection (0.5 ms resolution) and received SPL of both hydrophones. The SPL ratio (SPLR) between hydrophones A and B was used to infer the spectral characteristics of incoming pulse events, such as the relative proportions of high- and low-frequency components, which are useful for distinguishing between Phocoenidae and Delphinidae families28,29. The A-tag was deployed by suspending it from a buoy tethered by a rope, maintaining hydrophone A at a depth of 3 m.
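As an illustration of how the stored time difference of arrival relates to source direction, the following minimal sketch converts a time difference into a bearing relative to the broadside of the hydrophone pair. It assumes a far-field plane wave and a nominal sound speed of 1500 m/s; neither assumption, nor the function name `tdoa_to_bearing_deg`, comes from the A-tag documentation. Only the hydrophone spacings (590 mm and 190 mm) are taken from the text.

```python
import math

# Assumed nominal sound speed in seawater (m/s); not a value given in the study.
SOUND_SPEED_M_S = 1500.0


def tdoa_to_bearing_deg(tdoa_us: float, spacing_m: float,
                        c: float = SOUND_SPEED_M_S) -> float:
    """Bearing from broadside (degrees) for a far-field plane wave.

    tdoa_us   : time difference of arrival in microseconds
    spacing_m : distance between hydrophones A and B in metres
    """
    # Path-length difference divided by hydrophone spacing gives sin(bearing).
    ratio = c * tdoa_us * 1e-6 / spacing_m
    # Clamp so that measurement noise cannot push the value outside asin's domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Largest physically possible time difference for each configuration,
# reached when the source lies on the axis through both hydrophones.
max_tdoa_t_type_us = 0.590 / SOUND_SPEED_M_S * 1e6
max_tdoa_i_type_us = 0.190 / SOUND_SPEED_M_S * 1e6

print(round(max_tdoa_t_type_us, 1), round(max_tdoa_i_type_us, 1))
print(round(tdoa_to_bearing_deg(100.0, 0.590), 1))
```

Under these assumptions, the wider T-type spacing spreads the same range of bearings over a larger range of time differences, so a fixed 0.25 µs timing resolution corresponds to a finer angular step than for the I-type.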
Target pulse events for detection
This study aimed to detect acoustic events of click trains produced by the narrow-ridged finless porpoise. Click trains recorded by the A-tag exhibit smooth changes in SPL and pulse interval30,31 (Fig. 2a). Unlike many delphinid species, the finless porpoise produces only click trains with high frequencies ranging from 100 to 150 kHz. Their click trains were classified into two types: regular clicks, which are used for echolocation, and buzzes. Buzzes, characterized by pulse intervals of ≤ 10 ms, are typically produced during close-range prey approaches32.
Examples of (a) finless porpoise click trains and (b) vessel noise recorded by the A-tag. The x-axis shows the recording time. The y-axes show the received sound pressure level (relative to Pa) at hydrophone A (SPL), the sound pressure level ratio between hydrophones A and B (SPL ratio A/B), the time difference (µs) between hydrophones A and B that can be converted to relative azimuth, and the pulse interval (ms). A positive value for the time difference indicates that the signal arrived at hydrophone A earlier than at hydrophone B.
This study also targeted high-frequency vessel noise that falls within the auditory sensitivity range of the finless porpoise33,34, as assessing noise impacts requires capturing sounds that porpoises can actually perceive. The A-tag detects pulse events only in the 55–235 kHz range, encompassing the auditory sensitivity peak (70–80 kHz) of finless porpoise33,34, and is therefore suitable for assessing high-frequency vessel noise within their ultrasonic acoustic environment. The detected vessel noise is characterized by irregular pulse intervals, SPLs, and time differences of arrival between the two hydrophones, and typically exhibits a prolonged duration35 (Fig. 2b). Such high-frequency noise is typically generated by small, high-speed vessels not equipped with AIS, through cavitation or other mechanisms associated with propellers and engines36,37. It is presumed to originate from vessels passing in close proximity to the A-tag, as high-frequency sound attenuates rapidly and does not propagate over long distances.
Overview of the development of a rule-based filter and machine learning model
This section outlines the development of the rule-based filter and machine learning model for detecting and classifying click trains and vessel noise. These two signal types differ markedly in their acoustic characteristics (Fig. 2). Due to these differences, each signal type was processed using a separate rule-based filter and a dedicated machine learning model (Fig. 3).
As a preprocessing step, a separate rule-based filter for each signal type was applied to the raw data to eliminate irrelevant noise and extract pulse events likely to be either click trains or vessel noise (Fig. 3). The extracted pulse events were manually reviewed to evaluate detection performance.
For each detected pulse event, feature values were computed for training the corresponding machine learning model. Click-train-like events were labeled as regular clicks, buzzes, or noise, while vessel-noise-like events were labeled as vessel noise or non-vessel noise. Using these labeled events and their feature values, two machine learning models were trained: one to classify click train-related signals (regular clicks, buzzes, and noise) and the other to classify vessel noise events (vessel noise and non-vessel noise). The classification performance of each machine learning model was then evaluated using 30% of the manually labeled dataset, which was held out as validation data.
Development of a rule-based filter
The rule-based filter for detecting click trains and vessel noise was developed using Igor Pro 64 8.04 (WaveMetrics, Portland, OR, USA). This filter was based on the detection criteria described by Kimura et al.20, originally developed to extract Yangtze finless porpoise click trains from stationary A-tag recordings. In that study20, the criteria included the following: a passive SPL threshold (≥ 140.4 dB re 1 μPa), a minimum pulse interval (≥ 2.0 ms), a maximum pulse interval (≤ 100 ms), at least six pulses per click train, and a coefficient of variation of pulse interval ≤ 0.4. Of these, this study adopted all criteria except the detection threshold, which was predefined by the A-tag’s recording setting: a minimum pulse interval (≥ 2.0 ms), a maximum pulse interval (≤ 100 ms), a minimum of six pulses per click train, and a coefficient of variation of pulse interval ≤ 0.4 (Table 1). Because click trains produced by Phocoenidae, including finless porpoises, typically exhibit an SPLR greater than 0.6, as reported previously28,29, a threshold of 0.6 was adopted for the rule-based filter (Table 1). Additional criteria were initially defined based on empirical knowledge. These criteria were subsequently refined by comparing filter outputs with manual annotations of the raw data, to improve click train detection. These criteria (Table 1) included the following: a minimum duration of click train (≥ 12 ms), a maximum standard deviation of arrival-time differences between hydrophones A and B (< 25 µs), a maximum coefficient of variation (standard deviation/mean) of received SPL at hydrophone A (≤ 100), and a maximum median pulse interval within a click train (< 100 ms). Pulse events satisfying all nine criteria were selected as candidate finless porpoise click trains.
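The nine criteria can be expressed compactly as a single check on one candidate cluster of pulses. The sketch below is written in Python for illustration only; the actual filter was implemented in Igor Pro, and the way pulses are grouped into candidate clusters is not reproduced here. The function name `is_candidate_click_train`, the use of the mean SPLR for the 0.6 threshold, and the application of the interval limits to every interval in the cluster are assumptions made for this example.

```python
import numpy as np


def is_candidate_click_train(t_ms, spl_a, splr, td_us) -> bool:
    """Apply the nine click-train criteria to one candidate cluster.

    t_ms  : pulse detection times (ms), sorted ascending
    spl_a : received SPL at hydrophone A for each pulse
    splr  : SPL ratio between hydrophones A and B for each pulse
    td_us : time difference of arrival between hydrophones (µs)
    """
    t_ms = np.asarray(t_ms, dtype=float)
    if t_ms.size < 6:                      # criterion: at least six pulses
        return False
    pi = np.diff(t_ms)                     # pulse intervals (ms)
    duration = t_ms[-1] - t_ms[0]
    cv_pi = np.std(pi) / np.mean(pi)
    cv_spl = np.std(spl_a) / np.mean(spl_a)
    checks = [
        pi.min() >= 2.0,                   # minimum pulse interval
        pi.max() <= 100.0,                 # maximum pulse interval
        cv_pi <= 0.4,                      # coefficient of variation of pulse interval
        np.mean(splr) > 0.6,               # SPLR threshold (mean is an assumption)
        duration >= 12.0,                  # minimum click-train duration
        np.std(td_us) < 25.0,              # SD of arrival-time differences
        cv_spl <= 100.0,                   # CV of SPL, threshold value as stated in the text
        np.median(pi) < 100.0,             # median pulse interval
    ]
    return all(checks)


# Hypothetical regular train: ten pulses, 20 ms apart, stable bearing.
regular = is_candidate_click_train(
    np.arange(10) * 20.0, np.full(10, 150.0), np.full(10, 0.8), np.full(10, 10.0))
# Hypothetical irregular pulses: alternating 5 ms and 85 ms intervals.
irregular = is_candidate_click_train(
    [0, 5, 90, 95, 180, 185, 270], np.full(7, 150.0), np.full(7, 0.8), np.full(7, 10.0))
print(regular, irregular)
```

In the second example only the coefficient of variation of the pulse interval (about 0.89) fails, which shows how a single criterion can reject a cluster that otherwise resembles a click train.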
On the other hand, the rule-based filter for detecting vessel noise was developed using pulse intervals, number of consecutive pulses, and minimum continuous duration (Table 2), based on the typical acoustic characteristics of vessel noise. These criteria were optimized by comparing filter outputs with manual annotations of the raw data, resulting in final settings of pulse intervals shorter than 500 ms within pulse events, more than 80 consecutive pulses within a pulse event, and a minimum continuous duration of a pulse event of ≥ 10 s (Table 2).
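A minimal Python sketch of the three vessel noise criteria follows. It is illustrative only: the original filter was written in Igor Pro, and the segmentation rule used here (splitting the pulse series wherever the interval reaches 500 ms) and the function name `find_vessel_noise_candidates` are assumptions. "More than 80 consecutive pulses" is interpreted as at least 81 pulses.

```python
import numpy as np


def find_vessel_noise_candidates(t_s, max_gap_s=0.5, min_pulses=81,
                                 min_duration_s=10.0):
    """Return (start, end, n_pulses) for each candidate vessel noise event.

    t_s : pulse detection times in seconds, sorted ascending
    """
    t_s = np.asarray(t_s, dtype=float)
    if t_s.size == 0:
        return []
    # Start a new segment wherever the pulse interval is not shorter than 500 ms.
    breaks = np.where(np.diff(t_s) >= max_gap_s)[0] + 1
    candidates = []
    for seg in np.split(t_s, breaks):
        long_enough = seg[-1] - seg[0] >= min_duration_s
        if seg.size >= min_pulses and long_enough:
            candidates.append((float(seg[0]), float(seg[-1]), int(seg.size)))
    return candidates


# Hypothetical data: 20 s of dense pulses followed by three sparse pulses.
dense = np.arange(0.0, 20.0, 0.1)
sparse = np.array([30.0, 31.0, 32.0])
events = find_vessel_noise_candidates(np.concatenate([dense, sparse]))
print(events)
```

Requiring both a pulse count and a minimum duration keeps short, dense bursts (for example, a 5 s burst of 100 pulses) from being treated as a passing vessel.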
The detection accuracy of the established rule-based filter was calculated using manually validated datasets. The raw time-series data recorded by A-tags were plotted using Igor Pro, manually annotated, and subsequently analyzed using the rule-based filter to evaluate the number of target pulse events (click trains or vessel noise) successfully detected within the validation dataset (Fig. 3). Detection rates were defined as the proportion of click trains or vessel noise events correctly identified by the rule-based filter relative to the total number manually confirmed in the raw data. Additionally, the total number of candidate events detected by the rule-based filter was also quantified. The accuracy of the rule-based filter for detecting click trains was evaluated based on a verified dataset totaling 36 h (Table S1). For vessel noise detections, a separate dataset was used due to the relatively low occurrence of such events, totaling 408 h (Table S1).
Preparation of training and validation datasets for a machine learning model
Click trains and vessel noise events detected by the rule-based filter were characterized by 18 and 17 feature values, respectively. In addition to common acoustic parameters such as the number of pulses, duration, pulse intervals, and SPLs, temporal features represented by “Start” and “End” timestamps were also included (Table 3). These feature values were selected not only to capture seasonal and diel patterns, but also to incorporate empirical observations, such as that a pulse event is more likely to originate from finless porpoises if porpoise clicks have been detected immediately beforehand. Click trains were classified into regular clicks and buzzes following definitions established in previous studies, where a buzz was specifically defined as a sequence containing five or more consecutive pulses with intervals ≤ 10 ms38,39. To facilitate this distinction, a binary feature value called “BuzzCheck” was implemented to return 1 when the click train satisfied the definition of buzzes and 0 otherwise. The same set of feature values used for click trains was applied to vessel noise, excluding “BuzzCheck,” which was irrelevant for vessel noise.
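The "BuzzCheck" definition can be sketched as a short function. This is an illustrative Python version, not the code used in the study; the function name `buzz_check` is hypothetical, and it assumes that five consecutive pulses correspond to four consecutive pulse intervals of ≤ 10 ms.

```python
def buzz_check(pulse_intervals_ms, max_interval_ms=10.0, min_pulses=5):
    """Return 1 if the train contains a buzz, 0 otherwise."""
    needed = min_pulses - 1  # five consecutive pulses span four intervals
    run = 0                  # length of the current run of short intervals
    for interval in pulse_intervals_ms:
        run = run + 1 if interval <= max_interval_ms else 0
        if run >= needed:
            return 1
    return 0


# Hypothetical trains: the first contains four short intervals in a row,
# the second has short intervals that are interrupted by a long one.
print(buzz_check([50, 40, 8, 6, 5, 4, 30]))
print(buzz_check([50, 8, 6, 40, 5, 4, 30]))
```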
A subset of the training data was labeled for developing and evaluating machine learning models. After applying the rule-based filter, pulse events detected as click trains were manually classified and labeled into three groups: regular clicks, buzzes, and noise. In contrast, pulse events detected as vessel noise were manually classified and labeled into two groups: vessel noise and non-vessel noise. The labeling of click trains was based on a dataset totaling 72 hours, while labeling of vessel noise events was based on a dataset totaling 720 hours (Table S1).
Development of a machine learning model
The machine learning model was developed using the random forest algorithm implemented in the scikit-learn toolbox in Python (version 3.9.7)40,41. The random forest is a machine learning algorithm based on ensemble learning, in which multiple decision trees are combined to improve prediction accuracy42. Random forests are commonly used for classification tasks due to their robustness and high accuracy, particularly in classifying small cetacean vocalizations43,44. Training and validation were conducted using 70% and 30% of the labeled pulse event data, respectively (Fig. 3). Hyperparameters were optimized for each model, with a maximum tree depth of 30, a minimum sample split of 7, and a total of 100 estimators.
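The configuration described above can be sketched with scikit-learn as follows. The feature matrix here is synthetic random data generated only so that the example runs; it does not represent A-tag features, and the random seeds are arbitrary choices. Only the 70/30 split and the three hyperparameter values are taken from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled pulse events: 600 events, 18 feature values.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 18))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy labels, not real classes

# 70% of the labeled events for training, 30% held out for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Hyperparameter values reported in the text.
model = RandomForestClassifier(
    n_estimators=100, max_depth=30, min_samples_split=7, random_state=0)
model.fit(X_train, y_train)

val_accuracy = model.score(X_val, y_val)
# Feature indices ordered from most to least important.
ranking = np.argsort(model.feature_importances_)[::-1]
print(round(val_accuracy, 2), ranking[:4])
```

The `feature_importances_` attribute is the impurity-based importance provided by scikit-learn; whether the study used this measure or another one to rank features is not stated in the text.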
Performance evaluation
The performance of the machine learning model was evaluated based on five metrics: accuracy (Eq. 1), precision (Eq. 2), recall (Eq. 3), F1-score (Eq. 4), and false positive rate (FPR) (Eq. 5). These metrics were calculated based on four values: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP represents instances correctly predicted as positive, TN denotes instances correctly predicted as negative, FP refers to instances incorrectly predicted as positive, and FN refers to instances incorrectly predicted as negative. In this evaluation, regular clicks and buzzes were combined into a single category of click trains, based on definitions operationalized using the “BuzzCheck” feature. The binary feature enabled deterministic separation between regular clicks and buzzes, with no ambiguity or overlap. The performance of the machine learning model was evaluated separately for detecting click trains and vessel noise.
Accuracy indicates the overall ability of the model to correctly identify both target and non-target events. Accuracy was calculated using Eq. (1):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
Precision indicates the proportion of instances predicted as target events that were correctly classified. Precision was calculated using Eq. (2):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
Recall indicates the proportion of actual target events that were correctly identified by the model. Recall was calculated using Eq. (3):
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$
F1-score indicates the harmonic mean of precision and recall and serves as a comprehensive metric for evaluating the balance between these two measures. F1-score was calculated using Eq. (4):
$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
The FPR indicates the proportion of non-target events that were incorrectly classified as target events. The FPR was calculated using Eq. (5):
$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{5}$$
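The five metrics, together with the merging of regular clicks and buzzes into a single click-train category, can be illustrated with a short Python sketch. The label strings, the function names, and the six example events are hypothetical and serve only to show the calculation.

```python
def collapse_click_labels(labels):
    """Map 'regular' and 'buzz' to 1 (click train) and anything else to 0."""
    return [1 if label in ("regular", "buzz") else 0 for label in labels]


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, and FN for binary labels (1 = target event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1-score, and FPR from the four counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical manual labels and model predictions for six pulse events.
manual = ["regular", "buzz", "noise", "noise", "regular", "noise"]
predicted = ["regular", "regular", "noise", "regular", "noise", "noise"]

counts = confusion_counts(collapse_click_labels(manual),
                          collapse_click_labels(predicted))
metrics = binary_metrics(*counts)
print(counts, metrics)
```

In this example the buzz predicted as a regular click still counts as a true positive, because both classes belong to the combined click-train category.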
Finally, to evaluate the time savings achieved by applying the developed rule-based filter and machine learning model, approximately 395 h of A-tag data were analyzed, and the total time required for manual detection was compared with that required when using the combined rule-based filter and machine learning approach.
Validation of the developed algorithm on test data
To evaluate the generalizability of the rule-based filter and machine learning model developed using A-tag data recorded in Mikawa Bay, we evaluated them using a test dataset recorded in the Seto Inland Sea, separately assessing performance for click trains and vessel noise (Fig. 1). The performance of the rule-based filter was evaluated by comparing the number of click train and vessel noise events detected manually with those detected by the filter in the test datasets. Detection accuracy was calculated as the percentage of filter-based detections relative to manual detections. In addition, the total number of pulse events identified by the filter as candidate click trains or vessel noise was also counted. For the machine learning model, classification was performed on pulse events detected by the rule-based filter. The classification results were then compared with manual verification results to determine the number of TP, FP, TN, and FN instances. Based on these values, standard performance metrics—including accuracy, recall, precision, F1-score, and FPR—were calculated according to their respective equations. A total of 16 h of data were used for click train analysis, and 456 h for vessel noise analysis.
Results
Performance of the rule-based filters
Manual verification of the 36 h validation dataset identified 4,235 click trains. When the same dataset was analyzed using the rule-based filter, 7,734 pulse events were detected, of which 4,247 were classified as click trains. The rule-based filter correctly detected almost 100% of the click trains identified by manual verification. However, 45% of the detected events were FP.
Similarly, manual verification of a 408 h dataset identified 532 vessel noise events. The rule-based filter detected 2,695 pulse events in this dataset, including 500 true vessel noise events. Thus, the detection rate was 94%, but 81% of the detected events were FP.
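The percentages reported for the rule-based filter follow directly from the event counts. The short Python check below reproduces them; the function names are hypothetical, and the false positive share is assumed to be the fraction of filter detections that were not true target events.

```python
def detection_rate(true_detected: int, manual_total: int) -> float:
    """Fraction of manually confirmed events that the filter detected."""
    return true_detected / manual_total


def false_positive_share(total_detected: int, true_detected: int) -> float:
    """Fraction of filter detections that were not true target events."""
    return (total_detected - true_detected) / total_detected


# Counts reported in the text for the rule-based filters.
vessel_rate = detection_rate(500, 532)            # 500 of 532 vessel noise events
vessel_fp = false_positive_share(2695, 500)       # 2,695 detections, 500 true
click_fp = false_positive_share(7734, 4247)       # 7,734 detections, 4,247 true

print(round(vessel_rate * 100), round(vessel_fp * 100), round(click_fp * 100))
```

Because this share is computed relative to the detected events, it equals one minus precision and is a different quantity from the FPR in Eq. (5), which is computed relative to all non-target events.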
Performance of machine learning model
The machine learning model for classifying pulse events into regular clicks, buzzes, and noise was trained and validated using a 72 h dataset comprising 4,868 pulse events. Manual verification identified 1,271 regular clicks, 319 buzzes, and 3,278 noise events. The model achieved an accuracy of 97%, precision of 94%, recall of 96%, F1-score of 95%, and FPR of 2.8% (Table 4). Among the 18 feature values, AvSPLR was the most important, followed by SdPi, BuzzCheck, and Start (Fig. 4a).
Relative importance of each feature used in the machine learning model for classifying pulse events into (a) regular clicks, buzzes, and noise, and (b) vessel noise and non-vessel noise events. Feature abbreviations are defined in Table 3.
The machine learning model for classifying pulse events into vessel noise and noise was trained and validated using a 408-h dataset comprising 3,353 pulse events. Manual verification identified 201 vessel noise and 3,152 other noise events. The model achieved an accuracy of 99%, precision of 99%, recall of 88%, F1-score of 93%, and FPR of 0.1% (Table 4). The most important feature values were AvSPLR, AvSPL A, Np, and Sdtd (Fig. 4b).
To evaluate the efficiency of the developed models, we compared the time required to analyze 395 h of A-tag data. Manual inspection took approximately 36 h, whereas the automated method completed the same analysis in about 1.5 min.
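Expressed as a ratio, and taking the approximate times reported above at face value, the reduction in analysis time is roughly three orders of magnitude:

```latex
\frac{t_{\mathrm{manual}}}{t_{\mathrm{automated}}}
  \approx \frac{36\ \mathrm{h} \times 60\ \mathrm{min\,h^{-1}}}{1.5\ \mathrm{min}}
  = \frac{2160\ \mathrm{min}}{1.5\ \mathrm{min}}
  = 1440
```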
Validation of the developed algorithm on test data
The performance of the rule-based filter and machine learning model was evaluated using a test dataset. For click train classification, a 16-h dataset comprising 395 manually verified click trains (351 regular clicks and 44 buzzes) was used. When the rule-based filter was applied to the test dataset, 685 pulse events were detected, including all 395 manually verified click trains, resulting in a detection rate of 100%. However, 42% of the detected events were FP. When these events were subsequently classified by the machine learning model, it achieved an accuracy of 83%, precision of 95%, recall of 75%, F1-score of 84%, and FPR of 5.9% (Table 5).
In contrast, for vessel noise classification, a 456-h dataset was manually verified, identifying 68 vessel noise events. The rule-based filter detected 169 pulse events in this dataset, including 66 true vessel noise events, resulting in a detection rate of 97%. However, 61% of the detected pulse events were FP. When the detected pulse events were classified by the machine learning model, the model achieved an accuracy of 67%, precision of 100%, recall of 15%, F1-score of 26%, and FPR of 0.0% (Table 5). This test dataset contained only 66 true vessel noise events, which likely contributed to the low FPR.
Discussion
The rule-based filter and the machine learning model were separately developed in this study, and a combined approach was proposed for efficiently detecting click trains and vessel noise recorded by A-tags. The rule-based filter achieved high detection rates: almost 100% for click trains and 94% for vessel noise in the training dataset. Furthermore, when applied to the test dataset with different background noise levels25 and finless porpoise populations24, it maintained strong performance, detecting 100% of click trains and 97% of vessel noise events. These results suggest that the rule-based filter functions as a robust preprocessing tool for detecting target events under varying site conditions. The primary advantage of this approach lies in its ability to eliminate large amounts of noise while reliably capturing nearly all target events, thereby enhancing analytical accuracy and computational efficiency. However, a substantial proportion of the pulse events detected by the rule-based filter were FP: 45% for click trains and 81% for vessel noise in the training dataset, and 42% and 61%, respectively, in the test dataset. The large number of FPs was largely attributed to broadband impulsive noise produced by snapping shrimp. These pulses often exhibit similar amplitude to click trains and vessel noise, making them difficult to distinguish using rule-based criteria alone. This residual noise was a result of the intentionally relaxed detection criteria, designed to minimize missed detections of true events. Consequently, FPs were tolerated under the assumption that they would be removed during manual verification or by the subsequent machine learning classification step. Therefore, for accurate and efficient detection of click trains and vessel noise, combining the rule-based filter with a machine learning model is essential.
The machine learning model for classifying click trains demonstrated high performance on the training dataset (Table 4). The low FPR of 2.8% indicates that the model effectively distinguishes click trains from noise. Although performance declined moderately when applied to the test data, the model still correctly classified 83% of the events (Table 5). Comparable levels of classification accuracy have been reported in previous cetacean acoustic studies using full waveform data. For example, Zahn et al.45 reported accuracy of 98% for narwhals (Monodon monoceros) and belugas (Delphinapterus leucas), while Griffiths et al.43 reported 97% accuracy for Dall’s porpoises (Phocoenoides dalli) and Kogia spp. However, these studies evaluated performance using only data from the same sites used in model training. In contrast, the present study not only demonstrated similarly high accuracy on the training dataset (Table 4), but also demonstrated the model’s generalizability by testing it on independent data from a different site. This approach provides a clear indication of its robustness for practical application (Table 5). In addition, Song et al.46 achieved 93% accuracy in detecting click trains of the Yangtze finless porpoise using a Hilbert–Huang transform combined with a backpropagation neural network, a result comparable to the present study.
While direct comparisons are limited by methodological differences47,48, the performance of the model in this study is broadly comparable to those of previous approaches using full waveform data. Notably, the A-tag employed here achieved high classification accuracy despite lacking direct frequency information, which is often regarded as a critical feature in such models43,45. This suggests that the comparison of SPL at two specific frequencies, i.e., SPLR, can serve as an effective proxy for frequency information. Phocoenidae produce similar narrow-band high-frequency clicks49, and therefore, SPLR is not expected to differ significantly among species. Accordingly, the method developed in this study is likely to be applicable to other Phocoenidae species with only minor adjustments and fine-tuning. Further research is needed to validate its effectiveness for different target species.
In the machine learning model for classification of click trains, the feature values “Start” and “End” showed relatively high importance. As these parameters represent temporal information, their contribution may reflect seasonal and diel behavioral patterns, as well as temporal autocorrelation in their acoustic activity. For instance, finless porpoises in the Kanmon Strait, located at the western entrance of the Seto Inland Sea, are reported to occur mainly at night19, suggesting that such a nocturnal activity pattern may have been implicitly learned by the model. Although investigating porpoise behavior was not the primary aim, incorporating this information into the design of the rule-based filter and machine learning model can enhance model accuracy, enabling the development of species-specific classifiers informed by expert knowledge.
The decrease in accuracy for classification of click trains when applying the model to the test dataset may be attributed to several factors. Indeed, significant differences in source level, -3 dB bandwidth, click duration, and the number of clicks per click train have been reported between regular clicks recorded in Mikawa Bay and the Seto Inland Sea25. Despite these acoustic differences, the machine learning model maintained high accuracy (Table 5), supporting its applicability to different populations. Moreover, because the A-tag does not record frequencies below 55 kHz, it is less affected by low-frequency background noise, likely resulting in stable detection performance even at sites with varying background noise levels. Thus, the combined use of the rule-based filter and machine learning model is expected to efficiently process large-scale, long-term monitoring datasets.
Vessel noise recorded by the A-tag primarily consisted of ultrasonic components that are audible to finless porpoises33,34,36,37. Our method enabled efficient detection of both click trains and vessel noise events from the same dataset, facilitating assessments of acoustic impacts on finless porpoises. The accuracy of the machine learning model for classifying vessel noise was high for the training dataset (Table 4). However, accuracy declined on the test dataset (Table 5), likely due to site-specific differences in vessel noise characteristics, such as vessel type, size, and speed50,51,52,53,54, and variation in background noise that can influence signal masking. The test dataset included a relatively small number of vessel noise events, which may have limited the robustness of the evaluation. In particular, the vessel noise classification model exhibited perfect precision but very low recall, suggesting that the model may have learned overly strict classification criteria, thereby failing to detect many true vessel noise events. Although vessel type and speed were not visually assessed, the performance drop in classification suggests major site-specific differences in vessel noise characteristics. The training site in Mikawa Bay is near a busy ferry route and fishing port with frequent small-vessel traffic, whereas the Seto Inland Sea test site is used mainly by large vessels and has fewer fishing boats. The low recall observed in the test data may also have resulted from overfitting to the acoustic features specific to the training environment. In particular, the training data included frequent broadband impulsive noise from snapping shrimp, resulting in a higher background noise level than the test data25. This environmental contrast may have led the model to overfit to site-specific acoustic features, resulting in reduced recall on the test dataset.
To improve generalization, it is essential to train the model using a more diverse dataset that includes vessel noise recorded under varying conditions, such as different sites, background noise levels, and vessel types. In addition, incrementally updating the model with newly collected data may help develop a more robust machine learning model that can adapt to site-specific differences in acoustic characteristics.
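One simple way to approximate such an update is to retrain the random forest on data pooled from the original and the newly surveyed site. The sketch below is only an illustration under stated assumptions: it uses scikit-learn's `RandomForestClassifier`, and the features, labels, and site offset are synthetic stand-ins generated by a hypothetical helper (`synthetic_site`), not A-tag data or the features used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)


def synthetic_site(n: int, shift: float):
    """Hypothetical stand-in for per-event features recorded at one site."""
    y = rng.integers(0, 2, size=n)        # 1 = vessel noise, 0 = other event
    X = rng.normal(size=(n, 4)) + shift   # `shift` mimics a site-specific offset
    X[:, 0] += 2.0 * y                    # one feature carries the class signal
    return X, y


X_a, y_a = synthetic_site(400, shift=0.0)    # original training site
X_b, y_b = synthetic_site(200, shift=-1.5)   # newly surveyed site

# Use half of the new site for retraining and hold out the rest for testing.
X_b_new, y_b_new = X_b[:100], y_b[:100]
X_b_test, y_b_test = X_b[100:], y_b[100:]

# Model trained on the original site only.
single_site = RandomForestClassifier(n_estimators=100, random_state=0)
single_site.fit(X_a, y_a)

# Model retrained on data pooled from both sites.
pooled = RandomForestClassifier(n_estimators=100, random_state=0)
pooled.fit(np.vstack([X_a, X_b_new]), np.concatenate([y_a, y_b_new]))

recall_single = recall_score(y_b_test, single_site.predict(X_b_test))
recall_pooled = recall_score(y_b_test, pooled.predict(X_b_test))
print(f"recall on new site, original training only: {recall_single:.2f}")
print(f"recall on new site, pooled retraining:      {recall_pooled:.2f}")
```

Pooled retraining is chosen here because it is easy to reason about; whether it improves recall on real recordings would have to be checked on held-out data from each new site.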
Combining the rule-based filter and machine learning model developed in this study significantly reduced analysis time compared to manual analysis. The combined approach, which first removes noise using the rule-based filter and then applies a machine learning model, achieved over 90% accuracy in detecting click trains and vessel noise on the training dataset. Furthermore, when applied to data from different sites not included in the model development, the method maintained relatively high accuracy for detecting finless porpoise click trains, demonstrating its generalizability. Further research is needed to evaluate the generalizability of the proposed detection method under various background noise conditions and among different populations of finless porpoises. This method was specifically designed for the A-tag. However, the rule-based criteria that use pulse intervals, the number of clicks in a click train, and the SPLR to distinguish Phocoenidae from Delphinidae can be applied to other pulse event recorders. A unique feature of the A-tag is its ability to separate sound sources using the time difference of arrival between two hydrophones. Criteria based on time difference of arrival information provide the rule-based filter with an additional source separation function. Therefore, although the filter is specifically optimized for the A-tag system, its core framework can be adapted to other devices with suitable modifications. The approach proposed in this study provides a scalable framework for ecological studies of finless porpoises, vessel noise impact assessments, and broader applications in passive acoustic monitoring of other small cetacean species and anthropogenic noise.
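The two-stage structure described above can be sketched as follows. This is a hedged illustration rather than the implementation used in this study: the `Candidate` fields, the threshold values, the direction of each comparison, and `toy_classifier` are all hypothetical placeholders standing in for the actual filter criteria and the trained random forest.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    """One candidate pulse train (all fields are illustrative)."""
    n_pulses: int             # number of pulses in the train
    mean_interval_ms: float   # mean inter-pulse interval
    splr: float               # sound pressure level ratio between two bands
    tdoa_spread_us: float     # spread of time difference of arrival in the train


def passes_rules(c: Candidate,
                 min_pulses: int = 5,
                 max_interval_ms: float = 200.0,
                 min_splr: float = 1.0,
                 max_tdoa_spread_us: float = 50.0) -> bool:
    """Stage 1: cheap rule-based screening with placeholder thresholds."""
    return (c.n_pulses >= min_pulses
            and c.mean_interval_ms <= max_interval_ms
            and c.splr >= min_splr
            and c.tdoa_spread_us <= max_tdoa_spread_us)


def detect(candidates: Iterable[Candidate],
           classifier: Callable[[Candidate], bool]) -> List[Candidate]:
    """Stage 2 runs only on the candidates that survive stage 1."""
    survivors = [c for c in candidates if passes_rules(c)]
    return [c for c in survivors if classifier(c)]


def toy_classifier(c: Candidate) -> bool:
    """Stand-in for a trained model; NOT the classifier used in the study."""
    return c.n_pulses >= 8


candidates = [
    Candidate(12, 40.0, 1.6, 10.0),  # passes both stages
    Candidate(6, 60.0, 1.4, 20.0),   # passes the rules, rejected by the classifier
    Candidate(3, 30.0, 1.8, 5.0),    # too few pulses, removed by the rules
    Candidate(15, 35.0, 0.4, 15.0),  # low SPLR, removed by the rules
]
detections = detect(candidates, toy_classifier)
print(len(detections))  # prints 1
```

Keeping the rule thresholds as function parameters is one way the same skeleton could be adapted to recorders that lack a second hydrophone, for example by dropping the time difference of arrival criterion.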
Data availability
The rule-based filter and model are currently being prepared for online release. The data used in this study are available from the corresponding author upon reasonable request.
References
Bayrakci, G. & Klingelhoefer, F. (Eds.). Noisy Oceans: Monitoring Seismic and Acoustic Signals in the Marine Environment (Wiley, 2023).
Duarte, C. M. et al. The soundscape of the Anthropocene ocean. Science 371, 6529 (2021).
Bolgan, M. et al. Use of passive acoustic monitoring to fill knowledge gaps of fish global conservation status. Aquat. Conserv. Marine Freshwater Ecosyst. 33(12), 1580–1589 (2023).
Looby, A. et al. A quantitative inventory of global soniferous fish diversity. Rev. Fish Biol. Fish. 32(2), 581–595 (2022).
Mellinger, D. K. et al. An overview of fixed passive acoustic observation methods for cetaceans. Oceanography 20, 36–45 (2007).
Pieretti, N. et al. Marine soundscape as an additional biodiversity monitoring tool: A case study from the Adriatic Sea (Mediterranean Sea). Ecol. Indic. 83, 13–20 (2017).
Pieretti, N. & Danovaro, R. Acoustic indexes for marine biodiversity trends and ecosystem health. Philos. Trans. R. Soc. B 375(1814), 20190447 (2020).
Andrew, R. K., Howe, B. M. & Mercer, J. A. Long-time trends in ship traffic noise for four sites off the North American West Coast. J. Acoust. Soc. Am. 129(2), 642–651 (2011).
Andrew, R. K. et al. Ocean ambient sound: Comparing the 1960s with the 1990s for a receiver off the California coast. Acoust. Res. Lett. Online 3(2), 65–70 (2002).
Erbe, C. et al. The effects of ship noise on marine mammals—A review. Front. Mar. Sci. 6, 606 (2019).
McDonald, M. A., Hildebrand, J. A. & Wiggins, S. M. Increases in deep ocean ambient noise in the Northeast Pacific west of San Nicolas Island, California. J. Acoust. Soc. Am. 120(2), 711–718 (2006).
Richardson, W. J. Marine Mammals and Noise (Academic Press, Cambridge, 2013).
Weilgart, L. S. A brief review of known effects of noise on marine mammals. Int. J. Comp. Psychol. https://doi.org/10.46867/ijcp.2007.20.02.09 (2007).
Cauchy, P. et al. Gliders for passive acoustic monitoring of the oceanic environment. Front. Remote Sens. 4, 1165033 (2023).
Akamatsu, T. et al. New stereo acoustic data logger for free-ranging dolphins and porpoises. Mar. Technol. Soc. J. 39(2), 3–9 (2005).
Cosentino, M. et al. Dolphin and porpoise detections by the F-POD are not independent: Implications for sympatric species monitoring. JASA Express Lett. 4(3), 031202 (2024).
Ivanchikova, J. & Tregenza, N. Validation of the F-POD—A fully automated cetacean monitoring system. PLoS ONE 18, e0293402 (2023).
Todd, N. R. E. et al. What the F-POD? Comparing the F-POD and C-POD for monitoring of harbor porpoise (Phocoena phocoena). Ecol. Evol. 13, e10186 (2023).
Akamatsu, T. et al. Evidence of nighttime movement of finless porpoises through Kanmon Strait monitored using a stationary acoustic recording device. Fish. Sci. 74(5), 970–975 (2008).
Kimura, S. et al. Density estimation of Yangtze finless porpoises using passive acoustic sensors and automated click train detection. J. Acoust. Soc. Am. 128(3), 1435–1445 (2010).
Madsen, P. T. & Wahlberg, M. Recording and quantification of ultrasonic echolocation clicks from free-ranging toothed whales. Deep Sea Res. Part I 54, 1421–1444 (2007).
Yoshida, H. et al. Geographic variation in the skull morphology of the finless porpoise Neophocaena phocaenoides in Japan waters. Fish. Sci. 61(4), 555–558 (1995).
Yoshida, H. et al. Population structure of finless porpoises (Neophocaena phocaenoides) in coastal waters of Japan. Raffles Bull. Zool. 50, 35–42 (2002).
Yoshida, H. et al. Population structure of finless porpoises (Neophocaena phocaenoides) in coastal waters of Japan based on mitochondrial DNA sequences. J. Mammal. 82(1), 123–130 (2001).
Ogawa, M. & Kimura, S. S. Variations in echolocation click characteristics of finless porpoise in response to day/night and absence/presence of vessel noise. PLoS ONE 18(8), e0288513 (2023).
QGIS Development Team. QGIS Geographic Information System. Open Source Geospatial Foundation Project. https://qgis.org (accessed 10 Jun 2025).
GADM. Global Administrative Areas Database (Version 4.1). GADM. https://gadm.org (2023).
Kameyama, S. et al. Acoustic discrimination between harbor porpoises and delphinids by using a simple two-band comparison. J. Acoust. Soc. Am. 136(2), 922–929 (2014).
Kimura, S. S. et al. Acoustic identification of the sympatric species Indo-Pacific finless porpoise and Indo-Pacific humpback dolphin: an example from Langkawi, Malaysia. Bioacoustics, 1–17 (2021).
Akamatsu, T. et al. Comparison of echolocation behaviour between coastal and riverine porpoises. Proc. Underwater Technol. Symp., 520–526 (2007).
Kimura, S. et al. Variation in the production rate of biosonar signals in freshwater porpoises. J. Acoust. Soc. Am. 133(5), 3128–3134 (2013).
Akamatsu, T. et al. Scanning sonar of rolling porpoises during prey capture dives. J. Exp. Biol. 213(1), 146–152 (2010).
Popov, V. V. et al. Evoked-potential audiogram of the Yangtze finless porpoise Neophocaena phocaenoides asiaeorientalis (L). J. Acoust. Soc. Am. 117(5), 2728–2731 (2005).
Wang, Z. T. et al. Evoked-potential audiogram variability in a group of wild Yangtze finless porpoises (Neophocaena asiaeorientalis asiaeorientalis). J. Comp. Physiol. 206(4), 527–541 (2020).
Li, S. et al. Widespread passive acoustic detection of Yangtze finless porpoise using miniature stereo acoustic data-loggers: a review. J. Acoust. Soc. Am. 128(3), 1476–1482 (2010).
Hermannsen, L. et al. High frequency components of ship noise in shallow water with a discussion of implications for harbor porpoises (Phocoena phocoena). J. Acoust. Soc. Am. 136(4), 1640–1653 (2014).
Li, S. et al. Mid- to high-frequency noise from high-speed boats and its potential impacts on humpback dolphins. J. Acoust. Soc. Am. 138(2), 942–952 (2015).
Verfuss, U. K. et al. Echolocation by two foraging harbour porpoises (Phocoena phocoena). J. Exp. Biol. 212(6), 823–834 (2009).
Zein, B. et al. Time and tide: Seasonal, diel and tidal rhythms in Wadden Sea Harbour porpoises (Phocoena phocoena). PLoS ONE 14(3), e0213348 (2019).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Varoquaux, G. et al. Scikit-learn: Machine learning without learning the machinery. GetMobile 19(1), 29–33 (2015).
Breiman, L. Random forests. Mach. Learn. 45(1), 5–32 (2001).
Griffiths, E. T. et al. Detection and classification of narrow-band high frequency echolocation clicks from drifting recorders. J. Acoust. Soc. Am. 147(5), 3511 (2020).
Serra, O. M., Martins, F. P. R. & Padovese, L. R. Automatic detection of estuarine dolphin whistles in spectrogram images. arXiv preprint, 1909.00425 (2019).
Zahn, M. J. et al. Acoustic differentiation and classification of wild belugas and narwhals using echolocation clicks. Sci. Rep. 11(1), 22141 (2021).
Song, H. et al. An automatic identification algorithm of Yangtze finless porpoise. In Proc. IEEE ICSPCC, Ningbo, China (2015).
Branco, P., Torgo, L. & Ribeiro, R. A survey of predictive modelling under imbalanced distributions. arXiv preprint, 1505.01658 (2015).
Saito, T. & Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10, e0118432 (2015).
Morisaka, T. & Connor, R. C. Predation by killer whales (Orcinus orca) and the evolution of whistle loss and narrow-band high frequency clicks in odontocetes. J. Evol. Biol. 20(4), 1439–1458 (2007).
Arranz, P. et al. Whale-watch vessel noise levels with applications to whale-watching guidelines and conservation. Mar. Policy 134, 104776 (2021).
Erbe, C. Underwater noise of whale-watching boats and potential effects on killer whales (Orcinus orca) based on an acoustic impact model. Mar. Mamm. Sci. 18(2), 394–418 (2002).
Findlay, C. R. et al. Small reductions in cargo vessel speed substantially reduce noise impacts to marine mammals. Sci. Adv. 9(25), eadf2987 (2023).
Jensen, F. H. et al. Vessel noise effects on delphinid communication. Mar. Ecol. Prog. Ser. 395, 161–175 (2009).
Wladichuk, J. L. et al. Systematic source level measurements of whale watching vessels and other small boats. J. Ocean Technol. 14, 110–126 (2019).
Acknowledgements
We thank Tunemi Suzuki, Haruhiko Suzuki, Kengo Ueda, Tetsuya Kohama, Hirotaka Tajima, Haruka Nakajin, and Takehiro Ikeda for their support in data acquisition. We are also grateful to Dr. Ken Yoda, Dr. Nobuaki Arai, Dr. Hiromichi Mitamura, Dr. Kotaro Ichikawa, Dr. Junichi Takagi, Dr. Hisashi Kashima, Dr. Hiroshi Harada, Dr. Yu Teshima, and Dr. Shinsuke Kawagucchi for their valuable advice and encouragement. Our appreciation extends to the members of the Fisheries and Environmental Oceanography Laboratory, Division of Applied Biosciences, Graduate School of Agriculture, Kyoto University; the members of the Distinguished Doctoral Program of Platforms (WISE), Center for Interdisciplinary Graduate Education, Division of Graduate Studies, Kyoto University; and the members of the Underwater Biological Sound Analysis Group, Smart Sensing Technology Development Center, Research Institute for Marine Technology and Engineering, Japan Agency for Marine-Earth Science and Technology, for their cooperation throughout the study. We would like to thank Editage for English language editing. The authors used ChatGPT (OpenAI, 2025) to assist in editing and refining the English. All content is the responsibility of the authors.
Funding
This work was supported by the JST FOREST Program, JPMJFR 2171; JSPS KAKENHI, JP18H06495, JP19K20460, and JP22H05652; SPIRITS 2020 of Kyoto University; a Sasakawa Scientific Research Grant from the Japan Science Society, 2021-6010; the JST SPRING Program, JPMJSP2110; the Fujiwara Natural History Foundation; and the Collaborative Research Program of Wildlife Research Center, Kyoto University, 2023-A-29.
Author information
Contributions
M. I. O. contributed to methodology, software, validation, formal analysis, investigation, data curation, writing—original draft, writing—review & editing, visualization, and funding acquisition. S. S. K. contributed to conceptualization, methodology, software, investigation, resources, data curation, writing—review & editing, supervision, project administration, and funding acquisition. N. I. contributed to methodology, software, investigation, and data curation. T. A. contributed to conceptualization, methodology, software, investigation, writing—review & editing, and supervision. All authors reviewed and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ogawa, M.I., Kimura, S.S., Ishiai, N. et al. A hybrid method combining rule-based filter and machine learning to detect porpoise and vessel sounds from a pulse event recorder. Sci Rep 15, 31211 (2025). https://doi.org/10.1038/s41598-025-16370-1