Introduction

Hearing loss is a prevalent and debilitating condition affecting hundreds of millions worldwide. In addition to significantly diminishing quality of life, age-related hearing loss has emerged as a major risk factor for cognitive decline and dementia, underscoring the urgent need for better research tools and treatment strategies. Both age-related hearing loss and dementia involve the progressive loss of synapses (synaptopathy) - in the cochlea and brain, respectively - highlighting common neurodegenerative mechanisms that warrant intensive study1,2,3,4,5,6,7.

Among the most powerful approaches to assess auditory function are auditory brainstem response (ABR) recordings, which objectively measure electrical activity along the auditory neural pathway, from cochlear inner hair cells through the brainstem8,9,10,11,12,13. In mice, ABRs consist of five characteristic peaks approximately corresponding to neural signals propagating through sequential auditory structures, though some centrally generated waves may reflect concurrent activity in multiple structures (Fig. 1)14,15,16,17. Physiological and mathematical models of ABR morphology have also been developed18.

Fig. 1

Example of ABR waveforms recorded from a mouse, showing its characteristic features. (A) One 10 ms ABR recording, with the five characteristic peaks denoted by red dots, and their corresponding troughs with blue dots. (B) A close-up of the boxed region in (A), showing how latency (time to peak) and amplitude (peak to trough height) are defined for wave 1. (C) Several ABR recordings at varying sound levels, for the same mouse and frequency. The threshold level above which the ABR response is indicative of hearing is shown in red.

Two particularly informative measurements obtained from ABRs—hearing threshold sensitivity and wave 1 characteristics—have been shown to correlate strongly with cochlear-based hearing damage; specifically, ABR thresholds and wave 1 amplitudes are affected by changes in cochlear function such as hair cell loss or cochlear synaptopathy2,19,20,21. However, current ABR analysis methods typically rely on manual waveform interpretation, which can be subjective, labor-intensive, and prone to inconsistency between individual researchers or labs22,23.

Heuristic and machine learning computational approaches have been explored for automated ABR analysis24,25. Early methods focused on hand-engineered features (e.g., manually calculating waveform curvature)20 and statistical classifiers, such as support vector machines for threshold detection26 or model-based approaches for identifying near-threshold responses27. Supervised learning models (i.e. models which learn from data with ground truth labels) like convolutional neural networks (CNNs), gradient boosting machines, and others have been used to accurately analyze suprathreshold ABR waveforms28,29 and to assess the degree of synaptopathy in humans30.

In this paper, we introduce the Auditory Brainstem Response Analyzer (ABRA), a collection of novel open-source tools, including machine learning models trained on a diverse set of mouse ABR data, to enable comprehensive and maximally generalizable mouse ABR analysis. Two algorithms are provided to (1) automatically estimate thresholds and (2) detect peaks and quantify amplitudes and latencies. To make these algorithms broadly accessible, we have packaged them into a user-friendly API and accompanying browser-based application that also supports batch data import/export, waveform visualization, and interactive 2D/3D plotting. By integrating these diverse functionalities into a unified platform, ABRA aims to streamline ABR data processing and analysis, reduce manual labor, and facilitate standardization and reproducibility across labs. We demonstrate the flexibility and generalizability of these algorithms by benchmarking their performance on ABR datasets collected from three different hearing research labs using distinct experimental protocols and recording settings.

Methods

Data collection

To test the generalizability and flexibility of the open-source ABR software developed here (ABRA v0.1.1; https://abra.ucsd.edu31), we trained and tested the models using three distinct ABR datasets from separate laboratories (Table 1). While the three laboratories used a similar overarching methodology, each used unique experimental protocols, including varying collection software, sound source and stimulus frequencies, and mouse strains, including mouse models of accelerated aging and mice exposed to temporary threshold shift-inducing noise. These differences underscore the flexibility of ABRA in accommodating diverse experimental setups and protocols. Further details on data collection conditions are available in the Supplementary Information (Supplementary Table S1).

Table 1 Details of the mice and individual ABR waveforms from the three laboratories (Lab A, Lab B, Lab C) used in the models, split into train and test datasets. Figures and tables relevant to a given dataset are enumerated in brackets in the last row.

Ethics and study design

ABR datasets from two contributing laboratories have been previously published (Lab B:32; Lab C:33) and were collected in accordance with the ethical approvals described in those studies. Additional recordings from the Manor laboratory (Lab A) were obtained from available ABR data collected under ongoing studies; no new interventions were performed for the purpose of this work. These experiments were approved by the Institutional Animal Care and Use Committees (IACUC) at the Salk Institute for Biological Studies, protocol number 18–00052, and at the University of California, San Diego, protocol number S23058, and were performed in accordance with relevant guidelines and regulations.

For all datasets, the experimental unit was defined as the individual animal. For this study, animals were randomly selected for method development and testing as outlined in Table 1. Sample sizes were determined by data availability rather than statistical considerations, and all recordings with available human-generated ground truth data were included in the analysis. Data were analyzed in an automated manner without knowledge of experimental conditions or prior annotations.

All animal procedures are reported in accordance with the ARRIVE guidelines for the reporting of animal experiments (https://arriveguidelines.org). The focus of this study was development and validation of an automated ABR analysis pipeline rather than testing a biological hypothesis.

ABR preprocessing

To provide consistent input to the downstream models for peak detection and thresholding, ABR waveforms from all labs were preprocessed to a common sampling frequency and scale. First, each waveform stack (all waveforms for a particular frequency and mouse) was normalized so its minimum and maximum values span from 0 to 1. To place all ABRs on the same time scale, each waveform was provided to the CNN as a vector of length 244, covering 10 ms of recording. All data included in the model were recorded for at least 10 ms; longer recordings were truncated. Recordings with higher sampling frequencies were downsampled to 244 points using linear interpolation. While no data in our dataset had lower sampling frequencies, the provided code allows lower-sampling-frequency recordings to be upsampled to 244 points using cubic spline interpolation. For the downsampled waveforms, we computed the power spectrum of each waveform to ensure the results were not affected by aliasing; on average 0.02%, and at most 0.75%, of the power lay above the new Nyquist limit, making aliasing effects negligible.
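In code, this preprocessing might look like the following (a minimal NumPy/SciPy sketch; `preprocess_stack` and its parameter names are illustrative, not the ABRA API):

```python
import numpy as np
from scipy.interpolate import CubicSpline

TARGET_LEN = 244      # samples spanning the first 10 ms
DURATION_MS = 10.0

def preprocess_stack(stack, fs_khz):
    """Normalize a waveform stack to [0, 1] and resample each trace to
    TARGET_LEN points. `stack` is (n_waveforms x n_samples), all waveforms
    for one mouse/frequency; `fs_khz` is the sampling rate in kHz."""
    n_keep = int(round(DURATION_MS * fs_khz))      # truncate to 10 ms
    stack = np.asarray(stack, dtype=float)[:, :n_keep]

    # Min-max normalize the whole stack so values span 0..1
    lo, hi = stack.min(), stack.max()
    stack = (stack - lo) / (hi - lo)

    t_old = np.linspace(0.0, DURATION_MS, stack.shape[1])
    t_new = np.linspace(0.0, DURATION_MS, TARGET_LEN)
    if stack.shape[1] >= TARGET_LEN:
        # Downsample via linear interpolation
        return np.vstack([np.interp(t_new, t_old, w) for w in stack])
    # Upsample via cubic spline interpolation
    return np.vstack([CubicSpline(t_old, w)(t_new) for w in stack])
```

Normalizing per stack (rather than per waveform) preserves the relative amplitudes across sound levels within a frequency series.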

Peak detection

The ABRA toolbox incorporates a two-step peak finding algorithm: briefly, a convolutional neural network (CNN) predicts the location of the wave 1 peak, and a second fine-tuning step then refines this prediction and labels the remaining peaks.

The CNN was trained on ABR data from two labs (summarized in Table 1) labeled with ground truth peak 1 annotations. Ground truth labels from Lab A were hand-annotated by expert ABR practitioners; those for Lab B were labeled as previously described32. The training dataset of 8,923 ABRs from 115 mice (summarized in Table 1) was split into training and validation subsets, with 80% (7,209 waveforms) and 20% (1,714 waveforms) in each, respectively. The loss contribution from each training sample was weighted to ensure that data from both labs were represented equally during model training.

The CNN was trained to minimize squared error (L2) loss on the regression task of predicting the index of the wave 1 peak. The model architecture and training hyperparameters were determined by a randomized search over convolutional filter sizes, kernel sizes, dropout rates, pooling strategies, learning rates, weight decay values, and early stopping criteria. The validation set was evaluated at each training epoch, and if the validation loss did not decrease over 25 consecutive epochs, training was stopped to prevent overfitting. With this criterion, training stopped after 62 epochs. A simplified representation of the network architecture is shown in Fig. 2. The model hyperparameters chosen by this search are displayed in Supplementary Table S2.
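The early-stopping criterion is framework-independent and can be sketched as follows (`run_epoch` and `get_val_loss` are placeholders for the actual training code, which is not shown in this section):

```python
def train_with_early_stopping(run_epoch, get_val_loss, patience=25, max_epochs=1000):
    """Stop training once validation loss has not improved for
    `patience` consecutive epochs. `run_epoch` performs one training
    epoch; `get_val_loss` returns the current validation loss."""
    best_loss, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        run_epoch()
        loss = get_val_loss()
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # new best checkpoint
        elif epoch - best_epoch >= patience:
            break                                 # patience exhausted
    return epoch, best_loss
```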

Fig. 2

Model architecture for the wave 1 peak finding algorithm. The input (a 244-point ABR waveform spanning 10 ms) passes through two layers of convolution, batch normalization, max pooling (after ReLU activation), and dropout. The dimensionality of the output is then reduced through two consecutive fully connected layers with ReLU activation, returning the predicted time point of the wave 1 peak.

The CNN’s prediction of the wave 1 peak index was often close to, but not exactly at, the local peak. An additional fine-tuning step was added to leverage this relatively accurate prediction with greater precision (Supplementary Figure S2). First, the waveform was smoothed with a Gaussian filter (σ = 1) to remove noise that could cause spurious peak detection. Then the find_peaks function from SciPy was used to identify all candidate peak and trough locations. The parameters for this function were optimized using ground truth wave 1 latency and wave 1 amplitude for the validation set and include the following:

  a. Waveform start point: 0.41 ms (10 sample points) before the CNN-predicted peak 1 location (earlier timepoints are not fed into find_peaks).

  b. Minimum allowed time between candidate peaks: 0.66 ms (16 sample points).

  c. Minimum allowed time between candidate troughs: 0.29 ms (7 sample points).

Our algorithm then checks whether any of these candidate peaks fall within ± 0.25 ms of the CNN prediction. If multiple candidates were present, the highest-amplitude one was chosen. If no peaks were found within this window, the window was widened by an additional ± 0.25 ms until a peak was found. From the remaining candidate peaks, the 4 peaks following this peak with the highest amplitudes (measured from 0) were chosen as peaks 2–5. From the candidate troughs, troughs 1–5 were chosen as the first trough following each peak. The amplitudes corresponding to the identified peak and trough indices were quantified from the original (unsmoothed) waveforms.
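The refinement stage can be sketched with SciPy as follows (a simplified version of the logic above; window arithmetic assumes the 244-point, 10 ms waveforms, and the exact ABRA implementation may differ):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

SAMPLES_PER_MS = 244 / 10.0            # 24.4 samples per ms

def refine_peaks(wave, cnn_peak_idx):
    """Smooth, enumerate candidate extrema with find_peaks, snap the
    CNN wave 1 estimate to the nearest local peak, then label peaks
    and troughs 2-5."""
    smooth = gaussian_filter1d(np.asarray(wave, dtype=float), sigma=1)
    start = max(cnn_peak_idx - 10, 0)          # 0.41 ms before the CNN guess
    peaks, _ = find_peaks(smooth[start:], distance=16)    # >= 0.66 ms apart
    peaks = peaks + start
    troughs, _ = find_peaks(-smooth[start:], distance=7)  # >= 0.29 ms apart
    troughs = troughs + start

    # Widen the search window in 0.25 ms increments until a peak is found
    step = int(round(0.25 * SAMPLES_PER_MS))
    w = step
    near = peaks[np.abs(peaks - cnn_peak_idx) <= w]
    while near.size == 0 and w < len(wave):
        w += step
        near = peaks[np.abs(peaks - cnn_peak_idx) <= w]
    p1 = near[np.argmax(wave[near])]           # highest candidate in window

    # Peaks 2-5: the 4 highest-amplitude later candidates, in temporal order
    later = peaks[peaks > p1]
    rest = np.sort(later[np.argsort(wave[later])[-4:]])
    all_peaks = np.concatenate([[p1], rest]).astype(int)

    # Trough i: the first candidate trough after peak i; amplitudes are
    # then read from the original, unsmoothed waveform
    all_troughs = np.array([int(troughs[troughs > p][0])
                            for p in all_peaks if np.any(troughs > p)])
    return all_peaks, all_troughs
```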

Supervised threshold estimation

The ABRA threshold estimation method includes two steps: First, a binary classifier labels individual ABR waveforms as either above or below threshold. Then, for a stack of waveforms at a given frequency, the threshold is defined as the lowest sound level classified as above threshold for two consecutive stimuli (e.g., if levels up to 30 dB are predicted as below threshold, 35 dB as above, 40 dB as below, and 45 dB and higher as above, the threshold is defined as 45 dB; Supplementary Figure S3). This approach greatly increases generalizability across datasets since it allows the model to estimate thresholds from inputs containing any number of waveforms. Three candidate supervised binary classifiers were trained and evaluated: a CNN classifier, an XGBoost classifier, and a Logistic Regression classifier.
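The two-consecutive-stimuli rule in the second step reduces to a short pure-Python function (names are illustrative, not the ABRA API):

```python
def estimate_threshold(levels_db, is_above):
    """Return the stack threshold from per-waveform classifier outputs.

    levels_db: sound levels for one waveform stack; is_above: the
    classifier's above-threshold prediction for each level. Returns the
    lowest level at which that waveform and the waveform at the next
    level up are both classified as above threshold, or None if no such
    pair exists."""
    pairs = sorted(zip(levels_db, is_above))
    for (level, above), (_, next_above) in zip(pairs, pairs[1:]):
        if above and next_above:
            return level
    return None
```

With the example from the text, a lone positive at 35 dB is skipped and the first consecutive pair (45 and 50 dB) sets the threshold at 45 dB.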

Logistic Regression was selected as a baseline linear model because it allowed us to test whether simple linear combinations of waveform amplitudes across timepoints could be sufficient to classify the presence of a signal in the ABR. Its coefficients can provide direct information about which timepoints (i.e. regions around canonical peaks and troughs) contribute most strongly to classification, allowing for interpretability. XGBoost was chosen as a more complex model that could capture higher-order interactions between waveform features. It is known for its strong performance on tabular data. Together, these models provide a range of algorithm complexity against which the CNN’s performance can be compared.

These models were evaluated and compared using accuracy, true positive rate (TPR; the proportion of above-threshold instances correctly identified as above threshold), false positive rate (FPR; the proportion of below-threshold instances incorrectly classified as above threshold), the area under the receiver operating characteristic curve (AUC-ROC), and the area under the precision-recall curve (AUC-PR).
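The rate-based metrics follow directly from the confusion counts (a NumPy sketch; AUC-ROC and AUC-PR additionally require continuous classifier scores and are omitted here):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, TPR, and FPR for above/below-threshold labels
    (True = above threshold)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_pred & y_true)     # above, predicted above
    fp = np.sum(y_pred & ~y_true)    # below, predicted above
    tn = np.sum(~y_pred & ~y_true)   # below, predicted below
    fn = np.sum(~y_pred & y_true)    # above, predicted below
    return {
        "accuracy": (tp + tn) / y_true.size,
        "tpr": tp / (tp + fn),       # sensitivity
        "fpr": fp / (fp + tn),
    }
```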

The dataset included 28,636 individual ABR waveforms from 259 mice (Table 1) across varying stimulus frequencies and sound levels. The ABRs were grouped by animal subject, then 80% of these waveform stacks from each lab were randomly allocated for training and the remaining 20% were designated for testing (see Table 1 for ABR counts from each lab). This minimized data leakage and ensured a representative distribution of ABRs from various subjects and labs across the training and testing sets.
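Splitting by animal rather than by waveform can be sketched as follows (a NumPy-only version; the actual ABRA split may differ in details such as per-lab stratification):

```python
import numpy as np

def group_train_test_split(subject_ids, test_frac=0.2, seed=0):
    """Allocate waveform indices to train/test sets so that no animal
    (the experimental unit) appears in both, preventing leakage."""
    rng = np.random.default_rng(seed)
    subjects = np.unique(subject_ids)
    rng.shuffle(subjects)
    n_test = max(1, int(round(test_frac * subjects.size)))
    test_set = set(subjects[:n_test].tolist())
    is_test = np.array([s in test_set for s in subject_ids])
    return np.where(~is_test)[0], np.where(is_test)[0]
```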

Three models were evaluated for threshold estimation: Logistic Regression and XGBoost classifiers, and a CNN. For the Logistic Regression and XGBoost classifiers, time warping was applied to the ABR waveforms as an additional preprocessing step to align waveform features such as peaks and troughs (see Supplementary section: ABR Curve Alignment with Time Warping and Supplementary Figure S1). Because these models do not require a validation set, they were trained on the combined training and validation data, a matrix with dimensions of 22,642 × 244, where 22,642 is the total number of training samples and 244 is the number of voltage readings per waveform. Unlike CNNs, these models do not learn hierarchical representations and are less able to benefit from augmentation techniques (described below) designed to teach invariances from raw signals.

To improve generalization and avoid overfitting, data augmentation techniques were used to increase the sample space used for training the CNN. This included noise injection, elastic augmentation through cubic spline interpolation, and time shifting. Noise augmentation added normally distributed noise scaled by 4% of the waveform magnitude. Elastic augmentation was applied by perturbing temporal indices with Gaussian noise (σ = 3) and resampling each sequence using cubic spline interpolation to produce smoothly distorted time axes. Time shifting was performed by randomly shifting each sequence up to ± 18 indices (0.73 ms) and padding the empty regions with Gaussian noise centered on the initial value (5% standard deviation). Thus, the final training input matrix for the CNN had dimensions of 36,096 × 244, including the 18,048 original training waveforms, and an augmented copy of each. An additional 4,594 waveforms were used as validation.
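The three augmentations can be sketched as follows (parameter values follow the text; how ABRA composes or samples them per waveform is simplified here, and the 5% padding-noise scale is assumed to be relative to waveform magnitude):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def augment(wave, rng):
    """Return one augmented copy of a 244-point waveform: noise
    injection, elastic time-axis distortion, and a random time shift."""
    n = wave.size
    mag = np.ptp(wave)                     # peak-to-peak magnitude
    out = wave.astype(float).copy()

    # 1) Noise injection: Gaussian noise at 4% of waveform magnitude
    out += rng.normal(0.0, 0.04 * mag, size=n)

    # 2) Elastic distortion: jitter the sample times (sigma = 3 samples),
    #    then resample on the regular grid with a cubic spline
    idx = np.arange(n, dtype=float)
    jittered = np.sort(idx + rng.normal(0.0, 3.0, size=n))
    out = CubicSpline(jittered, out)(idx)

    # 3) Time shift: up to +/- 18 samples (0.73 ms), padding with noise
    #    centered on the edge value
    shift = int(rng.integers(-18, 19))
    if shift > 0:
        pad = rng.normal(out[0], 0.05 * mag, size=shift)
        out = np.concatenate([pad, out[:-shift]])
    elif shift < 0:
        pad = rng.normal(out[-1], 0.05 * mag, size=-shift)
        out = np.concatenate([out[-shift:], pad])
    return out
```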

As with the peak finding model, the final hyperparameters for the CNN architecture (shown in Fig. 3) were selected by randomized search to optimize model performance. The final selected hyperparameters are shown in Supplementary Table S2. The loss contribution from each waveform was weighted to balance the relative amount of data available from each lab.

Fig. 3

Model architecture for the CNN ABR threshold classifier. The input (a 244-point ABR waveform spanning 10 ms) passes through three sequential layers of convolution, batch normalization, max pooling (after ReLU activation), and dropout. The dimensionality of the output is reduced through two consecutive fully connected layers using ReLU activation before finally being passed through a sigmoid activation function returning the classification of the ABR.

Results

Peak amplitude and latency estimation

To benchmark the performance of the ABRA peak amplitude and latency estimation, we evaluated it on a set of 2,327 ABRs with human-labeled “ground truth” wave 1 amplitude and latency values from Lab A (80 waveforms from 8 mice) and Lab B (2,247 waveforms from 21 mice). The ground truth values for Lab A data were obtained by visual examination with the BioSigRZ software from Tucker Davis Technologies (v5.7.6; https://www.tdt.com), while the ground truth values for Lab B data were obtained using a semi-automatic approach with custom software, as previously described32. These ground truth annotations cover a wide range of tested frequencies and therefore latencies (Supplementary Figure S4). Though it is possible to make manual adjustments to these predictions, we assess the model here by comparing the automated (i.e. unadjusted) estimates from the ABRA peak finding algorithm with their corresponding human-labeled ground truth values.

Fig. 4

Errors in wave 1 peak quantification vs. ground truth for the ABRA peak finding method. (A) The distribution of errors in the predicted latency of the wave 1 peak (the direct output of the model). 97.0% of errors were within 0.05 ms of the ground truth measurement. (B) The distribution of errors in the wave 1 amplitude (peak to trough), calculated from the predicted wave 1 peak location together with the estimated trough location. 97.0% of errors were within 0.25 µV of ground truth measurements. (C, D) Wave 1 latency and amplitude errors vs. ground truth wave 1 amplitude, showing that many of the highest errors occur on waveforms with small amplitudes (low SNR). Each predicted peak 1 is shown as a single semi-transparent point (circles for tone responses, x’s for click responses), color-coded by frequency. n = 2327 ABR waveforms in the test set. Related statistics are listed in Table 2.

Table 2 (related to Fig. 4): mean error differences and their standard errors between ABRA-detected wave 1 latency and amplitude and the corresponding ground truth values determined by human reviewers. Two-sample t-tests found that the mean error differences were not significant for either wave 1 latency or wave 1 amplitude estimates at the 0.05 significance level after Bonferroni correction.

Errors in the algorithm output (differences between the automated estimate and the human-annotated ground truth values) for the latencies and amplitudes of the wave 1 peak are shown in Fig. 4A and B, respectively; summary statistics for errors are reported in Table 2. Note that since many measurements had zero error, the average error of -0.004 ms is smaller in magnitude than the measurement precision of 0.041 ms (Table 2). While the root mean squared error in the peak latency was 0.092 ms, the mean absolute latency error was only 0.022 ms, or about 0.2% of the total sweep length. The low MAE shows that the prediction errors are generally small, while the over 4-fold greater RMSE is influenced by rare instances of larger error. These often occur in the low signal-to-noise ratio (SNR) regime (Fig. 4C). Randomly sampled examples of erroneous peak 1 predictions (Fig. 5A, B) occur in these low SNR waveforms; however, accurate predictions are also observed across a range of SNRs (Fig. 5C, D). Occasional amplitude errors at moderate amplitudes are often the result of misplaced trough 1 predictions (Fig. 4D, Supplementary Figure S5). The largest errors correspond to a small number of click responses from two animals and reflect ambiguity in trough placement rather than a systematic difference in performance between click and tone stimuli.

Fig. 5

Examples of error and success cases in ABRA automated wave 1 peak detection. (A, B) For ABRs close to the ABR threshold and/or with low SNR, the ABRA peak detection algorithm sometimes identified the incorrect peak. (C) and (D) display examples of ABR waveforms with varying signal to noise ratios for which ABRA matched the ground truth.

ABR classification and threshold estimation results

The performance of the three ABR classifiers for threshold detection was assessed on the testing set of 5,872 ABR waveforms. Performance metrics are shown in Fig. 6, and pairwise comparisons for these metrics are provided in Table 3. Logistic Regression served as a simple and interpretable baseline for the binary classification task; however, it was significantly outperformed by the XGBoost model, which was in turn outperformed by the CNN across all metrics. The CNN model achieved an AUC-ROC of 0.98 and an AUC-PR of 0.99 (Fig. 6A, B), indicating strong discrimination across decision thresholds (AUC-ROC) and robust sensitivity and precision even under class imbalance (AUC-PR).

Fig. 6

Comparison of machine learning models for ABR threshold classification. (A) Receiver Operating Characteristic (ROC) curves demonstrate the performance of each ABR classifier (top to bottom: CNN, XGBoost, and Logistic Regression) at all classification thresholds. The area under the ROC curve (Area Under Curve; AUC) represents each ABR classifier’s overall ability to distinguish between above-threshold and below-threshold ABR responses. (B) From top to bottom: Convolutional Neural Network (CNN), XGBoost, and Logistic Regression (baseline) are compared across the metrics Accuracy, Area Under the Receiver Operating Characteristic Curve (AUC-ROC), Area Under the Precision-Recall Curve (AUC-PR), True Positive Rate, and True Negative Rate. All metrics were weighted to represent each lab equally.

Table 3 Comparative analysis of performance metrics between the three machine learning models for ABR threshold classification (related to Fig. 6). Metrics were calculated on the test set of 5,872 ABR waveforms, weighted to represent each lab equally, and compared between the convolutional neural network (CNN), XGBoost (XGB), and logistic regression (LR) models. The CNN model outperforms the XGB model across all metrics. Both CNN and XGB outperform the LR model across all metrics. P-values marked 0* are below the precision limit of 1E-300. Significance was calculated using two-sample proportion tests (Accuracy, TPR, FPR), DeLong’s test (AUC-ROC), or bootstrap resampling with 1000 iterations (AUC-PR). Standard errors were calculated from theoretical binomial variance (Accuracy, TPR, FPR), DeLong’s analytical variance (AUC-ROC), or from bootstrap resampling (AUC-PR). A Bonferroni correction was applied to the p-values to correct for multiple comparisons. All metrics (Accuracy, AUC-ROC, etc.) and their difference estimates are reported as unitless proportions.

To determine how much of our model’s error may arise from variability in the human annotations used as ground truth for training, we quantified the variability between expert annotators. We sampled 90 ABRs (10 mice at 9 frequencies) from Lab B and asked an expert from Lab A to independently determine the thresholds. The difference between the thresholds assigned by Rater 1 (Lab B) and Rater 2 (Lab A) represents the interrater error (Fig. 7A). We then trained the CNN on data from Labs A and C only, so that data from Rater 1 was not included in training. On average, the interrater error was comparable to that of the CNN on the sample of Lab B data (Fig. 7). While the mean absolute errors were similar, we also assessed the accuracy of the CNN predictions within 5 dB and 10 dB envelopes. At both the 5 dB and 10 dB cutoffs, the two were indistinguishable (Table 4). Together with the performance metrics above, this supports that the CNN model can function as a reliable tool for estimating hearing thresholds, providing a machine learning-based approach that matches human expert performance.

Fig. 7

Threshold estimation accuracy of the ABRA thresholding algorithm and agreement among multiple expert raters. (A, B) Comparison of threshold estimates between the ground truth (GT) rater and a second expert (A) and between GT and the CNN-based ABRA thresholding method (B), across 90 measurements. Each threshold estimate is displayed as a semitransparent point such that darker points represent multiple overlapping values. (C) Distribution of threshold errors for the same 90 measurements, with interrater differences in blue and ABRA thresholding errors in red. The absolute errors were not statistically different between interrater and ABRA comparisons (p = 0.34, t-test). The thresholds were estimated for 10 mice at 9 frequencies from Lab B. All data from Lab B was excluded from training the CNN for this experiment.

Table 4 Inference for differences in threshold estimation accuracy between inter-rater (IR) and convolutional neural network (CNN) comparisons (related to Fig. 7). Within both the 5 and 10 dB SPL envelopes, no significant difference between the CNN and inter-rater accuracy was detected, suggesting the CNN performs at a level comparable to a human reviewer at this precision22.

The performance of our threshold estimation technique was compared against the cross-correlation algorithm embedded in EPL-ABR22 on a separate dataset of ABR waveforms from Lab C (Table 5). This smaller set of ABR waveforms (N = 122) was selected because EPL-ABR’s threshold estimation software requires data in the custom ABR file format used by the Eaton-Peabody Laboratories (EPL). Our CNN method closely matches or outperforms EPL-ABR’s cross-correlation threshold estimation method across all metrics on this dataset.

Table 5 Comparison of the EPL-ABR cross-correlation algorithm and the ABRA CNN-based thresholding algorithm on Lab C data (122 ABR waveforms from 2 mice). Reported metrics include Accuracy, True Positive Rate, False Positive Rate, and the ability to estimate thresholds within 5, 10, and 15 dB SPL of the ground truth value. Values are presented as mean (± standard error). Standard errors for Accuracy, TPR, and FPR were computed from pooled binomial variance across all waveform classifications, while standard errors for the within-X dB metrics were computed as the sample standard error across the 12 threshold estimates.

Time cost analysis

To quantify the time savings of the ABRA thresholding algorithm, a random sample of ABR files from 10 mice at 9 frequencies each (a total of 90 waveform stacks) from Lab B was analyzed by two ABR raters from Lab A. Each rater took approximately 1 h to manually analyze the ABR thresholds. Using ABRA, outputting automated thresholds for all frequencies took about 48 s, corresponding to a 75x increase in efficiency. The automated thresholds were within 5 dB of the Lab A annotations 90% of the time, within 10 dB 98% of the time, and within 15 dB 100% of the time. For comparison, inter-rater assessment showed that the Lab A annotator was within 5 dB of the Lab B annotator’s result 92% of the time, within 10 dB 98% of the time, and within 15 dB 100% of the time.

Fig. 8

Visualizations of the combined ABRA thresholding and peak finding outputs. (A) Several ABR waveforms from a 1-month-old male C57BL/6N mouse across sound amplitudes at 18 kHz, with predicted peaks and troughs (red and blue circles, respectively) and the predicted threshold (thick black line). (B) A single waveform (18 kHz; 85 dB SPL) with peaks and troughs labeled. (C) 3D representation of the same ABR waveforms as in (A), with the predicted threshold (20 dB SPL) in black, each waveform in pink, and an approximation of the surface in green. (D) A stacked view of the waveforms in (A), clearly showing the predicted threshold (20 dB SPL; black).

Discussion

The deep learning techniques used in ABRA build on prior applications of machine learning to automate analyses of biological time-series data, including electrophysiology. Recent studies have shown that deep learning models such as convolutional neural networks and recurrent architectures can automatically extract meaningful features from complex, noisy biological signals and infer latent structure from high-dimensional recordings34,35. These efforts automate otherwise laborious and subjective analysis steps, enabling more accurate, reproducible, and time-efficient workflows. The deep learning techniques presented here have similar potential to streamline peak labeling and threshold identification in ABR studies. We further envision future ABR acquisition protocols in which data collection is adaptively guided by such algorithms to avoid unnecessary measurements once thresholds are reached. Together, this work contributes to the broader adoption of deep learning-based automation in electrophysiology analysis pipelines.

While ABRA is a powerful tool set for ABR analysis, like all existing ABR analysis programs, it also has limitations that highlight areas for future development. While the CNN-based predictions of peak location are powerful, they offer little transparency in terms of feature importance and interpretability. Future incorporation of interpretability methods may reveal waveform features that the model leverages to make accurate predictions across a large range of frequencies. Additional inclusion of biologically relevant priors in future versions may further reduce errors in peak finding. As for model limitations, these models were trained and validated only on mouse ABRs; future work could extend or train new deep learning models to handle non-murine ABRs, which may have reduced signal-to-noise due to larger distances from the source generators. Validation of automated amplitude and latency measurements has so far been restricted to wave 1, leaving waves 2–5 currently unvalidated; this can be pursued in future efforts as the model continues to incorporate new data from the labs mentioned here and others.

Another important consideration is the diversity of datasets used for model training: ABRA’s models have been trained on a diverse dataset including accelerated aging mouse models, mice with and without cadherin-23 correction, and two mouse lines exposed to varying noise exposures (Supplementary Table S1). However, this dataset is not exhaustive, and other datasets, especially those involving severe mutations, damage, or disease conditions may be significantly different from the training data. Such conditions, representing “out-of-distribution” cases, may require retraining or fine-tuning of the existing models. To address this challenge, transfer learning methods could be employed, facilitating rapid adaptation of existing deep learning models to new data conditions with minimal additional training data. Most importantly, the accuracy of peak and threshold detection may not yet match that of the most seasoned experts in visual ABR analysis for abnormal, ambiguous, or low signal-to-noise waveforms.

ABRA has been designed as a multi-purpose and versatile suite of tools with an accompanying web app, able to handle datasets acquired from different mouse strains and experimental settings (Fig. 8; Supplementary Information: The ABRA Graphical User Interface). ABRA’s modular design allows its deep learning models to be used independently of the GUI, facilitating integration into other researchers’ computational workflows and software environments. It includes readers to facilitate processing of datasets recorded in different formats, including the widely used .arf files from Tucker Davis Technologies BioSigRZ and BioSigRP recordings (v5.7.6; https://www.tdt.com), .tsv/.asc files from the Eaton-Peabody Laboratories’ Cochlear Function Test Suite (CFTS v1.0; https://masseyeandear.org/research/otolaryngology/eaton-peabody-laboratories/engineering-core), or a generalized .csv file format from any number of other systems. ABRA’s automated thresholding method also reduces the time required for thresholding analyses by more than 50x compared to manual analysis and can streamline the process of extracting ABR thresholds from multiple subjects. All results can be exported to a .csv file for post-processing by the experimenter, and plots can be directly exported for publication if desired. While the time saved by automation alone may be a worthwhile tradeoff for certain applications, an additional benefit is the deterministic nature of the model and therefore high reproducibility. Most importantly, we anticipate significant improvements in performance as larger and more diverse datasets are incorporated over time.

In summary, ABRA represents a significant advancement in auditory neuroscience, merging cutting-edge deep learning technology with accessible and intuitive software design. By automating the analysis of auditory brainstem responses, a crucial in vivo indicator of auditory function, ABRA accelerates hearing research while enhancing reproducibility and consistency across diverse experimental settings. The accompanying user-friendly graphical interface, coupled with powerful computational models, makes advanced data processing accessible to researchers from various backgrounds, including neuroscientists, audiologists, and computational biologists. Beyond hearing research, ABRA exemplifies how artificial intelligence can effectively tackle complex biological data analysis, offering valuable insights into sensory neuroscience, neurodegenerative diseases, and beyond. Ultimately, ABRA provides a versatile and scalable solution, poised to facilitate transformative discoveries at the intersection of biology, medicine, and computer science.