Introduction

Since its advent in the early 1900s, electrocardiography (ECG) has transformed cardiovascular health by enabling non-invasive continuous heart monitoring and supporting objective, data-driven diagnosis of heart disease1,2. More than a century later, ECG remains widely regarded as the gold standard for diagnosing arrhythmias, ischemia, and other cardiac conditions3. Traditional analyses have predominantly relied on visual inspection and algorithmic interpretation; however, these methods may not always be optimal in clinical or assistive settings, particularly during complex or acute cardiac events such as myocardial infarction, in resource-limited environments and remote telemedicine applications, or for users with visual impairments4,5.

Sonification, or the systematic auditory representation of data as sound, has recently emerged as a compelling alternative or complement to visual monitoring. Auditory displays enable users to perceive dynamic signal changes without continuous visual focus, thereby enhancing intuitive interpretation, particularly in real-time monitoring or accessibility-driven contexts6,7. By mapping signal features such as amplitude, frequency, or heart rate variability to acoustic parameters like pitch, rhythm, or timbre, sonification facilitates anomaly detection, pattern recognition, and interactive feedback5,8.

Over the past decade, interest in auditory representations of biomedical signals has grown significantly, with applications spanning clinical diagnostics, emotional regulation, rehabilitation, and assistive technologies9,10,11. Recent innovations include real-time ECG sonification for elderly care12, emotional monitoring through acoustic mapping13, and multisensory systems that integrate sound with haptic or visual feedback14. These advances underscore the interdisciplinary nature of the field, merging biomedical engineering, auditory perception, and human computer interaction.

However, the research landscape remains fragmented, with substantial variations in signal sources, transformation techniques, evaluation protocols, and clinical objectives10,13,15. To date, no comprehensive review has systematically appraised the literature on ECG sonification based on methodological characteristics. Existing surveys have either addressed broader auditory display principles or focused on isolated subdomains16,17.

As interest in inclusive and multimodal health technologies grows, there is a clear need to consolidate current efforts and provide a structured analysis of trends, gaps, and opportunities. To contextualize this growth, a parallel search for general ECG-related publications (signal processing, diagnosis, monitoring) during the same period yielded over 48,000 studies. While ECG sonification represents a small fraction (~0.017%) of the broader ECG literature, the consistent annual output since 2020 marks a notable emergence of interest in auditory cardiovascular display methods. We emphasize that systematic synthesis of an emerging evidence base serves purposes distinct from, but complementary to, meta-analyses of mature fields. While the limited number of studies precludes quantitative pooling, structured methodological assessment provides a critical foundation for standardization, identifies reproducibility barriers early in a field's development, and guides resource allocation for future validation efforts. This approach has proven valuable in other emerging biosensing modalities, where early frameworks shaped subsequent methodological evolution. This review therefore provides both a systematic evidence synthesis (structured search, standardized extraction, limitation assessment) and a translational framework (general pipeline, minimum reporting standards, phased evaluation pathway). These dual objectives are complementary rather than competing: the synthesis documents the current state, while the framework elements provide structure to advance the field toward comparability and clinical validation. In this review, we systematically synthesize eight peer-reviewed studies, published between January 2015 and July 2025, that applied sonification, visualization, or both to ECG signals.
We identified the studies through structured searches in PubMed, IEEE Xplore, and Embase, and analyzed each study based on input modality, transformation technique, and application domain. This review also explicitly presents a conceptual input-method-output framework (Fig. 1), laying a foundation to guide future developments in biomedical auditory display systems aimed at improving diagnostic accuracy and supporting clinical decision-making.

Fig. 1: General sonification pipeline.

Biosignals xi(t) are acquired and preprocessed into \({\tilde{x}}_{i}(t)\). Feature mapping \({\mathcal{M}}({\tilde{x}}_{i}(t))\) generates control parameters such as loudness Ai(t), frequency fi(t), and optionally timbre or phase. These parameters drive the synthesis of tones si(t), which are multiplexed into the sonified output y(t). The same y(t) can also be decoded or visualized for further interpretation.

Results

Publications and signal modalities

We incorporated eight studies in this review, covering the period from January 2015 to July 2025. These studies underscore the increasing interest in sonification-based approaches for cardiovascular signal monitoring, particularly utilizing ECG. This structured screening process ensured transparent reporting of included studies, as summarized in the PRISMA-based workflow (Fig. 2).

Fig. 2: PRISMA-based search workflow for study selection.

The figure illustrates the structured review process based on PRISMA guidelines, encompassing four stages: Identification, Screening, Eligibility, and Inclusion. This process ensured a transparent methodology for selecting relevant articles on ECG sonification and visualization.

A study-by-study summary of inputs, methods, performance, and applications is provided in Table 1. As depicted in Fig. 3a, six studies exclusively utilized ECG signals, while the remaining two studies integrated ECG with additional physiological signals such as EEG or EDA9,10. This trend highlights a strong reliance on single-modality data. A few studies, however, have explored multimodal combinations, as these may offer richer diagnostic information. Most systems targeted real-time or near-real-time applications in clinical or home-based settings. For instance, Iliev et al. developed a Global System for Mobile Communications (GSM)-enabled ECG sonification system to facilitate remote cardiac monitoring for elderly or visually impaired individuals12. Signal acquisition methods varied across the reviewed literature. Several studies employed commercial wearable biosensors such as MAXREFDES100 or NeXus-105, while others utilized full-lead clinical ECG configurations tested in either simulated or real healthcare environments4,6.

Table 1 Summary of studies (2015–2025) investigating ECG and PPG sonification techniques, categorized by sensor input, methodology, application, and limitations
Fig. 3: Overview of study characteristics.

a Types of signals used; b analysis methods; c application domains; d validation environments. Monitoring and diagnosis emerged as the predominant applications, often integrated with visualization or machine learning approaches. Validation was largely confined to controlled laboratory settings, and key limitations included small participant cohorts and the absence of standardized evaluation frameworks.

General sonification pipeline and variables

Across all reviewed methods, the process of transforming ECG into sound follows the general framework illustrated in Fig. 1. Starting with acquisition, the pipeline proceeds through preprocessing, feature mapping, parameter control (e.g., loudness, pitch, timbre), tone synthesis, multiplexing into the sonified output, and optionally decoding or visualization. A comparative implementation of this pipeline across the eight included studies is summarized in the Supplementary Materials. The sonification pipeline operates according to the following general nomenclature:

  • xi(t) : input biosignal from channel i (e.g., ECG lead).

  • \({\tilde{x}}_{i}(t)\) : preprocessed signal (normalized, filtered, rescaled).

  • \({\mathcal{M}}({\tilde{x}}_{i}(t))\) : mapping function or feature extraction step applied to \({\tilde{x}}_{i}(t)\).

  • Ai(t) : loudness control derived from signal amplitude or features.

  • fi(t) : instantaneous frequency or pitch modulation derived from \({\tilde{x}}_{i}(t)\).

  • ϕi(t) : instantaneous phase of the oscillator.

  • si(t) : synthesized tone for channel i after applying Ai(t), fi(t), and optional controls (e.g., timbre).

  • y(t) : multiplexed sonified output, typically y(t) = ∑isi(t), representing the complete sonified ECG.
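As a concrete illustration, the nomenclature above can be sketched in a few lines of Python. This is a minimal toy implementation under assumed values (sample rate, base frequency, modulation depth), not the pipeline of any single reviewed study:

```python
import math

# Toy sketch of the generic pipeline in Fig. 1:
# x_i(t) -> z-score -> controls A_i(t), f_i(t) -> tone s_i(t) -> sum y(t).
# fs, f0, and alpha are illustrative assumptions.

def preprocess(x):
    """Z-score normalize one channel: (x - mu) / sigma."""
    mu = sum(x) / len(x)
    sigma = (sum((v - mu) ** 2 for v in x) / len(x)) ** 0.5 or 1.0
    return [(v - mu) / sigma for v in x]

def sonify(channels, fs=8000, f0=440.0, alpha=0.03):
    """Render each channel as an amplitude/pitch-modulated sine and sum them."""
    tones = []
    for x in channels:
        xt = preprocess(x)
        phase, tone = 0.0, []
        for v in xt:
            f = f0 * (1 + alpha * v)               # pitch control f_i(t)
            phase += 2 * math.pi * f / fs          # phi_i(t) = 2*pi * integral of f_i
            tone.append(abs(v) * math.sin(phase))  # loudness A_i(t) taken from |x~_i(t)|
        tones.append(tone)
    # multiplexing: y(t) = sum over i of s_i(t)
    return [sum(s[n] for s in tones) for n in range(len(tones[0]))]

y = sonify([[0.0, 1.0, 0.2, -0.3, 0.1] * 20])
```

The sketch shows why the nomenclature separates controls from synthesis: any mapping can feed the same oscillator loop.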

Study 1: Polyphonic sonification of 6-lead ECG, enabling auditory representation of waveform dynamics1

  • Acquisition xi(t): 6-lead ECG.

  • Preprocessing \({\tilde{x}}_{i}(t)=({x}_{i}(t)-{\mu }_{i})/{\sigma }_{i}\); smoothing; rescale to [0,1].

  • Mapping/Features \({\mathscr{M}}({\tilde{x}}_{i}(t))\): amplitude and normalized waveform for control mapping.

  • Controls: Ai(t) = wi g(xi(t)) (loudness with pitch compensation); \({f}_{i}(t)={f}_{0,i}\,(1+\alpha \,{\tilde{x}}_{i}(t))\) with α = 0.03.

  • Tone synthesis si(t): \({\phi }_{i}(t)=2\pi \mathop{\int}\nolimits_{0}^{t}{f}_{i}(\tau )\,d\tau\); \({s}_{i}(t)={\sum }_{j\in \{1,3,4,5\}}{a}_{j}{A}_{i}(t)\sin (j\,{\phi }_{i}(t))\).

  • Multiplexing/Output y(t) = ∑isi(t) (polyphonic sonified ECG).

  • Optional: export as audio file for listening/training.
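The harmonic-stack synthesis above can be sketched as follows. The partial weights a_j and sample rate are illustrative assumptions, not values reported by the study:

```python
import math

# Sketch of Study-1 style synthesis: a base phase phi_i(t) driven by
# f_i(t) = f0 (1 + alpha * x~), with partials j in {1, 3, 4, 5} summed
# using assumed weights a_j.

def harmonic_tone(x_norm, fs=8000, f0=440.0, alpha=0.03,
                  partials=(1, 3, 4, 5), weights=(1.0, 0.5, 0.3, 0.2)):
    phase, out = 0.0, []
    for v in x_norm:                      # v = x~_i(t), rescaled to [0, 1]
        f = f0 * (1 + alpha * v)          # f_i(t) = f_{0,i} (1 + alpha x~)
        phase += 2 * math.pi * f / fs     # accumulate phi_i(t)
        amp = v                           # A_i(t) taken directly from x~ here
        out.append(sum(w * amp * math.sin(j * phase)
                       for j, w in zip(partials, weights)))
    return out

tone = harmonic_tone([n / 99 for n in range(100)])
```

The odd/near-odd partial set gives each lead a distinct timbre, which is what makes the polyphonic mix separable by ear.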

Study 2: Frequency-modulated (FM)-based sonification with CNN decoding to reconstruct ECG signals4

  • Acquisition xi(t): 8-lead ECG.

  • Preprocessing \({\tilde{x}}_{i}(t)=({x}_{i}(t)-{\mu }_{i})/{\sigma }_{i}\).

  • Mapping/Features \({\mathcal{M}}({\tilde{x}}_{i}(t))\): frequency deviation \(\propto {\tilde{x}}_{i}(t)\).

  • Controls: \({f}_{i}(t)={f}_{0,i}+{k}_{f,i}{\tilde{x}}_{i}(t)\), \({\phi }_{i}(t)=2\pi \mathop{\int}\nolimits_{0}^{t}{f}_{i}(\tau )\,d\tau\); loudness Ai(t) = A0wi.

  • Tone synthesis \({s}_{i}(t)={A}_{0}{w}_{i}\cos ({\phi }_{i}(t))\).

  • Multiplexing/Output y(t) = ∑isi(t) (narrowband audio stream).

  • Optional: CNN decoder \({\hat{x}}_{i}(t)={{\mathcal{D}}}_{\theta }\{y(t)\}\) reconstructs ECG.
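The FM encoding step of this design can be sketched as below; the carrier frequency, deviation constant k_f, and sample rate are assumed toy values, and the CNN decoder stage is outside this sketch:

```python
import math

# Sketch of Study-2 style FM encoding: each z-scored lead deviates a
# carrier f_{0,i} by k_f * x~_i(t); loudness is the constant A_0 * w_i.

def fm_encode(x_norm, fs=8000, f0=1000.0, kf=100.0, w=1.0, a0=0.5):
    phase, audio = 0.0, []
    for v in x_norm:
        f = f0 + kf * v                        # f_i(t) = f_{0,i} + k_{f,i} x~_i(t)
        phase += 2 * math.pi * f / fs          # phi_i(t)
        audio.append(a0 * w * math.cos(phase)) # s_i(t) = A_0 w_i cos(phi_i(t))
    return audio

stream = fm_encode([math.sin(2 * math.pi * n / 50) for n in range(200)])
```

Because the information lives entirely in instantaneous frequency, the stream survives amplitude distortion on commodity audio links, which is what makes the downstream decoding feasible.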

Study 3: Low-power ambulatory 3-lead ECG with auditory alarms for anomaly detection6

  • Acquisition xi(t): DI, aVF, V2.

  • Preprocessing \({\tilde{x}}_{i}(t)=({x}_{i}(t)-{\mu }_{i})/{\sigma }_{i}\).

  • Mapping/Features: CWT-based R-peaks {tk}; RRk = tk − tk−1, HRk = 60/RRk.

  • Controls: safety logic on HR band, lead-off, signal quality.

  • Tone synthesis si(t): event beep \({s}_{{\rm{alarm}}}(t)={A}_{0}\sin (2\pi {f}_{{\rm{beep}}}t)\,w(t)\) (triggered on anomaly).

  • Multiplexing/Output y(t) = salarm(t) (event-driven sonified output).

  • Optional: LCD display of HR; microSD logging.
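The safety logic above reduces to thresholding HRk derived from the R-peak times; a minimal sketch, with the 40–120 bpm band assumed for illustration:

```python
# Sketch of Study-3 style alarm logic: R-peak times t_k give
# RR_k = t_k - t_{k-1} and HR_k = 60 / RR_k; an alarm beep is triggered
# whenever HR leaves an assumed safe band.

def heart_rates(peak_times):
    """HR_k (bpm) from consecutive R-peak times (seconds)."""
    return [60.0 / (t1 - t0) for t0, t1 in zip(peak_times, peak_times[1:])]

def alarm_flags(peak_times, hr_low=40.0, hr_high=120.0):
    """True where the event beep s_alarm(t) would be triggered."""
    return [not (hr_low <= hr <= hr_high) for hr in heart_rates(peak_times)]

# RR of 0.8 s -> 75 bpm (normal); RR of 0.4 s -> 150 bpm (out of band).
flags = alarm_flags([0.0, 0.8, 1.6, 2.0, 2.4])
```

Event-driven output like this keeps the auditory channel silent by default, which is why the study's design is low-power and unobtrusive.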

Study 4: Wireless single-lead ECG system for infant cardiac orienting response capture5

  • Acquisition x1(t): single-lead ECG; stimuli times {sj}.

  • Preprocessing \({\tilde{x}}_{1}(t)=({x}_{1}(t)-{\mu }_{1})/{\sigma }_{1}\); R-peaks {tk} → HR(t).

  • Mapping/Features: COR metrics \({{\Delta }}H{R}_{j}=\mathop{\min }\nolimits_{t\in {W}_{j}^{+}}HR(t)-{\overline{HR}}_{{W}_{j}^{-}}\), latencyj.

  • Controls: app schedules auditory/visual stimuli.

  • Tone synthesis si(t): stimulus sound sstim(t) (tablet-generated).

  • Multiplexing/Output y(t) = sstim(t), analyzed with concurrent ECG.

  • Optional: store ECG + COR metrics for analysis.
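The COR metric follows directly from its definition above; the pre- and post-stimulus HR samples below are toy values, not study data:

```python
# Sketch of the Study-4 COR metric: Delta HR_j is the post-stimulus
# minimum HR minus the mean HR over the pre-stimulus window.

def cor_delta_hr(hr_pre, hr_post):
    """min over W+ of HR(t), minus mean of HR over W-."""
    baseline = sum(hr_pre) / len(hr_pre)
    return min(hr_post) - baseline

# Pre-stimulus HR near 120 bpm; transient deceleration to 110 afterwards.
delta = cor_delta_hr([120.0, 121.0, 119.0], [118.0, 110.0, 115.0])
```

A negative delta captures the cardiac deceleration that characterizes an infant orienting response to the stimulus.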

Study 5: Multimodal neurofeedback integrating EEG and ECG sonification9

  • Acquisition xi(t): EEG channels + ECG.

  • Preprocessing \({\tilde{x}}_{i}(t)=({x}_{i}(t)-{\mu }_{i})/{\sigma }_{i}\).

  • Mapping/Features: event peaks \(\{{b}_{k}^{i}\}\), intervals \({T}_{{\rm{pp}}}^{(i)}\); ECG R-peaks {tm}.

  • Controls: \({A}_{i}(k)\propto | {\tilde{x}}_{i}({b}_{k}^{i})|\); \({f}_{i}(k)={\kappa }_{i}/{T}_{{\rm{pp}}}^{(i)}(k)\); ECG → heartbeat cue.

  • Tone synthesis \({s}_{i}(t)={\sum }_{k}{A}_{i}(k)\,{w}_{i}(t-{b}_{k}^{i})\sin (2\pi {f}_{i}(k)(t-{b}_{k}^{i})+{\phi }_{i,k})\); heartbeat sHB(t).

  • Multiplexing/Output y(t) = ∑isi(t) + sHB(t).

  • Optional: synchronized light/visual feedback.

Study 6: Audio conversion of ECG/PPG for remote monitoring via FM/AM encoding12

  • Acquisition xi(t): ECG/PPG.

  • Preprocessing \({\tilde{x}}_{i}(t)=({x}_{i}(t)-{\mu }_{i})/{\sigma }_{i}\).

  • Mapping/Features: FM or AM encoding.

  • Controls: FM: \({f}_{i}(t)={f}_{0,i}+{k}_{f,i}{\tilde{x}}_{i}(t)\), Ai(t) = A0; AM: \({A}_{i}(t)={\rm{clip}}({g}_{i}({\tilde{x}}_{i}(t)),0,1)\).

  • Tone synthesis si(t): FM: \({A}_{0}\cos ({\phi }_{i}(t))\); AM: \({A}_{i}(t)\sin (2\pi {f}_{0,i}t+{\phi }_{i})\).

  • Multiplexing/Output y(t) = ∑isi(t) (speech-band audio).

  • Optional: human listening or software decoding \({\hat{x}}_{i}(t)\).
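The AM branch of this encoding can be sketched as follows, with the carrier frequency and gain mapping chosen as illustrative assumptions:

```python
import math

# Sketch of Study-6 style AM encoding: a fixed carrier at f_{0,i} is
# amplitude-modulated by a gain-clipped version of x~_i(t), i.e.
# A_i(t) = clip(g_i(x~), 0, 1), s_i(t) = A_i(t) sin(2 pi f0 t).

def am_encode(x_norm, fs=8000, f0=1000.0, gain=0.5, offset=0.5):
    audio = []
    for n, v in enumerate(x_norm):
        a = min(max(gain * v + offset, 0.0), 1.0)  # clip to [0, 1]
        audio.append(a * math.sin(2 * math.pi * f0 * n / fs))
    return audio

stream = am_encode([math.sin(2 * math.pi * n / 40) for n in range(160)])
```

Keeping the carrier in the speech band is what lets the encoded signal travel over ordinary GSM voice channels.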

Study 7: Physiological-to-music therapy system mapping ECG and EDA features into adaptive music10

  • Acquisition xi(t): ECG, EDA.

  • Preprocessing \({\tilde{x}}_{i}(t)=({x}_{i}(t)-{\mu }_{i})/{\sigma }_{i}\).

  • Mapping/Features: HRV metrics (SDNN, RMSSD, SDSD) and EDA slope βEDA.

  • Controls: map features to musical parameters (tempo, loudness, timbre).

  • Tone synthesis si(t): adaptive instrument renders controlled music.

  • Multiplexing/Output y(t) = ∑isi(t) (real-time music as sonified physiology).

  • Optional: therapist logs, visualization of features.
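The HRV features driving the musical mapping can be computed as follows, using the standard definitions of SDNN, RMSSD, and SDSD over RR intervals; the RR series shown is illustrative:

```python
import math

# Sketch of the Study-7 HRV features computed from RR intervals (ms);
# the adaptive musical mapping itself is not shown.

def sdnn(rr):
    """Standard deviation of the RR intervals."""
    mu = sum(rr) / len(rr)
    return math.sqrt(sum((x - mu) ** 2 for x in rr) / len(rr))

def rmssd(rr):
    """Root mean square of successive RR differences."""
    d = [b - a for a, b in zip(rr, rr[1:])]
    return math.sqrt(sum(x * x for x in d) / len(d))

def sdsd(rr):
    """Standard deviation of successive RR differences."""
    d = [b - a for a, b in zip(rr, rr[1:])]
    mu = sum(d) / len(d)
    return math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))

rr_ms = [800.0, 810.0, 790.0, 805.0, 795.0]
features = (sdnn(rr_ms), rmssd(rr_ms), sdsd(rr_ms))
```

These scalar features change slowly relative to the beat, which suits mapping onto tempo, loudness, and timbre rather than onto individual notes.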

Study 8: Music-information-retrieval (MIR)-inspired features from ECG, combined with machine learning classifiers13

  • Acquisition xi(t): ECG segments.

  • Preprocessing \({\tilde{x}}_{i}(t)=({x}_{i}(t)-{\mu }_{i})/{\sigma }_{i}\); baseline correction; augmentation.

  • Mapping/Features: \({\mathcal{M}}({\tilde{x}}_{i}(t))\) extracts MIR descriptors (spectral shape, energy, entropy).

  • Controls: not used (features to ML).

  • Tone synthesis: not performed.

  • Multiplexing/Output: no direct y(t); features → classifiers (NN, RF, etc.).

  • Optional: performance metrics (AUC, accuracy).
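MIR-style descriptors of the kind listed above can be sketched as follows (energy, spectral centroid, and spectral entropy over a naive DFT); the study's exact descriptor set may differ, and the segment below is a toy signal:

```python
import math

# Sketch of Study-8 style MIR descriptors on an ECG segment.

def dft_magnitudes(x):
    """Naive DFT magnitude spectrum (first half of the bins)."""
    n = len(x)
    mags = []
    for k in range(n // 2):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

def mir_features(x):
    energy = sum(v * v for v in x)                 # segment energy
    mags = dft_magnitudes(x)
    total = sum(mags) or 1.0
    p = [m / total for m in mags]                  # normalized spectrum
    centroid = sum(k * pk for k, pk in enumerate(p))   # spectral shape
    entropy = -sum(pk * math.log(pk) for pk in p if pk > 0)
    return energy, centroid, entropy

seg = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
energy, centroid, entropy = mir_features(seg)
```

A pure tone concentrates the spectrum in one bin, so its spectral entropy is near zero; noisier or more irregular ECG segments spread the spectrum and raise it, which is what makes such descriptors usable as classifier inputs.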

Sonification techniques and mapping strategies

The studies included in this review employed diverse sonification methodologies with differences in purpose, auditory design, and levels of complexity. Parameter-mapping sonification was the most prevalent approach, where ECG signal features (e.g., QRS amplitude, RR intervals, heart rate variability) were translated into acoustic parameters such as pitch, volume, or tempo4,5,12. Three studies applied FM techniques to transform ECG waveforms into continuously varying auditory streams. For example, Kather et al. implemented polyphonic FM sonification to enhance auditory perception of arrhythmias and facilitate remote diagnosis via telemetry1. This approach supported intuitive interpretation of signal variability using sound. Some studies explored more advanced auditory designs by combining mapping strategies or incorporating machine learning models. For example, Valenzuela Ramirez et al. developed an adaptive sonification system that dynamically modulated musical components based on user emotional states derived from ECG and EDA10. Martínez-Suárez et al. proposed a dynamic event-mapping framework that differentiated normal from abnormal cardiac events through rhythmic and timbral changes6. These systems sought to improve user engagement and provide interpretable auditory cues for various diagnostic and therapeutic applications.
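A minimal sketch of parameter-mapping sonification as described above, with RR intervals mapped linearly to pitch; the mapping range and endpoints are illustrative assumptions:

```python
# Sketch of parameter-mapping sonification: RR intervals (seconds) are
# mapped linearly to pitch (Hz). Shorter RR (faster heart rate) maps to
# higher pitch, clipped to an assumed range.

def rr_to_pitch(rr, f_lo=220.0, f_hi=880.0, rr_lo=0.4, rr_hi=1.2):
    u = (rr_hi - rr) / (rr_hi - rr_lo)   # 0 at slow HR, 1 at fast HR
    u = min(max(u, 0.0), 1.0)            # clip out-of-range intervals
    return f_lo + u * (f_hi - f_lo)

pitches = [rr_to_pitch(rr) for rr in (1.2, 0.8, 0.4)]
```

The appeal of this scheme is exactly the one noted above: the listener needs no model of the signal, only a sense of whether pitch is drifting up or down.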

Publication trends

As illustrated in Fig. 4, publications increased markedly after 2020, reflecting greater accessibility of wearable sensors, improved computational power, advances in signal-processing and denoising algorithms18,19, and growing interest in precision medicine and broader interdisciplinary engagement20,21. By 2024, the cumulative number of studies reached its highest point to date, reflecting a slowly growing but still developing research area.

Fig. 4: Year-wise distribution of included studies on ECG sonification (2015–2025).

The bars show the annual number of included publications, while the line indicates the cumulative count over time. The overall evidence base remains small, so these counts should be interpreted descriptively rather than as indicators of field growth or maturity. Cumulative values reached their highest point in 2024, reflecting the largest aggregation of studies to date within this still developing research area. Note that 2025 counts are incomplete, as the search concluded in July 2025.

Application domains and use cases

The studies included in this review primarily focused on three domains: monitoring, diagnosis, and therapy. Monitoring emerged as the most prevalent application, with four studies emphasizing the value of sonification for cardiac monitoring, particularly in home-based or continuous care settings1,22. For example, Martínez-Suárez et al. utilized sonified ECG to facilitate real-time arrhythmia detection in ambulatory environments6. Iliev et al. demonstrated that auditory icons associated with ST-segment changes could aid clinicians in identifying cardiac abnormalities through sound cues alone12. Diagnostic applications were explored in two studies, both investigating how acoustic mappings could highlight subtle waveform irregularities that might otherwise be overlooked visually by clinicians, thereby enhancing early detection of cardiac pathologies13,18. Additionally, two studies focused on therapeutic and assistive use cases, particularly for visually impaired individuals. Aguilar-Rivera et al. and Krasteva et al. proposed wearable systems where sonified ECG enabled users to independently monitor their cardiovascular health without visual displays4,5. These systems often employed wireless or GSM transmission to convert biosignals into pitch-modulated auditory streams, thereby improving accessibility for both patients and caregivers5,23. Emotional regulation and neurofeedback were secondary therapeutic themes. Hinterberger et al.’s Sensorium system integrated EEG and ECG sonification to promote relaxation and self-awareness9. Similarly, Valenzuela Ramirez et al. incorporated multimodal signals such as EDA and respiration into an adaptive music therapy platform for children with disabilities, reporting enhanced engagement and emotional regulation10.

Performance and evaluation

Evaluation methodologies varied significantly across the studies reviewed. Only a subset of studies reported quantitative metrics. For instance, Kather et al. achieved an arrhythmia classification accuracy of up to 78% when trained human observers interpreted sonified ECG signals1. Other studies with a diagnostic focus reported similar potential for enhanced pattern recognition, particularly when utilizing multimodal inputs9,10,24. Assistive and therapeutic applications primarily emphasized user experience. Krasteva et al. surveyed blind participants and found a preference for pitch-modulated sonification over rhythmic cues in terms of usability and accuracy4. Participants in the Sensorium neurofeedback study reported increased immersion and relaxation when audio feedback complemented visual displays9. Likewise, Valenzuela Ramirez et al. documented greater emotional regulation and engagement in children with disabilities when using adaptive music-based sonified biofeedback10.

These findings underscore that while quantitative metrics (e.g., classification accuracy) demonstrate the potential clinical value of sonified ECG, user-centered qualitative outcomes such as usability, accessibility, and emotional engagement remain critical for adoption in real-world settings4,21.

Despite these promising outcomes, several studies acknowledged limitations in sample size, generalizability, and clinical integration. Many systems were validated only in laboratory or simulated environments, and few involved clinical populations1,5. Moreover, subjective evaluation methods predominated, often lacking standardized measurement scales, and long-term follow-up data on usability or therapeutic benefit remain scarce.

Evidence-derived conceptual workflow

To summarize recurring patterns across the included studies, we synthesized a high-level workflow of ECG sonification (Fig. 5). The diagram aggregates what the literature most commonly reports: (i) biosignal acquisition (clinical or wearable ECG), (ii) signal processing and feature extraction, (iii) sonification mappings (e.g., parameter mapping, AM/FM encoding, event-driven cues), and (iv) application domains (diagnosis/monitoring, biofeedback/therapy, accessibility/education). This figure is descriptive of the reviewed evidence rather than prescriptive; it reflects the modal sequence and components across studies.

Fig. 5: Evidence-derived conceptual workflow for ECG sonification.

Schematic synthesized from the reviewed literature showing four recurrent stages: a Biosignal acquisition, b Signal processing and feature extraction, c Sonification mappings (e.g., parameter mapping, amplitude/frequency modulation, event-driven cues), and d Application domains (diagnosis/monitoring, biofeedback/therapy, accessibility/education).

Methodological limitations

As summarized in Fig. 6a–d, the eight studies show heterogeneous risk profiles across six predefined categories (L1–L6). Participant sample bias (L2), partial reporting of real-time performance (L4), and single-device/context validation (L6) were the most recurrent concerns, whereas preprocessing (L5) and evaluation metrics (L3) were more consistently reported in a subset of papers. See the Supplementary Materials for rating criteria.

Fig. 6: Methodological limitations across included studies.

Ratings are provided for six limitation categories (L1–L6), using four levels of concern: Low (green), Moderate (yellow), Serious (red), and Unclear/insufficiently reported (gray). a Heat-map matrix with rows indicating studies and columns indicating limitation categories, allowing per-study/per-category inspection of risk patterns. b Stacked bars showing, for each limitation category, the proportion of studies falling into each concern level. c Horizontal stacked bars summarizing, for each study, its composition across all limitation categories. d Overall counts across all studies and categories, providing a corpus-level snapshot.

Discussion

The evidence-derived workflow in Fig. 5 clarifies where current studies converge (acquisition → preprocessing → feature/mapping → synthesis/output) and where they diverge most: in the mapping design and the choice of evaluation endpoints. We interpret this as a descriptive synthesis rather than a prescriptive pipeline: it motivates minimum reporting for preprocessing and mapping decisions, and it highlights the need for shared benchmarks that couple task-specific endpoints (e.g., arrhythmia detection, biofeedback) with perceptual metrics. This framing also exposes opportunities for multimodal extensions and for reproducible reference implementations to reduce heterogeneity across future studies. The observed heterogeneity is a documented finding of this review, not a limitation; our framework provides structure to reduce future variability.

Collectively, recent work in ECG sonification and audio-based signal processing demonstrates that transforming ECG into auditory cues can facilitate intuitive recognition of pathological events, enhance monitoring in resource-limited settings, and improve accessibility for visually impaired users1,5,9. These methods complement conventional ECG visualization by adding a parallel sensory channel that can support situational awareness and continuous oversight.

Patterns in Fig. 6 indicate that the body of evidence is constrained more by external validity and systems reporting (L2, L4, L6) than by signal handling per se. Accordingly, we call for: (i) minimum reporting standards on latency/buffering and computational load; (ii) multi-site and multi-device validation to probe generalizability; and (iii) recruitment beyond homogeneous convenience samples. Addressing these limitations is essential if sonification results are to translate beyond the lab and into routine clinical or field use.

Regarding signal fidelity during digital-to-audio conversion, the reviewed literature provides limited but encouraging preliminary evidence. Krasteva et al. reported RMSE of 3–7 μV and >99.7% sensitivity/PPV for QRS detection after frequency-modulated encoding and CNN decoding, demonstrating minimal diagnostic information loss for beat detection tasks4. However, comprehensive assessment of signal degradation across different sonification methods, sampling rates, audio codecs, and cardiac abnormality types remains scarce. Future validation should systematically quantify information preservation for clinically relevant waveform features (ST-segment morphology, T-wave alternans, QT intervals) and establish minimum audio bandwidth/bitrate requirements for diagnostic-grade sonification. Audio-based clinical decision support may initially be most appropriate for rhythm monitoring and gross morphology assessment, with morphology-dependent diagnoses (e.g., subtle ischemic changes) requiring additional validation to ensure adequate signal fidelity.
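The fidelity check discussed above can be made concrete with a simple RMSE computation between an original segment and its decoded reconstruction; the two toy traces below are assumptions for illustration, not data from the cited study:

```python
import math

# RMSE between an original ECG segment and its decoded reconstruction,
# in the same units (e.g., microvolts). Toy values only.

def rmse(original, decoded):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, decoded))
                     / len(original))

orig = [0.0, 120.0, 900.0, -300.0, 50.0]    # toy QRS-like samples (uV)
recon = [1.0, 118.0, 905.0, -297.0, 49.0]   # decoded with small error
err = rmse(orig, recon)
```

Reporting RMSE in microvolts against a ground-truth trace, as done by Krasteva et al., makes fidelity claims directly comparable across encoding schemes.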

Parameter-mapping sonification, which directly maps ECG features such as QRS amplitude and RR intervals into acoustic parameters (e.g., pitch, volume, or tempo), remains the most widely adopted technique owing to its simplicity, low computational requirements, and intuitive nature4,5. While parameter mapping is the simplest and most feasible real-time solution, FM-based approaches combined with machine learning (ML) decoding represent a significant methodological evolution, allowing the reconstruction and classification of ECG signals from encoded audio with greater flexibility and precision4,21. These approaches have demonstrated high accuracy in simulations, preserving diagnostic ECG features while facilitating compact transmission formats compatible with existing telecommunication systems12,25. Multimodal neurofeedback systems and hybrid solutions combining sonification and visualization further extend beyond diagnostic applications into stress regulation, interoceptive training, and patient engagement in diverse therapeutic contexts9,10.

Despite these promising developments, several persistent limitations constrain the clinical translation of ECG sonification. A primary concern is the lack of standardization in both the sonification frameworks and the evaluation methodologies. Studies vary significantly in input signal quality, preprocessing techniques, and mapping strategies, which restricts comparison across systems and hinders reproducibility14,18. Additionally, evaluation protocols are highly heterogeneous; some studies relied heavily on subjective user-experience metrics, while others used non-standardized technical performance indicators. This variability complicates robust meta-analyses and the establishment of meaningful benchmarks17,23.

Real-time applicability remains another critical challenge for emerging sonification-based systems. Many implementations were tested only in controlled laboratory settings or simulated environments and have not undergone rigorous evaluation in real-world clinical workflows16,26. This limitation undermines their practical relevance, because clinical environments inherently introduce unpredictable noise, data loss, and complex patient conditions that do not align with experimental settings. Furthermore, most studies were limited by small sample sizes, short-term assessments, or reliance on homogeneous participant groups. These factors can introduce sampling bias and restrict the generalizability of findings across broader and more diverse populations6,20.

From a usability perspective, interpreting sonified biomedical signals can present a considerable learning curve, particularly for non-expert users and clinicians who are unfamiliar with auditory diagnostic interfaces. Audio patterns are not always self-explanatory, especially when the mappings lack transparency or contextual training21,22. The inherent subjectivity of auditory perception further complicates interpretation and reduces inclusivity, as individual differences in hearing acuity, auditory processing speed, and even musical background can affect outcomes. Moreover, few systems incorporate user-specific adaptations or accessibility accommodations, such as customization for individuals with hearing impairments, further limiting their universal applicability.

Some studies have also employed musical mappings designed to enhance user engagement; however, these choices may inadvertently introduce aesthetic biases or prioritize musical coherence over diagnostic fidelity. In some cases, clinically relevant signal features may be obscured by sonification strategies optimized for pleasant or easily recognizable sounds rather than accurate medical interpretation9,10.

Another limitation pertains to the design of the devices and systems themselves. Portable implementations often encounter wearability constraints, such as cumbersome hardware configurations, limited battery life, and inadequate integration with existing patient monitoring platforms. These issues significantly hinder their adoption in both clinical and home-based settings2,12. Furthermore, many implementations prioritize conceptual or assistive objectives, such as therapy, accessibility, or emotional feedback, over diagnostic rigor. While these applications are valuable, the absence of standardized clinical validation and benchmarking (e.g., accuracy, precision, sensitivity, specificity, RMSE against ground-truth ECG) diminishes the credibility of results and slows progress toward broader adoption in healthcare23,25.

Future research should pursue hybrid approaches that integrate perceptual sonification with computational ML decoding to optimize both diagnostic accuracy and user experience12,21. Such systems could leverage the intuitive benefits of audio feedback while capitalizing on the analytical precision of advanced algorithms. Multimodal studies integrating synchronized ECG with phonocardiogram (PCG) for cardiac health monitoring are promising, as the two signals reflect the state of the heart from different perspectives, potentially enhancing the accuracy and efficiency of early cardiovascular disease diagnosis24. Comparative studies are also necessary to ascertain whether audio-based representations offer measurable perceptual or diagnostic advantages over purely computational or visual-only methods20,25. Standardized assessment protocols must be developed to evaluate performance consistently across studies, encompassing both technical metrics and user-centered measures such as usability, accessibility, and acceptance10,23.

Beyond diagnosis and monitoring, ECG sonification holds significant promise in broader healthcare applications. Sonified biofeedback has demonstrated potential in stress regulation, rehabilitation, and interoceptive training, helping users improve their emotional self-regulation and compliance/adherence to therapy protocols9,10. Auditory representations of physiological signals hold potential to empower visually impaired and elderly users to independently monitor vital signs without relying on visual displays. While technical feasibility has been demonstrated in systems designed for elderly and assistive contexts5,12, dedicated user studies with visually impaired populations evaluating usability, accessibility benefits, and user acceptance remain needed to validate these claims in real-world settings. The integration of sonification technologies with wearable devices and Internet of Things (IoT) platforms could further enable continuous, remote health monitoring across diverse patient populations and geographical locations, thereby expanding accessibility and promoting precision medicine and personalized care2,22.

The number of ECG leads used across studies ranged from single-lead portable configurations5 to 8-lead clinical systems4. This variability directly impacts system complexity, diagnostic capability, and practical deployment feasibility. For remote monitoring applications targeting elderly or visually impaired populations, single-lead or 3-lead systems offer greater wearability, lower cost, and reduced setup complexity, though with limited diagnostic scope compared to 6- or 8-lead configurations that enable more comprehensive rhythm and morphology assessment.

Expanding research across these application domains will be essential to demonstrate the full value of ECG sonification. Future efforts must refine technical performance, adopt user-centered design principles, and prioritize seamless clinical integration. By addressing these gaps, ECG sonification can evolve from a promising experimental concept into a practical, scalable technology capable of transforming patient assessment, monitoring, and rehabilitation in real-world healthcare environments.

Across the eight reviewed studies, audio was used in three complementary roles: (i) diagnostic support, where multi-lead ECG is rendered as intelligible sound to assist human classification1; (ii) connectivity/monitoring, where ECG is transported or monitored as an audio stream over commodity links4,12; and (iii) assessment/therapy, where audio acts as a stimulus or real-time feedback channel during evaluation or intervention5,9,10. In diagnostic support, polyphonic sonification of six ECG leads (aVL, I, aVR, II, aVF, and III) enabled four-class discrimination after brief training, with cardiology-trained listeners averaging about 78% accuracy and 27% exceeding 90%1. The four-class discrimination task encompassed normal sinus rhythm, atrial fibrillation, premature ventricular contraction, and pacemaker rhythms. While this demonstrates proof of concept for auditory pattern recognition among trained listeners, the authors acknowledged that it represents a limited scope relative to clinical ECG interpretation, which routinely distinguishes dozens of rhythm and morphology abnormalities. The study’s training period was brief (under 1 h), suggesting potential for improved performance with extended familiarization. Notably, the authors did not position sonification as a replacement for visual ECG review but as a complementary modality that may enhance situational awareness during monitoring and facilitate triage in resource-limited settings1.
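The parameter-mapping principle underlying such diagnostic sonification can be illustrated with a minimal sketch in which one ECG lead's amplitude modulates the pitch of a sine oscillator, so that large deflections such as R-peaks become audible pitch excursions. This is a generic illustration, not the cited study's mapping; the sampling rates, frequency range, and normalization below are illustrative assumptions.

```python
import numpy as np

def sonify_lead(ecg, fs_ecg, fs_audio=8000, base_freq=220.0, span=440.0):
    """Map one ECG lead's amplitude to instantaneous pitch.

    Generic parameter-mapping sonification: larger deflections (e.g.,
    R-peaks) push the oscillator toward higher frequencies.
    """
    # Normalize the lead to [0, 1].
    x = (ecg - ecg.min()) / (np.ptp(ecg) + 1e-12)
    # Resample the normalized envelope to the audio rate.
    n_aud = int(len(ecg) * fs_audio / fs_ecg)
    t_ecg = np.arange(len(ecg)) / fs_ecg
    t_aud = np.arange(n_aud) / fs_audio
    env = np.interp(t_aud, t_ecg, x)
    # Instantaneous frequency -> integrated phase -> sine oscillator.
    inst_freq = base_freq + span * env
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs_audio
    return np.sin(phase).astype(np.float32)

# Synthetic one-second "lead" at 250 Hz with a single R-peak-like spike.
fs = 250
t = np.arange(fs) / fs
ecg = np.exp(-((t - 0.5) ** 2) / 0.001)
audio = sonify_lead(ecg, fs)  # one second of audio at 8 kHz
```

A polyphonic system of the kind reviewed would run one such oscillator per lead, each with a distinct base pitch or timbre, and mix the outputs.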
For remote care, eight ECG leads were frequency-multiplexed into a 300–2700 Hz audio stream, transmitted over phone/GSM links, and AI-decoded back to ECG with sub-2 ms fiducial timing error and >99.7% sensitivity/PPV for beat detection, showing that audio can serve as both a transport layer and a quality-controlled representation4; audio conversion was also validated for comfortable long-term monitoring of elderly and visually impaired users under medical supervision12. Ambulatory hardware advances support real-world tracking; for example, a low-power three-lead monitor produced real-time heart rate (HR) and heart rate variability (HRV) estimates suitable for daily-life prognosis and detection of transient abnormalities6.
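Frequency-division multiplexing of this kind can be sketched in a few lines: each lead amplitude-modulates its own subcarrier inside the telephone band, and the subcarrier mix forms the transmitted audio. The modulation scheme, carrier spacing, and scaling below are illustrative assumptions only; the cited system's actual codec may differ.

```python
import numpy as np

def multiplex_leads(leads, fs, f_lo=300.0, f_hi=2700.0):
    """AM-multiplex several ECG leads onto evenly spaced subcarriers
    inside a telephone-band (300-2700 Hz) audio channel."""
    n_leads, n_samp = leads.shape
    t = np.arange(n_samp) / fs
    carriers = np.linspace(f_lo, f_hi, n_leads)
    audio = np.zeros(n_samp)
    for lead, fc in zip(leads, carriers):
        # Scale each lead into [0, 1] and use it as an AM envelope.
        env = (lead - lead.min()) / (np.ptp(lead) + 1e-12)
        audio += env * np.sin(2 * np.pi * fc * t)
    return audio / n_leads  # keep the mix within [-1, 1]

fs = 8000
t = np.arange(fs) / fs  # one second
# Eight synthetic "leads" (slow sinusoids standing in for ECG traces).
leads = np.vstack([np.sin(2 * np.pi * (1 + k) * t) for k in range(8)])
audio = multiplex_leads(leads, fs)
```

At the receiver, band-pass filtering around each subcarrier followed by envelope detection (or, as in the cited work, an AI decoder) recovers the individual leads.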

In assessment and therapy contexts, a wireless single-lead system delivered controlled audio/visual stimuli and captured stimulus-locked cardiac orienting responses, i.e., heart rate changes temporally aligned with the presented stimuli, providing immediate operator feedback for scalable infant screening (with audio used as the stimulus)5. In another case, a multisensory system mapped ongoing EEG/ECG to sound and light for closed-loop neurofeedback, with reported well-being benefits9. In addition, physiological control of an adapted musical instrument provided therapists with real-time objective cues of users’ emotional state, enhancing engagement in disability care10.

Notably, audio-inspired music-information-retrieval descriptors applied to ECG, combined with classical ML classifiers (AdaBoost, CN2, neural networks (NN), random forests (RF), decision trees (DT), and k-nearest neighbors (kNN)), discriminated cardiopathies (diseases of the heart) after baseline correction and wavelet-based augmentation, highlighting cross-fertilization between audio analytics and clinical ECG modeling13. Collectively, these results indicate that sonification can complement diagnosis, enable robust remote monitoring, and support assessment and therapy. Validation will require prospective clinical trials with endpoint metrics, accessibility-aware mappings and quantified learning curves, robustness/cybersecurity evaluations for audio links, and seamless integration with existing telemetry/EHR systems1,4,5,6,9,10,12,13.

Several factors may threaten validity and should temper interpretation. Internally, performance can shift with learning, fatigue, attentional drift, and device-pipeline latency (sampling, buffering, OS audio, transducer); the sound pressure level (SPL) at the ear varies with headphone fit and active noise cancellation/equalization (ANC/EQ) settings, and psychoacoustic masking in realistic noise further limits detectability. Our systematic assessment (Fig. 6) reveals that these parameters are incompletely reported in most studies (L4: 5 of 8 studies provide only partial system metrics), representing a critical barrier to reproducibility and clinical translation. Design choices also matter: surrogate perceptual metrics (e.g., \(d{\prime}\)) do not necessarily reflect clinical endpoints (time-to-recognition, false-alarm burden), and class imbalance or cueable ECG artefacts can bias results.
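For reference, the sensitivity index \(d{\prime}\) is derived from hit and false-alarm rates via inverse-normal transforms; the sketch below uses a standard log-linear correction for extreme rates, with hypothetical counts. The computation makes the limitation above concrete: \(d{\prime}\) summarizes discriminability alone and carries no information about time-to-recognition or alarm burden.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate),
    with a log-linear correction to avoid infinite z-scores."""
    h = (hits + 0.5) / (hits + misses + 1.0)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    return z(h) - z(f)

# Hypothetical listener: detects 45/50 sonified arrhythmia segments
# and false-alarms on 5/50 normal segments -> d' of roughly 2.5.
dp = d_prime(45, 5, 5, 45)
```

A chance-level listener (equal hit and false-alarm rates) scores \(d{\prime} = 0\) regardless of how quickly or slowly responses were made, which is precisely why clinical endpoints must be reported alongside it.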

A pragmatic translation pathway can proceed in three steps. First, bench validation verifies sub-200 ms end-to-end latency for real-time monitoring, documents buffering/windowing and mapping monotonicity on standard ECG tests, calibrates SPL across output devices (including ANC/EQ states), and releases code, parameter files, and 10–20 s exemplar audio. Second, simulated-ward experiments add realistic noise and multitasking in counterbalanced cross-over designs; primary readouts are time-to-detection, sensitivity/specificity or \(d{\prime}\), and workload (e.g., NASA-TLX), with mixed-effects analyses and a visual-only baseline. Third, prospective ward evaluations (cluster-randomized or stepped-wedge) compare audio + visual with visual-only monitoring using prespecified non-inferiority/superiority margins; primary endpoints are median time-to-recognition and false-alarm burden, with secondary measures of workload, alarm-response adherence, and satisfaction. Implementation measures (e.g., RE-AIM), brief qualitative feedback, simple cost-effectiveness, and, where permitted, shared de-identified logs/configurations with analysis scripts support adoption and verification.
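Bench verification of end-to-end latency is typically done with a loopback measurement: play a known test signal, record it through the full output-input path, and locate the cross-correlation peak. The sketch below simulates the recorded side with a hypothetical 137 ms path delay; with a physical loopback cable or a measurement microphone at the ear position, the same estimator bounds the stimulus-to-ear delay against the <200 ms target. All signal parameters are illustrative.

```python
import numpy as np

def estimate_latency_ms(played, recorded, fs):
    """Estimate playback-to-capture delay from the cross-correlation
    peak; with a physical loopback this bounds end-to-end latency."""
    corr = np.correlate(recorded, played, mode="full")
    lag = int(np.argmax(corr)) - (len(played) - 1)
    return 1000.0 * lag / fs

fs = 48000
click = np.zeros(4800)
click[0] = 1.0                      # 100 ms test stimulus (unit impulse)
delay = round(0.137 * fs)           # simulate a 137 ms audio path
recorded = np.concatenate([np.zeros(delay), click, np.zeros(1000)])
latency = estimate_latency_ms(click, recorded, fs)  # ~137 ms
```

In practice the measurement is repeated across buffer sizes and output devices (including ANC/EQ states) and the worst case is reported.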

The minimum reporting elements emerging from this review (ECG provenance, mapping parameters, audio-rendering details, verified latency/SPL, ambient noise, user/hearing characteristics, task prevalence/design, statistical model specification, and open stimuli/parameters/audio) are immediately actionable and, if adopted, will make results comparable across sites and bring clinical-grade ECG sonification within reach. Here is a list of actionable recommendations for future work:

  • Pre-register hypotheses, endpoints, analyses; link the protocol.

  • Verify and report end-to-end latency (stimulus → ear); target <200 ms for real-time use.

  • Calibrate and log SPL at the ear; document device model, ANC/EQ, and ambient noise.

  • Release mapping parameter files, 10–20 s audio exemplars, and scripts to regenerate figures and latency/SPL logs.

  • Use prevalence-aware performance metrics (e.g., Matthews Correlation Coefficient, balanced accuracy, F1-score, area under precision-recall curves for imbalanced datasets) and report full confusion matrices with sensitivity, specificity, positive predictive value, negative predictive value, and likelihood ratios when applicable.

  • Analyse with mixed-effects models (random effects for listener and scenario) and control multiplicity.

  • Recruit representative users (e.g., bedside nurses), record basic hearing status, and standardize training/washouts with counterbalancing.

  • Test under realistic noise/multitasking with a required visual-only baseline; include an audio-naïve arm where feasible.

  • Follow a phased evaluation: bench → simulated ward → pragmatic ward trials with prespecified margins.

  • Report implementation details (alarm policies, integration, UI states) and, where allowed, share de-identified logs/configurations.
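To illustrate the prevalence-aware metrics recommended above, the sketch below computes raw accuracy, balanced accuracy, and the Matthews Correlation Coefficient (MCC) from a binary confusion matrix with hypothetical counts. Under a 10:90 class imbalance, a detector that misses most events can still post 90% raw accuracy, while balanced accuracy and MCC expose the failure.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Prevalence-aware summaries from a binary confusion matrix."""
    sens = tp / (tp + fn)                      # sensitivity (recall)
    spec = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fn + fp + tn)      # raw accuracy
    bal_acc = (sens + spec) / 2                # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": acc, "balanced_accuracy": bal_acc, "mcc": mcc}

# Hypothetical imbalanced test set: 10 arrhythmic vs 90 normal segments.
# A detector that catches only 2/10 events still posts 90% raw accuracy,
# but balanced accuracy (~0.59) and MCC (~0.27) reveal the weakness.
m = binary_metrics(tp=2, fn=8, fp=2, tn=88)
```

Reporting the full confusion matrix, as recommended above, lets readers recompute any of these summaries themselves.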

Taken together, our synthesis indicates that ECG sonification is moving beyond proof-of-concept toward practical augmentation of monitoring and alarm management, provided latency and SPL are verified, mappings are transparent, and evaluations reflect real clinical conditions. The field remains constrained by small, heterogeneous studies and variable reporting, which currently limits quantitative aggregation and external validity. The reporting elements and phased trial pathway outlined here offer a concrete route to comparable results and reproducible practice, and can be adopted immediately in ongoing studies. With these standards in place, and with open artifacts that include parameter files, scripts, and exemplar audio, future work can more decisively establish where sonification adds clinical value, for whom, and under what conditions.

Methods

This review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines27. The review process consisted of three main stages: literature search, study selection, and data extraction with subsequent methodological assessment.

Search strategy

A structured literature search was performed in PubMed, IEEE Xplore, and Embase to identify relevant studies published between 1 January 2015 and 1 July 2025. Search queries were adapted to the syntax of each database. Across all databases, the concepts “electrocardiography” or “ECG” were combined with “sonification” and related terms (e.g., auditory display, auditory feedback, data-to-sound, acoustic display, auditory representation) using Boolean operators. Filters were applied to restrict results to English-language, human-participant, peer-reviewed papers. The complete search strings for all databases are provided in the Supplementary Information.

Study selection

All retrieved records were imported into a single database, and duplicates were removed. The selection process was performed in two stages: (1) title and abstract screening, and (2) full-text review. Screening and eligibility assessments were conducted independently by multiple reviewers, with discrepancies resolved by discussion and, when necessary, by consultation with the corresponding author. Only studies that met the predefined eligibility criteria were included.

Eligibility criteria

Eligible studies met all the following criteria: (i) original peer-reviewed research involving human participants, (ii) use of ECG signals (clinical, wearable, or experimental) for sonification and/or visualization, and (iii) reporting of analyzable methodologies with identifiable signal-to-sound transformation or visualization parameters.

Exclusion criteria comprised review articles, meta-analyses, editorials, letters, conference abstracts, symposium papers, guidelines, case reports, non-English publications, studies without human participants, and studies in which sonification or visualization were not applied to cardiovascular signals.

Data extraction and synthesis

For each included study, data were extracted on study design, participant demographics, signal modality (ECG alone or multimodal), sensor type and placement, acquisition parameters, preprocessing and mapping strategies, evaluation protocols, and reported outcomes. A standardized extraction template was used to ensure consistency and comparability across studies.

Methodological limitation assessment

Six methodological limitation categories were defined a priori: narrow mapping design (L1), participant sample bias (L2), evaluation metrics inadequacy (L3), real-time performance constraints (L4), signal quality and preprocessing variability (L5), and device/environment heterogeneity (L6). Each study was evaluated against these categories and assigned a rating of low, moderate, or serious concern according to predefined criteria.

Low indicates methodological adequacy in the given category (e.g., multiple mappings tested and justified for L1, diverse participant groups for L2, combination of objective and subjective evaluation metrics for L3, full reporting of latency and buffering for L4, standardized preprocessing pipelines for L5, and validation across multiple devices or environments for L6). Moderate reflects minor to moderate concerns that may affect generalizability or reproducibility (e.g., single mapping explored with partial justification for L1, small or homogeneous participant groups for L2, reliance mostly on subjective ratings with limited statistics for L3, partial system performance reporting for L4, some ad-hoc preprocessing for L5, or testing limited to a single device/context for L6). Serious denotes major methodological issues likely to bias results or limit interpretability (e.g., very restricted mappings without rationale for L1, very limited or biased participant groups for L2, only subjective impressions without validation for L3, no performance data for L4, undocumented or poor preprocessing for L5, or no validation beyond a lab prototype for L6).

Unclear/insufficiently reported indicates that the study did not provide sufficient methodological detail to confidently assess the limitation category, preventing classification as low, moderate, or serious concern. This rating highlights gaps in reporting transparency rather than inherent methodological flaws. These definitions were consistently applied across all eight included studies, ensuring reproducibility and enabling direct comparisons of methodological rigor.