Scientific Reports
Post-swallowing voice-based aspiration screening in dysphagia using a deep learning approach: insights from audio segmentation
  • Article
  • Open access
  • Published: 21 May 2026

  • Jung-Min Kim1,2,
  • Min-Seop Kim3,
  • Sun-Young Choi2,
  • Hyun-Jin Kim2 &
  • …
  • Ju Seok Ryu2,4,5 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Diseases
  • Health care
  • Medical research

Abstract

Dysphagia carries a serious risk of aspiration that requires continuous monitoring. This study introduces standardized 2 s voice segments for aspiration detection, derived from physiological constraints observed during clinical video-fluoroscopic swallowing study (VFSS) examinations. This prospective cohort study collected voice data from 198 participants aged ≥ 40 years: 128 normal participants (healthy controls and VFSS-normal participants with a penetration-aspiration scale (PAS) score of 1) and 70 aspiration-risk participants (PAS ≥ 5). Duration analysis revealed that the VFSS participants (overall mean: 2.22–2.48 s) produced significantly shorter phonations than healthy controls (6.19 ± 1.70 s; p < 0.001), which justified the 2 s standardization. The aspiration-risk threshold was set at PAS ≥ 5 because material on the vocal cords significantly alters voice quality. Post-swallowing recordings were preprocessed into mel-spectrograms via short-time Fourier transform to extract time-frequency features. We developed three models based on MobileNetV3 (male, female, integrated) using the EfficientAT framework with 10-fold cross-validation. The integrated model achieved the highest area under the curve (AUC) of the three, 0.8090, with 82.77% sensitivity. The male-specific model achieved an AUC of 0.7586 and 91.88% sensitivity, whereas the female-specific model reached an AUC of 0.7376 and 61.11% sensitivity. This physiologically grounded approach shows promise for telemedicine-based aspiration screening.
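The two preprocessing ideas in the abstract, fixing every recording to a 2 s segment and describing it on a mel-frequency axis after a short-time Fourier transform, can be illustrated with a minimal sketch. This is not the authors' pipeline. The 16 kHz sample rate, the 400-sample window, the 160-sample hop, the crop-from-start and zero-pad policy, and all function names are assumptions chosen for illustration; the abstract does not state these settings.

```python
import math


def standardize_segment(samples, sample_rate, seconds=2.0):
    """Return exactly `seconds` of audio.

    Assumed policy (not from the paper): keep the first `seconds` of a
    longer recording and zero-pad the end of a shorter one.
    """
    target = int(round(sample_rate * seconds))
    clipped = list(samples[:target])
    return clipped + [0.0] * (target - len(clipped))


def stft_frame_count(n_samples, win_length, hop_length):
    """Number of full analysis windows that fit without edge padding."""
    if n_samples < win_length:
        return 0
    return 1 + (n_samples - win_length) // hop_length


def hz_to_mel(hz):
    """HTK-style mel scale conversion."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centres(n_mels, f_min, f_max):
    """Centre frequencies (Hz) of n_mels bands spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    sr = 16000                       # assumed sample rate
    recording = [0.1] * (5 * sr)     # placeholder 5 s signal, not study data
    segment = standardize_segment(recording, sr)
    print(len(segment), stft_frame_count(len(segment), 400, 160))
    print([round(f) for f in mel_band_centres(8, 0.0, sr / 2)])
```

Under these assumed settings a 2 s segment holds 32,000 samples and yields 198 full STFT frames, so every recording maps to a spectrogram of the same width regardless of how long the participant could sustain phonation.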

Trial registration: This study was approved by the institutional review board of Seoul National University Bundang Hospital (approval number B-2109-707-303) and registered in the ClinicalTrials.gov database (NCT05149976).

Abbreviations

PAS: Penetration-aspiration scale
VFSS: Video-fluoroscopic swallowing study
MPT: Maximum phonation time
NHR: Noise-to-harmonics ratio
RAP: Relative average perturbation
GRBAS: Grade, roughness, breathiness, asthenia, strain scale
SNR: Signal-to-noise ratio
XGBoost: eXtreme gradient boosting
CPP: Cepstral peak prominence
MFCCs: Mel-frequency cepstral coefficients
ASHA: American Speech-Language-Hearing Association
STFT: Short-time Fourier transform
AUC: Area under the ROC curve
ROC: Receiver operating characteristic
STROBE: Strengthening the reporting of observational studies in epidemiology
HDF5: Hierarchical data format version 5
EfficientAT: Efficient pre-trained CNNs for audio pattern recognition (audio tagging)
MLP: Multilayer perceptron
PPV: Positive predictive value
NPV: Negative predictive value
CI: Confidence interval
SD: Standard deviation
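The evaluation terms listed above (AUC, ROC, PPV, NPV, together with the sensitivity reported in the abstract) can be made concrete with a short sketch. The labels and scores below are invented toy values, not study data, and the function names and the 0.5 decision threshold are assumptions for illustration only. Here label 1 stands for aspiration risk and label 0 for normal.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV; NaN when a denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example only: 3 positives, 4 negatives.
    y_true = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
    print(screening_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can have a lower AUC and a higher sensitivity than another at a given operating point.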

Acknowledgements

This work is based in part on the author’s doctoral dissertation submitted to Seoul National University (August 2024) [78]. The current manuscript significantly expands the interpretation and theoretical framing while reanalyzing the previously collected dataset. The authors thank MiriCanvas (https://www.miricanvas.com) for providing the tools used to create the figures in the Methods section.

Funding

This research was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2022R1A2C1007780). This research was also funded by grant no. 14-2021-0018 from the SNUBH Research Fund. The funders did not participate in any aspect of the research process, including study design, data collection and analysis, interpretation of results, manuscript preparation, or publication decisions.

Author information

Authors and Affiliations

  1. Department of Research Planning, Biomedical Research Institute, Seoul National University Bundang Hospital, Seongnam, South Korea

    Jung-Min Kim

  2. Department of Rehabilitation Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea

    Jung-Min Kim, Sun-Young Choi, Hyun-Jin Kim & Ju Seok Ryu

  3. College of Engineering Industrial and Management Engineering, Korea University, Seoul, South Korea

    Min-Seop Kim

  4. Seoul National University College of Medicine, Seoul, South Korea

    Ju Seok Ryu

  5. Department of Rehabilitation Medicine, Seoul National University Bundang Hospital, Seoul National University College of Medicine, 82 Gumi-ro 173 Beon-gil, Bundang-gu, Seongnam-si, 13620, Gyeonggi-do, South Korea

    Ju Seok Ryu

Corresponding author

Correspondence to Ju Seok Ryu.

Ethics declarations

Competing interests

Dr Ryu, Jung-Min Kim and Min-Seop Kim are named inventors on patent No. 10-2023-0095566, jointly held by RS Rehab and Bundang Seoul National University Hospital. The remaining authors declare no competing interests.

Ethics approval and consent to participate

All methods were carried out in accordance with the relevant guidelines and regulations, including the Declaration of Helsinki. This study was conducted after obtaining approval from the Seoul National University Bundang Hospital Institutional Review Board (IRB No.: B-2109-707-303). Written informed consent was obtained from all participants prior to enrollment; for participants unable to provide consent owing to age or health conditions, written informed consent was obtained from their guardians. All participants scheduled for VFSS were provided with detailed information about the study and gave their informed consent before participation. Healthy volunteers were recruited through a public notice (online and hospital bulletin) and participated only after providing consent.

Consent for publication

All voice recordings were anonymized using de-identification numbers to protect participant privacy. Written informed consent for the publication of this study was obtained from all participants after they received a verbal explanation from the researcher.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (DOCX)

Supplementary Material 2 (DOCX)

Supplementary Material 3 (DOCX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Kim, JM., Kim, MS., Choi, SY. et al. Post-swallowing voice-based aspiration screening in dysphagia using a deep learning approach: insights from audio segmentation. Sci Rep (2026). https://doi.org/10.1038/s41598-026-53618-w

  • Received: 09 December 2025

  • Accepted: 13 May 2026

  • Published: 21 May 2026

  • DOI: https://doi.org/10.1038/s41598-026-53618-w

Keywords

  • Dysphagia
  • Aspiration
  • Voice analysis
  • Deep learning
  • Voice segmentation
  • Real-time monitoring
Associated content

Collection

Applications of artificial intelligence in video- and audio-signal processing

Scientific Reports (Sci Rep)

ISSN 2045-2322 (online)

© 2026 Springer Nature Limited
