Scientific Reports
LoRA-enhanced whisper for resource-efficient heliox speech recognition
  • Article
  • Open access
  • Published: 18 March 2026

  • Weichang Mao (1, equal contribution),
  • Haojie Gu (2, equal contribution),
  • Jia He (1),
  • Yu Li (1) &
  • Shifeng Wang (1)

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Engineering
  • Ocean sciences

Abstract

In saturation diving, reliable speech communication under helium–oxygen (Heliox) conditions is critical for operational safety and efficiency. Heliox speech exhibits a severe acoustic mismatch relative to speech in normal air, and recognition performance degrades further in the presence of chamber and environmental noise and domain-specific terminology. To study this problem in a realistic setting, we collected Heliox speech recordings at two saturation conditions (12 m and 25 m equivalent depths) and constructed a corresponding dataset. We then adapted Whisper-large-v3 via Low-Rank Adaptation (LoRA) for parameter-efficient domain adaptation, and enhanced decoding with practical inference-time components: hotword biasing, language-model (LM) reranking, test-time augmentation (TTA) with speed perturbation, and rolling context prompts, together with chunked decoding for stable deployment. On our Heliox evaluation sets, under the reported decoding configuration, the proposed system achieved a character error rate (CER) of 4.725% at the 12 m equivalent depth and 7.165% at the 25 m equivalent depth, while maintaining practical inference cost on GPU/CPU server platforms. We note that inference-time strategies provide complementary robustness gains but do not fully eliminate the need for domain adaptation under severe Heliox-induced shifts.
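For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate is commonly computed: the character-level Levenshtein distance between reference and hypothesis, summed over utterances and divided by the total number of reference characters. It is not the authors' evaluation code. The function names, the corpus-level aggregation, the absence of any text normalization, and the example strings are all assumptions made for illustration.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER in percent (assumed aggregation: total edits / total reference characters)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical strings, not taken from the Heliox dataset.
    print(f"{cer(['abcd', 'ef'], ['abxd', 'ef']):.3f}%")
```

Because insertions count as errors, a CER computed this way can exceed 100% when the hypothesis is much longer than the reference.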

Data availability

Restricted access: The helium speech data supporting the findings of this study are available from the PLA Naval Medical Center, but access to these data is restricted. The data were licensed for use in this study and are therefore not publicly available. Access may, however, be requested with permission from the PLA Naval Medical Center. The data are included in the main text of the paper or in the supplementary information (for the raw data, not summary data such as mean and variance).

Funding

This work was supported by the 2024 Scientific Research Special Fund of the PLA Naval Medical Center (project number 24DZX01) and by the National Natural Science Foundation of China (Grant 62571110).

Author information

Author notes
  1. Weichang Mao and Haojie Gu contributed equally to this work.

Authors and Affiliations

  1. PLA Naval Medical Center, Shanghai, 200433, China

    Weichang Mao, Jia He, Yu Li & Shifeng Wang

  2. School of Information and Intelligent Science, Donghua University, Shanghai, 201620, China

    Haojie Gu

Contributions

The conceptualization and investigation were led by W.M., who also acquired funding and supervised the overall project administration, validation, and resource management. H.G. served as the submitting author and was responsible for methodology, software implementation, data curation, formal analysis, visualization, and writing of both the original draft and subsequent revisions. J.H. and Y.L. provided supervision and critical feedback throughout the study. S.W. acted as the corresponding author, contributing to funding acquisition, project administration, and overall supervision of the research. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Shifeng Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Mao, W., Gu, H., He, J. et al. LoRA-enhanced whisper for resource-efficient heliox speech recognition. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38201-7

  • Received: 11 November 2025

  • Accepted: 29 January 2026

  • Published: 18 March 2026

  • DOI: https://doi.org/10.1038/s41598-026-38201-7

Keywords

  • Helium speech recognition
  • LoRA
  • Whisper
  • Hotword bias
  • LM reranking