Abstract
In saturation diving, reliable speech communication under helium–oxygen (Heliox) conditions is critical for operational safety and efficiency. Heliox speech exhibits severe acoustic mismatch relative to standard air speech, and recognition performance degrades further in the presence of chamber and environmental noise and domain-specific terminology. To study this problem in a realistic setting, we collected Heliox speech recordings at two saturation conditions (12 m and 25 m equivalent depths) and constructed a corresponding dataset. We then adapted Whisper-large-v3 via Low-Rank Adaptation (LoRA) for parameter-efficient domain adaptation, and enhanced decoding with practical inference-time components, including hotword biasing, language-model (LM) reranking, test-time augmentation (TTA) with speed perturbation, and rolling context prompts, together with chunked decoding for stable deployment. Under the reported decoding configuration, the proposed system achieved a character error rate (CER) of 4.725% at the 12 m condition and 7.165% at the 25 m condition on our Heliox evaluation sets, while maintaining practical inference cost on GPU and CPU server platforms. We note that inference-time strategies provide complementary robustness gains but do not eliminate the need for domain adaptation under severe Heliox-induced shifts.
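The parameter-efficiency argument behind LoRA can be illustrated with a minimal numerical sketch. The dimensions and hyperparameters below are illustrative assumptions (1280 matches the Whisper-large hidden size; the rank and scaling are not the paper's reported configuration): a frozen weight matrix W receives a trainable low-rank update B·A, so only r·(d_in + d_out) parameters are trained per adapted matrix instead of d_out·d_in.

```python
import numpy as np

# Minimal LoRA sketch (illustrative only, not the paper's implementation).
# Instead of updating the full weight matrix W (d_out x d_in), LoRA trains
# a low-rank update B @ A with rank r << min(d_out, d_in).
rng = np.random.default_rng(0)
d_out, d_in, r = 1280, 1280, 8           # 1280 = Whisper-large hidden size; r is assumed
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight (not trained)
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero, so W' == W before training

def lora_forward(x, scale=2.0):          # scale = lora_alpha / r (assumed alpha = 16)
    # Effective weight is W + scale * B @ A, applied without materializing it.
    return W @ x + scale * (B @ (A @ x))

full_params = d_out * d_in               # parameters in one full-rank update
lora_params = r * (d_in + d_out)         # parameters LoRA actually trains
print(f"trainable fraction per adapted matrix: {lora_params / full_params:.4%}")
```

With these assumed dimensions the adapter trains roughly 1.25% of the parameters of each adapted matrix, which is why LoRA makes domain adaptation of a model as large as Whisper-large-v3 practical on modest hardware.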
Data availability
Restricted access: The helium speech data supporting the findings of this study are held by the PLA Naval Medical Center, and access to these data is restricted. The data were licensed for use in this study and are therefore not publicly available; access may be requested with permission from the PLA Naval Medical Center. Summary data (such as means and variances) are included in the main text or supplementary information.
Funding
This work was supported by the 2024 Scientific Research Special Fund of the PLA Naval Medical Center under Project 24DZX01 and by the National Natural Science Foundation of China under Grant 62571110.
Author information
Authors and Affiliations
Contributions
The conceptualization and investigation were led by W.M., who also acquired funding and supervised the overall project administration, validation, and resource management. H.G. served as the submitting author and was responsible for methodology, software implementation, data curation, formal analysis, visualization, and writing of both the original draft and subsequent revisions. J.H. and Y.L. provided supervision and critical feedback throughout the study. S.W. acted as the corresponding author, contributing to funding acquisition, project administration, and overall supervision of the research. All authors reviewed and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Mao, W., Gu, H., He, J. et al. LoRA-enhanced whisper for resource-efficient heliox speech recognition. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38201-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-026-38201-7