Abstract
In saturation diving, reliable speech communication under helium–oxygen (Heliox) conditions is critical for operational safety and efficiency. Heliox speech exhibits severe acoustic mismatch relative to standard air speech, and recognition performance degrades further in the presence of chamber and environmental noise and domain-specific terminology. To study this problem in a realistic setting, we collected Heliox speech recordings at two saturation conditions (12 m and 25 m equivalent depths) and constructed a corresponding dataset. We then adapted Whisper-large-v3 via Low-Rank Adaptation (LoRA) for parameter-efficient domain adaptation, and enhanced decoding with practical inference-time components, including hotword biasing, language-model (LM) reranking, test-time augmentation (TTA) with speed perturbation, and rolling context prompts, together with chunked decoding for stable deployment. Under the reported decoding configuration, the proposed system achieved a character error rate (CER) of 4.725% at the 12 m condition and 7.165% at the 25 m condition on our Heliox evaluation sets, while maintaining practical inference cost on GPU and CPU server platforms. We note that inference-time strategies provide complementary robustness gains but do not eliminate the need for domain adaptation under severe Heliox-induced shifts.
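The parameter-efficiency argument behind LoRA can be illustrated with a minimal numerical sketch. The dimensions and hyperparameters below are illustrative assumptions (1280 matches the Whisper-large hidden size; the rank and scaling are not the paper's reported configuration): a frozen weight matrix W receives a trainable low-rank update B·A, so only r·(d_in + d_out) parameters are trained per adapted matrix instead of d_out·d_in.

```python
import numpy as np

# Minimal LoRA sketch (illustrative only, not the paper's implementation).
# Instead of updating the full weight matrix W (d_out x d_in), LoRA trains
# a low-rank update B @ A with rank r << min(d_out, d_in).
rng = np.random.default_rng(0)
d_out, d_in, r = 1280, 1280, 8           # 1280 = Whisper-large hidden size; r is assumed
W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight (not trained)
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero, so W' == W before training

def lora_forward(x, scale=2.0):          # scale = lora_alpha / r (assumed alpha = 16)
    # Effective weight is W + scale * B @ A, applied without materializing it.
    return W @ x + scale * (B @ (A @ x))

full_params = d_out * d_in               # parameters in one full-rank update
lora_params = r * (d_in + d_out)         # parameters LoRA actually trains
print(f"trainable fraction per adapted matrix: {lora_params / full_params:.4%}")
```

With these assumed dimensions the adapter trains roughly 1.25% of the parameters of each adapted matrix, which is why LoRA makes domain adaptation of a model as large as Whisper-large-v3 practical on modest hardware.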
Data availability
Restricted access: The helium speech data supporting the findings of this study are held by the PLA Naval Medical Center, and access to these data is restricted. The data were licensed for use in this study and are therefore not publicly available; access may be requested with permission from the PLA Naval Medical Center. Summary data (such as means and variances) are included in the main text or supplementary information.
Funding
This work was supported by the 2024 Scientific Research Special Fund of the PLA Naval Medical Center under Project 24DZX01 and by the National Natural Science Foundation of China under Grant 62571110.
Author information
Authors and Affiliations
Contributions
The conceptualization and investigation were led by W.M., who also acquired funding and supervised the overall project administration, validation, and resource management. H.G. served as the submitting author and was responsible for methodology, software implementation, data curation, formal analysis, visualization, and writing of both the original draft and subsequent revisions. J.H. and Y.L. provided supervision and critical feedback throughout the study. S.W. acted as the corresponding author, contributing to funding acquisition, project administration, and overall supervision of the research. All authors reviewed and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Mao, W., Gu, H., He, J. et al. LoRA-enhanced whisper for resource-efficient heliox speech recognition. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38201-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-026-38201-7