Brain-inspired perception-decision machine for fake speech detection
  • Article
  • Open access
  • Published: 05 March 2026


  • Chang Feng1,
  • Xiaolong Wu2,
  • Hamdulla Askar2,
  • Mingxing Xu1,
  • Lihong Cao3 &
  • Thomas Fang Zheng1 

Scientific Reports (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Engineering
  • Mathematics and computing

Abstract

The rapid advancement of Artificial Intelligence Generated Content (AIGC) technologies challenges fake speech detection with an ever-evolving diversity of spoofed audio. Current approaches, which adopt a classification-based perspective, depend heavily on large amounts of training data and generalize poorly to unseen attack types. To address these limitations, this paper introduces a brain-inspired, multi-clue detection paradigm. We propose a perception-decision machine composed of two core components. The perception module uses multiple independent detectors, each optimized for Maximum Detection Precision (MaxDP) to identify a specific forgery clue. Standardizing the detectors' outputs into binary Boolean values allows flexible computational models. The decision-making module then renders a final judgment: it first evaluates learned combinations of the detected clues through a logical reasoning process, and then aggregates the outcomes using a variable-length OR operation, a mechanism that enables seamless incremental learning of new forgery clues without retraining the entire system. Our results validate the effectiveness of the multi-clue detection perspective, demonstrating the framework’s potential for enhanced explainability and practical adaptability to new threats.
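The perception-decision flow described in the abstract can be sketched in code. This is a hypothetical illustration, not the authors' implementation: `decide_fake`, the detector lambdas, and the clue-combination encoding are all assumed names; the sketch only shows how Boolean clue outputs, AND-based reasoning over learned combinations, and a variable-length OR aggregation fit together, and why appending a new clue needs no retraining of existing components.

```python
# Illustrative sketch (not the paper's code): perception emits Boolean
# forgery clues, reasoning ANDs the clues within each learned combination,
# and a variable-length OR aggregates the combinations into the verdict.
from typing import Callable, Sequence

Detector = Callable[[bytes], bool]  # audio -> "is this forgery clue present?"

def decide_fake(audio: bytes,
                detectors: Sequence[Detector],
                clue_combinations: Sequence[Sequence[int]]) -> bool:
    """Return True (fake) if any learned combination of clues fires."""
    clues = [d(audio) for d in detectors]        # perception: Boolean clue vector
    fired = [all(clues[i] for i in combo)        # reasoning: AND within a combination
             for combo in clue_combinations]
    return any(fired)                            # variable-length OR across combinations

# Incremental learning of a new clue is just an append, with no retraining:
# detectors.append(new_detector); clue_combinations.append([len(detectors) - 1])
```

Because the OR has no fixed arity, adding a detector and its combination extends the decision rule without touching the already-deployed clues.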


Data availability

The fake speech datasets used in this study are open source. ASVspoof2019 LA can be accessed at https://datashare.ed.ac.uk/handle/10283/3336, ASVspoof2021 LA at https://zenodo.org/record/4837263, and CFAD at https://zenodo.org/records/8122764.


Acknowledgements

The authors thank the Beijing Municipal Science and Technology Commission for funding and supporting Project Z221100001222005 under the Beijing Science and Technology Plan, and the Tianshan Talents Cultivation Program - Leading Talents for Scientific and Technological Innovation (No. 2024TSYCLJ0002).

Funding

This work was funded by Beijing Science and Technology Financial Innovation Support Project (Z221100001222005).

Author information

Authors and Affiliations

  1. Center for Speech and Language Technologies, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, 100084, China

    Chang Feng, Mingxing Xu & Thomas Fang Zheng

  2. School of Computer Science and Technology, Xinjiang University, Urumqi, 830000, China

    Xiaolong Wu & Hamdulla Askar

  3. Neuroscience and Intelligent Media Institute, Communication University of China, Beijing, 100024, China

    Lihong Cao


Contributions

Chang Feng analyzed the data, performed the experiments and wrote the initial manuscript. Xiaolong Wu edited the figures, provided feedback and revised the manuscript. Hamdulla Askar and Mingxing Xu supervised the research. Lihong Cao reviewed and edited the final manuscript. Thomas Fang Zheng conceived the project, designed the study, reviewed and edited the final manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Thomas Fang Zheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


See Table 3.

Table 3: Detailed thresholds and performance for each detector on the development set. The MaxDP strategy ensures high precision across all detectors, even when an individual detector misses some detections.
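The per-detector threshold selection summarized in Table 3 could, under our reading of the MaxDP strategy, be sketched as follows. This is an assumed helper, not the authors' code: for each detector, the threshold is the development-set score cutoff that maximizes detection precision (the fraction of flagged utterances that are truly fake), tolerating missed detections on that detector alone.

```python
# Hypothetical MaxDP threshold search on a development set
# (illustrative reading of the strategy, not the paper's implementation).
import numpy as np

def maxdp_threshold(scores: np.ndarray, labels: np.ndarray,
                    min_detections: int = 1) -> float:
    """Pick the score cutoff that maximizes precision on a dev set.
    labels: 1 = fake, 0 = genuine. Higher score = more suspicious."""
    best_t, best_p = float("inf"), -1.0
    for t in np.unique(scores):                  # candidate cutoffs, ascending
        flagged = scores >= t                    # utterances this detector flags
        if flagged.sum() < min_detections:       # skip degenerate cutoffs
            continue
        precision = labels[flagged].mean()       # fraction of flags that are fake
        if precision > best_p:
            best_t, best_p = float(t), precision
    return best_t
```

Maximizing precision rather than recall matches the design above: a detector that fires is trusted, and coverage of missed cases is left to the OR over the other detectors.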

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Feng, C., Wu, X., Askar, H. et al. Brain-inspired perception-decision machine for fake speech detection. Sci Rep (2026). https://doi.org/10.1038/s41598-026-41859-8


  • Received: 01 July 2025

  • Accepted: 23 February 2026

  • Published: 05 March 2026

  • DOI: https://doi.org/10.1038/s41598-026-41859-8


Associated content

Collection

Human-centred pattern recognition
