Collection 

AI and Machine Learning in Acoustic Signal Processing

Submission status
Open
Submission deadline

In recent years, the intersection of artificial intelligence (AI), machine learning (ML), and acoustic signal processing has emerged as a rapidly advancing field, offering new ways to analyze, interpret, and enhance sound. The integration of AI and ML technologies into acoustic signal processing is poised to revolutionize a wide range of applications, from speech recognition and audio content analysis to environmental sound monitoring and biomedical diagnostics. This Special Collection seeks to capture the latest innovations and breakthroughs in this dynamic area, providing a platform for researchers and practitioners to showcase their work on cutting-edge advancements and challenges.

The primary aim of this Collection is to explore how AI and ML techniques can be harnessed to address complex challenges in acoustic signal processing. These technologies offer significant advantages, such as the ability to process large volumes of data, adapt to new patterns in real time, and enhance the accuracy and efficiency of signal analysis and enhancement tasks. This Collection aims to attract papers that advance both theoretical research and practical solutions, offering novel tools that can be implemented in real-world applications. The Collection will encompass a broad range of topics, including:

  • Acoustic Scene Analysis: Developing algorithms for recognizing and classifying various acoustic environments, crucial for applications like smart surveillance, urban noise monitoring, and context-aware systems.
  • Signal Enhancement: AI-driven techniques for noise reduction, dereverberation, and signal clarity improvement are of particular interest. These methods are vital in scenarios such as telecommunications, hearing aids, robust speech/underwater communication, and weak underwater signal detection in various types of noisy environments.
  • Source Localization and Separation: Advanced ML models for detecting, localizing, and isolating sound sources in multi-source environments. Applications include speaker separation in conference systems, enhancing audio quality in entertainment, improving environmental awareness in autonomous systems, and improving the accuracy and resolution in underwater source localization/detection.
  • Speech and Audio Processing: Applying deep learning models to tasks such as robust automatic speech recognition (ASR), audio generation and synthesis, voice conversion, emotion recognition, and speech translation. These tasks are essential for developing more natural and intuitive human-computer interaction systems.
  • Music Signal Processing: Developing AI models for accessing, analyzing, manipulating, and creating music, such as music information retrieval, music generation, music synthesis, computer accompaniment, and machine musicianship.
  • Environmental and Biomedical Acoustics: AI applications in these areas use acoustic signals for wildlife monitoring, detecting environmental changes, and medical diagnostics. Examples include processing underwater acoustic signals for marine biology and environmental monitoring, analyzing heart or lung sounds for diagnosis, and ultrasonic signal processing for medical imaging.
  • Acoustic Inspection Methods: Novel methodologies for inspecting structures and other assets, including the processing and classification of ultrasonic signals. Examples include the detection of anomalies in acoustic data, time-series classification, and crack/defect localization.

Contributions showcasing interdisciplinary approaches or collaborations between AI, ML, and acoustic signal processing experts are particularly encouraged. Submissions that highlight new methodologies, innovative applications, and emerging trends shaping the future of acoustic signal processing are especially welcome.


Editors

Jing Chen, PhD, Peking University, China

Dr. Jing Chen received her PhD in signal and information processing from Peking University and then worked as a research fellow in the auditory perception lab at the University of Cambridge before joining Peking University as a research professor in 2013. Her research interests are in auditory and speech perception, auditory computational models, speech signal processing, and speech neural decoding via brain-computer interfaces.

 

Jing Lu, PhD, Nanjing University, China

Dr. Jing Lu is currently a professor and serves as the deputy head of the Department of Acoustical Science and Engineering, as well as the director of the “Nanjing University – Horizon Intelligent Audio Lab” at Nanjing University. He is also an editorial board member of npj Acoustics and serves as the executive director of the Acoustical Society of China, the vice chair of the Audio Engineering Society of China, and a member of the Engineering Acoustics, Signal Processing, and Speech Communication committees of the Acoustical Society of America. He has served as a member of the international scientific committee or as a session chair at more than 10 major international conferences on acoustics, including ASA, ICA, ICSV, and InterNoise. His research interests encompass audio signal processing, immersive audio, acoustic transducers, and real-time implementation of audio processing systems. He has successfully completed more than 20 government and industry projects and has published over 300 journal and conference papers. He has been granted more than 40 patents, and his research results have been widely adopted in various industries.

Yanmin Qian, PhD, Shanghai Jiao Tong University, China

Dr. Yanmin Qian (Senior Member, IEEE) received the BS degree from the Department of Electronic and Information Engineering, Huazhong University of Science and Technology, Wuhan, China, in 2007, and the PhD degree from the Department of Electronic Engineering, Tsinghua University, Beijing, China, in 2012. Since 2013, he has been with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, where he is currently a Full Professor. From 2015 to 2016, he was an Associate Researcher with the Speech Group, Cambridge University Engineering Department, Cambridge, UK. He has authored or coauthored more than 300 papers in peer-reviewed journals and conferences on speech and language processing, including T-ASLP, Speech Communication, ICASSP, INTERSPEECH, and ASRU. He has applied for more than 80 Chinese and American patents and won 6 championships of international challenges. His current research interests include speech and audio signal processing, automatic speech recognition and translation, speaker and language recognition, speech separation and enhancement, music generation and understanding, speech emotion perception, multimodal information processing, natural language understanding, deep learning, and multimedia signal processing. He was the recipient of several top academic awards in China, including the Chang Jiang Scholars Program of the Ministry of Education and the Excellent Youth Fund of the National Natural Science Foundation of China.

Haiqiang Niu, PhD, Institute of Acoustics, Chinese Academy of Sciences, China

Dr. Haiqiang Niu is a full Professor at the State Key Laboratory of Acoustics, Institute of Acoustics, Chinese Academy of Sciences. He is currently serving as an associate editor for the Journal of the Acoustical Society of America (JASA). He is also a member of the Young Scientist Committees of several journals, including Chinese Physics Letters, Chinese Physics B, Acta Physica Sinica, Physics, and Acta Acustica. Additionally, he is a member of the Youth Innovation Promotion Association of the Chinese Academy of Sciences. Dr. Niu received his PhD in Acoustics from the Institute of Acoustics, Chinese Academy of Sciences, in 2014. From 2015 to 2017, he worked as a postdoctoral researcher at the Scripps Institution of Oceanography, University of California San Diego. He became an associate professor in 2018 and was promoted to full professor in 2021. His research interests include machine learning in ocean acoustics, sparse Bayesian learning in acoustical signal processing, geoacoustic inversion, and ocean acoustical tomography.

Alan Hunter, PhD, University of Bath, UK

Dr. Alan Hunter received the BE(Hons) and PhD degrees in Electrical and Electronic Engineering from the University of Canterbury, New Zealand, in 2001 and 2006, respectively. From 2007 to 2010, he was a Research Associate with the University of Bristol, United Kingdom, and from 2010 to 2014, he was a Defence Scientist with TNO (Netherlands Organisation for Applied Scientific Research) in The Hague, Netherlands. In 2014, he joined the Faculty of Engineering at the University of Bath, United Kingdom, where he is currently a Reader (Associate Professor) and Deputy Head of Department in Mechanical Engineering. Since 2017, he has also been an Adjunct Associate Professor with the Department of Informatics, University of Oslo, Norway. His research interests are in underwater acoustics, signal processing, imaging, and machine intelligence. He is particularly interested in applications in underwater remote sensing using sonar and marine robotics. Alan is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE) and an Associate Editor for the IEEE Journal of Oceanic Engineering.

Timothy Rogers, PhD, The University of Sheffield, UK

Dr. Timothy Rogers is a Senior Lecturer in the Dynamics Research Group at The University of Sheffield. His research interests lie at the intersection of data-driven analysis and engineering dynamics, including machine learning, particularly Bayesian methods, for signal processing, system identification, and monitoring of engineering systems. His research is motivated by combining engineering insight and understanding with the flexibility of models that can learn from data; in doing so, the interest is not only in the predictive accuracy of the developed models but also in the physical interpretability of the solutions they return.