Abstract
Precise forecasting of the Air Quality Index (AQI) is essential for environmental management and public health protection. However, the non-linear and non-stationary nature of AQI time series presents a significant challenge for traditional predictive models. Most current deep learning approaches still face limitations in feature extraction and rely on inefficient manual hyperparameter tuning. To address these constraints, this study proposes an integrated forecasting framework combining Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), Convolutional Neural Networks (CNN), Bidirectional Gated Recurrent Units (BiGRU), and an Attention mechanism. The methodology begins by using CEEMDAN to decompose the complex AQI signals into multiple stable frequency components, which effectively reduces the impact of data noise. Each component is then processed by a hybrid sub-network where CNNs extract local features and BiGRU units capture long-term temporal dependencies. An attention layer is incorporated to dynamically assign weights to critical time steps. Furthermore, an Improved Grey Wolf Optimizer (IGWO) is introduced to automate the hyperparameter search, ensuring optimal network performance without manual intervention. Experimental results on a long-term dataset from Guangzhou (2014–2024) show that the proposed model achieves an MSE of 10.2456 and a coefficient of determination (R²) of 0.9615. These findings, supported by detailed ablation studies and cross-city generalization tests, demonstrate that the model is both robust and accurate for real-world air quality early-warning systems.
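As a rough illustration of the automated hyperparameter search mentioned above, the following is a minimal sketch of the standard Grey Wolf Optimizer (Mirjalili et al., 2014) in plain Python. It is not the IGWO variant used in this study, whose modifications are not described in the abstract. The objective `toy_validation_loss`, the two searched quantities (log10 of the learning rate and the number of hidden units), and their bounds are hypothetical stand-ins for training the network and measuring validation error.

```python
import random


def grey_wolf_optimize(objective, bounds, n_wolves=20, n_iters=100, seed=0):
    """Minimise `objective` over box `bounds` with a textbook-style GWO."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    start = min(wolves, key=objective)
    best_pos, best_val = list(start), objective(start)

    for t in range(n_iters):
        # The three fittest wolves (alpha, beta, delta) guide the pack.
        alpha, beta, delta = sorted(wolves, key=objective)[:3]
        # Control parameter `a` decays linearly from 2 toward 0, shifting
        # the search from exploration to exploitation.
        a = 2.0 * (1.0 - t / n_iters)
        new_wolves = []
        for wolf in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    estimate += leader[d] - A * abs(C * leader[d] - wolf[d])
                # Average the three leader-guided moves and clip to the box.
                position.append(min(max(estimate / 3.0, lo), hi))
            new_wolves.append(position)
        wolves = new_wolves
        # Remember the best position seen so far (a small elitist addition).
        candidate = min(wolves, key=objective)
        candidate_val = objective(candidate)
        if candidate_val < best_val:
            best_pos, best_val = list(candidate), candidate_val
    return best_pos, best_val


def toy_validation_loss(params):
    """Hypothetical stand-in for a validation loss; minimum at (-3, 64)."""
    log_lr, units = params
    return (log_lr + 3.0) ** 2 + ((units - 64.0) / 32.0) ** 2


# Assumed search ranges: log10(learning rate) and number of hidden units.
search_bounds = [(-5.0, -1.0), (16.0, 128.0)]
best_params, best_loss = grey_wolf_optimize(toy_validation_loss, search_bounds)
print(best_params, best_loss)
```

In a real pipeline the objective would train the forecasting network for each candidate setting and return its validation error, so each evaluation is expensive and the pack size and iteration count would need to be chosen accordingly.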
Data availability
The datasets used in this study are publicly available from the Tianqi Houbao (AQI) database at https://www.tianqihoubao.com. Descriptive statistics, preprocessing details, sensitivity analyses, and additional results are provided in the Supplementary Material (Tables S1–S7, Figures S1–S4).
Funding
This work was supported by the Guangdong University of Science and Technology Doctoral Startup Fund Project (No. GKY-2025BSQDK-14).
Author information
Contributions
YF conceived the study, proposed the hybrid forecasting framework, and led the overall implementation. SPL and ZHS contributed to the methodological design and the development of the optimization strategy. YF and SPL conducted the experiments and performed the formal analysis of the results. YF drafted the manuscript and handled project correspondence. All authors reviewed and approved the final version of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fang, Y., Liu, S. & Su, Z. Air quality index prediction using a hybrid CEEMDAN-CNN-IGWO-BiGRU-Attention model. Sci Rep (2026). https://doi.org/10.1038/s41598-026-46978-w
DOI: https://doi.org/10.1038/s41598-026-46978-w


