Air quality index AQI classification based on hybrid particle swarm and grey wolf optimization with ensemble machine learning model

Elabd, Emad; Hamouda, Hany Mohamed; Ali, M. A. Mohamed; Hamid, A. S.; Fouad, Yasser

doi:10.1038/s41598-025-34278-8

Article
Open access
Published: 05 January 2026

Emad Elabd¹,²,
Hany Mohamed Hamouda¹,
M. A. Mohamed Ali³,
A. S. Hamid⁴ &
…
Yasser Fouad⁵

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Accurate Air Quality Index (AQI) classification is essential for environmental surveillance and public health decision-making. Using a publicly available daily U.S. county-level dataset with six AQI categories (Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, Hazardous), we conducted a comprehensive benchmarking study. Data preprocessing included missing-value imputation and class balancing via Synthetic Minority Over-sampling Technique (SMOTE). We trained and evaluated classical and deep models (Random Forest (RF), Extra Trees (ET), K-Nearest Neighbors (KNN), Naive Bayes (NB), Logistic Regression (LR), and a Multi-Layer Perceptron (MLP)) and assessed performance using cross-validation accuracy, test accuracy, macro-averaged recall, F1-score, and ROC-AUC. Ensemble methods (RF, ET) and the MLP consistently outperformed traditional baselines. RF achieved 99.3% test accuracy with perfect recall, F1-score, and ROC-AUC; MLP achieved 99.0% test accuracy. A stacking ensemble, optimized with a hybrid Particle Swarm–Grey Wolf Optimizer (PSO–GWO), delivered 99.99% test accuracy, 99.99% macro-averaged recall, and 1.0000 ROC-AUC. These findings demonstrate that combining ensemble learning with metaheuristic optimization can substantially enhance multi-class AQI classification performance and offer a practical path toward reliable, real-time air-quality assessment.
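
The abstract does not spell out how the particle swarm and grey wolf components are combined, or which quantities of the stacking ensemble they tune. As a rough illustration only, the sketch below shows one common way to hybridize the two: each agent keeps a PSO-style velocity, but is pulled toward guide points computed from the three best agents (the GWO alpha, beta, and delta leaders). The function name `hybrid_pso_gwo`, its parameters, the coefficient ranges, and the toy sphere objective are all assumptions made for this example; in a setting like the study's, the objective might instead be a validation error of the ensemble as a function of its hyperparameters.

```python
import random


def hybrid_pso_gwo(objective, bounds, n_agents=20, n_iter=100, seed=0):
    """Minimise `objective` over a box with a PSO velocity update guided by GWO leaders.

    Illustrative sketch only; not the formulation used in the paper.
    `bounds` is a list of (low, high) pairs, one per dimension; n_agents must be >= 3.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random start positions inside the box, zero start velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_x, best_f = None, float("inf")

    for t in range(n_iter + 1):
        # Rank the current population and remember the best point seen so far.
        scored = sorted(((objective(x), x) for x in pos), key=lambda p: p[0])
        if scored[0][0] < best_f:
            best_f, best_x = scored[0][0], list(scored[0][1])
        if t == n_iter:
            break  # final population has been evaluated

        # Copy the three best agents: the GWO alpha, beta and delta leaders.
        leaders = [list(x) for _, x in scored[:3]]
        a = 2.0 * (1.0 - t / n_iter)   # GWO exploration factor, decays toward 0
        w = 0.5 + rng.random() / 2.0   # PSO inertia weight in [0.5, 1.0)

        for i in range(n_agents):
            for d in range(dim):
                lo, hi = bounds[d]
                pull = 0.0
                for leader in leaders:
                    A = a * (2.0 * rng.random() - 1.0)  # in [-a, a)
                    C = 2.0 * rng.random()              # in [0, 2)
                    # GWO-style guide point around this leader.
                    guide = leader[d] - A * abs(C * leader[d] - w * pos[i][d])
                    # PSO-style attraction toward the guide point.
                    pull += 0.5 * rng.random() * (guide - pos[i][d])
                vel[i][d] = w * (vel[i][d] + pull)
                # Move, then clip back into the search box.
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))

    return best_x, best_f


if __name__ == "__main__":
    # Toy objective (sphere function) standing in for a validation error.
    def sphere(x):
        return sum(v * v for v in x)

    x_best, f_best = hybrid_pso_gwo(sphere, [(-5.0, 5.0)] * 2, n_agents=30, n_iter=200)
    print(x_best, f_best)
```

The decaying factor `a` shifts the search from exploration to exploitation as in grey wolf optimization, while the velocity term carries momentum between iterations as in particle swarm optimization. Tracking the best point separately from the population means the returned solution never gets worse from one iteration to the next.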

Data availability

The data that support the findings of this study are available at https://www.epa.gov/outdoor-air-quality-data/air-quality-index-report.

Code availability

The code used in this study is available from the corresponding author upon reasonable request.

Acknowledgements

The researchers thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2026).

Author information

Authors and Affiliations

Department of Management Information Systems, College of Business and Economics, Qassim University, Buraidah, Qassim, 51452, Saudi Arabia
Emad Elabd & Hany Mohamed Hamouda
Department of Information Systems, Faculty of Computers and Information, Menoufia University, Shebin El Kom, Egypt
Emad Elabd
Department of Mathematics, College of Science, Qassim University, Buraidah, Qassim, 51452, Saudi Arabia
M. A. Mohamed Ali
Department of Physics, College of Science, Qassim University, Buraidah, Qassim, 51452, Saudi Arabia
A. S. Hamid
Department of Computer Science, Faculty of Computers and Information, Suez University, P.O.Box:43221, Suez, Egypt
Yasser Fouad

Contributions

Yasser Fouad conceived and designed the study; developed the methodology; implemented the software; performed validation and formal analysis; conducted the investigation; curated the data; wrote the original draft; contributed to review and editing; and prepared the visualizations. Emad Elabd conceived and designed the study; contributed to the methodology; conducted the investigation; and reviewed and edited the manuscript. M. A. Mohamed Ali contributed to the methodology, performed validation and formal analysis, and participated in review and editing. Hany Mohamed Hamouda provided resources, curated the data, contributed to visualization, and participated in review and editing. A. S. Hamid participated in review and editing. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Emad Elabd or Yasser Fouad.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Elabd, E., Hamouda, H.M., Ali, M.A.M. et al. Air quality index AQI classification based on hybrid particle swarm and grey wolf optimization with ensemble machine learning model. Sci Rep (2026). https://doi.org/10.1038/s41598-025-34278-8

Received: 06 September 2025
Accepted: 26 December 2025
Published: 05 January 2026
DOI: https://doi.org/10.1038/s41598-025-34278-8
