Abstract
Accurate Air Quality Index (AQI) classification is essential for environmental surveillance and public health decision-making. Using a publicly available daily U.S. county-level dataset with six AQI categories (Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, Hazardous), we conducted a benchmarking study of multi-class AQI classifiers. Data preprocessing included missing-value imputation and class balancing via the Synthetic Minority Over-sampling Technique (SMOTE). We trained and evaluated six models: Random Forest (RF), Extra Trees (ET), K-Nearest Neighbors (KNN), Naive Bayes (NB), Logistic Regression (LR), and a Multi-Layer Perceptron (MLP). Performance was assessed using cross-validation accuracy, test accuracy, macro-averaged recall, F1-score, and ROC-AUC. The ensemble methods (RF, ET) and the MLP consistently outperformed the traditional baselines. RF achieved 99.3% test accuracy with perfect recall, F1-score, and ROC-AUC; the MLP achieved 99.0% test accuracy. A stacking ensemble, optimized with a hybrid Particle Swarm–Grey Wolf Optimizer (PSO–GWO), delivered 99.99% test accuracy, 99.99% macro-averaged recall, and 1.0000 ROC-AUC. These findings indicate that combining ensemble learning with metaheuristic optimization can substantially improve multi-class AQI classification performance and offers a practical path toward reliable, real-time air-quality assessment.
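To make the idea of a hybrid PSO–GWO search concrete, the following is a minimal sketch of one possible hybridization, not the authors' implementation. It assumes a scheme in which the three best agents (the GWO leaders) each propose a guided position, and those proposals drive a PSO-style velocity update. The abstract does not state what the optimizer tunes in the stacking ensemble, so the sketch minimizes a stand-in objective; the names `sphere` and `pso_gwo_minimize`, and all parameter values (`w`, `c`, bounds, population size), are hypothetical.

```python
import random


def sphere(x):
    """Stand-in objective (hypothetical): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso_gwo_minimize(objective, dim, n_agents=20, n_iters=100,
                     lower=-5.0, upper=5.0, w=0.5, c=0.5, seed=0):
    """Minimize `objective` with an assumed PSO-GWO hybrid; needs n_agents >= 3."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]

    # Best solution seen so far; it is only ever replaced by a better one.
    best_pos = min(pos, key=objective)[:]
    best_val = objective(best_pos)
    history = [best_val]

    for t in range(n_iters):
        # GWO part: copy the three best agents (alpha, beta, delta) as leaders.
        leaders = [p[:] for p in sorted(pos, key=objective)[:3]]
        a = 2.0 * (1.0 - t / n_iters)  # decreases linearly from 2 toward 0

        for i in range(n_agents):
            for d in range(dim):
                x = pos[i][d]
                pull = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    # Position suggested by this leader (GWO encircling step).
                    guided = leader[d] - A * abs(C * leader[d] - x)
                    pull += c * rng.random() * (guided - x)
                # PSO part: inertia-weighted velocity update, then a clamped move.
                vel[i][d] = w * (vel[i][d] + pull)
                pos[i][d] = min(upper, max(lower, x + vel[i][d]))

            val = objective(pos[i])
            if val < best_val:
                best_val, best_pos = val, pos[i][:]
        history.append(best_val)

    return best_pos, best_val, history


if __name__ == "__main__":
    solution, value, trace = pso_gwo_minimize(sphere, dim=3)
    print("best value:", value)
```

In an actual tuning setup, the stand-in objective would be replaced by a function that maps a candidate vector to a validation loss (for example, one minus cross-validated accuracy of the stacked model), which is again an assumption about how such an optimizer could be applied rather than a description of this study.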
Data availability
The data that support the findings of this study are available at https://www.epa.gov/outdoor-air-quality-data/air-quality-index-report.
Code availability
The code used in this study is available from the corresponding author upon reasonable request.
Acknowledgements
The researchers thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2026).
Author information
Contributions
Yasser Fouad conceived and designed the study; developed the methodology; implemented the software; performed validation and formal analysis; conducted the investigation; curated the data; wrote the original draft; contributed to review and editing; and prepared the visualizations. Emad Elabd conceived and designed the study; contributed to the methodology; conducted the investigation; reviewed and edited the manuscript. M. A. Mohamed Ali contributed to the methodology, performed validation and formal analysis, and participated in review and editing. Hany Mohamed Hamouda provided resources, curated the data, contributed to visualization, and participated in review and editing. A S Hamid participated in review and editing. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Elabd, E., Hamouda, H.M., Ali, M.A.M. et al. Air quality index AQI classification based on hybrid particle swarm and grey wolf optimization with ensemble machine learning model. Sci Rep (2026). https://doi.org/10.1038/s41598-025-34278-8
DOI: https://doi.org/10.1038/s41598-025-34278-8