Abstract
Fine particulate matter (PM2.5) is a major air pollutant in the Indo-Gangetic Basin (IGB), where concentrations frequently exceed national and WHO air quality standards. This study used ground observations from 183 CPCB automatic stations, together with MERRA-2 reanalysis products and meteorological variables, to analyse PM2.5 characteristics over the decade 2014–2023. A machine learning (ML) framework comprising Random Forest, Extra Trees, LightGBM, and a stacking ensemble model was developed to improve surface PM2.5 estimation in four major IGB cities: Delhi, Kanpur, Lucknow, and Patna. Raw MERRA-2 estimates systematically underestimated PM2.5, with R² values of only 0.28–0.42 and RMSE as high as 82 µg m⁻³. By contrast, the stacking ensemble achieved R² values of 0.79–0.82, FAC2 above 0.94, RMSE of 27–31 µg m⁻³, and near-zero bias (1.7–2.3 µg m⁻³). The model reproduced both extreme winter pollution episodes and monsoon conditions, highlighting the critical role of meteorological parameters such as boundary layer height, wind speed, and precipitation in regulating PM2.5 variability. Trajectory clustering and concentration-weighted trajectory (CWT) analysis showed that north-westerly transport contributes 55–65% of wintertime PM2.5 in Delhi, Kanpur, and Lucknow, while Patna is affected by both regional inflows and local sources. Major contributing regions include Punjab, Haryana, Rajasthan, and the Nepal plains, associated with crop residue burning and dust transport. By integrating ground observations, reanalysis data, meteorological predictors, and atmospheric transport analysis, this study provides a framework for improving PM2.5 prediction and identifying dominant pollution sources in the IGB. The results provide scientific evidence for designing both regional and city-specific mitigation strategies to reduce exposure in one of the world’s most polluted and densely populated regions.
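The evaluation statistics quoted in the abstract (R², RMSE, bias, and FAC2) can be illustrated with a minimal sketch. It rests on assumptions: R² is taken here as the squared Pearson correlation, because the abstract does not say which definition was used, and FAC2 is taken as the fraction of predictions within a factor of two of the observations. The function name `evaluation_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return R2, RMSE, mean bias, and FAC2 for paired PM2.5 values.

    Assumption: R2 is the squared Pearson correlation coefficient.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")

    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Squared Pearson correlation between observations and predictions.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("R2 is undefined when either series is constant")
    r2 = cov ** 2 / (var_obs * var_pred)

    # Root-mean-square error and mean bias (predicted minus observed).
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    bias = mean_pred - mean_obs

    # FAC2: share of pairs with 0.5 <= predicted/observed <= 2.
    # Pairs with non-positive observations are skipped to avoid division by zero.
    valid = [(o, p) for o, p in zip(observed, predicted) if o > 0]
    if not valid:
        raise ValueError("FAC2 needs at least one positive observation")
    fac2 = sum(1 for o, p in valid if 0.5 <= p / o <= 2.0) / len(valid)

    return {"r2": r2, "rmse": rmse, "bias": bias, "fac2": fac2}


# Hypothetical daily PM2.5 values in µg m⁻³, for illustration only.
observed = [100.0, 200.0, 50.0, 300.0]
predicted = [110.0, 180.0, 120.0, 290.0]
metrics = evaluation_metrics(observed, predicted)
print(metrics)
```

A positive bias under this convention means the model overestimates on average. A systematic underestimate, such as the one reported for raw MERRA-2, would appear as a negative bias.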
Data availability
The datasets and code generated and/or analysed during the current study are not publicly available due to institutional/data policy restrictions, but are available from the corresponding author on reasonable request.
Acknowledgements
The authors acknowledge the Central Pollution Control Board (CPCB), India, for providing long-term surface PM2.5 measurements from its monitoring network, which formed the observational backbone of this study. The authors express their gratitude to Manipal University Jaipur for providing open access funding for this publication. We also thank the NASA Global Modeling and Assimilation Office (GMAO) for providing the MERRA-2 reanalysis products and the meteorological variables used in this analysis. The computational facilities and research infrastructure provided by the authors’ host institutions are duly acknowledged. The integration of multi-source datasets with advanced machine learning frameworks was made possible through these resources. The authors also express their sincere respect and appreciation to the Director of the Indian Institute of Tropical Meteorology (IITM), Pune, Dr. A. Suryachandra Rao, for his guidance, encouragement, and continued support towards ongoing collaborative research in atmospheric and air quality sciences. The authors gratefully acknowledge the Ministry of Earth Sciences (MoES), Government of India, New Delhi, for its guidance, support, and the collaborative framework that facilitated this research. The authors also acknowledge the scientific discussions and constructive feedback from colleagues that helped refine the methodology and strengthen the interpretations presented in this work. We also express our appreciation for the ‘PyCaret’ machine learning framework, an open-source, low-code Python library that streamlines end-to-end ML workflows by automating data preparation, model training, comparison, and deployment.
Author statement
The views and conclusions presented in this article are solely those of the authors and do not necessarily represent the perspectives of their affiliated organizations. This work is entirely original, has not been submitted elsewhere, and the copyright of this article is exclusively held by the Scientific Reports Journal.
Funding
Open access funding provided by Manipal University Jaipur.
Author information
Contributions
CRediT taxonomy. **VS:** Conceptualization, Formal analysis, Visualization, Software, Validation, Writing – original draft, Writing – review & editing. **SS:** Visualization, Software. **NS:** Visualization, Software. **AS:** Data curation, Formal analysis, Visualization. **AS:** Data curation, Visualization. **AKS:** Supervision, Validation. **DSB:** Data curation, Visualization. **KP:** Visualization, Validation. **NS:** Software, Validation. **MA:** Supervision. **AC:** Supervision, Software, Validation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Singh, V., Singh, S., Sharma, N. et al. Estimation of surface PM2.5 over the Indo-Gangetic Basin using MERRA-2 reanalysis and machine learning. Sci Rep (2026). https://doi.org/10.1038/s41598-026-37934-9
DOI: https://doi.org/10.1038/s41598-026-37934-9