Abstract
Reliable, comparable greenhouse gas (GHG) emissions data at the subnational level remain scarce, despite growing expectations for cities and regions to lead on climate action. Inconsistent reporting, methodological variation, and the limited coverage of self-reported inventories hinder efforts to track progress and identify mitigation opportunities. To address these challenges, we develop a machine learning (ML) framework to estimate annual Scope 1 and 2 CO2-equivalent emissions for subnational jurisdictions in G20 countries from 2000 to 2020. Our approach integrates publicly available geospatial, socioeconomic, and environmental data with self-reported inventories where available, and aligns predictions with subnational administrative boundaries. Compared to traditional downscaling or proxy-based approaches, our model improves spatial relevance and predictive performance while capturing locally specific emission drivers. This globally consistent, administratively aligned dataset can serve as a baseline for assessing climate progress, especially where data are scarce or reporting is inconsistent, and supports more targeted, data-informed policy decisions for urban and regional decarbonization.
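For context, the proxy-based downscaling that the abstract uses as a point of comparison can be illustrated with a minimal sketch. This is not the authors' model. It shows only the generic baseline idea of splitting a national total across regions in proportion to a single proxy such as population. The function name, region names, and all numbers are hypothetical.

```python
def downscale_by_proxy(national_total, proxy_by_region):
    """Allocate a national emissions total to regions in proportion to a proxy.

    Generic illustration of proxy-based downscaling; not the paper's ML framework.
    """
    proxy_sum = sum(proxy_by_region.values())
    if proxy_sum <= 0:
        raise ValueError("proxy values must sum to a positive number")
    return {
        region: national_total * value / proxy_sum
        for region, value in proxy_by_region.items()
    }


# Hypothetical inputs: a national total (Mt CO2e) and regional population (millions).
population = {"Region A": 6.0, "Region B": 3.0, "Region C": 1.0}
estimates = downscale_by_proxy(500.0, population)

for region, value in estimates.items():
    print(f"{region}: {value:.1f} Mt CO2e")
```

Under this scheme every region receives the same emissions per unit of proxy, so differences in local drivers (for example, industrial structure or energy mix) cannot appear in the estimates. That limitation is what the abstract points to when it contrasts such approaches with a model that captures locally specific emission drivers.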
Data availability
Machine learning and plotting were performed using Python and R. The final trained machine learning model, data, and materials are publicly available via the Data-Driven EnviroLab Dataverse [https://doi.org/10.15139/S3/N5SVSP]46.
Code availability
Machine learning and plotting were performed using Python and R. The final trained machine learning model, data, and materials are publicly available via the Data-Driven EnviroLab Dataverse [https://doi.org/10.15139/S3/N5SVSP]46.
References
UNFCCC. Global Climate Action Portal. https://climateaction.unfccc.int/ (2025).
Net Zero Tracker. Net Zero Tracker. https://zerotracker.net/ (2025).
Song, K., Burley Farr, K. & Hsu, A. Assessing subnational climate action in G20 cities and regions: Progress and ambition. One Earth 7, 2189–2203 (2024).
Ibrahim, N., Sugar, L., Hoornweg, D. & Kennedy, C. Greenhouse gas emissions from cities: comparison of international inventory frameworks. Local Environ. 17, 223–241 (2012).
Marcotullio, P., Sarzynski, A., Albrecht, J., Schulz, N. & Garcia, J. Assessing urban greenhouse gas emissions in European medium and large cities: Methodological considerations. https://academicworks.cuny.edu/hc_pubs/643/ (2016).
Gurney, K. R. et al. Under-reporting of greenhouse gas emissions in U.S. cities. Nat. Commun. 12, 553 (2021).
Crippa, M. et al. Insights into the spatial distribution of global, national, and subnational greenhouse gas emissions in the Emissions Database for Global Atmospheric Research (EDGAR v8.0). Earth Syst. Sci. Data 16, 2811–2830 (2024).
Kuriakose, J., Jones, C., Anderson, K., McLachlan, C. & Broderick, J. What does the Paris climate change agreement mean for local policy? Downscaling the remaining global carbon budget to sub-national areas. Renew. Sustain. Energy Transit. 2, 100030 (2022).
Huo, D. et al. Carbon Monitor Cities near-real-time daily estimates of CO2 emissions from 1500 cities worldwide. Sci. Data 9, 533 (2022).
Moran, D. et al. Carbon footprints of 13 000 cities. Environ. Res. Lett. 13, 064041 (2018).
Moran, D. et al. Estimating CO2 emissions for 108000 European cities. Earth Syst. Sci. Data 14, 845–864 (2022).
Yu, Y., Manya, D. & Hsu, A. Bridging Territorial and Consumption-Based Emissions for Urban Climate Action Assessment. Preprint at https://doi.org/10.31223/X5PB02 (2025).
Jin, Y. & Sharifi, A. Machine learning for predicting urban greenhouse gas emissions: A systematic literature review. Renew. Sustain. Energy Rev. 215, 115625 (2025).
Dodman, D. Forces driving urban greenhouse gas emissions. Curr. Opin. Environ. Sustain. 3, 121–125 (2011).
Marcotullio, P. J., Sarzynski, A., Albrecht, J., Schulz, N. & Garcia, J. The geography of global urban greenhouse gas emissions: an exploratory analysis. Clim. Change 121, 621–634 (2013).
Dodman, D. et al. Cities, Settlements and Key Infrastructure. in Climate Change 2022: Impacts, Adaptation and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (eds. Pörtner, H.-O. et al.) 907–1040. https://doi.org/10.1017/9781009325844.008 (Cambridge University Press, Cambridge, UK and New York, NY, USA, 2022).
Hsu, A., Wang, X., Tan, J., Toh, W. & Goyal, N. Predicting European cities’ climate mitigation performance using machine learning. Nat. Commun. 13, 7487 (2022).
Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).
Feng, W. et al. Application of Neural Networks on Carbon Emission Prediction: A Systematic Review and Comparison. Energies 17, 1628 (2024).
Lwasa, S. et al. Urban systems and other settlements. in Climate Change 2022: Mitigation of Climate Change. Contribution of Working Group III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (eds. Shukla, P. R. et al.) 861–952. https://doi.org/10.1017/9781009157926.010 (Cambridge University Press, Cambridge, UK and New York, NY, USA, 2022).
GADM. Database of Global Administrative Areas. (2024).
UNEP Emissions Gap Report 2024: No More Hot Air… Please! https://unepccc.org/emissions-gap-reports/ (2024).
Hsu, A. et al. ClimActor, harmonized transnational data on climate network participation by city and regional governments. Sci. Data 7, 374 (2020).
Manya, D. et al. ClimActor 2.0: A spatialized database of subnational climate pledges and emissions data. Preprint at https://doi.org/10.31223/X5BJ2S (2025).
IPCC. Climate Change 2022: Mitigation of Climate Change. Contribution of Working Group III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. https://doi.org/10.1017/9781009157926 (Cambridge University Press, Cambridge, UK and New York, NY, USA, 2022).
Kona, A. et al. Global Covenant of Mayors, a dataset of greenhouse gas emissions for 6200 cities in Europe and the Southern Mediterranean countries. Earth Syst. Sci. Data 13, 3551–3564 (2021).
Data Driven Yale, NewClimate Institute, & PBL Environmental Assessment Agency. Global Climate Action from Cities, Regions, and Businesses: Individual Actors, Collective Initiatives and Their Impact on Global Greenhouse Gas Emissions. https://datadrivenlab.org/wp-content/uploads/2018/08/YALE-NCI-PBL_Global_climate_action.pdf (2018).
World Bank. Total greenhouse gas emissions excluding LULUCF per capita. (2023).
Oda, T., Maksyutov, S. & Andres, R. J. The Open-source Data Inventory for Anthropogenic CO2, gridded emissions data product for tracer transport simulations and surface flux inversions. Earth Syst. Sci. Data 10, 87–107 (2018).
Bosilovich, M. G., Lucchesi, R. & Suarez, M. MERRA-2: File Specification. GMAO Office Note No. 9 (Version 1.1), 73 pp, available from http://gmao.gsfc.nasa.gov/pubs/office_notes (2016).
Spinoni, J. et al. Changes of heating and cooling degree-days in Europe from 1981 to 2100. Int. J. Climatol. 38, e191–e208 (2018).
Engel-Cox, J., Kim Oanh, N. T., van Donkelaar, A., Martin, R. V. & Zell, E. Toward the next generation of air quality monitoring: Particulate Matter. Atmos. Environ. 80, 584–590 (2013).
van Donkelaar, A. et al. Monthly Global Estimates of Fine Particulate Matter and Their Uncertainty. Environ. Sci. Technol. 55, 15287–15300 (2021).
Hammer, M. S. et al. Assessment of the impact of discontinuity in satellite instruments and retrievals on global PM2.5 estimates. Remote Sens. Environ. 294, 113624 (2023).
Cooper, M. J. et al. Global fine-scale changes in ambient NO2 during COVID-19 lockdowns. Nature 601, 380–387 (2022).
Global Modeling And Assimilation Office & Pawson, S. MERRA-2 tavgM_2d_aer_Nx: 2d,Monthly mean,Time-averaged,Single-Level,Assimilation,Aerosol Diagnostics V5.12.4. NASA Goddard Earth Sciences Data and Information Services Center https://doi.org/10.5067/FH9A0MLJPC7N (2015).
Qi, L. & Wang, S. Fossil fuel combustion and biomass burning sources of global black carbon from GEOS-Chem simulation and carbon isotope measurements. Atmospheric Chem. Phys. 19, 11545–11557 (2019).
Stanelle, T., Bey, I., Raddatz, T., Reick, C. & Tegen, I. Anthropogenically induced changes in twentieth century mineral dust burden and the associated impact on radiative forcing. J. Geophys. Res. Atmospheres 119, 13,526–13,546 (2014).
Chen, J. et al. Global 1 km × 1 km gridded revised real gross domestic product and electricity consumption during 1992–2019 based on calibrated nighttime light data. Sci. Data 9, 202 (2022).
Schiavina, M., Melchiorri, M. & Freire, S. GHS-DUC R2023A - GHS Degree of Urbanisation Classification, application of the Degree of Urbanisation methodology (stage II) to GADM 4.1 layer, multitemporal (1975–2030). European Commission, Joint Research Centre (JRC) https://doi.org/10.2905/DC0EB21D-472C-4F5A-8846-823C50836305 (2023).
Kummu, M., Kosonen, M. & Masoumzadeh Sayyar, S. Downscaled gridded global dataset for gross domestic product (GDP) per capita PPP over 1990–2022. Sci. Data 12, 178 (2025).
Perry, M. rasterstats (2021).
Yu, Y., Li, X., Hsu, A. & Kittner, N. Mapping Spatiotemporal Disparities in Residential Electricity Inequality Using Machine Learning. Environ. Sci. Technol. 58, 19999–20008 (2024).
Erickson, N. et al. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. Preprint at https://doi.org/10.48550/arXiv.2003.06505 (2020).
Lundberg, S. M. & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. in Advances in Neural Information Processing Systems 30 (Curran Associates, Inc., 2017).
Wang, X. A global machine learning model for predicting urban greenhouse gas predictions in the G20 from 2000-2020. UNC Dataverse https://doi.org/10.15139/S3/N5SVSP (2025).
EPA. Inventory of U.S. Greenhouse Gas Emissions and Sinks: 1990-2022. https://www.epa.gov/ghgemissions/inventory-us-greenhouse-gas-emissions-and-sinks-1990-2022 (2024).
UNFCCC. GHG data from UNFCCC. https://unfccc.int/topics/mitigation/resources/registry-and-data/ghg-data-from-unfccc (2025).
Global Covenant of Mayors for Climate & Energy. Data Portal for Cities. http://www.dataportalforcities.org (2025).
Dou, X. et al. Near-real-time global gridded daily CO2 emissions 2021. Sci. Data 10, 69 (2023).
Hüllermeier, E. & Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach. Learn. 110, 457–506 (2021).
Acknowledgements
The authors thank Noah Civiletti, Emma Holmes, and Izzy Bukovnik for assistance with data extraction. This work was funded by the IKEA Foundation (Grant no. G-2306-02289) and the National Science Foundation (Grant no. 2216592), both awarded to A. Hsu.
Author information
Contributions
Y.Y., X.W. and D.M. contributed equally to the study. Y.Y., X.W., and D.M. collected data and conducted the analysis. X.W., D.M., and A.H. cleaned and compiled the data. Y.Y. and X.W. contributed to the method. A.H. conceptualized and supervised the study. Y.Y., X.W., D.M., and A.H. wrote the manuscript. All coauthors reviewed, edited, and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yu, Y., Wang, X., Manya, D. et al. Machine learning estimates for G20 subnational urban GHG emissions from 2000–2020. Sci Data (2026). https://doi.org/10.1038/s41597-026-06691-9
DOI: https://doi.org/10.1038/s41597-026-06691-9


