Machine learning estimates for G20 subnational urban GHG emissions from 2000–2020
  • Data Descriptor
  • Open access
  • Published: 19 February 2026

  • Ying Yu (ORCID: 0000-0003-4900-2077),
  • Xuewei Wang,
  • Diego Manya (ORCID: 0000-0001-8429-0954) &
  • Angel Hsu (ORCID: 0000-0003-4913-9479)

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Climate-change mitigation
  • Climate-change policy

Abstract

Reliable, comparable greenhouse gas (GHG) emissions data at the subnational level remain scarce, despite growing expectations for cities and regions to lead on climate action. Inconsistent reporting, methodological variation, and limited coverage of self-reported inventories hinder efforts to track progress and identify mitigation opportunities. To address these challenges, we develop a machine learning (ML) framework to estimate annual Scope 1 and 2 CO2-equivalent emissions for subnational jurisdictions in G20 countries from 2000 to 2020. Our approach integrates publicly available geospatial, socioeconomic, and environmental data with self-reported inventories where available, and aligns predictions with subnational administrative boundaries. Compared with traditional downscaling or proxy-based approaches, our model improves spatial relevance and predictive performance while capturing locally specific emission drivers. This globally consistent, administratively aligned dataset can serve as a baseline for assessing climate progress, especially where reporting is sparse or inconsistent, and supports more targeted, data-informed policy decisions for urban and regional decarbonization.

Data availability

Machine learning and plotting were performed using Python and R. The final trained machine learning model, data, and materials are publicly available via the Data-Driven EnviroLab Dataverse (https://doi.org/10.15139/S3/N5SVSP; ref. 46).

Code availability

Machine learning and plotting were performed using Python and R. The final trained machine learning model, data, and materials are publicly available via the Data-Driven EnviroLab Dataverse (https://doi.org/10.15139/S3/N5SVSP; ref. 46).

References

  1. UNFCCC. Global Climate Action Portal. https://climateaction.unfccc.int/ (2025).

  2. Net Zero Tracker. Net Zero Tracker. https://zerotracker.net/ (2025).

  3. Song, K., Burley Farr, K. & Hsu, A. Assessing subnational climate action in G20 cities and regions: Progress and ambition. One Earth 7, 2189–2203 (2024).

  4. Ibrahim, N., Sugar, L., Hoornweg, D. & Kennedy, C. Greenhouse gas emissions from cities: comparison of international inventory frameworks. Local Environ. 17, 223–241 (2012).

  5. Marcotullio, P., Sarzynski, A., Albrecht, J., Schulz, N. & Garcia, J. Assessing urban greenhouse gas emissions in European medium and large cities: Methodological considerations. https://academicworks.cuny.edu/hc_pubs/643/ (2016).

  6. Gurney, K. R. et al. Under-reporting of greenhouse gas emissions in U.S. cities. Nat. Commun. 12, 553 (2021).

  7. Crippa, M. et al. Insights into the spatial distribution of global, national, and subnational greenhouse gas emissions in the Emissions Database for Global Atmospheric Research (EDGAR v8.0). Earth Syst. Sci. Data 16, 2811–2830 (2024).

  8. Kuriakose, J., Jones, C., Anderson, K., McLachlan, C. & Broderick, J. What does the Paris climate change agreement mean for local policy? Downscaling the remaining global carbon budget to sub-national areas. Renew. Sustain. Energy Transit. 2, 100030 (2022).

  9. Huo, D. et al. Carbon Monitor Cities near-real-time daily estimates of CO2 emissions from 1500 cities worldwide. Sci. Data 9, 533 (2022).

  10. Moran, D. et al. Carbon footprints of 13 000 cities. Environ. Res. Lett. 13, 064041 (2018).

  11. Moran, D. et al. Estimating CO2 emissions for 108000 European cities. Earth Syst. Sci. Data 14, 845–864 (2022).

  12. Yu, Y., Manya, D. & Hsu, A. Bridging Territorial and Consumption-Based Emissions for Urban Climate Action Assessment. Preprint at https://doi.org/10.31223/X5PB02 (2025).

  13. Jin, Y. & Sharifi, A. Machine learning for predicting urban greenhouse gas emissions: A systematic literature review. Renew. Sustain. Energy Rev. 215, 115625 (2025).

  14. Dodman, D. Forces driving urban greenhouse gas emissions. Curr. Opin. Environ. Sustain. 3, 121–125 (2011).

  15. Marcotullio, P. J., Sarzynski, A., Albrecht, J., Schulz, N. & Garcia, J. The geography of global urban greenhouse gas emissions: an exploratory analysis. Clim. Change 121, 621–634 (2013).

  16. Dodman, D. et al. Cities, Settlements and Key Infrastructure. in Climate Change 2022: Impacts, Adaptation and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (eds. Pörtner, H.-O. et al.) 907–1040, https://doi.org/10.1017/9781009325844.008 (Cambridge University Press, Cambridge, UK and New York, NY, USA, 2022).

  17. Hsu, A., Wang, X., Tan, J., Toh, W. & Goyal, N. Predicting European cities’ climate mitigation performance using machine learning. Nat. Commun. 13, 7487 (2022).

  18. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).

  19. Feng, W. et al. Application of Neural Networks on Carbon Emission Prediction: A Systematic Review and Comparison. Energies 17, 1628 (2024).

  20. Lwasa, S. et al. Urban systems and other settlements. in Climate Change 2022: Mitigation of Climate Change. Contribution of Working Group III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (eds. Shukla, P. R. et al.) 861–952. https://doi.org/10.1017/9781009157926.010 (Cambridge University Press, Cambridge, UK and New York, NY, USA, 2022).

  21. GADM. Database of Global Administrative Areas. (2024).

  22. UNEP Emissions Gap Report 2024: No More Hot Air… Please! https://unepccc.org/emissions-gap-reports/ (2024).

  23. Hsu, A. et al. ClimActor, harmonized transnational data on climate network participation by city and regional governments. Sci. Data 7, 374 (2020).

  24. Manya, D. et al. ClimActor 2.0: A spatialized database of subnational climate pledges and emissions data. Preprint at https://doi.org/10.31223/X5BJ2S (2025).

  25. IPCC. Climate Change 2022: Mitigation of Climate Change. Contribution of Working Group III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. https://doi.org/10.1017/9781009157926 (Cambridge University Press, Cambridge, UK and New York, NY, USA, 2022).

  26. Kona, A. et al. Global Covenant of Mayors, a dataset of greenhouse gas emissions for 6200 cities in Europe and the Southern Mediterranean countries. Earth Syst. Sci. Data 13, 3551–3564 (2021).

  27. Data Driven Yale, NewClimate Institute, & PBL Environmental Assessment Agency. Global Climate Action from Cities, Regions, and Businesses: Individual Actors, Collective Initiatives and Their Impact on Global Greenhouse Gas Emissions. https://datadrivenlab.org/wp-content/uploads/2018/08/YALE-NCI-PBL_Global_climate_action.pdf (2018).

  28. World Bank. Total greenhouse gas emissions excluding LULUCF per capita. (2023).

  29. Oda, T., Maksyutov, S. & Andres, R. J. The Open-source Data Inventory for Anthropogenic CO2, gridded emissions data product for tracer transport simulations and surface flux inversions. Earth Syst. Sci. Data 10, 87–107 (2018).

  30. Bosilovich, M. G., Lucchesi, R. & Suarez, M. MERRA-2: FileSpecification. GMAO Office Note No. 9 (Version 1.1), 73 pp, available from http://gmao.gsfc.nasa.gov/pubs/office_notes (2016).

  31. Spinoni, J. et al. Changes of heating and cooling degree-days in Europe from 1981 to 2100. Int. J. Climatol. 38, e191–e208 (2018).

  32. Engel-Cox, J., Kim Oanh, N. T., van Donkelaar, A., Martin, R. V. & Zell, E. Toward the next generation of air quality monitoring: Particulate Matter. Atmos. Environ. 80, 584–590 (2013).

  33. van Donkelaar, A. et al. Monthly Global Estimates of Fine Particulate Matter and Their Uncertainty. Environ. Sci. Technol. 55, 15287–15300 (2021).

  34. Hammer, M. S. et al. Assessment of the impact of discontinuity in satellite instruments and retrievals on global PM2.5 estimates. Remote Sens. Environ. 294, 113624 (2023).

  35. Cooper, M. J. et al. Global fine-scale changes in ambient NO2 during COVID-19 lockdowns. Nature 601, 380–387 (2022).

  36. Global Modeling And Assimilation Office & Pawson, S. MERRA-2 tavgM_2d_aer_Nx: 2d,Monthly mean,Time-averaged,Single-Level,Assimilation,Aerosol Diagnostics V5.12.4. NASA Goddard Earth Sciences Data and Information Services Center https://doi.org/10.5067/FH9A0MLJPC7N (2015).

  37. Qi, L. & Wang, S. Fossil fuel combustion and biomass burning sources of global black carbon from GEOS-Chem simulation and carbon isotope measurements. Atmospheric Chem. Phys. 19, 11545–11557 (2019).

  38. Stanelle, T., Bey, I., Raddatz, T., Reick, C. & Tegen, I. Anthropogenically induced changes in twentieth century mineral dust burden and the associated impact on radiative forcing. J. Geophys. Res. Atmospheres 119, 13,526–13,546 (2014).

  39. Chen, J. et al. Global 1 km × 1 km gridded revised real gross domestic product and electricity consumption during 1992–2019 based on calibrated nighttime light data. Sci. Data 9, 202 (2022).

  40. Schiavina, M., Melchiorri, M. & Freire, S. GHS-DUC R2023A - GHS Degree of Urbanisation Classification, application of the Degree of Urbanisation methodology (stage II) to GADM 4.1 layer, multitemporal (1975–2030). European Commission, Joint Research Centre (JRC) https://doi.org/10.2905/DC0EB21D-472C-4F5A-8846-823C50836305 (2023).

  41. Kummu, M., Kosonen, M. & Masoumzadeh Sayyar, S. Downscaled gridded global dataset for gross domestic product (GDP) per capita PPP over 1990–2022. Sci. Data 12, 178 (2025).

  42. Perry, M. rasterstats (2021).

  43. Yu, Y., Li, X., Hsu, A. & Kittner, N. Mapping Spatiotemporal Disparities in Residential Electricity Inequality Using Machine Learning. Environ. Sci. Technol. 58, 19999–20008 (2024).

  44. Erickson, N. et al. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. Preprint at https://doi.org/10.48550/arXiv.2003.06505 (2020).

  45. Lundberg, S. M. & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. in Advances in Neural Information Processing Systems 30 (Curran Associates, Inc., 2017).

  46. Wang, X. A global machine learning model for predicting urban greenhouse gas predictions in the G20 from 2000-2020. UNC Dataverse https://doi.org/10.15139/S3/N5SVSP (2025).

  47. EPA. Inventory of U.S. Greenhouse Gas Emissions and Sinks: 1990-2022. https://www.epa.gov/ghgemissions/inventory-us-greenhouse-gas-emissions-and-sinks-1990-2022 (2024).

  48. UNFCCC. GHG data from UNFCCC. https://unfccc.int/topics/mitigation/resources/registry-and-data/ghg-data-from-unfccc (2025).

  49. Global Covenant of Mayors for Climate & Energy. Data Portal for Cities. http://www.dataportalforcities.org (2025).

  50. Dou, X. et al. Near-real-time global gridded daily CO2 emissions 2021. Sci. Data 10, 69 (2023).

  51. Hüllermeier, E. & Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach. Learn. 110, 457–506 (2021).

Acknowledgements

The authors thank Noah Civiletti, Emma Holmes, and Izzy Bukovnik for assistance with data extraction. This work was funded by grants from the IKEA Foundation (Grant no. G-2306-02289) and the National Science Foundation (Grant no. 2216592) to A. Hsu.

Author information

Author notes
  1. Ying Yu

    Present address: School of Humanities and Social Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen, 518172, China

  2. These authors contributed equally: Ying Yu, Xuewei Wang, Diego Manya.

Authors and Affiliations

  1. Data-Driven EnviroLab, Institute for the Environment, University of North Carolina at Chapel Hill, Chapel Hill, 27516, USA

    Ying Yu, Xuewei Wang, Diego Manya & Angel Hsu

  2. Department of Public Policy, University of North Carolina at Chapel Hill, Chapel Hill, 27599, USA

    Xuewei Wang & Angel Hsu

Contributions

Y.Y., X.W. and D.M. contributed equally to the study. Y.Y., X.W., and D.M. collected data and conducted the analysis. X.W., D.M., and A.H. cleaned and compiled the data. Y.Y. and X.W. contributed to the method. A.H. conceptualized and supervised the study. Y.Y., X.W., D.M., and A.H. wrote the manuscript. All coauthors reviewed, edited, and approved the manuscript.

Corresponding author

Correspondence to Angel Hsu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Online Only Table 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yu, Y., Wang, X., Manya, D. et al. Machine learning estimates for G20 subnational urban GHG emissions from 2000–2020. Sci Data (2026). https://doi.org/10.1038/s41597-026-06691-9

  • Received: 21 July 2025

  • Accepted: 23 January 2026

  • Published: 19 February 2026

  • DOI: https://doi.org/10.1038/s41597-026-06691-9
