Scientific Data
Global emission factor dataset for Scope 3 machine learning applications
  • Data Descriptor
  • Open access
  • Published: 04 February 2026

  • Yanming Guo¹ (ORCID: orcid.org/0009-0001-8501-0640),
  • Charles Guan² &
  • Jin Ma¹

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Climate-change mitigation
  • Environmental economics

Abstract

The accurate and transparent estimation of greenhouse gas emissions is essential for corporate sustainability reporting and machine learning applications. Existing emission-factor datasets have restrictive licenses, insufficient spatiotemporal granularity, or outdated information, which limits their reproducibility and utility across disciplines. We present ExioML, an open-source dataset derived from EXIOBASE 3.8.2. It pairs environmentally extended multi-regional input-output tables with a graphics processing unit (GPU)-accelerated computational toolkit, which makes it easier to combine with, and extend to, other datasets. ExioML provides sector-level emission-factor data for 49 regions and 28 years (1995 to 2022), structured into two aggregation schemes: a product-by-product format covering 200 categories and an industry-by-industry format covering 163 categories. To validate dataset usability and establish a reproducible baseline, we define a regression task for predicting sectoral greenhouse gas emissions. The task is evaluated with tree-based and neural-network-based models, using mean squared error as the evaluation metric. ExioML thus provides openly accessible emission-factor tables and a reproducible baseline intended to support reuse and benchmarking across sustainability and machine-learning studies.
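The dimensions and the evaluation metric quoted in the abstract can be made concrete with a small Python sketch. The function names (`nominal_rows`, `mean_squared_error`) and the toy values are illustrative assumptions and are not part of the ExioML toolkit. The row counts are nominal upper bounds for a complete region-year-sector grid; the published files may contain fewer rows.

```python
from statistics import fmean

# Dimensions as quoted in the abstract.
N_REGIONS = 49
N_YEARS = 2022 - 1995 + 1  # 28 years, both endpoints included
N_SECTORS = {"PxP": 200, "IxI": 163}  # product-by-product, industry-by-industry


def nominal_rows(scheme: str) -> int:
    """Upper bound on region-year-sector rows for one aggregation scheme."""
    return N_REGIONS * N_YEARS * N_SECTORS[scheme]


def mean_squared_error(y_true, y_pred) -> float:
    """Mean of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    for scheme in N_SECTORS:
        print(scheme, nominal_rows(scheme))
    # Toy values only (hypothetical, not taken from the dataset).
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 2.0]
    print("MSE:", mean_squared_error(observed, predicted))
```

Because mean squared error penalises large deviations quadratically, a single badly predicted sector can dominate the score, which is worth keeping in mind when comparing models on heavy-tailed emission values.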

Data availability

The ExioML dataset (ref. 40), including the Factor Accounting and Footprint Network tables, is publicly available in the Zenodo repository (https://doi.org/10.5281/zenodo.10604610). The repository provides four CSV files: ExioML_factor_accounting_PxP.csv, ExioML_factor_accounting_IxI.csv, ExioML_footprint_network_PxP.csv, and ExioML_footprint_network_IxI.csv, covering 49 regions from 1995 to 2022. The PxP and IxI suffixes distinguish the product-by-product and industry-by-industry variants, and the two components correspond to the tabular factor tables and footprint edge lists described in Data Records. ExioML redistributes only derived emission factors and footprint summaries computed from the openly licensed EXIOBASE 3.8.2 dataset (CC BY-SA 4.0; ref. 15); no proprietary MRIO inputs are included.
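As a hedged illustration of the naming scheme described above, the following Python sketch rebuilds the four file names and reports which of them are absent from a local download directory. The helper names and the `data_dir` argument are hypothetical; only the file names themselves come from the repository description, and no column names are assumed.

```python
from itertools import product
from pathlib import Path

# Components and variants named in the data availability statement.
COMPONENTS = ("factor_accounting", "footprint_network")
VARIANTS = ("PxP", "IxI")  # product-by-product, industry-by-industry


def exioml_filename(component: str, variant: str) -> str:
    """Build one file name, e.g. ExioML_factor_accounting_PxP.csv."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component!r}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"ExioML_{component}_{variant}.csv"


def expected_files(data_dir="."):
    """Paths of all four CSV files under a (hypothetical) local directory."""
    return [
        Path(data_dir) / exioml_filename(c, v)
        for c, v in product(COMPONENTS, VARIANTS)
    ]


def missing_files(data_dir="."):
    """Names of expected files that are not present in data_dir."""
    return [p.name for p in expected_files(data_dir) if not p.is_file()]


if __name__ == "__main__":
    for path in expected_files("data"):
        print(path)
    print("missing:", missing_files("data"))
```

Checking for the expected files before loading them gives an early, readable failure if a download was incomplete, rather than an error deep inside a later analysis step.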

Code availability

The code for constructing ExioML can be found on GitHub (https://github.com/Yvnminc/ExioML).

References

  1. Dumit, A. et al. Atlas: A spend classification benchmark for estimating scope 3 carbon emissions. In: NeurIPS 2024 Workshop on Tackling Climate Change with Machine Learning https://www.climatechange.ai/papers/neurips2024/70 (2024).

  2. Balaji, B. et al. Flamingo: Environmental impact factor matching for life cycle assessment with zero-shot machine learning. ACM Journal on Computing and Sustainable Societies 1(2), 1–23 (2023).

  3. Jain, A., Padmanaban, M., Hazra, J., Godbole, S. & Weldemariam, K. Scope 3 emission estimation using large language models. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2023).

  4. Balaji, B., Vunnava, V. S. G., Guest, G. & Kramer, J. Caml: Carbon footprinting of household products with zero-shot semantic text similarity. In: Proceedings of the ACM Web Conference 2023, pp. 4004–4014 (2023).

  5. Rao, N. D., Riahi, K. & Grubler, A. Climate impacts of poverty eradication. Nature Climate Change 4(9), 749–751, https://doi.org/10.1038/nclimate2340 (2014).

  6. Jorgenson, A. K. Economic development and the carbon intensity of human well-being. Nature Climate Change 4(3), 186–189, https://doi.org/10.1038/nclimate2110 (2014).

  7. Rolnick, D. et al. Tackling climate change with machine learning. ACM Computing Surveys (CSUR) 55(2), 1–96, https://doi.org/10.1145/3485128 (2022).

  8. Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382(6677), 1416–1421, https://doi.org/10.1126/science.adi2336 (2023).

  9. Stanimirova, R. et al. A global land cover training dataset from 1984 to 2020. Sci. Data 10(1), 879, https://doi.org/10.1038/s41597-023-02798-5 (2023).

  10. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016).

  11. Zheng, X. et al. A multi-scale time-series dataset with benchmark for machine learning in decarbonized energy grids. Sci. Data 9(1), 359, https://doi.org/10.1038/s41597-022-01455-7 (2022).

  12. Nangini, C. et al. A global dataset of CO2 emissions and ancillary data related to emissions for 343 cities. Sci. Data 6(1), 1–29, https://doi.org/10.1038/sdata.2018.280 (2019).

  13. Zhu, B. et al. Carbonmonitor-power near-real-time monitoring of global power generation on hourly to daily scales. Sci. Data 10(1), 217, https://doi.org/10.1038/s41597-023-02094-2 (2023).

  14. Ballarin, A. S. et al. Climbra-climate change dataset for Brazil. Sci. Data 10(1), 47, https://doi.org/10.1038/s41597-023-01956-z (2023).

  15. Stadler, K. Exiobase 3: Developing a time series of detailed environmentally extended multi-regional input-output tables. Journal of Industrial Ecology 22(3), 502–515, https://doi.org/10.1111/jiec.12715 (2018).

  16. Leontief, W. & Strout, A. Multiregional input-output analysis. In: Structural Interdependence and Economic Development: Proceedings of an International Conference on Input-Output Techniques, Geneva, September 1961, pp. 119–150, https://doi.org/10.1007/978-1-349-81634-7_8 (1963).

  17. Wang, S., Zhao, Y. & Wiedmann, T. Carbon emissions embodied in China–Australia trade: A scenario analysis based on input–output analysis and panel regression models. Journal of Cleaner Production 220, 721–731, https://doi.org/10.1016/j.jclepro.2019.02.071 (2019).

  18. Sun, C., Chen, L. & Zhang, F. Exploring the trading embodied CO2 effect and low-carbon globalization from the international division perspective. Environmental Impact Assessment Review 83, 106414, https://doi.org/10.1016/j.eiar.2020.106414 (2020).

  19. Steinberger, J. K., Roberts, J. T., Peters, G. P. & Baiocchi, G. Pathways of human development and carbon emissions embodied in trade. Nature Climate Change 2(2), 81–85, https://doi.org/10.1038/nclimate1371 (2012).

  20. Jakob, M. & Marschinski, R. Interpreting trade-related CO2 emission transfers. Nature Climate Change 3(1), 19–23, https://doi.org/10.1038/nclimate1630 (2013).

  21. Isard, W. Interregional and regional input-output analysis: a model of a space-economy. The Review of Economics and Statistics, 318–328 (1951).

  22. Chenery, H. B. & Watanabe, T. International comparisons of the structure of production. Econometrica: Journal of the Econometric Society, 487–521 (1958).

  23. Hoekstra, R. & Bergh, J. C. Comparing structural decomposition analysis and index. Energy Economics 25(1), 39–64, https://doi.org/10.1016/S0140-9883(02)00059-2 (2003).

  24. Peters, G. P. et al. Key indicators to track current progress and future ambition of the Paris Agreement. Nature Climate Change 7(2), 118–122, https://doi.org/10.1038/nclimate3202 (2017).

  25. Duan, Y. & Yan, B. Economic gains and environmental losses from international trade: A decomposition of pollution intensity in China’s value-added trade. Energy Economics 83, 540–554, https://doi.org/10.1016/j.eneco.2019.08.002 (2019).

  26. Kitzes, J. An introduction to environmentally-extended input-output analysis. Resources 2(4), 489–503 (2013).

  27. Peters, G. P. & Hertwich, E. G. Pollution embodied in trade: The Norwegian case. Global Environmental Change 16(4), 379–387 (2006).

  28. Hertwich, E. G. & Peters, G. P. Carbon footprint of nations: a global, trade-linked analysis. Environmental Science & Technology 43(16), 6414–6420 (2009).

  29. Meng, J. et al. The narrowing gap in developed and developing country emission intensities reduces global trade’s carbon leakage. Nature Communications 14(1), 3775, https://doi.org/10.1038/s41467-023-39449-7 (2023).

  30. Tian, K. et al. Regional trade agreement burdens global carbon emissions mitigation. Nature Communications 13(1), 408, https://doi.org/10.1038/s41467-022-28004-5 (2022).

  31. Akbari, M. & Do, T. N. A. A systematic review of machine learning in logistics and supply chain management: current trends and future directions. Benchmarking: An International Journal 28(10), 2977–3005, https://doi.org/10.1108/BIJ-10-2020-0514 (2021).

  32. Rolnick, D. et al. Tackling Climate Change with Machine Learning https://arxiv.org/abs/1906.05433 (2019).

  33. Abdella, G. M., Kucukvar, M., Onat, N. C., Al-Yafay, H. M. & Bulak, M. E. Sustainability assessment and modeling based on supervised machine learning techniques: The case for food consumption. Journal of Cleaner Production 251, 119661, https://doi.org/10.1016/j.jclepro.2019.119661 (2020).

  34. Nilashi, M. et al. Measuring sustainability through ecological sustainability and human sustainability: A machine learning approach. Journal of Cleaner Production 240, 118162, https://doi.org/10.1016/j.jclepro.2019.118162 (2019).

  35. He, Y. et al. Factors influencing carbon emissions from China’s electricity industry: Analysis using the combination of LMDI and k-means clustering. Environmental Impact Assessment Review 93, 106724, https://doi.org/10.1016/j.eiar.2021.106724 (2022).

  36. Kijewska, A. & Bluszcz, A. Research of varying levels of greenhouse gas emissions in European countries using the k-means method. Atmospheric Pollution Research 7(5), 935–944, https://doi.org/10.1016/j.apr.2016.05.010 (2016).

  37. Wiedmann, T. et al. Development of an embedded carbon emissions indicator–producing a time series of input–output tables for the UK by using a MRIO data optimisation system. Report to the UK Department for Environment, Food and Rural Affairs by Stockholm Environment Institute at the University of York and Centre for Integrated Sustainability Analysis at the University of Sydney, London, DEFRA (2007).

  38. Stadler, K. Pymrio–a Python based multi-regional input-output analysis toolbox https://doi.org/10.5334/jors.251 (2021).

  39. Ang, B. W. Decomposition analysis for policymaking in energy: which is the preferred method? Energy Policy 32(9), 1131–1139 (2004).

  40. Guo, Y. & Ma, J. ExioML: Eco-economic Dataset for Machine Learning in Global Sectoral Sustainability. Zenodo https://doi.org/10.5281/zenodo.10604610, https://zenodo.org/records/10604610 (2024).

  41. Sun, W. & Huang, C. Predictions of carbon emission intensity based on factor analysis and an improved extreme learning machine from the perspective of carbon emission efficiency. Journal of Cleaner Production 338, 130414, https://doi.org/10.1016/j.jclepro.2022.130414 (2022).

  42. Riahi, K., Grübler, A. & Nakicenovic, N. Scenarios of long-term socio-economic and environmental development under climate stabilization. Technological Forecasting and Social Change 74(7), 887–935, https://doi.org/10.1016/j.techfore.2006.05.026 (2007).

  43. Matisoff, D. C. Different rays of sunlight: Understanding information disclosure and carbon transparency. Energy Policy 55, 579–592, https://doi.org/10.1016/j.enpol.2012.12.049 (2013).

  44. Gorishniy, Y., Rubachev, I., Khrulkov, V. & Babenko, A. Revisiting deep learning models for tabular data. Advances in Neural Information Processing Systems 34, 18932–18943, https://doi.org/10.48550/arXiv.2106.11959 (2021).

  45. Vaswani, A. et al. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).

  46. Zhang, O. Tips for data science competitions. https://datascience.stackexchange.com/questions/10839 (2016).

  47. Joseph, M. PyTorch Tabular: A framework for deep learning with tabular data. arXiv preprint arXiv:2104.13638 https://doi.org/10.48550/arXiv.2104.13638 (2021).

  48. Dietzenbacher, E., Los, B., Stehrer, R., Timmer, M. & De Vries, G. The construction of world input–output tables in the WIOD project. Economic Systems Research 25(1), 71–98, https://doi.org/10.1080/09535314.2012.761180 (2013).

  49. Chepeliev, M. GTAP-Power data base: Version 11. Journal of Global Economic Analysis 8(2) https://doi.org/10.21642/JGEA.080203AF (2023).

  50. Lenzen, M., Moran, D., Kanemoto, K. & Geschke, A. Building Eora: a global multi-region input–output database at high country and sector resolution. Economic Systems Research 25(1), 20–49, https://doi.org/10.1080/09535314.2013.769938 (2013).

  51. Ingwersen, W. W., Li, M., Young, B., Vendries, J. & Birney, C. USEEIO v2.0, the US environmentally-extended input-output model v2.0. Sci. Data 9(1), 194 (2022).

  52. Stadler, K. et al. Exiobase 3 (version 3.8.2). Zenodo https://doi.org/10.5281/zenodo.5589597 (2021).

  53. Cover, T. & Hart, P. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13(1), 21–27, https://doi.org/10.1109/TIT.1967.1053964 (1967).

  54. Hoerl, A. E. & Kennard, R. W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67, https://doi.org/10.1080/00401706.1970.10488634 (1970).

  55. Quinlan, J. R. Induction of decision trees. Machine Learning 1, 81–106, https://doi.org/10.1007/BF00116251 (1986).

  56. Breiman, L. Random forests. Machine Learning 45, 5–32, https://doi.org/10.1023/A:1010933404324 (2001).

  57. Friedman, J. H. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 1189–1232 (2001).

  58. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521(7553), 436–444, https://doi.org/10.1038/nature14539 (2015).

  59. Joseph, M. & Raj, H. Gate: Gated additive tree ensemble for tabular classification and regression. arXiv preprint arXiv:2207.08548 https://doi.org/10.48550/arXiv.2207.08548 (2022).

Acknowledgements

This work received no external funding.

Author information

Authors and Affiliations

  1. School of Electrical & Computer Engineering, University of Sydney, Sydney, Australia

    Yanming Guo & Jin Ma

  2. Rich Data Co, Sydney, Australia

    Charles Guan

Contributions

Y. Guo designed the study and produced the dataset, visualisations, and regression models for technical validation. J. Ma supervised the project. C. Guan participated in the project design discussion and helped improve the paper draft. All authors contributed to the manuscript.

Corresponding author

Correspondence to Jin Ma.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Guo, Y., Guan, C. & Ma, J. Global emission factor dataset for Scope 3 machine learning applications. Sci Data (2026). https://doi.org/10.1038/s41597-026-06699-1

  • Received: 26 February 2025

  • Accepted: 23 January 2026

  • Published: 04 February 2026

  • DOI: https://doi.org/10.1038/s41597-026-06699-1
