Scientific Data
Tackling Discrepancies in Trade Data: The Harvard Growth Lab International Trade Datasets
  • Data Descriptor
  • Open access
  • Published: 22 January 2026

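  • Sebastián Bustos (affiliation 1; contributed equally)
  • Ellie Jackson (ORCID: 0009-0001-8233-1510; affiliation 1; contributed equally)
  • David Torun (ORCID: 0000-0001-6460-1532; affiliation 2; contributed equally)
  • Brendan Leonard (affiliation 1)
  • Nil Tuzcu (affiliation 1)
  • Piotr Lukaszuk (affiliation 3)
  • Annie White (affiliation 1)
  • Ricardo Hausmann (affiliations 1, 4)
  • Muhammed A. Yıldırım (affiliations 1, 5)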

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Developing world
  • Economics
  • Industry

Abstract

Bilateral trade data inform foreign and domestic policy decisions, serve as a growth indicator, determine tariffs, and are the basis for corporate financial and investment decisions. Accurate trade data translate into better decision-making. However, the raw bilateral trade data reported by UN Comtrade suffer from two structural problems: trade partners report different values for the same flow, and countries report in different product classification systems, so product-level harmonization is required to compare data across countries. In this paper, we address these challenges by combining a mirroring technique and a data-driven concordance method. Mirroring reconciles importer and exporter differences by imputing country reliability scores and applying a weighted country-pair average to calculate the estimated trade value. We harmonize product classifications across vintages by calculating conversion weights that reflect a product’s market share. The resulting publicly available datasets mitigate issues in raw trade statistics, reducing reporting inconsistencies while maintaining product-level granularity across six decades.
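
As a rough illustration of the mirroring idea described above, the following Python sketch combines the exporter's and the importer's report of a single flow into a reliability-weighted average. The function name `mirror_value`, the normalization of the two scores into one weight, and the fallbacks for missing reports are assumptions made for illustration only; the abstract does not state the exact formula, and the authors' pipeline may differ.

```python
from typing import Optional


def mirror_value(
    exporter_reported: Optional[float],
    importer_reported: Optional[float],
    exporter_reliability: float,
    importer_reliability: float,
) -> Optional[float]:
    """Reliability-weighted average of two reports of the same trade flow.

    Hypothetical sketch: the weighting scheme here is an assumption,
    not the formula used in the published datasets.
    """
    # If only one side reported the flow, fall back to that report.
    if exporter_reported is None:
        return importer_reported
    if importer_reported is None:
        return exporter_reported

    total = exporter_reliability + importer_reliability
    if total == 0:
        # No information on reliability: treat both reports equally.
        return (exporter_reported + importer_reported) / 2

    # Weight on the exporter's report; the importer gets the remainder.
    weight = exporter_reliability / total
    return weight * exporter_reported + (1 - weight) * importer_reported


# The exporter reports 100, the importer reports 120, and the exporter
# is assumed to be three times as reliable as the importer.
print(mirror_value(100.0, 120.0, 0.75, 0.25))  # 105.0
```

In this sketch the estimate always lies between the two reports and moves toward whichever partner has the higher score.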

Data availability

The datasets are available on Harvard Dataverse, in the Atlas of Economic Complexity collection (https://dataverse.harvard.edu/dataverse/atlas). The datasets include: Weighted Classification Conversion Tables (ref. 3; https://doi.org/10.7910/DVN/6AADMR) and Bilateral Trade Data Aggregated by Year (ref. 4; https://doi.org/10.7910/DVN/5NGVOB).
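
To make the idea of a weighted conversion table concrete, the sketch below derives weights from trade shares and uses them to move values from one classification vintage to another. The product codes (`"A"`, `"B"`, `"X"`, `"Y"`), the function names, and the nested-dictionary layout are hypothetical; the layout of the published tables is not described here, and interpreting "market share" as each target code's share of reference-period trade is an assumption.

```python
from collections import defaultdict


def market_share_weights(reference_trade):
    """Turn reference trade values into conversion weights.

    reference_trade maps each source code to the trade value of every
    target code it can map to. Weights sum to 1 for each source code.
    """
    weights = {}
    for source, targets in reference_trade.items():
        total = sum(targets.values())
        if total > 0:
            weights[source] = {t: v / total for t, v in targets.items()}
        else:
            # No trade observed: split evenly across candidate targets.
            weights[source] = {t: 1.0 / len(targets) for t in targets}
    return weights


def convert(values, weights):
    """Redistribute source-code values across target codes."""
    converted = defaultdict(float)
    for source, value in values.items():
        for target, weight in weights[source].items():
            converted[target] += value * weight
    return dict(converted)


# Hypothetical example: code "A" splits into "X" and "Y"; "B" maps to "Y".
reference_trade = {"A": {"X": 30.0, "Y": 10.0}, "B": {"Y": 5.0}}
weights = market_share_weights(reference_trade)
print(convert({"A": 200.0, "B": 40.0}, weights))  # {'X': 150.0, 'Y': 90.0}
```

Because the weights for each source code sum to 1, the total converted value (240 in this example) equals the total before conversion.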

Code availability

The code used to acquire data from Comtrade, generate conversion weights, and mirror the bilateral trade data, respectively, is available for public use via GitHub in the following repositories:

  • https://github.com/harvard-growth-lab/comtrade-downloader (ref. 15)
  • https://github.com/harvard-growth-lab/comtrade-conversion-weights (ref. 48)
  • https://github.com/harvard-growth-lab/comtrade-mirroring (ref. 49)

While UN Comtrade provides open access to all data used, these packages make use of bulk downloading features available with a premium API subscription through the UN Comtrade API Package (ref. 17). These packages enable anyone to reproduce these datasets with the most recent data available from Comtrade in SITC and HS classification vintages. Our method is designed to accommodate new classification vintages. When the WCO releases a new classification as scheduled in 2027, our code will handle the new concordance with minimal modifications. Each of these packages is written primarily in Python, with some functionality implemented in R and MATLAB.

References

  1. UN Department of Economic and Social Affairs; Statistics Division. Classifications on economic statistics https://unstats.un.org/unsd/classifications/Econ/ (2022).

  2. Lukaszuk, P. & Torun, D. Harmonizing the harmonized system. SEPS Discussion Paper 2022-12, SEPS (2022).

  3. Harvard Growth Lab. Weighted Classification Conversion Tables https://doi.org/10.7910/DVN/6AADMR (2025).

  4. Harvard Growth Lab. Bilateral Trade Data Aggregated by Year https://doi.org/10.7910/DVN/5NGVOB (2025).

  5. Bustos, S. & Yildirim, M. A. Uncovering trade flows. Unpublished Mimeo (2024).

  6. Hausmann, R., Stock, D. P. & Yildirim, M. A. Implied comparative advantage. Research Policy 51, 104143 (2022).

  7. O’Clery, N., Yildirim, M. A. & Hausmann, R. Productive ecosystems and the arrow of development. Nature Communications 12, 1479 (2021).

  8. Bustos, S. & Yildirim, M. A. Production ability and economic growth. Research Policy 51, 104153 (2022).

  9. Bustos, S. & Morales-Arilla, J. Gains from globalization and economic nationalism: AMLO versus NAFTA in the 2006 Mexican elections. Economics & Politics 36, 202–244 (2024).

  10. Hausmann, R., Schetter, U. & Yildirim, M. A. On the design of effective sanctions: The case of bans on exports to Russia. Economic Policy 39, 109–153 (2024).

  11. Egger, P., Foellmi, R., Schetter, U. & Torun, D. Gravity with History: On Incumbency Effects in International Trade. Journal of the European Economic Association jvae052 (2024).

  12. Head, K. & Mayer, T. Gravity Equations: Workhorse, Toolkit, and Cookbook. In Handbook of International Economics, vol. 4, 131–195 (Elsevier, 2014).

  13. Anderson, J. E. & Van Wincoop, E. Gravity with gravitas: A solution to the border puzzle. American Economic Review 93, 170–192 (2003).

  14. Yotov, Y. V. Gravity at sixty: the workhorse model of trade (2022).

  15. Harvard Growth Lab. Comtrade downloader. https://github.com/harvard-growth-lab/comtrade-downloader (2025).

  16. United Nations Statistics Division. UN Comtrade database. https://comtradeplus.un.org/TradeFlow. Annual commodity data requested in originally reported classification for SITC and HS classifications and all countries (1962-2023). Accessed: March 26, 2025.

  17. untradestats. comtradeapicall [software]. https://pypi.org/project/comtradeapicall/ (2024).

  18. Mayer, T. & Zignago, S. Notes on CEPII’s distances measures: The GeoDist database. Tech. Rep. https://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=6 (2011).

  19. International Monetary Fund. World economic outlook database, April 2025 edition https://www.imf.org/en/Publications/WEO/weo-database/2025/april (2025).

  20. World Bank. World development indicators: Population https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL (2025).

  21. U.S. Bureau of Labor Statistics. Producer price index by commodity: Industrial commodities [PPIIDC] https://fred.stlouisfed.org/series/PPIIDC (2025).

  22. United Nations Statistics Division. Complete correlations among HS, SITC and BEC https://unstats.un.org/unsd/classifications/Econ (2022).

  23. Allen, R. L. & Berliner, J. S. Soviet economic warfare (1961).

  24. Bhagwati, J. On the underinvoicing of imports. Oxford Bulletin of Economics and Statistics 27, 389–397 (1964).

  25. Naya, S. & Morgan, T. The accuracy of international trade data: the case of Southeast Asian countries. Journal of the American Statistical Association 64, 452–467 (1969).

  26. Sheikh, M. A. Smuggling, production and welfare. Journal of International Economics 4, 355–364 (1974).

  27. Yeats, A. J. On the accuracy of partner country trade statistics. Oxford Bulletin of Economics and Statistics 40, 341–361 (1978).

  28. McDonald, D. C. Trade data discrepancies and the incentive to smuggle: An empirical analysis. Staff Papers 32, 668–692 (1985).

  29. Yeats, A. J. On the accuracy of economic observations: Do sub-Saharan trade statistics mean anything? The World Bank Economic Review 4, 135–156 (1990).

  30. Rozanski, J. & Yeats, A. On the (in)accuracy of economic observations: An assessment of trends in the reliability of international trade statistics. Journal of Development Economics 44, 103–130 (1994).

  31. Gehlhar, M. Reconciling bilateral trade data for use in GTAP. Tech. Rep., GTAP Technical Papers (1996).

  32. Makhoul, B. & Otterstrom, S. M. Exploring the accuracy of international trade statistics. Applied Economics 30, 1603–1616 (1998).

  33. Pohit, S. & Taneja, N. India’s informal trade with Bangladesh: A qualitative assessment. The World Economy 26, 1187–1214 (2003).

  34. Beja, E. L. Estimating trade mis-invoicing from China: 2000–2005. China & World Economy 16, 82–92 (2008).

  35. Ferrantino, M. J. & Zhi, W. Accounting for discrepancies in bilateral trade: The case of China, Hong Kong, and the United States. China Economic Review 19, 502–520 (2008).

  36. Barbieri, K., Keshk, O. M. & Pollins, B. M. Trading data: Evaluating our assumptions and coding rules. Conflict Management and Peace Science 26, 471–491 (2009).

  37. Gaulier, G. & Zignago, S. BACI: International trade database at the product-level. The 1994-2007 version (2010).

  38. Dong, G. Mirror statistics of international trade in manufacturing goods: The case of China (United Nations Industrial Development Organization, 2010).

  39. Hamanaka, S. et al. Usable data for economic policymaking and research? The case of Lao PDR’s trade statistics. Asia-Pacific Research and Training Network on Trade (ARTNeT) (2010).

  40. Hamanaka, S. Whose trade statistics are correct? Multiple mirror comparison techniques: A test case of Cambodia. Journal of Economic Policy Reform 15, 33–56 (2012).

  41. Ferrantino, M. J., Liu, X. & Wang, Z. Evasion behaviors of exporters and importers: Evidence from the US–China trade data discrepancy. Journal of International Economics 86, 141–157 (2012).

  42. Pierce, J. R. & Schott, P. K. Concording U.S. Harmonized System Codes Over Time. Journal of Official Statistics 28, 53–68 (2012).

  43. Cebeci, T. A Concordance among Harmonized System 1996, 2002 and 2007 Classifications. World Bank Working Papers, No. 74576 (2012).

  44. Diodato, D. A Network-based Method to Harmonize Data Classifications. Papers in Evolutionary Economic Geography (2018).

  45. Bellert, N. & Fauceglia, D. A Practical Routine to Harmonize Product Classifications over Time. International Economics 160, 84–89 (2019).

  46. International Monetary Fund. World economic outlook https://www.imf.org/en/Publications/SPROLLs/world-economic-outlook-databases (2025).

  47. International Monetary Fund. Statistics Dept. External debt statistics: Guide for compilers and users (International Monetary Fund, 2014).

  48. Harvard Growth Lab. Comtrade conversion weights generator. https://github.com/harvard-growth-lab/comtrade-conversion-weights (2025).

  49. Harvard Growth Lab. Comtrade mirroring pipeline. https://github.com/harvard-growth-lab/comtrade-mirroring (2025).

Acknowledgements

We would like to thank Timothy P. Cheston for help with the data validation and contributions to the Atlas of Economic Complexity. We would like to thank Mali Akmanalp and Romain Vuillemot for working on earlier versions of the Atlas data and architecture. We would like to thank the current and former members of the Harvard Growth Lab for their continuous feedback on the data. David Torun gratefully acknowledges financial support from the Swiss National Science Foundation under Ambizione Grant 233238. The views expressed in this study are the authors' and do not reflect those of SECO.

Author information

Author notes
  1. These authors contributed equally: Sebastián Bustos, Ellie Jackson, David Torun.

Authors and Affiliations

  1. The Growth Lab, Center for International Development – Harvard University, Cambridge, MA, US

    Sebastián Bustos, Ellie Jackson, Brendan Leonard, Nil Tuzcu, Annie White, Ricardo Hausmann & Muhammed A. Yıldırım

  2. Department of Economics, University of Zurich, Zurich, Switzerland

    David Torun

  3. State Secretariat for Economic Affairs (SECO), Bern, Switzerland

    Piotr Lukaszuk

  4. Santa Fe Institute, Santa Fe, NM, US

    Ricardo Hausmann

  5. Department of Economics, College of Administrative Sciences and Economics - Koç University, Istanbul, Turkey

    Muhammed A. Yıldırım

Contributions

Conceptualization: S.B., M.A.Y., R.H. Methodology: E.J., S.B., M.A.Y., D.T., P.L. Formal Analysis: S.B., E.J., D.T., M.A.Y. Investigation: S.B., E.J., D.T. Data Curation: S.B., E.J., D.T., B.L. Software: S.B., E.J., D.T., B.L. Validation: S.B., E.J., B.L. Visualization: S.B., E.J., N.T., B.L. Writing - Original Draft: S.B., E.J., D.T., M.A.Y. Writing - Review & Editing: S.B., E.J., D.T., B.L., M.A.Y., A.W. Project Administration: E.J. Supervision: M.A.Y., A.W., R.H. Funding Acquisition: R.H.

Corresponding authors

Correspondence to Ricardo Hausmann or Muhammed A. Yıldırım.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Bustos, S., Jackson, E., Torun, D. et al. Tackling Discrepancies in Trade Data: The Harvard Growth Lab International Trade Datasets. Sci Data (2026). https://doi.org/10.1038/s41597-025-06488-2

  • Received: 22 July 2025

  • Accepted: 17 December 2025

  • Published: 22 January 2026

  • DOI: https://doi.org/10.1038/s41597-025-06488-2

