Abstract
Monitoring global development aid provides important evidence for policymakers financing the Sustainable Development Goals (SDGs). To overcome the limitations of existing monitoring, we develop a machine learning framework that enables a comprehensive and granular categorization of development aid activities based on their textual descriptions. Specifically, we cluster the descriptions of ~3.2 million aid activities conducted between 2000 and 2019, totalling US$2.8 trillion. This yields 173 activity clusters, each representing a topic of the underlying aid activities. Among them, 70 activity clusters cover topics that have not yet been analysed empirically (for example, greenhouse gas emissions reduction and maternal health care). On the basis of our activity clusters, global development aid can be monitored for new topics and at new levels of granularity, allowing the identification of unexplored spatio-temporal disparities. Our framework can be adopted by development finance and policy institutions to promote evidence-based decisions targeting the SDGs.
Data availability
Activity clusters can be monitored interactively via https://maltetoetzke.github.io/Monitoring-Global-Development-Aid/. The underlying data can be retrieved via https://github.com/MalteToetzke/Monitoring-Global-Development-Aid-With-Machine-Learning. For access to the raw data, please contact the Development Assistance Committee (DAC) of the OECD.
Code availability
The scripts used for preprocessing the data and generating activity clusters can be retrieved via https://github.com/MalteToetzke/Monitoring-Global-Development-Aid-With-Machine-Learning. Analysis scripts are available on request from M.T.
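For readers who want an intuition before opening the repository, the general idea of grouping aid activities by their textual descriptions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it swaps in TF-IDF vectors and a plain spherical k-means, uses only the Python standard library, and the example descriptions, the function names (`tfidf_vectors`, `cosine`, `kmeans`), the fixed seed indices and the choice of two clusters are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a unit-length TF-IDF dict (term -> weight)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF keeps weights positive even for very common terms.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Sparse dot product; equals cosine similarity for unit-length vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, seeds, n_iter=20):
    """Spherical k-means. `seeds` are indices of the initial centroids
    (fixed here for reproducibility; an assumption of this sketch)."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [-1] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the normalised sum of its members.
        for k in range(len(centroids)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == k:
                    total.update(v)  # adds weights term by term
            if not total:
                continue  # keep the old centroid if the cluster is empty
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[k] = {t: w / norm for t, w in total.items()}
    return labels


# Hypothetical activity descriptions, invented for illustration only.
descriptions = [
    "solar power plant construction renewable energy",
    "renewable energy grid wind power",
    "maternal health care clinic training",
    "midwife training maternal health services",
]

labels = kmeans(tfidf_vectors(descriptions), seeds=[0, 2])
print(labels)
```

On these four toy descriptions, the two energy-related texts land in one cluster and the two maternal health texts in the other. A real corpus of millions of multilingual descriptions would need proper preprocessing, a richer text representation and a principled choice of the number of clusters.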
Acknowledgements
We thank the SDG financing lab of the OECD for the provision of the raw data and the mutual exchange over the course of this study. Furthermore, we would like to thank all researchers from the Swiss Federal Institute of Technology (ETH Zurich) who helped us in evaluating and naming activity clusters.
Author information
Contributions
M.T. performed data analysis and visualized the results. All the authors contributed to the conceptualization, interpretation of the results and the writing of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Sustainability thanks Max Callaghan, Lynn Kaack and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Discussions 1–5, Figs. 1–12 and Tables 1–9.
Supplementary Data 1
Descriptive statistics of activity clusters from Supplementary Table 9.
About this article
Cite this article
Toetzke, M., Banholzer, N. & Feuerriegel, S. Monitoring global development aid with machine learning. Nat Sustain 5, 533–541 (2022). https://doi.org/10.1038/s41893-022-00874-z
DOI: https://doi.org/10.1038/s41893-022-00874-z