Main

The 2030 Agenda for Sustainable Development, adopted by the United Nations in 2015, provides a global blueprint for achieving peace, prosperity and environmental sustainability through 17 interlinked Sustainable Development Goals (SDGs)1. Achieving these ambitious goals will require integrated and collaborative approaches that address interrelated challenges such as poverty eradication, ecosystem protection, peace-building and inclusive economic growth2,3. Sustainability science plays a critical role in this effort by identifying solutions that enable systems-level change rather than shallow, fragmented interventions4. Emerging advances in artificial intelligence (AI) offer new opportunities to accelerate progress towards the SDGs by facilitating systems-thinking approaches and data-driven insights.

The impacts of AI on sustainable development are projected to be both immediate and long term, encompassing both positive and negative outcomes5,6. AI holds the potential to revolutionize sustainable development research by providing powerful tools, such as the ability to rapidly analyse complex datasets, predict climate trends and address critical challenges. However, the developments in the field of AI are often based on the requirements and values of the nation where AI is being advanced7. The technology is also unevenly distributed, with its affordability highly linked to the economic potential and motives of a nation8,9. Moreover, the rapid pace of AI development has outpaced the establishment of the ethical and regulatory frameworks needed to ensure its equitable and sustainable use10. As a result, it remains uncertain whether AI represents a true paradigm shift in sustainable development research and its application.

We provide a review of AI and its applications in SDG-related peer-reviewed research to assess the extent to which AI tools are integrated with deep knowledge on sustainable development. By dissecting spatial and temporal characteristics as well as research foci, we investigate the role of AI in SDG-related research by examining the available highly cited literature.

We extracted metadata from the Scopus database using tailored search strings for the corresponding policy areas of each SDG, including word variations of artificial intelligence and sustainability. We analysed the full text of 792 articles that met the inclusion criteria of engaging with both AI and one or several SDGs. Our analysis identifies two key dimensions within the literature: (1) a disciplinary axis, ranging from the natural sciences to the humanities, and (2) a focus axis, distinguishing studies focused on economics from those focused on socioecological content. Despite the potential of AI, our findings highlight a critical gap: very few studies effectively bridge advanced AI methodologies with deep sustainability expertise. Addressing this disconnect is essential to fully realize the promise of AI for sustainable development.

Results

Overview

We analysed 792 peer-reviewed research articles that applied AI to SDGs. The number of articles per year increased substantially over time, exceeding 100 publications in 2020 and surpassing 200 publications annually in 2022 and 2023 (Fig. 1a). Geographically, most research tackling SDGs using AI originates in Europe and Asia (Fig. 1b), with 38% of the analysed articles (n = 303) authored by researchers affiliated with institutions in China, India, the United States and Spain.

Fig. 1: Global distribution and publication years of the 792 scientific articles analysed.

a, Articles per year (23 articles published before 2015 were omitted). b, Article shares per country. The number of articles per institution of the first author is colourized per country. For the seven countries with the highest publication rates, the number of articles and the shares of SDG focus are depicted. See Supplementary Fig. 1 for the seven countries with the most publications and the SDGs they did not cover. SDG 17 is not represented among the 792 articles, as no relevant articles were found during the review. Basemap in b from the World Food Programme under an Open Government Licence v3.0.

Our analysis reveals geographic variability in SDG research focus. For example, studies from Iran, India and Spain focus on SDG 6 (clean water and sanitation), while studies from Italy and the United Kingdom predominantly address SDG 3 (good health and well-being). Articles focusing on SDG 4 (quality education) were most commonly published by authors in the United States, Spain and China.

We classified the articles into empirical (n = 393) and conceptual/review (n = 399) articles. Most empirical articles were authored by researchers in China (n = 58), the United States (n = 29) and India (n = 27) (Supplementary Fig. 5). Conceptual and review articles originated largely from researchers in India (n = 54), the United States (n = 34) and China (n = 31) (Supplementary Fig. 8). In the following, we will provide a comprehensive overview of empirical articles. A detailed results description and interpretation including the conceptual and review articles can be found in the Supplementary Information.

Patterns in empirical studies using AI to address SDGs

We used hierarchical cluster analysis to group literature on the basis of conceptual vocabulary and identified significant indicator words for each group using indicator species analysis11. To account for differences in content, structure and vocabulary, we split the sample into empirical articles and review/conceptual articles. This separation ensures that clustering results in thematic patterns rather than structural differences. Our main analysis is on empirical articles because they directly apply AI algorithms in specific sustainability contexts (see Supplementary Figs. 2 and 3 for the methodological properties of empirical articles).

Groups were refined until no further separation was supported (Supplementary Figs. 6 and 10). We applied detrended correspondence analysis to reduce the dataset to two primary ordination axes, minimizing within-group variance while maximizing group differences12. The resulting eight groups of empirical articles are organized along two key axes:

  1. Disciplinary axis: this axis ranges from the natural sciences to the humanities. Articles on health and education emerged as more closely related, while studies on hydrological systems and vegetative assessments clustered separately.

  2. Focus axis: this axis distinguishes articles on the basis of their approach. AI-driven studies with a potential economic purpose in areas such as clean energy and industry contrast with socioecological studies investigating hydrology or health care.

Each group is characterized by five indicator words that highlight dominant themes (Fig. 2 and Table 1). For comparison, clustering results for review and conceptual articles are provided in Supplementary Fig. 10. A detailed description of each group of empirical articles, including corresponding SDGs and recommended readings, is available in Supplementary Table 7.

Fig. 2: Research groups of 383 empirical articles on AI and sustainable development positioned along the two axes of a detrended correspondence analysis with the most explanatory power.

Group 1, health care (red, 49 articles); group 2, vegetation (green, 62 articles); group 3, forecasting (brown, 52 articles); group 4, water (blue, 53 articles); group 5, remote sensing (olive green, 47 articles); group 6, clean energy (pink, 42 articles); group 7, industry (violet, 39 articles); group 8, education (turquoise, 39 articles). For the same word analysis but applied to conceptual/review articles, see Supplementary Fig. 9. To avoid artefacts in the cluster analysis due to extreme article length, we excluded ten articles (Supplementary Figs. 11 and 12). Group titles are based on the most abundant words per group, followed by the number of articles in each group. Group ellipses are based on eigenvalues. Each group is represented by the five most abundant words. Axis titles are the pattern interpretation of the authors. See Table 1 for a short description and Supplementary Table 7 for a full description of the groups.

Table 1 Group characteristics for empirical articles on AI and SDGs

Role of AI

To better understand the role of AI in sustainable development, we analysed the broader field of AI using a machine-learning taxonomy13. As machine learning represents the dominant subset of AI methods, this framework aligns with most of the empirical applications in our review, while also encompassing the broader AI context (see Supplementary Fig. 7 for the distribution of the role of AI across the SDGs).

The following heatmap (Fig. 3) shows the prevalence of AI applications across groups, highlighting key roles in

  1. Forecasting: particularly prominent within clean energy and vegetation, where predictive insights support resource management and environmental monitoring

  2. System optimization: widespread in clean energy, reflecting a focus on improving operational efficiency and performance

  3. Data mining and remote sensing: notable for extracting actionable insights from unstructured data, especially in the health-care and remote-sensing groups, underscoring the growing need for data-driven decision-making

  4. Accelerated experimentation and fast approximate simulation: specialized tools in clean energy and health care that facilitate research and preliminary analysis (less common)

Fig. 3: Heatmap illustrating the frequency of use of AI roles in different word analysis groups.

A darker colour indicates a higher number of articles. Each cell represents the number of articles addressing one AI role and one sustainability topic. Not all 383 articles used in word analysis appear here, as some empirical articles did not specify a role for AI. Accelerated experimentation, experiments to speed up the design process; data mining and remote sensing, raw data translation such as text documents or satellite imagery into usable insights; fast approximate simulation, rapid modelling of complex systems and processes; forecasting, prediction of events by learning from time-series data; system optimization, control and improvement of operational efficiency of complex systems; predictive maintenance, maintenance of systems to improve efficiency, reduce costs and build resilience.

The overall pattern suggests that AI enhances efficiency, resilience and sustainability across diverse sectors, with specific applications reflecting sectoral needs and data availability.

The choice of AI algorithms is strongly determined by the group-specific demands, challenges and data availability. In this sense, AI encompasses a wide array of models, some of which overlap or are nested within each other (Fig. 4). For example, large language models are a deep-learning approach to natural-language processing, but have become a separate object of study and application from deep learning (see refs. 14,15 for further reference regarding AI classification). The following patterns stand out across groups:

  • Deep-learning and supervised machine-learning algorithms dominate applications in vegetation, water and clean energy. These types of AI are employed for system optimization (for example, renewable energy systems and electric cars), energy and water demand prediction as well as pollution prediction and remote-sensing image classification, for example, refs. 16,17.

  • Evolutionary algorithms allow for efficient optimization in challenging scenarios, such as maximizing the efficiency of renewable energy layouts, for example, ref. 18.

  • Fuzzy-logic algorithms provide interpretability and adaptability, making them valuable for modelling systems where human input or interpretive insights are necessary, for example, ref. 19.

  • Natural-language-processing algorithms are emerging as critical tools in health care and education, where unstructured or textual data are more prevalent.

Fig. 4: Heatmap illustrating the application frequency of the top 20 most frequent AI types across sustainability topic groups in word analysis.

Each cell’s intensity indicates the number of articles employing specific AI techniques within a sustainability topic. Some articles employed more than one type of AI algorithm (see Supplementary Table 1 for the top ten algorithms identified in the review). Not all 383 articles used in word analysis appear here, as some empirical articles did not refer to any specific type of AI, but rather to AI as a whole.

However, numerous gaps remain. In groups such as education and industry, AI is often treated as a study object rather than actively applied. This reflects limited empirical exploration of the use and impact of AI in organizations, highlighting opportunities for further research.

Discussion

Despite the growing body of literature on AI and its applications to the SDGs, studies that deeply integrate AI methods with SDG-related research remain surprisingly sparse. Most research focuses either on the technical aspects of AI or on addressing specific SDGs. The intersection where AI tools are applied to solve complex sustainability challenges or contribute to SDG attainment has hardly been realized to date.

Focus on local studies and gaps in social sustainability

There has been a substantial increase in the number of annual publications on sustainable development research using AI, particularly since 2019. This surge is consistent with broader trends in sustainability and AI research, as evidenced by some 600,000 articles published on sustainability (article includes ‘sustainab*’) and 400,000 articles published on ‘artificial intelligence’ in the first 10 months of 2024 alone (Scopus query 3 November 2024). A literature review covering the period 1990 to 2014 first documented the rapid rise of AI-related research, driven in part by third-party funding and increasing global interest20. This trend underscores the high relevance of both sustainability and AI as research domains.

While AI research output is positively correlated with third-party funding availability in earlier research20, this pattern was only partly confirmed in our study. For example, the prominence of AI-based watershed modelling in Spain, India and Iran reflects well-established research traditions21,22. In Spain, highly cited publications on AI by Spanish scholars have been documented since the 1990s20. Similarly, Italy’s notable focus on AI-based research on SDG 3 (good health) is related to several (national) initiatives that encourage data sharing and collaboration23,24. China’s AI strategy, embedded in its New Generation of Artificial Intelligence Development Plan until 2030, prioritizes economic development via megaprojects25. However, our review finds that most Chinese publications focus on climate change, clean energy and education, which clearly suggests a deviation from purely economic goals.

Despite this progress, the United Nations’ 2023 and 2024 Sustainable Development Goals Reports highlight the lack of progress on Agenda 2030, with half of SDG targets off-track and insufficient data for most goals in 202326,27. The United Nations has advocated that AI support the SDGs26, but gaps remain. For example, AI is widely used in health, education and environmental modelling, but its use in poverty reduction (SDG 1) is minimal. We found only seven reviews and no empirical studies in the most cited literature applying AI to SDG 1 (no poverty). This finding is striking given that 575 million people are projected to live in extreme poverty in 203026. Research on poverty relies predominantly on qualitative approaches or analyses of demographic data, for example, ref. 28, with few examples of AI-driven approaches for poverty prediction tools29.

The proliferation of machine-learning, deep-learning and evolutionary algorithms over the past decade has had a profound impact on environmental sustainability research. These methods excel at processing large-scale, sensor-based and image-rich datasets, making them particularly effective for tasks such as vegetation monitoring, water resource management and clean energy optimization. For example, they are used to track vegetation changes through satellite imagery, predict water levels and optimize renewable energy grid performance30. Evolutionary algorithms in particular have proved useful in complex optimization problems, such as wind farm layouts and solar panel configurations, while balancing environmental and economic objectives31.

In contrast to its environmental applications, AI remains underutilized in areas of social sustainability, such as policymaking, education for sustainable development and social equity, which are critical to the SDGs. This disparity, apart from the historical overrepresentation of some research fields32, is due to several challenges:

  1. Complexity of social data: social systems involve qualitative and contextual variables that are difficult to model quantitatively.

  2. Ethical constraints: privacy laws and ethical concerns limit access to human behavioural data.

  3. Supervised learning limitations: many AI methods require large, labelled datasets, which are sparse and context-dependent in political and social domains33.

These limitations contribute to notable gaps in AI applications for critical dimensions of sustainable development, such as SDG 10 (reduced inequalities) and SDG 16 (peace, justice, and strong institutions). Addressing these gaps requires more inclusive data frameworks that account for qualitative and context-rich variables, greater interdisciplinary collaboration to integrate AI with social sciences, and algorithmic innovations tailored to the complexity of social systems5. Our analysis reveals a gradual increase in the body of literature on AI in sustainable development research. However, many AI users tend to adopt techno-optimistic or ecomodernist perspectives, or align with other viewpoints that position technology as a great leap towards solutions. Correspondingly, working on social sustainability solutions with AI may require a deeper mindset shift3.

Disciplinary AI techniques by few research communities

The regional scope of most of the empirical articles in this study reflects the thematic focus and methodological nature of the research. For example, many studies deal with regional watershed assessments, for example, refs. 34,35. Other region-specific applications range from evaluating AI tools in education, such as single learning platforms, for example, ref. 36, to optimizing energy consumption in greenhouses37. By contrast, the few global empirical studies that we found primarily address system optimization, data mining and remote sensing, for example, refs. 38,39. However, the limited number of global studies may be due to the complexity and the high computational demands of AI methods, which remain a substantial barrier to large-scale applications40.

Although sustainable development inherently requires transformative, longitudinal research (Fig. 2), most empirical studies focus on the present and use quantitative approaches. This is consistent with previous observations in health-care research, where AI-powered studies rarely address temporal dynamics41. Furthermore, SDG terminology often functions as a rhetorical device rather than as the basis for actionable insights or transformative change42. Our findings suggest that the connection between AI methods and sustainable development research remains nascent, characterized by experimental applications and buzzword use rather than substantive contributions to sustainability goals.

The current research landscape is highly fragmented, with a clear disciplinary bias towards forecasting and optimization in technical areas such as water resource management, vegetation monitoring, energy systems and pollution control. This reflects the ongoing experimental phase of AI development, as researchers continue to explore its potential to advance the SDGs5. However, emerging breakthroughs in natural-language processing applications such as ChatGPT and other generative AI technologies are expected to shift research priorities. In the coming years, AI applications are likely to expand into the social sciences, psychology and education, enabling more nuanced investigations of societal changes43,44.

Disciplinary focus within the SDG perspectives

The disciplinary divide in sustainable development research, as illustrated in Fig. 2, is a well-documented phenomenon. While a growing body of transdisciplinary literature challenges this divide, particularly in the context of AI45, two distinct patterns emerge in our analysis. First, there is a clear distinction between studies that focus on the application of AI and those that use AI as a methodological approach to advance knowledge on sustainable development. For example, prediction is a common technique in both smart agriculture and clean energy research, but the former emphasizes remote sensing for image detection, while the latter focuses on grid and physical systems optimization46,47.

Second, most SDGs are directly represented in the groups that we find, including SDG 3 (health), SDG 4 (education), SDG 6 (water), SDG 7 (clean energy) and SDG 15 (life on land). Despite the extensive use of AI for many years in these areas, the conceptualization of sustainability in these studies remains weak (Supplementary Fig. 4). This narrow framing reflects the disciplinary silos from which causal reasoning is derived, often at odds with the solution-oriented, systems-level agenda in sustainability science48.

Achieving SDG 17 (partnership for the goals) will require inter- and transdisciplinary research pathways across all SDGs, potentially catalysed by the diverse applications of AI explored in this study. Examples include AI’s role in fostering collaboration, supporting data integration, and bridging disciplinary boundaries to address complex sustainability challenges.

However, notable gaps remain. AI-based research on poverty (SDG 1) and gender (SDG 5) is underrepresented (Supplementary Fig. 1). For poverty, we found only seven review articles, while for gender there were eight articles (five empirical and three conceptual or review). Despite its foundational importance27, poverty is often framed in terms of economic welfare49 or as an implicit component of broader sustainable progress metrics such as GDP or inclusive welfare50. In particular, smart agriculture is presented as a ‘pathway out of poverty’ in rural contexts51.

Gender (SDG 5) research has focused primarily on bias in research itself, for example, in a study on gender representation in Canadian AI research52. In health care, AI algorithms often do not account for sex and gender bias53, limiting their equity and effectiveness. Addressing such disparities is critical to advancing social equity and inclusivity in AI applications across sustainable development domains. Certain SDGs, such as those related to industry and consumption, have long been closely associated with AI and data-driven analytics, even before the formal introduction of the SDGs. As a result, these SDGs are covered by a more extensive body of scientific literature. By contrast, other SDGs, such as poverty and gender equality, do not have a similarly well-established tradition of AI-driven quantitative data analysis. While relevant data sources exist for areas such as gender research, addressing this research gap remains a future challenge for the communities working on these specific SDGs.

Limitations of our study

Our analysis has several limitations, which stem from the scope and methodology of our review. First, our selection of articles was constrained by search terms that aimed at the intersection between AI and the SDGs. Since the SDGs represent a political framework and a policy compromise, many articles that explore the broader intersection between sustainability and AI may not have been included in our dataset. Second, our focus on the most cited articles introduces a potential bias towards well-established research, which may miss emerging studies with lower citation counts (Supplementary Information I and Supplementary Tables 4 and 5). Third, both the SDG and the AI communities often publish findings through policy reports or conference proceedings, whereas journal articles appear only after the delay imposed by the peer review process. This leads to a lag in the visibility of emerging patterns in the journal-based literature. Despite these limitations, we believe that the patterns identified in our review are robust and unlikely to change significantly with the inclusion of more previously published literature or grey literature. However, future developments in the field may naturally refine or extend the findings presented here.

Our work provides a foundation for a more substantiated discussion of the current state of the art at the intersection between AI-focused and SDG-focused scientific literature. We present the patterns in the literature published to date to contribute to a change in the future literature. We highlight gaps and a lack of deeper ties between the two emerging research fields. The main aim of our review is thus to present an overview of the current literature, ideally to stimulate the discussion on AI and sustainability and to highlight where new lines of thinking need to emerge. A more in-depth discussion of future trajectories is beyond the scope of this review. However, we hope to contribute by engaging in a critical debate on a larger integration.

Limitations of the current literature and outlook

The literature reviewed broadly reflects the three pillars of sustainability: social, environmental and economic54. However, notable gaps remain in the application of AI to specific SDGs. Specifically, SDG 1 (no poverty), SDG 5 (gender equality) and SDG 17 (partnership for the goals) remain severely underrepresented. These goals are foundational to the SDG framework, highlighting the interdependence between economic progress and societal well-being55. Their relative neglect in AI and sustainability research exacerbates climate injustice and undermines the capacity for long-term and inclusive collaboration across disciplines and between science and society.

The uneven distribution of articles across SDGs is partly due to the different definitions and applications of AI in the current literature. Many studies adopt a broad or undefined understanding of sustainability, often using the term without providing a clear definition. This lack of conceptual clarity reflects a broader problem: most studies are limited to systems knowledge, failing to address the normative or transformative dimensions that are central to sustainability science. Ethical considerations related to AI are clearly important for both research and policy, but they are almost entirely absent from the research reviewed. Beyond this review, they remain highly fragmented and insufficiently integrated. Prominent ethical debates typically revolve around the optimization of energy use, so-called ‘Green AI’, the growing energy demand due to AI, the efficiency of production and consumption processes, and the accessibility and confidentiality of training and test data56,57,58. However, there are deeper and often more tacit debates about national AI systems, their use, and the competitive dynamics between countries. Moreover, broader concerns about the potential risks of AI, such as its ability to control and disempower citizens, or even pose existential threats to humanity, are also central to these debates59,60,61. Although the data sources examined in this review underscore the importance of these issues, a substantial body of research addressing them is still largely missing. This omission contrasts sharply with the normative and values-driven nature of sustainability, which seeks to address profound societal challenges through inclusive and ethical solutions.

In addition, AI itself poses potential risks to sustainability, such as its high energy consumption and other environmental impacts, which can exacerbate challenges such as climate change. This highlights the responsibility of researchers to ensure that AI applications are consistent with sustainability principles and make a positive contribution to our common future.

While research on the intersection between AI and sustainability has grown exponentially in recent years, many SDGs remain underexplored. Current research often represents an innovative but opportunistic starting point, with limited integration across disciplines or transformative contributions to sustainability goals. To address these gaps, a more integrated research agenda is needed—one that emphasizes interdisciplinary collaboration, ethical considerations and the normative aspects of sustainability. Such an agenda has the potential to advance the role of AI as a transformative tool for achieving sustainable development in the years to come.

Methods

Our methodological approach is based on a content analysis as used in previous bibliometric analyses12,62,63,64,65,66. The approach largely follows standard procedures for accessing and filtering literature databases, and employs a unique set of multivariate statistical full-text analyses for pattern recognition, which are described in the following.

Data collection and coding

To investigate the interplay between sustainable development research and the use of AI, we queried metadata from the Scopus database in January 2024 for each of the 17 SDGs on the basis of the following criteria: (1) use of the term ‘AI’, ‘A.I.’ or ‘Artificial Intelligence’ in article title, abstract or keywords; (2) occurrence of the term ‘sustainab*’; (3) use of SDG-specific conceptual vocabulary; (4) publication as a peer-reviewed scientific journal article; and (5) publication in the English language (see overview in Supplementary Table 2). We adapted the search strings for each SDG on the basis of the respective SDG description texts to produce a sufficient number of relevant hits from the database, resulting in a total of 14,423 articles. See Supplementary Information I for a detailed list of search strings in Supplementary Table 3 and coding scheme in Supplementary Table 6, and Supplementary Data 1 for the list of 792 investigated articles.
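The five criteria above can be sketched as a query builder. This is only an illustration, not the search strings actually used (those are listed in Supplementary Table 3). The function name `build_scopus_query` and the example SDG terms are hypothetical, and searching ‘sustainab*’ in title, abstract and keywords is an assumption, since the text does not state the field for criterion (2).

```python
def build_scopus_query(sdg_terms):
    """Combine the five inclusion criteria into one Scopus advanced-search string.

    sdg_terms: hypothetical SDG-specific vocabulary (criterion 3); the real
    terms are in the supplementary material and are not reproduced here.
    """
    # Criterion 1: AI term in title, abstract or keywords.
    ai_terms = ['"AI"', '"A.I."', '"Artificial Intelligence"']
    ai_clause = "TITLE-ABS-KEY(" + " OR ".join(ai_terms) + ")"
    # Criterion 2: wildcard match on sustainab* (field is an assumption).
    sustain_clause = "TITLE-ABS-KEY(sustainab*)"
    # Criterion 3: SDG-specific conceptual vocabulary.
    sdg_clause = "TITLE-ABS-KEY(" + " OR ".join(f'"{t}"' for t in sdg_terms) + ")"
    # Criteria 4 and 5: journal articles written in English.
    filters = "DOCTYPE(ar) AND SRCTYPE(j) AND LANGUAGE(english)"
    return " AND ".join([ai_clause, sustain_clause, sdg_clause, filters])


# Hypothetical example terms for SDG 6 (clean water and sanitation).
query = build_scopus_query(["clean water", "sanitation"])
print(query)
```

Keeping each criterion as its own clause makes it easy to swap the SDG vocabulary while leaving the AI, sustainability and document-type filters unchanged across all 17 queries.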

For each SDG, we sorted the Scopus hits according to citations per year. Two reviewers each screened the abstracts to ensure the inclusion of articles that contribute to the discourses on AI, sustainability and the respective SDG beyond buzzwords. Our target was to include the 100 most cited articles per SDG. Articles with more than 30 citations started to rapidly increase from 2018 and peaked in 2021 (Supplementary Table 4). While fewer recent articles (for example, 2023) were expected to be included due to our sampling approach, some highly cited articles from 2023 were included nonetheless (Supplementary Table 5). In cases where only a few articles were published on a particular SDG, the target of 100 articles was not met. The hits for SDG 17 did not yield any tangible results; we therefore had to exclude it from the review process entirely. We obtained PDFs of all accessible articles, and to ensure inter-coder reliability, we conducted the full-text analysis again in reviewing teams. We further excluded articles that did not meet the initial inclusion criteria in the full-text analysis but replaced them from the hit list to maintain 100 articles per SDG when possible.
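The ranking step can be sketched as follows. The text does not say how citations per year were computed, so the formula below (citations divided by the number of calendar years from publication up to and including the query year) is an assumption, and the record fields `citations` and `year` are hypothetical names.

```python
def citations_per_year(citations, pub_year, query_year=2024):
    """Citation rate, counting the publication year as the first year (assumption)."""
    years_in_print = max(query_year - pub_year + 1, 1)
    return citations / years_in_print


def top_cited(records, target=100, query_year=2024):
    """Return up to `target` records, ordered by descending citations per year."""
    ranked = sorted(
        records,
        key=lambda r: citations_per_year(r["citations"], r["year"], query_year),
        reverse=True,
    )
    # Slicing naturally returns fewer records when an SDG has fewer hits.
    return ranked[:target]


# Toy hit list for one SDG (invented numbers, for illustration only).
hits = [
    {"id": "A", "citations": 120, "year": 2019},  # 120 / 6 years = 20.0
    {"id": "B", "citations": 60, "year": 2023},   # 60 / 2 years = 30.0
    {"id": "C", "citations": 10, "year": 2015},   # 10 / 10 years = 1.0
]
selected = top_cited(hits, target=2)
print([r["id"] for r in selected])
```

Normalizing by years in print is what allows a recent article such as the 2023 example to outrank an older article with more total citations, consistent with the inclusion of some highly cited 2023 articles described above.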

The reviewing teams discussed articles that appeared as duplicates in multiple SDGs and reassigned them to the SDG with a better thematic fit. In this step, we did not replace the reassigned articles but reduced the total number per SDG because the reassigned articles remained in the study.

On the basis of an initial screening of ten articles per SDG, we developed a coding scheme that was further adapted during multiple rounds of implementation and refinement with all authors. This inductive approach resulted in a robust set of categories (Supplementary Table 6), which we then applied to all articles. We distinguished between empirical, conceptual and review articles. We combined the set of conceptual and review articles into one category, because many of the reviews on artificial intelligence and sustainable development are non-systematic and closely related to conceptual papers, but both differ in their approach and structure from empirical articles.

Data analysis

We further analysed the empirical literature (n = 393) and review and conceptual articles (n = 399) using a multivariate statistical full-text analysis63. For this analysis, we extracted words from the PDF files using Python and filtered for nouns only. We then put these words into text files to import them into R. Articles with extreme word counts were removed (Supplementary Figs. 11 and 12) because they formed separate clusters due to word count rather than meaningful word abundance patterns. This ensured clustering was based on shared abundance distributions rather than text length artefacts (see the Extended Methodology Description in Supplementary Information for more details).

For the empirical articles, we removed 10 articles due to extreme word count and included only the 20% most frequent words to reduce noise and sparsity, resulting in a list of 1,579 words out of 7,990. For the review and conceptual articles, we removed 16 articles due to extreme word count and, again retaining only the 20% most frequent words, obtained a final list of 1,948 words out of 9,877.
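A minimal sketch of the vocabulary filter, assuming the nouns have already been extracted per article. Whether frequency was measured as total occurrences or as the number of articles containing a word is not stated, so total occurrences are assumed here. The rounding rule and alphabetical tie-breaking are also assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def top_fraction_vocabulary(noun_lists, fraction=0.20):
    """Keep the most frequent `fraction` of distinct words across all articles."""
    totals = Counter()
    for nouns in noun_lists:
        totals.update(nouns)
    keep = math.ceil(len(totals) * fraction)  # rounding up is an assumption
    # Sort by descending total count, then alphabetically for a stable result.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:keep]]


def abundance_matrix(noun_lists, vocabulary):
    """One row per article, one column per retained word, cells are word counts."""
    rows = []
    for nouns in noun_lists:
        counts = Counter(nouns)
        rows.append([counts[word] for word in vocabulary])
    return rows


# Toy corpus of three "articles" (invented, for illustration only).
articles = [
    ["water", "water", "model", "basin"],
    ["energy", "grid", "model", "energy"],
    ["water", "energy", "model", "health", "school"],
]
vocabulary = top_fraction_vocabulary(articles)
matrix = abundance_matrix(articles, vocabulary)
print(vocabulary)
print(matrix)
```

The resulting article-by-word matrix is the kind of input that clustering and ordination methods operate on; dropping rare words shrinks the number of mostly empty columns.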

Assuming that different research communities use a unique set of conceptual vocabulary, groups within the literature were differentiated using hierarchical cluster analysis based on a list of conceptual vocabulary that was obtained from the filtered list of words. We then used an indicator species analysis11 to find significant indicator words for all respective groups. These groups were further separated until no significant indicator words supported any more groups. On the basis of the total set of words, we performed a detrended correspondence analysis to reduce the multivariate corpus and visualize it on the two primary ordination axes. Within this multivariate space, significant indicator words were restricted to five words per group to allow the visual clustering of the groups. The method uses minimum variance as the criterion for clustering articles, hence minimizing variance within groups while maximizing differences between groups12. A detailed description of the analysis is part of the Supplementary Information.
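To make the indicator step concrete, the sketch below computes an indicator value per word and group. The text cites indicator species analysis (ref. 11) without giving a formula, so the widely used Dufrêne and Legendre indicator value is assumed: specificity (a group's share of the summed group-mean abundances) multiplied by fidelity (the share of the group's articles containing the word). The permutation test normally used to assess significance is omitted, and all names are hypothetical.

```python
def indicator_values(matrix, groups, vocabulary):
    """Return {group: {word: IndVal}} with IndVal = specificity * fidelity.

    matrix: article-by-word counts; groups: one cluster label per article.
    """
    labels = sorted(set(groups))
    result = {g: {} for g in labels}
    for j, word in enumerate(vocabulary):
        mean_abundance = {}
        fidelity = {}
        for g in labels:
            column = [row[j] for row, label in zip(matrix, groups) if label == g]
            mean_abundance[g] = sum(column) / len(column)
            # Fraction of the group's articles in which the word occurs.
            fidelity[g] = sum(1 for v in column if v > 0) / len(column)
        total = sum(mean_abundance.values())
        for g in labels:
            # Share of the word's mean abundance concentrated in this group.
            specificity = mean_abundance[g] / total if total else 0.0
            result[g][word] = specificity * fidelity[g]
    return result


# Toy data: four articles, two words, two groups (invented for illustration).
words = ["water", "grid"]
counts = [[4, 0], [2, 0], [0, 3], [1, 5]]
labels = ["hydrology", "hydrology", "energy", "energy"]
indval = indicator_values(counts, labels, words)
print(indval)
```

A word scores close to 1 only when it is both concentrated in one group and present in nearly all of that group's articles, which is why such words serve as compact labels for a cluster.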

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.