Developer engagement in open-source software’s green transition

Vaccargiu, Matteo; Aufiero, Sabrina; Bartolucci, Silvia; Neykova, Rumyana; Tonelli, Roberto; Destefanis, Giuseppe

doi:10.1038/s44458-026-00050-w


Article
Open access
Published: 06 March 2026


Matteo Vaccargiu^1,2,
Sabrina Aufiero³,
Silvia Bartolucci ORCID: orcid.org/0000-0003-1127-5600³,
Rumyana Neykova²,
Roberto Tonelli¹ &
Giuseppe Destefanis³

Communications Sustainability volume 1, Article number: 41 (2026)


Abstract

Software development plays a central role in digital sustainability, yet developers’ role and engagement remain understudied. Here we analyse nearly a decade of developer discussions on GitHub about Ethereum, a widely used open-source blockchain platform. Using topic modelling, with interpretation supported by large language models and a sustainability framework for software systems, we trace how economic, environmental, social, individual, and technical sustainability themes emerge and evolve over time. We find that sustainability awareness, particularly related to energy efficiency and cost, intensifies during key events such as the transition from proof-of-work to proof-of-stake consensus, which substantially reduced energy use. We identify influential contributors and thematic specialisation, providing a transferable framework for understanding sustainability in emerging developer communities. These findings highlight the role of developer discourse in shaping sustainable software ecosystems and integrating sustainability into open-source development.


Introduction

Modern society increasingly relies on complex digital infrastructures, ranging from cloud computing and artificial intelligence to decentralised platforms such as blockchains, that support critical services across finance, healthcare, energy, logistics, and governance. While these technologies promise increased efficiency, transparency, and innovation compared to legacy systems, they also introduce growing concerns about their environmental impact, resource consumption, and long-term sustainability. The energy demands of data centres, algorithmic optimisation at the expense of social equity, and carbon-intensive consensus protocols highlight a pressing global challenge: how can digital systems evolve sustainably in line with societal and environmental goals? In this article, we focus on developers’ discussions as a measurable signal of how sustainability considerations surface, interact with technical trade-offs, and spread within an open-source software (OSS) ecosystem. We adopt a broad conceptualisation of sustainability, which encompasses environmental, economic, social, individual, and technical dimensions. This approach recognises that issues such as efficiency, cost, and resource use are often intertwined, and that environmental considerations often emerge through technical or economic discussions.

Among these technologies, blockchain has emerged as both a symbol of decentralised innovation and a focal point in debates over digital sustainability and the environmental footprint of technology. Relevant applications, from cryptocurrencies^1,2 to supply chain traceability³, have driven adoption, institutional support, and investment in recent years. Yet the energy demands of proof-of-work (PoW) blockchains have drawn sustained criticism due to high electricity consumption, associated greenhouse-gas emissions, and resource inefficiencies such as obsolescent hardware that contributes to e-waste^4,5. At the same time, digitalisation itself has the potential to generate resource savings and reduce emissions by improving process efficiency, optimising logistics, and reducing the need for physical transactions and documentation⁶. This dual potential, for both environmental cost and benefit, underlines the importance of understanding how developers negotiate sustainability trade-offs in their technical decisions.

These concerns place blockchain at the heart of wider discussions on how software architectures and digital protocols can align with climate action and responsible innovation agendas⁷. Ethereum is a particularly informative setting because the transition of its consensus algorithm from proof-of-work to proof-of-stake, the so-called Merge, constitutes a structural change with sustainability implications. While the Merge represents a major architectural transformation, it was achieved through a sequence of incremental steps (see the full deployment timeline at https://www.spydra.app/blog/ethereums-upgrade-journey-a-timeline-of-innovation) that collectively redefined Ethereum’s consensus mechanism. This staged evolution provides a valuable lens for examining how sustainability-oriented decisions emerge through continuous negotiation and adaptation rather than abrupt system redesign.

While policymakers, industry leaders, and researchers debate regulatory and technical solutions, a critical perspective is often overlooked: the role of developers in shaping sustainable digital futures. Developers’ daily decisions influence how technologies evolve, which trade-offs are prioritised, and how sustainability considerations are embedded (or neglected) within software ecosystems. We therefore study what developers discuss, when they discuss it, and how these discussions are organised socially, moving beyond frequency counts to the structure of interaction.

Since its introduction, blockchain technology has witnessed rapid growth and widespread adoption, attributed to its key features such as security, transparency, immutability, and traceability⁸. These features have driven the adoption of blockchain technologies in a wide range of industries. However, the surge in popularity of cryptocurrencies, especially Bitcoin, along with other blockchain implementations, has raised significant concerns regarding their environmental footprint. This is primarily due to the increased greenhouse gas emissions and substantial energy consumption associated with their operations⁹. Such environmental concerns have sparked ongoing debates within both scientific communities and industry practices about the overall sustainability of blockchain and distributed ledger technologies¹⁰. A particular point of contention is the extensive energy requirements driven by the computational processes they employ, notably the so-called PoW algorithm. This algorithm, commonly used in cryptocurrency mining, demands considerable computing resources¹¹, highlighting the urgent need for a reassessment of the energy efficiency of these technologies¹².

This study aims to understand how developers prioritise and address sustainability concerns in their projects. Their discussions highlight practical challenges in implementing sustainability measures, reveal the tension between technical optimisation and environmental concerns, and can indicate future trends toward sustainable practices, providing context for policymakers and researchers. Understanding these conversations can inform strategies to promote eco-friendly practices in blockchain development and influence the future environmental impact of blockchain technology.

Using topic modelling techniques, we analyse a dataset of issues and comments from Go-Ethereum. Go-Ethereum (“geth”) is the widely used Ethereum client written in Go and a core reference implementation in the ecosystem; its issue tracker contains design discussions, trade-offs, and maintenance debates that precede deployment decisions. The analysis first establishes whether sustainability actually emerges as an identifiable theme within developer discussions, then tracks how interest in sustainability has changed across the project’s timeline, and finally examines whether developers tend to join issues whose themes resemble those they have engaged with previously. Our goal is to develop a descriptive and transferable framework for analysing how sustainability considerations emerge and evolve within open-source software communities. Using Ethereum as a proof-of-concept, we combine topic modelling, large language models, and network analysis to capture the dynamics of developer interaction and discourse. The methodological approach is designed to be readily adaptable to other socio-technical ecosystems, from blockchain and cloud infrastructure to open AI development, by modifying domain-specific keywords and prompt definitions while retaining the core analytical pipeline.

This study extends our previous work titled Sustainability in Blockchain Development: A BERT-Based Analysis of Ethereum Developers’ Discussions¹³. This article makes three major advances over our prior study and related work: (1) Transparency and replicability: we fully document the prompting strategy, preprocessing, and decision rules used in the pipeline, which combines BERTopic (topic modelling based on Bidirectional Encoder Representations from Transformers) with Large Language Model (LLM) assistance, enabling exact reproduction; (2) Robustness of labelling: we report a manual check by the authors of the BERT/LLM-assisted labels, including Fleiss’ kappa coefficient; and (3) Network perspective: we construct developer-issue and developer-developer networks to analyse community structure and roles, revealing how sustainability discourse is organised and spreads through developer interactions.

The paper is organised as follows: in Section 2 we present the existing literature connected to this study; in Section 3 we discuss the methodology. The results are presented and discussed in Section 4 and possible threats to validity are acknowledged in Section 5. Finally, in Section 6, conclusions and future research developments of this topic are discussed.

Related Works

The sustainability and energy consumption of blockchain technology have attracted significant attention from the scientific community^14,15,16,17. Qin and Gervais¹⁸, Arshad et al.¹⁹, and Asif et al.²⁰ provide general analyses of energy consumption and sustainability in Ethereum. These authors discuss the platform’s consumption, possible solutions and challenges in improving it, and the transition from PoW to the so-called Proof-of-Stake (PoS) protocol. Pinna et al.²¹ investigated agile blockchain-oriented software development principles and sustainable software design principles, presenting a new Agile method for the development of blockchain-oriented systems that includes sustainability awareness practices within the development phases, in particular in the requirements and the acceptance tests. Eligüzel²² presents a study on the relationship between blockchain technology and sustainability through a descriptive literature review, using topic modelling and a latent semantic analysis clustering method (a social spider optimisation technique) on a corpus of 1069 articles extracted from Scopus. Another literature data mining work was carried out by Liu et al.²³, who analysed 759 articles extracted from Web of Science on blockchain technology in sustainable finance, using keyword analysis, bi-clustering algorithms, and strategic coordinate analysis to explore the hot topics in this field and predict future trends in sustainable development. Also related to the sustainability of blockchain applications in finance, Stamoulis²⁴ used a holistic approach to compare the sustainability performance of old and new ways of conducting financial transactions. Ayman et al.²⁵ created a topic analysis model from data extracted from the smart contract developer community on Stack Overflow, providing information on the topics and issues discussed among users.

Other studies on blockchain topic analysis focus on using BERT-based models to analyse generic blockchain-related topics with data extracted from social media such as Reddit²⁶, from forums such as CoinDesk²⁷, or from abstracts gathered from USPTO patents²⁸. In all cases, they highlight the benefits of a BERT-based Natural Language Processing (NLP) approach to textual analysis for examining technological knowledge and relationships within the field of blockchain technology.

Complex network approaches have been widely used to analyse blockchain systems, e.g. in the context of understanding crypto markets^2,29, investment networks¹, transaction patterns³⁰, and smart contract dependencies^31,32,33,34,35,36. Regarding the analysis of the network of Ethereum developers, previous literature has focused on the interplay between developers’ sentiment on GitHub and the prediction of returns in crypto markets^37,38,39, and on the effect of external events on developers’ interaction based on the analysis of issues and comments extracted from GitHub⁴⁰. A recent application also analysed Decentralised Finance communities to detect centralisation issues in governance⁴¹. Other approaches similar to ours focus more generally on the analysis of Open-Source Software (OSS). Studies of GitHub issues and comments related to OSS are proposed by Mumtaz et al.⁴² and Jamieson et al.⁴³, who also considered commits related to Decentralised Web communities. They analyse 13 and 52 projects, respectively, aiming to examine the social smells in software teams before and after the introduction of this new feature.

In this context, our work proposes an approach based on topic analysis and developer network analysis of a specific OSS project, Go-Ethereum, providing relevant insights for the analysis of similar ecosystems.

Methodology

This study proposes a reliable and reproducible NLP-based approach to analyse discussion topics. The focus is on identifying and providing insights into the most frequently addressed themes by Ethereum developers in GitHub issues and comments, especially those pertaining to sustainability. Figure 1 shows the steps followed for the dataset generation, topic modelling and interpretation, and network construction. They are explained in detail in the next two subsections.

Dataset Overview and Statistics

The dataset employed in this research was obtained from GitHub and centres on discussions related to Go-Ethereum spanning from January 2014 to May 2023. Our analysis focuses on the text of issues and the comments linked to them, encompassing a total of 15,954 issues and 50,023 comments. For every issue, the dataset captures an ID, the name of the author, the number of comments, the date of the first posting, the date of the latest update, and the full text of the issue. Similarly, each comment is described by the author’s ID, the ID of the associated issue, the date it was created, and the text of the comment itself. Table 1 shows an example of an issue and Table 2 an example of a comment. We pre-processed the data to ensure quality, removing NaN values and stop words.

Table 1 GitHub issue example.


Table 2 GitHub comment example.


Figure 2 shows the year-by-year breakdown of issues posted on GitHub. Omitting 2014, the study’s start year, and 2023, for which data are only available until May, the yearly distribution of issues is fairly even, which justifies the year-by-year analysis that follows.

Figure 3 shows the distribution of the lifespan, in years, of closed issues. The median time to resolution is 6 days, showing that half of the issues were addressed within a week. Additionally, 75% of the issues were closed in 83 days or fewer. Out of the total 15,954 issues, only 241 are still open. These open issues generally last much longer than the ones that have been closed, with the median open time showing that half of them stay unresolved for at least 125 days. This may indicate that the issues that remain open tend to be either more complex or contentious.

To each issue, we assign a measure of discussion intensity based on the number of comments associated with it, which gives an indication of the complexity or importance of the issue within the development process. The distribution of this metric is plotted in Fig. 4. It is important to note that while a high number of comments can indicate significant developer interest, it may also reflect the complexity or contentiousness of an issue, or simply a prolonged resolution process. Figure 4 shows that 75% of the issues receive 4 comments or fewer, indicating that most discussions are brief, but a few require significant community attention and take longer to address.

**Fig. 4: Issue’s Discussion Intensity.**

We analysed user activity to identify the five most active contributors based on the number of issues they initiated. The top contributors, with user IDs 129561, 142290, 6915, 5959481, and 6264126, opened 1208, 863, 669, 455, and 426 issues, respectively. These individuals are also among the top commenters, showing their substantial engagement with the platform. We calculated user lifespan as the time from a user’s first to last recorded activity. On average, users are active for 200 days, but the median lifespan is only 1 day, indicating that most users post a single comment and then stop participating. This highlights the brief nature of participation for many users in this ecosystem^44,45.
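As an illustration only (this is not the code used in the study), the lifespan statistic can be sketched with the Python standard library. The function name, the record layout `(user_id, timestamp)`, and the toy data are assumptions. Whether a user with a single activity counts as 0 or 1 day is a convention the text does not specify; this sketch uses 0.

```python
from datetime import datetime
from statistics import mean, median

def user_lifespans(events):
    """Return {user_id: days between first and last recorded activity}."""
    first_last = {}
    for user_id, ts in events:
        lo, hi = first_last.get(user_id, (ts, ts))
        first_last[user_id] = (min(lo, ts), max(hi, ts))
    return {u: (hi - lo).days for u, (lo, hi) in first_last.items()}

# Hypothetical toy activity log: (user_id, timestamp of an issue or comment).
events = [
    ("a", datetime(2020, 1, 1)), ("a", datetime(2021, 1, 1)),
    ("b", datetime(2020, 3, 5)),
    ("c", datetime(2020, 6, 1)), ("c", datetime(2020, 6, 2)),
]
lifespans = user_lifespans(events)
mean_days = mean(lifespans.values())
median_days = median(lifespans.values())
print(f"mean={mean_days:.1f} days, median={median_days} days")
```

Even in this toy log, one long-lived contributor pulls the mean far above the median, which is the same skew the reported figures (mean 200 days, median 1 day) point to.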

Topic Modelling

Topic modelling uses NLP techniques and probabilistic algorithms to identify topics from text. We used the BERT (Bidirectional Encoder Representations from Transformers) model⁴⁶ due to its advanced handling of language context, its efficiency with short texts, and its minimal need for hyperparameter tuning. Specifically, we used BERTopic, which uses BERT’s contextual embeddings to analyse topics: the model converts text into uniform hidden representations, and class-based TF-IDF (c-TF-IDF)⁴⁷ then associates words with relevant topics, improving clarity and interpretability. This approach produces topics identified by keywords, each with a likelihood score.

Initial trials on the full dataset produced ambiguous results with over 160 varied topics, leading us to adopt a semi-supervised zero-shot method for more precise topic identification. A semi-supervised zero-shot approach is a hybrid technique that combines the advantages of supervised and unsupervised learning without requiring explicit examples of every category for training. Instead, it uses known keywords for some data points to infer the classification of unlabelled data, even for categories not seen during training. This technique allowed a more directed analysis: by incorporating issue titles and the specific topics pertinent to our study, we guided the model and enhanced its ability to classify discussions into relevant and previously undefined topics.

We applied KeyBERT (https://maartengr.github.io/KeyBERT/api/keybert.html)⁴⁸, a tool developed specifically for extracting keywords using BERT. The text of issues and comments is fed into KeyBERT, which identifies and returns a set of keywords for each document, selected as the words and phrases within the document that most closely match its overall content. The resulting collection of keywords forms a vocabulary that is passed to CountVectorizer, a component of scikit-learn, which aids the BERTopic model in identifying topics. Furthermore, as described in ref. ⁴⁹, employing a zero-shot classification model with unlabelled data enhances the accuracy of topic extraction. The model was created using the candidate topics in Table 3.
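To make the class-based TF-IDF step concrete, the following is a minimal standard-library sketch of the weighting as described in the BERTopic documentation, W(t, c) = tf(t, c) · log(1 + A / f(t)), where tf(t, c) is the normalised frequency of term t in topic c, f(t) is the frequency of t across all topics, and A is the average number of words per topic. Counting is restricted to a fixed vocabulary, mirroring the role of a keyword-derived vocabulary. It is not the authors' pipeline: the function names, topic names, and toy documents are hypothetical.

```python
import math
from collections import Counter

def c_tf_idf(docs_by_topic, vocabulary):
    """Score vocabulary terms per topic with class-based TF-IDF.

    docs_by_topic: {topic: [tokenised documents]}
    vocabulary:    iterable of allowed terms (e.g. extracted keywords)
    """
    vocab = set(vocabulary)
    # All documents of a topic are merged into one bag of words.
    counts = {
        topic: Counter(tok for doc in docs for tok in doc if tok in vocab)
        for topic, docs in docs_by_topic.items()
    }
    total_per_term = Counter()
    for bag in counts.values():
        total_per_term.update(bag)
    avg_words = sum(total_per_term.values()) / len(counts)  # A in the formula
    scores = {}
    for topic, bag in counts.items():
        n_words = sum(bag.values()) or 1
        scores[topic] = {
            term: (freq / n_words) * math.log(1 + avg_words / total_per_term[term])
            for term, freq in bag.items()
        }
    return scores

def top_terms(scores, topic, k=3):
    """Return the k highest-scoring terms of a topic."""
    return sorted(scores[topic], key=scores[topic].get, reverse=True)[:k]

# Hypothetical toy corpus, already grouped by topic and tokenised.
docs_by_topic = {
    "fees": [["gas", "price", "fee"], ["gas", "fee", "sync"]],
    "sync": [["sync", "peer", "node"], ["sync", "node", "gas"]],
}
vocabulary = ["gas", "price", "fee", "sync", "peer", "node"]
scores = c_tf_idf(docs_by_topic, vocabulary)
print(top_terms(scores, "fees"), top_terms(scores, "sync"))
```

Terms that are frequent inside one topic but rare elsewhere (here "fee" and "node") rank above terms shared across topics (here "gas" and "sync"), which is what makes the resulting keyword lists usable as topic descriptions.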

Table 3 Topic keywords.


The candidate topics were chosen based on the literature review provided in Sec. 1 and on Ethereum’s transition from PoW to PoS, “The Merge” (https://ethereum.org/en/roadmap/merge/), which significantly reduced energy consumption and greenhouse-gas emissions⁵⁰. The thematic keywords guiding topic modelling were defined inductively from the dataset and organised according to the five SusAF dimensions (economic, environmental, social, individual, and technical). This keyword set operationalises sustainability concepts within the Ethereum context but can be readily adapted for other domains. By modifying the keyword list and LLM prompt definitions, the same pipeline can be applied to different open-source ecosystems, enabling comparative analyses of sustainability engagement across software communities.

For each experiment, the embedding model used was BAAI/bge-small-en (https://huggingface.co/BAAI/bge-small-en), which performs well with technical texts and comments, as shown by Blanco-Cuaresma et al. in ref. ⁵¹. For the representation model we used facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli).

To evaluate the effectiveness of our topic model, we combined topic interpretation with the coherence score c_v, as outlined by Röder et al.⁵². The first model yielded c_v = 0.66, which is generally considered a satisfactory level of coherence. However, it generated a total of 165 topics, a figure deemed excessively high. The emergence of numerous small clusters prompted a subsequent experiment with the parameter min_topic_size = 30, which sets a minimum size for each topic cluster. In this revised model, coherence was preserved (c_v = 0.67) while the number of topics decreased to 58, making the results easier to interpret.

A manual review of the topics validated the model’s findings, showing them to be consistent.

Topic Interpretation

In the process of topic interpretation, BERTopic generates a list of 10 keywords for each topic, accompanied by a probability score for each keyword. This score signifies the relevance of the specific word to the given topic. The interpretation was carried out using a dual approach: initially, ChatGPT 3.5 analysed the keywords and their respective probabilities to identify the subject of discussion. This automated interpretation was subsequently validated through a manual review by the authors^53,54.

The use of ChatGPT 3.5 for topic interpretation was motivated by its ability to understand the context and meaning of the keywords, similar to how a human would interpret them. We tailored a prompt that instructed the tool to generate topic labels based on the provided keywords and their probabilities, ensuring a focused and relevant interpretation process. While keywords alone may not always capture the full scope of a topic, they serve as a strong foundation for both human and machine interpretation. We used the prompt in Table 4.

Table 4 Topic Interpretation prompt.


Colavito et al.⁵⁴ demonstrated the effectiveness of GPT-like models for automated labelling tasks of issues without the need for fine-tuning. The study showed substantial agreement between GPT-like models and human annotators, suggesting that these models can be used to reduce the costs associated with manual annotation.

To ensure the accuracy of the ChatGPT 3.5 interpretations, we also conducted a manual review process, examining the assigned labels and comparing them with the keywords and their probabilities. The manual validation step allowed us to confirm that the labels accurately reflected the underlying topics.

After analysing the full list of topics, we focused on isolating those pertinent to sustainability. This selection was based on the label assigned to each topic by ChatGPT and the top 10 words for each topic generated by BERTopic. A topic was flagged as relevant to sustainability if its label or keywords matched any of the predefined subjects of interest in the zero-shot model. Following the identification of all topics associated with sustainability, we conducted a thorough manual review to validate the findings.
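The flagging rule described above can be sketched as a simple matching function. This is an assumed reading of the rule (case-insensitive substring matching), not the authors' implementation, and the subjects of interest below are hypothetical placeholders rather than the contents of Table 3.

```python
def is_sustainability_candidate(label, keywords, subjects):
    """Flag a topic if its label or any top keyword mentions a subject of interest.

    Matching is a case-insensitive substring test, which is an assumption:
    it is permissive, so flagged topics still need manual review.
    """
    text = " ".join([label, *keywords]).lower()
    return any(subject.lower() in text for subject in subjects)

# Hypothetical subjects of interest (placeholders, not the study's Table 3).
subjects = ["energy", "gas", "cost"]

flagged = is_sustainability_candidate(
    "Gas Price Estimation", ["gasprice", "estimate", "fee"], subjects
)
not_flagged = is_sustainability_candidate(
    "Documentation Typos", ["readme", "typo", "docs"], subjects
)
print(flagged, not_flagged)
```

A permissive automatic pass followed by manual validation matches the two-stage procedure in the text: the rule only proposes candidates, and reviewers decide.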

Three authors independently classified the topics as sustainability-related or not, based on the top 10 keywords generated by BERTopic. Each author examined the keywords and assigned classifications, initially without access to the others’ responses or the ChatGPT-generated labels. Of the 58 topics, Topic -1 (“Undefined”) was excluded, leaving 57 topics for analysis. The three authors agreed unanimously on 64.9% of the topics (Fleiss’ κ = 0.509), indicating moderate agreement according to standard interpretation guidelines⁵⁵. They disagreed on the remaining 20 topics (35.1%), which were adjudicated by a fourth author. Pairwise agreement between annotators ranged from 71.9 to 82.5%, demonstrating consistent evaluation across all reviewers. The adjudication process resulted in 10 of the 20 conflicting topics being classified as sustainability-related, yielding a final count of 23 sustainability-related topics (40.4% of all topics). Table 5 presents the complete inter-rater agreement statistics. Disagreements primarily occurred on topics where sustainability implications were implicit rather than explicit. For instance, topics involving purely technical implementation details without clear resource optimisation or cost implications were sources of disagreement. The fourth author applied the SusAF framework systematically, requiring clear alignment with at least one sustainability dimension (economic, environmental, social, individual, or technical) for classification as sustainability-related.
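For readers who wish to reproduce the agreement statistic on their own annotations, a minimal sketch of Fleiss' kappa follows. It uses the standard definition κ = (P̄ − P̄ₑ) / (1 − P̄ₑ), where P̄ is the mean per-item agreement and P̄ₑ is the agreement expected by chance. The toy ratings are invented and are not the study's annotation data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table where ratings[i][j] is the number of
    raters who assigned item i to category j (same rater count per item)."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Per-item agreement: share of rater pairs that agree on the item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)

def unanimous_share(ratings):
    """Share of items on which all raters chose the same category."""
    n_raters = sum(ratings[0])
    return sum(1 for row in ratings if max(row) == n_raters) / len(ratings)

# Invented example: 4 topics, 3 raters, columns = [sustainability, not].
toy_ratings = [[3, 0], [0, 3], [2, 1], [3, 0]]
kappa = fleiss_kappa(toy_ratings)
share = unanimous_share(toy_ratings)
print(f"kappa={kappa:.3f}, unanimous={share:.2%}")
```

The two numbers answer different questions: the unanimous share is a raw percentage, while kappa discounts the agreement that would be expected by chance given how often each category is used.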

Table 5 Topic validation.


Sustainability Awareness Framework (SusAF)

The Sustainability Awareness Framework (SusAF)⁵⁶, developed by the SUSO Academy and inspired by the Karlskrona Manifesto for Sustainability Design⁵⁷, is a structured methodology designed to help software teams assess and improve the sustainability of their products and practices. It encourages a holistic evaluation across five key dimensions (environmental, economic, social, individual, and technical) while considering impacts at three levels: immediate (during development), enabling (through use), and structural (long-term systemic effects). Traditionally applied in participatory workshop settings, SusAF helps teams reflect on the broader consequences of their work and integrate sustainability into development cycles.

In this study, we extend the use of SusAF in a novel way: rather than applying it as a design tool, we use it as a conceptual framework to classify and extract sustainability-related topics from textual developer discussions. By aligning our topic modelling pipeline with SusAF’s five dimensions, we introduce a structured approach to identifying how sustainability concerns surface in open-source discourse. This demonstrates how SusAF can support textual analysis, offering a scalable method to trace sustainability awareness in large-scale software communities. For each topic, we matched the associated keywords with the definition of each dimension in the framework. To ensure better control and generalisability in the manual evaluation, the process was conducted independently by three authors, and the results were then compared and validated by a fourth author.

Overall, this framework supports a multidimensional view of sustainability in software development contexts. In our analysis, the environmental dimension is interpreted in a broad sense, covering explicit discussions of energy use and emissions together with themes related to computational efficiency, resource use, and cost management. For example, debates on gas fees and transaction costs were often described by developers as attempts to improve efficiency and reduce energy demand, showing how environmental, technical, and economic aspects often intersect.

Sustainability Network Analysis

For our network analysis, we began with a dataset of 1,468 developers and 2,752 issues, constructing a bipartite network with developers as Layer 1 and issues as Layer 2. Links between layers were established based on developers’ comments on issues, resulting in an undirected and unweighted initial network. After removing 473 issues without comments, we focused on the remaining 2,279 issues.

We created a bi-adjacency matrix where rows represented developers and columns represented issues, with binary entries indicating whether a developer commented on an issue. The network density is 0.14%, with an average of 3.8 issues commented on per developer. Next, we constructed the projections of the bipartite network on Layer 1 (developers) and Layer 2 (issues) to characterise the relevance of developers and the thematic importance of issues.
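The construction above can be sketched as follows. This is an illustrative standard-library version with hypothetical names and toy data, and it assumes the usual bipartite definition of density (observed links divided by the number of possible developer-issue pairs), which the text does not spell out.

```python
def biadjacency(comments, developers, issues):
    """Binary developer-by-issue matrix: entry is 1 if the developer
    commented on the issue at least once."""
    dev_idx = {d: i for i, d in enumerate(developers)}
    iss_idx = {s: j for j, s in enumerate(issues)}
    matrix = [[0] * len(issues) for _ in developers]
    for dev, issue in comments:
        matrix[dev_idx[dev]][iss_idx[issue]] = 1  # repeated comments count once
    return matrix

def bipartite_density(matrix):
    """Observed links divided by all possible developer-issue pairs."""
    links = sum(map(sum, matrix))
    return links / (len(matrix) * len(matrix[0]))

def mean_issues_per_developer(matrix):
    return sum(map(sum, matrix)) / len(matrix)

# Hypothetical toy data: (developer, issue) pairs, one per comment.
comments = [("d1", "i1"), ("d1", "i2"), ("d2", "i2"), ("d2", "i2"), ("d3", "i3")]
B = biadjacency(comments, ["d1", "d2", "d3"], ["i1", "i2", "i3"])
density = bipartite_density(B)
avg_issues = mean_issues_per_developer(B)
print(B, density, avg_issues)
```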

First, we transformed the bi-adjacency matrix into a 1468 × 1468 adjacency matrix representing connections between developers who share at least one issue. To identify influential developers, we calculated four centrality measures: degree centrality (number of direct connections), betweenness centrality (frequency of appearing on shortest paths between nodes), closeness centrality (inverse of the average shortest path length to all other nodes), and eigenvector centrality (influence based on connections to other influential nodes)^58,59. Developers were ranked based on these measures to identify key contributors to sustainability discussions.
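As a hedged illustration of the developer projection and two of the four measures, the sketch below builds the developer graph from a toy bi-adjacency matrix and computes normalised degree and closeness centrality with the standard library. The names and data are hypothetical; closeness here is the simple reciprocal of the mean distance to reachable nodes, without any correction for disconnected graphs. Betweenness and eigenvector centrality are omitted for brevity; in practice a graph library such as NetworkX provides all four.

```python
from collections import deque

def developer_graph(B):
    """Adjacency sets: two developers are linked if they share an issue."""
    n = len(B)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if any(a and b for a, b in zip(B[i], B[j])):
                adj[i].add(j)
                adj[j].add(i)
    return adj

def degree_centrality(adj):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}

def closeness_centrality(adj):
    """Reciprocal of the mean shortest-path distance to reachable nodes."""
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[src] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical toy matrix: 4 developers (rows) by 3 issues (columns).
B = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
adj = developer_graph(B)
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
ranking = sorted(adj, key=lambda v: (-degree[v], -closeness[v]))
print(ranking, degree, closeness)
```

In this toy example the developers form a chain, so the two middle developers rank first on both measures.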

Next, we transformed the bi-adjacency matrix into a 2279 × 2279 adjacency matrix representing connections between issues. Each entry in this matrix indicated the number of developers who commented on both issues, resulting in a weighted and undirected network. This matrix was obtained by multiplying the transpose of the bi-adjacency matrix by the bi-adjacency matrix itself.
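A minimal sketch of this projection: with B the developer-by-issue bi-adjacency matrix, the off-diagonal entries of BᵀB count the developers shared by each pair of issues. Zeroing the diagonal (which would otherwise hold the number of commenters per issue) is an assumption made here to avoid self-loops; the toy matrix is hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Plain matrix product of two lists of lists."""
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def issue_projection(B):
    """Weighted issue-issue matrix: entry (j, k) is the number of developers
    who commented on both issue j and issue k."""
    W = matmul(transpose(B), B)
    for j in range(len(W)):
        W[j][j] = 0  # drop self-loops (assumption)
    return W

# Hypothetical toy matrix: 3 developers (rows) by 3 issues (columns).
B = [[1, 1, 0], [1, 1, 0], [0, 1, 1]]
W = issue_projection(B)
print(W)
```

Issues 0 and 1 share two commenters, issues 1 and 2 share one, and issues 0 and 2 share none, so the result is symmetric with weights 2, 1, and 0.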

For visualisation, the network displayed issues as nodes and edges representing shared commenting developers, with edge thickness proportional to the number of developers who commented on both issues. We applied a colour-coding scheme with 23 colours for specific topics (micro-labels) and 6 shapes for SusAF measures (macro-labels). Sixty isolated issues were removed to focus on the connected component.

We applied the Leiden algorithm⁶⁰ to identify communities within the network, revealing six distinct communities of issues. For each identified cluster, we calculated in-density (proportion of connections within a cluster) and out-density (proportion of connections between a cluster and the rest of the network). To assess the significance of the identified communities, these densities were compared with those obtained in randomised versions of the real network, generated by rewiring edges (10⁶ randomisations in total).
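The significance test can be sketched as below, under several stated assumptions that the text does not confirm: the network is treated as unweighted, in-density is the share of realised pairs inside the cluster, out-density is the share of realised pairs between the cluster and the rest, and randomisation uses degree-preserving double-edge swaps. All names and the toy graph are hypothetical, and the number of randomisations is kept small for illustration.

```python
import random

def densities(edges, cluster, n_nodes):
    """Return (in_density, out_density) of a node set in an undirected graph."""
    cluster = set(cluster)
    k = len(cluster)
    inside = sum(1 for u, v in edges if u in cluster and v in cluster)
    across = sum(1 for u, v in edges if (u in cluster) != (v in cluster))
    in_density = inside / (k * (k - 1) / 2) if k > 1 else 0.0
    out_density = across / (k * (n_nodes - k)) if 0 < k < n_nodes else 0.0
    return in_density, out_density

def rewire(edges, n_swaps, rng):
    """Degree-preserving randomisation by repeated double-edge swaps."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Skip swaps that would create self-loops or duplicate edges.
        if len(new1) < 2 or len(new2) < 2 or new1 == new2:
            continue
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def degree_sequence(edges):
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return sorted(counts.values())

def in_density_p_value(edges, cluster, n_nodes, n_random=200, seed=0):
    """Share of randomised networks whose in-density for the same node set
    is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = densities(edges, cluster, n_nodes)[0]
    hits = 0
    for _ in range(n_random):
        shuffled = rewire(edges, 10 * len(edges), rng)
        if densities(shuffled, cluster, n_nodes)[0] >= observed:
            hits += 1
    return hits / n_random

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
in_d, out_d = densities(edges, {0, 1, 2}, n_nodes=6)
p_value = in_density_p_value(edges, {0, 1, 2}, n_nodes=6)
print(in_d, out_d, p_value)
```

A cluster is considered meaningful when its observed in-density is rarely matched by the randomised networks, that is, when the empirical p-value is small.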

We then analysed the distribution of SusAF measures (EC/EN, EC/T, EN/T, I/T, S/T, T) within each community to understand which sustainability topics were prevalent. Additionally, we examined the creation year of issues within each community to track the evolution of engagement with different sustainability topics over time.

Generalisability and applicability of the framework

The framework developed in this study is designed to be transferable and reproducible across a wide range of open-source and digital infrastructure ecosystems. It combines text-based topic modelling, large language model (LLM) classification, and network analysis into a modular pipeline that can be replicated with minimal adaptation. The framework enables researchers and practitioners to trace how sustainability-related discussions emerge, spread, and evolve within developer communities.

Data collection and pre-processing

The framework requires a minimal and standardised set of input variables:

Timestamp: indicating when each contribution occurred.
Contributor identifier: a unique tag for each participant (e.g., GitHub username or forum handle).
Text content: the full body of the message, comment, or issue description.

Data can be extracted from version-control repositories (e.g., GitHub, GitLab) or from short-text discussion platforms such as Stack Exchange, Reddit, or Ethereum Research. After cleaning and tokenisation, the dataset is organised as a table with columns for timestamp, contributor, and text, enabling chronological and user-level analysis.
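As a minimal sketch of this input format (not the authors' pipeline), the three required fields can be held in a small record type, sorted chronologically, and grouped by contributor. The class name `Contribution`, the helper names, and the sample rows are hypothetical; only the three fields come from the text above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Contribution:
    """One row of the input table: timestamp, contributor, text."""
    timestamp: datetime
    contributor: str
    text: str


def build_table(raw_rows):
    """Parse (iso_timestamp, contributor, text) tuples, drop empty texts,
    and return the rows in chronological order."""
    rows = [
        Contribution(datetime.fromisoformat(ts), user, text.strip())
        for ts, user, text in raw_rows
        if text and text.strip()
    ]
    return sorted(rows, key=lambda row: row.timestamp)


def by_contributor(table):
    """Group rows by contributor for user-level analysis."""
    groups = defaultdict(list)
    for row in table:
        groups[row.contributor].append(row)
    return dict(groups)


# Hypothetical sample rows, for illustration only.
raw = [
    ("2021-03-02T10:15:00", "alice", "Gas usage spikes after the upgrade"),
    ("2019-07-21T08:00:00", "bob", "Sync is slow on low-power hardware"),
    ("2021-03-02T11:00:00", "bob", "   "),
    ("2022-09-15T06:42:00", "alice", "Post-Merge energy numbers look good"),
]
table = build_table(raw)
per_user = by_contributor(table)
```

Real pre-processing would also include the cleaning and tokenisation steps mentioned above, which this sketch leaves out.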

Thematic extraction and sustainability classification

Topic modelling is performed using a transformer-based model such as BERTopic, which identifies clusters of semantically related discussions. These topics are then classified according to the five dimensions of the Sustainability Awareness Framework (SusAF): environmental, economic, social, individual, and technical. To adapt the framework to other ecosystems, two elements are easily customisable:

Keyword dictionary: domain-specific terms reflecting sustainability concerns (e.g., energy efficiency, hardware reuse, or data governance).
LLM prompt template: a contextual instruction guiding sustainability assessment, e.g., “Does this topic address environmental, economic, or social aspects of sustainability in [project name]?”

Modifying these components allows seamless transfer of the framework to other projects such as AI development, cloud computing, or data infrastructure systems.
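The two customisable components could be represented roughly as follows. This is an illustrative sketch only: the dictionary entries, the substring-matching rule, and the function names are assumptions, and no LLM is called. The prompt wording is taken from the example above, with the project name as a placeholder.

```python
# Hypothetical keyword dictionary; a real one would come from prior literature.
KEYWORDS = {
    "environmental": ["energy", "power consumption", "carbon"],
    "economic": ["gas price", "fee", "cost"],
    "technical": ["benchmark", "refactor", "scalab"],
}

# Prompt template based on the example in the text; {project} is swapped per ecosystem.
PROMPT_TEMPLATE = (
    "Does this topic address environmental, economic, or social aspects "
    "of sustainability in {project}?\n"
    "Topic keywords: {topic_terms}"
)


def candidate_dimensions(topic_terms, keywords=KEYWORDS):
    """Return the dimensions whose keywords occur (case-insensitive
    substring match) in the topic's representative terms."""
    joined = " ".join(topic_terms).lower()
    return sorted(
        dim for dim, words in keywords.items()
        if any(word in joined for word in words)
    )


def build_prompt(project, topic_terms, template=PROMPT_TEMPLATE):
    """Fill the template for one topic; the result would be sent to an LLM."""
    return template.format(project=project, topic_terms=", ".join(topic_terms))


terms = ["gas price", "transaction fees", "energy"]
dims = candidate_dimensions(terms)
prompt = build_prompt("Go-Ethereum", terms)
```

Porting the sketch to another project would mean editing only `KEYWORDS` and the `project` argument, which mirrors the adaptation step described above.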

Network construction and engagement analysis

From repository metadata, a directed interaction network is built where (i) nodes represent individual contributors or specific comments and (ii) edges represent communication or co-participation events (e.g., issue replies, code reviews, or co-authorship). Each edge or node can be linked to sustainability-labelled topics, revealing:

Contributors most active in sustainability-related discussions.
Cross-domain collaborations and thematic clusters.
Patterns of specialisation or bridging roles across communities.

Standard network measures (degree, betweenness, modularity) quantify engagement intensity and structural cohesion.
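A directed interaction network of this kind can be sketched with plain dictionaries. The event format `(replier, original_author, label)`, the function names, and the sample events are hypothetical; a label of `None` stands for an event outside any sustainability-labelled topic.

```python
from collections import Counter


def build_network(events):
    """Directed, weighted edges: (source, target) -> number of interactions.
    Self-replies are ignored."""
    weights = Counter()
    for src, dst, _label in events:
        if src != dst:
            weights[(src, dst)] += 1
    return weights


def sustainability_activity(events):
    """Count, per contributor, the outgoing interactions that carry a
    sustainability label."""
    activity = Counter()
    for src, dst, label in events:
        if src != dst and label is not None:
            activity[src] += 1
    return activity


# Hypothetical events, for illustration only.
events = [
    ("alice", "bob", "EN/T"),
    ("carol", "bob", None),
    ("alice", "bob", "T"),
    ("bob", "alice", "EN/T"),
    ("dave", "dave", "T"),
]
weights = build_network(events)
activity = sustainability_activity(events)
most_active = activity.most_common(1)
```

Ranking contributors by `activity` gives a first view of who is most active in sustainability-related discussions; centrality and modularity would be computed on `weights` in a fuller analysis.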

Aggregated indicators and sustainability engagement index

Topic- and network-level outputs can be integrated into an aggregate Sustainability Engagement Index (SEI). This index can be computed as a weighted combination of:

The number and proportion of sustainability-labelled contributions.
The average centrality of contributors involved in those discussions.

The SEI can be tracked over time or compared across ecosystems to identify shifts in community attention to sustainability and emerging thematic priorities.
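The text does not fix a formula for the SEI, so the following is only one possible reading: a convex combination of the share of sustainability-labelled contributions and the mean centrality of the contributors involved. The equal default weights, the use of the proportion rather than the raw count, and the assumption that centralities lie in [0, 1] are all choices made for this sketch.

```python
def sustainability_engagement_index(n_sustainable, n_total, centralities,
                                    w_share=0.5, w_centrality=0.5):
    """Weighted combination of (i) the proportion of sustainability-labelled
    contributions and (ii) the average centrality of the contributors
    involved. Weights must sum to 1 (an assumption of this sketch)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_sustainable <= n_total:
        raise ValueError("n_sustainable must lie between 0 and n_total")
    if abs(w_share + w_centrality - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    share = n_sustainable / n_total
    mean_centrality = sum(centralities) / len(centralities) if centralities else 0.0
    return w_share * share + w_centrality * mean_centrality


# Counts as reported in the Results; the centrality values are hypothetical.
sei = sustainability_engagement_index(2752, 8665, [0.2, 0.4])
```

Computing the index per year, or per ecosystem with the same weights, would give the comparisons described above.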

Extending the framework

Because the framework depends only on standardised metadata (timestamp, contributor, and text), it can be applied to any open-source or collaborative software repository. Potential applications include:

Other blockchain ecosystems on GitHub (e.g., Bitcoin, Solana, or Tezos).
General open-source project repositories on GitHub (e.g., AI, cloud, or data related).
Technical forums such as Reddit or Stack Exchange.

Future developments may integrate quantitative sustainability indicators (e.g., energy consumption or carbon estimates) to link discourse-level analysis with environmental performance metrics. This structured procedure turns the Ethereum case into a proof-of-concept for a reproducible and transferable methodology. By adjusting only the keyword dictionary and LLM prompt definitions, the framework can be directly applied to new contexts, enabling consistent monitoring and comparison of sustainability discourse across diverse open-source and digital ecosystems.

Results

In this section, we discuss the findings derived from the topic extraction, the sustainability evaluation, and the network analysis.

The Sustainability Awareness Framework (SusAF)⁵⁶ delineates five dimensions of sustainability: economic, social, individual, environmental, and technical. The following subsections explore the topics identified within these five domains, detailing how the discussions among Go-Ethereum developers correspond to and support each sustainability dimension. In Table 6, we outline the topics that the model identified as relevant to sustainability; these findings were later confirmed through manual verification. The third column indicates the sustainability dimensions each topic pertains to, which we explore in greater depth in the next section. After setting aside discussions classified as “Undefined”, we found that 2752 out of 8665 discussions are connected to sustainability (31.8%).

Table 6 Sustainability topics interpretation.


Figure 5 shows how the discussions contribute to the sustainability of blockchain technology, considering the five measures defined before.

**Fig. 5: Sustainability Awareness Diagram (SusAD).**

Sustainability Evaluation

The analysis of developer discussions in Table 6 shows numerous topics of discussion and indicates that sustainability is either a primary or secondary focus in 23 out of 58 topics.

The plot in Fig. 6 presents the ratio of discussions on sustainability-related topics to the overall number of topics tackled within the same year. The focus on sustainability among Ethereum developers peaked in 2021, with 2023 a close second. These two peaks bracket the Ethereum Merge of September 15, 2022, when Ethereum shifted from a PoW to a PoS consensus mechanism, cutting its energy consumption by approximately 99.95%. The uptick in sustainability conversations likely occurred as the community prepared for the Merge by discussing its implementation, and continued post-Merge as developers assessed its impact on efficiency and ongoing operation. The dip in sustainability-related discussions in 2022, the year of the Merge, could be attributed to the predominance of technical issues related to the transition, which may have steered the community’s focus away from broader sustainability topics.

**Fig. 6: Percentage of Sustainable Issues per Year.**
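The yearly ratio behind a plot of this kind can be computed as in the sketch below. The input format (one `(year, is_sustainable)` pair per issue), the function name, and the sample values are assumptions; only the overall counts 2752 and 8665 come from the text.

```python
from collections import Counter


def yearly_sustainability_share(issues):
    """issues: iterable of (year, is_sustainable) pairs.
    Returns {year: percentage of that year's issues labelled sustainable}."""
    totals, sustainable = Counter(), Counter()
    for year, is_sustainable in issues:
        totals[year] += 1
        if is_sustainable:
            sustainable[year] += 1
    return {
        year: 100.0 * sustainable[year] / totals[year]
        for year in sorted(totals)
    }


# Hypothetical sample, for illustration only.
sample = [
    (2021, True), (2021, True), (2021, False),
    (2022, True), (2022, False), (2022, False), (2022, False),
]
share = yearly_sustainability_share(sample)

# Overall share from the counts reported in the text.
overall = 100.0 * 2752 / 8665
```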

Of the 2752 sustainable issues, 473 have no comments. Within the sustainable issues, 1468 distinct users commented on at least one issue, out of the 6694 users who commented on issues across all topics, corresponding to 21%. Of these 1468 commenters, very few are extremely active: only 8 participate in more than 100 issues, and the average is 3.8 issues per user.

Moreover, we analysed the number of comments on each sustainability issue and mapped each comment to the ID of the user who posted it, in order to compare the number of distinct commenters on an issue with its total number of comments. We find a correlation of 99.84%, meaning that the comments on an issue generally come from different users. The most commented issues therefore coincide with the issues with the most commenters, so the number of comments is a good proxy for issue popularity.
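The comparison between total comments and distinct commenters can be illustrated as follows. The text does not state which correlation coefficient was used; this sketch assumes Pearson's r, and the sample data and function names are hypothetical.

```python
from math import sqrt


def comment_stats(comments_by_issue):
    """comments_by_issue: {issue_id: [user_id, ...]}, one entry per comment.
    Returns (total comments per issue, distinct commenters per issue)."""
    totals = [len(users) for users in comments_by_issue.values()]
    distinct = [len(set(users)) for users in comments_by_issue.values()]
    return totals, distinct


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical sample, for illustration only.
sample = {
    "issue-a": ["u1", "u2", "u3", "u1"],
    "issue-b": ["u4"],
    "issue-c": ["u1", "u5", "u6", "u7", "u8", "u5"],
    "issue-d": ["u2", "u3"],
}
totals, distinct = comment_stats(sample)
r = pearson(totals, distinct)
```

A value of `r` close to 1 indicates that issues with more comments also attract more distinct commenters, which is the reading given above.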

We now explore how Go-Ethereum developer discussions align with the five SusAF dimensions.

Economic dimension: Economic sustainability focuses on value creation, customer relationships, supply chain efficiency, governance, and innovation. In the Go-Ethereum community, topics such as “Gas Price and Transaction Fees” and “Crypto Benchmarking and Performance Analysis” highlight economic sustainability⁶¹. Discussions emphasise reducing transaction fees and efficient network management to enhance adoption, user experience, and lower operational costs, thereby supporting blockchain’s economic sustainability⁶².
Social dimension: Social sustainability emphasises community cohesion, trust, inclusivity, equity, and active participation. Topics like “P2P Network and Ethereum Build Process” and “Swarm Network and Manifest Management” show the importance of community engagement and collaboration⁶³, promoting a robust, inclusive, and cooperative blockchain community. “Community Support and Question Tracking” exemplifies social collaboration among developers to enhance blockchain usage⁶⁴.
Individual dimension: Individual sustainability includes health, lifelong learning, privacy, security, and self-awareness. Topics like “Database State Management and Trie” and “Account Security and Keystore Management” highlight privacy, security, and empowerment⁶⁵. These discussions promote problem solving and data integrity, essential for individual sustainability.
Environmental dimension: Environmental sustainability focuses on resource management and reducing environmental impact. Discussions on “Gas Price and Transaction Fees” and “Benchmark Analysis and Optimisation” address reducing energy demands and environmental footprint⁶⁶. These topics emphasise waste minimisation and efficient resource use, supporting a sustainable relationship with natural resources.
Technical dimension: Technical sustainability involves maintaining, adjusting, securing, and scaling systems to adapt to changing environments. Issues like “P2P Network and Ethereum Build Process”, “Tracing and Debugging with Tracers”, and “Repository Management and Issue Handling” demonstrate the importance of creating resilient and adaptable technological solutions⁵⁶. These efforts ensure long-term sustainability and flexibility, fostering innovation and growth.

Table 6 shows how the found topics relate to the five sustainability measures. Developer discussions mainly contribute to the platform’s technical sustainability, with indirect benefits to social, individual, economic, and environmental aspects by reducing energy consumption and optimising emissions.

In particular, we observe several discussions where technical optimisation choices are explicitly tied to resource efficiency. For example, issue 980191017, which debates replacing the Snappy compression algorithm with Zstd, illustrates how developers balance throughput against computational load. The issue body reads: “Looking at the numbers we could get higher throughput with the same compression ratio or the same throughput and better compression ratio: https://github.com/facebook/zstd#benchmarks. There are bindings to go: https://facebook.github.io/zstd/#other-languages, which would be relevant for the freezer and rlpx.” Snappy offers faster compression and decompression, improving execution speed, whereas Zstd achieves significantly higher compression ratios, reducing disk usage and network bandwidth. The discussion highlights a recurring pattern in our Environmental and Technical (EN/T) topics: decisions framed as performance improvements often have implications for energy consumption and resource utilisation, revealing how developers implicitly negotiate trade-offs between technical optimisation and environmental impact.

Sustainability Network Analysis

We now examine developer participation in sustainability discussions through a network analysis, following the methodology explained in Sec. 3.5. Nodes represent developers, and edges indicate shared comments on sustainability-related GitHub issues. The weight of connections increases with the number of shared comments, revealing collaboration and influence patterns.

We first consider the projection of the bipartite developer-issue network onto Layer 1 (developers only). To identify influential developers, we used four centrality measures: degree centrality, which counts direct connections and shows how active a developer is; betweenness centrality, which identifies intermediaries who bridge groups; closeness centrality, which indicates how efficiently a developer can spread information; and eigenvector centrality, which highlights developers connected to other influential developers. Tables 7 and 8 list the top five developers by user ID and centrality measure values. From Table 8, we see that developer 129561 ranks highest across all centrality measures, highlighting their crucial role in sustainability discussions within the Go-Ethereum community. This developer is very active in discussions (high degree centrality), acts as a key information conduit (high betweenness centrality), is closely connected to others (high closeness centrality), and is influential among influential developers (high eigenvector centrality).
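Three of the four measures can be sketched on a small undirected, unweighted graph using only the standard library. This is an illustration under simplifying assumptions, not the computation behind Tables 7 and 8: edge weights are ignored, betweenness is omitted for brevity, and the developer IDs `d1` to `d5` are hypothetical. The eigenvector iteration uses the matrix A + I, a common device to keep power iteration from oscillating on bipartite-like graphs.

```python
from collections import deque
from math import sqrt


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()} if n > 1 else {}


def closeness_centrality(adj):
    """(reachable nodes) / (sum of shortest-path distances), via BFS."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration with A + I, normalised to unit Euclidean length."""
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Hypothetical developer graph: a triangle d1-d2-d3 plus a chain d1-d4-d5.
adj = {
    "d1": {"d2", "d3", "d4"},
    "d2": {"d1", "d3"},
    "d3": {"d1", "d2"},
    "d4": {"d1", "d5"},
    "d5": {"d4"},
}
deg = degree_centrality(adj)
close = closeness_centrality(adj)
eig = eigenvector_centrality(adj)
```

In this toy graph `d1` ranks first on all three measures, which is the kind of pattern reported for the top-ranked developer above.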

Table 7 Centrality on all topics.


Table 8 Centrality on sustainable topics.


We then consider the bipartite projection on Layer 2 (issues). Each node in the network represents an issue, and links represent shared comments by developers, with link thickness indicating the number of shared developers. We focused on the connected component of the network (i.e., the set of nodes connected by paths), excluding 60 isolated issues.
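The projection onto issues can be sketched as below: two issues are linked when at least one developer commented on both, and the edge weight is the number of such shared developers. The input format and sample data are hypothetical, and the sketch only drops isolated issues; it does not check that the remaining nodes form a single connected component.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_issues(commenters_by_issue):
    """Return {(issue_a, issue_b): number of developers who commented on both},
    with issue_a < issue_b."""
    issues_by_dev = defaultdict(set)
    for issue, devs in commenters_by_issue.items():
        for dev in set(devs):  # repeated comments by one developer count once
            issues_by_dev[dev].add(issue)
    weights = defaultdict(int)
    for issues in issues_by_dev.values():
        for a, b in combinations(sorted(issues), 2):
            weights[(a, b)] += 1
    return dict(weights)


def drop_isolated(commenters_by_issue, weights):
    """Keep only issues that share at least one developer with another issue."""
    connected = {issue for edge in weights for issue in edge}
    return sorted(issue for issue in commenters_by_issue if issue in connected)


# Hypothetical sample, for illustration only.
commenters = {
    "i1": ["a", "b"],
    "i2": ["b", "c"],
    "i3": ["a", "b", "b"],
    "i4": ["z"],
}
weights = project_onto_issues(commenters)
kept = drop_isolated(commenters, weights)
```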

Each issue is categorised under two labels: a micro-label for the specific topic (left column of Table 6) and a macro-label for the related SusAF measure (right column of Table 6). The dataset includes 2279 issues across 23 topics (micro-labels) classified into 6 SusAF measures (macro-labels): EC/EN, EC/T, EN/T, I/T, S/T, T (Economic & Environmental, Economic & Technical, Environmental & Technical, Individual & Technical, Social & Technical, Technical, respectively). The distribution of macro-labels is: EC/EN, EC/T, and EN/T each with 1 topic; I/T with 2 topics; S/T with 6 topics; and T with 12 topics, totalling 23 topics.

We visualise the network with a colour-coded scheme using 23 colours and 6 unique shapes for macro-labels, as detailed in Table 6. The connected component of commented issues is shown in Fig. 7, Left Panel. The Leiden algorithm identifies six distinct communities, illustrated in different colours in Fig. 7, Right Panel⁶⁷.

**Fig. 7: Network of commented sustainable issues.**

Adopting the methodology described by Mungo et al.¹, we analyse a network of commented issues consisting of N = 2219 nodes and k = 6 clusters S, with ∣S_i∣ representing the number of nodes in cluster S_i. We examine the in-density and out-density of links according to the partitioning established by the Leiden algorithm⁶⁰,⁶⁷. Using the unweighted adjacency matrix A of the commented issues network and the clustering S^* = {S₁, …, S_k}, the in-density for a cluster S_i is defined as:

$$\rho_{i}^{i}=\frac{1}{|S_{i}|(|S_{i}|-1)}\sum_{j,k\in S_{i},\,j\ne k}A_{jk}, \tag{1}$$

and the out-density is given by:

$$\rho_{i}^{o}=\frac{1}{|S_{i}|(N-|S_{i}|)}\sum_{j\in S_{i},\,k\notin S_{i}}A_{jk}. \tag{2}$$

Here, A_jk is the binary entry of the adjacency matrix indicating the presence of a link between nodes j and k, so that only the existence of a connection between issues is considered, not its strength. We then compare the in-densities and out-densities of the clusters identified by the algorithm against those of randomly generated clusters. For the random clusters, each issue is assigned to one of the six possible clusters with equal probability; the simulation is obtained with 10⁶ random networks.
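Equations (1) and (2) and the random-cluster baseline can be illustrated on a toy graph. This sketch assumes an undirected, unweighted adjacency structure without self-loops, uses far fewer random draws than the 10⁶ reported, and assigns a density of 0 to degenerate clusters (a single node, or the whole network); the function names and the example graph are hypothetical.

```python
import random


def densities(adj, clusters):
    """adj: {node: set(neighbours)}; clusters: {node: cluster_id}.
    Returns {cluster_id: (in_density, out_density)} following Eqs. (1)-(2)."""
    n = len(adj)
    members = {}
    for node, cluster in clusters.items():
        members.setdefault(cluster, set()).add(node)
    result = {}
    for cluster, nodes in members.items():
        size = len(nodes)
        # Ordered pairs (j, k), matching the double sum over j != k in Eq. (1).
        inside = sum(1 for j in nodes for k in adj[j] if k in nodes)
        outside = sum(1 for j in nodes for k in adj[j] if k not in nodes)
        rho_in = inside / (size * (size - 1)) if size > 1 else 0.0
        rho_out = outside / (size * (n - size)) if size < n else 0.0
        result[cluster] = (rho_in, rho_out)
    return result


def random_baseline(adj, k, runs, seed=0):
    """Assign every node to one of k clusters uniformly at random, `runs`
    times, and collect the resulting (in_density, out_density) pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        assignment = {node: rng.randrange(k) for node in adj}
        samples.extend(densities(adj, assignment).values())
    return samples


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
observed = densities(adj, clusters)
baseline = random_baseline(adj, k=2, runs=200)
mean_in = sum(rho_in for rho_in, _ in baseline) / len(baseline)
```

Each triangle has in-density 1 and out-density 1/9, while the random assignments yield lower in-densities on average; this gap is what the comparison with random clusters is meant to expose.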

We show the comparison of in- and out-densities for the six communities identified by the Leiden algorithm in Fig. 8. As expected, these clusters exhibit high in-cluster density, confirming that the algorithm successfully isolates tightly connected groups of issues. Comparing these results with random clusters (red shaded area) reveals significant differences, validating the presence of cohesive structural communities in the commented issues network. We complement the unsupervised community detection with a supervised analysis based on predefined sustainability categories. Figure 9 presents the same analysis when issues are grouped according to their SusAF macro-labels. Here, the densities provide information on whether developers concentrate on the same sustainability topic or distribute their comments across different ones. We observe that some categories (e.g., EC/T, EN/T) display higher in-cluster density, indicating focused developer attention on those topics. Others (e.g., T, S/T) lie closer to the diagonal, suggesting that developers engage more broadly across different categories. Taken together, the two perspectives highlight both the endogenous community structure emerging from the network (unsupervised) and the degree to which this structure aligns with exogenous sustainability categories (supervised). While Leiden clustering identifies cohesive yet mixed communities, the SusAF partition reveals topic-specific differences in how developers allocate their contributions.

**Fig. 8: Unsupervised cluster analysis.**

**Fig. 9: Supervised cluster analysis.**

To further interpret the identified communities, we next examine their composition to assess how they relate to the predefined sustainability categories. Figure 10 presents a bar plot showing the distribution of issues per SusAF dimensions (EC/EN, EC/T, EN/T, I/T, S/T, T) within each community. The x-axis ticks are colour-coded as in Fig. 8 and Fig. 7, Right Panel. Each bar represents the number of issues of a specific topic within the clusters.

**Fig. 10: Distribution of the SusAF measures across the six clusters.**

Community 0 (blue) and Community 2 (yellow) are mainly associated with EC/EN (Economic & Environmental) and I/T (Individual & Technical) topics. Community 1 (green) is distinct for S/T (Social & Technical). Technical (T) topics are dispersed throughout the network. The EC/T (Economic & Technical) and EN/T (Environmental & Technical) measures have only 33 and 46 issues, respectively, indicating limited relevance (Table 6).

We also analyse the creation year of the issues, presenting the temporal trends in Fig. 11. We examine the temporal distribution of opened issues by community over the years, revealing distinct activity patterns. Community 1 (green, S/T) was most active from 2015 to 2017. Community 2 (yellow, EC/EN and I/T) saw increased activity during the middle years. Since 2020, Community 0 (blue, EC/EN and I/T) has shown a notable rise in engagement. Similar trends are observed when analysing the last update dates, indicating consistent community engagement over time.

In summary, we found that developers prefer engaging in discussions similar to their previous ones, rather than responding randomly. Community detection analysis reveals that developers consistently respond to Economic, Environmental, and Individual topics, with increased activity from 2019 onwards. Social issues were notably active from 2015 to 2017, indicating selective engagement. Technical issues are broadly discussed by all developers. The communities identified by the Leiden algorithm, which include developers with shared interests, reflect issues that were initiated in the same year or in adjacent years. These findings are consistent with the SusAF categorisation of topics we applied, effectively mirroring the thematic preferences and temporal engagement patterns within the developer community.

Threats to Validity

Construct Validity

Our analysis captures sustainability-related discussions as they appear in GitHub issues and comments. This dataset provides a transparent and complete record of on-platform communication, but excludes other channels used by Ethereum developers (e.g., forums, Discord, mailing lists), which may host additional discussions on sustainability. Topic identification relies on BERTopic with BGE-small embeddings, while sustainability categorisation is based on the SusAF framework. Although these methods offer interpretability and structure, they remain approximations of developers’ underlying intentions. Some sustainability considerations may be embedded in technical debates without explicit framing, potentially leading to an underestimation of certain dimensions. Zero-shot classification also depends on predefined keyword sets derived from prior literature, which may influence the boundaries of sustainability-related topics.

Internal Validity

LLM-assisted labelling introduces potential subjectivity. To mitigate this, we combined automated topic descriptions with a multi-annotator manual validation process, achieving moderate agreement (Fleiss’ κ = 0.509) and resolving remaining disagreements through adjudication. However, LLM outputs may vary due to model stochasticity, and human judgements may reflect individual interpretations. Topic modelling is sensitive to parameter choices, such as minimum topic size, dimensionality reduction, and embedding model, meaning alternative configurations could yield slightly different topic structures. Similarly, the bipartite network projection treats all comments as equal signals of participation, without distinguishing comment depth or technical substance. These modelling choices may affect perceived topic prominence and developer engagement.

External Validity

This study focuses exclusively on the Go-Ethereum repository, which plays a central role in the Ethereum ecosystem but may not be representative of other clients, tooling projects, or developer communities. Sustainability discussions in Go-Ethereum may be shaped by its proximity to protocol-level evolution, long project history, and specific organisational norms. Results therefore characterise sustainability as expressed through this project’s issue-tracker communication rather than the broader Ethereum ecosystem. The SusAF framework enables a structured interpretation of sustainability, but its application to software engineering discussions may emphasise different dimensions in projects oriented toward user-facing sustainability, hardware optimisation, or organisational governance.

Conclusion Validity

We employed established techniques for topic modelling, zero-shot classification, inter-rater validation, and bipartite network analysis, along with standard procedures for frequency analysis and community detection. Topic modelling, however, generates probabilistic clusters rather than definitive semantic categories, and sustainability assignments reflect interpretive judgements grounded in the SusAF taxonomy. While the entire methodological pipeline is documented and reproducible, small variations in embeddings, clustering, or LLM-generated descriptions are expected. Despite these limitations, the consistency observed across quantitative topic patterns, manual validation, and network results increases confidence in the reported findings. Our conclusions should therefore be interpreted as characterising patterns of sustainability-related discussion rather than providing direct measurements of developer intent.

Conclusion and Future Works

This study provides a novel and reproducible approach to understanding how sustainability concerns are discussed and negotiated within large-scale open-source digital infrastructures. We use Ethereum as an example since it is the most comprehensive public blockchain project and offers a collection of developer discussions spanning almost ten years. To determine when, how, and by whom sustainability themes occur, our analysis combines BERT-based topic modelling, LLM analysis, and network science tools.

Our findings reveal that while technical optimisation dominates the discourse, developers increasingly engage with themes of environmental impact, economic efficiency, and resource consumption, particularly around key events such as Ethereum’s transition to Proof of Stake consensus. As shown in Fig. 11, when isolating contributions labelled under the environmental dimension, a gradual increase in developer participation becomes evident. This trend highlights that environmental considerations are becoming an integrated part of the technical discourse rather than an external or peripheral concern. Our analysis reveals that sustainability is a significant topic among Ethereum developers, with 23 out of 58 topics related to it. We also identify influential contributors and observe thematic specialisation across developer communities, offering a nuanced view of how sustainability engagement unfolds in decentralised settings. This nuanced view refers to the way sustainability-related discussions intersect across multiple dimensions (individual, technical, economic, environmental) rather than appearing as isolated environmental concerns. Developers often articulate environmental considerations through the lenses of technical optimisation (e.g., efficiency improvements, gas-cost reductions) and economic reasoning (e.g., cost-performance trade-offs). This overlap illustrates that sustainability engagement in decentralised settings is a multidimensional process, shaped by the interplay between technical design choices, economic incentives, and environmental awareness. Through network analysis, we identified individuals who not only participate frequently but also serve as central nodes connecting diverse threads of conversation. These developers play a pivotal role in shaping the discourse around sustainability, acting as both contributors and knowledge brokers who bridge different thematic areas. 
The key developers driving these discussions also emerge as influential figures in the broader Go-Ethereum network.

Our analysis shows that developers tend to engage most actively with issues that align with their prior contributions, indicating the presence of specialised knowledge clusters within the community. Rather than randomly participating in all open discussions, contributors consistently gravitate toward topics within their domain of expertise, whether related to energy efficiency, protocol optimisation, or social infrastructure. This specialisation reflects the complex and distributed nature of large-scale open-source projects, where expertise is often deep but not uniformly distributed.

These patterns have meaningful implications. First, they suggest that advancing sustainability in open-source software ecosystems may require targeted interventions that align with the interests and strengths of key contributors. Second, they highlight the potential for leveraging these clusters to build more structured sustainability leadership within communities, mobilising domain experts not just for technical improvements, but also for fostering cross-cutting awareness and coordination. Understanding who engages with what, and how, can inform strategies to better integrate sustainability into the day-to-day development process, and to scale successful practices across the broader digital ecosystem.

Overall, we demonstrate that sustainability is not only a policy challenge or an engineering constraint but also a social and collaborative process embedded in the daily practices of those who shape digital technologies. Our work provides a data-driven framework for evaluating sustainability discourse within other software and infrastructure ecosystems, contributing to ongoing efforts to align technological innovation with global sustainability goals.

At the same time, several limitations must be acknowledged. Our analysis is restricted to GitHub data from a single project, potentially missing relevant discussions on other platforms or in different blockchain communities. The use of topic modelling and keyword-based classification, while powerful, may overlook implicit or nuanced forms of sustainability discourse. The SusAF framework nonetheless offers a novel lens for categorising technical discussions across different dimensions of the technology. In addition, while we focus on developers’ discussions as an observable and replicable dataset, sustainability choices in practice also depend on node and validator operators, actors who often overlap with developers but whose operational forums (e.g., Ethereum Research, Reddit, or validator mailing lists) represent promising avenues for future extension of this framework.

While this study focuses on the Ethereum ecosystem, the framework we propose is not project-specific. It offers a replicable and transferable approach for examining how sustainability discourse unfolds in any large-scale, collaborative digital environment. By adjusting the thematic keywords and contextual prompts, the same methodology can be applied to other open-source software repositories or online communities to trace how sustainability-related debates evolve over time. In this sense, Ethereum serves as a testbed for a broader model that can inform comparative research across distributed software ecosystems. Future research should expand this framework across multiple platforms and projects, integrate quantitative sustainability indicators (such as carbon or energy metrics), and adopt longitudinal approaches to track how sustainability discussions evolve over time. Such efforts will be essential in deepening our understanding of how digital systems – from blockchain to AI to cloud infrastructure – can support, rather than undermine, sustainable development.

By foregrounding the voices and actions of developers, this study highlights the importance of grassroots agency in the pursuit of sustainable digital futures. It invites scholars, practitioners, and policymakers alike to consider not just what we build, but how we build it – and how those practices align with the broader goals of social responsibility, environmental stewardship, and long-term technological resilience.

Previous Presentation

International Conference on Evaluation and Assessment in Software Engineering (EASE), 2024.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The code and data used for this study are available in this GitHub repository⁶⁸.

References

Mungo, L., Bartolucci, S. & Alessandretti, L. Cryptocurrency co-investment network: token returns reflect investment patterns. EPJ Data Sci. 13, 11 (2024).
Briola, A. & Aste, T. Dependency structures in cryptocurrency market from high to low frequency. Entropy 24, 1548 (2022).
Giuffrida, S., Salim, S., Ullah, A. & Vaccargiu, M. A Move Sui library for secure, certified and trusted supply chain ownership management. In Proc. IEEE/ACM 7th International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), 50–56 (IEEE, 2025).
Schinckus, C. Proof-of-work based blockchain technology and anthropocene: An undermined situation? Renew. Sustain. Energy Rev. 152, 111682 (2021).
Wendl, M., Doan, M. H. & Sassen, R. The environmental impact of cryptocurrencies using proof of work and proof of stake consensus algorithms: A systematic review. J. Environ. Manag. 326, 116530 (2023).
Lange, S., Pohl, J. & Santarius, T. Digitalization and energy consumption. Does ICT reduce energy demand? Ecol. Econ. 176, 106760 (2020).
Vaccargiu, M., Pinna, A., Tonelli, R. & Cocco, L. Blockchain in the energy sector for SDG achievement. Sustainability 15, 2014843 (2023).
Tripathi, G., Ahad, M. A. & Casalino, G. A comprehensive review of blockchain technology: underlying principles and historical background with future challenges. Decis. Anal. J. 9, 100344 (2023).
Kohli, V., Chakravarty, S., Chamola, V., Sangwan, K. S. & Zeadally, S. An analysis of energy consumption and carbon footprints of cryptocurrencies and possible solutions. Digital Commun. Netw. 9, 79–89 (2023).
Deshpande, A., Stewart, K., Lepetit, L. & Gunashekar, S. Distributed ledger technologies/blockchain: challenges, opportunities and the prospects for standards. Overv. Rep.- Br. Stand. Inst. 40, 40 (2017).
Ghosh, E. & Das, B. A study on the issue of blockchain’s energy consumption. In Chakraborty, M., Chakrabarti, S. & Balas, V. E. (eds.) Proceedings of International Ethical Hacking Conference 2019, 63–75 (Springer, 2020).
Islam, M. R., Rashid, M. M., Rahman, M. A., Mohamad, M. H. S. B. & Embong, A. H. B. A comprehensive analysis of blockchain-based cryptocurrency mining impact on energy consumption. Int. J. Adv. Comput. Sci. Appl. 13, 130469 (2022).
Vaccargiu, M. et al. Sustainability in blockchain development: a BERT-based analysis of Ethereum developer discussions. In Proc. 28th International Conference on Evaluation and Assessment in Software Engineering, 381–386 (ACM, 2024).
Jiang, S., Li, Y., Lu, Q., Hong, Y., Guan, D., Xiong, Y. & Wang, S. Policy assessments for the carbon emission flows and sustainability of Bitcoin blockchain operation in China. Nat. Commun. 12, 1938 (2021).
Zhang, D., Chen, X. H., Lau, C. K. M. & Xu, B. Implications of cryptocurrency energy usage on climate change. Technol. Forecast. Soc. Change 187, 122219 (2023).
Truby, J., Brown, R. D., Dahdal, A. & Ibrahim, I. Blockchain, climate damage, and death: Policy interventions to reduce the carbon emissions, mortality, and net-zero implications of non-fungible tokens and bitcoin. Energy Res. Soc. Sci. 88, 102499 (2022).
Smith, T. B., Vacca, R., Mantegazza, L. & Capua, I. Natural language processing and network analysis provide novel insights on policy and scientific discourse around sustainable development goals. Nat. Sci. Rep. 11, 22427 (2021).
Qin, K. & Gervais, A. An overview of blockchain scalability, interoperability and sustainability. Hochschule Luzern Imperial College London Liquidity Network 1–15 (EU, 2018).
Arshad, A., Shahzad, F., Ur Rehman, I. & Sergi, B. S. A systematic literature review of blockchain technology and environmental sustainability: status quo and future research. Int. Rev. Econ. Financ. 88, 1602–1622 (2023).
Asif, R. & Hassan, S. R. Shaping the future of Ethereum: exploring energy consumption in proof-of-work and proof-of-stake consensus. Front. Blockchain 6, 1151724 (2023).
Pinna, A., Baralla, G., Marchesi, M. & Tonelli, R. Raising sustainability awareness in agile blockchain-oriented software engineering. In Proc. IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 696–700 (IEEE, 2021).
Eligüzel, N. An analysis of the integration of sustainability concepts into blockchain technology. Int. J. Appl. Methods Electron. Comp. 11, 158–164 (2023).
Liu, Y., Zhang, S., Chen, M., Wu, Y. & Chen, Z. The sustainable development of financial topic detection and trend prediction by data mining. Sustainability 13, 7585 (2021).
Stamoulis, E. Comparative study on the environmental, political, social effects and long-term sustainability of Bitcoin, Ethereum, Tether and Cardano cryptocurrencies, Master’s thesis (University of Twente, 2021).
Ayman, A., Roy, S., Alipour, A. & Laszka, A. Smart contract development from the perspective of developers: Topics and issues discussed on social media. In International Conference on Financial Cryptography and Data Security, 405–422 (Springer, 2020).
Ibba, G. & Vaccargiu, M. Analysis of users’ most discussed topics and trends on blockchain technologies and smart contracts. In 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 865–873 (IEEE, 2023).
Vaccargiu, M., Ibba, G. & Pinna, A. Topics analysis and trends on blockchain applications in the energy sector. In Proc. IEEE International Conference on Software Analysis, Evolution and Reengineering - Companion (SANER-C), 68–71 (IEEE, 2024).
Kim, B. T.-S. & Hyun, E.-J. Mapping the landscape of blockchain technology knowledge: a patent co-citation and semantic similarity approach. Systems 11, 030111 (2023).
Vidal-Tomás, D. & Bartolucci, S. Artificial intelligence and digital economy: divergent realities. Available at SSRN 4589333 (2023).
Lin, D., Wu, J., Yuan, Q. & Zheng, Z. Modeling and understanding ethereum transaction records via a complex network approach. IEEE Trans. Circuits Syst. II: Express Briefs 67, 2737–2741 (2020).
Aufiero, S., Ibba, G., Bartolucci, S., Destefanis, G., Neykova, R. & Ortu, M. Dapps ecosystems: Mapping the network structure of smart contract interactions. EPJ Data Sci. 13, 60 (2024).
Destefanis, G. Complex systems oriented approach for dapps analysis. In 2024 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), 757–762 (IEEE, 2024).
Ibba, G., Aufiero, S., Bartolucci, S., Neykova, R., Ortu, M., Tonelli, R. & Destefanis, G. Mindthedapp: a toolchain for complex network-driven structural analysis of ethereum-based decentralized applications. IEEE Access 12, 28382–28394 (2024).
Ibba, G., Aufiero, S., Neykova, R., Bartolucci, S., Ortu, M., Tonelli, R. & Destefanis, G. A curated solidity smart contracts repository of metrics and vulnerability. In Proc. 20th International Conference on Predictive Models and Data Analytics in Software Engineering, 32–41 (ACM, 2024).
Ibba, G., Khullar, S., Tesfai, E., Neykova, R., Aufiero, S., Ortu, M., Bartolucci, S. & Destefanis, G. A preliminary analysis of software metrics in decentralised applications. In Proceedings of the Fifth ACM International Workshop on Blockchain-enabled Networked Sensor Systems, 27–33 (2023).
Ibba, G., Destefanis, G., Neykova, R., Ortu, M., Aufiero, S. & Bartolucci, S. Dai: A dependencies analyzer and installer for solidity smart contracts. In IEEE International Conference on Software Analysis, Evolution and Reengineering-Companion (SANER-C), 72–75 (IEEE, 2024).
Bartolucci, S., Destefanis, G., Ortu, M., Uras, N., Marchesi, M. & Tonelli, R. The butterfly “affect”: impact of development practices on cryptocurrency prices. EPJ Data Sci. 9, 21 (2020).
Lucchini, L., Alessandretti, L., Lepri, B., Gallo, A. & Baronchelli, A. From code to market: network of developers and correlated returns of cryptocurrencies. Sci. Adv. 6, eabd2204 (2020).
Ortu, M., Uras, N., Conversano, C., Bartolucci, S. & Destefanis, G. On technical trading and social media indicators for cryptocurrency price classification through deep learning. Expert Syst. Appl. 198, 116804 (2022).
Vaccargiu, M., Aufiero, S., Ba, C., Bartolucci, S., Clegg, R., Graziotin, D., Neykova, R., Tonelli, R. & Destefanis, G. Mining a Decade of Event Impacts on Contributor Dynamics in Ethereum: A Longitudinal Study. In IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), 552–563 (IEEE, 2025).
Destefanis, G., Xu, J. & Bartolucci, S. Measuring the decentralisation of DeFi development: An empirical analysis of contributor distribution in Lido. Inf. Syst. 139, 102695 (2024).
Mumtaz, H., Paradis, C., Palomba, F., Tamburri, D. A., Kazman, R. & Blincoe, K. A preliminary study on the assignment of github issues to issue commenters and the relationship with social smells. In Proceedings of the 15th International Conference on Cooperative and Human Aspects of Software Engineering, 61–65 (ACM, 2022).
Jamieson, J., Yamashita, N. & Foong, E. Predicting open source contributor turnover from value-related discussions: An analysis of github issues. In Proc. IEEE/ACM 46th International Conference on Software Engineering (ICSE), 678–690, (IEEE, 2024).
Ortu, M., Hall, T., Marchesi, M., Tonelli, R., Bowes, D. & Destefanis, G. Mining communication patterns in software development: A GitHub analysis. In Proceedings of the 14th international conference on predictive models and data analytics in software engineering, 70–79 (ACM, 2018).
Ortu, M., Destefanis, G., Hall, T. & Bowes, D. Fault-insertion and fault-fixing behavioural patterns in apache software foundation projects. Inf. Softw. Technol. 158, 107187 (2023).
Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. ArXiv preprint: 2203.05794 (2022).
Bafna, P., Pramod, D. & Vaidya, A. Document clustering: TF-IDF approach. In 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), 61–66 (IEEE, 2016).
Khan, M. Q., Shahid, A., Uddin, M. I., Roman, M., Alharbi, A., Alosaimi, W., Almalki, J. & Alshahrani, S. M. Impact analysis of keyword extraction using contextual word embedding. PeerJ Comp. Sci. 8, e967 (2022).
Ma, T., Yao, J. G., Lin, C. Y. & Zhao, T. Issues with entailment-based zero-shot text classification. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 786–796 (ACL, 2021).
Nair, P. R. & Dorai, D. R. Evaluation of performance and security of proof of work and proof of stake using blockchain. In 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 279–283, (2021).
Blanco-Cuaresma, S. et al. Experimenting with Large Language Models and vector embeddings in NASA SciX. ArXiv preprint: 2312.14211. (2023).
Röder, M., Both, A. & Hinneburg, A. Exploring the space of topic coherence measures. In Proceedings of the eighth ACM international conference on Web search and data mining, 399–408 (ICDM, 2015).
Rijcken, E., Scheepers, F., Zervanou, K., Spruit, M., Mosteiro, P. & Kaymak, U. Towards interpreting topic models with ChatGPT. In The 20th World Congress of the International Fuzzy Systems Association (World Congress, 2023).
Colavito, G., Lanubile, F., Novielli, N. & Quaranta, L. Leveraging GPT-like llms to automate issue labeling. In 2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR), 469–480 (IEEE, 2024).
Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977).
Duboc, L. et al. Requirements engineering for sustainability: an awareness framework for designing software systems for a better tomorrow. Require. Eng. 25, 469–492 (2020).
Becker, C. et al. Sustainability design and software: The Karlskrona manifesto. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 2, 467–476 (IEEE, 2015).
Latora, V., Nicosia, V. & Russo, G. Complex networks: principles, methods and applications (Cambridge University Press, 2017).
Bartolucci, S., Caccioli, F., Caravelli, F. & Vivo, P. Distribution of centrality measures on undirected random networks via the cavity method. Proc. Natl. Acad. Sci. USA 121, e2403682121 (2024).
Traag, V. A., Waltman, L. & Van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Nat. Sci. Rep. 9, 1–12 (2019).
Javaid, M., Haleem, A., Singh, R. P., Suman, R. & Khan, S. A review of blockchain technology applications for financial services. BenchCouncil Trans. Benchmarks Stand. Eval 2, 100073 (2022).
Ullah, N., Al-Rahmi, W. M., Alfarraj, O., Alalwan, N., Alzahrani, A. I., Ramayah, T. & Kumar, V. Hybridizing cost saving with trust for blockchain technology adoption by financial institutions. Telemat. Inform. Rep. 6, 100008 (2022).
Devine, A., Jabbar, A., Kimmitt, J. & Apostolidis, C. Conceptualising a social business blockchain: the coexistence of social and economic logics. Technol. Forecast. Soc. Change 172, 120997 (2021).
Kassen, M. Understanding decentralized civic engagement: focus on peer-to-peer and blockchain-driven perspectives on e-participation. Technol. Soc. 66, 101650 (2021).
Cha, J., Singh, S. K., Pan, Y. & Park, J. H. Blockchain-based cyber threat intelligence system architecture for sustainable computing. Sustainability 12, 6401 (2020).
Parmentola, A., Petrillo, A., Tutore, I. & De Felice, F. Is blockchain able to enhance environmental sustainability? A systematic review and research agenda from the perspective of sustainable development goals (SDGs). Bus. Strategy Environ. 31, 194–217 (2022).
Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).
Vaccargiu, M., Aufiero, S., Bartolucci, S., Neykova, R., Tonelli, R. & Destefanis, G. Sustainability Assessment Ethereum Developers. GitHub repository, https://github.com/matteovaccargiu/SustainabilityAssessmentEthereumDevelopers (2026). Accessed 3 February 2026.

Acknowledgements

M.V. and R.T. acknowledge financial support under the National Recovery and Resilience Plan (NRRP), by the Italian Ministry of University and Research (MUR) funded by the European Union-NextGenerationEU. Project Code ECS0000038—Project eINS Ecosystem of Innovation for Next Generation Sardinia—CUP F53C22000430001-Grant Assignment for the PoC “TraCCCS: Tracciabilitá, Certificazione Blockchain e Valorizzazione dei Carbon Credits per PMI Sarde con Impianti di Produzione di Energia Rinnovabile”.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
Matteo Vaccargiu & Roberto Tonelli
Department of Computer Science, Brunel University of London, London, UK
Matteo Vaccargiu & Rumyana Neykova
Department of Computer Science, University College London, London, UK
Sabrina Aufiero, Silvia Bartolucci & Giuseppe Destefanis

Contributions

S.B.: Conceptualisation; Methodology; Data curation; Writing–original draft; Writing–review & editing. G.D.: Conceptualisation; Methodology; Data curation; Writing–original draft; Writing–review & editing. S.A.: Conceptualisation; Methodology; Data curation; Investigation; Writing–original draft; Writing–review & editing. M.V.: Conceptualisation; Methodology; Data curation; Investigation; Writing–original draft; Writing–review & editing. R.N.: Conceptualisation; Methodology; Writing–review & editing. R.T.: Validation; Writing–review & editing. All authors approved the final version of the manuscript.

Corresponding author

Correspondence to Silvia Bartolucci.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Sustainability thanks Johannes Sedlmeir, Rodrigo Dutra Garcia and Birgit Penzenstadler for their contribution to the peer review of this work. Peer review was single-anonymous OR Peer review was double-anonymous. Primary Handling Editor: Nandita Basu. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Transparent Peer Review file

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Vaccargiu, M., Aufiero, S., Bartolucci, S. et al. Developer engagement in open-source software’s green transition. Commun. Sustain. 1, 41 (2026). https://doi.org/10.1038/s44458-026-00050-w

Received: 29 July 2025
Accepted: 14 February 2026
Published: 06 March 2026
Version of record: 06 March 2026
DOI: https://doi.org/10.1038/s44458-026-00050-w
