Introduction

Recent assessments indicate that even with full implementation of current nationally determined contributions, global temperatures would still rise by 2.5–3 °C by the end of the century (UNEP 2023). This trajectory significantly exceeds the Paris Agreement’s 1.5 °C target (Matthews & Wynes 2022), underscoring a persistent gap between climate pledges and actionable mitigation efforts (Bolton & Kacperczyk 2023; Huynh & Xia 2023). Of particular concern is China, which accounts for approximately 32% of global CO2 emissions, a share that not only continues to grow but also dwarfs the contributions from the U.S. (14%) and the EU (8%). China’s commitment to peaking emissions by 2030 and achieving carbon neutrality by 2060 (Zhao et al. 2022) necessitates addressing Scope 3 (value chain) emissions, particularly from small and medium suppliers (SMSs) that account for 65% of China’s carbon footprint through these indirect emissions (Meng et al. 2018). However, a significant hurdle for most SMSs in their carbon emission reduction (CER) efforts is a high financing cost (Cecere et al. 2020). A recent survey by the Center for International Knowledge on Development (2023) shows that the average interest rate for China’s SMEs is 5.8%, which is nearly 50% higher than the 3.9% rate for large enterprises.

The bank-led carbon finance (BLCF) model has gained prominence in both developed and developing economies as a policy-driven mechanism to reduce financing costs for SMSs undertaking CER initiatives. For example, the European Investment Bank administers the EU Innovation Fund, which allocated €2.87 billion between 2021 and 2023 to finance CER projects by SMSs. A representative example is the €18 million financing extended in 2021 to an Italian cement plant for the CLEANKER carbon capture, utilization, and storage project, which achieved a 33% reduction in emissions intensity, from 1.2 t to 0.8 t CO2 per ton of cement produced. In China, the People’s Bank of China (2021) introduced the Carbon Emission Reduction Support Tool, a structural monetary instrument designed to incentivize banks through subsidized refinancing rates to extend CER loans to SMSs. Regulatory disclosures in 2023 revealed that 17 participating banks disbursed $19.42 billion across 1353 CER projects. Notwithstanding these advancements, the BLCF model faces systemic challenges stemming from information asymmetries (Akomea-Frimpong et al. 2022). SMSs’ limited capacity for verifiable emissions monitoring results in opacity in CER performance data, constraining banks’ ability to conduct comprehensive due diligence (Bertini et al. 2022; Gao & Souza 2022; Beyer et al. 2024). This informational deficit compels financial institutions to incorporate risk premiums into CER financing terms (Huang et al. 2023), disproportionately burdening capital-constrained SMSs.

To address the structural deficiencies of the BLCF model, the firm-led carbon finance (FLCF) model has emerged. This model leverages the resource integration capabilities and information hub role of supply chain core firms to establish vertical collaboration mechanisms (Song et al. 2016; Song et al. 2018), thereby resolving the financing challenges faced by SMSs in implementing CER initiatives (Dou et al. 2015). For example, Walmart requires key suppliers to connect to its Sustainability Hub platform to monitor product lifecycle carbon emissions (Neebe 2020). The platform algorithmically converts emission reductions into tradable carbon credits. Through a partnership with J.P. Morgan, Walmart links suppliers’ carbon performance to loan interest rates, with top performers qualifying for preferential rates. Similarly, State Grid Corporation of China, the world’s largest utility company, employs an analogous mechanism (Xu et al. 2023). For instance, it requires its suppliers to join the Yingda Carbon Management Platform, which digitally integrates supply chain carbon emission data to generate standardized carbon credits. Additionally, it collaborates with Bank of Communications to develop targeted financial instruments.

However, three critical research gaps persist in the extant literature. First, few studies have empirically validated the BLCF model’s long-term efficacy in promoting CER among SMSs, specifically its heterogeneous impacts on firms with varying emission reduction capabilities. Second, how core firms implement their critical functions in the FLCF model remains underexplored. Third, limited data availability constrains cross-model comparisons of BLCF/FLCF mechanisms and the examination of contextual moderators.

To address these research gaps, this study proposes a mixed-methods framework that integrates semi-structured interviews (qualitative) with reinforcement learning modeling (quantitative). The research identifies three critical roles of core firms within carbon finance ecosystems and systematically compares the effectiveness of two carbon finance models in promoting CER among SMSs. Theoretically, this study builds upon supply chain stewardship theory to innovatively propose that core firms perform three intermediary roles (credit, service, and value intermediaries) in the FLCF model. This theoretical advancement clarifies how core firms strategically bridge financial and environmental governance to facilitate supply chain decarbonization. Methodologically, the study pioneers a mixed-methods approach. The research design begins with semi-structured interviews to explore the operational mechanisms of the FLCF model, extracting real-world behavioral features (such as decision motives and incentive mechanisms). These qualitative insights are then operationalized into reinforcement learning parameters (including action spaces and reward functions) to construct computational models. Finally, multiagent simulations are employed to quantitatively validate the decarbonization efficacy of the two finance models. This approach establishes a rigorous, closed-loop validation process that progresses from qualitative theory-building to quantitative verification. By synergizing qualitative depth with computational rigor, this framework not only strengthens the robustness of conclusions but also enhances the generalizability of interview-derived theories through algorithmic replication. The methodology offers a transferable paradigm for interdisciplinary studies on sustainable finance and supply chain governance.

This paper is structured as follows: The “Literature review” section reviews the relevant literature. The “Research methodology” section employs semi-structured interviews to analyze the critical role of core firms and extract real-world behavioral features for the BLCF/FLCF model. The “The semi-structured interviews conducted with the state grid and Yingda” section operationalizes these features into reinforcement learning parameters to develop data-driven computational models. The “Verification experiments and extended analysis based on reinforcement learning” section evaluates the empirical results. The conclusion synthesizes key findings and practical insights.

Literature review

Challenges of carbon emission reductions for SMSs

The most critical barrier to low-carbon transitions for SMSs globally is their insufficient carbon accounting capabilities, with 48.3% of Chinese SMSs exhibiting significant gaps in this area (PwC 2023). Effective carbon accounting requires accurate measurement of Scope 1 (direct emissions), Scope 2 (indirect energy-related emissions), and Scope 3 (value chain emissions) (Hettler & Graf-Vlachy 2024), which depends on robust digital infrastructure including IoT-enabled energy monitoring systems, integrated enterprise resource planning platforms for production digitization (Ma et al. 2023), and specialized carbon management tools like SAP Carbon Footprint Analytics. However, 62.6% of Chinese SMEs remain in the early stage of digital transformation (36Kr, Lenovo 2024), lacking both the IT infrastructure and technical expertise needed to implement such comprehensive carbon accounting systems. This digital readiness gap creates a fundamental obstacle to their low-carbon transition efforts.

Economic constraints present even more substantial barriers to carbon accounting adoption. First, high upfront costs create financial barriers. Empirical studies show that establishing carbon accounting systems requires $20,000–$40,000 (OECD 2022), straining SMSs with typical profit margins below 5%. Second, green financing remains prohibitively expensive. SMSs face interest rates 1.5 percentage points higher than large firms (Udeagha & Muchapondwa 2023), due to risk premiums from lenders’ limited CER project assessment capacity (Song et al. 2024). Third, market risks undermine the incentives for SMSs to adopt low-carbon transitions. 55.5% of SMSs cite significant market uncertainties (Boston Consulting Group 2023), as low-carbon products rarely achieve competitive price premiums, weakening short-term economic viability.

The role of banks in carbon finance

While banks are expected to play a pivotal role in overcoming these structural and financial barriers to carbon finance, the demonstrated effectiveness of BLCF remains empirically inconclusive. Developed economy studies show positive effects (Alsaifi et al. 2020; Gidage & Bhide 2025), with Camilleri (2015) demonstrating successful emission reductions through preferential interest rates in Europe, facilitated by standardized carbon accounting (Kaur et al. 2022; Schaltegger & Csutora 2012) and transparent disclosure mechanisms (Augoye et al. 2024; Gidage & Bhide 2025). Conversely, research in developing Asian economies (e.g., Indonesia, China, Bangladesh) reveals increased financing costs for SMSs under BLCF (Wasan et al. 2021; Banga 2019; Gidage et al. 2024), undermining their CER incentives.

Two main factors restrict the BLCF’s CER effect on SMSs in developing countries. First, banks often distrust SMSs’ carbon data due to the lack of standardized carbon management systems, making verification costly (D’Apolito et al. 2024; Gidage et al. 2024). Although policy-driven preferential rates exist (Yu & Rehman Khan 2022), hidden costs due to weak carbon accounting and collateral shortages reduce SMSs’ CER motivation. Second, banks face difficulties evaluating SMSs’ emission reduction potential, especially for innovative startups (Wattanapruttipaisan 2003). Fragmented green certifications and inconsistent carbon accounting methods (Song et al. 2016; Song et al. 2018; Zhang et al. 2022a) lead to higher financing costs, creating an “institutional premium” that discourages SMSs' participation (Oyewole et al. 2024).

These barriers significantly lower SMSs’ acceptance of BLCF. SMSs in developing countries often face acute short-term financial constraints (Chiappini et al. 2022), whereas low-carbon transitions require long-term investment alongside immediate cash flow commitments. When technology payback periods are misaligned with debt repayment schedules, temporary financial strain directly deters participation (Ramasubbu & Kemerer 2021). Additionally, transitioning SMSs confront dual uncertainties: upfront costs for technological upgrades and market risks for green products (Shahin et al. 2024). If expected returns fail to offset transition costs, BLCF may worsen financial fragility rather than improve long-term performance (Song et al. 2016). Consequently, SMSs adopt a cautious stance, creating broader market-level obstacles to carbon finance adoption.

Ethical supply chain stewardship and core firms

Given these limitations of traditional BLCF, alternative models leveraging digital supply chain networks have emerged as a potential solution. In the low-carbon transition of global supply chains, the FLCF model restructures the carbon finance ecosystem through digital innovation. Its core mechanism involves banks utilizing digital carbon platforms established by core firms (Feng & He 2020; Pien 2020), integrating IoT, blockchain, and big data (Zhang et al. 2022b; Li et al. 2021) to enable real-time monitoring and verification of supply chain emissions (Kathuria 2007). Through dynamic environmental performance evaluation, it converts SMSs’ CER efforts into tradable carbon credits (Eyo-Udo et al. 2024), while data-driven accounting facilitates differentiated green financial products.

Empirical evidence shows carbon information platforms reduce information asymmetry across supply chains and between banks and firms via transparent data sharing (Hong & Qin 2024), reducing SMSs’ carbon financing cost premium caused by BLCF information silos. However, the widespread adoption of this model encounters significant barriers. Research by Kadefors et al. (2021) finds that high sunk costs of carbon reduction infrastructure (LaBelle 2012) create cost–benefit uncertainties for core firms promoting CER among SMSs, weakening their motivation and increasing “greenwashing” risks (Xing et al. 2021). Such practices undermine the model’s environmental efficacy.

Beyond these implementation challenges, current FLCF research reveals deeper theoretical limitations in understanding supply chain decarbonization drivers. Existing studies primarily highlight external drivers, chiefly regulatory pressures like the EU’s Carbon Border Adjustment Mechanism carbon tariffs (Bellora & Fontagné 2023) and the U.S. Clean Competition Act’s carbon cost regulations (Li et al. 2024), alongside stakeholder pressures such as institutional investors’ ESG focus (Cao et al. 2023; Bhide 2025) and multinational firms’ sustainability standards (Park & Jang 2021; Gidage et al. 2024). However, this external-pressure paradigm fails to account for the critical role of internal governance mechanisms within core firms.

Against the backdrop of global supply chain decarbonization, core firms are undergoing a paradigm shift in governance logic (Rosenbloom 2017), evolving from traditional economic agents into supply chain stewards. Conventional studies frame core firms as economic agents, primarily focused on cost control, profit maximization, and competitive advantage (Aßländer et al. 2016). Under this perspective, firms often adopt passive or limited intervention strategies toward decarbonization across supply chains. However, with accelerating global carbon neutrality efforts, supply chain emissions face mounting constraints from policy regulations (Xia et al. 2018), market mechanisms (Villena & Dhanorkar 2020), and investor preferences (Zhang & Yousaf 2020). Consequently, individual firms’ carbon reduction efforts prove insufficient for achieving systemic decarbonization.

Emerging research suggests a fundamental shift in lead firms’ governance logic (Caldwell et al. 2008; Davis et al. 1997), which is manifested in two key dimensions: (1) Governance reorientation: From passive compliance to active leadership, core firms no longer merely respond to external pressures (e.g., carbon tariffs, environmental, social, and governance regulations) but proactively develop strategic frameworks for supply chain decarbonization, driving collaborative CER across the value chain (Huang et al. 2017). (2) Value transformation: From individual gains to systemic value, core firms extend beyond self-interest by establishing information-sharing platforms, optimizing relational governance, and integrating value creation pathways to support SMSs in low-carbon transitions (Oyewole et al. 2024), thereby enhancing the sustainability competitiveness of entire supply networks.

Comparative analysis of literature

Through a systematic review and comparative analysis of existing literature (as shown in Table 1), this study identifies the following research gaps: (1) Although studies on the BLCF model have verified its impact on CER and economic performance in SMSs using multinational firm samples, they still exhibit two notable limitations: First, existing research lacks long-term dynamic tracking of heterogeneous SMSs. Second, there is insufficient analysis of the transmission effects of different carbon finance models on the economic performance of core firms, particularly with respect to empirical evidence from the Chinese context. (2) While theoretical studies on the FLCF model have demonstrated its advantages in alleviating information asymmetry, reducing financing costs for SMSs, and promoting CER, its effectiveness still lacks firm-level empirical support due to limited access to microdata. Moreover, existing single-case studies on the operational mechanisms of the FLCF model are constrained by researchers’ subjective biases and case-specific limitations, posing significant challenges to the generalizability of their findings. (3) Current literature fails to adequately explain the dominant role of core firms in carbon finance models and does not uncover their internal mechanisms for achieving supply chain collaborative management through financial governance. (4) Despite extensive independent research on the two carbon finance models, comparative studies remain insufficient, particularly in examining their differential effects on long-term CER and the economic performance of supply chain enterprises. To address these gaps, this study innovatively adopts a mixed-methods approach, combining semi-structured interviews with firm-level data and employing reinforcement learning algorithms for cross-validation.

Table 1 Comparison between BLCF and FLCF models.

Research methodology

Research design: mixed-methods approach

This study adopts an explanatory sequential mixed-methods design. The qualitative phase explores key governance mechanisms through analyses of semi-structured interviews, while the quantitative phase tests and generalizes these findings via reinforcement learning simulations (see Fig. 1).

Fig. 1 The explanatory sequential mixed-methods design.

Qualitative research design

Grounded in supply chain stewardship theory, the research team conducted semi-structured interviews with senior executives from State Grid and its supply chain network, employing narrative analysis to examine critical decision-making processes in carbon finance practices. Following Yin’s (2009) methodology, this approach systematically captured behavioral patterns and decision influences in CER initiatives. While the qualitative methods provided rich exploratory insights, they inevitably carried certain methodological constraints (Prahalad & Bettis 1986; Afuah 2000). Researchers’ theoretical presuppositions risked introducing interpretive biases, while the single-case study design posed challenges for broader generalization (Siggelkow 2007). Perhaps most significantly, the qualitative approach alone could not adequately quantify the differential impacts of various carbon finance instruments or establish clear causal mechanisms, limitations that necessitated complementary quantitative analysis.

Quantitative validation

This study employs a multiagent Deep Q Network algorithm to address supply chain decision-making by formalizing the parameters into a Markov Decision Process. Enhanced with experience replay and target network optimization mechanisms, the algorithm demonstrates stable learning performance in complex supply chain environments (Chen et al. 2021; Shakya et al. 2023). The proposed methodology systematically addresses three key limitations of qualitative analysis. First, it resolves causal inference ambiguities through controlled variable experiments that isolate critical parameters, thereby enabling precise quantification of economic impacts. Second, the multiagent simulation framework dynamically models long-term interactions among heterogeneous firms with varying CER capabilities, overcoming the neglect of organizational diversity inherent in qualitative approaches. Third, the framework facilitates a systematic comparative analysis of distinct carbon financial mechanisms, providing insights into their relative efficacy.
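The two stabilizing mechanisms named above, experience replay and a target network, can be illustrated with a minimal sketch. It is not the study's implementation: a lookup table stands in for the deep network so that the example stays self-contained, and the names ReplayBuffer, td_target, sync_target, and train_step, together with the discount factor and learning rate, are assumptions introduced purely for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the temporal correlation between consecutive steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, gamma, done):
    """Bellman target r + gamma * max_a' Q_target(s', a'); no bootstrap at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def sync_target(online_q):
    """Copy the online estimates into a frozen target table."""
    return {state: list(values) for state, values in online_q.items()}


def train_step(online_q, target_q, batch, gamma=0.9, lr=0.1):
    """Move each sampled Q(s, a) towards its target computed from the frozen table."""
    for state, action, reward, next_state, done in batch:
        target = td_target(reward, target_q[next_state], gamma, done)
        online_q[state][action] += lr * (target - online_q[state][action])
```

In a full deep Q network the tables would be replaced by neural networks and train_step by a gradient update, but the roles of the buffer and of the periodically synchronized target remain the same.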

Qualitative component

Case selection and research questions

This study selects State Grid Corporation of China (hereinafter “State Grid”) and its subsidiary Yingda Carbon Asset Management Company (hereinafter “Yingda”) as representative cases of the FLCF model, based on three key considerations:

(1) Industry leadership and scale advantages. As the world’s largest utility, State Grid operates a transmission network spanning 1.2 million kilometers, which is 16 times larger than Europe’s Terna and 14 times the size of the U.S. PJM grid. This network transmits more than 80% of China’s electricity. Given that the power sector accounts for 41% of China’s energy combustion emissions, it represents a crucial decarbonization player. Yingda serves as a specialized carbon finance platform that complements State Grid’s advantages, making this an ideal FLCF case study.

(2) Typical challenges in supply chain decarbonization. Approximately 90% of State Grid’s 50,000 suppliers are SMSs, which typically face challenges such as limited scale, weak creditworthiness, and financial instability (Carbó-Valverde et al. 2009; Wellalage et al. 2019). These constraints result in elevated green financing costs and insufficient investment in emission reduction technologies (Iannamorelli et al. 2024), exemplifying the global “SMSs financing barrier” in supply chain decarbonization (Jun & Ran 2024).

(3) Innovative demonstration model. The carbon finance solutions co-developed by State Grid, Yingda, and financial institutions adopt a tripartite collaborative model where core firms take the lead, financial institutions provide empowerment, and SMSs participate. This model aligns with international practices such as Walmart’s Gigaton PPA Program and Apple’s Supplier Clean Energy Program. Such firm-led innovations significantly reduce SMSs’ financing costs and establish scalable decarbonization pathways, offering a replicable Chinese solution for global supply chain low-carbon transitions.

These characteristics make the case ideally suited to address our core research questions: RQ1: What specific governance functions do core firms perform in enabling the FLCF model? RQ2: Does the FLCF model effectively promote CER among SMSs, and through what pathways?

Data collection

The research team conducted a six-year longitudinal study (2018 to 2024) to evaluate State Grid’s carbon finance initiatives. The study employed a stratified sampling framework encompassing three stakeholder tiers: strategic decision-makers, business implementers, and ecosystem collaborators. All participants had at least 12 months of direct involvement in relevant projects. Data collection comprised three modalities: semi-structured executive interviews, open discussions with operational teams, and thematic focus groups. Prior to data collection, all participants received detailed explanations of research objectives and written assurances of privacy/confidentiality protections. Informed consent was obtained through a standardized protocol: respondents reviewed and signed documentation before audio recording commenced.

Interview sessions were administered predominantly in person at corporate facilities, with supplementary sessions conducted via encrypted videoconferencing platforms. Each session (60–180 min) involved 4–8 research team members and utilized a triangulated documentation system (synchronized audio recordings, real-time transcripts, and researcher field notes). To ensure data quality, we implemented: (1) 24-h post-interview ambiguity resolution, (2) follow-up interviews (mean = 2.3 sessions per participant), and (3) standardized transcription (198,000 words). Analytical rigor was ensured through theoretical sampling, longitudinal case tracking, and methodological triangulation, integrating policy documents, corporate disclosures, and nonverbal datasets. Complete participant demographics and session metadata are systematically tabulated in Table 2.

Table 2 Semi-structured interview record.

Interview guide

The interview design follows a “theory, practice, and methodology” tripartite framework to ensure research validity through three key steps:

First, in the theoretical mapping phase, we operationalize the three core functions of supply chain stewardship (credit intermediation, service intermediation, and value intermediation) into specific interview questions. The credit intermediation dimension examines data governance (e.g., “Architecture design and implementation of the 5E supply chain system” and “Support policies for SMSs digital transformation”). The service intermediation dimension explores carbon credit mechanisms (e.g., “Underlying logic and operational paradigm of carbon accounts”). The value intermediation dimension analyzes financial incentives (e.g., “Dynamic correlation between carbon ratings and financing costs”). This theoretically grounded approach ensures conceptual alignment between research questions and the theoretical framework.

Second, we employ a pyramidal interview framework consisting of three levels: foundational facts (e.g., “Core modules of digital platforms”), mechanism analysis (e.g., “Technical pathways for carbon credit scoring models”), and strategic impact (e.g., “Sustained effects of carbon finance on SMS emission reduction”). Each level incorporates 3–5 STAR-based follow-up questions, forming a complete questioning chain.

Finally, we design dynamic adjustment mechanisms for different respondents: strategic value analysis for decision-makers (e.g., “Synergies between carbon finance and low-carbon supply chain transformation”) and implementation pathways for executives (e.g., “Conversion algorithms from operational data to carbon credits”). Parameter-anchored questions (e.g., “Quantitative relationship between carbon ratings and financing costs”, “Cluster analysis of supplier behavioral characteristics”) yield essential parameters for reinforcement learning modeling.

Data analysis procedures

This study employed MAXQDA 20 software to conduct a five-stage qualitative data analysis comprising data compilation, deconstruction, reassembly, interpretation, and conclusion. Initially, interview and focus group recordings were transcribed verbatim. Subsequently, coding was applied to categorize textual data and identify emerging themes. Thereafter, hierarchical relationships between codes and categories were established to develop a thematic framework. In the next phase, theoretical interpretations were derived based on the established coding system. Finally, findings were validated through iterative analysis until theoretical saturation was achieved.

Quantitative component

This study employs a multiagent reinforcement learning framework to conduct a quantitative evaluation of both the BLCF and FLCF models. The analytical framework integrates two interconnected modules. The Supply Chain Modeling Module simulates multistakeholder interactions within the supply chain ecosystem, incorporating the core firm (State Grid as a representative case), SMSs, and financial institutions. Notably, the FLCF model implementation includes a specialized carbon information platform (Yingda Carbon Asset Management) for systematic CER performance assessment. The Deep Q Network Optimization Module utilizes deep Q network algorithms to optimize SMSs’ operational decision-making, facilitating dynamic CER strategy adaptation in response to both financial incentives and order allocation parameters.
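To make the division of labor between the two modules more concrete, the following toy environment sketches how a single SMS might experience the two finance models. It is a hedged illustration rather than the simulation used in this study: the class SupplierEnv, the three-level action space, the reward definition, and every numeric value are hypothetical, and order allocation by the core firm is omitted for brevity.

```python
from dataclasses import dataclass

# All numeric values below are illustrative assumptions, not parameters from the study.
BASE_RATE = 0.058             # annual loan rate without any carbon-linked discount
MAX_DISCOUNT = 0.005          # largest rate reduction a platform-verified SMS can earn
CER_LEVELS = (0.0, 0.5, 1.0)  # action space: share of a CER project the SMS undertakes


@dataclass
class SupplierEnv:
    """One SMS deciding each period how much CER effort to finance with a loan."""

    model: str                 # "BLCF" or "FLCF"
    loan: float = 100.0        # loan needed for a full CER project
    revenue: float = 20.0      # baseline operating profit per period
    carbon_score: float = 0.0  # verified CER record in [0, 1]

    def financing_rate(self):
        # Under FLCF the platform-verified score lowers the rate;
        # under BLCF the bank cannot observe it, so the rate stays flat.
        if self.model == "FLCF":
            return BASE_RATE - MAX_DISCOUNT * self.carbon_score
        return BASE_RATE

    def step(self, action):
        effort = CER_LEVELS[action]
        interest = self.loan * effort * self.financing_rate()
        reward = self.revenue - interest
        # Verified effort accumulates into the score used for the next period's rate.
        self.carbon_score = min(1.0, self.carbon_score + 0.25 * effort)
        return self.carbon_score, reward
```

A learning agent would call step repeatedly and receive the carbon score as its next state. Under these assumptions, sustained CER effort becomes progressively cheaper to finance in the FLCF setting, whereas the BLCF rate does not respond to it.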

The semi-structured interviews conducted with the state grid and Yingda

Based on supply chain stewardship theory, this section systematically deconstructs the tripartite intermediary roles (information, service, and value) of core firms in the FLCF model, thereby addressing RQ1 in full. It also elucidates the pathways and effectiveness of the FLCF model in promoting CER among SMSs (RQ2). Specifically: (1) it identifies core firms’ governance functions as information intermediaries in carbon data collection/verification, behavioral assetization, and credit accumulation; (2) analyzes their service intermediary role in driving CER commitments, establishing synergy mechanisms, and building ecosystems; and (3) explicates their value intermediary function in extending economic benefits and network value. The coded interview data (Tables 3–5) and operational details (Appendices A1 and A2) provide empirical foundations for subsequent reinforcement learning simulations.

Table 3 Coding and evidence display of core firms as information intermediaries.
Table 4 Coding and evidence display of core firms as service intermediaries.
Table 5 Coding and evidence display of core firms as value intermediaries.

Core firm as an information intermediary

The carbon credit mechanism is the key foundation enabling the FLCF model to drive CER in supply chain SMSs. Research shows that under the FLCF model, core firms (e.g., State Grid and its subsidiary Yingda) serve as information intermediaries, operating through three mechanisms: (1) CER data collection and verification, (2) assetization of CER behaviors, and (3) carbon credit accumulation support (see Table 3 for details). Over 80% of interviewees emphasized that carbon information platforms (e.g., Yingda) significantly lower SMSs’ green financing costs.

In carbon data collection and verification, the FLCF model outperforms the traditional BLCF model. State Grid utilizes its 5E digital platform (see Appendix A1 for architecture and details) to standardize monitoring and enable real-time carbon footprint tracking across SMSs’ lifecycle (procurement, production, manufacturing, and logistics). One interviewee noted: “State Grid’s real-time carbon data allow banks to objectively assess SMSs’ low-carbon transition.”

For assetization of CER behaviors, State Grid employs a multidimensional SMS carbon performance evaluation system (see Appendix A2 for indicators and details), converting CER behaviors into tradable carbon assets. The system evaluates six indicators: basic firm information, production and operation data, energy consumption levels, carbon management performance, environmental protection performance, and social credit records. This enables data-driven financing pricing.
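One simple way to picture such a multidimensional evaluation is a weighted average over the six indicator groups. The sketch below is only an assumption about how the aggregation could work: the weights, the 0-100 scale, and the function name carbon_performance_score are hypothetical, and the actual indicators and scoring rules are those documented in Appendix A2.

```python
# Hypothetical weights for the six indicator groups; they sum to 1.
WEIGHTS = {
    "basic_information": 0.10,
    "production_operation": 0.15,
    "energy_consumption": 0.25,
    "carbon_management": 0.25,
    "environmental_protection": 0.15,
    "social_credit": 0.10,
}


def carbon_performance_score(indicators):
    """Weighted average of indicator scores, each expected on a 0-100 scale."""
    missing = set(WEIGHTS) - set(indicators)
    if missing:
        raise ValueError(f"missing indicators: {sorted(missing)}")
    return sum(WEIGHTS[name] * indicators[name] for name in WEIGHTS)
```

A single composite number of this kind is what allows a lender to price financing on observed CER behavior rather than on collateral alone.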

In carbon credit accumulation support, State Grid develops a systematic carbon credit infrastructure and a scoring-based rule system, generating precise SMS credit profiles. This approach has proven effective: by 2023, State Grid’s supply chain SMSs secured $656.2 million in CER financing, representing a 217% annual increase.

Semi-structured interviews confirm the FLCF model’s institutional advantage over the BLCF model. While the BLCF model struggles with information asymmetry and CER performance validation, the FLCF model leverages core firms’ data-backed carbon assessment reports. This mechanism provides dual benefits: (1) It empowers SMSs with credible carbon certification, facilitating bank financing. (2) It equips banks with reliable decision-making tools, shifting credit evaluation from entity-based to behavior-based assessment while strengthening risk control.

Core firm as a service intermediary

The value created by resources stems not from the resources themselves but from management capabilities that are scarce, irreplaceable, and inimitable. As information intermediaries, core firms mitigate information asymmetry between banks and SMSs, subsequently evolving into service intermediaries that promote CER through dynamically matched carbon financial services. This is manifested in the construction of three core capabilities: CER commitment-driving capability, CER synergy-connecting capability, and CER ecosystem-building capability (see Table 4 for details).

The CER commitment-driving capability establishes a continuous behavioral shaping mechanism guided by a standardized system. Utilizing a carbon rating framework, the core firm classifies SMSs into five tiers (gold, silver, bronze, standard, and non-green) and provides tiered financing incentives. This creates a positive feedback loop wherein improved CER performance reduces financing costs for SMSs.
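The feedback loop between rating and financing cost can be sketched as a lookup from a carbon score to a tier and a loan rate. Only the five tier names come from the interviews; the score cut-offs, the discounts in basis points, the base rate, and the function name rate_for_score are hypothetical values chosen for illustration.

```python
# Tier names follow the text; score cut-offs and rate discounts are hypothetical.
TIERS = [  # (minimum score, tier, discount in basis points)
    (90, "gold", 50),
    (75, "silver", 40),
    (60, "bronze", 30),
    (40, "standard", 20),
    (0, "non-green", 0),
]


def rate_for_score(score, base_rate_bp=580):
    """Return (tier, loan rate in basis points) for a carbon score on a 0-100 scale."""
    for minimum, tier, discount_bp in TIERS:
        if score >= minimum:
            return tier, base_rate_bp - discount_bp
    raise ValueError("score must be non-negative")
```

Under these assumed values, an SMS that raises its score from 58 to 62 moves from the standard to the bronze tier and lowers its rate by 10 basis points, which is the incentive the tiered design is meant to create.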

The CER synergy-connecting capability facilitates collaborative CER actions across supply chain members. By leveraging production and transaction relationships and trust capital within the supply chain network, the core firm employs a “5E” digital system to collect real-time operational data from SMSs. This enables banks to integrate carbon ratings with financial metrics in credit evaluations, shifting from collateral-based assessments to carbon behavior-driven financing solutions.

The CER ecosystem-building capability integrates stakeholder value networks. For example, State Grid implements a dynamically updated “whitelist” system for SMSs, incorporating nonfinancial indicators (e.g., carbon credit ratings and historical order fulfillment records) into bank credit assessments. Whitelisted SMSs receive interest rate discounts (20–50 basis points) and incremental credit lines, substantially enhancing their CER motivation.

Ultimately, the core firm’s value creation stems from bidirectional empowerment: for SMSs, it enables precise carbon financial service matching via information advantages, while for banks, it reduces service costs and risks via supply chain network integration.

Core firm as a value intermediary

Leveraging the collaborative advantages of supply chain networks, the core firm effectively integrates all participants under the FLCF model, thereby extending both the economic value of CER and the collaborative potential of network relationships (see Table 5 for details).

In extending the economic value of CER, this model generates tangible economic benefits through improved operational efficiency and enhanced brand value. Beyond providing technical CER support to SMSs, it establishes a value conversion mechanism that transforms environmental benefits into economic gains. Yingda’s collaboration with media platforms to promote carbon-rated SMSs significantly boosted their market recognition, with one firm reporting a 15–20% sales growth through increased green procurement orders.

In expanding network relationship value, the model restructures supply chain networks around CER, creating value through two mechanisms: financial network expansion (which reduces information asymmetry between financial institutions and CER-active SMSs) and deeper strategic collaboration. Interviewed executives noted that joint CER planning with the core firm secured preferential financing while fostering technology-sharing mechanisms.

Table 5 analysis reveals the core firm’s dual role as a value intermediary: its carbon platform quantifies SMSs’ environmental contributions into economic value while aligning stakeholder interests to form a synergistic value network.

Reinforcement learning modeling approach

The quantitative analysis integrates two core components: (1) a supply chain modeling module that simulates BLCF/FLCF financing scenarios to analyze CER behavior impacts, and (2) a deep Q network-based optimization module that employs neural networks to derive optimal CER strategies (Oroojlooyjadid et al. 2022; Shakya et al. 2023). This combined approach enables data-driven decision-making for effective supply chain carbon reduction.

Supply chain modeling module

The BLCF multi-period supply chain system consists of three key actors: (1) a downstream core firm (e.g., State Grid), (2) multiple upstream SMSs with heterogeneous initial CER levels, and (3) a financial institution. In contrast, the FLCF model introduces a carbon information platform (e.g., Yingda) as an additional component to this basic structure.

The basic assumptions of the BLCF

The BLCF operational process can be conceptualized as four distinct stages: procurement, financing, production, and sales, as illustrated in Fig. 2.

  (1) Market Environment: Consumers exhibit heterogeneous environmental awareness levels that substantially influence their purchasing decisions according to SMSs’ CER performance (Frik and Mittone 2019). The minimum CER requirement for consumer \(j\) is defined as \({{cog}}_{j}\), representing their environmental awareness threshold. Each consumer \(j\) contributes one unit of potential demand per period. Let \({{PC}}_{j,t}\) denote consumer \(j\)’s purchase decision from the core firm in period \(t\), which is determined by comparing the group CER (\({E}_{t}\)) with \({{cog}}_{j}\): \({{PC}}_{j,t}=\begin{cases}1, & {E}_{t}\ge {{cog}}_{j}\\ 0, & {E}_{t} < {{cog}}_{j}\end{cases}\).

    The market comprises \(M\) consumers with aggregate demand \({Q}_{f,t}={\sum }_{j=1}^{M}{{PC}}_{j,t}\). We classify consumers into three distinct groups by environmental awareness: (1) non-environmentally conscious (\({\eta }_{1}\)), (2) moderately environmentally conscious (\({\eta }_{2}\)), and (3) highly environmentally conscious (\({\eta }_{3}\)), where \({\eta }_{1}+{\eta }_{2}+{\eta }_{3}=1\).

  (2) Procurement stage: (i) Order Allocation Mechanism: Operationally, the core firm determines SMSs’ order proportions based on both wholesale prices (\({w}_{i,t}\)) and carbon emissions (\({\theta }_{i,t}\)). For each SMS \(i\), the composite score is calculated as: \({{score}}_{i,t}=\alpha \frac{{p}_{0}-{w}_{i,t}}{{\sum }_{i=1}^{N}\left({p}_{0}-{w}_{i,t}\right)}+\beta \frac{{\theta }_{i,t}}{{\sum }_{i=1}^{N}{\theta }_{i,t}}\), where \(\alpha\) and \(\beta\) are price and emission weighting factors, respectively. The normalized score determines the allocation proportion \({\sigma }_{i,t}\).

    (ii) Demand Forecasting: The downstream core firm uses market analytics to predict period-\(t\) demand \({\widetilde{Q}}_{f,t}\). The aggregate CER performance \({E}_{t}\) is computed as the weighted average: \({E}_{t}={\sum }_{i=1}^{N}{\sigma }_{i,t}{\theta }_{i,t}\). Demand is then formulated using environmental awareness thresholds: \({\widetilde{Q}}_{f,t}=\begin{cases}\left\lfloor {\eta }_{1}M\right\rfloor , & {E}_{t}=0\\ \left\lfloor {\eta }_{1}M+{\eta }_{2}M\frac{{E}_{t}}{e}\right\rfloor , & 0 < {E}_{t}\le e\\ \left\lfloor {\eta }_{1}M+{\eta }_{2}M+{\eta }_{3}M\frac{{E}_{t}-e}{{\theta }_{0}-e}\right\rfloor , & e < {E}_{t}\le {\theta }_{0}\end{cases}\). Here, \({E}_{t}=0\) yields demand from non-environmentally conscious consumers only, while \(0 < {E}_{t}\le e\) and \(e < {E}_{t}\le {\theta }_{0}\) additionally attract moderately and highly environmentally conscious consumers, respectively.

    (iii) Order Quantification: Assuming a unit component-to-product ratio, the order quantity for SMS \(i\) in period \(t\) is \({q}_{i,t}={\sigma }_{i,t}{\widetilde{Q}}_{f,t}\).

  (3) Financing stage: We posit that all production costs for the supplier \({S}_{i}\) are financed through the BLCF model. Because the bank has limited access to reliable CER data (Wang et al. 2019), it applies a uniform elevated financing rate \({r}_{b}\) across all SMSs.

  (4) Production stage: Each SMS \(i\) generates baseline carbon emissions \({\theta }_{0}\) per component, reducible through CER investment (Liu and Tyagi 2017; Roberts et al. 1999). The total unit cost \({c}_{i,t}\) in period \(t\) is: \({c}_{i,t}={c}_{i}+{c}_{i,t}^{\prime }\), where \({c}_{i,t}^{\prime }=\frac{1}{2}k{\theta }_{i,t}^{2}\). Here, \({c}_{i}\) denotes fixed production cost, while \(k > 0\) represents the CER coefficient. Adopting a cost-plus pricing strategy with a fixed margin \({\mu }_{i}\), subject to a government price cap \({p}_{0}\), the wholesale price \({w}_{i,t}\) is: \({w}_{i,t}=\begin{cases}{c}_{i,t}+{\mu }_{i}, & \text{if } {c}_{i,t}+{\mu }_{i} < {p}_{0}\\ {p}_{0}, & \text{otherwise}\end{cases}\). The dual-condition pricing ensures regulatory compliance while preserving profit margins.

  (5) Sales stage: Upon revealing consumer demand \({Q}_{f,t}\), profits are determined as: \({\pi }_{i,t}^{s}={q}_{i,t}({p}_{i,t}-{c}_{i,t})\) and \({\pi }_{f,t}=\begin{cases}{\widetilde{Q}}_{f,t}{P}_{f}-{\sum }_{i=1}^{N}{p}_{i,t}{q}_{i,t}, & {\widetilde{Q}}_{f,t}\le {Q}_{f,t}\\ {Q}_{f,t}{P}_{f}-{\sum }_{i=1}^{N}{p}_{i,t}{q}_{i,t}, & {\widetilde{Q}}_{f,t} > {Q}_{f,t}\end{cases}\), where \(\lambda\) denotes the commodity residual value rate.
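The procurement and production steps of one BLCF period can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation code used in the study: the function names, the handling of zero denominators, and all numeric values in the usage example are assumptions. It treats \({\theta }_{i,t}\) as the CER level and assumes that the third demand branch scales \({\eta }_{3}\) by \(M\), as the first two branches do.

```python
import math


def unit_cost(c_fixed, theta, k):
    """c_{i,t} = c_i + (1/2) * k * theta_{i,t}^2."""
    return c_fixed + 0.5 * k * theta ** 2


def wholesale_price(cost, margin, p0):
    """Cost-plus price c + mu, capped at the price cap p0."""
    return min(cost + margin, p0)


def allocation_shares(prices, thetas, p0, alpha, beta):
    """Composite scores, normalized so the shares sigma_{i,t} sum to 1."""
    gaps = [p0 - w for w in prices]
    gap_sum, theta_sum = sum(gaps), sum(thetas)
    scores = []
    for gap, theta in zip(gaps, thetas):
        # Assumption: a zero denominator contributes a zero term.
        price_term = gap / gap_sum if gap_sum else 0.0
        cer_term = theta / theta_sum if theta_sum else 0.0
        scores.append(alpha * price_term + beta * cer_term)
    total = sum(scores)
    if total == 0:  # assumption: split evenly when no supplier scores
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def forecast_demand(E, M, eta1, eta2, eta3, e, theta0):
    """Piecewise forecast of period demand from the aggregate CER E_t."""
    if E <= 0:
        return math.floor(eta1 * M)
    if E <= e:
        return math.floor(eta1 * M + eta2 * M * E / e)
    E = min(E, theta0)  # the last branch is defined up to theta0
    return math.floor(eta1 * M + eta2 * M + eta3 * M * (E - e) / (theta0 - e))


def plan_period(c_fixed, margins, thetas, k, p0, alpha, beta, M, etas, e, theta0):
    """Prices, order shares, aggregate CER, forecast demand, and orders."""
    costs = [unit_cost(c, th, k) for c, th in zip(c_fixed, thetas)]
    prices = [wholesale_price(c, mu, p0) for c, mu in zip(costs, margins)]
    shares = allocation_shares(prices, thetas, p0, alpha, beta)
    E = sum(s * th for s, th in zip(shares, thetas))
    demand = forecast_demand(E, M, *etas, e, theta0)
    orders = [s * demand for s in shares]
    return prices, shares, E, demand, orders


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    prices, shares, E, demand, orders = plan_period(
        c_fixed=[10.0, 10.0, 10.0], margins=[2.0, 2.0, 2.0],
        thetas=[2.0, 20.0, 50.0], k=0.01, p0=100.0,
        alpha=0.6, beta=0.4, M=1000, etas=(0.5, 0.3, 0.2),
        e=50.0, theta0=100.0,
    )
    print(shares, E, demand, orders)
```

Because a higher \({\theta }_{i,t}\) raises the unit cost and hence the wholesale price, the price term and the CER term of the score pull in opposite directions, which is the trade-off the allocation weights \(\alpha\) and \(\beta\) control.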

Fig. 2 Schematic diagram of the BLCF process.

The basic assumptions of the FLCF

The FLCF financing process comprises four sequential stages: (1) procurement and carbon assessment, (2) financing, (3) production, and (4) sales. As depicted in Fig. 3, the production and sales stages are consistent with the BLCF framework.

  (1) Procurement stage: During order allocation, the core firm evaluates each SMS using three key parameters: the wholesale price \({w}_{i,t}\), the SMS’s current carbon emissions \({\theta }_{i,t}\), and historical CER actions \({\varepsilon }_{i,t}\). Following the BLCF approach, an allocation score is computed as: \({{score}}_{i,t}=\alpha \frac{{p}_{0}-{w}_{i,t}}{{\sum }_{i=1}^{N}\left({p}_{0}-{w}_{i,t}\right)}+\beta \frac{{\theta }_{i,t}}{{\sum }_{i=1}^{N}{\theta }_{i,t}}+\gamma \frac{{\varepsilon }_{i,t}}{{\sum }_{i=1}^{N}{\varepsilon }_{i,t}}\), where \(\alpha\), \(\beta\), and \(\gamma\) represent the normalized weights for price, emissions, and CER history, respectively, satisfying \(\alpha +\beta +\gamma =1\). The final allocation proportion \({\sigma }_{i,t}\) is determined through score normalization.

  (2) Carbon assessment stage: The carbon information platform assigns carbon scores based on historical CER performance, enabling SMSs to secure differentiated, lower lending rates. Simultaneously, each SMS pays carbon verification fees to the bank \(B\). Using real operational data from State Grid and Yingda, our model incorporates historical carbon emissions as an accumulated variable \({\varepsilon }_{i,t}\), defined as the sum of previous periods’ carbon score adjustments \(\Delta {\varepsilon }_{i,t}\), where \(\Delta {\varepsilon }_{i,t}=\xi\) if CER decreases in period \(t\), \(\Delta {\varepsilon }_{i,t}=-\xi\) if CER increases in period \(t\), and \(\Delta {\varepsilon }_{i,t}=0\) if CER remains unchanged.

    The carbon credit rating \({\delta }_{i,t}\) for each SMS combines current CER performance \({\theta }_{i,t}\) and historical CER behavior \({\varepsilon }_{i,t}\) through: \({\delta }_{i,t}=\varphi {\theta }_{i,t}+\rho {\varepsilon }_{i,t}\), with \(\varphi +\rho =1\), where \(\varphi\) and \(\rho\) represent the platform’s weighting of current and historical CER performance.

  (3) Financing stage: Under FLCF, banks offer tiered interest rates based on carbon credit ratings: \({r}_{i,t}={r}_{b}\) if \({\delta }_{i,t} < {\underline{\delta }}_{i,t}\); \({r}_{i,t}={r}_{f}\) if \({\delta }_{i,t} > {\bar{\delta }}_{i,t}\); otherwise, \({r}_{i,t}\) decreases linearly from \({r}_{b}\) to \({r}_{f}\) between the two thresholds.
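The carbon assessment and financing stages can be illustrated with a short Python sketch. It is a hedged reading of the rules above, not the platform’s actual scoring code: the function names and the threshold values in the usage example are hypothetical, the per-period adjustments are passed in directly as \(+\xi\), \(-\xi\), or 0, and the interpolation formula is one straightforward way to realize a rate that decreases linearly between the thresholds.

```python
def accumulated_history(adjustments):
    """epsilon_{i,t}: running sum of per-period adjustments (+xi, -xi, or 0)."""
    return sum(adjustments)


def carbon_rating(theta, epsilon, phi):
    """delta_{i,t} = phi * theta_{i,t} + rho * epsilon_{i,t}, with rho = 1 - phi."""
    return phi * theta + (1.0 - phi) * epsilon


def tiered_rate(delta, delta_low, delta_high, r_b, r_f):
    """Ceiling r_b below delta_low, floor r_f above delta_high, linear between.

    Assumes delta_high > delta_low.
    """
    if delta < delta_low:
        return r_b
    if delta > delta_high:
        return r_f
    frac = (delta - delta_low) / (delta_high - delta_low)
    return r_b - (r_b - r_f) * frac


if __name__ == "__main__":
    xi = 1.0  # hypothetical step size
    eps = accumulated_history([xi, xi, 0.0, -xi, xi])
    delta = carbon_rating(theta=20.0, epsilon=eps, phi=0.5)
    # Thresholds 5 and 25 are illustrative; r_b = 8% and r_f = 3.5% follow
    # the parameter values used in the profit comparison.
    print(eps, delta, tiered_rate(delta, 5.0, 25.0, r_b=0.08, r_f=0.035))
```

Under this reading, an SMS whose rating sits between the thresholds lowers its borrowing rate with every improvement in \({\delta }_{i,t}\), whereas under BLCF the rate stays at \({r}_{b}\) regardless of CER performance.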

Fig. 3 Schematic diagram of the FLCF process.

Deep Q network module

The deep Q network-based CER decision framework formalizes the interaction between SMSs and core firms through a tuple \((S,A,P,R)\). The joint state space \(S=\{s=({s}_{1},\ldots ,{s}_{N})\}\) captures real-time CER levels of \(N\) SMSs, while the action space \(A=\{a=({a}_{1},\ldots ,{a}_{N})\}\) defines emission adjustment options (increase/decrease/maintain) per sales cycle. Each joint action \(a\in A\) triggers state transitions \(s\to {s}^{\prime }\) governed by probability \(P({s}^{\prime }|s,a)\) and generates rewards \(R(s,a)\) proportional to core firms’ order volumes. The objective is to learn an optimal policy \({\pi }^{*}:S\to A\) that maximizes the discounted profit: \({\pi }^{*}=\arg {\max }_{\pi }{\sum }_{t=1}^{T}{\gamma }^{t}{E}_{\pi }[{R}_{t}^{\pi }]\), where \(\gamma\) here denotes the discount factor rather than the order-allocation weight.
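Two elements of this formulation are easy to make concrete: the size of the joint action space and the discounted objective. The sketch below is illustrative only; the names `ADJUSTMENTS`, `joint_actions`, and `discounted_return` are assumptions, and the reward sequence in the usage example is made up.

```python
from itertools import product

ADJUSTMENTS = ("increase", "decrease", "maintain")


def joint_actions(n_sms):
    """All joint actions a = (a_1, ..., a_N), one adjustment per SMS."""
    return list(product(ADJUSTMENTS, repeat=n_sms))


def discounted_return(rewards, gamma):
    """Sum over t = 1..T of gamma^t * R_t, matching the objective's indexing."""
    return sum(gamma ** t * r for t, r in enumerate(rewards, start=1))


if __name__ == "__main__":
    print(len(joint_actions(3)))                      # joint actions for N = 3
    print(discounted_return([1.0, 1.0, 1.0], 0.5))    # hypothetical rewards
```

With three options per SMS the joint action space has \({3}^{N}\) elements: 27 for \(N=3\) and 19,683 for \(N=9\), which is one reason a function approximator is preferred over a lookup table as the number of SMSs grows.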

The algorithm architecture employs dual deep Q networks: an online network for immediate Q-value estimation \(Q(s,a|\theta )\) and a target network \(Q(s,a|{\theta }^{*})\) with periodic synchronization through parameter updates \({\theta }^{*}\leftarrow \tau \theta +(1-\tau ){\theta }^{*}\), where \(\theta\) and \({\theta }^{*}\) here denote network parameters rather than CER levels. Experience replay buffers historical transitions \(({s}_{t},{\vec{a}}_{t},{r}_{t},{s}_{t+1})\) to break temporal correlations via randomized minibatch sampling. Action selection follows an epsilon-greedy strategy to balance exploitation of current knowledge and exploration of new CER decisions.
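The three mechanisms named here (experience replay, epsilon-greedy selection, and the soft target update) can be sketched with the Python standard library alone. This is a simplified stand-in, not the study’s implementation: network parameters are represented as a flat list of floats, and the class and function names are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s_t, a_t, r_t, s_{t+1}) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random minibatch, which breaks temporal correlation.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon):
    """Random action index with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def soft_update(target_params, online_params, tau):
    """theta* <- tau * theta + (1 - tau) * theta*, element by element."""
    return [tau * o + (1.0 - tau) * t
            for o, t in zip(online_params, target_params)]


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=1000)
    for step in range(10):
        buf.push(step, 0, 1.0, step + 1)
    print(buf.sample(4))
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1))
    print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5))
```

A small \(\tau\) makes the target network trail the online network slowly, which is the usual way this update is used to stabilize the bootstrapped targets.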

Decision cycles progress through three phases (as shown in Fig. 4): (1) Initialization: Q network parameters guide initial CER targets and sales predictions; (2) Execution: Real-time emission adjustments update operational states, with transition data logged for offline learning; (3) Optimization: Carbon finance mechanisms regulate CER performance while prioritized experience replay and target network updates drive Q function convergence through temporal difference minimization: \({Q}_{i}(s,a)\leftarrow {Q}_{i}(s,a)+{\alpha }_{i}\left[{r}_{i}+\beta {\max }_{{a}^{\prime }}{Q}_{i}^{\prime }({s}^{\prime },{a}^{\prime })-{Q}_{i}(s,a)\right]\). This dual timescale process ensures policy adaptability to dynamic market conditions while maintaining learning stability (see Appendix A3 for implementation details).
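The temporal difference update can be illustrated with a tabular stand-in. The study uses neural networks, so the dictionary-based Q table below is a deliberate simplification, and the function name and state labels are hypothetical. The sketch reads \({\alpha }_{i}\) as a learning rate and \(\beta\) as a discount factor in this update, and takes the bootstrap value from a separate target table \({Q}_{i}^{\prime }\).

```python
def td_update(q, q_target, state, action, reward, next_state, actions,
              learning_rate, discount):
    """Q(s,a) <- Q(s,a) + lr * [r + discount * max_a' Q'(s',a') - Q(s,a)]."""
    best_next = max(q_target.get((next_state, a), 0.0) for a in actions)
    current = q.get((state, action), 0.0)
    td_error = reward + discount * best_next - current
    q[(state, action)] = current + learning_rate * td_error
    return td_error


if __name__ == "__main__":
    actions = ["increase", "decrease", "maintain"]
    q = {}
    q_target = {("s1", "increase"): 2.0}  # hypothetical target estimates
    err = td_update(q, q_target, "s0", "increase", 1.0, "s1", actions,
                    learning_rate=0.5, discount=0.5)
    print(err, q)
```

In a deep Q network the same TD error is used as a regression loss on the online network’s output instead of being written into a table entry.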

Fig. 4 Diagrammatic representation of the deep Q network algorithm structure.

Verification experiments and extended analysis based on reinforcement learning

Using real-world data from State Grid and Yingda, we validate the research questions raised in the semi-structured interviews and further support the conclusions through sensitivity analyses.

We investigate SMS equilibrium behavior under the BLCF/FLCF model using numerical simulations. Three SMSs are initialized with their initial conditions (ICs) characterized by CER values randomly sampled from three distinct ranges: [0,5] (low type), [15,25] (medium type), and [45,55] (high type). For each range, we generate 1000 independent IC sets, with each set undergoing 1000 iterations to reach steady state. The results demonstrate consistent equilibrium patterns across all trials, confirming the robustness of our findings. The complete datasets and computer code supporting these findings are included in the supplementary materials.
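The initialization step can be sketched as follows. The sampling distribution is not stated, so uniform sampling within each range is an assumption, as are the names `RANGES` and `sample_initial_conditions` and the use of a seed for reproducibility.

```python
import random

# CER ranges for the three SMS types, as given in the text.
RANGES = {"low": (0.0, 5.0), "medium": (15.0, 25.0), "high": (45.0, 55.0)}


def sample_initial_conditions(n_sets, seed=None):
    """Draw n_sets IC sets, one initial CER value per SMS type.

    Assumption: values are drawn uniformly within each range.
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
        for _ in range(n_sets)
    ]


if __name__ == "__main__":
    ic_sets = sample_initial_conditions(1000, seed=0)
    totals = [sum(ic.values()) for ic in ic_sets]
    print(len(ic_sets), min(totals), max(totals))
```

Each IC set would then be passed to the training loop for the stated 1000 iterations, and the equilibrium values averaged across sets.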

Validation of the research questions

Our results are consistent with existing quantitative evidence on BLCF and qualitative studies on FLCF, confirming both models’ efficacy in promoting CER among SMSs. By classifying SMSs based on initial CER levels, we find that BLCF benefits low/medium-type SMSs but reduces high-type performance (Research finding 1), whereas FLCF consistently enhances CER across all tiers through carbon finance incentives (Research finding 2). Notably, while both models generate economic value across multitier supply chain networks, FLCF’s systemic approach yields more substantial and sustainable value amplification (Research finding 3).

CER of SMSs under the BLCF model

Research finding 1

Our study of 1000 ICs under the BLCF model reveals that SMSs exhibit significant heterogeneity in CER performance, yet ultimately converge toward an approximate equilibrium state. As illustrated in Fig. 5, the total CERs in both IC(1) and IC(2) scenarios increased from initial values of 71.8 and 72.7 to equilibrium levels of 102.9 and 103.6, respectively. Specifically, while low and medium-type SMSs exhibit sustained improvement trends, high-type SMSs show performance deterioration.

Fig. 5 Trends in CERs of three SMSs under BLCF.

From a perspective of cost and benefit, the relationship between carbon abatement investment costs and emission reductions displays strictly convex characteristics, with marginal abatement costs increasing monotonically with CER growth. Consequently, the carbon abatement utility function exhibits strictly concave properties. Notably, high-type SMSs encounter a superlinear increase in marginal costs. Under the BLCF model’s uniform financing pricing mechanism, the financing scale required by high-type SMSs results in a nonmonotonic decline in their cost–benefit ratio, creating a characteristic “emission reduction trap.”
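The convexity argument follows directly from the quadratic abatement cost \(\frac{1}{2}k{\theta }^{2}\) in the production stage: its derivative, the marginal abatement cost, is \(k\theta\), and its second derivative is \(k > 0\). The short sketch below evaluates both at the midpoints of the three SMS type ranges; the value of \(k\) is hypothetical and the function names are assumptions.

```python
def abatement_cost(theta, k):
    """Per-unit CER cost c' = (1/2) * k * theta^2 (convex in theta)."""
    return 0.5 * k * theta ** 2


def marginal_abatement_cost(theta, k):
    """Derivative of the cost with respect to theta: k * theta."""
    return k * theta


if __name__ == "__main__":
    k = 0.01  # hypothetical CER cost coefficient
    for sms_type, theta in [("low", 2.5), ("medium", 20.0), ("high", 50.0)]:
        print(sms_type, abatement_cost(theta, k),
              marginal_abatement_cost(theta, k))
```

Doubling \(\theta\) doubles the marginal cost and quadruples the total abatement cost, so a high-type SMS pays far more for an additional unit of CER than a low-type SMS does, while a uniform rate \({r}_{b}\) offers it no offsetting financing benefit.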

Further supply chain dynamic game analysis reveals that when an SMS’s CER approaches a critical threshold, its profit margin experiences dual compression. Although the core firm’s order quantity (\({\widetilde{Q}}_{f,t}\)) is positively correlated with the SMS’s CER level (\({\theta }_{i,t}\)), the concomitant increase in component wholesale prices (\({w}_{i,t}\)) negatively affects order quantities. This dynamic ultimately compels high-type SMSs to lower their emission reduction levels to maintain equilibrium.

CER of SMSs under the FLCF model

Research finding 2

Our analysis of 1000 distinct ICs under the FLCF model revealed a significant convergence pattern: SMSs consistently reached a stable CER equilibrium after 1000 iterations, irrespective of their initial CER levels. Notably, this equilibrium level consistently surpassed the initial CER values of even high-type SMSs. The empirical data demonstrate this phenomenon clearly: under IC(1) conditions, aggregate CERs increased from 65.7 to 208 at equilibrium, while under IC(2) conditions, they rose from 64.6 to 199 (as shown in Fig. 6).

Fig. 6 Trends in CERs of three SMSs under FLCF.

This counterintuitive phenomenon reveals that core firms under the FLCF model systematically optimize SMSs’ CER through their supply chain stewardship. Specifically, first, as information intermediaries, core firms establish digital platforms for carbon data governance. By implementing standardized data collection, multitier verification, and dynamic credit assessment, they transform fragmented CER behaviors into quantifiable credit assets, reducing information asymmetry while providing verifiable data for carbon financing decisions. Second, as service intermediaries, they develop an integrated “assessment, financing, and empowerment” framework. Through tiered incentives (e.g., carbon rating systems) and dynamic resource allocation (e.g., whitelists), they align financial resources with CER needs, creating a virtuous cycle of “CER improvement-financing cost reduction-investment growth.” Third, as value intermediaries, they create market-based mechanisms to convert environmental value into economic value. Institutional designs like carbon-linked procurement preferences transform CER performance into brand premiums and innovation benefits, motivating SMSs to develop low-carbon competitiveness proactively.

Profit comparison between BLCF and FLCF

To ensure comparability between BLCF and FLCF models, we maintain constant values for key parameters (\(N=3\), \(\alpha =0.6\), \(\beta +\lambda =0.4\), \(\gamma =0.2\), \({\eta }_{1}=0.5\), \({\eta }_{2}=0.3\), \({\eta }_{3}=0.2\), \({r}_{f}=3.5 \%\),\({r}_{b}=8 \%\)). Through randomized generation of three distinct initial CER value sets for SMSs, we conduct a comparative analysis of equilibrium states across both models. The comparative results are presented in Table 6.

Table 6 Comparative analysis of BLCF and FLCF.

Research finding 3

The research results demonstrate that when SMSs in the supply chain reach equilibrium states in CER, both FLCF and BLCF models enhance overall economic performance. However, the FLCF model demonstrates superior incentive efficacy compared to BLCF. As presented in Table 6, the FLCF model: (1) significantly improves environmental performance, with SMSs achieving over 50% higher total CER than under BLCF; (2) generates substantial economic benefits at the micro level, with SMSs achieving a 30% profit margin expansion and core firms seeing a synchronous 14% profit growth; and (3) enhances supply chain synergies, yielding an 18% improvement in overall supply chain profitability.

Extension of sensitivity analyses

To systematically validate the robustness of our findings and elucidate the operational mechanisms, this study extends beyond qualitative verification by conducting multidimensional analyses: First, we examine how fluctuations in bank interest rates, which are a core element of carbon finance, affect equilibrium outcomes (Research findings 4 & 5). Second, we investigate how order allocation preferences of core firms (Research findings 6 & 7) shape equilibrium results from an operational perspective. Finally, to further verify the reliability of our conclusions, we expand the heterogeneity of SMSs by increasing their numbers to six and nine, respectively (Research findings 8 & 9).

The impact of interest rate on SMSs’ CER decision-making under BLCF and FLCF

Under fixed parameters (\(N=3\), \(\alpha =0.7\), \(\beta =0.3\), \({\eta }_{1}=0.5\), \({\eta }_{2}=0.3\), \({\eta }_{3}=0.2\)), we evaluated BLCF’s CER performance across 5% to 9% interest rates (Table 7). Column 3 displays the mean initial CER values averaged across 1000 randomly generated IC scenarios. Column 5 reports the corresponding mean equilibrium CER values, obtained after performing 1000 training iterations for each IC scenario and then averaging the results.

Table 7 Impact of interest rate on SMSs’ CER decision-making under BLCF.

Research finding 4

(1) The equilibrium total CER values show insignificant differences across all tested interest rate levels in Column 5, indicating that interest rate adjustments under the BLCF framework have limited efficacy for influencing SMSs’ CER performance at equilibrium. (2) A comparative analysis of Columns 4 (initial total CER) and 5 (equilibrium total CER) reveals that the BLCF model effectively promotes CER among SMSs. These results provide robust evidence of the BLCF model’s efficacy, though further refinements are necessary to optimize its performance.

Under the FLCF model, this study maintains fixed core parameters (\(N=3\), \(\alpha =0.6\), \(\beta =0.2\), \(\gamma =0.2\), \({\eta }_{1}=0.5\), \({\eta }_{2}=0.3\), \({\eta }_{3}=0.2\)). Given capital cost constraints in banking practice, substantial adjustments to the interest rate floor (\({r}_{f}\) = 3.5%) are infeasible. Through controlled experiments systematically varying the ceiling rate (5%–9%), we analyze dynamic CER performance variations among SMSs, with comprehensive results presented in Table 8.

Table 8 Impact of interest rate on SMSs’ CER decision-making under FLCF.

Research finding 5

(1) Within the FLCF framework, banks demonstrate limited capacity to influence SMSs’ CER equilibrium through adjustments to financing rate ceilings.

(2) Column 5 analysis (average convergence iterations) demonstrates that more restrictive financing rate policies (with ceilings raised from 5% to 9%) significantly accelerate steady state attainment, with the fastest convergence observed at the 9% ceiling level.

This finding appears counterintuitive, as higher caps on financing rates are generally assumed to increase the financial burden on firms, thereby hindering their CER efforts. However, our results suggest that while elevated financing rate caps ostensibly raise borrowing costs for SMSs, they may also incentivize firms to accelerate CER initiatives. One plausible explanation is that stricter financing constraints motivate firms to actively pursue more efficient CER strategies. By reducing carbon emissions per unit, firms can lower financing costs more quickly, achieving an optimal balance between economic and environmental performance.
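One way to make this explanation concrete is to look at the slope of the tiered rate schedule. If the rate falls linearly from the ceiling \({r}_{b}\) to the floor \({r}_{f}\) between the two rating thresholds, then the rate reduction earned per unit of rating improvement is \(({r}_{b}-{r}_{f})/(\bar{\delta }-\underline{\delta })\), which grows with the ceiling. The sketch below is an illustration under that reading, not a result from the simulations; the function name and the threshold values are hypothetical.

```python
def rate_saving_per_rating_point(r_ceiling, r_floor, delta_low, delta_high):
    """Slope of the linear schedule: rate reduction per unit of rating gained."""
    return (r_ceiling - r_floor) / (delta_high - delta_low)


if __name__ == "__main__":
    r_floor = 0.035                     # floor rate r_f used in the experiments
    delta_low, delta_high = 5.0, 25.0   # hypothetical rating thresholds
    for r_ceiling in (0.05, 0.06, 0.07, 0.08, 0.09):
        slope = rate_saving_per_rating_point(
            r_ceiling, r_floor, delta_low, delta_high)
        print(f"ceiling {r_ceiling:.0%}: saving per rating point {slope:.5f}")
```

With the floor fixed, a higher ceiling widens the spread between the worst and best rates, so each improvement in the carbon credit rating is rewarded more strongly, which is consistent with faster movement toward equilibrium.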

The impact of core firm order preferences on SMS CER decision-making under BLCF and FLCF

On the procurement side, changes in SMSs’ CER efforts are analyzed by varying the order allocation preferences of core firms under the BLCF model, while holding other key parameters constant (\(N=3\), \({\eta }_{1}=0.5\), \({\eta }_{2}=0.3\), \({\eta }_{3}=0.2\)), as presented in Table 9.

Table 9 Impact of adjusting core firms’ order allocation weights (\(\alpha ,\beta\)) on SMSs’ CER decision-making under BLCF.

Research finding 6

(1) Situation 1: Moderate Preference for SMSs’ CERs. When core firms adjust their preferences for the CER levels of SMSs within a specific range (termed Situation 1), the impact on the stable equilibrium values of SMSs remains relatively minor. This observation is supported by the 1st–5th simulation results.

(2) Situation 2: Excessive Preference for SMSs’ CERs Beyond a Threshold. When core firms prioritize SMSs’ CERs beyond a certain threshold (Situation 2), SMSs with varying initial CER levels exhibit differing degrees of increase in their stable states (as demonstrated in the 6th–9th simulation results).

In Situation 2, where core firms place excessive emphasis on the CER levels of upstream SMSs, the following dynamics emerge: High-type SMSs gain a competitive advantage in procurement during the first sales cycle. Owing to their superior CER performance, they are more likely to secure additional orders in subsequent cycles. Although they incur higher production costs, the increased order volume compensates for this, ultimately yielding higher profits. Low and medium-type SMSs may attempt to enhance their CERs in later sales cycles to offset the initial revenue disadvantage. However, even after making such adjustments, they fail to close the gap with high-type SMSs.

Moreover, when core firms over-prioritize CER criteria in SMS procurement (Situation 2), the collective CER performance among SMSs may deteriorate relative to Situation 1. This decline in overall environmental standards could ultimately reduce consumer demand for end products, negatively impacting core firms’ profitability.

From a procurement perspective, under the FLCF model, while holding other critical parameters constant (\(N=3\), \({\eta }_{1}=0.5\), \({\eta }_{2}=0.3\), \({\eta }_{3}=0.2\), \({r}_{f}=3.5 \%\), \({r}_{b}=7 \%\)), Table 10 demonstrates how shifts in core firms’ order allocation preferences affect SMSs’ CER performance.

Table 10 Impact of adjusting core firms’ order allocation weights (\(\alpha ,\beta ,\gamma\)) on SMSs’ CER decision-making under FLCF.

Research finding 7

Comparative analysis of simulation trial pairs (2nd/5th, 3rd/6th, and 4th/7th) reveals that core firms can effectively promote SMSs to superior equilibrium states. This elevation is achieved by prioritizing SMSs’ CER performance trajectories over their current CER levels in procurement decisions. Notably, this strategic emphasis simultaneously enhances profitability for core firms.

The impact of SMS quantity expansion on CER decision-making under BLCF and FLCF frameworks

Research finding 8

Regardless of their initial CER levels, SMSs consistently converge toward uniform CER outcomes after multiple iterations under the BLCF model. This convergence remains stable even when the number of SMSs increases. In our extended analysis, we tested configurations with 6 and 9 SMSs, respectively, and observed consistent equilibrium outcomes, as demonstrated in Fig. 7.

Fig. 7 Trends in CERs for 6 and 9 SMSs under BLCF.

Research finding 9

Under the FLCF model, SMSs with diverse initial CER levels consistently achieve high equilibrium CERs after repeated iterations. This trend persists even as the number of SMSs within the FLCF model increases. Our findings were further validated by increasing the SMS count to 6 and 9, with simulation results detailed in Fig. 8.

Fig. 8 Trends in CERs for 6 and 9 SMSs under FLCF.

These results demonstrate that the convergence and stability of CER outcomes are robust across different initial conditions and structural scales, reinforcing the reliability and generalizability of our research findings.

Discussion

Main research conclusions

Existing quantitative studies have confirmed the sustained positive impact of the BLCF model on the carbon emissions of SMSs, while qualitative research has revealed the potential role of the FLCF model. However, how to quantitatively verify the effectiveness of the FLCF model in the absence of data and how to conduct a comparative analysis of different carbon finance models remain unresolved in prior research (Yu & Rehman Khan 2022; Eyo-Udo et al. 2024). To address these gaps, this study innovatively combines semi-structured interviews with reinforcement learning, leveraging real-world data from core firms to systematically explore the impact mechanisms and effectiveness of different carbon finance models. The main findings are as follows:

First, semi-structured interviews reveal that under the FLCF model, the core firm effectively promotes CER measures among SMSs through the synergistic roles of information intermediary, service intermediary, and value intermediary in supply chain management. This finding elucidates the core mechanism of core firms in the FLCF model.

Second, reinforcement learning simulations demonstrate that: (1) While the BLCF model improves the overall CER level of SMS, it leads to increased emissions from high-type SMS due to diminishing marginal returns; (2) In contrast, the FLCF model enhances CER across all SMS types, achieving a 50% higher total reduction than BLCF, demonstrating significant advantages.

Third, economic benefit analysis shows that the FLCF model creates a win-win outcome: SMS economic performance improves by 30%, core firms by 14%, and overall supply chain performance by 18%, validating its comprehensive value.

Fourth, sensitivity analyses indicate that bank interest rate fluctuations have a limited impact on total SMS CER, but under FLCF, moderately raising the interest rate ceiling helps SMS reach CER equilibrium faster.

Finally, tests on core firms’ order allocation preferences confirm that in the BLCF model, overemphasizing SMS CER performance negatively affects total reduction, whereas under FLCF, prioritizing SMS historical CER behavior more effectively enhances overall CER.

Managerial implication

This study provides several practical implications for management: First, in government policy-making, prioritizing the FLCF model is crucial. Since the FLCF model effectively improves both CER in SMSs and the overall supply chain performance, the government can: (1) Offer tax incentives and financial support to encourage core firms to establish carbon information-sharing platforms, and (2) strengthen carbon disclosure regulations, particularly by mandating listed companies and key firms to report supply chain carbon data, which helps standardize CER management for SMSs.

Second, core firms should enhance their carbon information platforms. Data show that when core firms consider SMSs’ past CER performance in orders, overall CER effects and economic performance improve. Thus, core firms should use carbon data to track SMSs’ progress and adjust supply chain carbon strategies for better results.

Third, banks should leverage carbon data to reduce information asymmetry. By analyzing data from core firms, they can design better carbon finance products for SMSs at different stages. In designing these products, slightly increasing interest rate flexibility can also encourage SMSs to reduce emissions faster.

Finally, for SMSs, both BLCF and FLCF models facilitate CER and economic gains. SMSs should actively join supply chain carbon finance systems, make full use of carbon finance tools, and improve their CER capabilities. Doing so enhances both environmental and market performance, supporting long-term competitiveness and sustainable development.

Conclusion

Summary of contributions

Theoretical contributions

This study develops a Supply Chain Stewardship theory framework for carbon finance and collaborative CER. Unlike previous fragmented ESG stewardship research (Dodd et al. 2024; Eyo-Udo et al. 2024), we systematically analyze carbon finance mechanisms. Results show the FLCF model’s core firm, through intermediation of information, service, and value, creates a unique long-term CER approach distinct from BLCF. Our findings fill a carbon finance behavior gap and reveal how resource integration and interest alignment simultaneously boost environmental and economic performance across SMS types.

Practical implications

This study makes three key practical advances. First, by comparing the dynamic CER effects of BLCF and FLCF, it empirically identifies their differential impacts. Notably, the FLCF model significantly enhances CER across all SMS types, whereas BLCF yields only marginal benefits for low- and medium-type SMSs. This contrast offers critical insights for formulating supply chain carbon neutrality strategies.

Second, the research offers a methodological innovation by quantifying the FLCF model. By developing a reinforcement-learning-based decision model, the study addresses the limitations of static analysis and enables sensitivity analysis of dynamic factors, including core firm order preferences and bank financing rates. This approach provides a precise simulation tool to support carbon finance decision-making.

Third, the study pioneers the integration of semi-structured interviews with reinforcement learning within a mixed-methods framework. This combined qualitative and quantitative approach retains theoretical depth while overcoming traditional limitations in data availability and long-term effect tracking, thus offering a replicable methodological paradigm for future research. In particular, the reinforcement learning algorithm captures dynamic learning effects among supply chain actors, thereby mitigating the inherent constraints of cross-sectional data analysis.

Limitations and future research

This study has several limitations that offer valuable avenues for future research:

First, the sample selection presents certain constraints. While this investigation concentrates on representative anchor firms in China’s power sector (e.g., State Grid and Yingda), the generalizability of our findings to other industries and national economic contexts requires further validation. This limitation stems from heterogeneity in industry characteristics (e.g., varying capital intensity and technology diffusion cycles) and in national market environments (e.g., differences in regulatory intensity, marketization levels, and energy structures). To strengthen external validity, subsequent studies could: (1) conduct cross-industry comparative analyses incorporating samples from manufacturing, service, and other sectors; and (2) supplement primary data with comprehensive secondary data analyses to evaluate the efficacy of carbon finance models more thoroughly across different contexts.

Second, methodological enhancements are warranted. Our research integrates semi-structured interviews with reinforcement learning, which helps bridge existing gaps in understanding how heterogeneous SMSs respond to different carbon finance models and enables systematic comparisons. However, inherent limitations of the deep Q network approach may introduce certain idealizations into the model assumptions, potentially reducing the accuracy of the results. Specifically, deep Q networks’ reliance on discrete action spaces may not fully capture the continuous nature of real-world carbon finance decision variables (e.g., nonlinear fluctuations in corporate carbon abatement costs), thereby limiting their capacity to model complex market dynamics comprehensively. Future methodological improvements could: (1) investigate more sophisticated reinforcement learning algorithms (e.g., Actor-Critic-based continuous control methods); and (2) perform rigorous robustness tests using real-world market data to enhance the reliability of the findings.
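
The discretization concern above can be illustrated with a minimal sketch. It is not the model used in this study. It assumes a hypothetical agent that must choose a financing rate from a finite, evenly spaced grid, and shows how a continuous target value is rounded to the nearest available choice. The function names, the rate range, and the grid sizes are illustrative assumptions only.

```python
# Hypothetical illustration (not the study's model): an agent restricted to a
# finite grid of choices can only approximate a continuous decision variable.

def make_choice_grid(low, high, n_choices):
    """Return n_choices evenly spaced values covering [low, high]."""
    step = (high - low) / (n_choices - 1)
    return [low + i * step for i in range(n_choices)]

def nearest_choice(grid, target):
    """Return the grid value closest to a continuous target."""
    return min(grid, key=lambda value: abs(value - target))

def worst_case_error(grid):
    """Largest possible rounding error: half the widest gap between neighbors."""
    return max(b - a for a, b in zip(grid, grid[1:])) / 2

if __name__ == "__main__":
    # Assumed rate range of 3% to 7%, chosen purely for illustration.
    target_rate = 0.0465
    for n_choices in (5, 41):
        grid = make_choice_grid(0.03, 0.07, n_choices)
        print(n_choices, nearest_choice(grid, target_rate), worst_case_error(grid))
```

A finer grid shrinks the rounding error but enlarges the set of choices the agent must evaluate, which is one reason continuous control methods may be preferable when the decision variable is genuinely continuous.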

Finally, several critical research questions merit deeper exploration. From an internal supply chain perspective, this study does not fully address: (1) how power asymmetries between SMSs and core firms might impact CER implementation; (2) the potential incentive effects of long-term contracts on low-carbon technology investments; or (3) the influence of heterogeneous consumer preferences for low-carbon products on market responses. Externally, the interplay between policy instruments (e.g., carbon tariffs) and market mechanisms (e.g., carbon trading) in shaping SMSs’ CER decisions requires further examination. Moreover, while our research focuses on BLCF and FLCF models, emerging paradigms such as platform-mediated carbon finance warrant investigation to expand the theoretical framework. Addressing these gaps would contribute to developing a more comprehensive analytical framework for supply chain carbon finance research.