Introduction

Recent studies highlight blockchain’s applicability across a wide array of domains, including financial innovation1, collaborative logistics in supply chains2, and decentralized digital forensics3, where secure and transparent record-keeping underpins processes ranging from automated payments to tamper-proof evidence management. In parallel, Khan and colleagues have demonstrated blockchain’s potential in enterprise IoT4, fog/edge computing5, and 5G networking6, focusing on secure data sharing and system optimization at scale. All of these solutions rest on the same core principles of decentralization, immutability, and consensus-driven validation, which underscores the versatility of the technology across diverse operational landscapes.

As these architectures increasingly hinge on the correctness and security of their underlying smart contracts, robust contract testing and verification become paramount. Smart contracts have transformed the blockchain ecosystem by enabling the automated, decentralized, and transparent execution of contractual agreements. These self-executing contracts enforce the terms and conditions encoded in them, eliminating the need for intermediaries and reducing transaction costs7. However, the growing reliance on smart contracts also amplifies the impact of vulnerabilities, which can cause severe financial losses and undermine trust in blockchain technology8.

Smart contract testing currently faces three major hurdles: i) insufficient automation, which requires extensive manual effort9; ii) limited test coverage in large or complex contract systems10; and iii) tools that target specific vulnerabilities but fail to address other security risks11. To tackle these issues, we introduce AGTS (Automated Generation of Test Suites), a fully automated smart contract testing suite designed for the Hyperledger Fabric platform. First, AGTS addresses the lack of automation by generating test code directly from smart contract requirements, significantly reducing manual workload. Second, it increases coverage by systematically accounting for diverse execution paths, including edge cases often overlooked by conventional methods. Third, by integrating techniques such as symbolic execution and fuzz testing, AGTS identifies a broader range of potential vulnerabilities than existing tools. Finally, validation in real deployment environments supports both the reliability and the efficiency of the development and testing processes.

AGTS is a framework into which platform-specific testing logic can be plugged. In this work, Hyperledger Fabric is selected to demonstrate the framework. Hyperledger Fabric12 is a prominent permissioned blockchain framework that supports the implementation of complex business processes across multiple organizations13. Its modular architecture, pluggable consensus protocols, and support for private channels and data privacy make it a preferred choice for enterprise applications. Fabric-specific features, such as the endorsement policy and the transaction lifecycle, call for specialized testing methodologies to identify and mitigate potential vulnerabilities14.

In Hyperledger Fabric, each transaction must be endorsed by a specific set of endorsing peers, which execute the chaincode and sign the results. This endorsement step, initiated by the Peer Chaincode Invoke (PCI) command, ensures that transactions are executed and endorsed before being passed to the ordering service. The ordering service then orders the transactions into blocks and distributes them consistently to the peers on the channel. Each peer validates the transactions against the endorsement policy before committing them to the ledger. As a result, all transactions are consistently ordered and validated, and the ledger remains consistent in a decentralized environment. To support testing across different blockchain platforms, e.g., Ethereum and Polkadot, AGTS encapsulates such platform-specific commands in a configurable AGTS Header. Porting AGTS to another platform therefore requires analyzing that platform’s invocation syntax and environment configuration requirements and then adjusting the AGTS Header accordingly.
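A minimal sketch of how such a header might be organized follows. It assumes a simple mapping from platform names to command templates. The names HEADERS and render_command are hypothetical and not part of AGTS. The Fabric templates are also simplified: a real invoke additionally needs orderer, TLS, and peer-address flags, which are omitted here. Arguments are assumed not to contain single quotes.

```python
import json
from string import Template

# Hypothetical AGTS header: platform -> action -> command template.
# Only Fabric is filled in; porting to another platform would mean adding an
# entry whose templates follow that platform's invocation syntax.
HEADERS = {
    "fabric": {
        "invoke": Template(
            "peer chaincode invoke -C $channel -n $chaincode "
            "-c '{\"function\":\"$function\",\"Args\":[$args]}'"
        ),
        "query": Template(
            "peer chaincode query -C $channel -n $chaincode "
            "-c '{\"function\":\"$function\",\"Args\":[$args]}'"
        ),
    },
}


def render_command(platform, action, channel, chaincode, function, args=()):
    """Fill the platform's template; each argument becomes a JSON string."""
    template = HEADERS[platform][action]
    quoted = ",".join(json.dumps(str(arg)) for arg in args)
    return template.substitute(
        channel=channel, chaincode=chaincode, function=function, args=quoted
    )


print(render_command("fabric", "invoke", "mychannel", "cocome", "makeNewSale", ["1"]))
```

Keeping the templates as data means the generator itself does not change when a platform is added. Only the header entry does.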

In our work, Docker manages the testing environment for smart contract deployment on the blockchain network, here Hyperledger Fabric. By providing an isolated and consistent setting, containers simplify deployment and testing, improving reliability and helping developers verify contract behavior before going live15. Docker commands also control the network so that each test begins from a clean state. For example, the generated test files include commands that bring down the network and restart it with a new channel. This eliminates residual artifacts from previous deployments and preserves consistency throughout the testing phase. These tasks are implemented in the AGTS Code Generator, so the generated scripts automatically invoke the necessary Docker commands. This maintains a standardized and isolated environment for executing smart contract tests.
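The clean-state behavior described above could be produced by a generator step along the following lines. This sketch assumes the test-network scripts from fabric-samples, that is, network.sh with its down and up createChannel -c subcommands. The function name clean_state_prologue and the directory layout are hypothetical.

```python
def clean_state_prologue(channel="mychannel", network_dir="test-network"):
    """Return the Bash lines a generated test script could start with.

    Assumes the fabric-samples test network: `network.sh down` removes the
    containers and artifacts of earlier runs, and `network.sh up createChannel`
    starts a fresh network and creates the given channel.
    """
    lines = [
        "#!/bin/bash",
        # `|| exit 1` stops the script if the directory is missing and keeps
        # ShellCheck from warning about an unchecked cd (SC2164)
        f'cd "{network_dir}" || exit 1',
        "./network.sh down",
        f"./network.sh up createChannel -c {channel}",
    ]
    return "\n".join(lines) + "\n"


print(clean_state_prologue(), end="")
```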

AGTS integrates into Continuous Integration (CI) pipelines on platforms such as GitHub Actions16. The generated test suites are then executed automatically on each code commit, so regressions or vulnerabilities are detected immediately. Beyond streamlining the testing process, CI integration reinforces reliability and maintainability by continuously validating the correctness of smart contracts throughout the development cycle.

In the development and deployment of smart contracts, testing and verification are critical phases for ensuring correctness and security. Both static and dynamic analyses17,18 are employed to detect vulnerabilities, catch potential runtime issues, and maintain compliance with best practices19,20. Once scripts are generated, they undergo quality assurance via ShellCheck21, a widely used static analysis tool that flags common pitfalls, deprecated syntax, and other mistakes in scripts written for sh, dash, ksh, and bash.

When ShellCheck reports no warnings, errors, or suggestions, the scripts adhere to the practices it checks and contain no flaws that it can detect. This outcome supports our current scripting approach. The validated scripts are then deployed on our continuous integration system, where each test runs in a consistent environment, further improving reliability and maintainability prior to production release.
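The pass criterion above could be automated as a CI gate. The sketch assumes ShellCheck's JSON output format (shellcheck -f json), which is a JSON array with one object per finding and a level field such as error, warning, info, or style. It also assumes the usual ShellCheck exit codes: 0 means clean, 1 means findings were reported, and higher values mean the files could not be checked. The function names are hypothetical.

```python
import json
import subprocess
from collections import Counter


def shellcheck_gate(report: str):
    """Return (passed, counts by level) for a `shellcheck -f json` report."""
    findings = json.loads(report)
    counts = Counter(finding["level"] for finding in findings)
    return not findings, dict(counts)


def check_script(path: str):
    """Run ShellCheck on one script and apply the gate to its JSON output."""
    result = subprocess.run(
        ["shellcheck", "-f", "json", path], capture_output=True, text=True
    )
    # Exit code 1 only means findings were reported; 2 or more means the
    # script could not be checked, which must not be mistaken for a pass.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return shellcheck_gate(result.stdout or "[]")
```

A CI job would call check_script on every generated file and fail the build when passed is False.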

The key points of AGTS are highlighted below.

  • Docker provides an isolated and consistent test environment for smart contract deployment on Hyperledger Fabric. Generated test scripts invoke Docker commands automatically, ensuring each test starts from a clean state and maintains network consistency.

  • Following script generation, ShellCheck21 checks script quality by verifying adherence to scripting best practices.

  • Furthermore, AGTS is seamlessly integrated into a Continuous Integration (CI) pipeline, automating test execution with each code commit to rapidly identify potential regressions or vulnerabilities.

Through these contributions, namely full automation, broad test coverage, advanced vulnerability detection techniques, and robust CI integration, AGTS enhances the reliability, efficiency, and maintainability of smart contract testing. It thereby promotes safer and wider adoption of blockchain technologies in practical enterprise scenarios. Compared with conventional approaches, AGTS delivers superior detection, drastically reduces manual intervention, and improves maintainability through CI integration and rigorous script verification.

Related work

The detection of vulnerabilities in smart contracts often employs various techniques, including symbolic execution, abstract interpretation, fuzz testing, mutation testing, and evolutionary or data-flow testing22. These methodologies have been widely used to ensure the security and correctness of smart contracts on blockchain platforms like Ethereum and Hyperledger.

To effectively identify and mitigate vulnerabilities in smart contracts, researchers have proposed various advanced testing and analysis methodologies, each characterized by distinct strengths and application scenarios.

Symbolic execution Symbolic execution has proven effective in identifying vulnerabilities by systematically exploring potential execution paths in smart contracts. Luu et al. used symbolic execution in their tool Oyente to assess the security of Ethereum smart contracts, flagging vulnerabilities in approximately 46% of 19,336 contracts analyzed. They confirmed the severity of these vulnerabilities through real-world incidents, notably the DAO attack, which resulted in a $60 million loss in June 20167. Nikolic et al. proposed Maian, which employs interprocedural symbolic analysis to detect critical vulnerabilities such as locked funds, leaked assets, and unintended contract termination. Their large-scale evaluation of nearly one million contracts revealed 34,200 vulnerable contracts and reproduced severe exploits, including the Parity bug that locked $200 million worth of Ether8.

Static analysis Static analysis methods examine smart contracts without executing them and are effective at detecting common, known logical vulnerabilities. Tikhomirov et al. introduced SmartCheck, an extensible tool that analyzed Solidity code through an XML-based intermediate representation and XPath pattern matching. It significantly streamlined vulnerability detection, although it had limitations with complex vulnerabilities23. Similarly, Feist et al. developed Slither, which enhanced smart contract analysis through data-flow and taint analysis on the SlithIR intermediate representation. Their evaluation demonstrated Slither’s speed, robustness, and accuracy in vulnerability detection, although users needed familiarity with SlithIR for comprehensive analyses24.

Fuzz testing Fuzz testing systematically identifies vulnerabilities by generating random input and observing contract behaviors during runtime. Chan et al. created ContractFuzzer, utilizing Ethereum’s ABI specifications and predefined oracles for efficient vulnerability detection. The approach detected over 459 vulnerabilities across 6,991 contracts, highlighting severe issues in critical contracts like DAO and Parity Wallet25.

Mutation testing Mutation testing introduces deliberate faults into code to systematically evaluate test-suite effectiveness. Wu et al. developed a comprehensive mutation testing framework specifically designed for Ethereum smart contracts. They introduced 15 novel mutation operators targeting blockchain-specific scenarios and conducted empirical evaluations on 26 contracts across four Ethereum decentralized applications (DApps). Their approach achieved a defect detection rate of 96.01%, significantly outperforming traditional coverage-based testing methods (55.68%) and confirming the effectiveness of mutation testing in enhancing smart contract reliability26. Additionally, Wang et al. proposed path-based test coverage criteria tailored to Ethereum’s transaction control-flow behavior, such as whole-transaction basis path sets and bounded transaction interactions. Their case study showed improved testing effectiveness compared with random or statement-based testing criteria, underscoring the significance of path-based strategies for smart contract verification27.

Data-flow and evolutionary algorithms Significant research has been dedicated to data-flow testing and evolutionary algorithms (EA) aimed at enhancing test-case generation and software validation. Data-flow testing emphasizes variable definition-use associations within programs, providing valuable insights into control-flow accuracy and robustness. Ghiduk et al. compared genetic algorithms (GA) with the harmony search algorithm (HSA) through an empirical analysis using the t-test and concluded that HSA significantly reduced test generation time without compromising quality28. Ji et al. proposed an enhanced genetic algorithm integrated with particle swarm optimization (PSO), which significantly surpassed baseline methods in test coverage, execution efficiency, and iteration count for smart contract data-flow testing29. Sheoran et al. used the artificial bee colony algorithm (ABC) to generate test cases targeting unclear definition-use paths, providing an innovative approach to previously unexplored data-flow issues30.
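To make definition-use coverage concrete, the sketch below computes which def-use pairs an execution path exercises. The resulting ratio is the kind of fitness value that GA-, PSO-, or ABC-based generators try to maximize. The node numbering and the toy withdraw-style example are hypothetical.

```python
def covered_du_pairs(path, defs, uses):
    """Return the (variable, def_node, use_node) pairs exercised by a path.

    path: sequence of node ids; defs/uses: node id -> set of variable names.
    A pair counts as covered when the use is reached from the definition
    without an intervening redefinition of the same variable.
    """
    covered = set()
    last_def = {}  # variable -> most recent defining node on this path
    for node in path:
        # Uses are processed first: in `balance = balance - amount` the
        # right-hand side reads the old value before it is redefined.
        for var in uses.get(node, ()):
            if var in last_def:
                covered.add((var, last_def[var], node))
        for var in defs.get(node, ()):
            last_def[var] = node
    return covered


def du_coverage(paths, defs, uses, targets):
    """Fraction of target def-use pairs covered by a set of paths."""
    covered = set().union(*(covered_du_pairs(p, defs, uses) for p in paths))
    return len(covered & targets) / len(targets)


# Toy contract: 1 reads balance from state, 2 checks balance >= amount,
# 3 executes balance = balance - amount, 4 returns balance.
DEFS = {1: {"balance"}, 3: {"balance"}}
USES = {2: {"balance"}, 3: {"balance"}, 4: {"balance"}}
TARGETS = {("balance", 1, 2), ("balance", 1, 3), ("balance", 1, 4), ("balance", 3, 4)}

print(du_coverage([[1, 2, 3, 4]], DEFS, USES, TARGETS))             # 0.75
print(du_coverage([[1, 2, 3, 4], [1, 2, 4]], DEFS, USES, TARGETS))  # 1.0
```

The second path, which skips the update, is needed to cover the pair from node 1 to node 4. That is exactly the kind of input a search-based generator is rewarded for finding.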

Further studies highlighted the effectiveness of evolutionary algorithms: Nayak and Mohapatra demonstrated PSO’s superiority over GA by achieving complete (100%) definition-use coverage and faster execution times31. Singla et al. developed GPSCA, a hybrid GA-PSO technique incorporating node-dominance concepts, which outperformed standalone GA or PSO methods in data-flow coverage32. Kumar et al. advanced evolutionary approaches by proposing a hybrid adaptive PSO-GA algorithm equipped with a specialized fitness function. This method significantly improved data-flow test generation outcomes compared with traditional GA, PSO, ant colony optimization (ACO), differential evolution (DE), and hybrid GA-PSO algorithms, as validated through extensive benchmarks and real-world software programs33.

Wang et al. addressed challenges in multi-objective evolutionary algorithms (MOEA), particularly those related to difficult-to-approximate (DtA) Pareto front (PF) boundaries. They introduced a novel test-problem generator that creates scenarios with controllable difficulty levels related to DtA PF boundaries. Through experiments with representative MOEAs (NSGA-II, SMS-EMOA, and MOEA/D-DRA), they identified limitations in existing algorithms and proposed a modified variant, MOEA/D-DRA-UT, which exhibited enhanced effectiveness. This highlighted the critical importance of rational computational resource allocation across different regions of the Pareto front34.

Despite these advancements, most research primarily targets traditional programming languages or Ethereum smart contracts. Zhang et al.’s ADF-GA was among the first evolutionary algorithm-based testing approaches explicitly tailored for Solidity-based Ethereum smart contracts, optimizing definition-use pair coverage through enhanced genetic algorithms35. Nevertheless, current methodologies typically lack comprehensive automation, sufficient coverage of complex business logic, and adequate support for enterprise blockchain platforms such as Hyperledger Fabric. These limitations underscore the need for automated, multi-metric testing frameworks tailored to enterprise-level blockchain environments, motivating our development of the Automated Generation of Test Suites (AGTS) framework.

Table 1 Comparison of representative smart contract testing tools/methods.

Table 1 summarizes the representative smart contract testing methods across multiple dimensions. Clearly, most existing tools predominantly focus on Ethereum contracts and exhibit limited automation, requiring substantial manual intervention, while simultaneously falling short in addressing complex business scenarios and multiple coverage criteria crucial for enterprise contexts.

Considering these limitations, we introduce the AGTS framework, explicitly designed to provide comprehensive automation, rigorous test coverage, and robust integration capabilities for enterprise-level blockchain platforms such as Hyperledger Fabric. This approach bridges the identified research gaps and advances the state of the art in smart contract testing methodologies.

Overview

Smart contract testing on enterprise blockchain platforms like Hyperledger Fabric encounters several challenges, including insufficient automation, unclear mapping from user requirements to test scenarios, and difficulty handling complex execution logic. To address these issues comprehensively, we propose an Automated Generation of Test Suites (AGTS) framework, explicitly designed to automate and streamline testing processes for smart contracts on Hyperledger Fabric.

The complete workflow of AGTS is depicted in Figure 1, outlining the integrated components and their interactions. The workflow consists of the following clearly defined stages.

Fig. 1 Overview of the automated generation of test suites (AGTS) workflow. AGTS translates user-defined smart contract requirements into structured test scripts via composition order generation and various automated testing methods. Smart contracts are independently developed via model-driven engineering36.

Prerequisites Gu and Ke37 present a novel type system based on coroutines that validates the correctness of requirement models and generates an accurate composition order. The method first derives coroutine types for each contract in the requirement model, including their pre-conditions and post-conditions. Next, a set of synthesis rules combines multiple coroutine types into a single coroutine type that models the final result of a sequence of operations. During this synthesis, the execution order of each coroutine is determined, forming the composition order. Finally, by hooking or listening to these synthesis rules, a potential execution order is formed for requirement validation, ensuring that the sequence diagrams in the requirement model match the generated execution order. The accuracy of the composition order generation method has been validated through the CoCoME case studies and unit tests, which show that the synthesized coroutine types are correct. The composition order produced by this method37 can therefore be used directly as a critical input to AGTS.

Requirement parsing The workflow begins with Requirement Parsing, where user-defined smart contract requirements, usually described in natural language documents, are systematically analyzed and converted into structured models. This parsing stage involves key pre-processing techniques such as tokenization, syntax normalization, and entity extraction to accurately identify and formalize the contract’s required functionalities, parameters, preconditions, and postconditions. These structured requirement models serve as the essential input for generating composition orders and subsequently constructing test cases. It is important to note that the actual implementation of smart contracts themselves is not within the scope of AGTS. Instead, these smart contracts are independently generated via an external model-driven engineering method36. AGTS strictly focuses on test suite automation based on structured requirement input.

In practice, the level of detail and logical clarity in the requirement file significantly impact the coverage and precision of the resulting test suite. A more comprehensively specified requirement, featuring well-defined logic lines, explicit constraints, and finer granularity, tends to yield broader and more accurate test coverage. For instance, when requirements explicitly delineate stepwise preconditions or functional parameters, the parser can generate composition orders that systematically address these conditions, thereby achieving higher functional coverage in subsequent test execution.

Currently, AGTS processes one requirement file at a time. If multiple requirements need to be parsed and tested in a single pass, a more sophisticated approach would be required, potentially involving iterative or loop-based logic structures to handle parallel or interdependent requirements. Such an extension would enable the framework to combine test scenarios from different documents in a coherent manner, ensuring that cross-cutting functionalities and shared dependencies between separate requirements are faithfully captured in the final test suite.

Composition order generation In the Composition Order Generation stage, structured requirement models are transformed into explicit, logically ordered sequences of smart contract operations—referred to as the composition order. This stage uses coroutines, a computational technique enabling flexible, sequential modeling of interdependent operations. Specifically, coroutines systematically derive valid sequences of smart contract function calls based on well-defined preconditions and postconditions from the structured models.

For demonstration, we use well-established benchmark cases such as CoCoME (Common Component Modeling Example), ATM, LibraryMS, and LoanPS. Taking CoCoME as a representative example, we model its realistic transactional workflow, including opening the store, initializing cash registers, processing item selections, and completing purchase transactions. The coroutines ensure that the resulting composition order adheres to real-world logical constraints, minimizing the risk of invalid sequences or overlooked interactions that could lead to vulnerabilities.
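The ordering idea can be illustrated with a deliberately simplified stand-in. Instead of synthesizing coroutine types as in Gu and Ke37, the sketch below repeatedly schedules an operation whose preconditions are already established and yields operations one at a time from a Python generator. The CoCoME operation names and the facts used as pre- and postconditions are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    pre: frozenset   # facts that must hold before the call
    post: frozenset  # facts that hold after the call


def make_op(name, pre=(), post=()):
    return Operation(name, frozenset(pre), frozenset(post))


def composition_order(operations, initial=frozenset()):
    """Yield operation names so that every precondition is met when reached."""
    state = set(initial)
    pending = list(operations)
    while pending:
        ready = [op for op in pending if op.pre <= state]
        if not ready:
            names = sorted(op.name for op in pending)
            raise ValueError(f"no valid order for: {names}")
        chosen = ready[0]  # among ready operations, keep the input order
        state |= chosen.post
        pending.remove(chosen)
        yield chosen.name


# Deliberately listed out of order, as they might appear in a requirement file.
OPS = [
    make_op("makeCashPayment", pre={"itemEntered"}, post={"salePaid"}),
    make_op("enterItem", pre={"saleActive"}, post={"itemEntered"}),
    make_op("makeNewSale", pre={"cashDeskOpen"}, post={"saleActive"}),
    make_op("openCashDesk", pre={"storeOpen"}, post={"cashDeskOpen"}),
    make_op("openStore", post={"storeOpen"}),
]

print(list(composition_order(OPS)))
# ['openStore', 'openCashDesk', 'makeNewSale', 'enterItem', 'makeCashPayment']
```

This sketch only ever adds facts, so it cannot model an operation that invalidates earlier state, such as ending a sale. It is meant only to convey how pre- and postconditions constrain the order, not to reproduce the type-system method.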

The comprehensive and systematic nature of this approach ensures that generated test cases robustly cover critical execution paths and edge cases, significantly enhancing the reliability and compliance of the resulting smart contract tests.

Automated test suite generation The Automated Test Suite Generation stage is the core module of the AGTS framework, responsible for automatically translating user-defined requirements and the derived composition order into executable test scripts specifically tailored to Hyperledger Fabric.

This module starts by performing detailed preprocessing of the input requirements. Natural language requirement documents undergo structured parsing, which involves tokenization, keyword identification, and format standardization. Specifically, AGTS leverages regular expressions and keyword-based searches to accurately extract essential information such as class names, function signatures, input parameters, preconditions, and postconditions from the structured requirement models.
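A sketch of this regular-expression extraction follows. It assumes requirement contracts written in a simplified OCL-like form: a `Contract Class::function(params) : Type {` header, precondition: and postcondition: sections, and a closing brace at the start of a line. The exact requirement syntax AGTS parses may differ. The pattern names and the output dictionary layout are hypothetical.

```python
import re

CONTRACT_RE = re.compile(
    r"Contract\s+(?P<cls>\w+)::(?P<func>\w+)\((?P<params>[^)]*)\)"
    r"\s*:\s*(?P<ret>\w+)\s*\{(?P<body>.*?)\n\}",
    re.S,
)
PARAM_RE = re.compile(r"(\w+)\s*:\s*(\w+)")  # "name : Type"
PRE_RE = re.compile(r"precondition:\s*(.*?)\s*(?:postcondition:|$)", re.S)
POST_RE = re.compile(r"postcondition:\s*(.*?)\s*$", re.S)


def parse_requirements(text):
    """Extract class, function, parameters and conditions from each contract."""
    functions = []
    for m in CONTRACT_RE.finditer(text):
        body = m["body"]
        pre = PRE_RE.search(body)
        post = POST_RE.search(body)
        functions.append({
            "class": m["cls"],
            "function": m["func"],
            "params": PARAM_RE.findall(m["params"]),
            "returns": m["ret"],
            # A missing section is treated as the trivially true condition.
            "pre": pre.group(1) if pre else "true",
            "post": post.group(1) if post else "true",
        })
    return functions


REQUIREMENTS = """
Contract CoCoMEProcessSale::enterItem(barcode : Integer, quantity : Integer) : Boolean {
    precondition:
        currentSale.isComplete = false
    postcondition:
        currentSaleLine.quantity = quantity
}
"""

print(parse_requirements(REQUIREMENTS))
```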

Once this structured data is obtained, AGTS constructs a detailed intermediate representation (IR), typically stored as structured JSON objects. The IR makes the extracted information explicit and easy to check, which keeps the generation of executable test scripts straightforward and correct.
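As an illustration of what such an IR might look like, the sketch below merges per-function records (parameters, pre- and postconditions) with a composition order into one JSON document containing one entry per test step. The field names are hypothetical, since the actual AGTS schema is not specified here.

```python
import json


def build_ir(contract, functions, order):
    """Combine extracted function records with a composition order into JSON.

    functions: dicts with keys "function", "params" (list of (name, type)),
    "pre" and "post"; order: function names in execution order.
    """
    by_name = {f["function"]: f for f in functions}
    steps = []
    for number, name in enumerate(order, start=1):
        record = by_name[name]  # raises KeyError if the order names an unknown function
        steps.append({
            "step": number,
            "function": name,
            "params": [{"name": p, "type": t} for p, t in record.get("params", [])],
            "precondition": record.get("pre", "true"),
            "postcondition": record.get("post", "true"),
        })
    return json.dumps({"contract": contract, "steps": steps}, indent=2)


FUNCTIONS = [
    {"function": "enterItem", "params": [("barcode", "Integer")],
     "pre": "currentSale.isComplete = false", "post": "true"},
    {"function": "makeNewSale", "params": [], "pre": "true",
     "post": "currentSale.isComplete = false"},
]
print(build_ir("CoCoME", FUNCTIONS, ["makeNewSale", "enterItem"]))
```

Because each step already carries its conditions, a script generator can emit the call and the corresponding pre- and postcondition checks from a single entry.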

Furthermore, AGTS adopts a hybrid analysis approach. Static analysis examines the smart contract’s structural integrity, identifying issues such as type mismatches, unused variables, and logical inconsistencies at compile time. Dynamic analysis employs techniques such as symbolic execution and fuzz testing to explore runtime behavior. Symbolic execution systematically explores diverse execution paths to achieve high path coverage, while fuzz testing exposes vulnerabilities by injecting randomized inputs.
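The fuzz-testing half of this hybrid approach can be sketched as follows. Random integers, mixed with boundary values, are fed to a function, and an oracle checks an invariant on each result. The sketch targets a hypothetical Python model of a LoanPS-style repayment function with a planted bug, not deployed chaincode. The names fuzz, repay, non_negative, and EDGE_VALUES are illustrative only.

```python
import random

# Boundary values are mixed in because edge cases are where bugs tend to hide.
EDGE_VALUES = [0, 1, -1, 2**31 - 1, -(2**31)]


def fuzz(target, oracle, arity, trials=1000, seed=0):
    """Call target with random arguments; return inputs that violate the oracle."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    failures = []
    for _ in range(trials):
        args = tuple(
            rng.choice(EDGE_VALUES) if rng.random() < 0.3
            else rng.randint(-10**6, 10**6)
            for _ in range(arity)
        )
        try:
            result = target(*args)
        except ValueError:
            continue  # input rejected by the function: acceptable behavior
        if not oracle(args, result):
            failures.append(args)
    return failures


def repay(balance, amount):
    """Hypothetical model of a repayment with a planted bug."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return balance - amount  # bug: repaying more than the balance is allowed


def non_negative(args, result):
    return result >= 0  # invariant: an outstanding balance never goes negative


failures = fuzz(repay, non_negative, arity=2)
print(len(failures), failures[:3])
```

Every reported input has an amount larger than the balance, which points directly at the missing check.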

The automated process is efficient: its runtime grows linearly with the number n of functions to test. Currently, each function is located by scanning the entire contract of size m, which gives a complexity of \(O(mn)\). This can be reduced to \(O\left( (n+m) \log m \right)\) by indexing the contract once and then looking up each function in the index.
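The difference between the current scan and an indexed lookup can be sketched as below. The sketch assumes Go chaincode, where contract functions are methods declared as func (receiver) Name(...). The regular expression and the helper names are hypothetical. A sorted index with binary search matches the \(O\left( (n+m) \log m \right)\) bound, whereas a hash map would give expected \(O(n+m)\).

```python
import bisect
import re

# Matches Go method declarations such as
#   func (s *SmartContract) MakeNewSale(ctx contractapi.TransactionContextInterface) error {
DEF_RE = re.compile(r"^\s*func\s+\([^)]*\)\s*(\w+)\s*\(")


def find_by_scanning(lines, names):
    """O(m*n): rescan the whole contract for every function under test."""
    found = {}
    for name in names:
        for number, line in enumerate(lines):
            match = DEF_RE.match(line)
            if match and match.group(1) == name:
                found[name] = number
                break
    return found


def find_by_index(lines, names):
    """O((n+m) log m): index the definitions once, then binary-search."""
    index = []
    for number, line in enumerate(lines):  # one O(m) pass
        match = DEF_RE.match(line)
        if match:
            index.append((match.group(1), number))
    index.sort()  # at most m entries, O(m log m)
    keys = [name for name, _ in index]
    found = {}
    for name in names:  # n lookups, O(log m) each
        pos = bisect.bisect_left(keys, name)
        if pos < len(keys) and keys[pos] == name:
            found[name] = index[pos][1]
    return found


CONTRACT = """package main

type SmartContract struct{}

func (s *SmartContract) OpenStore(ctx contractapi.TransactionContextInterface) error {
\treturn nil
}

func (s *SmartContract) MakeNewSale(ctx contractapi.TransactionContextInterface) error {
\treturn nil
}""".splitlines()

print(find_by_index(CONTRACT, ["MakeNewSale", "OpenStore", "Missing"]))
```

Both functions return the same zero-based line numbers. Only the cost of the lookups differs.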

Smart contract integration and execution In this stage, the test scripts generated by AGTS are deployed and executed against independently developed smart contracts within the Hyperledger Fabric environment. Before execution, a clean and isolated blockchain environment is established using Docker containers, ensuring consistency and reproducibility for each test run. The generated scripts contain detailed sequences of transactions aligned with the structured requirements and the composition order, providing realistic and thorough functional coverage.

Transactions are executed through standard Hyperledger Fabric interfaces, e.g., peer chaincode invocation commands, simulating real-world interactions across nodes. AGTS monitors the smart contract during execution, capturing transaction metadata such as transaction IDs, timestamps, validation status codes (e.g., VALID), and response payloads. This monitoring checks that contract behavior conforms to predefined business rules and helps identify deviations or anomalies at runtime promptly.
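One way to capture this metadata is to parse the peer CLI output. The sketch assumes the messages that peer chaincode invoke prints when run with --waitForEvent. These are a line containing "txid [...] committed with status (...)" and a line containing "Chaincode invoke successful. result: status:...". Their exact wording and log prefixes may vary across Fabric versions, so the patterns, the sample output, and the record layout are all assumptions. Timestamps are left out for the same reason.

```python
import re

# Assumed message formats (wording may differ between Fabric versions).
TX_RE = re.compile(
    r"txid \[(?P<txid>\w+)\] committed with status \((?P<status>\w+)\)"
)
RESULT_RE = re.compile(
    r"Chaincode invoke successful\. result: status:(?P<code>\d+)"
    r"(?: payload:(?P<payload>.*))?"
)


def parse_invoke_output(output):
    """Extract txid, validation code, status code and payload from CLI output."""
    record = {"txid": None, "validation": None, "status": None, "payload": None}
    for line in output.splitlines():
        tx = TX_RE.search(line)
        if tx:
            record["txid"] = tx["txid"]
            record["validation"] = tx["status"]
        result = RESULT_RE.search(line)
        if result:
            record["status"] = int(result["code"])
            record["payload"] = result["payload"]  # None when no payload is printed
    return record


def is_valid(record):
    return record["validation"] == "VALID" and record["status"] == 200


# Illustrative output only; real log prefixes include timestamps and levels.
SAMPLE = (
    "INFO [chaincodeCmd] ClientWait -> txid [4f1c9e] committed with status (VALID) at localhost:7051\n"
    "INFO [chaincodeCmd] chaincodeInvokeOrQuery -> Chaincode invoke successful. result: status:200\n"
)
print(parse_invoke_output(SAMPLE))
```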

The AGTS framework, implemented in C#, seamlessly integrates with enterprise-grade blockchain development environments, providing a robust, highly automated testing solution tailored specifically for Hyperledger Fabric. By systematically translating user requirements into executable test scripts and leveraging sophisticated static and dynamic analysis methodologies, AGTS significantly improves the accuracy, reliability, and security of enterprise-level smart contracts.

Looking forward, we aim to enhance AGTS by incorporating methodologies such as machine learning-driven test case generation, further extending its automation capabilities and adaptability. Future extensions also include compatibility with other blockchain platforms, such as Ethereum and Polkadot, and an improved user interface for broader accessibility and usability. Ultimately, AGTS provides a scalable foundation for rigorous and efficient smart contract testing, fostering trust and reliability across diverse blockchain applications.

Results

Overview of experimental setup

The effectiveness of the Automated Generation of Test Suites (AGTS) framework was evaluated in a blockchain-oriented experimental environment consisting of an Intel Xeon Silver 4210 CPU @ 2.20 GHz and 32 GB DDR4 RAM running Ubuntu 22.04 LTS. Key software components included Docker Engine v24.0.6, Docker Compose v2.23.0, Hyperledger Fabric v2.5.8, ShellCheck v0.8.0, and shunit2 v2.1.9. The automated code generation in AGTS was implemented in C# 8.0. Three representative smart contracts, CoCoME, LibraryMS, and LoanPS, were used to assess the performance and effectiveness of AGTS because they are well-known, standard case studies in model-driven engineering. Their requirement files explicitly state the precondition and postcondition of each function, which enables thorough and rigorous testing.

Automated test case generation

The AGTS framework successfully automated the generation of functional test cases based on user-defined requirements: 34 test cases for CoCoME, 32 for LibraryMS, and 28 for LoanPS. Key test scenarios for CoCoME involved store initialization and transaction workflows. LibraryMS tests focused on user-book interactions, while LoanPS tests emphasized loan lifecycle events.

For the CoCoME system, we generated a Bash script containing 34 test cases, covering essential operations such as creating stores, managing cash desks, adding products, and executing transactions. Each function was verified for basic execution and for its interaction with other contract components. For instance, executing a purchase required proper initialization of the store and cash desk, as well as sufficient product stock. This sequential dependency testing validated the integrity of the contract’s workflow.

For the LibraryMS, we generated a Bash script containing 32 test cases, encompassing key functionalities such as registering library members, adding new books, borrowing and returning items, and managing overdue records. Each function underwent verification for correctness of execution as well as interactions among contract modules. For example, borrowing a book required the book to be available and the borrower to hold a valid membership, ensuring comprehensive validation of business logic and sequential dependencies within the contract.

For the LoanPS, we generated a Bash script consisting of 28 test cases, covering critical operations such as initiating loan requests, performing credit checks, approving or rejecting loans, and managing repayments. Each test case validated fundamental execution steps and their associated interactions within the smart contract. For instance, processing a repayment required prior successful loan approval and loan disbursement, demonstrating the effective coverage of complex sequential dependencies and integrity constraints in the loan lifecycle.

Table 2 summarizes the generated test cases and their coverage rates. The results clearly demonstrate AGTS’s capability to achieve comprehensive coverage of smart contract functionalities with minimal manual effort.

Table 2 Summary of automated test case generation results.

Script composition and execution

The framework ensured that smart contract functions were executed in the correct order, maintaining data integrity throughout the process. The composition order, derived from user-defined requirements, prevented unauthorized or out-of-sequence operations. Precondition and postcondition checks confirmed that each function correctly modified state variables and produced expected results, reinforcing adherence to business logic.
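A minimal sketch of such a check follows. It assumes the contract state can be modeled as a Python dictionary. Each call is guarded by a precondition predicate, and a postcondition predicate compares the state before and after the call. The CoCoME fragment, the state keys, and the helper names are hypothetical.

```python
import copy


class ConditionError(AssertionError):
    """Raised when a precondition or postcondition does not hold."""


def checked_call(state, operation, pre, post, *args):
    """Run operation(state, *args) only if pre holds, then verify post."""
    if not pre(state, *args):
        raise ConditionError(f"precondition of {operation.__name__} violated")
    before = copy.deepcopy(state)  # snapshot for the before/after comparison
    operation(state, *args)
    if not post(before, state, *args):
        raise ConditionError(f"postcondition of {operation.__name__} violated")


# Hypothetical CoCoME fragment: a sale may only start at an open cash desk.
def open_cash_desk(state):
    state["cash_desk_open"] = True


def make_new_sale(state):
    state["sale"] = {"lines": [], "complete": False}


def sale_pre(state):
    return state["cash_desk_open"] and state["sale"] is None


def sale_post(before, after):
    return before["sale"] is None and after["sale"] == {"lines": [], "complete": False}


state = {"cash_desk_open": False, "sale": None}
try:
    checked_call(state, make_new_sale, sale_pre, sale_post)  # out of sequence
except ConditionError as error:
    print(error)  # the precondition rejects the call; state is unchanged
checked_call(state, open_cash_desk, lambda s: not s["cash_desk_open"],
             lambda before, after: after["cash_desk_open"])
checked_call(state, make_new_sale, sale_pre, sale_post)
print(state["sale"])
```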

The generated test scripts were composed by AGTS and automatically executed through a Continuous Integration (CI) pipeline using GitHub Actions16, ensuring early detection of regressions. Mutation and regression testing validated robustness and stability. Detailed CI logs are available online (https://github.com/dancingBone79/TestCodeGeneration/tree/net80), and the results are shown in Table 3.

The framework employed mutation testing by introducing controlled changes to the contract code to evaluate the resilience of the test suite. This approach uncovered subtle defects and edge cases that might have been overlooked, emphasizing the importance of comprehensive test coverage. Regression testing confirmed that updates did not adversely affect previously validated functionalities, ensuring consistent contract performance over time.
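The mutation-testing idea can be illustrated with a relational-operator-replacement (ROR) mutator over Python source. Each mutant swaps one comparison operator, and a mutant counts as killed when at least one test fails or crashes on it. This is a generic sketch, not the mutation operators used by AGTS. The function can_repay and both test suites are hypothetical, and exec is acceptable here only because the source is trusted.

```python
import ast

# Relational operator replacement: each operator maps to a near-miss variant.
SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt,
         ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}


class MutateNth(ast.NodeTransformer):
    """Replace only the k-th swappable comparison operator in the tree."""

    def __init__(self, k):
        self.k = k
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        for i, op in enumerate(node.ops):
            if type(op) in SWAPS:
                if self.seen == self.k:
                    node.ops[i] = SWAPS[type(op)]()
                self.seen += 1
        return node


def mutants(source):
    """Yield one compiled mutant per swappable comparison operator."""
    points = sum(type(op) in SWAPS
                 for node in ast.walk(ast.parse(source))
                 if isinstance(node, ast.Compare) for op in node.ops)
    for k in range(points):
        tree = ast.fix_missing_locations(MutateNth(k).visit(ast.parse(source)))
        yield compile(tree, "<mutant>", "exec")


def passes(function, tests):
    try:
        return all(test(function) for test in tests)
    except Exception:
        return False  # a crash also kills the mutant


def mutation_score(source, name, tests):
    """Fraction of mutants killed by the tests (1.0 if there are no mutants)."""
    killed = []
    for code in mutants(source):
        namespace = {}
        exec(code, namespace)
        killed.append(not passes(namespace[name], tests))
    return sum(killed) / len(killed) if killed else 1.0


SOURCE = "def can_repay(balance, amount):\n    return 0 < amount <= balance\n"
WEAK = [lambda f: f(100, 50) is True]
STRONG = WEAK + [lambda f: f(100, 100) is True, lambda f: f(100, 0) is False]

print(mutation_score(SOURCE, "can_repay", WEAK))    # 0.0
print(mutation_score(SOURCE, "can_repay", STRONG))  # 1.0
```

The weak suite passes on both mutants, so its score of 0.0 exposes missing boundary tests that plain statement coverage would not reveal.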

Table 3 Continuous integration (CI) pipeline execution results.

Execution and validation on Hyperledger Fabric network

The generated scripts were executed on the Hyperledger Fabric network, validating correct interactions among smart contract components. The scripts automated three groups of tasks. The first was network initialization, including Docker container deployment and cryptographic material generation via cryptogen. The second was the creation and configuration of a dedicated channel, mychannel, joined by peers from Org1 and Org2. The third was the full lifecycle management (packaging, installation, approval, and commitment) of the CoCoME v1.0 smart contracts. Repeated invocations of the test function testMakeNewSale consistently produced valid transactions (VALID status), each with a unique transaction identifier (txid). The overall execution completed without errors or exceptions, as confirmed by the final message "Ran 1 test. OK." A summary of these outcomes is provided in Table 4.

These results highlight AGTS’s practical capability to reliably deploy and validate smart contracts in realistic Hyperledger Fabric environments.

Table 4 Summary of script execution on Hyperledger Fabric network.

ShellCheck analysis and script quality

All generated scripts were statically analyzed with ShellCheck38 and passed without any errors or warnings, confirming adherence to common shell-scripting best practices. Table 5 shows the results.

Table 5 ShellCheck static analysis results.

The absence of errors and warnings indicates that the scripts generated by AGTS are of high quality and maintainable.

Automation efficiency and time savings

AGTS substantially reduced manual test scripting and overall testing duration compared with traditional methods. Table 6 summarizes the estimated efficiency improvements.

Although precise quantitative measurements of these improvements require further investigation, preliminary trials suggest that AGTS substantially reduces the time required for test preparation. The reduction is typically from hours or days of manual effort to approximately 20 minutes of automated generation. These indicative estimates are based on preliminary observations and will be validated through future systematic experiments.

Table 6 Comparison of test script preparation efforts.

These preliminary results highlight AGTS’s practical advantages in agile blockchain development environments.

Maintainability and reliability improvement

Automated generation, CI testing, and static analysis via ShellCheck together improved test script maintainability and overall reliability. Scripts are regenerated rather than edited by hand, and they follow established scripting practices. As a result, the effort required to adapt or fix them was greatly reduced.

This evaluation demonstrated AGTS’s strong automation capability, comprehensive functional coverage, defect detection effectiveness, and substantial improvements in test script reliability and maintainability. By significantly reducing manual effort and ensuring high-quality testing processes, AGTS represents a clear advancement in smart contract testing practices.

Defect detection

Although the primary emphasis is on functional coverage and reliability, testing with AGTS occasionally uncovered overlooked security weaknesses. These included missing input validation, potential reentrancy paths, and missing access controls. Security analysis is not the core objective, but these findings show that systematically generated tests can still surface critical vulnerabilities. This underscores the value of automated suite generation beyond purely functional concerns. In addition, AGTS identified between 10 and 15 defects per contract, of which around 2–3 were classified as critical. This outcome illustrates the framework’s capacity to expose subtle oversights and enhance overall reliability, even without explicitly targeting security audits.

Summary of results

AGTS notably outperformed traditional manual and partially automated methods in automation capability, efficiency gains, and script maintainability. Manual test scripting typically involves considerable effort, high maintenance overhead, and lengthy test cycles spanning days. In contrast, AGTS automated the entire test-generation workflow, dramatically reducing preparation time and maintenance effort. Additionally, AGTS-generated scripts consistently passed ShellCheck static analysis, suggesting better maintainability and fewer errors than manually developed scripts.

Discussion

The implementation of the Automated Generation of Test Suites (AGTS) framework provided substantial improvements to smart contract testing on Hyperledger Fabric, demonstrating enhanced automation, comprehensive test coverage, and robust error handling. By systematically translating structured requirements into executable test cases, AGTS combines multiple analysis techniques to efficiently identify vulnerabilities and verify smart contract functionalities. Nevertheless, certain limitations surfaced during this process, indicating clear directions for future improvements.

Strengths and achievements

Enhanced automation and efficiency

AGTS automates critical aspects of smart contract testing, including parsing requirements, extracting function definitions, and generating test scripts with parameterized inputs. This automation addresses the common challenge of manually analyzing smart contracts, particularly the handling of diverse parameter types such as strings, integers, and booleans. It thereby significantly reduces manual effort and broadens test coverage.

Comprehensive test coverage through hybrid analysis

AGTS achieves thorough test coverage by integrating static analysis and dynamic testing methods such as symbolic execution and fuzz testing. Static analysis quickly identifies potential structural or logical issues within smart contracts, whereas dynamic analysis rigorously explores multiple execution paths and simulates realistic scenarios and edge cases. This combination ensures a holistic evaluation, capturing both obvious and subtle vulnerabilities, thus substantially enhancing the reliability of tested smart contracts.

Robust error handling and complex interaction validation

AGTS handles complex interactions and dependencies within smart contracts, confirming its reliability in validating functionalities and managing exceptions without compromising system stability. Through rigorous precondition and postcondition checks, AGTS confirms that each contract function correctly modifies state variables and produces expected outcomes. Real-world-inspired scenarios, such as the CoCoME purchase transaction workflow, further demonstrate the framework’s ability to handle challenging edge cases, including insufficient inventory and interrupted transactions. This underscores the robustness and accuracy of its error management.
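A precondition and postcondition check around a contract call can be sketched as follows. This is a minimal illustration only. Store, make_sale, and checked_sale are hypothetical stand-ins inspired by the CoCoME sale workflow, not the actual chaincode. The sketch verifies two properties. First, a sale with insufficient inventory is rejected without any partial state change. Second, a successful sale decrements exactly the sold item's stock.

```python
from dataclasses import dataclass, field
from typing import Dict

class PreconditionError(Exception):
    """The function was called in a state its precondition does not allow."""

class PostconditionError(AssertionError):
    """The function returned, but the resulting state is not the expected one."""

@dataclass
class Store:
    # Hypothetical ledger state: item barcode -> units in stock.
    inventory: Dict[int, int] = field(default_factory=dict)

    def make_sale(self, barcode: int, qty: int) -> None:
        # Precondition: known item, positive quantity, sufficient stock.
        if barcode not in self.inventory or qty <= 0 or self.inventory[barcode] < qty:
            raise PreconditionError(f"cannot sell {qty} unit(s) of item {barcode}")
        self.inventory[barcode] -= qty

def checked_sale(store: Store, barcode: int, qty: int) -> bool:
    """Invoke make_sale, then check the postcondition; return False if the sale was rejected."""
    before = dict(store.inventory)
    try:
        store.make_sale(barcode, qty)
    except PreconditionError:
        # A rejected call must leave the state untouched (no partial update).
        if store.inventory != before:
            raise PostconditionError("state changed despite rejected sale")
        return False
    # Postcondition: only the sold item's stock changed, and by exactly qty.
    expected = dict(before)
    expected[barcode] -= qty
    if store.inventory != expected:
        raise PostconditionError("stock not decremented as expected")
    return True
```

For example, with Store({42: 5}), selling 3 units succeeds and leaves 2. A second attempt to sell 3 is rejected, and the inventory stays at 2.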

Seamless integration into continuous integration pipelines

Incorporating AGTS into Continuous Integration (CI) pipelines via GitHub Actions facilitates automatic and systematic testing upon every code commit. This continuous testing process ensures prompt identification and correction of regressions, maintaining high code quality and stability throughout iterative development cycles, which is crucial for dynamic, collaborative environments.

Validated execution and practical application

The practical effectiveness of AGTS was verified by deploying and executing the generated test scripts on a real-world Hyperledger Fabric network. The framework correctly managed essential tasks without errors, including network initialization, channel creation, smart contract lifecycle management, and chaincode invocation. These executions validate both the correctness of the script generation process and the reliability of the smart contracts under realistic operational conditions.

Limitations

Despite its strengths, several limitations were identified during AGTS implementation, suggesting clear avenues for further refinement.

Dependence on precise and complete specifications

AGTS relies heavily on user-defined requirement documents for parsing and generating test suites. In practice, however, real-world specifications may be incomplete, ambiguous, or insufficiently detailed. Such gaps can compromise the accuracy of requirement parsing and reduce overall test coverage, potentially missing important edge cases. Future iterations could include more robust error handling and iterative feedback loops to refine requirement data, but a comprehensive solution remains an open challenge.

Scalability is a related limitation. The composition order list and condition constraints help AGTS streamline contract logic, so typical or moderately sized smart contracts are tested clearly and efficiently. However, no explicit benchmarks currently exist for truly large-scale contracts or highly intricate scenarios involving thousands of functions and complex inter-function dependencies. Observing how AGTS behaves under intensive loads, for example through stress testing or distributed parsing, would be crucial for evaluating its scalability and responsiveness in real-world deployments.

Limited adaptability to evolving smart contracts

Smart contracts frequently evolve, yet AGTS currently lacks automatic mechanisms to accommodate these changes efficiently. Updating test cases in response to contract modifications often requires manual adjustments, which may introduce human error or inefficiencies.

Limited user-friendliness and customization

Although AGTS provides structured and parameterized scripts, usability remains constrained for users unfamiliar with scripting languages or command-line interfaces. There is currently no intuitive graphical interface and few straightforward customization options. This limits its broader usability and ease of adaptation by users without scripting expertise.

Performance and scalability concerns

Extensive use of computationally intensive testing methods such as symbolic execution and fuzz testing raises concerns about performance scalability. Managing resources effectively, particularly for large and complex smart contracts or extensive blockchain networks, remains challenging and warrants optimization.

Future work

To directly address the identified limitations and enhance the AGTS framework’s applicability, future research and development efforts will focus on the following issues.

Enhancement of requirement parsing via Large Language Models (LLM)

We plan to integrate large language models (LLMs), a class of deep-learning-based artificial intelligence models. They would be used to automatically handle incomplete or ambiguous requirements, refine test scenarios, and significantly reduce manual intervention. This approach aims to reduce dependence on perfectly accurate requirement documents and enhance AGTS’s robustness against specification deficiencies.

Automated test adaptation to contract changes

Future improvements will include intelligent mechanisms for automatically detecting changes in smart contracts and dynamically updating test scripts. Machine-learning-based approaches or automated differencing (diff) tools may enable rapid and reliable adaptation of test suites to evolving contract specifications.

Development of user-friendly interfaces and customization capabilities

To improve usability, we will develop a graphical user interface (GUI) and intuitive customization tools. Users, including those without scripting experience, will be able to visually define, modify, and execute test scenarios, thereby broadening the practical adoption and accessibility of AGTS.

Optimization of performance and scalability

Exploring heuristic-based prioritization and selective symbolic execution will be essential for addressing scalability challenges. Prioritizing high-risk code paths can reduce computational overhead, ensuring practical and efficient test execution for large-scale deployments.

Extensibility for user-specified test actions

Our approach was not explicitly designed to target particular vulnerabilities, although the overall process does exercise several common security issues. We plan to extend the framework with interfaces for user-specified hooks that let users request targeted checks for known vulnerability classes, enabling deeper security-focused testing.

Integration with concurrency-oriented consensus protocols

Khan et al.39 proposed the multithreaded B-LPoET framework, which meets high-throughput demands through parallel transaction processing. Drawing on this work, we plan to explore similar concurrency-oriented environments. Introducing our automated testing methods into such protocols could uncover contract vulnerabilities that emerge only under parallel execution. This would not only validate AGTS’s adaptability to emerging blockchain architectures but also highlight potential optimizations for handling multi-transaction workflows.

By systematically tackling these targeted improvements, we aim to elevate the robustness, efficiency, and accessibility of the AGTS framework, facilitating its broader adoption in industry and enhancing the reliability of blockchain-based smart contracts.

Implementation

AGTS is implemented as a modular testing framework targeting Hyperledger Fabric. It runs on an Ubuntu 22.04 Linux environment with .NET 8.0 and Bash 5.1.16, ensuring compatibility with enterprise blockchain systems. All modules are written in C# and Bash, and the source code is publicly available on GitHub for reproducibility (https://github.com/dancingBone79/TestCodeGeneration/tree/net80).

A key prerequisite for AGTS is a correct composition order of function calls, that is, the sequence in which contract functions should execute. In practice, this sequence is derived from requirement models using a coroutine-based type system. This preliminary step ensures that the input composition order is logically valid and matches the contract’s expected execution flow. With the environment set up and a valid composition order, AGTS can reliably generate test suites for complex smart contracts.

AGTS requires clearly defined input to accurately generate testing scripts. Figure 2 demonstrates the input-output workflow.

Fig. 2

Input-output workflow.

Prerequisites

One prerequisite for this suite to work is a correct composition order36,37. For each function in this order, AGTS generates a matching function call with parameters of the correct type. If a required function or type cannot be found, AGTS informs the user that the remodel file does not contain this function. The function call order list used in the code is taken from the composition order described in Gu and Ke’s paper37.

Input requirements

AGTS requires three well-defined input items to initiate test generation, with built-in validation for each item.

Contract file

A comprehensive document describing the functional logic, data structures, and business rules of the smart contract. As the source (or interface) file, it contains all class and function definitions. AGTS first parses this file and validates that all referenced functions and data types are present. If any required function or type is missing, the user is notified of the inconsistency. This prevents generation from proceeding with incomplete or incorrect contract descriptions.

Composition order list

An ordered list of smart contract function calls that defines the test execution sequence. AGTS verifies this list against the parsed contract definitions, ensuring each Class::Function entry exists and appears in a logical order. For example, a composition order may be provided as a list of strings,

$$\begin{aligned} \textsf{compositionOrder} \leftarrow [\, & \textsf{``ManageStoreCRUDService::createStore''}, \\ & \textsf{``CoCoMESystem::openStore''}, \\ & \textsf{``ManageCashDeskCRUDService::createCashDesk''}, \\ & \textsf{``CoCoMESystem::openCashDesk''} \,]. \end{aligned}$$
(1)

AGTS uses this list to drive the generation process, so the order must reflect a valid usage scenario. Typically, the postconditions of earlier functions establish the preconditions of later ones.
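Composition-order validation can be sketched in a few lines. The sketch below is an illustrative assumption, not AGTS's code. It checks three things for each entry: it has the Class::Function form, it is defined in the parsed contract, and its prerequisites appear earlier in the list. The `requires` dependency map is hypothetical, because the paper derives ordering from requirement models rather than from an explicit map.

```python
from typing import Dict, Iterable, List, Optional, Set

def validate_composition_order(
    order: Iterable[str],
    defined: Set[str],
    requires: Optional[Dict[str, Set[str]]] = None,
) -> List[str]:
    """Return human-readable problems; an empty list means the order is accepted."""
    requires = requires or {}
    problems: List[str] = []
    seen: Set[str] = set()
    for pos, entry in enumerate(order, start=1):
        cls, sep, fn = entry.partition("::")
        if not (sep and cls and fn):
            problems.append(f"#{pos}: '{entry}' is not of the form Class::Function")
        elif entry not in defined:
            problems.append(f"#{pos}: contract does not define {entry}")
        else:
            # Every (hypothetical) prerequisite must already have appeared earlier in the order.
            for dep in sorted(requires.get(entry, set()) - seen):
                problems.append(f"#{pos}: {entry} requires {dep} earlier in the order")
        seen.add(entry)
    return problems
```

Consider the order in Eq. (1) with a map stating that openStore requires createStore. The order passes as written. If the first two entries are swapped, a single problem is reported for openStore.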

Target language

The scripting language in which the test suite will be generated is specified by the name of the language, as a string,

$$\begin{aligned} \textsf{targetLanguage} \leftarrow \textsf{``bash''}. \end{aligned}$$
(2)

Currently, only bash is supported for Hyperledger Fabric testing. The system’s architecture is language-agnostic, so support for other languages (e.g., Python, JavaScript) can be added without altering core logic. AGTS initializes the script generator for the specified target language, ensuring that the produced scripts have correct syntax and structure for that environment.
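A per-language registry is one way to keep the core independent of the target language. The following is a minimal sketch assuming such a registry. The functions generate_file_header and generate_file_footer are hypothetical Python analogues of GenerateFileHeader and GenerateFileFooter, and the template strings are illustrative. Only bash is registered, and adding a language would mean adding table entries rather than changing callers.

```python
from typing import Dict

# Hypothetical per-language templates; only bash is registered, as in the current AGTS.
# Supporting another language would mean adding entries here, not changing callers.
HEADERS: Dict[str, str] = {"bash": "#!/bin/bash\nset -e\n"}
FOOTERS: Dict[str, str] = {"bash": "exit 0\n"}

def _lookup(table: Dict[str, str], target_language: str) -> str:
    key = target_language.strip().lower()
    if key not in table:
        raise ValueError(f"unsupported target language: {target_language!r}")
    return table[key]

def generate_file_header(target_language: str) -> str:
    """Illustrative analogue of GenerateFileHeader(targetLanguage)."""
    return _lookup(HEADERS, target_language)

def generate_file_footer(target_language: str) -> str:
    """Illustrative analogue of GenerateFileFooter(targetLanguage)."""
    return _lookup(FOOTERS, target_language)
```

An unsupported value such as "python" is rejected during initialization. It therefore fails before any generation step runs.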

By validating inputs upfront, AGTS reduces user errors and ensures that all subsequent steps operate on consistent data. Once the inputs pass validation (contract parsed, composition order confirmed, target language set), the framework invokes its internal modules to generate the test suite. This step is handled by the Initialization controller of AGTS.

AGTS module architecture

AGTS’s core functionality is delivered through a pipeline of internal modules, each handling a specific aspect of the test generation process. The modules work in sequence to transform the input into a complete executable script. The framework comprises four interconnected modules, as illustrated in Figure 3:

Fig. 3

AGTS module architecture.

Initialization controller

As the entry point for AGTS, the Initialization controller orchestrates test generation. It accepts the contract path, the composition order list, and the target language. It also ensures that all modules within the suite work together, enabling comprehensive, automated testing of smart contracts across different environments.

Contract interpreter

The Contract Interpreter module parses the smart contract file to extract the functions and parameter types needed for test generation. Guided by the Composition Order List provided by the Initialization controller, it locates each function’s definition and records its parameter list.

Given the class name c and function name f, the process of deriving the list \([T_1, \dots , T_m]\) of parameter types from a smart contract C can be formally described as,

$$\begin{aligned} \begin{aligned} [L_1, \dots , L_n]&= C, \\ \exists 1 \le i \le n \bullet c::f(E)&\in L_i, \\ [p_1:T_1, \dots , p_m: T_m]&= E, \end{aligned} \end{aligned}$$
(3)

where \(L_i\) are lines in C, and \(c::f(E) \in L_i\) means that the function signature occurs in line \(L_i\). The parameter type list of typing pairs \(p_i : T_i\) for each function in the composition order will be used to generate concrete values in the next stage. By systematically parsing the contract, the interpreter ensures that later steps have accurate type information for every function’s input.
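Eq. (3) maps directly to a line scan. The following is a minimal sketch of that mapping. It assumes that each signature appears on a single line in a REMODEL-like form such as `Contract CoCoMESystem::openStore(storeID : Integer) : Boolean {`, and the exact contract-file syntax handled by AGTS may differ. The code finds the first line containing c::f(E) and splits E into p : T pairs.

```python
import re
from typing import List, Tuple

def find_parameter_list(contract: str, c: str, f: str) -> List[Tuple[str, str]]:
    """Eq. (3): locate the line L_i containing c::f(E) and split E into (p, T) pairs."""
    # (?<!\w) stops "XCoCoMESystem::..." from matching; \s*\( stops "openStoreX(" from matching.
    signature = re.compile(r"(?<!\w)" + re.escape(f"{c}::{f}") + r"\s*\(([^)]*)\)")
    for line in contract.splitlines():                 # [L_1, ..., L_n] = C
        match = signature.search(line)                  # c::f(E) occurs in L_i
        if match is None:
            continue
        body = match.group(1).strip()                   # E
        if not body:
            return []                                   # function takes no parameters
        pairs = []
        for item in body.split(","):                    # [p_1 : T_1, ..., p_m : T_m] = E
            name, sep, typ = item.partition(":")
            if not sep or not name.strip() or not typ.strip():
                raise ValueError(f"malformed parameter {item.strip()!r} in {c}::{f}")
            pairs.append((name.strip(), typ.strip()))
        return pairs
    raise LookupError(f"contract file does not contain {c}::{f}")
```

A missing signature raises LookupError, which corresponds to notifying the user that the file does not contain the function.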

Parameter generator

Using the parameter type lists from the Contract interpreter, the parameter generator automatically creates sample input values for each function call. The goal is to produce syntactically correct and semantically meaningful arguments that cover a range of scenarios.

The random value generation process is formulated as the V(t) function, which generates a random value of a given type t, defined as,

$$\begin{aligned} \begin{aligned} V({\textsf {String}})&= \left[ U_1(\Sigma ), \dots , U_n(\Sigma ) \right] \\ V({\textsf {Integer}})&= U(\{ a, a+1, \dots , b-1, b\}) \\ V({\textsf {Boolean}})&= U(\{ {\textsf {true}}, {\textsf {false}} \}) \\ \end{aligned} \end{aligned}$$
(4)

where \(U(X)\) is a uniform independent random variable ranging over set X, and \(\Sigma = \{ \texttt {'A'}, \ldots , \texttt {'Z'}, \texttt {'a'}, \ldots , \texttt {'z'} \}\) is the set of uppercase and lowercase English letters. For an unsupported type t, V(t) results in an error \({\textsf {NotSupportedException}}(t)\).
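Eq. (4) can be implemented with Python's random module. The sketch below is minimal and makes three assumptions. The string length n and the integer bounds a and b are placeholders, since the paper does not fix them. NotSupportedError stands in for NotSupportedException. An explicit random.Random instance is passed so that runs can be reproduced from a seed.

```python
import random
import string
from typing import Union

Value = Union[str, int, bool]

class NotSupportedError(Exception):
    """Stands in for the NotSupportedException(t) raised for unsupported types."""

def V(t: str, rng: random.Random, n: int = 8, a: int = 0, b: int = 100) -> Value:
    """Eq. (4): draw a uniformly random value of type t (n, a, b are assumed defaults)."""
    if t == "String":
        # n independent uniform draws U_1(Sigma), ..., U_n(Sigma), Sigma = {'A'..'Z', 'a'..'z'}
        return "".join(rng.choice(string.ascii_letters) for _ in range(n))
    if t == "Integer":
        return rng.randint(a, b)          # uniform over {a, a+1, ..., b}; both ends included
    if t == "Boolean":
        return rng.choice([True, False])  # uniform over {true, false}
    raise NotSupportedError(t)
```

Here string.ascii_letters is exactly Σ, and random.randint includes both endpoints. It therefore matches the closed set {a, …, b} in Eq. (4).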

The Contract interpreter has already extracted the function names, the number of parameters, and their types. This module therefore returns a sample value matching each parameter’s type. This is essential for generating realistic and valid arguments for the smart contract functions.

This randomized strategy ensures a variety of test inputs. The generator is not purely random, however: it is also designed to cover edge cases and typical boundary values. For strings, it may include empty strings or special characters. For integers, it targets extremes (very small, zero, very large) in addition to random mid-range values. Traditional methods may overlook or inadequately cover such boundary conditions and exceptional cases. By systematically including edge-case values, AGTS increases the chance of revealing bugs or vulnerabilities that might not surface with typical inputs alone. All generated parameters are checked for conformance to the expected format and type, and any mismatch or formatting error is caught and reported. The output of the Parameter Generator is therefore a set of valid argument lists, one list of concrete values for each function in the composition order.
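Boundary values and type checks can be sketched as follows. This minimal sketch assumes a 32-bit signed default range and a hand-picked set of edge strings; the paper does not specify the concrete boundary sets. The conforms check excludes booleans from Integer, since Python's bool is a subclass of int.

```python
from typing import List, Union

Value = Union[str, int, bool]

# Assumed defaults: the paper does not fix the integer range or the special-character set.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def boundary_values(t: str, a: int = INT_MIN, b: int = INT_MAX) -> List[Value]:
    """Edge-case candidates to add alongside the random values of Eq. (4)."""
    if t == "String":
        return ["", " ", "A", "!@#$%^&*", "a" * 256]   # empty, blank, minimal, special, long
    if t == "Integer":
        candidates = [a, a + 1, 0, b - 1, b]
        # Keep first occurrences in order and drop values outside [a, b].
        return list(dict.fromkeys(v for v in candidates if a <= v <= b))
    if t == "Boolean":
        return [True, False]
    raise ValueError(f"unsupported type: {t}")

def conforms(value: Value, t: str) -> bool:
    """Type check applied to every generated argument before it is inlined."""
    if t == "String":
        return isinstance(value, str)
    if t == "Integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if t == "Boolean":
        return isinstance(value, bool)
    return False
```

For the range [1, 10], the candidates are 1, 2, 9, and 10. Zero is dropped because it lies outside the range.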

Code generator

The Code Generator module integrates all the information from previous steps to produce the final test script. It essentially assembles the script file by placing the generated function calls into a runnable sequence and adding the necessary boilerplate (headers, environment setup, etc.). The process can be outlined in several steps.

  1. Construct function calls: For each function in the composition order, the Code Generator creates a chaincode invocation command. This command calls the smart contract function with the generated parameters. It is typically a CLI command (for Hyperledger Fabric, a peer chaincode invoke command) that includes the function name and its arguments. These commands form the core of the test script, directly invoking contract functionality with test inputs. This step gives rise to the PCI (peer chaincode invoke) function mentioned earlier, which serves as a core component for interacting with the smart contract.

  2. Include environment setup: The module inserts any commands required to set up a consistent execution environment. For Hyperledger Fabric, this includes Docker commands and network configuration so that the chaincode can be invoked in an isolated, reproducible environment. For example, it may add commands to start or connect to a Docker network and to set the environment variables needed by the Fabric CLI. This is intended to let the script run independently on any host with Docker and Fabric installed. It should also yield the same results on every run, which is important for test reliability.

  3. Add script structure (header/footer): The Code Generator prepends a file header and appends a file footer appropriate for the target language. The header typically contains interpreter directives and initialization, e.g., #!/bin/bash and configuration exports for bash. The footer may include cleanup steps or simply ensure that the script terminates correctly. In AGTS, the GenerateFileHeader(targetLanguage) and GenerateFileFooter(targetLanguage) functions supply these components. This standard structure reduces human error by automating the tedious setup and teardown segments of the script.

  4. Synthesize final script: All parts (the header, the sequence of contract calls with their arguments, and the footer) are combined into a single script file. The result is a fully automated test script that can be executed immediately. Because the script is assembled programmatically, AGTS ensures consistent formatting and usage. Every script includes the necessary parts (invocation function, environment configuration, calls, etc.) in the correct order, which makes the testing process repeatable and reliable.
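Step 1 (constructing function calls) can be sketched with Python's standard library alone. The following minimal sketch assumes several things:

  • The channel name mychannel comes from the evaluation, while the chaincode name cocome is hypothetical.

  • Only the function part of each Class::Function entry is passed to the chaincode, which is also an assumption.

  • A real invocation usually also needs the orderer address (-o), TLS options, and --peerAddresses; these are omitted here.

The JSON payload uses the {"function": …, "Args": […]} form accepted by the peer CLI's -c flag. The sketch shows how generated values are inlined safely with shlex.quote.

```python
import json
import shlex
from typing import Sequence, Union

Value = Union[str, int, bool]

def to_chaincode_arg(v: Value) -> str:
    # Chaincode arguments travel as strings; booleans are rendered as JSON-style literals.
    if isinstance(v, bool):
        return "true" if v else "false"
    return str(v)

def generate_contract_call(function: str, args: Sequence[Value],
                           channel: str = "mychannel", chaincode: str = "cocome") -> str:
    """Build one shell line that invokes `function` with the given arguments inlined."""
    ctor = json.dumps({"function": function, "Args": [to_chaincode_arg(a) for a in args]})
    # shlex.quote keeps spaces, quotes, and $ in generated values from being interpreted by bash.
    return (f"peer chaincode invoke -C {shlex.quote(channel)} "
            f"-n {shlex.quote(chaincode)} -c {shlex.quote(ctor)}")
```

Quoting matters because the Parameter Generator deliberately produces special characters. Without quoting, an argument such as "it's $HOME" would be mangled by the shell before it reached the chaincode.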

The Code Generator’s output is an executable bash test script (for the current implementation) that encapsulates the entire test procedure for the given smart contract. This automated assembly greatly improves maintainability and scalability: if the contract changes or a new function is added to the composition list, running AGTS again will regenerate an updated script without manual intervention.

Output and script generation

The output of AGTS is a structured bash script tailored for Hyperledger Fabric, containing all the commands and functions needed to deploy and test the smart contract functions in sequence. Internally, the script generation process can be described by a series of generation functions and a final composition, as follows.

  • \(Header\) is generated by the function GenerateFileHeader based on the specified targetLanguage. For bash, it may include #!/bin/bash, set -e for error propagation, and the loading of configuration files. It sets up the preamble and environment for the script, including environment variables and a sequence of commands that package, install, and deploy the smart contracts. It uses commands such as export, source, rm, pushd, and popd to manage files and directories and to run Gradle tasks. It also employs Hyperledger Fabric’s peer lifecycle CLI to manage the chaincode lifecycle, covering every step from deployment to invocation of the smart contracts.

  • \(Footer\) is created using the function GenerateFileFooter, which produces any closing commands or syntax needed for the script. For bash, this may be an exit 0 or the shunit2 test-execution trigger.

  • \(Params _i\) are the parameters for each smart contract function with class name \(C_i\) and function name \(F_i\). These parameters are determined by the function FindParameterList.

  • \(Args _i\) are the actual arguments produced by GenerateArgumentsList according to the parameters \(Params _i\). The actual arguments will be used in the smart contract call.

  • \(Call _i\) can then be finally made to generate the specific call to the smart contract function through the GenerateContractCall function.

For each function call i in the composition order (with class \(C_i\) and function \(F_i\)),

$$\begin{aligned} {Params}_{i}= & {\textsf {FindParameterList}}(C_{i}, F_{i}), \nonumber \\ {Args}_{i}= & {\textsf {GenerateArgumentsList}}({Params}_{i}), \nonumber \\ {Call}_{i}= & {\textsf {GenerateContractCall}}({Args}_{i}),\, {\hbox { for }}i = 1, \dots , {\hbox {n}}. \end{aligned}$$
(5)

Finally, the complete \(Script\) is assembled as an ordered list of the header, the generated calls for each function, and the footer, encapsulating the entire process of generating a functional script for smart contract interaction, formally,

$$\begin{aligned} \begin{aligned} Script&= \left[ Header , Call _1, \dots , Call _n, Footer \right] \end{aligned} \end{aligned}$$
(6)

In other words, AGTS outputs a script that starts with the header, continues with a sequence of function invocations (one per function in the input list, in order), and ends with the footer. Each call has its arguments generated and inlined as described above. This structured approach ensures that every part of the script is generated systematically, with careful attention to parameter generation, argument formation, and command execution. The process is designed to produce scripts that are syntactically correct and properly configured for the target environment.
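Eqs. (5) and (6) together describe a simple loop. The sketch below is minimal and takes the three generation functions as parameters, so any concrete implementations can be plugged in; all names are illustrative. One deviation from Eq. (5) is deliberate: the function name F_i is passed to the call generator alongside Args_i, because a contract call needs to know which function to invoke.

```python
from typing import Callable, List, Sequence, Tuple

Param = Tuple[str, str]  # (parameter name, type)

def generate_script(
    composition_order: Sequence[str],
    find_parameter_list: Callable[[str, str], List[Param]],
    generate_arguments_list: Callable[[List[Param]], List[str]],
    generate_contract_call: Callable[[str, List[str]], str],
    header: str,
    footer: str,
) -> str:
    """Eqs. (5)-(6): Script = [Header, Call_1, ..., Call_n, Footer]."""
    calls: List[str] = []
    for entry in composition_order:
        c_i, f_i = entry.split("::", 1)                    # class C_i and function F_i
        params_i = find_parameter_list(c_i, f_i)           # Params_i
        args_i = generate_arguments_list(params_i)         # Args_i
        calls.append(generate_contract_call(f_i, args_i))  # Call_i
    return "\n".join([header, *calls, footer])
```

Keeping the generators as parameters mirrors the modular design. The same assembly loop works regardless of how parameters are found or how calls are rendered.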

Script content

The resulting Bash script is structured not only to call the contract functions but also to manage the blockchain environment around those calls. The script typically begins by setting environment variables and running the chaincode packaging, installation, approval, and commitment commands of the Hyperledger Fabric peer lifecycle CLI. It may define helper functions such as pci() (peer chaincode invoke) to streamline the repeated invocation syntax. The script then executes the contract function calls in the specified order. These calls often sit within a larger test function that ensures the ledger state progresses correctly, e.g., a function that handles a full scenario such as initializing a store and then making a sale. Throughout the process, the script uses Fabric tools such as configtxlator to decode blocks and verify outcomes, aiding debugging and validation. Finally, the script can incorporate a testing framework (such as shunit2 for bash) to assert the expected result of each call, giving immediate feedback on whether each transaction succeeded as intended.

By generating this script automatically, AGTS ensures that every required operation is included in the correct order, from setting up the network context to invoking the chaincode and checking results. The script is ready to run and requires no manual edits. Testers simply execute it to perform a full suite of contract interactions on Hyperledger Fabric, with confidence that all commands are well formed and the environment is correctly configured. This output streamlines the testing process and also improves reliability. Thanks to the consistent generation process, the same script can be rerun multiple times or adapted to different networks with minimal changes.
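The shunit2 wrapper described above can be emitted from Python as well. The following minimal sketch assumes that shunit2 is on the PATH of the machine running the script, and the test name and call lines are illustrative. It relies on two shunit2 conventions: test functions are discovered by their "test" name prefix, and the library is sourced at the end of the file. Each generated call is followed by an assertEquals on its exit status. The header omits set -e so that a failing call is reported by the assertion rather than aborting the script.

```python
from typing import Sequence

def render_shunit2_script(test_name: str, calls: Sequence[str], header: str = "#!/bin/bash") -> str:
    """Emit a bash script in which each generated call is followed by an exit-status assertion."""
    if not test_name.startswith("test"):
        # shunit2 discovers test functions by the 'test' name prefix.
        raise ValueError("shunit2 test function names must start with 'test'")
    lines = [header, "", f"{test_name}() {{"]
    for i, call in enumerate(calls, start=1):
        lines.append(f"  {call}")
        lines.append(f'  assertEquals "step {i} exit status" 0 $?')
    lines += ["}", "", "# Load shunit2 last; it runs every test* function defined above.", ". shunit2"]
    return "\n".join(lines) + "\n"
```

For instance, render_shunit2_script("testMakeNewSale", [...]) yields a script that would end with shunit2's "Ran 1 test." summary when executed.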

Conclusion

In this paper, we presented AGTS (Automated Generation of Test Suites), a novel automated testing framework specifically designed for smart contracts deployed on Hyperledger Fabric. AGTS effectively addresses existing challenges in smart contract testing by significantly improving automation, enhancing security, and providing comprehensive test coverage.

Our research makes several clear contributions.

  1. Enhanced testing coverage through hybrid analysis: AGTS integrates advanced testing methodologies, including static analysis, symbolic execution, and fuzz testing, enabling comprehensive identification of vulnerabilities. This ensures early detection of potential security risks, significantly improving smart contract reliability.

  2. Efficient automation of test case generation: By fully automating the process from requirement parsing to test script generation, AGTS substantially reduces developer workload and accelerates the testing phase. This automation provides a consistent and repeatable testing process, minimizing human errors and improving overall productivity.

  3. Flexible and modular design: The modular architecture of AGTS supports easy adaptability and scalability, enabling straightforward integration of additional testing methods or tools. This positions AGTS as a robust and sustainable testing platform, particularly valuable within rapidly evolving blockchain ecosystems like Hyperledger Fabric.

  4. Practical impact on blockchain security: By enabling thorough pre-deployment testing, AGTS reduces the risk of deploying faulty smart contracts, thereby minimizing potential financial losses and enhancing trust and security in blockchain-based applications. Nevertheless, the effectiveness of AGTS currently depends on the completeness and precision of input requirements, highlighting the necessity for accurate initial documentation.

Moving forward, we plan to enhance AGTS by integrating heuristic-based testing strategies and user-friendly graphical interfaces, as well as exploring large language models to further automate test generation. These advancements will ensure AGTS continues to effectively meet the evolving demands of blockchain development, maintaining its relevance and efficacy in securing smart contracts.