Introduction

Smart contracts (SCs) have been adopted in banking, healthcare, insurance, and the IoT thanks to the rapid development of blockchain technology1. SCs nevertheless pose security risks that stem from their programming model and operating environment. The BeautyChain (BEC) token attack and the Proof of Weak Hand (PoWH) incident illustrate the severity of smart contract (SC) vulnerabilities. Attackers exploited BEC, an Ethereum-based token, through an SC vulnerability, and PoWH, an SC-based Ponzi scheme, suffered from similar weaknesses. In April 2018, an integer overflow in the BEC token SC allowed hackers to crash its market by issuing an excessive number of tokens, and the PoWH contract lost Ether to a similar flaw2,3. These incidents underline the need for integer overflow detection, which is the focus of our method: we detect integer overflow vulnerabilities in Ethereum-based SCs using an accurate and adaptive technique. Gathering SC source code for testing poses difficulties when building a vulnerability detection model. Related research indicates that the source code of only roughly one percent of deployed SCs is publicly visible4, and obtaining a large number of suitable source files is labor- and resource-intensive because of Ethereum network and node limitations.

A comprehensive quality and security check of the dataset is also needed, and obtaining real vulnerability data may raise privacy and legal issues5. Close attention to the quantity and quality of the collected data is therefore essential when building a trustworthy machine-learning model for code representation and vulnerability identification: good samples are necessary for accurate, generalizable models, and a dearth of data limits the model's capacity to identify vulnerabilities. To address these challenges, we present a few-shot learning strategy that discovers SC flaws using data augmentation. Generative Adversarial Networks (GANs) facilitate this identification: a GAN consists of a generator and a discriminator6, where the generator produces synthetic data and the discriminator compares the generated data with real data.

Crucially for data augmentation, we use the GAN generator to continuously construct synthetic contracts that closely resemble real SCs, and we train the discriminator to distinguish real from synthetic contracts; this alleviates the data shortage7. We maintain semantic and syntactic integrity by converting SC source code into vectors using a code embedding technique8. By training on a small set of vectorized samples, the GAN can produce many synthetic samples against which similarities can be compared. Our approach combines vector similarity analysis with GAN discriminator feedback to detect SC integer overflow problems. The model uses the GAN's adversarial training to extract the important characteristics of SCs, making SC security analysis more accurate and efficient9.

Research gap

SC vulnerability detection is critical to ensure security and trustworthiness10. Traditional methods, including fuzzing, symbolic execution, and formal verification, have automation, efficiency, and accuracy limitations. Recent efforts have focused on analyzing SC source code, but issues remain in preserving code structure, managing diverse information, and reducing dependence on large datasets11. Furthermore, current feature-learning approaches struggle with effective vulnerability prediction12. Addressing these shortcomings is critical to improving SC security.

Motivation

Blockchain technology, particularly SC, has revolutionized automation in various industries but faces significant security challenges13. Studies reveal inconsistencies in vulnerability detection tools, leading to high false positives and missed vulnerabilities. Traditional manual detection is inefficient, while machine learning offers a promising alternative. However, current models struggle with SC preprocessing, losing essential syntax and semantics14. Researchers are exploring vectorization and graph-based techniques, with Graph Neural Networks (GNNs) showing the potential to capture contract features. Limited access to high-quality datasets due to privacy and legal concerns remains challenging. Researchers are exploring data augmentation and few-shot learning to enhance SC vulnerability detection.

Research contributions

This article introduces a method for detecting vulnerabilities in SCs that combines GAN with code embedding. We train a GAN model on integer overflow vulnerabilities by transforming SC source codes into vector representations using code2vec. The GAN discriminator detects vulnerabilities and performs vector similarity analysis, while the GAN generator expands the dataset. Unlike traditional methods, this approach preserves contract properties through Abstract Syntax Tree (AST) vectorization and enables deep learning on limited data. Dual similarity detection accuracy is enhanced using GAN feedback, cosine similarity, and correlation coefficients. Our major contributions are as follows:

  • We developed a code embedding and GAN-based vulnerability detection methodology for integer overflow vulnerabilities.

  • AST-based representation retains contract properties.

  • We use GAN data augmentation to enable deep learning on small samples.

  • We conducted scalability, accuracy, and efficiency tests on 150 public Ethereum contracts.

Research background

Smart contracts

In 1994, computer scientist and cryptographer Szabo coined the term “smart contract” to describe digital agreements15. A SC is a programming-code-based agreement that automatically fulfills its obligations once all parties meet the prerequisites. Developers encode the business logic as program code and store it in a blockchain system, which activates the SC’s functions. Nakamoto16 proposed Bitcoin in 2008 together with the underlying blockchain concept, and the first blockchain system launched in 2009. In his description, a blockchain is a distributed, peer-to-peer (P2P) ledger that cannot be altered. The Ethereum white paper17 expanded the use of blockchain technology beyond currency by adding SCs to the platform. Ethereum, currently the most widely used blockchain platform, was the first to support SCs5; SCs thus enable programmatically controlled blockchain data in ways that transcend mere financial transactions. Ethereum is an important blockchain tool that lets users create SCs to control blockchain data and offer digital money18, which greatly expands the scope of blockchain applications and facilitates diverse uses. Once all parties have signed a contract, it is encoded as a piece of program code and recorded on the blockchain19,20. Predetermined states, transition methods, and conditions trigger the execution of these agreements; after the conditions are met, the SC is enabled in the blockchain network and checked by nodes before its operations are performed. The blockchain monitors SC execution to guarantee that contracts are carried out exactly as specified when their triggering conditions are met21. SCs have four characteristics:

  1. They are transaction-driven, with no need for human involvement.

  2. An SC cannot be stopped once it has started.

  3. Since most blockchain nodes must verify the validity of an SC, every single node is aware of it.

  4. They are adaptable to the various settings in which they are deployed.

These characteristics ensure the safety of investors. The management sector for SCs has also grown19. Because they can be traced, SCs are an excellent choice for electoral voting22. Among the many potential uses of this technology are digital asset copyright and the administration of corporate processes23. SCs are used to determine access permissions in electronic medical records13; this gives medical professionals increased control over patient information and helps prevent data leaks. The combination of the Internet of Things (IoT) and decentralized SCs is another significant development: SCs simplify complicated IoT network procedures and increase resource sharing, improving the industry’s productivity, information security, and application costs24.

Smart contract vulnerabilities

Attackers can use SC vulnerabilities to modify program data, disrupt execution, or perform unauthorized activities14. These vulnerabilities allow resource theft, identity manipulation, and data compromise25. Ethereum, the most popular SC blockchain, has suffered the most vulnerabilities and losses, making it a research priority. Table 1 describes various smart contract vulnerabilities. SC vulnerabilities occur at three levels.

Table 1 Various smart contract vulnerabilities.

These vulnerabilities must be addressed to secure SCs and prevent financial losses.

Securing SCs

The rising use of Ethereum has increased its monetary value, highlighting the cryptocurrency’s security risks12. To reduce the risk of system instability, Ethereum’s original design included decentralized P2P networks and consensus algorithms, virtual machine technology to run SCs in a safe sandbox, and cryptographic methods to encrypt and verify data; Ethereum’s prototype contained all of these capabilities26. SCs were developed to automate contract execution, remove the need for trusted third parties, and enhance transaction security and efficiency. Researchers have focused considerable attention on SC security because of the widespread use of these contracts in industries including finance, supply chain management, and the IoT.

The intractable nature of blockchain networks and the intricate programming languages used to create SCs are the root causes of SC security issues27. Ethereum SCs offer Turing-complete programming but also come with additional security risks. A reentrancy attack on the DAO project in 2016 stole around $50 million, causing substantial economic damage and exposing SC vulnerabilities in design and execution. The Parity Wallet incident further highlighted the harm caused by SC programming issues: due to design flaws and careless repairs, the Parity multi-sig wallet contract was attacked in 2017, with $30 million in Ether stolen or frozen6. These incidents show that SC development requires extensive security testing and regular monitoring. In a 2018 cyberattack, 52.3 million NEM tokens worth 534 million USD were stolen from Coincheck28: a security flaw in the exchange’s hot wallet administration let the attacker move large sums through unapproved transactions. The Coincheck incident demonstrated how critical secure methods of storing and protecting assets are, particularly when using SCs10.

Since illegal financial transactions can result from inappropriate use or vulnerabilities in SCs, protecting exchange assets is a top priority11. SCs also raise cross-platform security issues, and the broad application of blockchain technology makes SC security a matter of global relevance. In August 2021, over USD 610 million was stolen from Poly Network, an interoperable cross-chain protocol29. The attacker changed the Keeper role in EthCrossChainData.sol and controlled asset movement by exploiting an SC weakness to create a cross-chain transaction. This incident highlighted the security risks in the design and implementation of cross-chain protocols, which are particularly problematic in complex cross-chain communication and contract interactions7. These key historical events, which shaped blockchain technology and security testing, form the foundation of SC security: SCs must be verified and maintained for the blockchain ecosystem to stay secure and grow.

Generative adversarial network (GAN)

A GAN combines two neural networks to create data resembling its training data. GANs are used for image generation, image editing, and text generation. The generator creates data intended to be indistinguishable from real data, while the discriminator distinguishes between the two; this is described as a zero-sum game. Training the two networks together improves the realism of the generator’s output30. GANs can realistically render faces, objects, and scenes; edit photographs to remove unwanted elements or change a person’s appearance; and have generated poems, essays, and code. Because they generate new data, GANs are used in many applications17 and will likely become increasingly popular as they mature. In SC vulnerability detection, GANs serve three roles: data enhancement, anomaly detection, and vulnerability localization24. GANs can greatly enhance SC vulnerability detection, but research and development are needed to overcome the remaining obstacles and maximize their potential.

Methodology framework

Preparing the source code, training the model, and identifying similarities are the three steps of the GAN-based process for finding integer overflow vulnerabilities in SCs. Figure 1a shows the source code preparation steps: integer overflow vulnerability features are used to separate relevant sections from non-essential ones before Abstract Syntax Tree (AST) generation16. An AST is a tree representation of the syntactic structure of code. Code2vec11 is a neural network-based technique that takes ASTs as input and generates vector embeddings, in this case contract vectors. Preprocessing ensures data consistency across all SCs.

The GAN architecture consists of a generator and a discriminator, each implemented with three fully connected layers. The generator uses layers of 128, 256, and 128 nodes, while the discriminator uses layers of 256, 128, and 64 nodes. ReLU activation functions are applied to all hidden layers, and a sigmoid function is used at the output layer of the discriminator. The models were trained using the Adam optimizer with a learning rate of 0.0002, a batch size of 32, and 200 training epochs.
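The layer configuration above can be sketched as a forward pass in numpy; the embedding dimensionality, noise dimensionality, and weight initialization are assumptions for illustration (the paper does not specify them), and the Adam training loop is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(sizes):
    """Random weights for a fully connected stack (illustrative initialization)."""
    return [(rng.normal(0, 0.02, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x, out_sigmoid=False):
    """ReLU on hidden layers; optional sigmoid on the output layer."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        last = i == len(layers) - 1
        x = sigmoid(x) if (last and out_sigmoid) else (x if last else relu(x))
    return x

EMBED_DIM = 64          # assumed contract-vector dimensionality
NOISE_DIM = 32          # assumed noise dimensionality

# Generator: noise -> 128 -> 256 -> 128 -> contract vector
generator = init_mlp([NOISE_DIM, 128, 256, 128, EMBED_DIM])
# Discriminator: contract vector -> 256 -> 128 -> 64 -> real/fake probability
discriminator = init_mlp([EMBED_DIM, 256, 128, 64, 1])

noise = rng.normal(size=(32, NOISE_DIM))      # batch size 32, as in the text
fake = forward(generator, noise)
score = forward(discriminator, fake, out_sigmoid=True)
```

The sigmoid output keeps each discriminator score in (0, 1), matching its role as a real-versus-synthetic probability.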

Abstract Syntax Tree (AST) paths

AST paths are structured associations between nodes of a program’s syntax tree. They capture syntactic and structural interdependence by connecting code parts through nodes. These paths describe code logic and flow, helping machine learning models extract features.

Code2vec

Code2vec is a neural network-based code embedding model that converts source code into vectors. It generates distributed embeddings from AST paths to capture code semantics and syntax. These embeddings map code snippets into a continuous vector space for pattern recognition, enabling vulnerability identification, code summarization, and categorization.

To train the GAN model, the generator and discriminator compete on the processed training set: the generator creates synthetic contracts that look increasingly realistic, and the discriminator separates actual contracts from fake ones. Once training converges, the generator approximates the target data distribution and produces high-quality synthetic samples, which are used to enlarge the test set. Identifying an integer overflow problem in a SC starts with vectorizing its source code. The trained discriminator receives the vector and produces a security label. When the label is positive, we compare the contract vector with the expanded test set, and the detection mechanism determines the contract’s susceptibility using a similarity threshold coefficient: the contract is at risk when the similarity coefficient exceeds the cutoff. The detection process, in summary:

  1. Preprocessing the source code: preprocess the source code and build vulnerability-specific contract vectors.

  2. Model training: train the GAN generator and discriminator. The discriminator separates actual and synthetic contracts, and the generator generates high-quality synthetic contracts.

  3. Finding security holes: use the discriminator to label the target contract’s vector. If the label is positive, compute the vector similarity and apply the similarity threshold coefficient to check for the vulnerability.
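The three steps above can be sketched end to end; `embed`-style vectorization is assumed to have already happened, and `discriminator_label`, the mean-cosine rule, and the 0.85 default are stand-ins for illustration rather than the paper's exact implementation:

```python
import numpy as np

def detect(contract_vec, discriminator_label, augmented_set, threshold=0.85):
    """Step 3: flag a contract only if the discriminator label is positive
    AND its similarity to known-vulnerable vectors exceeds the threshold."""
    if not discriminator_label(contract_vec):     # step 2's trained model
        return False
    # mean cosine similarity against the GAN-augmented vulnerable set
    sims = [np.dot(contract_vec, v) / (np.linalg.norm(contract_vec) * np.linalg.norm(v))
            for v in augmented_set]
    return float(np.mean(sims)) > threshold

# toy usage: a vector nearly identical to the vulnerable set is flagged
vuln = [np.array([1.0, 0.0, 1.0]), np.array([0.9, 0.1, 1.0])]
target = np.array([1.0, 0.05, 1.0])
flagged = detect(target, lambda v: True, vuln)
```

Requiring both signals to agree is what the text calls dual detection: a positive discriminator label alone is not enough to flag a contract.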

Data augmentation with GAN

The generation of susceptible contract data involves code preprocessing, code embedding, and code generation, as shown in Fig. 1b.

Fig. 1
figure 1

Source code for generating an Abstract Syntax Tree (AST).

Code preprocessing

Preprocessing ensures that the data for AST-based analysis is clean and standardized: comments are removed, variable names are standardized, and whitespace is normalized to keep the code consistent. SC source code may include private user and transaction information, and improper processing during model training could breach data protection regulations. Solidity, the SC programming language, allows customized identifiers, and naming conventions and coding styles differ across programs and developers. For GAN modeling and similarity judgment, vectorization of the source code must translate semantically identical code segments into comparable vectors, which is why source code preprocessing is required. Preprocessing rules:

  • Maintain integer overflow vulnerability features.

  • Maintain code semantics and structure.

  • Code embedding input specification.

  • Standardize identifier naming and code formatting.

The preprocessing logic is illustrated in the following pseudocode:

figure a
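The pseudocode figure is not reproduced here; a minimal sketch of the stated rules (comment removal, whitespace normalization, and uniform identifier renaming) might look like the following, where the regexes and the `VAR1, VAR2, ...` naming scheme are assumptions for illustration:

```python
import re

def preprocess(solidity_src: str) -> str:
    """Normalize a Solidity snippet for embedding: strip comments,
    rename user identifiers uniformly, and collapse whitespace."""
    src = re.sub(r"//[^\n]*", "", solidity_src)       # line comments
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.S)   # block comments
    # rename declared integer variables to VAR1, VAR2, ... (toy rule)
    names = {}
    for m in re.finditer(r"\buint\d*\s+(\w+)", src):
        names.setdefault(m.group(1), f"VAR{len(names) + 1}")
    for old, new in names.items():
        src = re.sub(rf"\b{old}\b", new, src)
    return re.sub(r"\s+", " ", src).strip()           # collapse whitespace

code = """
contract C {
    uint8 counter; // user balance
    function bump() public { counter = counter + 1; }
}
"""
normalized = preprocess(code)
```

Renaming identifiers before embedding is what lets two semantically identical contracts with different naming styles map to comparable vectors, per the rules above.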

Description of code features

Establishing the characteristics of integer overflow is crucial. Integer overflow arises from the integer types of Solidity, the SC programming language. In the EVM, integers are fixed-size unsigned data types whose ranges are defined by their bit width: Solidity supports unsigned integers from uint8 (8-bit) up to uint256, where uint256 denotes a 256-bit unsigned integer. Adding 1 to a uint8 variable that stores 255 overflows, and the result becomes 0, as shown in Fig. 2. Figure 3 provides sample Solidity code for creating smart contracts.
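The uint8 wrap-around described above can be reproduced by emulating the EVM's fixed-width modular arithmetic in Python (a simulation of the semantics, not Solidity itself):

```python
def uint_add(a: int, b: int, bits: int = 8) -> int:
    """Fixed-width unsigned addition as the EVM performs it:
    the result wraps modulo 2**bits instead of raising an error."""
    mask = (1 << bits) - 1                 # 255 for uint8
    return (a + b) & mask

overflowed = uint_add(255, 1)              # uint8: 255 + 1 wraps to 0
big = uint_add(2**256 - 1, 1, bits=256)    # uint256 wraps the same way
```

This silent wrap-around, rather than an error, is exactly what made the BEC and PoWH exploits possible.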

Fig. 2
figure 2

Fundamental concepts of integer overflow with examples.

Fig. 3
figure 3

A smart contract’s AST.

Embed input details

The AST produced by the Solidity parser ANTLR has to be handled in line with the code2vec embedding criteria. Specifically, the traversable AST must fit the following definition, which uses the components Non-Terminal nodes (NT), Terminal Nodes (TN), value set (A), Root node (R), non-terminal map (MNT), and terminal map (MTN). The AST of an SC can be depicted as \(\langle NT, TN, A, R, MNT, MTN \rangle\), where \(NT\) represents non-terminal nodes and \(TN\) represents terminal nodes. The set \(A\) contains the values of the TNs, and \(R \in NT\) is the AST root node.

The function \(MNT: NT \rightarrow (NT \cup TN)^{*}\) maps non-terminal nodes to their respective child nodes, and the function \(MTN: TN \rightarrow A\) associates terminal nodes with values. Every node except the root appears exactly once among the child-node lists. Figure 4 shows the AST of a smart contract.

An AST path is a directed sequence of nodes that represents a syntactic relationship between two terminal elements in the AST of a program. It captures the structure and direction of traversal (upward or downward) between nodes and characterizes the relationships between code tokens. These paths form the building blocks for constructing code semantics in later stages.

AST Paths: An AST path of length \(L\) is a sequence

$$\begin{aligned} p = x_1 m_1 \cdot x_2 m_2 \cdots x_L m_L \cdot x_{L+1}, \end{aligned}$$

where \(x_1, x_{L+1} \in TN\) (terminal nodes) and \(x_j \in NT\) for \(j \in [2..L]\) (non-terminal nodes).

The movement direction in the AST is represented by \(m_j \in \{ \uparrow , \downarrow \}\): \(\uparrow\) (up) denotes that \(x_j\) is a child of \(x_{j+1}\) (movement toward the root), and \(\downarrow\) (down) denotes that \(x_{j+1}\) is a child of \(x_j\) (movement away from the root).

The starting and ending nodes of a path \(p_T\) are \(S(p_T)\) and \(E(p_T)\). The path context for an AST path \(p_T\) is the triplet:

$$\begin{aligned} \langle a_S, p_T, a_T \rangle \end{aligned}$$
Fig. 4
figure 4

Smart contract’s abstract syntax tree.

$$\begin{aligned} a_S = MTN(S(p_T)) \end{aligned}$$
(1)
$$\begin{aligned} a_T = MTN(E(p_T)) \end{aligned}$$
(2)

Here \(a_S \in A\) is the value of the path’s starting terminal node and \(a_T \in A\) is the value of its ending terminal node, as given by the terminal map \(MTN\).
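Under the definitions above, path contexts can be extracted from a toy AST with a short traversal. The tuple-based node representation, the `↑`/`↓` markers, and the example tree for `x = x + 1` are all illustrative assumptions:

```python
# Toy AST: each node is (label, [children]); leaf labels double as token values.
def leaves_with_paths(node, prefix=()):
    """Yield (terminal value, tuple of ancestor labels root-first)."""
    label, children = node
    if not children:
        yield label, prefix
    for child in children:
        yield from leaves_with_paths(child, prefix + (label,))

def path_contexts(root):
    """All <a_S, p, a_T> triplets: the path climbs (up) from the source
    terminal to the common ancestor, then descends (down) to the target."""
    leaves = list(leaves_with_paths(root))
    contexts = []
    for i in range(len(leaves)):
        for j in range(i + 1, len(leaves)):
            (a_s, up), (a_t, down) = leaves[i], leaves[j]
            k = 0                                  # longest common ancestor prefix
            while k < min(len(up), len(down)) and up[k] == down[k]:
                k += 1
            path = tuple([f"{n}↑" for n in reversed(up[k:])]
                         + [up[k - 1]]
                         + [f"{n}↓" for n in down[k:]])
            contexts.append((a_s, path, a_t))
    return contexts

# AST for `x = x + 1`
ast = ("Assign", [("Name", []), ("BinOp", [("Name", []), ("Num", [])])])
ctxs = path_contexts(ast)
```

Each triplet pairs two terminal values with the non-terminal route between them, which is exactly the input shape code2vec consumes.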

Code embedding

code2vec is a neural embedding technique that converts source code into continuous vector representations by learning from syntactic paths (AST paths) and their contexts. It enables learning about code structure and semantics in a way suitable for machine learning applications such as vulnerability detection.

Once the semantic analysis is complete, we use code2vec to create vector representations for training. Specifically, code2vec captures the links between code elements by extracting path and context properties from the AST. Paths represent syntactic structures such as function calls and variable assignments by connecting two nodes with directed edges; context properties give code elements their roles and location details. Code embedding uses the following Eq. (3):

$$\begin{aligned} CE = \sum _{j=1}^{n} AW_j VR_j \end{aligned}$$
(3)
$$\begin{aligned} VR_j = NNF(S_j, E_j, Node_j) \end{aligned}$$
(4)

CE represents the final code embedding vector for a given smart contract, and n is the total number of AST paths extracted from the source code.

Fig. 5
figure 5

Illustration of code embedding and synthetic code generation process.

Each \(PT_j\) is the \(j\)th path context in the AST and is transformed into a path vector \(VR_j\) using a neural network function (NNF), Eq. (4). This function takes three inputs: \(S_j\) (the vector representation of the starting token), \(E_j\) (the vector of the ending token), and \(Node_j\) (the sequence of node types along the path). The output \(VR_j\) captures the semantic and structural features of the code path.

\(AW_j\) is the attention weight assigned to the jth path. It reflects the importance or relevance of that path in the overall context of the code. Paths that contribute more to the code’s functional meaning are given higher weights during aggregation.

The final embedding vector CE is thus a weighted sum of all the individual path vectors, where each path’s influence is modulated by its attention weight. This method captures both local and global code semantics and enables the detection of subtle patterns related to vulnerabilities.
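Equations (3)–(4) amount to an attention-weighted sum of path vectors. A numpy sketch follows, where the path vectors stand in for NNF outputs and the softmax-over-dot-products attention is an assumption (code2vec-style) rather than the paper's stated formula:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def code_embedding(path_vectors, attention_vector):
    """Eq. (3): CE = sum_j AW_j * VR_j, with attention weights AW_j
    derived from each path vector's affinity to a learned attention vector."""
    VR = np.asarray(path_vectors)          # (n_paths, dim), NNF outputs VR_j
    AW = softmax(VR @ attention_vector)    # one scalar weight per path
    return AW @ VR                         # weighted sum -> final vector CE

rng = np.random.default_rng(1)
VR = rng.normal(size=(5, 8))               # 5 path vectors of dimension 8
attn = rng.normal(size=8)
CE = code_embedding(VR, attn)
```

Because the weights sum to 1, CE stays in the span of the path vectors while emphasizing the paths most relevant to the contract's behavior.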

Figure 5a gives an overview of code embedding: code2vec concatenates path and context information into vector representations that encode code element semantics. Code2vec uses this approach to create vectors for functions, variables, operators, and other code structures, and vectorizes the code segment after integrating all code element vectors. The AST was parsed using solidity-parser-antlr version 0.4.13. Vector embeddings were generated with code2vec as per Alon et al. (2019), using the implementation at https://github.com/tech-srl/code2vec, which we adapted to support Solidity syntax.

Code generation

Figure 5b shows the synthetic code generation process, in which the GAN generates Solidity code vectors from the vector dataset. The generator produces synthetic code vectors from random noise, while the discriminator differentiates actual vectors from synthetic ones. Through iterative training, the generator produces vectors that mimic real Solidity code vectors, while the discriminator improves its ability to separate them. The generator loss function is LF\(_g\) (Eq. 5) and the discriminator loss function is LF\(_d\) (Eq. 6), with generator \(g\), discriminator \(d\), real sample \(r\), random noise \(n\), and distributions \(Dis_{\text {data}}(r)\) and \(Dis(n)\).

$$\begin{aligned} \mathscr{L}\mathscr{F}_g = \mathbb {E}_{n \sim Dis(n)} [\log (1 - d(g(n)))] \end{aligned}$$
(5)
$$\begin{aligned} \mathscr{L}\mathscr{F}_d = \mathbb {E}_{r \sim Dis_{\text {data}}(r)} [\log d(r)] + \mathbb {E}_{n \sim Dis(n)} [\log (1 - d(g(n)))] \end{aligned}$$
(6)

GAN training stops when the generator and discriminator reach a Nash equilibrium. After training, the generator can produce realistic-looking Solidity code vectors, which serve as synthetic contract vectors. Using this method, we augment the vulnerable dataset with numerous synthetic contract vectors, and vector similarity detection then uses the updated vulnerable contract dataset.
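The loss terms in Eqs. (5)–(6) can be evaluated directly from discriminator outputs as batch estimates of the expectations; the numeric scores below are made up for illustration:

```python
import numpy as np

def generator_loss(d_fake):
    """Eq. (5): E[log(1 - d(g(n)))] over a batch of discriminator
    scores on generated samples; the generator minimizes this."""
    return float(np.mean(np.log(1.0 - d_fake)))

def discriminator_loss(d_real, d_fake):
    """Eq. (6): E[log d(r)] + E[log(1 - d(g(n)))]; the discriminator
    maximizes this, scoring real samples high and fake ones low."""
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# a confident discriminator: real scores near 1, fake scores near 0
d_real = np.array([0.90, 0.95, 0.99])
d_fake = np.array([0.05, 0.10, 0.02])
lf_d = discriminator_loss(d_real, d_fake)   # near 0, its maximum
lf_g = generator_loss(d_fake)               # the generator drives this toward -inf
```

At the Nash equilibrium the discriminator outputs 0.5 everywhere, giving LF\(_d\) = 2 log(0.5), which is the usual stopping signature for this minimax game.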

Dual similarity detection

Discriminator GAN analysis

During GAN training, only vectors that have integer overflow vulnerabilities are used. This means the trained discriminator can tell the difference between actual and fake contracts and identify those with integer overflow vulnerabilities.

Vector similarity analysis

Vector similarity analysis is a fundamental criterion for automated detection. Contract vectors that include the structural and semantic information of the source code are produced using code2vec.

$$\begin{aligned} \cos (a, b) = \frac{1}{n} \sum _{i=1}^{n} \frac{a \cdot b_i}{\Vert a\Vert \Vert b_i\Vert } \end{aligned}$$
(7)
$$\begin{aligned} CC = \frac{1}{r} \sum _{k=1}^{r} \frac{\sum _{l=1}^{c} (a_l - \bar{a})(b_{kl} - \bar{b_k})}{\sqrt{\sum _{l=1}^{c} (a_l - \bar{a})^2 \sum _{l=1}^{c} (b_{kl} - \bar{b_k})^2}} \end{aligned}$$
(8)

Here \(a\) is the target contract vector and \(b\) is the vulnerable contract vector set; \(b_k\) is the \(k\)th vector in \(b\), while \(r\) and \(c\) are the size of \(b\) and the dimensionality of \(a\). The values \(\bar{a}\) and \(\bar{b}_k\) are vector means, \(\cos (a, b)\) is the average cosine similarity (Eq. 7), and \(CC\) is the average correlation coefficient (Eq. 8). The target contract is likely vulnerable if both the Pearson correlation coefficient and the cosine similarity are high. To make detection more precise, we take a weighted average of the cosine similarity and the correlation coefficient and apply a threshold to decide whether the target contract is vulnerable to integer overflow (Fig. 6).
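The two averaged similarity measures of Eqs. (7)–(8) have a direct numpy implementation; the example vectors are illustrative only:

```python
import numpy as np

def mean_cosine(a, B):
    """Eq. (7): average cosine similarity between target vector a
    and each row b_k of the vulnerable-contract matrix B."""
    num = B @ a
    den = np.linalg.norm(a) * np.linalg.norm(B, axis=1)
    return float(np.mean(num / den))

def mean_pearson(a, B):
    """Eq. (8): average Pearson correlation coefficient between a
    and each row of B."""
    return float(np.mean([np.corrcoef(a, b)[0, 1] for b in B]))

a = np.array([1.0, 2.0, 3.0, 4.0])
B = np.array([[1.0, 2.0, 3.0, 4.0],     # identical   -> cos = corr = 1
              [2.0, 4.0, 6.0, 8.0]])    # scaled copy -> cos = corr = 1
```

Both measures are scale-invariant, which is why a rescaled copy of the target scores a perfect 1 under each; the Pearson term additionally centers the vectors, making it insensitive to constant offsets as well.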

Fig. 6
figure 6

Similarity detection method.

Experimental results and analysis

We have described the procedure for enhancing the vulnerable contract dataset with the GAN model and the methodology for converting SC source code into vectors that capture structural and semantic attributes, and we have shown how SC integer overflow vulnerabilities are evaluated using the GAN discriminator in conjunction with vector similarity. This section walks through the proposed method for finding SC integer overflow vulnerabilities step by step. Before the experiments, we present the experimental setup and dataset, establish the appropriate vector similarity cutoff coefficient, and compare our results with other tools.

Experimental design

The experiments used a Windows 10 PC with an Intel Core CPU (2.30 GHz), 16 GB RAM, and a GeForce RTX 2060. Code2vec (2020 release) derives feature vectors from SC source code, whereas solidity-parser-antlr (version 0.4.11) produces abstract syntax trees (ASTs). We ran two experiments to see whether the proposed technique can find integer overflow issues in SC code. The method averages the cosine and Pearson correlation coefficients to determine vector similarity; we tested numerous weights and thresholds to discover the best parameters for finding SC source code vulnerabilities, assessing recall and accuracy. We built training and testing subsets from our core dataset (enhanced-smart-contracts-dataset.CSV) using open-source SCs with security classifications, which enabled us to assess the effectiveness of vulnerability detection.

Test data and evaluation criteria

This section details the steps required to test the vulnerability detection approach, including collecting data, selecting assessment metrics, and establishing an experimental comparison environment. Two hundred Etherscan contracts were used to test our integer overflow vulnerability detection approach on SC source code. Etherscan connects Ethereum nodes to analytics and block explorers; these SCs can be checked for security properties, Solidity source code, and contract address. Table 2 summarizes the source dataset. Fifty SCs with integer overflow flaws were incorporated into the training set for the GAN models and vector similarity investigations. The testing set contains 150 SCs, of which 80 are secure and 70 have integer overflow issues; it is used to compare our detection method with others.

Table 2 Source dataset summary.
Table 3 Enhanced dataset summary.

Table 3 offers an enhanced dataset summary. Using the trained GAN model and the 50 genuine contracts in the training set, we produced 1,950 synthetic contracts for the vector similarity identification dataset.

Criteria for assessment: after generating the dataset, we detected SC vulnerabilities using sFuzz31 and Oyente25. The following considerations directed the selection of these two tools:

  • The tools’ source code is publicly available.

  • Many vulnerability detection programs use these tools as performance baselines.

  • The tools detect the vulnerability we target.

These two tools helped us identify areas of weakness in the test set. Table 4 displays the detection results from each tool’s performance testing using confusion matrices; we then compared the detection data to show the benefits and effectiveness of the proposed vulnerability detection technique. The confusion matrix records true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), and we use it to assess how well the recognition model performs. Several efficiency metrics follow from it, with Eqs. (9)–(13) specifying Accuracy (ACC), Precision, Recall, F1-Score, and Overfitting Rate (OR), respectively.

Table 4 A summary of detection results of the enhanced data.

We thus test the detection model using Accuracy (ACC):

$$\begin{aligned} {\textbf {ACC}}= & \frac{TP + TN}{FP + FN + TP + TN} \end{aligned}$$
(9)
$$\begin{aligned} {\textbf {Precision}}= & \frac{TP}{TP + FP} \end{aligned}$$
(10)
$$\begin{aligned} {\textbf {Recall}}= & \frac{TP}{TP + FN} \end{aligned}$$
(11)
$$\begin{aligned} {\textbf {F1-Score }}= & 2 \times \frac{\text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$
(12)
$$\begin{aligned} {\textbf {Overfitting Rate (OR)}}= & \frac{\text {Training Accuracy} - \text {Test Accuracy}}{\text {Training Accuracy}} \end{aligned}$$
(13)
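Equations (9)–(13) can be computed directly from the confusion matrix counts; the counts below are made up for illustration and are not the paper's measured values:

```python
def metrics(tp, tn, fp, fn, train_acc=None):
    """Confusion-matrix metrics per Eqs. (9)-(13)."""
    acc = (tp + tn) / (tp + tn + fp + fn)                 # Eq. (9)
    precision = tp / (tp + fp)                            # Eq. (10)
    recall = tp / (tp + fn)                               # Eq. (11)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (12)
    result = {"ACC": acc, "Precision": precision, "Recall": recall, "F1": f1}
    if train_acc is not None:
        result["OR"] = (train_acc - acc) / train_acc      # Eq. (13)
    return result

# illustrative counts for a 150-contract test set (70 vulnerable, 80 secure)
m = metrics(tp=66, tn=72, fp=8, fn=4, train_acc=0.95)
```

The overfitting rate (Eq. 13) is the relative drop from training accuracy to test accuracy, so values near zero indicate the model generalizes rather than memorizes.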

To provide a clear comparative benchmark, we evaluated our approach against sFuzz and Oyente using the same test set of 150 contracts. Our GAN-based method achieved an overall detection accuracy improvement of 12.4% over sFuzz and 18.1% over Oyente. Furthermore, it demonstrated higher F1-score and precision values, confirming its superior balance of sensitivity and specificity in detecting integer overflow vulnerabilities.

To validate the specific contribution of GAN-generated synthetic data to model performance, we conducted an ablation study by training the system without synthetic contracts. In this setup, the F1-score dropped from 0.91 to 0.84, and accuracy declined by 9.7%, confirming the critical role of GAN-based data augmentation in addressing the data scarcity challenge. These results empirically support the claim that the proposed method benefits from the synthetic vector generation process.

Vector similarity parameter test

This section tests the composite vector similarity detection settings used to identify SCs, applying the best parameters for detecting integer overflow vulnerabilities to improve the proposed strategy. We test how modifying the vector similarity threshold and the cosine similarity weight affects detection performance. Equation (14), defined before the tests, determines the vector similarity result:

$$\begin{aligned} S = \cos (x,y) \cdot W + r \cdot (1 - W) \end{aligned}$$
(14)

Here S denotes the vector similarity result, cos(x, y) is the cosine similarity between the vectors x and y, r is the second similarity component (weighted by 1 − W), and W is the cosine similarity weight, a real value between 0 and 1. If S exceeds the threshold T, where T is also a real value between 0 and 1, the target contract is flagged as containing an integer overflow vulnerability; if S is below T, the target contract is judged free of integer overflow issues. We determine the cosine similarity weight W experimentally: with the threshold fixed at 0.85, we vary the weight and measure its impact on model accuracy, as demonstrated in Fig. 7. The experiments show that the model achieves its highest accuracy when W = 0.74.
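A minimal sketch of this decision rule, assuming the two contracts have already been embedded as vectors and that the second similarity component r is supplied by an earlier stage (the vectors and r values below are placeholders):

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def composite_similarity(x, y, r, w):
    """Eq. (14): convex combination of cosine similarity and the component r."""
    return cosine(x, y) * w + r * (1 - w)

def is_vulnerable(x, y, r, w=0.74, t=0.9):
    """Flag the target contract when the composite similarity S exceeds threshold T."""
    return composite_similarity(x, y, r, w) > t

# Placeholder embeddings: identical vectors give cosine similarity 1.0.
print(is_vulnerable([1, 2, 3], [1, 2, 3], r=0.95))  # True  (S = 0.987 > 0.9)
print(is_vulnerable([1, 2, 3], [1, 2, 3], r=0.20))  # False (S = 0.792 < 0.9)
```

Note that the convex combination keeps S in [0, 1] when both components lie in that range, which is consistent with threshold settings such as T = 0.9.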

To offer a more reliable assessment beyond accuracy, we computed precision, recall, and F1-score for each threshold and weight configuration. The model achieved an F1-score of 0.91 when W = 0.74 and T = 0.9, indicating strong balance between sensitivity and specificity. Precision was 0.89 and recall was 0.94 in this optimal configuration.

Semantics and code structure: cosine similarity captures both well. Integer overflow vulnerabilities tend to share similar semantics and code structure, so computing the cosine similarity between code embeddings helps locate them.

Resilience: cosine similarity is robust to outliers and noise. In practical applications, code may be modified superficially through comments, whitespace, and similar changes; cosine similarity masks these differences to some extent, making the model more robust.

To evaluate statistical significance, we performed repeated trials (n = 10) for each parameter setting. A paired t-test on detection accuracy across weight values showed statistically significant differences (p < 0.05), confirming that the chosen configuration (W = 0.74) improves detection performance in a meaningful way.
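For reference, the paired t statistic underlying such a test can be computed as below; the per-trial accuracy values are fabricated placeholders, and with n = 10 trials the statistic is compared against the two-tailed critical value t(0.025, df = 9) ≈ 2.262:

```python
import math

def paired_t_statistic(a, b):
    """Paired t-test statistic for two equal-length samples of per-trial accuracies."""
    assert len(a) == len(b)
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance of differences
    return mean / math.sqrt(var / n)

# Placeholder accuracies for two weight settings over n = 10 repeated trials.
acc_w074 = [0.91, 0.92, 0.90, 0.93, 0.91, 0.92, 0.90, 0.91, 0.93, 0.92]
acc_w050 = [0.88, 0.90, 0.87, 0.89, 0.88, 0.90, 0.86, 0.88, 0.91, 0.89]
t = paired_t_statistic(acc_w074, acc_w050)
print(abs(t) > 2.262)  # True -> difference significant at p < 0.05 (df = 9)
```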

Fig. 7 Cosine similarity weight experiments.

Fig. 8 Vector similarity threshold experiments.

After determining W, we identify the threshold T through further tests. The model's sensitivity to vector similarity depends on T: raising the threshold demands a higher degree of vector similarity before a contract is flagged. A low threshold tends to increase false positives, while a high one increases false negatives; there is no universally ideal threshold value. Beyond affecting model complexity, a threshold set too high or too low degrades model performance.

We also examined the potential risk of overfitting in the GAN-generated synthetic contracts. Since these contracts are derived from a small training set, there is a chance that the generator could produce overly similar instances, reducing generalizability. To mitigate this, we injected noise variability into the generator’s latent space and applied dropout regularization in the discriminator during training. In future work, we plan to adopt adversarial validation techniques and external datasets to further test the robustness of the model against synthetic overfitting bias.
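These two mitigations can be sketched as follows (a numpy illustration of the general techniques, not the paper's training code; the noise scale and dropout rate are assumed values):

```python
import numpy as np

rng = np.random.default_rng(42)

def perturb_latent(z, noise_scale=0.1):
    """Inject Gaussian noise into latent vectors so the generator sees more
    varied inputs, discouraging near-duplicate synthetic contracts."""
    return z + rng.normal(0.0, noise_scale, size=z.shape)

def dropout(activations, rate=0.5):
    """Inverted dropout on discriminator activations during training:
    randomly zero units and rescale the survivors."""
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

z = rng.normal(size=(4, 16))   # a batch of latent vectors
z_noisy = perturb_latent(z)    # varied generator inputs
h = rng.normal(size=(4, 32))   # stand-in discriminator hidden activations
h_drop = dropout(h)            # regularized activations
```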

Figure 8 displays the concluding experimental findings. Calibrating the model threshold to 0.9 yielded strong detection accuracy and generalizability, effectively balancing the FP and FN rates. Our methodology converts SCs into compact vector representations via a code embedding method while preserving the necessary structural and semantic information, which improves the efficiency and effectiveness of vulnerability identification.

Conclusion

This research has demonstrated and empirically validated a novel method for locating integer overflow vulnerabilities in SCs, accomplished through the combination of code embedding and GANs. The proposed approach addresses the major problem of data scarcity in SC security research by using GANs to generate synthetic contract vector data that preserves the structural and semantic properties of real-world contracts.

This indicates that the technique can help address the challenge of data scarcity in smart contract vulnerability detection. By combining discriminator feedback with vector similarity analysis, the proposed approach can uncover vulnerabilities even with limited training data. While the results demonstrate promising accuracy, further validation using additional tools such as Mythril and Slither, as well as metrics like precision, recall, and F1-score, will be necessary to comprehensively assess and benchmark the method’s performance.

Compared to baseline tools, our method improves detection accuracy by 12.4% over sFuzz and 18.1% over Oyente. These gains are accompanied by stronger F1-score and precision values, indicating more balanced performance. To validate the specific contribution of GAN-generated synthetic data, we performed an ablation study comparing detection results with and without synthetic vectors. The inclusion of synthetic data improved the F1-score from 0.84 to 0.91, demonstrating the effectiveness of GANs in mitigating data scarcity.

Before SCs are deployed, this method offers a valuable alternative for improving SC security and lowering the risk of significant financial losses.