Abstract
Smart Contract (SC) vulnerabilities are programming errors or design flaws that can lead to financial loss or functional failure, making accurate detection essential. Although Machine Learning (ML) is widely applied to SC vulnerability detection, existing datasets are often small, imbalanced, inconsistently labeled, or nonstandardized, and frequently rely on limited feature representations that do not account for different contract lifecycle stages, which restricts generalization and degrades benchmark reliability. This study introduces DIVE, a multi-label dataset that addresses these structural and feature-level limitations. DIVE includes 22,330 real-world SCs that were deployed between 2016 and 2024, span major Solidity compiler versions, and are annotated for eight vulnerability types aligned with the Decentralized Application Security Project (DASP) Top 10 taxonomy. It provides 221 pre-deployment and 176 post-deployment features and employs a standardized multi-tool labeling pipeline based on Power-based voting and post-hoc filtering, which corrected 14.3% false positives in Denial of Service (DoS) and 24.9% in Time Manipulation. Unlike prior datasets, DIVE offers two lifecycle-specific feature sets and an open-source framework that enables reproducible benchmarking and periodic reconstruction aligned with evolving vulnerability patterns.
Data availability
The dataset generated using the DIVE framework is publicly available on Zenodo (https://doi.org/10.5281/zenodo.18519253; ref. 25) under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. The repository includes the dataset, the associated metadata registry, and data profiling information to support understanding and reuse.
Code availability
The open-source DIVE framework is available on Zenodo (https://doi.org/10.5281/zenodo.18779606; ref. 36) under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. The repository includes documentation and usage instructions.
References
Bocek, T. & Stiller, B. Smart contracts–blockchains in the wings. In Linnhoff-Popien, C., Schneider, R. & Zaddach, M. (eds.) Digital Marketplaces Unleashed, 169–184, https://doi.org/10.1007/978-3-662-49275-8_19 (Springer, Berlin, 2018).
Zheng, Z. et al. An overview on smart contracts: Challenges, advances and platforms. Future Generation Computer Systems 105, 475–491, https://doi.org/10.1016/j.future.2019.12.019 (2020).
Kushwaha, S. S., Joshi, S., Singh, D., Kaur, M. & Lee, H.-N. Systematic review of security vulnerabilities in ethereum blockchain smart contract. IEEE Access 10, 6605–6621, https://doi.org/10.1109/ACCESS.2021.3140091 (2022).
Alsunaidi, S. J., Aljamaan, H. & Hammoudeh, M. MultiTagging: A vulnerable smart contract labeling and evaluation framework. Electronics 13, 4616, https://doi.org/10.3390/electronics13234616 (2024).
Ivanov, N. et al. Security threat mitigation for smart contracts: A comprehensive survey. ACM Computing Surveys 55, 1–37, https://doi.org/10.1145/3593293 (2023).
Jiang, F. et al. Enhancing smart-contract security through machine learning: A survey of approaches and techniques. Electronics 12, 2046, https://doi.org/10.3390/electronics12092046 (2023).
Alsunaidi, S. J. & Alhaidari, F. A. A survey of consensus algorithms for blockchain technology. In 2019 International Conference on Computer and Information Sciences (ICCIS), 1–6, https://doi.org/10.1109/ICCISci.2019.8716424 (IEEE, 2019).
Liao, J.-W., Tsai, T.-T., He, C.-K. & Tien, C.-W. Soliaudit: Smart contract vulnerability assessment based on machine learning and fuzz testing. In 2019 Sixth International Conference on Internet of Things: Systems, Management and Security (IOTSMS), 458–465, https://doi.org/10.1109/IOTSMS48152.2019.8939256 (IEEE, 2019).
Tann, W. J.-W., Han, X. J., Gupta, S. S. & Ong, Y.-S. Towards safer smart contracts: A sequence learning approach to detecting security threats. Preprint at https://doi.org/10.48550/arXiv.1811.06632 (2018).
Qian, P., Liu, Z., Yin, Y. & He, Q. Cross-modality mutual learning for enhancing smart contract vulnerability detection on bytecode. In Proceedings of the ACM Web Conference 2023, 2220–2229, https://doi.org/10.1145/3543507.3583367 (2023).
Zhang, L. et al. SPCBIG-EC: a robust serial hybrid model for smart contract vulnerability detection. Sensors 22, 4621, https://doi.org/10.3390/s22124621 (2022).
Rossini, M. Slither audited smart contracts dataset. Hugging Face https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts (2022).
Yashavant, C. S., Kumar, S. & Karkare, A. Scrawld: A dataset of real world ethereum smart contracts labelled with vulnerabilities. Preprint at https://doi.org/10.48550/arXiv.2202.11409 (2022).
Liu, Z. et al. Rethinking smart contract fuzzing: Fuzzing with invocation ordering and important branch revisiting. IEEE Transactions on Information Forensics and Security 18, 1237–1251, https://doi.org/10.1109/TIFS.2023.3237370 (2023).
Storhaug, A. Vulnerable verified smart contracts (v2). figshare https://doi.org/10.6084/m9.figshare.21990287.v2 (2023).
Luo, F. et al. Scvhunter: Smart contract vulnerability detection based on heterogeneous graph attention network. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1–13, https://doi.org/10.1145/3597503.3639213 (2024).
Durieux, T., Ferreira, J. F., Abreu, R. & Cruz, P. Empirical review of automated analysis tools on 47,587 ethereum smart contracts. In Proceedings of the ACM/IEEE 42nd International conference on software engineering, 530–541, https://doi.org/10.1145/3377811.3380364 (2020).
Wang, Y., Zhao, X., He, L., Zhen, Z. & Chen, H. ContractGNN: Ethereum smart contract vulnerability detection based on vulnerability sub-graphs and graph neural networks. IEEE Transactions on Network Science and Engineering 11, 6382–6395, https://doi.org/10.1109/TNSE.2024.3470788 (2024).
Zheng, Z. et al. DAppSCAN: Building large-scale datasets for smart contract weaknesses in dapp projects. IEEE Transactions on Software Engineering 50, 1360–1373, https://doi.org/10.1109/TSE.2024.3383422 (2024).
Malik, C. eth-reputable-illicit-sc-code. Hugging Face https://huggingface.co/datasets/malikcyrus/eth-reputable-illicit-sc-code (2024).
Ibba, G. et al. A curated solidity smart contracts repository of metrics and vulnerability. In Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering, 32–41, https://doi.org/10.1145/3663533.3664039 (2024).
Forta. token-impersonation-dataset. Hugging Face https://huggingface.co/datasets/forta/token-impersonation-dataset (2023).
Ajienka, N. Supervised machine learning for smart contract vulnerability prediction (v2). figshare https://doi.org/10.6084/m9.figshare.13417316.v2 (2020).
Eshghie, M., Artho, C. & Gurov, D. Dynamic vulnerability detection on smart contracts using machine learning. In Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering, 305–312, https://doi.org/10.1145/3463274.3463348 (2021).
Alsunaidi, S., Aljamaan, H. & Hammoudeh, M. DIVE: A multi-label smart contract vulnerability dataset. Zenodo https://doi.org/10.5281/zenodo.18519253 (2026).
Ghaleb, A. & Pattabiraman, K. How effective are smart contract analysis tools? evaluating smart contract static analysis tools using bug injection. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 415–427, https://doi.org/10.1145/3395363.3397385 (2020).
Zhou, H., Milani Fard, A. & Makanju, A. The state of ethereum smart contracts security: Vulnerabilities, countermeasures, and tool support. Journal of Cybersecurity and Privacy 2, 358–378, https://doi.org/10.3390/jcp2020019 (2022).
Lakadawala, H., Dzigbede, K. & Chen, Y. Detecting reentrancy vulnerability in smart contracts using graph convolution networks. In 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC), 188–193, https://doi.org/10.1109/CCNC51664.2024.10454763 (IEEE, Las Vegas, NV, USA, 2024).
Mezina, A. & Ometov, A. Detecting smart contract vulnerabilities with combined binary and multiclass classification. Cryptography 7, 34, https://doi.org/10.3390/cryptography7030034 (2023).
Jain, V. K. & Tripathi, M. An integrated deep learning model for ethereum smart contract vulnerability detection. International Journal of Information Security 23, 557–575, https://doi.org/10.1007/s10207-023-00752-5 (2024).
Sun, X. et al. ASSBert: Active and semi-supervised bert for smart contract vulnerability detection. Journal of Information Security and Applications 73, 103423, https://doi.org/10.1016/j.jisa.2023.103423 (2023).
HajiHosseinKhani, S., Lashkari, A. H. & Oskui, A. M. Unveiling smart contracts vulnerabilities: Toward profiling smart contracts vulnerabilities using enhanced genetic algorithm and generating benchmark dataset. Blockchain: Research and Applications 6, 100253, https://doi.org/10.1016/j.bcra.2024.100253 (2024).
Lê Hùng, B. et al. Contextual language model and transfer learning for reentrancy vulnerability detection in smart contracts. In Proceedings of the 12th International Symposium on Information and Communication Technology, 739–745, https://doi.org/10.1145/3628797.3628945 (ACM, New York, NY, USA, 2023).
Deng, W. et al. Smart contract vulnerability detection based on deep learning and multimodal decision fusion. Sensors 23, 7246, https://doi.org/10.3390/s23167246 (2023).
Yang, Z., Zhu, W. & Yu, M. Improvement and optimization of vulnerability detection methods for ethernet smart contracts. IEEE Access 11, 78207–78223, https://doi.org/10.1109/ACCESS.2023.3298672 (2023).
DIVE Framework. DIVE (version v2.0.0). Zenodo https://doi.org/10.5281/zenodo.18779606 (2026).
Ortner, M. & Eskandari, S. Smart contract sanctuary–Ethereum. GitHub https://github.com/tintinweb/smart-contract-sanctuary-ethereum (2023).
Alsunaidi, S. J., Aljamaan, H. & Hammoudeh, M. Leveraging machine learning models to improve smart contract security: A survey of vulnerabilities and detection methods. ACM Computing Surveys 58, 1–37, https://doi.org/10.1145/3772367 (2025).
Yu, R., Shu, J., Yan, D. & Jia, X. Redetect: Reentrancy vulnerability detection in smart contracts with high accuracy. In 2021 17th International Conference on Mobility, Sensing and Networking (MSN), 412–419, https://doi.org/10.1109/MSN53354.2021.00069 (IEEE, 2021).
Zheng, Z. et al. Turn the rudder: A beacon of reentrancy detection for smart contracts on ethereum. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 295–306, https://doi.org/10.1109/ICSE48619.2023.00036 (IEEE, 2023).
Label post-hoc validator. GitHub https://github.com/DIVE4Data/Label-Post-hoc-Validator (2026).
Wood, G. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper https://mholende.win.tue.nl/seminar/references/ethereum_yellowpaper.pdf (2014).
Ethereum Foundation. Ethereum virtual machine (EVM) opcodes documentation https://ethereum.org/en/developers/docs/evm/opcodes/ (2025).
Rawlekar, S., Bhatnagar, S., Srinivasulu, V. P. & Ahuja, N. Improving multi-label recognition using class co-occurrence probabilities. In International Conference on Pattern Recognition, 424–439, https://doi.org/10.1007/978-3-031-78192-6_28 (Springer, 2024).
Grabot, B. Rule mining in maintenance: Analysing large knowledge bases. Computers & Industrial Engineering 139, 105501, https://doi.org/10.1016/j.cie.2018.11.011 (2020).
ISO/IEC. ISO/IEC 25012: Software engineering—software product quality requirements and evaluation (SQuaRE)—data quality model. International Standard https://www.iso.org/obp/ui/#iso:std:iso-iec:25012:ed-1:v1:en (2008).
Croft, R., Babar, M. A. & Kholoosi, M. M. Data quality for software vulnerability datasets. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 121–133, https://doi.org/10.1109/ICSE48619.2023.00022 (IEEE, 2023).
Bylica, P. How to find $10m just by reading the blockchain (2017).
Hu, T. A benchmark dataset of solidity smart contracts. Zenodo https://doi.org/10.5281/zenodo.7744053 (2023).
Acknowledgements
The authors would like to acknowledge the support of the King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia, in the development of this work.
Author information
Contributions
Conceptualization: S.J.A., H.A., M.H.; Data curation: S.J.A.; Formal analysis: S.J.A.; Investigation: S.J.A.; Methodology: S.J.A., H.A.; Resources: S.J.A.; Software: S.J.A.; Supervision: H.A., M.H.; Validation: S.J.A., H.A., M.H.; Visualization: S.J.A.; Writing – original draft: S.J.A.; Writing – review & editing: S.J.A., H.A., M.H.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access: This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Alsunaidi, S.J., Aljamaan, H. & Hammoudeh, M. DIVE: A Multi-Label Smart Contract Vulnerability Dataset. Sci Data (2026). https://doi.org/10.1038/s41597-026-07025-5
DOI: https://doi.org/10.1038/s41597-026-07025-5


