Scientific Data
DIVE: A Multi-Label Smart Contract Vulnerability Dataset
  • Data Descriptor
  • Open access
  • Published: 12 March 2026

  • Shikah J. Alsunaidi (ORCID: orcid.org/0000-0002-8262-5025), affiliation 1
  • Hamoud Aljamaan (ORCID: orcid.org/0000-0002-2146-9348), affiliations 1, 2
  • Mohammad Hammoudeh, affiliations 1, 3

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computer science
  • Scientific data

Abstract

Smart Contract (SC) vulnerabilities are programming errors or design flaws that can lead to financial loss or functional failure, making accurate detection essential. Although Machine Learning (ML) is widely applied to SC vulnerability detection, existing datasets are often small, imbalanced, inconsistently labeled, or nonstandardized, and frequently rely on limited feature representations that do not account for different contract lifecycle stages. These shortcomings restrict generalization and degrade benchmark reliability. This study introduces DIVE, a multi-label dataset that addresses these structural and feature-level limitations. DIVE includes 22,330 real-world SCs deployed between 2016 and 2024 and spanning major Solidity compiler versions, annotated for eight vulnerability types aligned with the Decentralized Application Security Project (DASP) Top 10 taxonomy. It provides 221 pre-deployment and 176 post-deployment features and employs a standardized multi-tool labeling pipeline based on Power-based voting and post-hoc filtering, which corrected 14.3% false positives in DoS and 24.9% in Time Manipulation. Unlike prior datasets, DIVE offers two lifecycle-specific feature sets and an open-source framework enabling reproducible benchmarking and periodic reconstruction aligned with evolving vulnerability patterns.
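To make the idea of multi-tool labeling concrete, the following is a minimal sketch of weighted voting across analysis tools, producing one binary label per vulnerability type. It is an illustration only and is not claimed to reproduce DIVE's Power-based voting, whose exact definition is not given in this abstract. The tool names (`tool_a`, `tool_b`, `tool_c`), the per-label weights, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict

def weighted_vote(tool_flags: Dict[str, bool],
                  tool_weights: Dict[str, float],
                  threshold: float = 0.5) -> int:
    """Return 1 if the weighted share of tools flagging the contract
    reaches the threshold, else 0. Weights are assumed, not from the paper."""
    total = sum(tool_weights[tool] for tool in tool_flags)
    if total == 0:
        return 0
    positive = sum(tool_weights[tool]
                   for tool, flagged in tool_flags.items() if flagged)
    return int(positive / total >= threshold)

def label_contract(reports: Dict[str, Dict[str, bool]],
                   weights: Dict[str, Dict[str, float]],
                   threshold: float = 0.5) -> Dict[str, int]:
    """Build a multi-label vector: one independent vote per vulnerability type."""
    return {label: weighted_vote(reports[label], weights[label], threshold)
            for label in reports}

# Hypothetical tool outputs for one contract (True = tool reports the issue).
reports = {
    "DoS": {"tool_a": True, "tool_b": False, "tool_c": False},
    "Time Manipulation": {"tool_a": True, "tool_b": True, "tool_c": False},
}
# Hypothetical per-label tool weights, e.g. derived from past tool accuracy.
weights = {
    "DoS": {"tool_a": 0.9, "tool_b": 0.4, "tool_c": 0.3},
    "Time Manipulation": {"tool_a": 0.2, "tool_b": 0.5, "tool_c": 0.8},
}

labels = label_contract(reports, weights)
print(labels)
```

In this toy input a single highly weighted tool is enough to set the DoS label, while two lightly weighted tools are not enough for Time Manipulation, which is the kind of behavior that distinguishes weighted schemes from a plain majority vote.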

Data availability

The dataset generated using the DIVE framework is publicly available on Zenodo (ref. 25) under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. The repository includes the dataset, the associated metadata registry, and data profiling information to support understanding and reuse.
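As a rough illustration of the kind of profiling a multi-label dataset invites, the sketch below computes per-label prevalence and label cardinality (mean number of positive labels per contract) from a CSV. The column names, the `contract_id` field, and the four sample rows are made up for the example; the actual file layout in the Zenodo repository may differ and should be checked against its metadata registry.

```python
import csv
import io
from collections import Counter
from typing import Dict, Tuple

# Toy data with hypothetical column names; not taken from DIVE.
SAMPLE_CSV = """contract_id,DoS,Time Manipulation
0xaaa,1,0
0xbbb,0,0
0xccc,1,1
0xddd,0,0
"""

def profile_labels(csv_text: str,
                   id_column: str = "contract_id") -> Tuple[Dict[str, float], float]:
    """Return (prevalence per label, label cardinality) for 0/1 label columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    label_cols = [c for c in (reader.fieldnames or []) if c != id_column]
    positives: Counter = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        for col in label_cols:
            if row[col] == "1":
                positives[col] += 1
    if n_rows == 0:
        return {}, 0.0
    prevalence = {col: positives[col] / n_rows for col in label_cols}
    cardinality = sum(positives.values()) / n_rows
    return prevalence, cardinality

prevalence, cardinality = profile_labels(SAMPLE_CSV)
print(prevalence, cardinality)
```

Prevalence values far from 0.5 indicate class imbalance for that label, which is one of the dataset issues the abstract highlights.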

Code availability

The open-source DIVE framework is available on Zenodo (ref. 36) under the Creative Commons Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) license. The repository includes documentation and usage instructions.

References

  1. Bocek, T. & Stiller, B. Smart contracts–blockchains in the wings. In Linnhoff-Popien, C., Schneider, R. & Zaddach, M. (eds.) Digital Marketplaces Unleashed, 169–184, https://doi.org/10.1007/978-3-662-49275-8_19 (Springer, Berlin, 2018).

  2. Zheng, Z. et al. An overview on smart contracts: Challenges, advances and platforms. Future Generation Computer Systems 105, 475–491, https://doi.org/10.1016/j.future.2019.12.019 (2020).

  3. Kushwaha, S. S., Joshi, S., Singh, D., Kaur, M. & Lee, H.-N. Systematic review of security vulnerabilities in ethereum blockchain smart contract. IEEE Access 10, 6605–6621, https://doi.org/10.1109/ACCESS.2021.3140091 (2022).

  4. Alsunaidi, S. J., Aljamaan, H. & Hammoudeh, M. MultiTagging: A vulnerable smart contract labeling and evaluation framework. Electronics 13, 4616, https://doi.org/10.3390/electronics13234616 (2024).

  5. Ivanov, N. et al. Security threat mitigation for smart contracts: A comprehensive survey. ACM Computing Surveys 55, 1–37, https://doi.org/10.1145/3593293 (2023).

  6. Jiang, F. et al. Enhancing smart-contract security through machine learning: A survey of approaches and techniques. Electronics 12, 2046, https://doi.org/10.3390/electronics12092046 (2023).

  7. Alsunaidi, S. J. & Alhaidari, F. A. A survey of consensus algorithms for blockchain technology. In 2019 International Conference on Computer and Information Sciences (ICCIS), 1–6, https://doi.org/10.1109/ICCISci.2019.8716424 (IEEE, 2019).

  8. Liao, J.-W., Tsai, T.-T., He, C.-K. & Tien, C.-W. Soliaudit: Smart contract vulnerability assessment based on machine learning and fuzz testing. In 2019 Sixth International Conference on Internet of Things: Systems, Management and Security (IOTSMS), 458–465, https://doi.org/10.1109/IOTSMS48152.2019.8939256 (IEEE, 2019).

  9. Tann, W. J.-W., Han, X. J., Gupta, S. S. & Ong, Y.-S. Towards safer smart contracts: A sequence learning approach to detecting security threats. Preprint at https://doi.org/10.48550/arXiv.1811.06632 (2018).

  10. Qian, P., Liu, Z., Yin, Y. & He, Q. Cross-modality mutual learning for enhancing smart contract vulnerability detection on bytecode. In Proceedings of the ACM Web Conference 2023, 2220–2229, https://doi.org/10.1145/3543507.3583367 (2023).

  11. Zhang, L. et al. SPCBIG-EC: a robust serial hybrid model for smart contract vulnerability detection. Sensors 22, 4621, https://doi.org/10.3390/s22124621 (2022).

  12. Rossini, M. Slither audited smart contracts dataset. Hugging Face https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts (2022).

  13. Yashavant, C. S., Kumar, S. & Karkare, A. Scrawld: A dataset of real world ethereum smart contracts labelled with vulnerabilities. Preprint at https://doi.org/10.48550/arXiv.2202.11409 (2022).

  14. Liu, Z. et al. Rethinking smart contract fuzzing: Fuzzing with invocation ordering and important branch revisiting. IEEE Transactions on Information Forensics and Security 18, 1237–1251, https://doi.org/10.1109/TIFS.2023.3237370 (2023).

  15. Storhaug, A. Vulnerable verified smart contracts (v2). figshare https://doi.org/10.6084/m9.figshare.21990287.v2 (2023).

  16. Luo, F. et al. Scvhunter: Smart contract vulnerability detection based on heterogeneous graph attention network. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1–13, https://doi.org/10.1145/3597503.3639213 (2024).

  17. Durieux, T., Ferreira, J. F., Abreu, R. & Cruz, P. Empirical review of automated analysis tools on 47,587 ethereum smart contracts. In Proceedings of the ACM/IEEE 42nd International conference on software engineering, 530–541, https://doi.org/10.1145/3377811.3380364 (2020).

  18. Wang, Y., Zhao, X., He, L., Zhen, Z. & Chen, H. ContractGNN: Ethereum smart contract vulnerability detection based on vulnerability sub-graphs and graph neural networks. IEEE Transactions on Network Science and Engineering 11, 6382–6395, https://doi.org/10.1109/TNSE.2024.3470788 (2024).

  19. Zheng, Z. et al. DAppSCAN: Building large-scale datasets for smart contract weaknesses in dapp projects. IEEE Transactions on Software Engineering 50, 1360–1373, https://doi.org/10.1109/TSE.2024.3383422 (2024).

  20. Malik, C. eth-reputable-illicit-sc-code. Hugging Face https://huggingface.co/datasets/malikcyrus/eth-reputable-illicit-sc-code (2024).

  21. Ibba, G. et al. A curated solidity smart contracts repository of metrics and vulnerability. In Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering, 32–41, https://doi.org/10.1145/3663533.3664039 (2024).

  22. Forta. token-impersonation-dataset. Hugging Face https://huggingface.co/datasets/forta/token-impersonation-dataset (2023).

  23. Ajienka, N. Supervised machine learning for smart contract vulnerability prediction (v2). figshare https://doi.org/10.6084/m9.figshare.13417316.v2 (2020).

  24. Eshghie, M., Artho, C. & Gurov, D. Dynamic vulnerability detection on smart contracts using machine learning. In Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering, 305–312, https://doi.org/10.1145/3463274.3463348 (2021).

  25. Alsunaidi, S., Aljamaan, H. & Hammoudeh, M. DIVE: A multi-label smart contract vulnerability dataset. Zenodo https://doi.org/10.5281/zenodo.18519253 (2026).

  26. Ghaleb, A. & Pattabiraman, K. How effective are smart contract analysis tools? evaluating smart contract static analysis tools using bug injection. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 415–427, https://doi.org/10.1145/3395363.3397385 (2020).

  27. Zhou, H., Milani Fard, A. & Makanju, A. The state of ethereum smart contracts security: Vulnerabilities, countermeasures, and tool support. Journal of Cybersecurity and Privacy 2, 358–378, https://doi.org/10.3390/jcp2020019 (2022).

  28. Lakadawala, H., Dzigbede, K. & Chen, Y. Detecting reentrancy vulnerability in smart contracts using graph convolution networks. In 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC), 188–193, https://doi.org/10.1109/CCNC51664.2024.10454763 (IEEE, Las Vegas, NV, USA, 2024).

  29. Mezina, A. & Ometov, A. Detecting smart contract vulnerabilities with combined binary and multiclass classification. Cryptography 7, 34, https://doi.org/10.3390/cryptography7030034 (2023).

  30. Jain, V. K. & Tripathi, M. An integrated deep learning model for ethereum smart contract vulnerability detection. International Journal of Information Security 23, 557–575, https://doi.org/10.1007/s10207-023-00752-5 (2024).

  31. Sun, X. et al. ASSBert: Active and semi-supervised bert for smart contract vulnerability detection. Journal of Information Security and Applications 73, 103423, https://doi.org/10.1016/j.jisa.2023.103423 (2023).

  32. HajiHosseinKhani, S., Lashkari, A. H. & Oskui, A. M. Unveiling smart contracts vulnerabilities: Toward profiling smart contracts vulnerabilities using enhanced genetic algorithm and generating benchmark dataset. Blockchain: Research and Applications 6, 100253, https://doi.org/10.1016/j.bcra.2024.100253 (2024).

  33. Lê Hùng, B. et al. Contextual language model and transfer learning for reentrancy vulnerability detection in smart contracts. In Proceedings of the 12th International Symposium on Information and Communication Technology, 739–745, https://doi.org/10.1145/3628797.3628945 (ACM, New York, NY, USA, 2023).

  34. Deng, W. et al. Smart contract vulnerability detection based on deep learning and multimodal decision fusion. Sensors 23, 7246, https://doi.org/10.3390/s23167246 (2023).

  35. Yang, Z., Zhu, W. & Yu, M. Improvement and optimization of vulnerability detection methods for ethernet smart contracts. IEEE Access 11, 78207–78223, https://doi.org/10.1109/ACCESS.2023.3298672 (2023).

  36. DIVE Framework. DIVE (version v2.0.0). Zenodo https://doi.org/10.5281/zenodo.18779606 (2026).

  37. Ortner, M. & Eskandari, S. Smart contract sanctuary–Ethereum. GitHub https://github.com/tintinweb/smart-contract-sanctuary-ethereum (2023).

  38. Alsunaidi, S. J., Aljamaan, H. & Hammoudeh, M. Leveraging machine learning models to improve smart contract security: A survey of vulnerabilities and detection methods. ACM Computing Surveys 58, 1–37, https://doi.org/10.1145/3772367 (2025).

  39. Yu, R., Shu, J., Yan, D. & Jia, X. Redetect: Reentrancy vulnerability detection in smart contracts with high accuracy. In 2021 17th International Conference on Mobility, Sensing and Networking (MSN), 412–419, https://doi.org/10.1109/MSN53354.2021.00069 (IEEE, 2021).

  40. Zheng, Z. et al. Turn the rudder: A beacon of reentrancy detection for smart contracts on ethereum. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 295–306, https://doi.org/10.1109/ICSE48619.2023.00036 (IEEE, 2023).

  41. Label post-hoc validator. GitHub https://github.com/DIVE4Data/Label-Post-hoc-Validator (2026).

  42. Wood, G. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper https://mholende.win.tue.nl/seminar/references/ethereum_yellowpaper.pdf (2014).

  43. Ethereum Foundation. Ethereum virtual machine (EVM) opcodes documentation https://ethereum.org/en/developers/docs/evm/opcodes/ (2025).

  44. Rawlekar, S., Bhatnagar, S., Srinivasulu, V. P. & Ahuja, N. Improving multi-label recognition using class co-occurrence probabilities. In International Conference on Pattern Recognition, 424–439, https://doi.org/10.1007/978-3-031-78192-6_28 (Springer, 2024).

  45. Grabot, B. Rule mining in maintenance: Analysing large knowledge bases. Computers & Industrial Engineering 139, 105501, https://doi.org/10.1016/j.cie.2018.11.011 (2020).

  46. ISO/IEC. ISO/IEC 25012: Software engineering—software product quality requirements and evaluation (SQuaRE)—data quality model. International Standard https://www.iso.org/obp/ui/#iso:std:iso-iec:25012:ed-1:v1:en (2008).

  47. Croft, R., Babar, M. A. & Kholoosi, M. M. Data quality for software vulnerability datasets. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 121–133, https://doi.org/10.1109/ICSE48619.2023.00022 (IEEE, 2023).

  48. Bylica, P. How to find $10m just by reading the blockchain (2017).

  49. Hu, T. A benchmark dataset of solidity smart contracts. Zenodo https://doi.org/10.5281/zenodo.7744053 (2023).

Acknowledgements

The authors would like to acknowledge the support of the King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia, in the development of this work.

Author information

Authors and Affiliations

  1. Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, 31261, Saudi Arabia

    Shikah J. Alsunaidi, Hamoud Aljamaan & Mohammad Hammoudeh

  2. Interdisciplinary Research Center for Finance and Digital Economy, King Fahd University of Petroleum and Minerals, Dhahran, 31261, Saudi Arabia

    Hamoud Aljamaan

  3. Interdisciplinary Research Center for Intelligent Secure Systems, King Fahd University of Petroleum and Minerals, Dhahran, 31261, Saudi Arabia

    Mohammad Hammoudeh

Contributions

Conceptualization: S.J.A., H.A., M.H.; Data curation: S.J.A.; Formal analysis: S.J.A.; Investigation: S.J.A.; Methodology: S.J.A., H.A.; Resources: S.J.A.; Software: S.J.A.; Supervision: H.A., M.H.; Validation: S.J.A., H.A., M.H.; Visualization: S.J.A.; Writing – original draft: S.J.A.; Writing – review & editing: S.J.A., H.A., M.H.

Corresponding author

Correspondence to Hamoud Aljamaan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Alsunaidi, S.J., Aljamaan, H. & Hammoudeh, M. DIVE: A Multi-Label Smart Contract Vulnerability Dataset. Sci Data (2026). https://doi.org/10.1038/s41597-026-07025-5

  • Received: 06 October 2025

  • Accepted: 03 March 2026

  • Published: 12 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-07025-5

Scientific Data (Sci Data)

ISSN 2052-4463 (online)
