Scientific Data
A crowdsensing intrusion detection dataset for decentralized federated learning models
  • Data Descriptor
  • Open access
  • Published: 03 April 2026


  • Chao Feng¹ (ORCID: orcid.org/0000-0002-0672-1090),
  • Alberto Huertas Celdrán¹,²,
  • Jing Han¹,
  • Heqing Ren¹,
  • Xi Cheng¹,
  • Zien Zeng¹,
  • Lucas Krauter¹,
  • Gérôme Bovet³ &
  • Burkhard Stiller¹

Scientific Data (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Electrical and electronic engineering

Abstract

This paper introduces a dataset and an experimental study on Decentralized Federated Learning (DFL) for Internet of Things (IoT) crowdsensing malware detection. The dataset comprises behavioral records of benign operation and of eight malware attacks. A total of 21,582,484 raw records were collected from system calls, file system activities, resource usage, kernel events, input/output events, and network records. These records were aggregated into 30-second windows, yielding 342,106 data records used for model training and evaluation. Experiments on the DFL platform compare traditional Machine Learning (ML), Centralized Federated Learning (CFL), and DFL across different node counts, topologies, and data distributions. Results show that DFL maintains competitive performance while preserving data locality, outperforming CFL in most settings. This dataset provides a solid foundation for studying the security of IoT crowdsensing environments.
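The abstract states that the raw records were aggregated into 30-second windows, which works out to roughly 63 raw records per window on average (21,582,484 / 342,106 ≈ 63.1). A minimal Python sketch of fixed-window aggregation follows. It assumes each raw record is a (timestamp in seconds, event name) pair, that windows are aligned to multiples of 30 s, and that the per-window feature is a simple event count. The function names `aggregate_windows` and `to_feature_rows` are hypothetical, and the features the authors actually extract are defined in their iot-feature-engineering scripts, which may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Window length from the dataset description. Aligning windows to multiples
# of WINDOW_SECONDS (rather than to the first record) is an assumption.
WINDOW_SECONDS = 30


def aggregate_windows(
    records: Iterable[Tuple[float, str]], window: int = WINDOW_SECONDS
) -> Dict[int, Dict[str, int]]:
    """Count events per fixed, non-overlapping time window.

    Each record is a hypothetical (timestamp_in_seconds, event_name) pair.
    Returns {window_start: {event_name: count}} ordered by window start.
    """
    windows: Dict[int, Counter] = defaultdict(Counter)
    for timestamp, event in records:
        start = int(timestamp // window) * window  # left edge of the window
        windows[start][event] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


def to_feature_rows(
    windows: Dict[int, Dict[str, int]], vocabulary: List[str]
) -> List[List[int]]:
    """Turn per-window counts into fixed-length rows, one column per event name."""
    return [[counts.get(name, 0) for name in vocabulary] for counts in windows.values()]


if __name__ == "__main__":
    raw = [(0.5, "read"), (12.0, "write"), (29.9, "read"), (30.0, "openat"), (61.2, "read")]
    windows = aggregate_windows(raw)
    print(windows)  # {0: {'read': 2, 'write': 1}, 30: {'openat': 1}, 60: {'read': 1}}
    print(to_feature_rows(windows, ["read", "write", "openat"]))
```

Windows with no events do not appear in the output; if a dense time series is needed, the missing window starts can be filled with zero rows. Counting is only one possible aggregation; means of resource metrics or counts of distinct values would follow the same grouping pattern.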


Data availability

The complete dataset is available from the Science Data Bank²⁹, comprising CSV files organized by device and label, with each file containing preprocessed behavioral records and the corresponding extracted features.
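Because the CSV files are organized by device and label, records can be grouped per device when loading. The sketch below assumes a hypothetical layout of `<root>/<device>/<label>.csv`, with the label taken from the file name; the real file names and directory structure of the Science Data Bank release may differ, so the path logic should be adapted to the downloaded archive. Keeping records grouped per device makes it straightforward to treat each device as one federated node's local data, which is one plausible setup for DFL experiments rather than the authors' confirmed partitioning. The function `load_device_partitions` is hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], str]  # (feature row as read from the CSV, label)


def load_device_partitions(root: str) -> Dict[str, List[Sample]]:
    """Load per-device samples from a hypothetical <root>/<device>/<label>.csv layout.

    Each subdirectory of root is treated as one device; each CSV file inside it
    holds the records of one label, with the label taken from the file name.
    """
    partitions: Dict[str, List[Sample]] = {}
    for device_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        samples: List[Sample] = []
        for csv_path in sorted(device_dir.glob("*.csv")):
            label = csv_path.stem  # e.g. "normal.csv" -> "normal" (hypothetical naming)
            with csv_path.open(newline="") as handle:
                for row in csv.DictReader(handle):
                    samples.append((row, label))
        partitions[device_dir.name] = samples
    return partitions
```

`csv.DictReader` returns every value as a string, so numeric feature columns need converting (for example with `float`) before training. Calling `load_device_partitions` on the extracted archive root returns one sample list per device, each of which can serve as a node's local training set.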

Code availability

The scripts for data collection are available at https://github.com/Cyber-Tracer/MalwareDetectionDataset, and the scripts for data processing and model training are available at https://github.com/Cyber-Tracer/iot-feature-engineering.

References

  1. Celdrán, A. H. et al. Intelligent and behavioral-based detection of malware in IoT spectrum sensors. International Journal of Information Security 22, 541–561, https://doi.org/10.1007/s10207-022-00602-w (2023).

  2. Celdrán, A. H. et al. Privacy-preserving and syscall-based intrusion detection system for IoT spectrum sensors affected by data falsification attacks. IEEE Internet of Things Journal 10, 8408–8415, https://doi.org/10.1109/JIOT.2022.3213889 (2023).

  3. Rajendran, S. et al. ElectroSense: Open and big spectrum data. IEEE Communications Magazine 56, 210–217, https://doi.org/10.1109/MCOM.2017.1700200 (2017).

  4. Beltrán, E. T. M. et al. Decentralized federated learning: Fundamentals, state of the art, frameworks, trends, and challenges. IEEE Communications Surveys & Tutorials 25, 2983–3013, https://doi.org/10.1109/COMST.2023.3315746 (2023).

  5. Nguyen, T. D. et al. DÏoT: A federated self-learning anomaly detection system for IoT. In Proc. IEEE ICDCS, 756–767, https://doi.org/10.1109/ICDCS.2019.00080 (2019).

  6. Rey, V. et al. Federated learning for malware detection in IoT devices. Computer Networks 204, 108693, https://doi.org/10.1016/j.comnet.2021.108693 (2022).

  7. Tavallaee, M. et al. A detailed analysis of the KDD CUP 99 data set. In Proc. IEEE CISDA, 1–6, https://doi.org/10.1109/CISDA.2009.5356528 (2009).

  8. Sharafaldin, I., Lashkari, A. H. & Ghorbani, A. A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proc. ICISSP, 108–116, https://doi.org/10.5220/0006639801080116 (2018).

  9. Koroniotis, N., Moustafa, N., Sitnikova, E. & Turnbull, B. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Generation Computer Systems 100, 779–796, https://doi.org/10.1016/j.future.2019.05.041 (2019).

  10. Alsaedi, A. et al. TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 8, 165130–165150, https://doi.org/10.1109/ACCESS.2020.3022862 (2020).

  11. Beltrán, E. T. M. et al. NEBULA - Decentralized federated learning for heterogeneous networks. In Proc. ACM SIGCOMM Posters and Demos, 149–151, https://doi.org/10.1145/3744969.3748413 (2025).

  12. Bashlite botnet source code. GitHub repository. https://github.com/hammerzeit/BASHLITE (accessed 24 February 2026).

  13. HttpBackdoor source code. GitHub repository. https://github.com/SkryptKiddie/httpBackdoor (accessed 24 February 2026).

  14. Backdoor source code. GitHub repository. https://github.com/Cyber-Tracer/Backdoor (accessed 24 February 2026).

  15. TheTick backdoor source code. GitHub repository. https://github.com/nccgroup/thetick (accessed 24 February 2026).

  16. Beurk rootkit source code. GitHub repository. https://github.com/unix-thrust/beurk (accessed 24 February 2026).

  17. Bdvl rootkit source code. GitHub repository. https://github.com/Error996/bdvl (accessed 24 February 2026).

  18. XMRig miner source code. GitHub repository. https://github.com/xmrig/xmrig (accessed 24 February 2026).

  19. Ransomware-PoC source code. GitHub repository. https://github.com/jimmy-ly00/Ransomware-PoC (accessed 24 February 2026).

  20. Ilavarasan, E. & Muthumanickam, K. A survey on host-based botnet identification. In Proc. ICRCC, 166–170, https://doi.org/10.1109/ICRCC.2012.6450569 (2012).

  21. da Costa, V. G. T. et al. Detecting mobile botnets through machine learning and system calls analysis. In Proc. IEEE ICC, 1–6, https://doi.org/10.1109/ICC.2017.7997390 (2017).

  22. Canzanese, R., Mancoridis, S. & Kam, M. System call-based detection of malicious processes. In Proc. IEEE QRS, 119–124, https://doi.org/10.1109/QRS.2015.26 (2015).

  23. Petroni, N. L. Jr., Fraser, T., Molina, J. & Arbaugh, W. A. Copilot: A coprocessor-based kernel runtime integrity monitor. In Proc. 13th USENIX Security Symposium, San Diego, CA (2004).

  24. Carbone, M. et al. Mapping kernel objects to enable systematic integrity checking. In Proc. ACM CCS, 555–565, https://doi.org/10.1145/1653662.1653729 (2009).

  25. Kok, S. et al. Ransomware, threat and detection techniques: A review. International Journal of Computer Science and Network Security 19, 136 (2019).

  26. Barbhuiya, S. et al. RADS: Real-time anomaly detection system for cloud data centres. arXiv:1811.04481 (2018).

  27. Tanana, D. Behavior-based detection of cryptojacking malware. In Proc. USBEREIT, 543–545, https://doi.org/10.1109/USBEREIT48449.2020.9117732 (2020).

  28. Feng, C. et al. DART: A solution for decentralized federated learning model robustness analysis. Array 23, 100360, https://doi.org/10.1016/j.array.2024.100360 (2024).

  29. Feng, C. et al. IoT intrusion detection dataset for decentralized federated learning. Science Data Bank https://doi.org/10.57760/sciencedb.25380 (2025).

  30. Meidan, Y. et al. N-BaIoT—network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Computing 17, 12–22, https://doi.org/10.1109/MPRV.2018.03367731 (2018).

  31. Bezerra, V. H. et al. IoTDS: A one-class classification approach to detect botnets in Internet of Things devices. Sensors 19, 3188, https://doi.org/10.3390/s19143188 (2019).

  32. Martinelli, F., Mercaldo, F. & Saracino, A. Bridemaid: An hybrid tool for accurate detection of Android malware. In Proc. ACM AsiaCCS, 899–901, https://doi.org/10.1145/3052973.3055156 (2017).

  33. Saracino, A. et al. MADAM: Effective and efficient behavior-based Android malware detection and prevention. IEEE Transactions on Dependable and Secure Computing 15, 83–97, https://doi.org/10.1109/TDSC.2016.2536605 (2018).

  34. Zhang, Y. & Paxson, V. Detecting backdoors. In Proc. USENIX Security Symposium, 1–15 (2000).

  35. Hoglund, G. & Butler, J. Rootkits: Subverting the Windows Kernel. Addison-Wesley (2005).

  36. Kruegel, C., Robertson, W. & Vigna, G. Detecting kernel-level rootkits through binary analysis. In Proc. ACSAC, 91–100, https://doi.org/10.1109/CSAC.2004.19 (2004).

  37. Baliga, A., Ganapathy, V. & Iftode, L. Automatic inference and enforcement of kernel data structure invariants. In Proc. ACSAC, 77–86, https://doi.org/10.1109/ACSAC.2008.29 (2008).

  38. Baliga, A., Ganapathy, V. & Iftode, L. Detecting kernel-level rootkits using data structure invariants. IEEE Transactions on Dependable and Secure Computing 8, 670–684, https://doi.org/10.1109/TDSC.2010.38 (2011).

Acknowledgements

This work was supported by the Swiss Federal Office for Defense Procurement (armasuisse) under the CyberDFL project (CYD-C-2020003) and by the University of Zurich (UZH).

Author information

Authors and Affiliations

  1. Communication Systems Group, Department of Informatics, University of Zurich, 8050, Zürich, Switzerland

    Chao Feng, Alberto Huertas Celdrán, Jing Han, Heqing Ren, Xi Cheng, Zien Zeng, Lucas Krauter & Burkhard Stiller

  2. Department of Information and Communications Engineering, University of Murcia, 30100, Murcia, Spain

    Alberto Huertas Celdrán

  3. Cyber-Defence Campus, armasuisse Science & Technology, 3602, Thun, Switzerland

    Gérôme Bovet


Contributions

C.F. and A.H.C. drafted the manuscript with contributions from all authors. J.H., H.R., X.C., Z.Z., and L.K. performed the experiments and data analysis. G.B. conceived and designed the study. B.S. provided overall supervision. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Chao Feng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Feng, C., Huertas Celdrán, A., Han, J. et al. A crowdsensing intrusion detection dataset for decentralized federated learning models. Sci Data (2026). https://doi.org/10.1038/s41597-026-07155-w


  • Received: 02 September 2025

  • Accepted: 27 March 2026

  • Published: 03 April 2026

  • DOI: https://doi.org/10.1038/s41597-026-07155-w
