A crowdsensing intrusion detection dataset for decentralized federated learning models

Feng, Chao; Huertas Celdrán, Alberto; Han, Jing; Ren, Heqing; Cheng, Xi; Zeng, Zien; Krauter, Lucas; Bovet, Gérôme; Stiller, Burkhard

doi:10.1038/s41597-026-07155-w

Data Descriptor
Open access
Published: 03 April 2026

Chao Feng¹ (ORCID: orcid.org/0000-0002-0672-1090),
Alberto Huertas Celdrán¹,²,
Jing Han¹,
Heqing Ren¹,
Xi Cheng¹,
Zien Zeng¹,
Lucas Krauter¹,
Gérôme Bovet³ &
Burkhard Stiller¹

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

Electrical and electronic engineering

Abstract

This paper introduces a dataset and an experimental study on Decentralized Federated Learning (DFL) for malware detection in Internet of Things (IoT) crowdsensing. The dataset comprises behavioral records of benign behavior and eight malware attacks. A total of 21,582,484 raw records were collected from system calls, file system activities, resource usage, kernel events, input/output events, and network records. These records were aggregated into 30-second windows, yielding 342,106 data records for model training and evaluation. Experiments on the DFL platform compare traditional Machine Learning (ML), Centralized Federated Learning (CFL), and DFL across different node counts, topologies, and data distributions. Results show that DFL maintains competitive performance while preserving data locality and outperforms CFL in most settings. The dataset provides a foundation for studying the security of IoT crowdsensing environments.
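
The 30-second aggregation step mentioned above can be pictured with a minimal sketch. It is not the authors' pipeline: the record shape (a timestamp in seconds plus an event name), the function name `aggregate_windows`, and the per-window event counts are all assumptions made for illustration. Only the 30-second window length comes from the abstract.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 30  # window length stated in the abstract


def aggregate_windows(records, window_seconds=WINDOW_SECONDS):
    """Group (timestamp_seconds, event_name) pairs into fixed windows.

    Returns a dict mapping window start time -> Counter of event names.
    The record shape is hypothetical, not the dataset's actual schema.
    """
    windows = defaultdict(Counter)
    for timestamp, event in records:
        # Floor the timestamp to the start of its window.
        start = int(timestamp // window_seconds) * window_seconds
        windows[start][event] += 1
    return dict(windows)


# Hypothetical raw records: (seconds since start of capture, event name).
raw = [
    (0.5, "read"),
    (12.0, "write"),
    (29.9, "read"),
    (30.0, "open"),
    (61.2, "read"),
]

features = aggregate_windows(raw)
for start in sorted(features):
    print(start, dict(features[start]))
```

Each window here becomes one feature record, which is the sense in which many raw events collapse into far fewer training records.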

Data availability

The complete dataset is available from the Science Data Bank²⁹, comprising CSV files organized by device and label, with each file containing preprocessed behavioral records and the corresponding extracted features.
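
As a rough sketch of how CSV files organized by device and label might be read, the snippet below walks a directory tree and tags each row with its device and label. The layout `<root>/<device>/<label>.csv`, the column names, and the helper `load_labeled_rows` are assumptions for illustration; the actual structure of the Science Data Bank archive should be checked after download.

```python
import csv
import tempfile
from pathlib import Path


def load_labeled_rows(root):
    """Yield (device, label, row_dict) for every CSV under root.

    Assumes the hypothetical layout <root>/<device>/<label>.csv.
    """
    for csv_path in sorted(Path(root).glob("*/*.csv")):
        device = csv_path.parent.name
        label = csv_path.stem
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                yield device, label, row


# Build a tiny stand-in tree so the sketch runs without the real download.
with tempfile.TemporaryDirectory() as tmp:
    for device, label in [("device_a", "benign"), ("device_a", "miner")]:
        folder = Path(tmp) / device
        folder.mkdir(exist_ok=True)
        with (folder / f"{label}.csv").open("w", newline="") as handle:
            writer = csv.writer(handle)
            # Placeholder column names, not the dataset's real features.
            writer.writerow(["window_start", "syscall_count"])
            writer.writerow([0, 120])
    rows = list(load_labeled_rows(tmp))

print(rows)
```

Keeping the device name alongside each row makes it straightforward to assign one device's data to one federated node when partitioning.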

Code availability

The scripts for data collection are available at: https://github.com/Cyber-Tracer/MalwareDetectionDataset and the scripts for data processing and model training are provided at: https://github.com/Cyber-Tracer/iot-feature-engineering.

References

Celdrán, A. H. et al. Intelligent and behavioral-based detection of malware in IoT spectrum sensors. International Journal of Information Security 22, 541–561, https://doi.org/10.1007/s10207-022-00602-w (2023).
Celdrán, A. H. et al. Privacy-preserving and syscall-based intrusion detection system for IoT spectrum sensors affected by data falsification attacks. IEEE Internet of Things Journal 10, 8408–8415, https://doi.org/10.1109/JIOT.2022.3213889 (2023).
Rajendran, S. et al. ElectroSense: Open and big spectrum data. IEEE Communications Magazine 56, 210–217, https://doi.org/10.1109/MCOM.2017.1700200 (2017).
Beltrán, E. T. M. et al. Decentralized federated learning: Fundamentals, state of the art, frameworks, trends, and challenges. IEEE Communications Surveys & Tutorials 25, 2983–3013, https://doi.org/10.1109/COMST.2023.3315746 (2023).
Nguyen, T. D. et al. DÏoT: A federated self-learning anomaly detection system for IoT. In Proc. IEEE ICDCS, 756–767, https://doi.org/10.1109/ICDCS.2019.00080 (2019).
Rey, V. et al. Federated learning for malware detection in IoT devices. Computer Networks 204, 108693, https://doi.org/10.1016/j.comnet.2021.108693 (2022).
Tavallaee, M. et al. A detailed analysis of the KDD CUP 99 data set. In Proc. IEEE CISDA, 1–6, https://doi.org/10.1109/CISDA.2009.5356528 (2009).
Sharafaldin, I., Lashkari, A. H. & Ghorbani, A. A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proc. ICISSP, 108–116, https://doi.org/10.5220/0006639801080116 (2018).
Koroniotis, N., Moustafa, N., Sitnikova, E. & Turnbull, B. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Generation Computer Systems 100, 779–796, https://doi.org/10.1016/j.future.2019.05.041 (2019).
Alsaedi, A. et al. TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 8, 165130–165150, https://doi.org/10.1109/ACCESS.2020.3022862 (2020).
Beltrán, E. T. M. et al. NEBULA - Decentralized federated learning for heterogeneous networks. In Proc. ACM SIGCOMM Posters and Demos, 149–151, https://doi.org/10.1145/3744969.3748413 (2025).
Bashlite botnet source code. GitHub repository. https://github.com/hammerzeit/BASHLITE (accessed 24 February 2026).
HttpBackdoor source code. GitHub repository. https://github.com/SkryptKiddie/httpBackdoor (accessed 24 February 2026).
Backdoor source code. GitHub repository. https://github.com/Cyber-Tracer/Backdoor (accessed 24 February 2026).
TheTick backdoor source code. GitHub repository. https://github.com/nccgroup/thetick (accessed 24 February 2026).
Beurk rootkit source code. GitHub repository. https://github.com/unix-thrust/beurk (accessed 24 February 2026).
Bdvl rootkit source code. GitHub repository. https://github.com/Error996/bdvl (accessed 24 February 2026).
XMRig miner source code. GitHub repository. https://github.com/xmrig/xmrig (accessed 24 February 2026).
Ransomware-PoC source code. GitHub repository. https://github.com/jimmy-ly00/Ransomware-PoC (accessed 24 February 2026).
Ilavarasan, E. & Muthumanickam, K. A survey on host-based botnet identification. In Proc. ICRCC, 166–170, https://doi.org/10.1109/ICRCC.2012.6450569 (2012).
da Costa, V. G. T. et al. Detecting mobile botnets through machine learning and system calls analysis. In Proc. IEEE ICC, 1–6, https://doi.org/10.1109/ICC.2017.7997390 (2017).
Canzanese, R., Mancoridis, S. & Kam, M. System call-based detection of malicious processes. In Proc. IEEE QRS, 119–124, https://doi.org/10.1109/QRS.2015.26 (2015).
Petroni, N. L. Jr., Fraser, T., Molina, J. & Arbaugh, W. A. Copilot: A coprocessor-based kernel runtime integrity monitor. In Proc. 13th USENIX Security Symposium, San Diego, CA (2004).
Carbone, M. et al. Mapping kernel objects to enable systematic integrity checking. In Proc. ACM CCS, 555–565, https://doi.org/10.1145/1653662.1653729 (2009).
Kok, S. et al. Ransomware, threat and detection techniques: A review. International Journal of Computer Science and Network Security 19, 136 (2019).
Barbhuiya, S. et al. RADS: Real-time anomaly detection system for cloud data centres. arXiv 1811.04481 (2018).
Tanana, D. Behavior-based detection of cryptojacking malware. In Proc. USBEREIT, 543–545, https://doi.org/10.1109/USBEREIT48449.2020.9117732 (2020).
Feng, C. et al. DART: A solution for decentralized federated learning model robustness analysis. Array 23, 100360, https://doi.org/10.1016/j.array.2024.100360 (2024).
Feng, C. et al. IoT intrusion detection dataset for decentralized federated learning. Science Data Bank https://doi.org/10.57760/sciencedb.25380 (2025).
Meidan, Y. et al. N-BaIoT—network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Computing 17, 12–22, https://doi.org/10.1109/MPRV.2018.03367731 (2018).
Bezerra, V. H. et al. IoTDS: A one-class classification approach to detect botnets in Internet of Things devices. Sensors 19, 3188, https://doi.org/10.3390/s19143188 (2019).
Martinelli, F., Mercaldo, F. & Saracino, A. Bridemaid: An hybrid tool for accurate detection of Android malware. In Proc. ACM AsiaCCS, 899–901, https://doi.org/10.1145/3052973.3055156 (2017).
Saracino, A. et al. MADAM: Effective and efficient behavior-based Android malware detection and prevention. IEEE Transactions on Dependable and Secure Computing 15, 83–97, https://doi.org/10.1109/TDSC.2016.2536605 (2018).
Zhang, Y. & Paxson, V. Detecting backdoors. In Proc. USENIX Security Symposium, 1–15 (2000).
Hoglund, G. & Butler, J. Rootkits: Subverting the Windows Kernel. Addison-Wesley (2005).
Kruegel, C., Robertson, W. & Vigna, G. Detecting kernel-level rootkits through binary analysis. In Proc. ACSAC, 91–100, https://doi.org/10.1109/CSAC.2004.19 (2004).
Baliga, A., Ganapathy, V. & Iftode, L. Automatic inference and enforcement of kernel data structure invariants. In Proc. ACSAC, 77–86, https://doi.org/10.1109/ACSAC.2008.29 (2008).
Baliga, A., Ganapathy, V. & Iftode, L. Detecting kernel-level rootkits using data structure invariants. IEEE Transactions on Dependable and Secure Computing 8, 670–684, https://doi.org/10.1109/TDSC.2010.38 (2011).

Acknowledgements

This work was supported by the Swiss Federal Office for Defense Procurement (armasuisse) under the CyberDFL project (CYD-C-2020003) and by the University of Zurich (UZH).

Author information

Authors and Affiliations

Communication Systems Group, Department of Informatics, University of Zurich, 8050, Zürich, Switzerland
Chao Feng, Alberto Huertas Celdrán, Jing Han, Heqing Ren, Xi Cheng, Zien Zeng, Lucas Krauter & Burkhard Stiller
Department of Information and Communications Engineering, University of Murcia, 30100, Murcia, Spain
Alberto Huertas Celdrán
Cyber-Defence Campus, armasuisse Science & Technology, 3602, Thun, Switzerland
Gérôme Bovet

Contributions

C.F. and A.H.C. drafted the manuscript with contributions from all authors. J.H., H.R., X.C., Z.Z., and L.K. performed the experiments and data analysis. G.B. conceived and designed the study. B.S. provided overall supervision. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Chao Feng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Feng, C., Huertas Celdrán, A., Han, J. et al. A crowdsensing intrusion detection dataset for decentralized federated learning models. Sci Data (2026). https://doi.org/10.1038/s41597-026-07155-w

Received: 02 September 2025
Accepted: 27 March 2026
Published: 03 April 2026
DOI: https://doi.org/10.1038/s41597-026-07155-w