Abstract
This paper introduces a dataset and an experimental study on Decentralized Federated Learning (DFL) for malware detection in Internet of Things (IoT) crowdsensing. The dataset comprises behavioral records of benign operation and of eight malware attacks. A total of 21,582,484 raw records were collected from system calls, file system activities, resource usage, kernel events, input/output events, and network records. These records were aggregated into 30-second windows, yielding 342,106 data records used for model training and evaluation. Experiments on the DFL platform compare traditional Machine Learning (ML), Centralized Federated Learning (CFL), and DFL across different node counts, topologies, and data distributions. Results show that DFL maintains competitive performance while preserving data locality, and that it outperforms CFL in most settings. The dataset provides a foundation for studying the security of IoT crowdsensing environments.
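The 30-second aggregation step mentioned above can be illustrated with a minimal sketch. It is not the authors' preprocessing pipeline: the record format (a timestamp in seconds paired with an event name), the function name `aggregate_windows`, and the sample data are all assumptions made for illustration, and per-window event counts stand in for whatever features the published scripts actually extract.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 30  # window length stated in the abstract


def aggregate_windows(records, window=WINDOW_SECONDS):
    """Group (timestamp_seconds, event_name) pairs into fixed windows.

    Returns a dict mapping each window start time to a Counter of the
    event names observed in that window. (Hypothetical helper.)
    """
    windows = defaultdict(Counter)
    for timestamp, event in records:
        start = int(timestamp // window) * window  # floor to window boundary
        windows[start][event] += 1
    return dict(windows)


# Hypothetical raw records: (seconds since capture start, event name)
raw = [(0.5, "read"), (12.0, "write"), (29.9, "read"), (30.0, "open"), (61.2, "read")]
features = aggregate_windows(raw)
for start in sorted(features):
    print(start, dict(features[start]))
```

Each window start time becomes one aggregated record, which is how a large number of raw events can collapse into far fewer training rows.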
Data availability
The complete dataset is available from the Science Data Bank (ref. 29, Feng et al. 2025). It comprises CSV files organized by device and label, with each file containing preprocessed behavioral records and the corresponding extracted features.
Code availability
The data collection scripts are available at https://github.com/Cyber-Tracer/MalwareDetectionDataset, and the data processing and model training scripts are available at https://github.com/Cyber-Tracer/iot-feature-engineering.
References
Celdrán, A. H. et al. Intelligent and behavioral-based detection of malware in IoT spectrum sensors. International Journal of Information Security 22, 541–561, https://doi.org/10.1007/s10207-022-00602-w (2023).
Celdrán, A. H. et al. Privacy-preserving and syscall-based intrusion detection system for IoT spectrum sensors affected by data falsification attacks. IEEE Internet of Things Journal 10, 8408–8415, https://doi.org/10.1109/JIOT.2022.3213889 (2023).
Rajendran, S. et al. ElectroSense: Open and big spectrum data. IEEE Communications Magazine 56, 210–217, https://doi.org/10.1109/MCOM.2017.1700200 (2017).
Beltrán, E. T. M. et al. Decentralized federated learning: Fundamentals, state of the art, frameworks, trends, and challenges. IEEE Communications Surveys & Tutorials 25, 2983–3013, https://doi.org/10.1109/COMST.2023.3315746 (2023).
Nguyen, T. D. et al. DÏoT: A federated self-learning anomaly detection system for IoT. In Proc. IEEE ICDCS, 756–767, https://doi.org/10.1109/ICDCS.2019.00080 (2019).
Rey, V. et al. Federated learning for malware detection in IoT devices. Computer Networks 204, 108693, https://doi.org/10.1016/j.comnet.2021.108693 (2022).
Tavallaee, M. et al. A detailed analysis of the KDD CUP 99 data set. In Proc. IEEE CISDA, 1–6, https://doi.org/10.1109/CISDA.2009.5356528 (2009).
Sharafaldin, I., Lashkari, A. H. & Ghorbani, A. A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proc. ICISSP, 108–116, https://doi.org/10.5220/0006639801080116 (2018).
Koroniotis, N., Moustafa, N., Sitnikova, E. & Turnbull, B. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Generation Computer Systems 100, 779–796, https://doi.org/10.1016/j.future.2019.05.041 (2019).
Alsaedi, A. et al. TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 8, 165130–165150, https://doi.org/10.1109/ACCESS.2020.3022862 (2020).
Beltrán, E. T. M. et al. NEBULA - Decentralized federated learning for heterogeneous networks. In Proc. ACM SIGCOMM Posters and Demos, 149–151, https://doi.org/10.1145/3744969.3748413 (2025).
Bashlite botnet source code. GitHub repository. https://github.com/hammerzeit/BASHLITE (accessed 24 February 2026).
HttpBackdoor source code. GitHub repository. https://github.com/SkryptKiddie/httpBackdoor (accessed 24 February 2026).
Backdoor source code. GitHub repository. https://github.com/Cyber-Tracer/Backdoor (accessed 24 February 2026).
TheTick backdoor source code. GitHub repository. https://github.com/nccgroup/thetick (accessed 24 February 2026).
Beurk rootkit source code. GitHub repository. https://github.com/unix-thrust/beurk (accessed 24 February 2026).
Bdvl rootkit source code. GitHub repository. https://github.com/Error996/bdvl (accessed 24 February 2026).
XMRig miner source code. GitHub repository. https://github.com/xmrig/xmrig (accessed 24 February 2026).
Ransomware-PoC source code. GitHub repository. https://github.com/jimmy-ly00/Ransomware-PoC (accessed 24 February 2026).
Ilavarasan, E. & Muthumanickam, K. A survey on host-based botnet identification. In Proc. ICRCC, 166–170, https://doi.org/10.1109/ICRCC.2012.6450569 (2012).
da Costa, V. G. T. et al. Detecting mobile botnets through machine learning and system calls analysis. In Proc. IEEE ICC, 1–6, https://doi.org/10.1109/ICC.2017.7997390 (2017).
Canzanese, R., Mancoridis, S. & Kam, M. System call-based detection of malicious processes. In Proc. IEEE QRS, 119–124, https://doi.org/10.1109/QRS.2015.26 (2015).
Petroni, N. L. Jr., Fraser, T., Molina, J. & Arbaugh, W. A. Copilot: A coprocessor-based kernel runtime integrity monitor. In Proc. 13th USENIX Security Symposium, San Diego, CA (2004).
Carbone, M. et al. Mapping kernel objects to enable systematic integrity checking. In Proc. ACM CCS, 555–565, https://doi.org/10.1145/1653662.1653729 (2009).
Kok, S. et al. Ransomware, threat and detection techniques: A review. International Journal of Computer Science and Network Security 19, 136 (2019).
Barbhuiya, S. et al. RADS: Real-time anomaly detection system for cloud data centres. arXiv 1811.04481 (2018).
Tanana, D. Behavior-based detection of cryptojacking malware. In Proc. USBEREIT, 543–545, https://doi.org/10.1109/USBEREIT48449.2020.9117732 (2020).
Feng, C. et al. DART: A solution for decentralized federated learning model robustness analysis. Array 23, 100360, https://doi.org/10.1016/j.array.2024.100360 (2024).
Feng, C. et al. IoT intrusion detection dataset for decentralized federated learning. Science Data Bank https://doi.org/10.57760/sciencedb.25380 (2025).
Meidan, Y. et al. N-BaIoT—network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Computing 17, 12–22, https://doi.org/10.1109/MPRV.2018.03367731 (2018).
Bezerra, V. H. et al. IoTDS: A one-class classification approach to detect botnets in Internet of Things devices. Sensors 19, 3188, https://doi.org/10.3390/s19143188 (2019).
Martinelli, F., Mercaldo, F. & Saracino, A. Bridemaid: An hybrid tool for accurate detection of Android malware. In Proc. ACM AsiaCCS, 899–901, https://doi.org/10.1145/3052973.3055156 (2017).
Saracino, A. et al. MADAM: Effective and efficient behavior-based Android malware detection and prevention. IEEE Transactions on Dependable and Secure Computing 15, 83–97, https://doi.org/10.1109/TDSC.2016.2536605 (2018).
Zhang, Y. & Paxson, V. Detecting backdoors. In Proc. USENIX Security Symposium, 1–15 (2000).
Hoglund, G. & Butler, J. Rootkits: Subverting the Windows Kernel. Addison-Wesley (2005).
Kruegel, C., Robertson, W. & Vigna, G. Detecting kernel-level rootkits through binary analysis. In Proc. ACSAC, 91–100, https://doi.org/10.1109/CSAC.2004.19 (2004).
Baliga, A., Ganapathy, V. & Iftode, L. Automatic inference and enforcement of kernel data structure invariants. In Proc. ACSAC, 77–86, https://doi.org/10.1109/ACSAC.2008.29 (2008).
Baliga, A., Ganapathy, V. & Iftode, L. Detecting kernel-level rootkits using data structure invariants. IEEE Transactions on Dependable and Secure Computing 8, 670–684, https://doi.org/10.1109/TDSC.2010.38 (2011).
Acknowledgements
This work was supported by the Swiss Federal Office for Defense Procurement (armasuisse) under the CyberDFL project (CYD-C-2020003) and by the University of Zürich (UZH).
Author information
Contributions
C.F. and A.H.C. drafted the manuscript with contributions from all authors. J.H., H.Q., X.C., Z.Z., and L.K. performed the experiments and data analysis. G.B. conceived and designed the study. B.S. provided overall supervision. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Feng, C., Huertas Celdrán, A., Han, J. et al. A crowdsensing intrusion detection dataset for decentralized federated learning models. Sci Data (2026). https://doi.org/10.1038/s41597-026-07155-w
DOI: https://doi.org/10.1038/s41597-026-07155-w


