Scientific Reports
Mitigating label noise in network intrusion detection via graph-based sample selection and purification
  • Article
  • Open access
  • Published: 07 April 2026

  • Ruifen Zhao1,
  • Jiangtao Ding2,
  • Qinhao Dong2 &
  • …
  • Hongbin Cheng2 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Engineering
  • Mathematics and computing

Abstract

Machine learning has made notable progress in malicious traffic detection, yet its effectiveness depends heavily on training data that are both sufficiently large and reliably labeled. In practice, many datasets are produced by automated labeling pipelines, which inevitably introduce label noise and in turn undermine detection performance. Maintaining robust and generalizable detection under label noise has therefore become a central challenge in network intrusion detection. Existing approaches often emphasize intrinsic model robustness. However, noise can reshape the distribution of hard examples and bias the optimization objective, which may yield unstable decision boundaries and further degrade performance. In this paper, we propose SilentSentinel, a data-centric relabeling framework comprising two components: Normal Sample Discovery (NSD) via graph propagation and Malicious Sample Screening (MSS) with dual networks. NSD proceeds in three steps: (1) confident-sample selection; (2) K-NN graph construction; and (3) label propagation. It first selects high-confidence samples and assumes their labels are correct, then builds a K-NN graph over all samples, and finally propagates labels from the confident subset to the full graph; samples that remain uncertain after propagation are forwarded to MSS for second-stage annotation. NSD aims to recover the majority of correctly labeled instances; these instances act as reliable anchors that guide MSS in labeling the remaining uncertain samples, thereby reducing label noise and stabilizing training. We evaluate SilentSentinel on CIC-IDS2017 and DoHBrw-2020. Under 40% label noise, SilentSentinel attains F1 scores of 0.81 and 0.98, respectively, yielding 17.39% and 11.36% relative improvements over state-of-the-art baselines.
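The three NSD steps can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how confidence is computed or which propagation rule is used, so the per-sample `confidence` scores, the threshold `tau`, the neighbourhood size `k`, the decision `margin`, and the mean-of-neighbours update are all assumptions chosen for illustration. Labels are 0 (normal) and 1 (malicious), and -1 marks samples that would be handed to MSS.

```python
import math


def knn_graph(X, k):
    """For each sample, return the indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, xi in enumerate(X):
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def nsd_sketch(X, noisy_labels, confidence, tau=0.9, k=2, iters=50, margin=0.3):
    """Hypothetical NSD-style relabeling: returns 0, 1, or -1 (uncertain) per sample."""
    n = len(X)
    # Step 1: confident-sample selection (labels of these anchors are trusted).
    anchors = {i for i, c in enumerate(confidence) if c >= tau}
    # Step 2: K-NN graph over all samples.
    nbrs = knn_graph(X, k)
    # Step 3: label propagation. score[i] is a soft belief that sample i is
    # malicious; anchors stay clamped, all other samples start undecided at 0.5.
    score = [float(noisy_labels[i]) if i in anchors else 0.5 for i in range(n)]
    for _ in range(iters):
        updated = score[:]
        for i in range(n):
            if i not in anchors:
                updated[i] = sum(score[j] for j in nbrs[i]) / len(nbrs[i])
        score = updated
    # Samples whose belief stays near 0.5 are left for the second stage (MSS).
    relabeled = []
    for s in score:
        if s >= 0.5 + margin:
            relabeled.append(1)
        elif s <= 0.5 - margin:
            relabeled.append(0)
        else:
            relabeled.append(-1)
    return relabeled


# Toy data: a "normal" cluster, a "malicious" cluster, and one point in between.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (5.5, 5.5)]
noisy_labels = [0, 0, 1, 0, 1, 1, 0, 1, 1]   # indices 2 and 6 carry flipped labels
confidence = [0.95, 0.95, 0.4, 0.95, 0.95, 0.95, 0.3, 0.95, 0.5]

relabeled = nsd_sketch(X, noisy_labels, confidence)
print(relabeled)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the two flipped labels are corrected by their trusted neighbours, while the in-between point stays at a belief of 0.5 and is flagged as uncertain, which is the kind of sample the abstract describes as being forwarded to MSS.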

Data availability

The datasets analysed in this study are publicly available from the Canadian Institute for Cybersecurity (CIC). Specifically: CIC-IDS2017 (https://www.unb.ca/cic/datasets/ids-2017.html) and DoHBrw-2020 (https://www.unb.ca/cic/datasets/dohbrw-2020.html). For queries about data usage in this work or to request ancillary materials (e.g., data splits or preprocessing scripts), please contact the corresponding author.

Funding

This work was supported in part by the Key Program of the Natural Science Foundation of Zhejiang Province under Grant LZ24F020007, in part by the National Natural Science Foundation of China under Grant 62072407, in part by the “Leading Goose Project Plan” of Zhejiang Province under Grants 2022C01086 and 2022C03139, in part by the National Key R&D Program of China under Grant 2022YFB2701400, and in part by the “Tianchi Talent” Distinguished Expert Program of Xinjiang Province.

Author information

Authors and Affiliations

  1. School of Business Intelligence, Zhejiang Institute of Economics and Trade, Hangzhou, 310018, China

    Ruifen Zhao

  2. Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310014, China

    Jiangtao Ding, Qinhao Dong & Hongbin Cheng

Contributions

J.D. conceived and supervised the study. R.Z. and J.D. designed the overall framework. R.Z. implemented the NSD module and conducted data preprocessing. Q.D. implemented the MSS module and prepared Figures 1–3. H.C. set up the experiments, conducted evaluations on CIC-IDS2017 and DoHBrw-2020, and prepared Figures 4–5 and the tables. R.Z., Q.D., and H.C. performed the experiments and analyzed the results. J.D. and R.Z. wrote the main manuscript text. All authors reviewed and approved the final manuscript. J.D. is the corresponding author.

Corresponding author

Correspondence to Jiangtao Ding.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhao, R., Ding, J., Dong, Q. et al. Mitigating label noise in network intrusion detection via graph-based sample selection and purification. Sci Rep (2026). https://doi.org/10.1038/s41598-026-45988-y

  • Received: 07 November 2025

  • Accepted: 23 March 2026

  • Published: 07 April 2026

  • DOI: https://doi.org/10.1038/s41598-026-45988-y

Keywords

  • Network traffic intrusion detection
  • Label noise
  • Machine learning

Associated content

Collection

Intrusion detection systems and anomaly detection techniques
