Mitigating label noise in network intrusion detection via graph-based sample selection and purification

Zhao, Ruifen; Ding, Jiangtao; Dong, Qinhao; Cheng, Hongbin

doi:10.1038/s41598-026-45988-y


Article
Open access
Published: 07 April 2026


Scientific Reports, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.


Abstract

Machine learning has achieved notable progress in malicious traffic detection, yet its effectiveness depends heavily on large, reliably labeled datasets. In practice, many datasets are produced by automated labeling pipelines, which inevitably introduce label noise and, in turn, degrade detection performance. Maintaining robust and generalizable detection under label noise has therefore become a central challenge in network intrusion detection. Existing approaches often emphasize intrinsic model robustness. However, noise can reshape the distribution of hard examples and bias the optimization objective, which may yield unstable decision boundaries and further degrade performance. In this paper, we propose \(\texttt{SilentSentinel}\), a data-centric relabeling framework comprising two components: Normal Sample Discovery (NSD) via graph propagation and Malicious Sample Screening (MSS) with dual networks. NSD proceeds in three steps: (1) confident-sample selection, (2) K-NN graph construction, and (3) label propagation. We first select high-confidence samples and assume their labels are correct, then build a graph over all samples and propagate labels from the confident subset to the full graph; samples that remain uncertain after propagation are forwarded to MSS for second-stage annotation. NSD aims to recover the majority of correctly labeled instances, which then serve as reliable anchors that guide MSS in labeling the remaining uncertain samples, thereby reducing label noise and stabilizing training. We evaluate \(\texttt{SilentSentinel}\) on CIC-IDS2017 and DoHBrw-2020. Under 40% label noise, \(\texttt{SilentSentinel}\) attains F1 scores of 0.81 and 0.98, respectively, corresponding to relative improvements of 17.39% and 11.36% over state-of-the-art baselines.
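The three NSD steps can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the per-sample confidence scores `conf` (assumed to come from some warm-up classifier), the threshold `tau`, the Gaussian-weighted symmetric k-NN graph, the clamped propagation update in the style of Zhu and Ghahramani, and the `margin` rule that decides which samples count as uncertain are all illustrative assumptions, and the function names `knn_graph` and `nsd` are hypothetical.

```python
import math


def knn_graph(X, k, sigma=1.0):
    """Symmetric k-NN adjacency matrix with Gaussian edge weights (assumed weighting)."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n)
            if j != i
        )
        for d2, j in dists[:k]:
            w = math.exp(-d2 / (2 * sigma**2))
            # Keep an edge if either endpoint lists the other among its k nearest.
            W[i][j] = max(W[i][j], w)
            W[j][i] = max(W[j][i], w)
    return W


def nsd(X, y_noisy, conf, tau=0.9, k=3, margin=0.8, iters=100, n_classes=2):
    """Hypothetical NSD sketch: returns (propagated labels, indices left for MSS)."""
    n = len(X)
    # Step 1: confident-sample selection. Seeds keep their given labels,
    # which are assumed to be correct.
    seeds = {i for i in range(n) if conf[i] >= tau}
    # Step 2: k-NN graph over all samples.
    W = knn_graph(X, k)
    # Step 3: label propagation; seed rows stay clamped to one-hot labels,
    # every other row becomes the weighted average of its neighbours' rows.
    F = [
        [1.0 if i in seeds and y_noisy[i] == c else 0.0 for c in range(n_classes)]
        for i in range(n)
    ]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            deg = sum(W[i])
            if i in seeds or deg == 0:
                new_F.append(F[i])
                continue
            new_F.append(
                [sum(W[i][j] * F[j][c] for j in range(n)) / deg for c in range(n_classes)]
            )
        F = new_F
    labels, uncertain = {}, []
    for i in range(n):
        total = sum(F[i])
        # Assumed rule: accept a label only if one class holds >= margin of the mass.
        if total > 0 and max(F[i]) / total >= margin:
            labels[i] = max(range(n_classes), key=lambda c: F[i][c])
        else:
            uncertain.append(i)  # forwarded to the second stage (MSS)
    return labels, uncertain


if __name__ == "__main__":
    # Two well-separated toy clusters (0 = normal, 1 = malicious);
    # samples 2 and 6 carry flipped labels and low confidence.
    X = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1]]
    y_noisy = [0, 0, 1, 0, 1, 1, 0, 1]
    conf = [0.95, 0.95, 0.2, 0.3, 0.95, 0.95, 0.2, 0.3]
    labels, uncertain = nsd(X, y_noisy, conf)
    print(labels, uncertain)
```

On this toy data, samples 2 and 6 are pulled to the labels of their confident neighbours (0 and 1, respectively) and nothing is left for the second stage. A sample sitting equally close to a normal and a malicious seed would instead split its mass evenly, fail the `margin` test, and be handed on, which mirrors the abstract's description of forwarding uncertain samples to MSS.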

Data availability

The datasets analysed in this study are publicly available from the Canadian Institute for Cybersecurity (CIC): CIC-IDS2017 (https://www.unb.ca/cic/datasets/ids-2017.html) and DoHBrw-2020 (https://www.unb.ca/cic/datasets/dohbrw-2020.html). For queries about data usage in this work or to request ancillary materials (e.g., data splits or preprocessing scripts), please contact the corresponding author.


Funding

This work was supported in part by the Key Program of the Natural Science Foundation of Zhejiang Province under Grant LZ24F020007, in part by the National Natural Science Foundation of China under Grant 62072407, in part by the “Leading Goose Project Plan” of Zhejiang Province under Grants 2022C01086 and 2022C03139, in part by the National Key R&D Program of China under Grant 2022YFB2701400, and in part by the “Tianchi Talent” Distinguished Expert Program of Xinjiang Province.

Author information

Authors and Affiliations

School of Business Intelligence, Zhejiang Institute of Economics and Trade, Hangzhou, 310018, China
Ruifen Zhao
Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310014, China
Jiangtao Ding, Qinhao Dong & Hongbin Cheng


Contributions

J.D. conceived and supervised the study. R.Z. and J.D. designed the overall framework. R.Z. implemented the NSD module and conducted data preprocessing. Q.D. implemented the MSS module and prepared Figures 1–3. H.C. set up the experiments, conducted the evaluations on CIC-IDS2017 and DoHBrw-2020, and prepared Figures 4–5 and the tables. R.Z., Q.D., and H.C. performed the experiments and analyzed the results. J.D. and R.Z. wrote the main manuscript text. All authors reviewed and approved the final manuscript. J.D. is the corresponding author.

Corresponding author

Correspondence to Jiangtao Ding.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Zhao, R., Ding, J., Dong, Q. et al. Mitigating label noise in network intrusion detection via graph-based sample selection and purification. Sci Rep (2026). https://doi.org/10.1038/s41598-026-45988-y


Received: 07 November 2025
Accepted: 23 March 2026
Published: 07 April 2026
DOI: https://doi.org/10.1038/s41598-026-45988-y
