Abstract
Machine learning has made notable progress in malicious traffic detection, yet its effectiveness depends heavily on training data that are both sufficiently large and reliably labeled. In practice, many datasets are produced by automated labeling pipelines, which inevitably introduce label noise and, in turn, undermine detection performance. Maintaining robust and generalizable detection under label noise has therefore become a central challenge in network intrusion detection. Existing approaches often emphasize intrinsic model robustness. However, noise can reshape the distribution of hard examples and bias the optimization objective, which may yield unstable decision boundaries and further degrade performance. In this paper, we propose \(\texttt{SilentSentinel}\), a data-centric relabeling framework comprising two components: Normal Sample Discovery (NSD) via graph propagation and Malicious Sample Screening (MSS) with dual networks. NSD proceeds in three steps: (1) confident-sample selection; (2) k-NN graph construction; and (3) label propagation. We first select high-confidence samples and assume their labels are correct, then build a graph over all samples and propagate labels from the confident subset to the full graph; samples that remain uncertain after propagation are forwarded to MSS for second-stage annotation. NSD aims to recover the majority of correctly labeled instances; these instances act as reliable anchors that guide MSS in labeling the remaining uncertain samples, thereby reducing label noise and stabilizing training. We evaluate \(\texttt{SilentSentinel}\) on CIC-IDS2017 and DoHBrw-2020. Under 40% label noise, \(\texttt{SilentSentinel}\) attains F1 scores of 0.81 and 0.98, respectively, corresponding to relative improvements of 17.39% and 11.36% over state-of-the-art baselines.
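The three NSD steps can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-sample `confidence` input (for example, a warm-up classifier's probability for the given label), the thresholds `tau` and `margin`, and the parameters `k` and `alpha` are all hypothetical. The propagation rule is a standard normalized-affinity iteration, swapped in as an assumption because the abstract does not specify the exact rule.

```python
import numpy as np


def knn_affinity(X, k):
    """Step 2: symmetric binary k-NN adjacency from Euclidean distances."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)            # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]      # k nearest neighbours per row
    W = np.zeros_like(d2)
    W[np.repeat(np.arange(len(X)), k), nn.ravel()] = 1.0
    return np.maximum(W, W.T)               # symmetrise the graph


def propagate(W, Y, alpha=0.9, n_iter=50):
    """Step 3: iterate F <- alpha * S @ F + (1 - alpha) * Y."""
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0                     # guard against isolated nodes
    S = W / np.sqrt(np.outer(deg, deg))     # D^{-1/2} W D^{-1/2}
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y
    return F


def nsd_sketch(X, noisy_labels, confidence, tau=0.9, k=5, margin=0.2,
               n_classes=2):
    """Return propagated labels and a mask of still-uncertain samples.

    `confidence`, `tau` and `margin` are assumptions made for illustration.
    """
    # Step 1: keep only high-confidence samples as seeds (labels trusted).
    seeds = np.flatnonzero(confidence >= tau)
    Y = np.zeros((len(X), n_classes))
    Y[seeds, noisy_labels[seeds]] = 1.0

    F = propagate(knn_affinity(X, k), Y)

    # Normalise scores per sample; samples that received no mass stay uniform.
    total = F.sum(axis=1, keepdims=True)
    P = np.divide(F, total, out=np.full_like(F, 1.0 / n_classes),
                  where=total > 0)

    # Samples whose top two class scores are close remain uncertain; in the
    # framework described above they would be forwarded to MSS.
    top2 = np.sort(P, axis=1)[:, -2:]
    uncertain = (top2[:, 1] - top2[:, 0]) < margin
    return P.argmax(axis=1), uncertain
```

Calling `nsd_sketch(X, noisy_labels, confidence)` on a feature matrix returns one label per sample together with a boolean mask; only the samples where the mask is `True` would need second-stage annotation. The dense pairwise-distance matrix keeps the sketch short but scales quadratically, so a dataset the size of CIC-IDS2017 would need an approximate nearest-neighbour index instead.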
Data availability
The datasets analysed in this study are publicly available from the Canadian Institute for Cybersecurity (CIC): CIC-IDS2017 (https://www.unb.ca/cic/datasets/ids-2017.html) and DoHBrw-2020 (https://www.unb.ca/cic/datasets/dohbrw-2020.html). For queries about data usage in this work, or to request ancillary materials (e.g., data splits or preprocessing scripts), please contact the corresponding author.
Funding
This work was supported in part by the Key Program of the Natural Science Foundation of Zhejiang Province under Grant LZ24F020007, in part by the National Natural Science Foundation of China under Grant 62072407, in part by the "Leading Goose Project Plan" of Zhejiang Province under Grants 2022C01086 and 2022C03139, in part by the National Key R&D Program of China under Grant 2022YFB2701400, and in part by the "Tianchi Talent" Distinguished Expert Program of Xinjiang Province.
Author information
Contributions
J.D. conceived and supervised the study. R.Z. and J.D. designed the overall framework. R.Z. implemented the NSD module and conducted data preprocessing. Q.D. implemented the MSS module and prepared Figures 1–3. H.C. set up the experiments, conducted the evaluations on CIC-IDS2017 and DoHBrw-2020, and prepared Figures 4–5 and the tables. R.Z., Q.D., and H.C. performed the experiments and analyzed the results. J.D. and R.Z. wrote the main manuscript text. All authors reviewed and approved the final manuscript. J.D. is the corresponding author.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhao, R., Ding, J., Dong, Q. et al. Mitigating label noise in network intrusion detection via graph-based sample selection and purification. Sci Rep (2026). https://doi.org/10.1038/s41598-026-45988-y
DOI: https://doi.org/10.1038/s41598-026-45988-y