Anomaly detection in encrypted network traffic using self-supervised learning

Sattar, Sadaf; Khan, Shumaila; Khan, Muhammad Ismail; Akhmediyarova, Ainur; Mamyrbayev, Orken; Kassymova, Dinara; Oralbekova, Dina; Alimkulova, Janna

doi:10.1038/s41598-025-08568-0


Article
Open access
Published: 22 July 2025


Sadaf Sattar¹,
Shumaila Khan²,
Muhammad Ismail Khan³,
Ainur Akhmediyarova⁴,
Orken Mamyrbayev⁵,
Dinara Kassymova⁶,
Dina Oralbekova⁵ &
…
Janna Alimkulova⁷

Scientific Reports volume 15, Article number: 26585 (2025)


Abstract

Encryption has strengthened privacy and security in network communication, but it has also made traditional anomaly detection methods ineffective because they depend on payload inspection. In this paper, we describe ET-SSL, a new approach to anomaly detection in encrypted traffic that uses self-supervised contrastive learning to learn informative representations from flow-level statistical features such as packet length, inter-arrival time, flow duration, and protocol metadata, detecting anomalies without labelled datasets or payload analysis. ET-SSL extends SSL-based traffic classification to improve detection performance while keeping computational complexity low by maximizing the separation between normal and anomalous traffic. On the CIC-Darknet2020, ISCX VPN-nonVPN, and UNSW-NB15 datasets, ET-SSL achieves 96.8 percent accuracy, a 92.7 percent true positive rate (TPR), and a 1.2 percent false positive rate (FPR), and performs real-time anomaly detection with 15–25 ms latency at processing speeds of up to 10 Gbps, which makes it suitable for high-speed and resource-constrained environments. Compared with existing methods, ET-SSL does not rely on labeled data, scales better, and detects zero-day attacks in dynamic network environments more effectively, serving as a paradigm for private and energy-efficient anomaly detection in encrypted traffic.


Introduction

As encryption protocols such as TLS, VPNs, and DNS over HTTPS (DoH) are more widely adopted, traditional network security techniques such as deep packet inspection (DPI) are no longer suitable¹. Encryption prevents payload-based analysis, so anomaly detection systems must rely on traffic flow characteristics to detect malicious activity². The problem is further complicated by advanced persistent threats (APTs), zero-day attacks, and traffic obfuscation techniques, which have made signature-based intrusion detection systems (IDSs) ineffective³.

Earlier anomaly detection mechanisms relied largely on character-level analysis, inspecting packet payloads and headers for suspicious patterns⁴. As the actual data content became encrypted, however, these models became less useful⁵. Non-character-level approaches have therefore been developed that use flow-level statistical features such as packet length distributions, inter-arrival times, flow durations, and protocol metadata to distinguish normal from anomalous network behaviour⁶.
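As a rough illustration of the flow-level features named above, the following sketch summarizes one flow from packet timestamps and lengths only, with no access to payloads. The `Packet` class, the `flow_features` function, and the chosen summary statistics are hypothetical names and choices made for this example; they are not taken from the ET-SSL implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Packet:
    """Hypothetical packet record: only metadata, never payload bytes."""
    timestamp: float  # capture time in seconds
    length: int       # packet size in bytes


def flow_features(packets, protocol="TCP"):
    """Summarize one flow with simple statistical features (illustrative only)."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    pkts = sorted(packets, key=lambda p: p.timestamp)
    lengths = [p.length for p in pkts]
    # Inter-arrival times: gaps between consecutive packets of the same flow.
    iats = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    return {
        "packet_count": len(pkts),
        "mean_length": mean(lengths),
        "std_length": pstdev(lengths),
        "mean_iat": mean(iats) if iats else 0.0,
        "std_iat": pstdev(iats) if iats else 0.0,
        "duration": pkts[-1].timestamp - pkts[0].timestamp,
        "protocol": protocol,
    }


# Made-up four-packet flow used purely as a usage example.
example = flow_features(
    [Packet(0.00, 60), Packet(0.01, 1500), Packet(0.03, 1500), Packet(0.06, 40)]
)
```

Every value in the resulting dictionary is computable from encrypted traffic, which is why features of this kind remain usable when payload inspection is not.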

Existing anomaly detection models can be grouped into three main approaches:

1. Signature-based and rule-based methods: Snort and Suricata are traditional IDSs that depend on predefined attack signatures⁷. Such methods cannot handle unknown or zero-day attacks, and they generalize poorly to changing network threats.
2. Supervised learning-based models: Traffic anomaly detection has been studied with deep learning approaches such as CNNs and RNNs⁸. However, these models require large amounts of labeled data, which are very difficult to collect for encrypted traffic because ground truth for attack patterns is not available.
3. Unsupervised learning methods: Techniques such as autoencoders, clustering, and statistical anomaly detection have been proposed to analyze traffic without labeled data⁹. Unfortunately, these models suffer from very high false positive rates, which makes reliable detection difficult.
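The false positive rate mentioned above, along with the accuracy and true positive rate used later in the paper, follows the standard confusion-matrix definitions. The sketch below computes them; the function name `detection_rates` and the counts passed to it are hypothetical and are not results from the paper.

```python
def detection_rates(tp, fp, tn, fn):
    """Standard rates from confusion-matrix counts (anomalous = positive class)."""
    return {
        # Share of all flows that were classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # True positive rate: share of anomalous flows that were flagged.
        "tpr": tp / (tp + fn),
        # False positive rate: share of normal flows that were wrongly flagged.
        "fpr": fp / (fp + tn),
    }


# Made-up counts for illustration: 100 anomalous flows, 1000 normal flows.
rates = detection_rates(tp=90, fp=20, tn=980, fn=10)
```

Because normal flows usually far outnumber anomalous ones, even a small FPR can translate into many false alarms in absolute terms, which is why it is reported alongside accuracy.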

To address these challenges, this paper proposes Encrypted Traffic Anomaly Detection using Self-supervised Contrastive Learning (ET-SSL), a novel framework that detects anomalies in encrypted traffic without requiring labeled datasets or payload analysis. Unlike traditional approaches, ET-SSL uses contrastive learning to learn statistically and temporally meaningful traffic features (packet sizes, inter-arrival times, etc.) that separate normal from malicious traffic patterns. Because it is self-supervised, ET-SSL detects zero-day attacks without depending on labeled datasets, while providing a nearly 5× increase over prior work in dynamic network environments. Figure 1 shows anomaly traffic detection based on feature fluctuation for the secure industrial internet of things.

The main contributions of the paper are as follows:

1.
A novel contrastive learning based anomaly detection framework that extracts feature representations from encrypted traffic without inspecting payloads.
2.
An efficient, scalable, and real-time anomaly detection system, capable of processing 10 Gbps of network traffic with a latency of 15–25 ms.
3.
A self-supervised learning model that eliminates the need for labeled training data, improving scalability and adaptability to evolving network threats.

The remainder of this paper is structured as follows: Section "Background" discusses related work and existing anomaly detection methods, Section "Methodology" details the ET-SSL methodology, Section "Results and discussion" presents experimental results, and Section "Conclusion" concludes with insights and future research directions.

Background

With the growing adoption of encryption protocols such as TLS, VPNs, and DNS-over-HTTPS, traditional network security techniques that rely on deep packet inspection (DPI) have become ineffective. As encryption hides the content of network packets, anomaly detection methods have shifted from payload-based analysis to flow-level statistical analysis¹⁰. This overview of the evolution of anomaly detection in encrypted traffic is organized into four major research directions: rule-based methods, supervised learning methods, unsupervised learning techniques, and self-supervised learning methods. We also discuss the limitations of existing frameworks and the motivation for the proposed ET-SSL framework.

Rule-based and signature-based detection, as implemented in intrusion detection systems (IDSs) such as Snort and Suricata, was among the earliest network security methods¹¹. These systems analyzed packet payloads, protocol headers, and known attack signatures to detect malicious activity. With the ever-increasing volume of encrypted traffic, however, they stopped working because payloads were no longer accessible for inspection. They also could not generalize to unknown attack signatures, which made them unsuitable for defending against zero-day and evolving attacks¹².

To overcome the disadvantages of rule-based methods, researchers turned to supervised machine learning models, including Support Vector Machines (SVMs), Random Forests (RFs), and Deep Neural Networks (DNNs), trained on labelled network traffic datasets¹³. These methods classify traffic as normal or anomalous using packet lengths, inter-arrival times, flow durations, and other protocol metadata¹⁴.

A hybrid SVM + DNN model was used in¹⁵ to detect anomalies in encrypted traffic with very high accuracy. Similarly,¹⁶ used CNNs as feature extractors, which improved detection accuracy at the price of a high computational burden. However, supervised approaches rely on large labeled datasets, which are difficult to obtain in encrypted environments because labeled attack traffic is scarce. In addition, these models fail to detect zero-day attacks because they rely on patterns learnt from past attacks¹⁷.

To overcome the lack of labeled data, researchers introduced unsupervised learning methods such as autoencoders, k-means clustering, and variational autoencoders (VAEs). These models learn normal traffic patterns and detect anomalies as deviations from them¹⁸. Unsupervised clustering was successfully applied in¹⁹ to identify outliers in encrypted traffic. In²⁰, an autoencoder model was used for anomaly detection and could effectively extract complex encrypted traffic patterns. Nevertheless, unsupervised methods often have high false positive rates because they cannot distinguish unusual but legitimate traffic patterns from actual malicious activity²¹.

Although several self-supervised learning methods exist, such as autoencoders and variational autoencoders (VAEs)²², they primarily focus on feature reconstruction rather than explicit anomaly separation. Autoencoders detect anomalies based on reconstruction error, which often leads to high false positive rates in complex encrypted traffic. Similarly, VAEs rely on probabilistic reconstructions, which are effective in capturing normal traffic distributions but struggle to differentiate novel attacks from minor deviations in normal behavior.

Contrastive learning, on the other hand, learns an embedding space where normal traffic samples are clustered closely while anomalies are pushed apart, enabling better feature separability²³. Unlike autoencoders and clustering-based approaches, contrastive learning does not rely on predefined thresholding mechanisms, reducing false positives while improving zero-day attack detection. Furthermore, contrastive learning inherently captures both spatial and temporal relationships in network traffic, making it more robust for real-time anomaly detection in encrypted traffic environments. Given these advantages, ET-SSL leverages contrastive learning to enhance detection accuracy while ensuring adaptability to evolving threats.
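To make the idea of pulling similar samples together and pushing dissimilar ones apart more concrete, the sketch below computes a generic InfoNCE-style contrastive loss for a single anchor. This section does not specify the exact loss used by ET-SSL, so the loss form, the `temperature` value, and the toy two-dimensional embeddings are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE-style loss for one anchor (assumed form, not the paper's).

    The loss is small when the anchor is similar to its positive and
    dissimilar to every negative.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings: two views that lie close together and one that lies far away.
anchor = [1.0, 0.0]
close_view = [0.9, 0.1]
far_view = [-1.0, 0.2]

# Treating the nearby view as the positive gives a low loss ...
loss_good = info_nce(anchor, close_view, [far_view])
# ... while treating the distant view as the positive gives a high loss.
loss_bad = info_nce(anchor, far_view, [close_view])
```

Minimizing a loss of this shape over many flows is what drives embeddings of similar traffic together and embeddings of dissimilar traffic apart.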

Recent work in self-supervised and contrastive learning has made significant progress toward anomaly detection for encrypted traffic. Because self-supervised models learn traffic representations from unlabeled data, they are better suited to real encrypted network conditions than supervised methods²².

The work in²³ introduced a contrastive learning based anomaly detection model that was effective at distinguishing normal from malware traffic flows. In²⁴, variational autoencoders (VAEs) were used to model the distributions of encrypted traffic, improving anomaly detection without labelled datasets. Despite this progress, existing self-supervised methods still struggle with real-time processing and scalability, and thus are not easily deployed in high-speed networks²⁵.

Although self-supervised learning has significantly improved anomaly detection in encrypted network traffic, many existing techniques still incur computational overhead, use inefficient feature extraction, and separate anomalies suboptimally²⁶. Supervised models are very accurate, but obtaining large labelled datasets is difficult in environments where anomalies are rare and seldom recognised²⁷. Unsupervised models remove the requirement for labeled data, but tend to flag normal traffic as anomalous, and the resulting high false positive rates reduce their reliability in real-world deployments²⁸. Moreover, many existing approaches cannot process high-speed network traffic efficiently, so they cannot be used in real-time security applications²⁹. Recent advancements in self-supervised pretraining have shown promise in improving anomaly detection accuracy in network traffic³⁰. A comprehensive survey by³¹ highlights the state of the art in deep learning for encrypted traffic classification, underscoring the need for innovative approaches like ET-SSL.

To address these issues, we introduce ET-SSL (Encrypted Traffic Anomaly Detection with Self-Supervised Contrastive Learning), which uses contrastive learning to improve the separation of anomalies. By clustering normal traffic together and pushing anomalous traffic apart, ET-SSL improves detection accuracy. The system can process 10 Gbps of encrypted traffic with 15–25 ms latency, which is sufficient for real-time security applications at scale. In addition, its self-supervised approach requires no labeled data and is therefore adaptable to zero-day threats. Table 1 compares ET-SSL with existing methods and summarizes its advantages.
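One simple way to exploit an embedding space in which normal traffic is tightly clustered is to score each flow by its distance from the centre of the normal cluster. The sketch below illustrates that idea only; the centroid-distance score, the function names, and the toy embeddings are assumptions for this example, and this section does not state that ET-SSL scores flows this way.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def anomaly_score(embedding, center):
    """Euclidean distance from the normal-traffic centroid (larger = more unusual)."""
    return math.dist(embedding, center)


# Toy embeddings of flows assumed to be normal; they sit close together.
normal_embeddings = [[0.9, 0.1], [1.0, 0.0], [1.1, -0.1]]
center = centroid(normal_embeddings)

# A flow embedded near the cluster scores low; one far away scores high.
score_normal = anomaly_score([1.0, 0.05], center)
score_anomalous = anomaly_score([-0.8, 0.9], center)
```

The better the encoder separates the two kinds of traffic, the larger the gap between such scores, which is the property the contrastive objective is meant to encourage.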

Table 1 Comparative analysis of recent anomaly detection approaches in encrypted network traffic.
