Abstract
The application of the \(\Phi\)-OTDR (Phase-Optical Time Domain Reflectometry) system in real-time monitoring of power grid infrastructure has proven effective in identifying and classifying various anomalies, such as digging, watering, and shaking. However, previous deep learning-based methods for \(\Phi\)-OTDR event classification are primarily designed for balanced classification problems, where the numbers of abnormal and normal event samples are roughly equal. In practical scenarios, the data for abnormal events are often far scarcer than those for normal events (noise), resulting in a long-tailed distribution that poses significant challenges for accurate classification. To address this long-tailed imbalance in the practical application of \(\Phi\)-OTDR data, we introduce the Controllable Diffusion (ConDiff) framework, which generates high-quality synthetic samples for abnormal situations. The ConDiff framework is composed of three essential components: the Feedback-guided \(\Phi\)-OTDR Augmenter, the High-Quality Sample Selection module, and the Dynamic Threshold Adjustment module. The Feedback-guided \(\Phi\)-OTDR Augmenter utilizes a diffusion model to generate synthetic samples that simulate abnormal events. The High-Quality Sample Selection module evaluates the quality of the generated synthetic samples and retains only high-quality ones. The Dynamic Threshold Adjustment module provides real-time feedback to dynamically control the sample generation process of the Feedback-guided \(\Phi\)-OTDR Augmenter. Compared to current state-of-the-art baselines, our proposed ConDiff framework achieves a notable improvement in classification accuracy, ranging from 3.7% to 7.2% on the BJTU-OTDR-LT dataset. This improvement demonstrates the effectiveness of the proposed ConDiff framework in addressing the long-tailed imbalance problem in \(\Phi\)-OTDR event classification. The code will be released upon acceptance.
Introduction
Phase-Sensitive Optical Time-Domain Reflectometry (\(\Phi\)-OTDR) is a high-sensitivity distributed fiber-optic sensing technology that detects minute vibrations and dynamic disturbances through phase changes of backscattered Rayleigh light. Leveraging narrow-linewidth lasers and coherent detection, it offers robust electromagnetic interference immunity, concealment, and precise localization. Widely applied in pipeline protection, intrusion detection, and structural health monitoring, \(\Phi\)-OTDR enables real-time monitoring of power grid infrastructure by identifying anomalies such as digging, watering, and shaking, thereby enhancing grid safety and maintenance efficiency.
In the field of \(\Phi\)-OTDR event classification, deep learning-based methods are widely used. Zhao et al.1 applied convolutional neural networks (CNNs) for event classification and vibration-frequency measurement, leveraging Markov Transition Fields for enhanced feature extraction. Shi et al.2 combined CNN with support vector machine (SVM) to improve classification accuracy by using CNN for feature extraction and SVM for optimized decision boundaries. Cao et al.3 introduced the first open \(\Phi\)-OTDR dataset with baseline models like SVM and CNN, providing a benchmark for model evaluation. Tian et al.4 proposed an attention-based temporal convolutional network (TCN) incorporating channel attention and bidirectional long short-term memory (LSTM) to enhance feature learning. Wang et al.5 developed a hybrid CNN-LSTM model for \(\Phi\)-OTDR pattern recognition, effectively identifying six vibration types with time-series features enhanced by discrete wavelet transform (DWT) and short-time Fourier transform (STFT).
Although these methods have achieved promising results, they are all designed for the balanced \(\Phi\)-OTDR event classification problem. In practical applications, however, abnormal-event data are far scarcer than normal-event data. How to handle this long-tailed imbalance, where abnormal events occur much less frequently than normal events, is something that previous methods have not explored.
To better address the long-tailed imbalance problem in the practical use of \(\Phi\)-OTDR data (i.e., the occurrence of abnormal situations is much less frequent than that of normal situations), we propose the Controllable Diffusion framework to generate samples for abnormal situations. It includes the Feedback-guided \(\Phi\)-OTDR Augmenter, the High-Quality Sample Selection module, and the Dynamic Threshold Adjustment module. In the Feedback-guided \(\Phi\)-OTDR Augmenter, we use a diffusion model to generate synthetic samples for abnormal situations. We also propose the High-Quality Sample Selection module to evaluate the generated synthetic samples and discard samples with low confidence. Moreover, we introduce the Dynamic Threshold Adjustment module to provide feedback to the Feedback-guided \(\Phi\)-OTDR Augmenter, dynamically controlling the sample generation process for different classes.
To evaluate the effectiveness of the proposed Controllable Diffusion framework, we perform experiments on a new benchmark, BJTU-OTDR-LT, which keeps only 10% of the training samples of each abnormal event class (every class except ’Noise’) from BJTU-OTDR3 to reproduce the long-tailed setting. Our proposed Controllable Diffusion framework achieves a \(3.7\%-7.2\%\) improvement in accuracy on the BJTU-OTDR-LT dataset for OTDR event classification compared to current baselines. With some baselines, it even reaches accuracy close to that obtained by training with 100% of the training data. Additionally, ablation studies demonstrate the contribution of each component of the Controllable Diffusion framework to its overall effectiveness.
The main contributions are summarized as follows:
-
To better address the long-tailed imbalance problem in \(\Phi\)-OTDR event classification, we propose the Controllable Diffusion framework to generate high-quality synthetic samples for abnormal situations.
-
To dynamically control the diffusion process, we propose the Dynamic Threshold Adjustment module, which provides feedback to the Feedback-guided \(\Phi\)-OTDR Augmenter. In addition, we propose the High-Quality Sample Selection module to select better generated synthetic samples for training.
-
Our proposed Controllable Diffusion framework can be applied to existing baselines, achieving a \(3.7\%-7.2\%\) improvement in accuracy on the BJTU-OTDR-LT dataset for long-tailed OTDR event classification.
Background and related work
\(\Phi\)-OTDR event classification
The Phase Sensitive Optical Time Domain Reflectometer (\(\Phi\)-OTDR) is a technology that detects backward Rayleigh-scattered light from optical pulses in fibers to sense disturbances. It uses a narrow-band laser and an acousto-optic modulator (AOM) to create coherent pulses, which are amplified by an Erbium-Doped Fiber Amplifier (EDFA) and sent into the sensing fiber. The scattered light is received by a photodetector via a circulator, and the data is stored. When a disturbance occurs in the sensing fiber, it causes a change in the phase of the signal light, which in turn modifies the intensity of the detected light. By analyzing these intensity changes, the location and type of disturbances can be determined.
Several studies have explored \(\Phi\)-OTDR event classification using deep learning. Zhao et al. 1 applied CNNs with Markov Transition Fields for feature enhancement. Kandamali et al. 6 reviewed machine learning techniques for event identification. Shi et al. 2 combined CNN and SVM to improve classification accuracy, while Cao et al. 3 introduced the first open \(\Phi\)-OTDR dataset with baseline models. Shi et al. 7 proposed a CNN-based event recognition method achieving 96.67% accuracy. Tian et al. 4 developed an attention-based temporal convolutional network (TCN), integrating channel attention and bidirectional LSTM. Wang et al. 5 introduced a CNN-LSTM model utilizing discrete wavelet transform (DWT) and short-time Fourier transform (STFT) for enhanced time-series feature extraction. Jiang et al. 8 proposed a method based on Vision Transformer (ViT), leveraging self-attention for global feature extraction.
While these methods improve classification accuracy, challenges remain in handling data complexity, noise, and diverse disturbances. In this study, we enhance CNN-based and hybrid models with advanced data augmentation and semi-supervised learning, improving generalization under limited labeled data. By integrating temporal and spectral features, our approach achieves robust event discrimination in complex environments.
Data augmentation
Data augmentation is a key technique in deep learning, enhancing data diversity and model robustness. Mumuni et al. 9 and Maharana et al. 10 surveyed modern augmentation methods, including GANs, meta-learning, and geometric transformations. Yang et al. 11 and Wen et al. 12 reviewed augmentation strategies for image and time-series data, respectively, highlighting their impact on generalization. Summers et al. 13 proposed a mixed-example technique, while Fawzi et al. 14 introduced adaptive augmentation for robust classification. Chen et al. 15 evaluated augmentation in Natural language processing (NLP), and Rebuffi et al. 16 demonstrated its role in mitigating overfitting. Lemley et al. 17 proposed Smart Augmentation to optimize data transformations.
While prior research focused on image-based augmentation, we extend these methods to \(\Phi\)-OTDR time-series data using a diffusion model. Our approach integrates random flipping and noise perturbation to simulate realistic signal variations and environmental disturbances, while adaptively controlling class-wise sample generation through periodic feedback, effectively balancing class distribution and improving generalization.
Semi-supervised learning
Semi-supervised learning (SSL) effectively leverages unlabeled data to enhance model performance. Hady and Schwenker 18 reviewed SSL methods, while Berthelot et al. 19 introduced MixMatch, combining consistency regularization and data augmentation. Ouali et al. 20 surveyed deep SSL techniques, and Laine and Aila 21 proposed Temporal Ensembling to exploit temporal correlations. Mallapragada et al. 22 developed SemiBoost, integrating boosting with SSL. Zheng et al. 23 presented SimMatch, leveraging similarity matching. Yang et al. 24 and Chen et al. 25 explored SSL for large-scale models, while Sohn et al. 26 introduced FixMatch, combining consistency regularization with pseudo-labeling. Veselý et al. 27 applied SSL to speech recognition using confidence-based self-training.
Despite SSL advancements, most methods focus on images and text. We extend SSL to \(\Phi\)-OTDR event classification, addressing temporal and noisy data challenges. Using MixMatch and self-training, we improve classification of unlabeled samples. Our dynamic threshold adjustment further refines sample selection, enhancing both performance and robustness.
Methodology
Fig. 1. Overview of the proposed Controllable Diffusion (ConDiff) framework for imbalanced \(\Phi\)-OTDR event classification. The \(\Phi\)-OTDR system detects phase variations in backscattered light caused by external disturbances. ConDiff uses a diffusion-based generator to create synthetic samples for minority events, which are refined through the High-Quality Sample Selection (HQSS) and Dynamic Threshold Adjustment (DTA) modules under adaptive feedback control \(\textbf{r}\).
Controllable diffusion framework
In this work, we propose the Controllable Diffusion framework (ConDiff) to address the class imbalance problem in a six-class dataset. As illustrated in Fig. 1, ConDiff employs a \(\Phi\)-OTDR-based diffusion model to generate additional samples for minority classes, while ensuring that well-represented classes remain unchanged. As outlined in Algorithm 1, the framework integrates two key modules–High-Quality Sample Selection (HQSS) and Dynamic Threshold Adjustment (DTA)–to iteratively refine both the training data and the generation process. Specifically, HQSS filters out unreliable or low-confidence samples from both the real and generated datasets, retaining only those with high confidence scores to ensure stable training. The DTA module dynamically adjusts class-wise thresholds \(\varvec{\tau }\) according to model feedback, promoting balanced learning among classes as training progresses. Meanwhile, a feedback vector \(\textbf{r}\) guides the generation of new synthetic samples for minority event types, preventing excessive or redundant augmentation. This closed-loop mechanism allows ConDiff to continuously improve data quality and class balance across epochs. By combining adaptive thresholding, selective sampling, and feedback-guided generation, the framework achieves high-quality augmentation and enhanced model generalization.
Feedback-guided \(\Phi\)-OTDR augmenter
The Feedback-guided \(\Phi\)-OTDR Augmenter utilizes a diffusion process to generate new samples. The model is trained to predict noise accurately, and once trained, it generates synthetic samples for the minority classes based on feedback received from the Controllable Diffusion framework. This feedback dictates the number of additional samples to be generated for each class.
Training Stage for Feedback-guided \(\Phi\)-OTDR Augmenter
Feedback-guided \(\Phi\)-OTDR augmenter training process
To address the class imbalance in the BJTU-OTDR-LT dataset, the training stage of the Feedback-guided \(\Phi\)-OTDR Augmenter focuses exclusively on the five minority event types: ’Digging’, ’Knocking’, ’Watering’, ’Shaking’, and ’Walking’. The ’Noise’ class contains sufficient samples and does not require augmentation. By concentrating on these minority classes, the model learns the noise patterns and underlying spatiotemporal features specific to these events.
The overall training procedure is summarized in Algorithm 2. The training objective is to optimize a noise predictor network \(\epsilon _\theta\) within the diffusion model, enabling it to accurately estimate the noise component at different noise levels. For each original sample \(x_0\) drawn from the true data distribution \(q(x_0)\), a noise vector \(\epsilon\) is sampled from a standard normal distribution \(\textbf{N}(0,I)\). A noisy version of the input is then constructed as

$$x_t = \sqrt{\bar{\alpha }_t}\, x_0 + \sqrt{1 - \bar{\alpha }_t}\, \epsilon ,$$
where the subscript \(t\) indicates the diffusion time step. The parameter \(\bar{\alpha }_t = \cos ^2(\frac{\pi }{2} t)\) controls the noise intensity at each time step, allowing a smooth progression from low to high noise levels. The model receives the perturbed sample \(\sqrt{\bar{\alpha }_t} x_0 + \sqrt{1 - \bar{\alpha }_t} \epsilon\) along with the time step \(t\) as input and predicts the corresponding noise. The loss function is defined as the mean squared error between the predicted noise and the true noise,

$$L(\theta ) = \mathbb {E}_{x_0, \epsilon , t}\left[ \left\| \epsilon - \epsilon _\theta \left( \sqrt{\bar{\alpha }_t}\, x_0 + \sqrt{1 - \bar{\alpha }_t}\, \epsilon ,\; t \right) \right\| ^2 \right] .$$
By minimizing this loss over all sampled time steps and inputs, the model learns to capture the spatiotemporal structure of the minority classes effectively.
The training process proceeds iteratively as follows. For each iteration, an original sample \(x_0\) is drawn from the minority classes, and a diffusion time step \(t\) is sampled from a low-discrepancy Sobol sequence to ensure uniform coverage of the noise schedule. The noise schedule \(\bar{\alpha }_t\) is computed, and a standard Gaussian noise vector \(\epsilon\) is sampled. The noisy input \(x_t\) is then computed, and the network predicts the noise corresponding to this perturbed sample. The loss is calculated as the squared difference between the predicted and true noise, and the network parameters \(\theta\) are updated via gradient descent. This procedure repeats until convergence or a maximum number of iterations is reached.
The use of a Sobol sequence rather than purely random sampling ensures that the model experiences a uniform distribution of noise levels, avoiding underfitting at certain time steps. The cosine-based noise schedule allows a smooth transition from low to high noise, which stabilizes training and improves the model’s ability to generate realistic synthetic samples. Through this process, the augmenter learns robust noise predictions and effectively captures the distinctive spatiotemporal features of the minority classes, providing a solid foundation for generating additional high-quality data.
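The training loop described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the cosine schedule and the noisy-input construction follow the text, while the base-2 van der Corput sequence is used here as a simple low-discrepancy stand-in for Sobol sampling of the time steps.

```python
import numpy as np

def alpha_bar(t):
    """Cosine noise schedule from the text: alpha_bar(t) = cos^2(pi/2 * t)."""
    return np.cos(0.5 * np.pi * t) ** 2

def van_der_corput(n):
    """Base-2 radical-inverse sequence in (0, 1): a minimal low-discrepancy
    stand-in for Sobol sampling of diffusion time steps."""
    out = []
    for i in range(1, n + 1):
        q, denom, x = i, 1.0, 0.0
        while q:
            denom *= 2.0
            q, r = divmod(q, 2)
            x += r / denom
        out.append(x)
    return np.array(out)

def noisy_sample(x0, t, rng):
    """Forward diffusion: x_t = sqrt(a)*x0 + sqrt(1-a)*eps with a = alpha_bar(t)."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar(t)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 12))   # toy minority-class trace (shape is illustrative)
for t in van_der_corput(8):          # uniformly covered noise levels
    x_t, eps = noisy_sample(x0, t, rng)
    # here a noise-predictor network would regress eps from (x_t, t)
```

In the actual framework, the predicted and true noise would then be compared with an MSE loss and \(\theta\) updated by gradient descent.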
Feedback-guided \(\Phi\)-OTDR Augmenter (FGA)
Feedback-guided diffusion process
After the training stage, the Feedback-guided \(\Phi\)-OTDR Augmenter enters the generation phase, where it produces synthetic samples to compensate for class imbalance. The number of additional samples required for each class is determined by the feedback variable \(\textbf{r}[c]\), which originates from the Controllable Diffusion framework. This variable dictates how many new instances the augmenter should generate per class, thereby ensuring that minority event types receive proportionally more samples.
The feedback-guided generation procedure is summarized in Algorithm 3. Guided by the feedback vector \(\textbf{r}[c]\), the generation process begins by initializing a random noise tensor \(\textbf{u}_S^c \sim \textbf{N}(0, I)\), where \(S\) denotes the total number of diffusion steps. This initialization is performed only for classes where \(\textbf{r}[c] > 0\), ensuring that additional samples are generated exclusively for minority event types, while well-represented classes such as ’Noise’ are excluded from the synthesis process. The random noise \(\textbf{u}_S^c\) serves as the starting point of the denoising trajectory. A cosine-based noise schedule is then applied to control the noise removal rate during the iterative process. Specifically, for each step \(s\), two coefficients are defined as

$$\alpha _s = \cos \left( \frac{\pi }{2}\, \textbf{t}[s] \right) , \qquad \sigma _s = \sin \left( \frac{\pi }{2}\, \textbf{t}[s] \right) ,$$
where \(\textbf{t}\) is a linear sequence from 1 to 0, dividing the denoising process into uniformly spaced steps. The coefficients \(\alpha _s\) and \(\sigma _s\) respectively control the contributions of the signal and the noise components at each step, ensuring a smooth and stable reduction of noise.
At every step \(s\), the model predicts the noise component \(\textbf{v}\) using the learned noise estimation function \(\epsilon _\theta\), which takes the current noisy sample \(\textbf{u}_s^c\), the timestep \(s\), and the class label \(c\) as input:

$$\textbf{v} = \epsilon _\theta (\textbf{u}_s^c,\, s,\, c) .$$
This predicted noise allows the model to estimate the corresponding clean signal at step \(s\), given by

$$\hat{\textbf{x}}_s^c = \alpha [s]\, \textbf{u}_s^c - \sigma [s]\, \textbf{v} .$$
Here, the subscript \(s\) indicates the current step in the reverse diffusion process. As \(s\) decreases from \(S\) to 1, the sample gradually evolves from pure Gaussian noise toward a realistic synthetic signal. The term \(\alpha [s]\) scales the signal component, while \(\sigma [s]\) scales the estimated noise, ensuring that both are properly balanced throughout the iterative refinement.
In addition, a residual noise term \(\varvec{\epsilon }\) is computed as

$$\varvec{\epsilon } = \sigma [s]\, \textbf{u}_s^c + \alpha [s]\, \textbf{v} ,$$
which captures the stochastic variability in the reconstruction and enables smoother transitions between consecutive diffusion steps. This residual serves as a correction term for the next denoising iteration, ensuring temporal consistency and realistic sample trajectories.
When \(s > 1\), the model updates the latent variable \(\textbf{u}_{s-1}^c\) using two adjusted variance terms that control the degree of stochasticity in the denoising process. The first term, \(\sigma _{\text {ddim}}\), represents the controllable noise variance defined as

$$\sigma _{\text {ddim}} = \eta \, \frac{\sigma [s-1]}{\sigma [s]} \sqrt{1 - \frac{\alpha [s]^2}{\alpha [s-1]^2}} ,$$
where \(\eta \in [0,1]\) determines the trade-off between deterministic and stochastic sampling. When \(\eta = 0\), the sampling process becomes deterministic, while higher values of \(\eta\) reintroduce controlled randomness. The adjusted variance for the next step is then computed as

$$\sigma _{\text {adj}} = \sqrt{\sigma [s-1]^2 - \sigma _{\text {ddim}}^2} .$$
The predicted clean signal is combined with the residual noise to obtain the sample for the next iteration:

$$\textbf{u}_{s-1}^c = \alpha [s-1]\, \hat{\textbf{x}}_s^c + \sigma _{\text {adj}}\, \varvec{\epsilon } .$$
If the noise scale \(\eta > 0\), an additional Gaussian perturbation is injected to maintain stochasticity in the generative process:

$$\textbf{u}_{s-1}^c \leftarrow \textbf{u}_{s-1}^c + \sigma _{\text {ddim}}\, \textbf{z}, \quad \textbf{z} \sim \textbf{N}(0, I) .$$
This iterative denoising continues until \(s = 1\), where the sample \(\textbf{u}_{1}^c\) represents a fully generated synthetic signal for class \(c\). The entire process is thus dynamically guided by the feedback parameter \(\textbf{r}[c]\), which specifies the quantity of data required for each class. Through this feedback-guided mechanism, the augmenter adaptively balances data generation across classes–producing more samples for rare events while maintaining diversity and fidelity through stochastic noise control. As a result, the Feedback-guided \(\Phi\)-OTDR Augmenter effectively reconstructs high-quality synthetic \(\Phi\)-OTDR traces that mirror the temporal and spectral characteristics of the original minority-class signals.
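The iterative denoising above can be sketched as a DDIM-style reverse loop under the cosine schedule. This is a hedged illustration rather than the authors' code: `predict_v` is a hypothetical stand-in for the trained class-conditional predictor, and the update arithmetic is a common DDIM form consistent with the schedule described in the text.

```python
import numpy as np

def ddim_generate(predict_v, shape, S=50, eta=0.0, seed=0):
    """DDIM-style reverse loop under the cosine schedule. predict_v(u, s)
    stands in for the learned class-conditional predictor; eta = 0 gives
    deterministic sampling, eta > 0 reintroduces controlled randomness."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, S + 1)       # index s maps to time s/S
    alpha = np.cos(0.5 * np.pi * t)        # signal coefficients
    sigma = np.sin(0.5 * np.pi * t)        # noise coefficients
    u = rng.standard_normal(shape)         # start from pure noise, u_S ~ N(0, I)
    for s in range(S, 0, -1):
        v = predict_v(u, s)
        x_hat = alpha[s] * u - sigma[s] * v        # clean-signal estimate
        eps = sigma[s] * u + alpha[s] * v          # residual noise term
        sig_ddim = eta * (sigma[s - 1] / sigma[s]) * np.sqrt(
            max(1.0 - alpha[s] ** 2 / max(alpha[s - 1] ** 2, 1e-12), 0.0))
        sig_adj = np.sqrt(max(sigma[s - 1] ** 2 - sig_ddim ** 2, 0.0))
        u = alpha[s - 1] * x_hat + sig_adj * eps
        if eta > 0:                        # optional stochastic perturbation
            u = u + sig_ddim * rng.standard_normal(shape)
    return u

# toy predictor, only to exercise the loop; a real one would be the trained network
sample = ddim_generate(lambda u, s: u, shape=(4, 3), S=20, eta=0.5)
```

In the framework, this loop would be run \(\textbf{r}[c]\) times per minority class \(c\) to produce the requested number of synthetic traces.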
High-Quality Sample Selection (HQSS) Module
High-quality sample selection module
To ensure the reliability of generated synthetic samples, we adopt a robust selection strategy that integrates confidence-based filtering with an advanced augmentation mechanism, as shown in Algorithm 4. This process enhances the quality of training data, improving model robustness and generalization.
To simulate realistic \(\Phi\)-OTDR signal variations and environmental disturbances, we incorporate two augmentation operations–random flipping and noise perturbation. Random flipping reverses the temporal order of a signal segment, mimicking vibration propagation from opposite directions, while noise perturbation introduces Gaussian noise to emulate stochastic phase fluctuations caused by environmental and system noise. These operations enrich the diversity of the training data and improve the model’s ability to generalize under complex real-world conditions.
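The two operations can be implustrated in a few lines of NumPy; note the flip probability of 0.5 and the noise scale are illustrative choices, not values reported in the paper.

```python
import numpy as np

def random_flip(x, rng, p=0.5):
    """Reverse the temporal order (axis 0) with probability p, mimicking
    vibration propagation from the opposite direction."""
    return x[::-1].copy() if rng.random() < p else x

def add_noise(x, rng, scale=0.01):
    """Add Gaussian noise emulating stochastic phase fluctuations
    from environmental and system noise."""
    return x + scale * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 12))          # toy trace: time steps x channels
aug = add_noise(random_flip(x, rng), rng)   # one augmented view
```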
For each batch, both labeled and generated synthetic samples undergo augmentation. Specifically, for each generated synthetic sample, K augmented versions are generated by adding noise and applying random flipping. The model produces predictions for all K versions, and their average serves as the confidence estimate:

$$\textbf{m}_i = \frac{1}{K} \sum _{k=1}^{K} P_{\text {model}}(y \mid \hat{u}_{i,k};\, \theta ) ,$$
where \(P_{\text {model}}(y|\hat{u}_{i,k}; \theta )\) is the model’s prediction for the \(k\)-th augmented version of sample \(i\).
To refine these predictions, a sharpening operation is applied:

$$\textbf{q}_i[c] = \frac{\textbf{m}_i[c]^{1/T}}{\sum _{j=1}^{N} \textbf{m}_i[j]^{1/T}} ,$$
where T is a temperature parameter controlling the sharpness of the probability distribution.
To ensure the reliability of generated synthetic samples, we select high-confidence samples based on a thresholding strategy. Specifically, for each generated synthetic sample, we first determine its predicted class \(c^* = \arg \max (\textbf{m}_i)\) and simultaneously obtain the pseudo-label class \(c' = \arg \max (\textbf{q}_i)\).
A sample is considered high confidence and included in the set \(\textbf{S}\) if \(c^* = c'\) and \(\textbf{m}[c^*] > \varvec{\tau }[c^*]\), in which case we update the set as \(\textbf{S} \leftarrow \textbf{S} \cup \{(\hat{\textbf{u}}_{i,1}, \textbf{m}_i),\dots ,(\hat{\textbf{u}}_{i,k}, \textbf{m}_i)\}\).
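The averaging, sharpening, and thresholding steps can be sketched as follows. This is a simplified reading of the HQSS rule in which the pseudo-label class is taken to be the class the sample was generated for (an assumption); the function names are illustrative.

```python
import numpy as np

def sharpen(m, T=0.5):
    """Temperature sharpening: raise each probability to 1/T and renormalize."""
    p = m ** (1.0 / T)
    return p / p.sum()

def select_high_confidence(probs_per_aug, target_class, tau):
    """probs_per_aug: (K, N) model predictions over K augmented views of one
    synthetic sample. Keep the sample iff the averaged prediction's argmax
    agrees with target_class and clears the class threshold tau[c]."""
    m = probs_per_aug.mean(axis=0)          # averaged confidence estimate
    c_star = int(np.argmax(m))
    keep = (c_star == target_class) and (m[c_star] > tau[c_star])
    return keep, m

tau = np.array([0.75, 0.75])
probs = np.array([[0.9, 0.1],               # predictions for K = 2 views
                  [0.7, 0.3]])
keep, m = select_high_confidence(probs, target_class=0, tau=tau)
```

Samples that pass the check are added to \(\textbf{S}\) together with all of their augmented views, as described above.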
To further refine the quality of the training set, we apply a structured augmentation strategy to both the labeled set \(\textbf{Y}\) and the high-confidence selected set \(\textbf{S}\). By mixing information between selected samples, we generate new training examples that enhance the model’s generalization ability and reduce noise in pseudo-labels.
Specifically, given the combined dataset \(\textbf{D} \leftarrow \textbf{Y} \cup \textbf{S}\), we randomly shuffle it to obtain two subsets \(\textbf{D}^s\) and \(\textbf{D}^p\). Then, for each sample \((\textbf{y}_i, \textbf{z}_i) \in \textbf{Y}\) and its corresponding shuffled counterpart \((\hat{\textbf{y}}_i, \hat{\textbf{z}}_i) \in \textbf{D}^s\), we generate a new augmented sample:

$$\textbf{W}^s_i = \left( \lambda \, \textbf{y}_i + (1-\lambda )\, \hat{\textbf{y}}_i ,\;\; \lambda \, \textbf{z}_i + (1-\lambda )\, \hat{\textbf{z}}_i \right) ,$$
Similarly, for each sample \((\textbf{s}_i, \textbf{t}_i) \in \textbf{S}\) and its shuffled counterpart \((\hat{\textbf{s}}_i, \hat{\textbf{t}}_i) \in \textbf{D}^p\), we generate:

$$\textbf{W}^p_i = \left( \lambda \, \textbf{s}_i + (1-\lambda )\, \hat{\textbf{s}}_i ,\;\; \lambda \, \textbf{t}_i + (1-\lambda )\, \hat{\textbf{t}}_i \right) ,$$
where \(\lambda\) is a mixing coefficient that controls the interpolation between the original and shuffled samples.
This augmentation strategy not only smooths decision boundaries but also mitigates noise introduced by incorrect pseudo-labels, ultimately improving model robustness in semi-supervised learning scenarios. The resulting high-quality mixed samples \(\{\textbf{W}^s,\textbf{W}^p\}\) serve as an enhanced training set that further strengthens the model’s learning process.
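Both mixing rules share the same form, so a single helper suffices. The sketch below uses the default \(\lambda = 0.75\) from the experimental setup; the function name is illustrative.

```python
import numpy as np

def mix_pair(x1, y1, x2, y2, lam=0.75):
    """Convex combination of a sample/label pair with its shuffled
    counterpart, producing one mixed training example."""
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

# mixing a labeled sample with its shuffled counterpart
x_mix, y_mix = mix_pair(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                        np.array([0.0, 1.0]), np.array([0.0, 1.0]))
```

With a large \(\lambda\), each mixed example stays close to its primary sample, so the interpolation smooths decision boundaries without erasing class identity.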
Dynamic threshold adjustment module
To adaptively control class-wise sample selection, a dynamic threshold adjustment strategy is employed, as illustrated in Algorithm 5. This module updates the per-class thresholds \(\varvec{\tau }[c]\) based on the number of selected samples \(\textbf{n}[c]\) and guides the augmentation process via feedback \(\textbf{r}[c]\), ensuring balanced representation across classes.
At the start of each epoch, the dynamic threshold \(\varvec{\tau }[c]\) for each class is adjusted based on the epoch number:
where \(\tau _{\text {ini}}\) is the initial threshold, and \(f_{\text {inf}}\) is the influence factor.
For each class \(c\), the selected-sample count \(\textbf{n}[c]\) is computed. If \(\textbf{n}[c]\) deviates from the average selected-sample count \(\bar{n}\), the threshold is adjusted, and a feedback value \(\textbf{r}[c]\) is computed to guide the augmentation process:
where \(\tau _{\text {max}}\) is the maximum allowable threshold, \(\tau _{\text {min}}\) is the minimum allowable threshold, \(\delta\) is the deviation percentage, \(\gamma\) is the adjustment factor, and \(\textbf{h}[c]\) is the deviation count for class \(c\).
This dynamic adjustment allows for adaptive control of class-wise sample selection, ensuring balanced class representation.
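Since the exact update rule lives in Algorithm 5 (not reproduced here), the following is a hedged sketch of one plausible DTA step: classes that select noticeably fewer samples than average get a lowered threshold and positive generation feedback, classes that select more get a raised one. The hyperparameter names mirror the text; the arithmetic itself is an assumption.

```python
import numpy as np

def update_thresholds(tau, n, tau_min=0.80, tau_max=0.95, delta=0.2, gamma=0.015):
    """One hypothetical DTA step: under-selected classes get a lowered
    threshold and positive feedback r[c]; over-selected classes get a
    raised threshold. Thresholds stay clamped to [tau_min, tau_max]."""
    tau = np.asarray(tau, dtype=float).copy()
    n = np.asarray(n, dtype=float)
    n_bar = n.mean()                       # average selected-sample count
    r = np.zeros_like(n)                   # feedback: extra samples to generate
    for c in range(len(tau)):
        if n[c] < (1.0 - delta) * n_bar:   # class is under-represented
            tau[c] = max(tau_min, tau[c] - gamma)
            r[c] = n_bar - n[c]            # ask the augmenter to close the gap
        elif n[c] > (1.0 + delta) * n_bar: # class is over-represented
            tau[c] = min(tau_max, tau[c] + gamma)
    return tau, r

tau, r = update_thresholds([0.90, 0.90, 0.90], [10, 100, 100])
```

Here the first class, far below the mean count, has its threshold eased and receives a positive generation request, while the well-populated classes are tightened.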
Dynamic Threshold Adjustment (DTA) Module
Baseline model training
The training of the baseline classifier is performed using the high-quality mixed dataset \(\{ \textbf{W}^{s}, \textbf{W}^{p} \}\), which includes both selected real samples (\(\textbf{W}^{s}\)) and generated pseudo samples (\(\textbf{W}^{p}\)) obtained from the previous modules. To ensure stable optimization and balanced learning between real and synthetic data, we design a two-part loss function that combines a categorical cross-entropy loss (\(L_{ce}\)) and a mean squared error loss (\(L_{mse}\)). The total loss is defined as:

$$L = L_{ce} + \frac{epoch}{E}\, L_{mse} ,$$
where epoch denotes the current training epoch index and \(E\) is the total number of epochs. The weighting term \(\frac{epoch}{E}\) gradually increases the contribution of \(L_{mse}\) as training progresses, allowing the model to first stabilize on high-confidence real data and later adapt to the softer distribution of synthetic samples. This progressive weighting strategy effectively mitigates instability in early-stage training and promotes smooth convergence.
The cross-entropy loss \(L_{ce}\) is calculated over all N classes as follows:

$$L_{ce} = - \sum _{i=1}^{N} y_i \log (\hat{y}_i) ,$$
where \(y_i\) denotes the one-hot encoded ground-truth label and \(\hat{y}_i\) represents the predicted probability for class i. The negative log-likelihood formulation penalizes misclassification more heavily for confident but incorrect predictions, encouraging the model to produce accurate and well-calibrated probabilities. Here, the subscript i indexes the class dimension, ensuring that the loss is computed independently for each class before aggregation.
The mean squared error term \(L_{mse}\) is defined as:

$$L_{mse} = \frac{1}{KN} \sum _{k=1}^{K} \sum _{i=1}^{N} \left( y_{k,i} - \hat{y}_{k,i} \right) ^2 ,$$
where K represents the number of augmented (pseudo) samples and N is the number of classes. This component measures the Euclidean distance between the predicted and true probability distributions, encouraging consistency and smoother predictions for the generated samples. In particular, the subscript i here again traverses all classes, ensuring that each dimension of the probability vector contributes equally to the reconstruction error.
By combining these two complementary objectives, \(L_{ce}\) enforces categorical discrimination on real data, while \(L_{mse}\) regularizes the model against overconfidence and improves its generalization toward synthetic distributions. The epoch-dependent weighting further ensures a gradual shift from supervised learning on reliable samples to semi-supervised fine-tuning with generated data, thus enhancing robustness and preventing overfitting.
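The epoch-weighted objective can be written directly from the description above; the shapes and the clipping epsilon in this sketch are illustrative.

```python
import numpy as np

def total_loss(y_true, y_pred, q_true, q_pred, epoch, E):
    """L = L_ce + (epoch / E) * L_mse. y_true/y_pred: one-hot label and
    softmax output for a real sample (shape (N,)); q_true/q_pred: target and
    predicted probability vectors for K generated samples (shape (K, N))."""
    # cross-entropy on real data; clip avoids log(0)
    l_ce = -np.sum(y_true * np.log(np.clip(y_pred, 1e-12, 1.0)))
    K, N = q_pred.shape
    # mean squared error between probability vectors for generated samples
    l_mse = np.sum((q_true - q_pred) ** 2) / (K * N)
    return l_ce + (epoch / E) * l_mse

loss = total_loss(np.array([1.0, 0.0]), np.array([0.9, 0.1]),
                  np.full((2, 2), 0.5), np.full((2, 2), 0.5), epoch=5, E=10)
```

At epoch 0 the objective reduces to pure cross-entropy on real samples; as `epoch` approaches `E`, the consistency term on generated samples contributes at full weight.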
Experiments
Dataset
In our experimental investigation, we utilize the BJTU-OTDR dataset, which comprises six distinct event types: ‘Noise’, ‘Digging’, ‘Knocking’, ‘Watering’, ‘Shaking’, and ‘Walking’. Each sample in this dataset follows a spatiotemporal structure represented as a \(10,000 \times 12\) matrix, ensuring consistency in data format for analysis.
To simulate a real-world data imbalance scenario, we construct a long-tailed version of this dataset, termed BJTU-OTDR-LT. In BJTU-OTDR-LT, we retain all samples of the ‘Noise’ event while deliberately reducing the sample count of the remaining five event types, namely ‘Digging’, ‘Knocking’, ‘Watering’, ‘Shaking’, and ‘Walking’, to only one-tenth of their original quantity. This one-tenth ratio was chosen for experimental convenience rather than on any specific theoretical grounds. This subsampling aims to better reflect practical imbalance scenarios and to assess their impact on model performance. Table 1 provides a detailed breakdown of the dataset into training, validation, and testing subsets, forming the foundation for our experiments.
Experimental setup
In this section, we outline our experimental setup. Our model is implemented using PyTorch for flexibility and efficiency. We use the Adam optimizer 28 with a learning rate of 0.0001. The minibatch size is set to 32, balancing computational efficiency and convergence stability. Key hyperparameters are tuned as follows: \(\tau _{\text {ini}} = 0.95\), \(\tau _{\text {min}} = 0.80\), \(\tau _{\text {max}} = 0.95\), \(T = 0.5\), \(K = 2\), \(\mathbf {\gamma } = 0.015\), \(\mathbf {\delta } = 0.2\), \(f_{\text {inf}} = 0.0015\), \(f_{\text {ret}} = 15\), and \(\lambda = 0.75\). This setup ensures optimal model performance across various conditions.
Performance comparison
In this section, we compare the performance of four baseline methods. CNN 3 primarily extracts local features and is suitable for image and time-series data. CNN-LSTM29 combines CNN’s feature-extraction capability with LSTM’s sequential modeling, where CNN extracts spatial features and LSTM captures temporal dependencies. C-CNN-LSTM 4 enhances CNN-LSTM by incorporating concatenated CNN layers to improve feature extraction. CNN-ABiLSTM 4 integrates CNN, bidirectional LSTM, and an attention mechanism to enhance feature learning and sequence modeling. Each baseline is trained under three data conditions: normal (balanced) data, imbalanced data, and data augmented with the Controllable Diffusion framework. The results are summarized in Table 2.
On the BJTU-OTDR dataset, all models performed well, with CNN-LSTM and CNN-ABiLSTM exceeding 99% accuracy and CNN achieving 95.60% with balanced precision. However, performance dropped significantly on the BJTU-OTDR-LT dataset due to class imbalance, with CNN falling to 81.81% and minority class precision declining sharply. While CNN-LSTM, C-CNN-LSTM, and CNN-ABiLSTM outperformed CNN, they still suffered performance degradation. Introducing the Controllable Diffusion framework effectively mitigated these issues, boosting CNN accuracy to 89.53% and restoring CNN-LSTM to 99.23%. Minority class precision also improved markedly, with CNN-LSTM achieving 100% on ’Digging’ and C-CNN-LSTM and CNN-ABiLSTM reaching 100% on ’Watering’, demonstrating the framework’s ability to enhance robustness, especially for minority classes.
Ablation study
To assess the contribution of each component in the Controllable Diffusion framework, we performed an ablation study on the \(\Phi\)-OTDR dataset (Table 3). The full model achieved 99.23% accuracy. Removing the High-Quality Sample Selection (HQSS) module caused a sharp drop to 94.96%, underscoring its critical role in filtering high-confidence samples. Excluding the Dynamic Threshold Adjustment (DTA) led to 98.24% accuracy, while removing the feedback mechanism slightly reduced it to 99.02%. These results highlight that all components contribute to performance, with HQSS being the most impactful for accuracy and robustness.
Quantitative analysis
The importance of diverse augmentation strategies: Table 4 shows the impact of different augmentations in the Controllable Diffusion framework. AddNoise and RandomFlip individually yield accuracies of 98.53% and 98.27%, respectively, while their combination achieves the highest accuracy of 99.23%. This demonstrates that combining diverse augmentations improves model generalization.
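The two augmentations compared in Table 4 can be sketched as follows; the noise level, flip axis, and their composition order are illustrative assumptions rather than the framework's exact settings.

```python
import numpy as np

def add_noise(x, sigma=0.01, rng=None):
    """AddNoise: additive Gaussian noise on a temporal-spatial sample
    (sigma is an assumed value for illustration)."""
    rng = rng or np.random.default_rng(0)
    return x + rng.normal(0.0, sigma, size=x.shape)

def random_flip(x, rng=None):
    """RandomFlip: flip the sample along the spatial axis with
    probability 0.5 (axis choice is an assumption)."""
    rng = rng or np.random.default_rng(0)
    return x[:, ::-1] if rng.random() < 0.5 else x

def augment(x, rng=None):
    """Combined AddNoise + RandomFlip, mirroring the best-performing
    combination reported in Table 4."""
    rng = rng or np.random.default_rng(0)
    return random_flip(add_noise(x, rng=rng), rng=rng)

sample = np.zeros((8, 16))   # dummy (time, space) sample
augmented = augment(sample)
```

Composing complementary transforms like these exposes the classifier to wider signal variation than either transform alone, which is consistent with the combined setting reaching the highest accuracy.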
The significance of pre-training: Table 5 shows the effect of pre-training epochs on performance. Accuracy improves from 98.02% (epoch 0) to 99.23% (epoch 100), indicating that longer pre-training helps the model better learn data distribution and select higher-quality samples, enhancing overall performance.
The role of deviation percentage in sample selection: Table 5 shows the effect of \(\delta\) on accuracy. Performance peaks at 99.23% when \(\delta =20\%\), balancing threshold adjustment and sample selection. Lower or higher values lead to reduced accuracy, highlighting the importance of carefully tuning \(\delta\).
Impact of adjustment factor on threshold settings: Table 5 illustrates how the adjustment factor \(\gamma\) influences accuracy. The best result (99.23%) is achieved at \(\gamma = 0.015\). Smaller or larger values lead to suboptimal performance, showing that careful tuning of \(\gamma\) is key to effective threshold adjustment and sample selection.
The role of influence factor: Table 5 shows how the influence factor \(f_{\text {inf}}\) affects accuracy. Values of 0.00100 and 0.00150 yield the highest accuracy (99.23%), while a higher \(f_{\text {inf}} = 0.00175\) slightly reduces it. This suggests that overly rapid threshold decay can introduce noise, impacting performance.
The importance of return frequency: Table 5 shows how return frequency \(f_{\text {ret}}\) affects accuracy. The best performance (99.23%) is achieved at \(f_{\text {ret}} = 15\), with accuracy declining as \(f_{\text {ret}}\) increases. This indicates that frequent updates help maintain high-quality pseudo-labels.
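The four hyper-parameters studied above (\(\delta\), \(\gamma\), \(f_{\text{inf}}\), \(f_{\text{ret}}\)) interact in the threshold update. The exact update rule is not reproduced in this section, so the sketch below is a plausible reading, not the paper's definitive formula: the threshold decays by the influence factor each step, is nudged by \(\gamma\) when the acceptance rate deviates from the target by more than \(\delta\), and is clamped on the periodic return steps.

```python
def update_threshold(threshold, accept_rate, target_rate,
                     delta=0.20, gamma=0.015, f_inf=0.00100,
                     step=0, f_ret=15):
    """Illustrative dynamic-threshold update using the best settings
    from Table 5; the actual rule in the paper may differ."""
    threshold -= f_inf                        # gradual decay (influence factor)
    deviation = accept_rate - target_rate
    if abs(deviation) > delta * target_rate:  # beyond the delta deviation band
        threshold += gamma if deviation > 0 else -gamma
    if step % f_ret == 0:                     # periodic return / clamp step
        threshold = min(max(threshold, 0.5), 0.99)
    return threshold
```

Under this reading, the ablation trends are intuitive: too large an \(f_{\text{inf}}\) decays the threshold too fast and admits noisy samples, while infrequent return steps (\(f_{\text{ret}} > 15\)) let the threshold drift away from a useful operating point.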
Sample maps of real \(\Phi\)-OTDR samples and generated synthetic samples in the temporal and spatial dimensions.
Visualization
To provide an intuitive understanding of our methods and results, we present a series of visualizations that highlight the performance and impact of the proposed approaches. These visualizations include real and generated synthetic samples, a confusion matrix representing the performance of the CNN-LSTM model, and examples showcasing the efficacy of our Controllable Diffusion framework in correcting prediction errors.
Visualization of real and generated synthetic \(\Phi\)-OTDR samples
Figure 2 compares real and generated synthetic \(\Phi\)-OTDR samples by class, with original samples on top and generated synthetic samples on the bottom. Each heatmap illustrates the temporal-spatial structure of the signals. The generated synthetic samples closely mimic the real patterns, validating the diffusion model's ability to produce realistic augmentations that enhance model robustness and performance.
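The generation process behind these synthetic heatmaps follows the standard DDPM reverse diffusion loop. The sketch below is a minimal numpy version with an assumed linear noise schedule and a placeholder noise predictor; the augmenter's actual network architecture and schedule are not specified in this section.

```python
import numpy as np

def ddpm_sample(predict_noise, shape, T=50, seed=0):
    """Minimal DDPM-style reverse diffusion sketch: start from pure
    Gaussian noise and iteratively denoise with a noise predictor."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)        # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)            # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = predict_noise(x, t)             # trained network in practice
        coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:                             # add noise except at the last step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Placeholder standing in for the trained denoising network.
fake_predictor = lambda x, t: np.zeros_like(x)
synthetic = ddpm_sample(fake_predictor, shape=(8, 16))
```

With the trained predictor in place of the placeholder, each call yields one synthetic temporal-spatial sample of the requested shape, which the HQSS module then scores before it is added to the training set.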
Confusion matrix comparison for CNN-LSTM model
Figure 3 compares confusion matrices before and after applying the Controllable Diffusion framework. Initially, the model shows frequent misclassifications, especially for minority classes. After augmentation, classification accuracy improves significantly, with more balanced predictions across classes. This demonstrates the framework’s effectiveness in mitigating class imbalance and enhancing model robustness.
Impact of ConDiff algorithm
To address the challenges posed by imbalanced datasets, we proposed the Controllable Diffusion framework, which selectively augments the dataset by generating high-quality synthetic samples. Figure 3 presents two real event samples, 'Digging' and 'Shaking', and their corresponding classification results across multiple models. The ground-truth labels are provided alongside predictions from different models, highlighting misclassifications by baseline models and the corrective effect of the Controllable Diffusion framework. This comparison underscores the framework's efficacy in mitigating class imbalance, improving classification accuracy, and ensuring more reliable predictions across different event categories.
Left: The confusion matrices of CNN-LSTM with two methods, (a) baseline and (b) ConDiff. Labels 0 to 5 represent 'Noise', 'Digging', 'Knocking', 'Watering', 'Shaking', and 'Walking', respectively. Right: Visualization of classification results for two event samples ('Digging' and 'Shaking') across multiple models. Correct predictions are shown in green and incorrect ones in red, demonstrating the improvements brought by the Controllable Diffusion framework.
Conclusion
The Controllable Diffusion framework is proposed to tackle the long-tailed imbalance problem in \(\Phi\)-OTDR event classification. It includes the Feedback-guided \(\Phi\)-OTDR Augmenter, the High-Quality Sample Selection module, and the Dynamic Threshold Adjustment module. The Feedback-guided \(\Phi\)-OTDR Augmenter generates samples for abnormal situations using a diffusion model. The High-Quality Sample Selection module evaluates and discards low-confidence samples, while the Dynamic Threshold Adjustment module provides feedback to dynamically control the sample-generation process. Experiments on the BJTU-OTDR-LT dataset show that the framework achieves a \(3.7\%\) to \(7.2\%\) improvement in accuracy over current baselines, with some results approaching those from training with the full data.
Data availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.
References
Zhao, X. et al. Markov transition fields and deep learning-based event-classification and vibration-frequency measurement for φ-otdr. IEEE Sens. J. 22, 3348–3357. https://doi.org/10.1109/JSEN.2021.3137006 (2022).
Shi, Y., Wang, Y., Wang, L., Zhao, L. & Fan, Z. Multi-event classification for φ-otdr distributed optical fiber sensing system using deep learning and support vector machine. Optik 221, 165373. https://doi.org/10.1016/j.ijleo.2020.165373 (2020).
Cao, X., Su, Y., Jin, Z. & Yu, K. An open dataset of φ-otdr events with two classification models as baselines. Results Optics 10, 100372. https://doi.org/10.1016/j.rio.2023.100372 (2023).
Tian, M., Dong, H. & Yu, K. Attention based temporal convolutional network for φ-otdr event classification. In 2021 19th International Conference on Optical Communications and Networks (ICOCN), 1–3, https://doi.org/10.1109/ICOCN53177.2021.9563673 (2021).
Wang, M., Feng, H., Qi, D., Du, L. & Sha, Z. φ-otdr pattern recognition based on cnn-lstm. Optik 272, 170380. https://doi.org/10.1016/j.ijleo.2022.170380 (2023).
Kandamali, D. F. et al. Machine learning methods for identification and classification of events in φ-otdr systems: a review. Appl. Opt. 61, 2975–2997. https://doi.org/10.1364/AO.444811 (2022).
Shi, Y., Wang, Y., Zhao, L. & Fan, Z. An event recognition method for φ-otdr sensing system based on deep learning. Sensors https://doi.org/10.3390/s19153421 (2019).
Jiang, W. & Yan, C. High-accuracy classification method of vibration sensing events in φ-otdr system based on vision transformer. In 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), 1704–1709, https://doi.org/10.1109/CSCWD61410.2024.10580074 (2024).
Mumuni, A. & Mumuni, F. Data augmentation: A comprehensive survey of modern approaches. Array 16, 100258. https://doi.org/10.1016/j.array.2022.100258 (2022).
Maharana, K., Mondal, S. & Nemade, B. A review: Data pre-processing and data augmentation techniques. Global Transitions Proceedings 3, 91–99. https://doi.org/10.1016/j.gltp.2022.04.020 (2022). International Conference on Intelligent Engineering Approach (ICIEA-2022).
Yang, S. et al. Image data augmentation for deep learning: A survey (2023). arXiv:2204.08610.
Wen, Q. et al. Time series data augmentation for deep learning: A survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-2021, 4653–4660, https://doi.org/10.24963/ijcai.2021/631 (International Joint Conferences on Artificial Intelligence Organization, 2021).
Summers, C. & Dinneen, M. J. Improved mixed-example data augmentation. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 1262–1270, https://doi.org/10.1109/WACV.2019.00139 (2019).
Fawzi, A., Samulowitz, H., Turaga, D. & Frossard, P. Adaptive data augmentation for image classification. In 2016 IEEE International Conference on Image Processing (ICIP), 3688–3692, https://doi.org/10.1109/ICIP.2016.7533048 (2016).
Chen, J., Tam, D., Raffel, C., Bansal, M. & Yang, D. An empirical survey of data augmentation for limited data learning in nlp. Transactions of the Association for Computational Linguistics 11, 191–211. https://doi.org/10.1162/tacl_a_00542 (2023).
Rebuffi, S.-A. et al. Data augmentation can improve robustness. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P. & Vaughan, J. W. (eds.) Advances in Neural Information Processing Systems, vol. 34, 29935–29948 (Curran Associates, Inc., 2021).
Lemley, J., Bazrafkan, S. & Corcoran, P. Smart augmentation learning an optimal data augmentation strategy. IEEE Access 5, 5858–5869. https://doi.org/10.1109/ACCESS.2017.2696121 (2017).
Hady, M. F. A. & Schwenker, F. Semi-Supervised Learning 215–239 (Springer, 2013).
Berthelot, D. et al. Mixmatch: A holistic approach to semi-supervised learning. In Wallach, H. et al. (eds.) Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., 2019).
Ouali, Y., Hudelot, C. & Tami, M. An overview of deep semi-supervised learning. CoRR (2020). arXiv:2006.05278.
Laine, S. & Aila, T. Temporal ensembling for semi-supervised learning. CoRR (2016). arXiv:1610.02242.
Mallapragada, P. K., Jin, R., Jain, A. K. & Liu, Y. Semiboost: Boosting for semi-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 31, 2000–2014. https://doi.org/10.1109/TPAMI.2008.235 (2009).
Zheng, M. et al. Simmatch: Semi-supervised learning with similarity matching. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14471–14481 (2022).
Yang, X., Song, Z., King, I. & Xu, Z. A survey on deep semi-supervised learning. IEEE Trans. Knowl. Data Eng. 35, 8934–8954. https://doi.org/10.1109/TKDE.2022.3220219 (2023).
Chen, T., Kornblith, S., Swersky, K., Norouzi, M. & Hinton, G. E. Big self-supervised models are strong semi-supervised learners. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. & Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, 22243–22255 (Curran Associates, Inc., 2020).
Sohn, K. et al. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. & Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, 596–608 (Curran Associates, Inc., 2020).
Veselý, K., Hannemann, M. & Burget, L. Semi-supervised training of deep neural networks. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 267–272, https://doi.org/10.1109/ASRU.2013.6707741 (2013).
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
Li, Y., Zeng, X. & Shi, Y. A spatial and temporal signal fusion based intelligent event recognition method for buried fiber distributed sensing system. Opt. Laser Technol. 166, 109658 (2023).
Acknowledgement
This research was supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LY24F020021 and the Ningbo Science and Technology Special Projects under Grants No. 2024Z263, 2023Z129, and ZJ2024144.
Author information
Contributions
B.Z. designed the overall framework, conducted the main experiments, and wrote the primary manuscript text. W.C. and S.W. assisted in model implementation and data analysis. Y.M. and Y.Z. contributed to the preparation of Figures 1-3 and helped interpret the results. J.G. and Y.L. performed data preprocessing and experimental validation. Z.Z. supervised the project, provided theoretical guidance, and revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhu, B., Cheng, W., Wen, S. et al. Controllable diffusion framework for imbalanced Phi OTDR events classification. Sci Rep 16, 305 (2026). https://doi.org/10.1038/s41598-025-29691-y