Introduction

As a pivotal infrastructure underlying national economic and social development, the electric power system is indispensable in supporting vital economic activities and societal operations. With the rapid growth of the market economy and the progress of industrial modernization and digitization, the reliance of diverse industries on a steady and reliable power supply is increasing. In this context, the consistent operation of distribution lines, which serve as the essential transmission medium connecting power generation and consumption, is crucial to guarantee an uninterrupted power supply.

Distribution lines are constantly exposed to diverse environmental factors and weather conditions, such as storms, hail, extreme temperatures, pollution, and wildlife threats, making them susceptible to various faults. These faults can compromise the stability and reliability of the entire power system. Prompt and accurate diagnosis and handling of line faults are crucial to prevent the impact of faults from escalating and to minimize disruption to daily lives, work, and personal safety. The increasing demand for electricity, the integration of renewable energy sources and the limitations of the expansion of the grid pose unprecedented challenges to traditional power system protection and control strategies. The advancement of fault diagnosis technology can significantly enhance fault handling efficiency and accuracy, minimizing the impact on the power system and ensuring a reliable power supply. Such technological advancements hold significant theoretical and practical implications for enhancing the operational excellence of the power system.

Currently, the management of distribution line faults primarily relies on the switch-disconnection method to isolate faulty segments, followed by a manual inspection to pinpoint the location of the fault and identify its underlying cause. However, line faults often involve multiple contributing factors, which makes it difficult for field personnel to classify them accurately and to choose effective mitigation strategies.

Traditional line fault classification techniques, such as threshold methods and fuzzy logic inference, rely on manually extracting features from fault signals. This manual extraction process can be cumbersome and time-consuming, especially when dealing with large-scale power systems. Additionally, these traditional methods often require substantial expertise, and their classification accuracy often falls short of real-world demands.

Recently, the rapid evolution of Artificial Intelligence (AI) technology has sparked interest among researchers in exploring machine learning-based approaches for line fault classification. Although these methods have shown promise in enhancing classification accuracy, they typically rely solely on a single fault feature type and require extensive simulated data for training to achieve satisfactory performance. However, in practical scenarios, the availability of line fault samples is often limited, making it challenging to meet the training data requirements of machine learning models. Furthermore, real-world data is typically more complex and noisy compared to simulated data, leading to concerns about the prediction accuracy of AI-based line fault classification methods in practical applications. Additionally, current methods fail to utilize temporal information within fault signals and to address the inherent uncertainty associated with relay protection and circuit breaker actions.

To address these challenges, this paper introduces an AI-based approach for power system distribution line fault diagnosis. To improve the efficiency and accuracy of distribution line fault diagnosis, we propose to diagnose distribution line faults by analyzing traveling wave data with multi-dimensional temporal-frequency domain features and a Dempster–Shafer (D–S) evidence theory-based multi-model fusion method. First, we use sensors and data acquisition equipment to capture real-time traveling wave signals generated during distribution line faults. Then, we extract multi-dimensional temporal-frequency domain features, such as waveform, peaks, frequency distribution, harmonic content, instantaneous frequency, and instantaneous phase, to represent fault signals. Finally, we construct several pre-diagnosis models, including Support Vector Machines (SVM) and Deep Belief Networks (DBN), to learn the characteristics of fault signals, and we fuse the models with D–S evidence theory to improve the accuracy and reliability of fault diagnosis.

The main contributions of this paper are summarized as follows:

  • This paper establishes a systematic distribution line fault diagnosis framework that effectively addresses complex fault diagnosis scenarios within the power system.

  • We extract multi-dimensional temporal-frequency domain features and propose a D–S evidence theory-based multi-model fusion method to significantly improve the accuracy and efficiency of distribution line fault identification.

  • We collect real-world distribution line fault data and conduct several experiments to evaluate our approach. The experimental results show that our approach achieves 74.6% average accuracy in the identification of low-resistance grounding, arcing grounding, high-resistance grounding and intermittent arcing grounding faults. The experimental results demonstrate that the proposed method has achieved remarkable improvements in diagnosing distribution line faults in power systems.

The rest of the paper is organized as follows. In Section 2, we introduce related work. In Section 3, we present the deep belief network-based method for fault diagnosis of distribution lines. In Section 4, we provide the experiments and evaluations. The discussion and the conclusion are presented in Section 5.

Related works

Conventional line fault diagnosis methods

The current research area of conventional line fault diagnosis in power systems is extensive, covering symmetrical/asymmetrical fault detection, fault segment diagnosis, timing information utilization, fuzzy logic processing, and machine learning applications. Key advances include: Chatterjee et al.1 proposed a dual-purpose line relay protection scheme using conductance and power variations to detect symmetrical/asymmetrical faults, validated through simulations in single infinite bus and WSCC 9-bus systems. Li et al.2 developed an improved fault-segment diagnosis approach using a Takagi-Sugeno fuzzy neural network to address uncertainties in relay protection and circuit breaker actions, establishing optimal diagnostic models through distributed parallel processing. Yuan et al.3 introduced a fault diagnosis method for distribution networks based on Time-Series Hierarchical Fuzzy Petri Nets, incorporating temporal constraints and using a matrix inference algorithm for reasoning. Lin et al.4 proposed a method using fuzzy temporal Petri nets to combine temporal sequences of alarm messages with hierarchical Petri net models, utilizing temporal constraints on protective relays and circuit breakers.

Rajesh et al.5 developed a hybrid technique combining Truncated Singular Value Decomposition and the Human Urbanization Algorithm based on Recurrent Perceptual Neural Networks for predicting and categorizing line faults. Xing et al.6 introduced a multimodal information analysis method for power transformer fault diagnosis, using time series and multimodal data (dissolved gas, infrared images) with a selective kernel network, a bidirectional gated recurrent unit, and a cross-attention mechanism. Abbasi et al.7 provided a comprehensive review of Fault Detection and Diagnostics techniques in power transformers, emphasizing their role in system reliability and security. Wang et al.8 proposed a method using the non-dominated sorting genetic algorithm II (NSGA-II) to transform fault diagnosis into a multi-objective optimization problem, solving it with the Pareto method. Yang et al.9 reviewed traditional and emerging machine learning applications for power system fault diagnosis, highlighting the need for advanced techniques in complex systems. Fatemeh et al.10 developed a real-time convolutional neural network-based system for detecting and identifying short-circuit faults in high-voltage lines, enhancing fault restoration and reducing downtime costs.

However, these conventional methods often face significant limitations. Threshold-based techniques can be susceptible to variations in system operating conditions and noise, leading to maloperations. Methods relying on manual feature extraction require extensive domain expertise and may not capture the full complexity of fault signatures, especially for intermittent or high-impedance faults. Furthermore, many traditional approaches struggle with the non-linear and non-stationary characteristics inherent in fault signals.

Data-driven fault diagnosis methods

In the study of data-driven line fault diagnosis methods, research progress and prospects have been reported for fault diagnosis techniques using deep learning, variational mode decomposition, convolutional neural networks (CNN), artificial neural networks (ANN), and machine learning. These methods offer broad application potential and critical technical support for the safe and stable operation of power systems. Recent advancements include the following. Zhang et al.11 proposed a novel method for power system fault diagnosis using an attention mechanism and a bi-structured GRU neural network, which can comprehensively analyze data and extract fault characteristics. Zhang et al.12 introduced a hybrid approach for grid fault type identification, combining variational mode decomposition and CNN.

Hu et al.13 developed a deep learning-based fault diagnosis framework for preventing serious faults in power system emergencies. It uses an unsupervised deep autoencoder for offline feature selection and data cleaning, and supervised CNNs for online learning of fault feature extraction. Feng et al.14 proposed a transformer fault diagnosis method based on ANN, which utilizes neural network algorithms for offline learning and training of operational state data. Hong et al.15 introduced a CNN-based method for detecting fault lines and locations in power systems, aiming to improve reliability, stability, and energy quality. Xie et al.16 proposed an ANN-based Fault Sector Diagnosis method to improve fault localization in power system scheduling, constructing transparent diagnostic models for lines, transformers, and buses. Vaish et al.17 provided a comprehensive review of machine learning-based fault diagnosis techniques for power systems. Vivek et al.18 presented a comprehensive approach to electrical fault detection and localization, leveraging machine learning algorithms. Thomas et al.19 presented an end-to-end deep learning strategy to detect and localize symmetrical and unsymmetrical faults as well as high-impedance faults (HIFs) in a distribution system. Soham et al.20 presented a data-driven fault detection approach with an ensemble classifier-based smart meter in modern distribution systems. Najafzadeh et al.21 proposed new methods using fuzzy logic and adaptive fuzzy neural networks as well as machine learning and meta-heuristic algorithms. Wang and Gooi22 introduced a distribution-balanced federated learning (DBFed-LSTM) approach that integrates FL with attention-based bidirectional LSTM to address data privacy, non-IID distributions, and limited computational resources in edge devices. Their results demonstrated that DBFed-LSTM can achieve comparable or even superior performance to centralized training with shared data, while significantly improving robustness against data imbalance and preserving privacy. This decentralized learning paradigm is particularly relevant to large-scale smart grid applications where data ownership, security, and communication efficiency are critical. Poursaeed and Namdari23 proposed an explainable AI-driven quantum deep neural network (QDNN) for fault location in DC microgrids. Their method leverages high-order synchrosqueezing transform (HOSST) for traveling wave feature extraction, followed by a hybrid CNN–QNN–BDLSTM architecture. Importantly, explainability is enhanced using Shapley additive explanations (SHAP), which provides transparent reasoning for the model’s decisions. Their experiments showed that QDNN maintains high accuracy even under noisy and high-resistance fault scenarios, while also offering interpretability and robustness across diverse microgrid configurations.

With the emergence of new power sources and loads, traditional protection schemes face unprecedented challenges. Machine learning techniques have become widely applied in intelligent fault diagnosis, particularly in line fault diagnosis. Innovations include deep learning-based fault classification and fuzzy logic-assisted fault diagnosis, which have demonstrated significant results in feature extraction, fault type identification, and fault localization. Significant progress has been made in diagnosing line faults, supporting the safe and stable operation of power systems. Leveraging these methods enables more precise fault diagnosis, facilitating timely intervention and ensuring reliable power system operation.

While data-driven methods have shown promise, they also present challenges. Many early machine learning models require large, high-quality labeled datasets, which are often scarce in real-world power systems due to the rarity of certain fault types. Some models may act as “black boxes,” lacking interpretability. Moreover, the performance of data-driven models can be sensitive to the quality of input features and may suffer from generalization issues if the training data does not adequately represent the diversity of real-world fault scenarios, including variations in fault resistance, location, and inception angle.

Methods

This paper aims to establish a systematic, multilevel fault diagnosis framework that improves the accuracy and efficiency of distribution line fault identification in complex fault scenarios within the power system. Figure 1 shows the architecture of our method, which consists of the following components: signal acquisition and analysis, temporal feature extraction of fault traveling wave signals, spectral feature extraction of faulty traveling wave signals, semantic feature extraction for fault traveling wave signals, pre-trained fault diagnosis model construction, D–S evidence theory-based multi-model fusion, and final fault diagnosis.

Fig. 1

The overall architecture of the deep belief network-based pre-trained fault diagnosis model.

Signal acquisition and analysis

During the acquisition phase, we utilize state-of-the-art sensors and data acquisition equipment to capture real-time traveling wave signals generated during distribution line faults. These signals encapsulate vital information about instantaneous changes in current and voltage during fault occurrences, serving as the fundamental data source for fault diagnosis. In the subsequent analysis phase, these signals undergo a rigorous mathematical and physical examination to reveal underlying characteristics and patterns. This analysis encompasses the exploration of time-domain, frequency-domain, time-frequency domain, and semantic features, providing a comprehensive understanding for subsequent fault diagnosis.

Traditional fault diagnosis methods often rely solely on a single fault feature, which can limit the diagnostic outcomes. In contrast, this paper represents fault signal characteristics with features from multiple dimensions. By analyzing the traveling wave signals generated by line faults, we extract time-domain features (such as waveform, peaks, and means), frequency-domain features (including frequency distribution and harmonic content), time-frequency features (like instantaneous frequency and instantaneous phase), and semantic features (indicating fault types, locations, etc.). Using a feature selection algorithm, we sift through the extracted features to identify the most relevant subset for fault diagnosis. This reduction of the feature set not only decreases computational complexity but also enhances the accuracy of diagnosis. This comprehensive feature extraction approach ensures the high quality of test samples, captures the intricate characteristics of fault signals, and provides a rich information pool for subsequent fault diagnosis.

Temporal feature extraction of fault traveling wave signals

This paper extracts the temporal characteristics of line fault traveling wave signals comprehensively. We focus on several key statistics of the signal to describe its inherent properties. First, we extract the maximum and minimum values of the signal, which indicate its amplitude range and potential extreme fluctuations. The maximum \(x_{\max }\) and minimum \(x_{\min }\) values are given in Eq. (1) and Eq. (2), where \(x_i\) is the value of the \(i\)-th point of the fault traveling wave signal sequence.

$$\begin{aligned} x_{\max } = \max (x_i) \end{aligned}$$
(1)
$$\begin{aligned} x_{\min } = \min (x_i) \end{aligned}$$
(2)

Furthermore, the range R quantifies the level of signal volatility, according to Eq. (3), where \(x_{\max }\) is the maximum value and \(x_{\min }\) is the minimum value of the signal sequence. As the difference between the maximum and minimum values of a signal, the range gives a direct measure of the signal’s amplitude span and therefore of the intensity of its volatility. Additionally, the variance and the root mean square further characterize the signal. The variance Var quantifies the dispersion of the signal data points and reflects the fluctuations in signal values, according to Eq. (4), where N is the number of sequence points and \(\bar{X}\) is the mean of the sample. The root mean square (RMS) reflects the overall energy level of the signal and helps evaluate its magnitude, according to Eq. (5). These time-domain features not only describe the statistical properties of the signal but also provide data support for subsequent fault diagnosis. The formulas are as follows:

$$\begin{aligned} R = x_{\max } - x_{\min } \end{aligned}$$
(3)
$$\begin{aligned} Var = \frac{\sum _{i=1}^{N}(x_i - \bar{X})^2}{N-1} \end{aligned}$$
(4)
$$\begin{aligned} RMS = \sqrt{\frac{1}{N}\sum _{i=1}^{N}x_i^2} \end{aligned}$$
(5)

The kurtosis Kur describes the sharpness of the signal waveform, according to Eq. (6), where \(\sigma\) is the standard deviation of the sample, and helps to reveal the nonlinear characteristics of the signal. The kurtosis reflects the steepness of the waveform, which indicates the rate of change of the signal value over a short period. A high kurtosis indicates a sharper signal waveform that potentially contains more abrupt change points or high-frequency components. These characteristics are typically associated with the nonlinear properties of the signal.

$$\begin{aligned} Kur = \frac{1}{N}\sum _{i=1}^{N}\left( \frac{x_i - \bar{X}}{\sigma }\right) ^4 \end{aligned}$$
(6)

where \(\sigma\) represents the standard deviation of the time series signal. By comprehensively extracting and analyzing the temporal characteristics of the line fault traveling wave signal, we can gain a deeper understanding of its characteristics and provide robust data support for subsequent fault diagnosis efforts.
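
As an illustration, the statistics of Eqs. (1)–(6) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not necessarily the implementation used in the experiments: the function name `time_domain_features` and the dictionary keys are hypothetical, and the standard deviation in Eq. (6) is assumed to be the square root of the variance in Eq. (4).

```python
import math


def time_domain_features(x):
    """Compute the statistics of Eqs. (1)-(6) for a sequence of samples."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two samples are required")

    x_max = max(x)                                    # Eq. (1)
    x_min = min(x)                                    # Eq. (2)
    value_range = x_max - x_min                       # Eq. (3)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)   # Eq. (4)
    rms = math.sqrt(sum(v * v for v in x) / n)        # Eq. (5)

    # Assumption: sigma in Eq. (6) is the square root of Eq. (4).
    sigma = math.sqrt(var)
    if sigma > 0:
        kur = sum(((v - mean) / sigma) ** 4 for v in x) / n   # Eq. (6)
    else:
        kur = float("nan")  # a constant signal has no defined kurtosis

    return {"max": x_max, "min": x_min, "range": value_range,
            "var": var, "rms": rms, "kur": kur}


features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

The resulting dictionary can be flattened into the time-domain part of a feature vector; whether further statistics are appended is left open here.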

Spectral feature extraction of faulty traveling wave signals

Since signal characterization is not limited to the temporal domain, frequency domain features also play a significant role. To provide a more comprehensive understanding of the complexity of the faulty traveling wave signals, this paper not only emphasizes the extraction of time-domain features, but also introduces the Fast Fourier Transform (FFT) method. The FFT method is utilized to convert the time-domain signals into frequency-domain features. Through FFT processing, we can accurately extract frequency-domain information from faulty traveling wave time series signals. This process helps to clearly illustrate the distribution state of the signal across different frequency components. Figure 2 shows the spectrum of different types of faults, including low-resistance grounding, high-resistance grounding, arc grounding, and intermittent arc grounding. From top to bottom in Fig. 2, the spectral data correspond to these four fault types, respectively. The spectral data are obtained by applying FFT to traveling wave signals, reflecting the differences in frequency-domain characteristics among various fault types. This provides a practical basis for subsequent deep learning-based classification of fault spectra.

Fig. 2

The spectrum of different types of faults.

In this process, the FFT plays a crucial role. It decomposes the complex time-domain waveform into a superposition of sine and cosine waveforms, which makes the frequency composition of the signal explicit. Through analysis of the frequency-domain characteristics, we can further explore the nonlinear, non-stationary, and other complex features of the faulty traveling wave signal, and thereby provide more comprehensive and accurate data support for subsequent fault diagnosis. The Fourier transform of the fault traveling wave time series signal is computed as shown in Eq. (7).

$$\begin{aligned} x(k) = \sum _{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N} nk} \end{aligned}$$
(7)

where \(x(k)\) denotes the value of the \(k\)-th frequency component, \(x_n\) represents the sequence points of the time series signal, n denotes the index of the time-domain sampling points, k denotes the index of the frequency-domain values, and N represents the number of sampling points. The FFT is a fast algorithm for computing the Discrete Fourier Transform (DFT): whereas direct evaluation of the DFT requires on the order of \(N^2\) operations, the FFT requires on the order of \(N\log _2 N\) operations.

Regarding the correspondence before and after the transformation, the modulus of each non-zero-frequency point of the Fourier transform is N/2 times the amplitude of the corresponding sinusoidal component of the original signal. In this way, the frequency-domain features \(X_{fft} = \{ x_{fft1}, x_{fft2}, \ldots , x_{fftn} \}\) of the faulty traveling wave are obtained.
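
The relation between Eq. (7), its fast evaluation, and the N/2 scaling can be checked with a short Python sketch. It is an illustration only: the function names `dft` and `fft`, the radix-2 recursion, and the synthetic sine test signal are assumptions introduced here, and a production system would more likely rely on an optimized numerical library.

```python
import cmath
import math


def dft(x):
    """Direct evaluation of Eq. (7); needs on the order of N^2 operations."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 FFT; the length of x must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])   # transform of the even-indexed samples
    odd = fft(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Synthetic signal (assumption): a sine of amplitude 3 with 5 cycles in N samples.
N = 64
amplitude, cycles = 3.0, 5
signal = [amplitude * math.sin(2 * math.pi * cycles * t / N) for t in range(N)]

spectrum = fft(signal)
magnitudes = [abs(c) for c in spectrum]
# The modulus at bin 5 equals amplitude * N / 2, matching the scaling above.
```

Dividing the magnitudes by N/2 therefore recovers the amplitudes of the sinusoidal components, which may be convenient when spectra of records with different lengths are compared.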

Next, for the complex and variable fault traveling wave signals, it is often challenging to fully uncover their underlying patterns through a single time domain or frequency domain analysis. Therefore, this paper introduces the wavelet packet decomposition technique based on frequency domain analysis to analyze the faulty traveling wave signal more carefully from the perspective of time-frequency combination.

Semantic feature extraction for fault traveling wave signals

Even with these time-frequency features, we still need to explore more signal characteristics to refine the fault diagnosis framework. To this end, this paper further introduces the co-occurrence matrix algorithm for semantic feature extraction of faulty traveling wave signals. The co-occurrence matrix algorithm is an effective image texture analysis tool that captures the spatial distribution and correlation relationship between the internal elements of a signal and then extracts the semantic features of the signal.

In this paper, we have successfully extracted the semantic features of a faulty traveling wave by utilizing the co-occurrence matrix on the time series signal of the faulty traveling wave. These semantic features not only enhance our understanding of the fault-traveling wave signals but also offer more comprehensive and in-depth data support for the subsequent fault diagnosis work. By comprehensively utilizing time-frequency features and semantic features, we can more accurately identify the fault type and quickly locate the fault location. This provides powerful technical support for the stable operation of the power system and the rapid recovery from faults. Using the co-occurrence matrix algorithm to process the fault-traveling wave signal enhances the extraction of semantic features from the signal.

First, we normalized the fault-traveling wave signal and mapped it to the interval [−1, 1]. The purpose of normalization is to eliminate the influence of signal amplitude differences on subsequent processing so that the signals are compared and analyzed on the same scale. The normalization formula is shown in Eq. (8).

$$\begin{aligned} x_{\text {norm}} = \frac{x}{\left| x \right| _{\max }} \end{aligned}$$
(8)

where x denotes the original signal amplitude, \(\left| x \right| _{\max }\) denotes the absolute maximum value of the faulty traveling wave time series signal, and \(x_{norm}\) denotes the time series signal after normalization.

Next, we performed a mapping operation on the normalized time series signal. Specifically, we divided the interval [−1, 1] into 200 subintervals of width 0.01 and mapped every amplitude of the signal to the index of the subinterval that contains it. For example, values in the interval [−1.00, −0.99] are mapped to 1, and values in the interval [0.99, 1.00] are mapped to 200. Through this mapping process, we obtain a new feature sequence.

Subsequently, we constructed the co-occurrence matrix of neighboring points using the mapped sequence. The size of this matrix is \(200 \times 200\). Its rows and columns represent the interval in which the current point of the sequence is located and the interval in which the next point is located, respectively, and each entry is the number of times the corresponding pair of intervals appears at adjacent points.

The specific construction is as follows: if the adjacent normalized points \(x_{norm,i}\) and \(x_{norm,i+1}\) are mapped to the values \(x_{m,i}\) and \(x_{m,i+1}\), then the entry in row \(x_{m,i}\) and column \(x_{m,i+1}\) of the co-occurrence matrix is incremented by 1. It follows that the sum of all entries of the co-occurrence matrix is \(n-1\), which is the number of adjacent pairs among n sampling points.

Through the construction of the co-occurrence matrix, we can clearly show the correlation between neighboring points in the faulty traveling wave signal, and then extract semantic features. These features not only reflect the internal structure and pattern of the signal but also provide an important basis for subsequent fault diagnosis.
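
The normalization, mapping, and counting steps described above can be sketched as follows. This is a hedged illustration: the function names `normalize`, `to_bin`, and `cooccurrence` are hypothetical, and the handling of amplitudes that fall exactly on a subinterval boundary (half-open intervals, with 1.0 assigned to the last bin) is an assumption that the text does not specify.

```python
def normalize(x):
    """Scale the signal to [-1, 1] by its absolute maximum, as in Eq. (8)."""
    peak = max(abs(v) for v in x)
    if peak == 0:
        raise ValueError("signal is identically zero")
    return [v / peak for v in x]


def to_bin(v, bins=200):
    """Map a value in [-1, 1] to a 1-based bin index (bin width 2 / bins)."""
    index = int((v + 1.0) * bins / 2) + 1
    return min(index, bins)  # v == 1.0 is assigned to the last bin


def cooccurrence(x, bins=200):
    """Count how often bin r is immediately followed by bin c."""
    mapped = [to_bin(v, bins) for v in normalize(x)]
    matrix = [[0] * bins for _ in range(bins)]
    for current, following in zip(mapped, mapped[1:]):
        # Bin labels are 1-based, list indices are 0-based.
        matrix[current - 1][following - 1] += 1
    return matrix


samples = [0.0, 0.5, -2.0, 2.0, 0.5, -2.0]
matrix = cooccurrence(samples)
total = sum(map(sum, matrix))  # equals len(samples) - 1 adjacent pairs
```

The matrix can be flattened or summarized (for example by texture statistics) before being passed to a classifier; which summary is used is not prescribed by this sketch.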

Pre-trained fault diagnosis model construction

This paper constructs two individual pre-trained fault diagnosis models based on the extracted fault features. These models can autonomously make initial judgments on the fault category of test samples, thereby enhancing the accuracy and efficiency of fault diagnosis. Additionally, by employing multiple models, our method demonstrates greater adaptability and robustness, effectively handling diverse types of faults. In this work, we selected Deep Belief Networks (DBN) and Support Vector Machines (SVM) as the individual pre-diagnosis models. SVMs are well-established for their strong generalization performance, especially with limited datasets, and their ability to handle high-dimensional feature spaces effectively using appropriate kernel functions. DBNs, as a form of generative deep learning model, are capable of learning hierarchical feature representations from data in an unsupervised manner, which can be particularly beneficial for capturing complex patterns in fault signals. While other advanced deep learning models such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs) have shown success in various domains, our choice was guided by the characteristics of our dataset and feature representation. Our extracted features are multi-dimensional temporal-frequency domain characteristics rather than raw time-series or image-like representations where CNNs or LSTMs might offer more direct advantages. Given our dataset size (212 samples), DBNs and SVMs offer a good balance between model complexity and the risk of overfitting, as they can perform well without requiring the vast amounts of data that more complex architectures like CNNs or LSTMs often need for optimal training from scratch. By training these models with fault samples, they automatically learn the characteristics of faults, thereby enhancing the accuracy and efficiency of fault diagnosis.

In this paper, we first extract the time domain features and input them as feature vectors into the logistic regression model for training. Through the construction of the logistic regression model, we developed a line fault pre-diagnosis model based on time domain features and logistic regression. Using the logistic regression algorithm, the model can initially diagnose the type of line faults by learning and analyzing the time domain features.

Then, we further extracted the frequency domain features and inputted them as feature vectors into the SVM for pre-training. In the construction process of the SVM, we selected the linear kernel function due to its simplicity, efficiency, and ease of interpretation. These qualities make it suitable for the fault pre-diagnosis task in this paper. The pre-diagnosis model for line faults, based on frequency domain features and support vector machine, can accurately classify different types of line faults by learning frequency domain features.

Because the SVM is a binary classifier, we address the multi-class problem with the one-versus-one method for constructing multiple classifiers. Specifically, we design one support vector machine for each pair of classes, so that for samples of k classes, \(k(k-1)/2\) SVM models need to be designed. This method can fully utilize the information among samples within each category to enhance the accuracy and reliability of classification.
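
The pairing and voting logic of this multi-class scheme can be sketched as below. The sketch is illustrative only: `class_pairs`, `ovo_predict`, and `make_stub` are hypothetical names, and the stub classifiers merely stand in for trained binary SVMs so that the voting can be run without a training step.

```python
from collections import Counter
from itertools import combinations

FAULT_TYPES = [
    "low-resistance grounding",
    "high-resistance grounding",
    "arc grounding",
    "intermittent arc grounding",
]


def class_pairs(classes):
    """Enumerate the k(k-1)/2 unordered class pairs."""
    return list(combinations(classes, 2))


def ovo_predict(sample, pairwise_classifiers):
    """Each pairwise classifier votes for one class; the most voted class wins."""
    votes = Counter(clf(sample) for clf in pairwise_classifiers.values())
    return votes.most_common(1)[0][0]


def make_stub(a, b):
    """Hypothetical stand-in for a trained binary SVM deciding between a and b.

    It picks the class whose list index is closer to a scalar 'sample'.
    """
    ia, ib = FAULT_TYPES.index(a), FAULT_TYPES.index(b)
    return lambda s: a if abs(s - ia) <= abs(s - ib) else b


pairs = class_pairs(FAULT_TYPES)          # 6 pairs for 4 fault types
classifiers = {pair: make_stub(*pair) for pair in pairs}
prediction = ovo_predict(2.1, classifiers)
```

With four fault types the scheme needs six binary models. How ties between classes with equal votes are broken is a design choice that this sketch leaves to the ordering of `Counter.most_common`.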

Fig. 3

An example of the deep belief network-based pre-trained fault diagnosis model.

This paper delves deeper into the application of time-frequency features and deep belief networks in pre-diagnosing line faults. Figure 3 shows the architecture of the DBN model. By inputting the extracted time-frequency features as feature vectors into a deep belief network for training, we successfully constructed a line fault pre-diagnosis model based on time-frequency features and a deep belief network. The core of the model is that it consists of multiple layers of Restricted Boltzmann Machines (RBMs). Each RBM layer consists of an input layer (visible layer) v and an output layer (hidden layer) h. This structure allows the deep belief network to learn the intrinsic representation of the data layer by layer and extract higher-level features, thereby enabling accurate diagnosis of line fault types.

In the process of constructing the deep belief network, we first defined the structure and parameters of the network. Specifically, we constructed a three-layer RBM network model. The first RBM layer has display layer units \(v_1 = \{ v_{11}, v_{12}, \ldots , v_{1n} \}\) and hidden layer units \(h_1 = \{ h_{11}, h_{12}, \ldots , h_{1n/2} \}\), where n and n/2 denote the numbers of display layer and hidden layer units, respectively, and n equals the size of the input feature vector.

Subsequently, we use the hidden layer units \(h_1\) of the first RBM layer as the display layer units \(v_2\) of the second RBM layer, and set the number of hidden layer units of the second RBM layer to half the number of its display layer units.

The third RBM layer is constructed in the same way, completing the three-layer RBM network model. In addition, on top of the third RBM layer, we add a BP neural network that outputs the classification results. The number of units in its output layer corresponds to the number of fault types.
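The halving rule described above can be sketched as follows. The function name and the example input size of 1024 are hypothetical, and the use of integer division for odd layer sizes is an assumption that the text does not specify.

```python
def dbn_layer_sizes(n_inputs, n_rbm_layers=3):
    """Return (display units, hidden units) for each stacked RBM.

    Each hidden layer has half as many units as its display layer, and the
    hidden layer of one RBM becomes the display layer of the next.
    Integer division is an assumption for odd sizes.
    """
    sizes = []
    visible = n_inputs
    for _ in range(n_rbm_layers):
        hidden = visible // 2
        sizes.append((visible, hidden))
        visible = hidden
    return sizes


# Hypothetical input size, chosen only for illustration.
print(dbn_layer_sizes(1024))  # [(1024, 512), (512, 256), (256, 128)]
```

The BP output layer would then map the last hidden layer to one unit per fault type.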

The restricted Boltzmann machine, the core component of the deep belief network, can be viewed as an energy-based model. Its energy function combines the input values of the display layer units, the output values of the hidden layer units, and the connection weights and biases between them. The energy is given in Eq. (9).

$$\begin{aligned} E(v, h | \theta ) = -\sum _{i=1}^{m} \sum _{j=1}^{n} v_iw_{ij}h_j - \sum _{i=1}^{m} a_iv_i - \sum _{j=1}^{n} b_jh_j \end{aligned}$$
(9)

where \(v_i\) denotes the input value of the i-th display layer unit, \(h_j\) denotes the output value of the j-th hidden layer unit, \(w_{ij}\) denotes the connection weight between the display layer unit \(v_i\) and the hidden layer unit \(h_j\), \(a_i\) denotes the bias of the display layer unit \(v_i\), \(b_j\) denotes the bias of the hidden layer unit \(h_j\), m and n denote the numbers of display layer and hidden layer units, respectively, and \(\theta = (a, b, w)\) constitutes the model parameters of the RBM.

According to the energy model, the joint distribution probability is defined in Eq. (10).

$$\begin{aligned} P(v,h)=\frac{e^{-E(v,h)}}{\sum _{v,h}e^{-E(v,h)}} \end{aligned}$$
(10)
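To make the energy function and the joint distribution concrete, the following sketch evaluates both for a very small RBM by enumerating every state. It assumes binary (0/1) units; the function names and the toy parameters are illustrative. Exhaustive enumeration is only feasible for tiny models and is not how a practical RBM is trained.

```python
import math
from itertools import product


def energy(v, h, w, a, b):
    """RBM energy: display units v, hidden units h, weights w[i][j], biases a and b."""
    interaction = sum(
        v[i] * w[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    visible_bias = sum(a[i] * v[i] for i in range(len(v)))
    hidden_bias = sum(b[j] * h[j] for j in range(len(h)))
    return -interaction - visible_bias - hidden_bias


def joint_distribution(w, a, b):
    """Joint probability P(v, h) over all binary states (assumes 0/1 units)."""
    m, n = len(a), len(b)
    states = [
        (v, h)
        for v in product((0, 1), repeat=m)
        for h in product((0, 1), repeat=n)
    ]
    unnormalized = {s: math.exp(-energy(s[0], s[1], w, a, b)) for s in states}
    partition = sum(unnormalized.values())  # denominator of the joint distribution
    return {s: u / partition for s, u in unnormalized.items()}


# Toy parameters: 2 display units, 1 hidden unit (illustrative values only).
w = [[0.5], [-0.3]]
a = [0.1, 0.2]
b = [-0.4]
probabilities = joint_distribution(w, a, b)
```

Lower-energy configurations receive higher probability, and the probabilities over all configurations sum to one.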

For an RBM containing m display layer units and n hidden layer units, the units within one layer are conditionally independent given the other layer. The conditional probability of the display layer units given the hidden layer units is shown in Eq. (11).

$$\begin{aligned} P(v | h) = \prod _{i=1}^{m} P(v_i | h) \end{aligned}$$
(11)

The conditional probability of the hidden layer units given the display layer units is shown in Eq. (12).

$$\begin{aligned} P(h | v) = \prod _{j=1}^{n} P(h_j | v) \end{aligned}$$
(12)

According to the above formula, the activation probability of the hidden layer unit can be obtained according to Eq. (13).

$$\begin{aligned} P(h_j = 1 | v) = \sigma (\sum _{i=1}^m w_{ij} v_i + b_j) \end{aligned}$$
(13)

The activation probability of the display layer unit is shown in Eq. (14).

$$\begin{aligned} P(v_i = 1 | h) = \sigma (\sum _{j=1}^n w_{ij} h_j + a_i) \end{aligned}$$
(14)

where \(\sigma\) denotes the sigmoid activation function.

The training of each RBM within the DBN is performed in a layer-wise, unsupervised manner using an algorithm such as Contrastive Divergence (CD-k). The core idea of CD-k is to approximate the gradient of the log-likelihood of the data. For each RBM, given an input vector v (either the initial features or the hidden activations from the previous RBM), the hidden unit activations h are sampled using P(h|v) (Eq. 13). Then, a reconstruction \(v'\) of the visible layer is sampled from P(v|h) (Eq. 14), followed by a new hidden activation \(h'\) sampled from \(P(h|v')\). The weight update for \(w_{ij}\) is proportional to the difference between the correlations under the original data and under the reconstruction: \(\Delta w_{ij} \propto \langle v_i h_j \rangle _{data} - \langle v_i h_j \rangle _{recon}\). Similar updates are applied to the biases \(a_i\) and \(b_j\). This greedy, layer-by-layer pre-training initializes the DBN's weights effectively. After all RBM layers are pre-trained, a supervised fine-tuning stage using a BP neural network (added as the final layer) is performed on the entire DBN architecture. This fine-tuning adjusts the pre-trained weights using labeled data to optimize the network for the specific fault classification task by minimizing a classification error loss function.
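The sampling and update sequence described above can be sketched for a single training vector with k = 1. This is a minimal pure-Python illustration under stated assumptions, not the implementation used in the paper: binary units, one sample per update, and a learning rate of 0.1 are all assumed, and the function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, w, b):
    """Eq. (13): activation probability of every hidden unit given v."""
    return [
        sigmoid(sum(w[i][j] * v[i] for i in range(len(v))) + b[j])
        for j in range(len(b))
    ]


def visible_probs(h, w, a):
    """Eq. (14): activation probability of every display unit given h."""
    return [
        sigmoid(sum(w[i][j] * h[j] for j in range(len(h))) + a[i])
        for i in range(len(a))
    ]


def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_step(v0, w, a, b, lr=0.1, rng=random):
    """One CD-1 update on a single vector v0; modifies w, a, b in place."""
    h0 = sample(hidden_probs(v0, w, b), rng)   # h sampled from P(h | v)
    v1 = sample(visible_probs(h0, w, a), rng)  # reconstruction v'
    h1 = sample(hidden_probs(v1, w, b), rng)   # h' sampled from P(h | v')
    for i in range(len(a)):
        for j in range(len(b)):
            # data correlation minus reconstruction correlation
            w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        a[i] += lr * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += lr * (h0[j] - h1[j])
    return v1
```

In practice the update is averaged over mini-batches, and hidden probabilities are often used in place of sampled states to reduce variance; both are design choices outside what the text specifies.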

D–S evidence theory-based multi-model fusion

To leverage the complementary strengths of the DBN and SVM models and to handle potential uncertainties or conflicts in their individual predictions, we employ Dempster–Shafer (D–S) evidence theory for decision-level fusion. D–S theory is particularly advantageous in such scenarios because it allows for the explicit representation of uncertainty and ignorance, unlike purely probabilistic approaches. It enables the assignment of belief not just to individual hypotheses (fault types) but also to unions of hypotheses, or even to the entire frame of discernment \(\Theta\) (representing complete uncertainty). The process begins by converting the outputs (e.g., posterior probabilities or confidence scores) of the DBN and SVM models for each fault type into Basic Probability Assignments (BPAs), denoted \(m_1\) and \(m_2\), respectively. These BPAs quantify the degree of belief assigned to each fault type (and subsets thereof) based on the evidence from each model. This fusion approach addresses the limitations inherent in relying on a single model: by leveraging the fusion mechanism of evidence theory, we construct a fault diagnosis fusion model that capitalizes on the strengths of each model, thereby enhancing the accuracy and reliability of fault diagnosis. The mathematical justification for D–S theory's potential to outperform simpler fusion methods (such as averaging or voting) lies in its ability to: (1) Manage ignorance: explicitly assign belief to "uncertainty" rather than forcing a distribution over known classes. (2) Handle conflict: the conflict measure K can be used to assess the consistency of the evidence sources. (3) Combine diverse evidence: it provides a framework for combining evidence from heterogeneous sources with varying reliability.
By combining evidence in this structured manner, D–S fusion can lead to more robust and reliable diagnostic decisions, especially when individual models exhibit different error patterns or levels of confidence for different fault types.

Based on the D–S fusion theory, the decision-level fusion of the results from two models is conducted as follows:

  1. The samples to be classified undergo preprocessing and are then input into both the trained DBN model and the trained SVM model.

  2. The posterior probabilities for the classification of the samples are obtained from both models. These two sets of posterior probabilities represent the belief distribution functions \(m_1\) and \(m_2\) of the two independent pieces of evidence for the samples.

  3. The D–S evidence theory is used to fuse the two belief distribution functions \(m_1\) and \(m_2\), resulting in a fused belief distribution function. In this paper, the identification framework is denoted as \(\Theta = \{ x_1, x_2, x_3, x_4 \}\), whose elements represent low-resistance grounding, high-resistance grounding, arcing grounding, and intermittent arcing grounding, respectively. The classification results of the two models on a given sample, \(E_{DBN}\) and \(E_{SVM}\), serve as two independent pieces of evidence. Their classification confidences \(m_1\) and \(m_2\) are fused according to Eq. (15) to obtain the fused confidence distribution function \(m(x) = \{ m(x_1), m(x_2), m(x_3), m(x_4) \}\).

    $$\begin{aligned} m(x) = \left\{ \begin{array}{ll} \frac{\displaystyle \sum _{x_i \cap x_j = x} m_1(x_i)m_2(x_j)}{1 - K}, & x \ne \emptyset \\ 0, & x = \emptyset \end{array} \right. \end{aligned}$$
    (15)

    where \(K=\sum _{x_{i}\cap x_{j}=\varnothing }m_{1}(x_{i})m_{2}(x_{j})\) denotes the degree of conflict between the two pieces of evidence.

  4. In the final fused trust distribution function m, the category with the maximum value is the classification result for the sample. The computational complexity of the fused model is \(O(K\times 2^C)\) multiplied by the average complexity of a single model, where K here denotes the number of evidence sources (not the conflict coefficient of Eq. (15)) and C denotes the number of fault classes. In this fusion framework, the evidence sources are the two individual models, DBN and SVM, and C is 4. Therefore, the overall complexity of the fused model is \(O(2\times 2^4)\) times the average complexity of the individual models.
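The four steps above can be sketched for the case used in this paper, where each model assigns mass only to the four single fault types. Under that assumption, two focal elements intersect only when they are the same fault type, so the conflict K is the sum of all cross terms. The function name and the example masses are hypothetical and are not taken from the experimental tables.

```python
def dempster_combine(m1, m2):
    """Fuse two BPAs defined on the same singleton hypotheses with Eq. (15).

    Assumes mass is assigned only to single fault types, so x_i and x_j
    intersect only when i == j. Returns (fused masses, conflict K).
    """
    if len(m1) != len(m2):
        raise ValueError("both BPAs must cover the same hypotheses")
    agreement = [p * q for p, q in zip(m1, m2)]
    conflict = sum(
        m1[i] * m2[j]
        for i in range(len(m1))
        for j in range(len(m2))
        if i != j
    )
    normalizer = 1.0 - conflict
    if normalizer <= 1e-12:
        raise ValueError("evidence is in total conflict; Eq. (15) is undefined")
    return [x / normalizer for x in agreement], conflict


# Hypothetical masses over (x1, x2, x3, x4); not values from the paper.
m_dbn = [0.1, 0.6, 0.2, 0.1]
m_svm = [0.2, 0.3, 0.4, 0.1]
fused, k = dempster_combine(m_dbn, m_svm)
decision = max(range(len(fused)), key=fused.__getitem__)  # index of the largest fused mass
```

In this illustrative case the two models disagree (the second favors \(x_3\)), yet the fused mass is largest for \(x_2\), because the first model's stronger belief dominates after normalization by \(1 - K\).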

Final fault diagnosis

The fault diagnosis fusion model serves as the basis for the final diagnosis of the fault type of the test sample line. The approach not only accounts for the diversity of fault-induced traveling wave signals but also improves diagnostic accuracy through the multi-model fusion mechanism based on D–S evidence theory. It thereby provides technical support for the safe and stable operation of power systems and contributes to the overall reliability and efficiency of the electrical grid.

Experiments

Setup

The experiments in this paper were run on a machine with an Intel(R) Core(TM) i5-1135G7 @ 2.40 GHz processor, 16 GB of RAM, Windows 10 Home Edition, and Python 3.6.

Datasets

The experimental dataset consists of fault transient zero-sequence current signals extracted from waveforms actually recorded by a company. It contains 212 samples in total, and each sample covers a duration of 80 ms with 2048 sampling points. The fault types in the samples are divided into four categories: low-resistance grounding, high-resistance grounding, arcing grounding, and intermittent arcing grounding. Among all the samples, there are 47 samples of low-resistance grounding, 46 samples of high-resistance grounding, 65 samples of arcing grounding, and 54 samples of intermittent arcing grounding.
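A few quantities follow directly from these figures. The short calculation below assumes uniform sampling over the 80 ms window, which the text does not state explicitly; the variable names are illustrative.

```python
N_POINTS = 2048
DURATION_MS = 80

# Assuming uniform sampling across the whole window.
sampling_rate_hz = N_POINTS * 1000 / DURATION_MS   # 25600.0
frequency_resolution_hz = 1000 / DURATION_MS       # 12.5, spacing of the FFT bins
nyquist_hz = sampling_rate_hz / 2                  # 12800.0

CLASS_COUNTS = {
    "low-resistance grounding": 47,
    "high-resistance grounding": 46,
    "arcing grounding": 65,
    "intermittent arcing grounding": 54,
}
total_samples = sum(CLASS_COUNTS.values())          # 212
class_shares = {name: count / total_samples for name, count in CLASS_COUNTS.items()}
```

The class counts add up to the stated 212 samples, with arcing grounding being the largest class at roughly 31% of the data.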

Performance of the D–S evidence theory-based multi-model fusion

In Tables 1 and 2, 'F1', 'F2', 'F3', and 'F4' denote the Basic Probability Assignments (BPAs), or belief masses, assigned by the respective model to the four fault types: low-resistance grounding, high-resistance grounding, arcing grounding, and intermittent arcing grounding. The 'Conclusion' column indicates the fault type to which the model assigns the highest belief for the given sample. The classification accuracies of the deep belief network, the support vector machine, and the fusion model are also shown. The results indicate that fusion modeling improves accuracy; because it combines the outputs of different models, it also tends to be more stable in practical applications. The improvement in fault tolerance is demonstrated in the experiments that follow.

Table 1 Example basic probability assignments (BPAs) from SVM for a misclassified sample.
Table 2 Example basic probability assignments (BPAs) from DBN for a misclassified sample.

Decision-level fusion through D–S evidence theory can improve the model's ability to handle uncertain information. Fig. 4a,b show the confusion matrices after the models are fused using D–S evidence theory; the overall performance of the model improves, including on type 2 (high-resistance grounding) samples. Fig. 4c shows the confusion matrix of the support vector machine model. From the confusion matrices, it can be observed that for samples of type 1 (low-resistance grounding) and type 3 (arcing grounding), both individual models work relatively well, while for type 2 (high-resistance grounding) and type 4 (intermittent arcing grounding), the misclassification rate is relatively high. Although the performance of the support vector machine model is poor, the fusion model still performs as well as the deep belief network model. This shows that D–S evidence theory has a strong ability to deal with uncertain information, and that fusing the classification confidences of the two models with D–S evidence theory can improve the fault tolerance of the model and thus its accuracy.

The misclassifications may have several underlying causes. In real-world scenarios, the scarcity of anomalous samples makes single models prone to overfitting. In practical environments, traveling wave signals are susceptible to noise and line attenuation, resulting in low signal-to-noise ratios. Furthermore, power system faults are diverse, and the traveling wave characteristics of different fault types may overlap or exhibit gradual transitions, leading to ambiguous decision boundaries and increased difficulty in model discrimination.

Fig. 4

Confusion matrix comparison of the different models.

Ablation study

Since the proposed method fuses DBN, SVM, and D–S evidence theory, we conduct ablation experiments to show the necessity of each module. From the results in Fig. 5 and Tables 1 and 2, it can be concluded that for samples misclassified by either the deep belief network or the support vector machine, fusing the classification confidences of the two models' outputs with D–S evidence theory can recover the correct result.

Fig. 5

The accuracy of fault diagnosis models.

Fault samples that are misclassified by the deep belief network or support vector machine model are selected from the test samples and their probabilities before fusion and after fusion are shown in Tables 1 and 2.

Taking the data in Table 1 as an example, the results are calculated as follows. There are four types of distribution line faults in this paper, so the identification framework \(\Theta = \{x_1,x_2,x_3,x_4\}\) represents low-resistance grounding, high-resistance grounding, arcing grounding, and intermittent arcing grounding.

The outputs of the two independent classification models on the sample are two basic trust distribution functions, \(m_1 = \{0.039740, 0.248672, 0.50638, 0.185208\}\) and \(m_2 = \{0.000718, 0.772286, 0.168866, 0.058\}\). Taking the largest mass in each function, the individual classification results are arcing grounding (\(x_3\)) for \(m_1\) and high-resistance grounding (\(x_2\)) for \(m_2\). The conflict coefficient is computed as \(K = \sum _{x_i \cap x_j = \emptyset } m_1(x_i)m_2(x_j) = 0.696203\), and the fused trust distribution is then calculated according to Eq. (16).

$$\begin{aligned} m(x_k) = \frac{\sum _{x_i \cap x_j = x_k} m_1(x_i)m_2(x_j)}{1 - K} \end{aligned}$$
(16)

In the fused trust distribution function, the value of \(m(x_2)\) is the largest, and the fault type of this sample is finally determined to be high-resistance grounding.

Performance comparison with state-of-the-art methods

To further evaluate the effectiveness of the proposed framework, we compared our method with two recently published state-of-the-art approaches: the distribution-balanced federated learning (DBFed-LSTM) framework22 and the explainable quantum deep neural network (QDNN)23. Both methods were adapted to the same dataset setting described in Section X, where the experimental dataset consists of 212 real-world transient zero-sequence current samples, each with 2048 sampling points over a duration of 80 ms, covering four fault types (low-resistance grounding, high-resistance grounding, arcing grounding, and intermittent arcing grounding).

For a fair comparison, we carefully tuned the hyperparameters of each method under the given dataset constraints: DBFed-LSTM: The widths of the two LSTM layers were adjusted to (64, 96), and the learning rate was set to 0.001. QDNN: The learning rate was set to 0.0001. Proposed Method: The settings followed Section X, including multi-dimensional temporal–frequency feature extraction and D–S evidence theory-based multi-model fusion.

The classification accuracies of the three approaches are summarized in Table 3.

Table 3 Comparison of fault diagnosis accuracy with advanced methods.

As shown in Table 3, the proposed approach achieves an accuracy of 74.6%, which significantly outperforms both DBFed-LSTM (63.6%) and QDNN (52.2%). This improvement demonstrates that our method is better suited for handling small-scale, noisy, and imbalanced real-world datasets, where advanced paradigms such as federated learning or quantum neural networks may face challenges due to data scarcity or hyperparameter sensitivity.

Performance and parametric analysis of DBN-based diagnosis model

The deep belief network consists of multiple RBM layers. The number of RBM layers and the number of neurons in the hidden layer of each RBM determine the structure of the network and significantly affect its effectiveness. In this paper, we adopt a layer-by-layer greedy approach to determine this structure. Specifically, we first set the number of neurons in the hidden layer of the first RBM to a value within a specified range, and then adjust the number of neurons within that range to observe the effect of the structural parameters on the experimental results. The number of RBM layers and the number of hidden neurons in each RBM are then chosen according to the resulting classification performance. Table 4 presents the impact of varying the number of RBM layers and the number of neurons in each RBM layer on the power fault dataset used in this paper. For each experiment, 170 samples are randomly selected from the sample set as the training set and the remaining 42 samples are used as the test set; the accuracy reported in the table is the average over three experiments.
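The evaluation protocol described here (a random 170/42 split, repeated three times and averaged) can be sketched as follows. The `evaluate` callback, which would train a model on the training indices and return its test accuracy, is a hypothetical placeholder, and the fixed seed is an assumption added for reproducibility.

```python
import random


def random_split(n_samples, n_train, rng):
    """Shuffle sample indices and split them into training and test indices."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def mean_accuracy(evaluate, n_samples=212, n_train=170, n_runs=3, seed=0):
    """Average the accuracy returned by `evaluate` over repeated random splits.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains a
    model on train_idx and returns its accuracy on test_idx.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        train_idx, test_idx = random_split(n_samples, n_train, rng)
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / n_runs
```

With only 42 test samples per run, a single misclassification changes the accuracy by about 2.4 percentage points, which is one reason to average over several splits.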

Table 4 Classification accuracy corresponding to the number of layers and the number of neurons in each layer of RBM.
Fig. 6

Training time with varying numbers of RBM layers.

In this paper, we evaluate on the test sample set how the number of RBM layers and the number of hidden neurons in each RBM affect accuracy. As shown in Table 4, increasing the number of RBM layers from one to two or three significantly enhances the classification accuracy. This suggests that, when the layer count is low, adding RBM layers strengthens unsupervised learning and increases the information extracted from the training samples. When the number of layers increases from three to four, the accuracy does not change significantly and may even decrease. This could be because, with a small sample size, additional RBM layers cause the model to learn repeatedly from the same training samples, leading to overfitting. The training time of the entire network also increases. With the numbers of hidden layer neurons initialized as [1000, 600, 200, 100] from the first to the fourth RBM layer, as in Table 4, the training time cost is shown in Fig. 6.

According to Table 4 and Fig. 6, the unsupervised pre-training time increases with each additional RBM layer, but the results do not necessarily improve. Based on these results, the deep belief network is designed with three RBM layers, whose hidden layers contain [1000, 600, 200] neurons, respectively.

Performance and parametric analysis of SVM-based diagnosis model

In this section, the penalty coefficient and the kernel function coefficient of the support vector machine are selected and optimized through experiments. The grid search range of the penalty coefficient c is \(2^{-10}\) to \(2^{10}\), and the grid search range of the kernel function coefficient g is also \(2^{-10}\) to \(2^{10}\). The dataset is divided into 10 parts for cross-validation.

The results of the grid search and cross-validation are shown in Fig. 7. The accuracy on the real dataset is highest at c = 16 and g = 0.25, where it reaches 63.6%, so the two parameters are set to c = 16 and g = 0.25, respectively.
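The search procedure can be sketched in plain Python as below. Several details are assumptions, since the text gives only the ranges and the number of folds: the grid uses integer exponents with a step of 1, the folds are formed by interleaving sample indices without shuffling, and `score` is a hypothetical callback that would train an SVM with the given c and g on the training indices and return its accuracy on the held-out fold.

```python
from itertools import product


def power_of_two_grid(min_exp=-10, max_exp=10, step=1):
    """Candidate values 2^min_exp ... 2^max_exp (the exponent step is an assumption)."""
    return [2.0 ** e for e in range(min_exp, max_exp + 1, step)]


def k_fold_indices(n_samples, k=10):
    """Partition sample indices into k interleaved folds (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def grid_search(score, n_samples=212, k=10):
    """Return (best mean score, c, g) over the grid using k-fold cross-validation.

    `score(c, g, train_idx, test_idx)` is a hypothetical callback.
    """
    folds = k_fold_indices(n_samples, k)
    splits = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        splits.append((train, held_out))
    best = None
    for c, g in product(power_of_two_grid(), power_of_two_grid()):
        mean = sum(score(c, g, train, test) for train, test in splits) / k
        if best is None or mean > best[0]:
            best = (mean, c, g)
    return best
```

Both selected values lie on this grid, since 16 = 2^4 and 0.25 = 2^-2; with a step of 1 the search evaluates 21 × 21 = 441 parameter pairs.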

Fig. 7

The selection of SVM parameters.

Conclusion and discussion

This paper analyzes the transient zero-sequence currents of each line recorded by the device when a single-phase grounding fault occurs in a real-world line. It combines signal processing and artificial intelligence methods to diagnose line faults, and proposes a distribution line fault classification method based on a fusion model. Because the fault types in the actual samples are difficult to distinguish in the time domain and their characteristics are not obvious, this paper uses the fast Fourier transform to convert the transient zero-sequence currents into the frequency domain, and the frequency spectrum is then used as the sample feature for model training. Because fault samples are limited in practice, this paper uses a deep belief network and a support vector machine, which perform well on small-sample tasks, to build distribution line fault classification models. Finally, D–S evidence theory is used to combine the two models at the decision level, with the aim of enhancing classification accuracy and stability in real-world applications.

This paper analyzes the transient zero-sequence current data collected by a wave recording device when a single-phase grounding fault occurs in a real-world line. The study introduces a method for fault line selection and fault type identification in distribution lines, which has demonstrated effectiveness and practical benefits. However, the work still has several areas that require improvement; in particular, the accuracy of fault line selection and classification needs to be improved to ensure greater stability in practical applications. The main limitation is as follows. Based on the data provided in the project, this paper can only use the transient zero-sequence current collected at the time of fault occurrence to perform the distribution line fault diagnosis task, and it does not incorporate other transient and steady-state electrical quantities of the power system during a distribution line fault. Although the transient zero-sequence current contains rich fault information at the time of a grounding fault, the information obtained from a single electrical quantity is always limited. Therefore, where conditions permit, several electrical quantities available when a ground fault occurs, such as transient traveling waves and steady-state zero-sequence current, could be combined to diagnose distribution line faults more comprehensively, which would help improve the accuracy and stability of diagnoses in practical applications.

A key limitation of this work is the relatively small dataset, comprising 212 real-world fault samples. While our proposed D–S fusion method demonstrated improved performance on this dataset, it is important to acknowledge that small sample sizes can impact the generalization capability of machine learning models. There is a higher risk that the models might overfit to the specific characteristics of the training data, potentially leading to reduced performance on unseen data or different operational conditions. The reported accuracy, while an improvement in our specific context, should be interpreted with this limitation in mind. Future research should prioritize the acquisition of a larger and more diverse dataset to robustly validate the proposed framework and further assess its generalization ability. Techniques such as data augmentation, transfer learning, or few-shot learning could also be explored to mitigate the challenges posed by limited data.