Introduction

Power distribution networks1 are essential electrical system components that interface between 35 kV and 400 V. These networks provide voltage regulation, surge protection, and grid stability to deliver a reliable power supply2. Power distribution is now essential for healthcare, utilities, communication, and manufacturing3. Uncontrolled blackouts pose safety risks and cause lost productive time, disrupting the economic and operational activities that depend on a continuous power flow to consumers4. Vegetation contact, severe weather, equipment failure, and recurring short circuits are the primary causes of faults along feeder lines. To reduce downtime and improve fault identification, the system needs effective and automated fault management5. Fault identification, isolation, and service restoration are the typical stages of a fault management system6. Sensors, intelligent electronic devices, and SCADA systems with fault detection capabilities monitor the network's state in real time7. Advanced automation can quickly reconfigure the distribution grid, employing alternative feeders to restore power to unaffected regions within seconds8. Smart grid technologies enable machine learning and IoT-based monitoring systems9, which automate inspections and reduce manual interventions, thereby decreasing maintenance costs and network downtime. Predictive maintenance models use real-time and historical fault data to address faults proactively before they escalate. Localized fault containment using microgrids and Distributed Energy Resources (DERs)10,11 reduces outages and improves fault tolerance. Efficient fault diagnosis therefore calls for cost-effective smart fault management systems with resilient cybersecurity and efficient communication protocols. Artificial intelligence12, renewable energy13, and automation will enhance the reliability and efficiency of power distribution networks by improving fault detection14.

Problem scope

Current technology lacks real-time, high-resolution detection in complex, geographically broad distribution networks6,7. Traditional fault management systems require improved detection, classification, and restoration methods because of delayed response times, imprecise fault identification, and poor localization4. In the proposed system, LiDAR and high-resolution optical images improve network monitoring and fault diagnostics, addressing these shortcomings2. AI, machine learning, and automation enhance decision-making and reduce the need for manual inspections3. This technology enhances the dependability, efficiency, and adaptability of power distribution systems, ensuring a continuous electricity supply and improved performance.

Research intent

This research improves fault detection, predictive maintenance, and localization in power distribution networks using LiDAR and optical imaging technology. The goals are: to build a hybrid deep learning model that fuses multimodal data to reduce fault-localization errors and reaction time; to develop hybrid fault detection systems that use spatial mapping and optical imaging to improve real-time fault detection; and to investigate integrating the system with smart grid frameworks and IoT monitoring systems for fault management and remote asset monitoring in power distribution networks.

Modern power distribution networks face challenges in fault identification and localization due to load unpredictability and the complexity of fault propagation, which has motivated the development of the Multimodal Deep Feature Hybrid Deep Learning Model (MDF-HDL). More precise, real-time fault management solutions are needed because of the high false alarm rates, slow reaction times, and limited precision of traditional systems. MDF-HDL integrates LiDAR, optical images, and sensor data with deep learning layers to extract rich feature representations. Kalman filtering and Adam-optimized decision trees improve accuracy and reduce errors in feature fusion and classification. Precise fault localization through GIS mapping facilitates effective maintenance planning. By addressing the limitations of previous systems, MDF-HDL delivers dependable and efficient fault management in complex grid environments, with strong performance metrics, low inference time, and low computational complexity.

Related works

Mirshekali et al.15 use a CNN capsule network (CapsNet) to track voltage and current hierarchies and relations to increase fault-location precision using deep learning. CapsNet's spatial-information preservation and CNN's feature extraction, trained on synthetic data from simulated faults, improve accuracy. Shafiullah et al.16 used machine learning to diagnose active faults; current and voltage signals are processed by SVM, ANN, or decision tree algorithms. The suggested models use historical data for fast computation and real-time fault identification, but the need for high-quality training data and difficulties in modeling the grid structure limit the research. Thomas & Shihabudheen17 develop a Neural Architecture Search (NAS) procedure that creates optimized deep transformer (NAS-MDT) models that outperform standard MDT approaches. NAS-MDT solves multi-dimensional fault detection faster and more accurately than typical deep learning models, but it requires high-quality labeled datasets for training and incurs substantial computational cost during the search phase. Yoon and Yoon18 developed a CNN-LSTM automated fault diagnostic system for electric power systems to improve resilience to noise and varied fault scenarios. Their voltage- and current-signal-based fault detection and classification method is accurate and operates in real time on synthetic and real-world datasets.

Recent research has enhanced multimodal deep learning models for power distribution system fault detection, offering techniques that improve precision and real-time responsiveness. For example, a multimodal ResNet method that combined recorded electrical data with waveform-driven feature extraction showed strong fault diagnosis capability, outperforming conventional single-modality models. By utilizing multi-sensor data such as vibration, sound, and current signals, another study combined time-series analysis and Transformer-based networks to optimize fault diagnosis in electric motors; through improved feature fusion and hyperparameter tuning, it achieved highly accurate classification results19. To overcome the challenges of complex background interference, small-component recognition, and real-time inspection delays, a multi-scale fusion-enhanced detection algorithm was specifically designed for power transmission lines, incorporating Coordinate Convolution and optimized detection heads. Furthermore, with a focus on both computational efficiency and performance, lightweight multimodal CNN architectures such as ShuffleNet V2 have been applied to large datasets of current and vibration signals for effective fault detection, with classification accuracies exceeding 98.8%. These models complement the MDF-HDL approach by demonstrating advances in hybrid model optimization, multimodal sensor fusion, and practical applicability for complex grid fault management20.

To improve signal resilience and reliability in multimodal sensing environments, Men et al.19 created a photonic-assisted method for producing combined radar and communication signals that are resistant to power fading. Hazim and Al-Allaf20 offer a thorough analysis of fault detection developments in optical fiber networks, building on optical failure analytics and highlighting the application of deep learning and artificial intelligence to obtain accurate fault localization. By releasing a DE–HHO hybrid metaheuristic model for microgrid energy management optimization, Liu et al.21 showed how hybrid intelligent algorithms can enhance the operational efficiency and flexibility of energy systems. Similarly, Soothar et al.22 examined sophisticated machine learning-based methods for identifying optical defects, highlighting the need for real-time adaptability and data-driven fault diagnostics. In their demonstration of an OOA-optimized bidirectional LSTM network with spectral attention for power load forecasting, Liu, Hou, and Yin23 showed how temporal–spectral fusion can improve the accuracy of predictive modeling. By altering the phase-frequency distribution for multi-band phased array applications, Zhou et al.24 improved optical pulse-based radar systems and laid the groundwork for multimodal fusion techniques that combine optical and signal-based sensing. In line with the MDF-HDL model's focus on reliable data fusion and reliability enhancement, Hou and Liu25 improved smart grid sustainability through hybrid machine learning approaches that account for multi-factor effects and missing-data imputation.

Additionally, Zhang et al.26 and Hou, Liu, and Yu27 validated the efficacy of fusion-based frameworks for enhancing diagnostic precision and interpretability by proposing dual-stream convolutional fusion models and multimodal data imputation for reliable fault diagnosis in mechanical and analog systems. The goals of this work, which include multimodal feature integration and accurate fault diagnosis, are quite similar to those of Song et al.'s28 introduction of Fast Fusion Net, a deep learning-based technique for identifying high-voltage power line flaws. Similar to the deep feature fusion idea at the heart of MDF-HDL, Li et al.29 presented an enhanced LSTM fusion and cross-attention framework for multimodal fault classification in power distribution systems. Concurrently, Chaurasia et al.30 undertook a statistical investigation of SNR and optical power distribution in visible light communication systems, offering analytical insights for understanding the behavior of optical data in multimodal frameworks. Liu et al.31 showed the benefits of combining machine learning and mathematical modeling for power system monitoring by using a hybrid neural network based on Beluga Whale Optimization to predict transformer oil temperature. Finally, Kulandaivel and Jeyachitra32 used a shallow multitask neural network to experimentally examine optical spectrum-based power distribution, including optical analytics for intermediate node monitoring and network defect evaluation. These studies, which together reflect a progressive trend toward multimodal, hybrid, and optimization-driven AI architectures that combine optical, photonic, and deep learning techniques for high-accuracy fault detection and localization, serve as the conceptual and empirical foundation for the proposed MDF-HDL framework.

Research gap

CNN capsule networks, ML algorithms such as SVM and ANN, Neural Architecture Search (NAS)-optimized transformer models, and CNN-LSTM hybrid systems have been used in prior research to detect power system faults, yet significant research gaps remain. First, most models rely on high-quality labeled datasets, which are scarce or expensive in complex grid systems, thereby impacting their generalizability and resilience. Second, computing resource demands, especially those associated with NAS model searches, limit deployment. Third, synthetic training data may not fully replicate real-world variability and noise, which can limit fault detection performance across different settings. Fourth, comprehensive fault detection has yet to integrate multimodal sensor data, such as LiDAR and optical images, beyond voltage and current signals. Addressing these gaps would enable real-time, precise, and computationally efficient fault detection for dynamic power distribution networks.

Joint processing technology for fault detection in power distribution networks

Multimodal Data Fusion and a Hybrid Deep Learning Model are used to build an intelligent real-time fault detection and management system for power distribution networks. Traditional fault detection (SCADA-based monitoring and manual inspection) is slow, inaccurate, expensive, and complex. This work employs multimodal data fusion of high-resolution Laser Radar (LiDAR) scans and optical image processing to improve fault localization, classification, and predictive maintenance. Figure 1 shows the overall architecture of MDF-HDL.

Fig. 1. Overall working process of fault detection in power distribution networks.

Data collection and preprocessing

For more efficient data analysis, the multimodal data fusion system synchronizes optical imaging and LiDAR scanning. Figure 2 illustrates the processing flow for this step. LiDAR sensors first map distribution networks in 3D. The LiDAR unit measures the time each laser pulse takes to return after reflecting off objects. During this process, the distance \(\:\left(d\right)\) to a particular object is estimated as \(\:d=\frac{c\cdot\:t}{2}\); here \(\:d\) is computed from the speed of light \(\:\left(c\right)(\sim3.0\times\:{10}^{8}\,m/s)\) and the round-trip time \(\:\left(t\right),\) and the factor \(\:(1/2)\) accounts for the laser pulse's round-trip travel.
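As a simple illustration of the time-of-flight relation above, the following sketch (NumPy assumed; the pulse times are hypothetical values) converts recorded round-trip pulse times into one-way distances using d = c·t/2.

```python
import numpy as np

C = 3.0e8  # speed of light in m/s

def lidar_range(round_trip_times_s: np.ndarray) -> np.ndarray:
    """Convert round-trip pulse times (seconds) to one-way distances d = c*t/2."""
    return 0.5 * C * round_trip_times_s

# Hypothetical returns at 0.5 us and 1.2 us correspond to roughly 75 m and 180 m.
print(lidar_range(np.array([0.5e-6, 1.2e-6])))
```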

Fig. 2. Process of data preparation in fault localization.

The point cloud is then denoted as \(\:P=\left\{\right({x}_{i},{y}_{i},{z}_{i},{I}_{i})\mid\:i=\text{1,2},\dots\:,N\}\); here \(\:({x}_{i},{y}_{i},{z}_{i})\) are the spatial coordinates of the LiDAR return points, \(\:{(I}_{i})\) is the return intensity value, and \(\:\left(N\right)\) is the total number of LiDAR points. The optical image-based information is represented as \(\:I(x,y)=\left\{R\right(x,y),G(x,y),B(x,y\left)\right\}\); here, the image intensity is denoted as \(\:I(x,y)\), and the red, green, and blue channels are represented as \(\:R(x,y),G(x,y),B(x,y)\). From the image, the edge component is derived with the help of Sobel filtering, as given in Eq. (1)

$$G_{x}=\begin{bmatrix}-1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1\end{bmatrix},\qquad G_{y}=\begin{bmatrix}-1 & -2 & -1\\ 0 & 0 & 0\\ 1 & 2 & 1\end{bmatrix}$$
(1)

From the gradients in Eq. (1), the magnitude is computed as \(\:G=\sqrt{{G}_{x}^{2}+{G}_{y}^{2}}\), where \(\:{G}_{x}\:and\:{G}_{y}\) are the horizontal and vertical direction gradients. The thermal image then gives temperature-based fault details denoted as \(\:T(x,y)={f}_{thermal}\left(I\right(x,y\left)\right)\), where \(\:{f}_{thermal}\) represents the transformation mapping from the RGB image to temperature data and \(\:T(x,y)\) is the temperature at each pixel. The gathered LiDAR and optical images are then tagged with GPS coordinates denoted as \(\:G=(lat,lon,alt,t)\). Using the inertial measurement unit and GPS, the gathered information is aligned into a common coordinate frame denoted as \(\:({x}^{{\prime\:}},{y}^{{\prime\:}},{z}^{{\prime\:}})=R\cdot\:(x,y,z)+T\).
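The Sobel gradients of Eq. (1) and the rigid coordinate alignment above can be sketched as follows; this is an illustrative implementation (NumPy and SciPy assumed), not the exact preprocessing pipeline used in the experiments.

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel kernels from Eq. (1)
GX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
GY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def edge_magnitude(gray: np.ndarray) -> np.ndarray:
    """Edge magnitude G = sqrt(Gx^2 + Gy^2) on a grayscale image."""
    gx = convolve(gray, GX, mode="nearest")
    gy = convolve(gray, GY, mode="nearest")
    return np.sqrt(gx ** 2 + gy ** 2)

def align_points(points_xyz: np.ndarray, R: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Rigid transform (x', y', z') = R·(x, y, z) + T applied to an (N, 3) point cloud."""
    return points_xyz @ R.T + T
```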

Feature extraction and data fusion

Multi-sensor fusion methods, such as deep Kalman filtering, neural networks, and attention-based models, integrate multiple data streams to accurately represent the power infrastructure and its components. The point cloud is represented as \(\:P=\left\{\right({x}_{i},{y}_{i},{z}_{i},{I}_{i}){\}}_{i=1}^{N}\). Here, \(\:{I}_{i}\) is the return intensity, the spatial coordinates are represented as \(\:({x}_{i},{y}_{i},{z}_{i})\), and the total number of points is denoted as \(\:N\).

Fig. 3. Process of feature extraction and data fusion.

The process of feature extraction and data fusion is shown in Fig. 3. Initially, the best-fitting plane \(\:Ax+By+Cz+D=0\) is identified to extract the power lines. Here, D is the offset, and the plane normal vector is defined as \(\:(A,B,C)\). The residual errors are then computed from the point-to-plane distance calculated using Eq. (2)

$$\:{d}_{i}=\frac{\mid\:A{x}_{i}+B{y}_{i}+C{z}_{i}+D\mid\:}{\sqrt{{A}^{2}+{B}^{2}+{C}^{2}}}$$
(2)

In Eq. (2), if \(\:{d}_{i}\) is below the threshold value \(\:{(d}_{i}<{d}_{threshold})\), the LiDAR point is allocated to the plane. This process is repeated iteratively to fit the points to the model. Curvature-based features are then derived to identify conductor and pole deformation. The local curvature \(\:\left(\lambda\:\right)\) is estimated as \(\:{\lambda\:}_{i}=\frac{{\sum\:}_{j}^{\:}{w}_{j}{d}_{ij}^{2}}{{\sum\:}_{j}^{\:}{w}_{j}}\); here, the Euclidean distance is represented as \(\:{d}_{ij}\) and the weight function is \(\:{w}_{j}\) (which gives higher importance to closer neighbors). A large \(\:\lambda\:\) value indicates misaligned components. As noted above, gradient filters are applied to extract the spatial features, with the gradient kernels defined in Eq. (1). From the gradients (\(\:{G}_{x}\:and\:{G}_{y})\), the edge transitions are estimated according to Eq. (3)

$$G_{x}=\begin{bmatrix}-1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1\end{bmatrix}*I(x,y),\qquad G_{y}=\begin{bmatrix}-1 & -2 & -1\\ 0 & 0 & 0\\ 1 & 2 & 1\end{bmatrix}*I(x,y)$$
(3)

From the gradient points, the magnitude is estimated as \(\:G=\sqrt{{G}_{x}^{2}+{G}_{y}^{2}}\) which helps to derive the edge intensity. The texture features are extracted with the help of convolution operations that are defined in Eq. (4)

$$\:F\left(l,m\right)={\iint\:}_{-\infty\:}^{\infty\:}K\left(i,j\right)I\left(l-i,m-j\right)di\:dj$$
(4)

In Eq. (4), the input image intensity is \(\:I(x,y)\) at coordinates \(\:(x,y)\), the convolution kernel is represented as \(\:K(i,j),\) and the output feature map is signified as \(\:F(l,m)\). The derivative of \(\:F(l,m)\) with respect to \(\:I(x,y)\) is derived in Eqs. (5), (6), and (7),

  1. a.

    Start from the discrete convolution definition (our Eq. (4)):

    $$\:F(x,y)=\sum\:_{u}\:\sum\:_{v}\:I(x+u,y+v)K(u,v)$$
    (5)
  2. b.

    Differentiate \(\:F(x,y)\) with respect to a kernel element \(\:K\left({u}^{{\prime\:}},{v}^{{\prime\:}}\right)\) : because the sum is linear, only the term with \(\:(u,v)=\left({u}^{{\prime\:}},{v}^{{\prime\:}}\right)\) survives:

    $$\:\frac{\partial\:F(x,y)}{\partial\:K\left({u}^{{\prime\:}},{v}^{{\prime\:}}\right)}=I\left(x+{u}^{{\prime\:}},y+{v}^{{\prime\:}}\right)$$
    (6)
  3. c.

    Differentiate \(\:F(x,y)\) with respect to an input pixel \(\:I\left({x}^{{\prime\:}},{y}^{{\prime\:}}\right)\) : similarly, only kernel entries that align with that input pixel contribute:

    $$\:\frac{\partial\:F(x,y)}{\partial\:I\left({x}^{{\prime\:}},{y}^{{\prime\:}}\right)}=K\left({x}^{{\prime\:}}-x,{y}^{{\prime\:}}-y\right)$$
    (7)
    $$\begin{aligned}\frac{\partial F(l,m)}{\partial I(x,y)} &=\frac{\partial}{\partial I(x,y)}{\iint}_{-\infty}^{\infty}K(i,j)\,I(l-i,m-j)\,di\,dj={\iint}_{-\infty}^{\infty}K(i,j)\,\frac{\partial I(l-i,m-j)}{\partial I(x,y)}\,di\,dj,\\ \frac{\partial I(l-i,m-j)}{\partial I(x,y)} &=\delta(l-i-x,\,m-j-y)\quad\Rightarrow\quad \frac{\partial F(l,m)}{\partial I(x,y)}=K(l-x,\,m-y)\end{aligned}$$
    (8)

In Eq. (8), the linearity of the operators and the Dirac delta function \(\:\left(\delta\:\right(x\left)\right)\), which is non-zero only at \(\:i=l-x\) and \(\:j=m-y\), are used to compute the textural features. The derivative shows that the sensitivity of the output \(\:F(l,m)\) with respect to \(\:I(x,y)\) equals the corresponding kernel value. During training, the gradient of the loss function \(\:\left(L\right)\) is then computed with respect to both \(\:I\left(x,y\right)\:and\:K(i,j)\), as detailed below.

  1. a.

    Let \(\:L\) denote the scalar loss. For a given filter weight \(\:w\) (previously written generically as \(\:\text{W}\) or \(\:\text{w}\)), the gradient used in backpropagation is shown in Eq. (9).

    $$\:\frac{\partial\:L}{\partial\:w}=\sum\:_{x,y}\:\frac{\partial\:L}{\partial\:F(x,y)}\frac{\partial\:F(x,y)}{\partial\:w}$$
    (9)
  2. b.

    Using the result from step (2b) above (derivative of \(\:F\) w.r.t. a kernel element) we substitute \(\:\partial\:F(x,y)/\partial\:w=I\left(x+{u}^{{\prime\:}},y+{v}^{{\prime\:}}\right)\) (or the appropriate alignment), as shown in Eq. (10).

    $$\:\frac{\partial\:L}{\partial\:w}=\sum\:_{x,y}\:{\delta\:}_{F}(x,y)I\left(x+{u}^{{\prime\:}},y+{v}^{{\prime\:}}\right)$$
    (10)

    where we define \(\:{\delta\:}_{F}(x,y)\triangleq\:\frac{\partial\:L}{\partial\:F(x,y)}\) (the local sensitivity or error signal at location \(\:(x,y)\)). The loss gradient with respect to \(\:I\left(x,y\right)\) is shown in Eq. (11)

    $$\frac{\partial L}{\partial I(x,y)}={\sum}_{l,m}\frac{\partial L}{\partial F(l,m)}\cdot\frac{\partial F(l,m)}{\partial I(x,y)}={\sum}_{l,m}\frac{\partial L}{\partial F(l,m)}\cdot K(l-x,\,m-y)$$
    (11)

    In Eq. (11), the loss gradient is computed via the chain rule, substituting the previous result \(\:\frac{\partial\:F\left(l,m\right)}{\partial\:I\left(x,y\right)}=K\left(l-x,m-y\right)\); the gradient with respect to the filter weights \(\:K(i,j)\) is estimated analogously using Eq. (12)

    $$\frac{\partial L}{\partial K(i,j)}={\sum}_{l,m}\frac{\partial L}{\partial F(l,m)}\cdot\frac{\partial F(l,m)}{\partial K(i,j)}={\sum}_{l,m}\frac{\partial L}{\partial F(l,m)}\cdot I(l-i,\,m-j)$$
    (12)

    The filter-weight gradients are thus obtained by combining the feature-map loss gradients with the corresponding input values. Because the moment estimates \(\:{m}_{t}\:and\:{v}_{t}\) are initialized to zero, they are biased toward zero early in the optimization; the bias-corrected moments and the weight-update rule are therefore defined in Eq. (13).

    $${\widehat{m}}_{t}=\frac{{m}_{t}}{1-{\beta}_{1}^{t}},\qquad {\widehat{v}}_{t}=\frac{{v}_{t}}{1-{\beta}_{2}^{t}},\qquad {w}_{t+1}={w}_{t}-\frac{\eta}{\sqrt{{\widehat{v}}_{t}}+\varepsilon}{\widehat{m}}_{t}$$
    (13)

    In Eq. (13), \(\:\eta\:\) is the learning rate, \(\:\varepsilon\) is a small constant that prevents division by zero, \(\:{\widehat{m}}_{t}\:and\:{\widehat{v}}_{t}\:\) are the bias-corrected moment estimates, and \(\:{\beta\:}_{1}\) (first-moment decay rate) and \(\:{\beta\:}_{2}\) (second-moment decay rate) control the moment updates; their default or experimental settings are reported to guarantee the transparency and repeatability of the optimization process. Through these convolution operations, discoloration, cracks, and hotspots are derived from the images.
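A minimal numerical sketch of the bias-corrected Adam update in Eq. (13) is given below; β₁ = 0.9 and β₂ = 0.999 are assumed as common defaults rather than values reported here.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Eq. 13): moment updates, bias correction, parameter step."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```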

Multimodal data fusion

Multimodal data fusion is crucial in fault detection systems, as it optimally integrates valuable sensor information to enhance accuracy and robustness. Multimodal fusion is the systematic integration of several data sources, namely LiDAR point clouds, optical images, and sensor measurements, into a unified representational space to increase the robustness of defect characterisation. The process uses Kalman filtering, a statistical estimator that properly combines sequential sensor values, to decrease noise and uncertainty. Textural, structural, and spatial information is progressively extracted from the multimodal input by a hierarchy of neural processing units known as deep learning layers. In this scenario, graph neural networks (GNNs) capture the topological and relational interactions among LiDAR points, whereas convolutional neural networks (CNNs) learn localized spatial patterns from optical images. Together, these layers enable coherent feature fusion and comprehensive system understanding across modalities. The processing structure of multimodal data fusion is shown in Fig. 4.

Fig. 4. Process of multimodal data fusion.

The Kalman filter is a recursive estimator that combines several sensor measurements over time. The state update is given as \(\:{\widehat{X}}_{k}={\widehat{X}}_{k-1}+{K}_{k}({Z}_{k}-H{\widehat{X}}_{k-1})\); here, \(\:{\widehat{X}}_{k}\) denotes the computed fault prediction state, \(\:{Z}_{k}\) is the observed measurement vector (LiDAR and optical features), \(\:H\) is the transformation matrix, and the Kalman gain (\(\:{K}_{k})\) is estimated as \(\:{K}_{k}={P}_{k-1}{H}^{T}(H{P}_{k-1}{H}^{T}+R{)}^{-1}\), where \(\:{P}_{k-1}\) is the state covariance matrix and R is the measurement noise covariance matrix. To improve the performance of the fusion, this work includes a hybrid deep learning model that integrates convolutional networks and graph neural networks (GNN). The LiDAR data consist of a set of 3D points defined as \(\:({x}_{i},{y}_{i},{z}_{i})\), and the unstructured data are processed by constructing the graph \(\:G=(V,E)\) with the set of LiDAR points \(\:V\) and edges \(\:\left(E\right)\). The vertices \(\:V\) and edges \(\:E\) are connected according to the adjacency matrix \(\:\left(A\right),\) which is represented in Eq. (14)

$$A_{ij}=\begin{cases}\exp\left(-\dfrac{\parallel x_{i}-x_{j}\parallel^{2}}{2\sigma^{2}}\right), & \text{if } \parallel x_{i}-x_{j}\parallel<d_{thresh}\\ 0, & \text{otherwise}\end{cases}$$
(14)

In Eq. (14), the two LiDAR points are represented as \(\:{x}_{i},{x}_{j}\), the scaling parameter is \(\:\left(\sigma\:\right)\), and \(\:{d}_{thresh}\) denotes a predefined distance threshold that excludes connections between distant points. Spectral graph convolution generalizes the convolution operation to irregular graphs and is defined as \(\:{F}^{{\prime\:}}=\sigma\:\left({D}^{-1/2}A{D}^{-1/2}FW\right)\). The extracted features \(\:{F}_{Optical}\:and\:{F}_{LiDAR}\) are aligned into a common latent space with the transformation matrix (T), defined as \(\:{F}_{aligned}=T\cdot\:[{F}_{LiDAR},{F}_{Optical}]\), to perform the feature fusion. The derived features are fused and weighted with the help of the trainable parameters \(\:{W}_{1}\:and\:{W}_{2}\). The fusion is performed via a fully connected layer, and weighted concatenation is applied for the trainable fusion operation defined in Eq. (15)

$${F}_{fusion}=\sigma\left(W\left[{F}_{LiDAR}\oplus {F}_{Optical}\right]+b\right)$$
(15)

In Eq. (15), the fusion weight matrix is defined as \(\:W\), the bias value is \(\:b\), the activation function is \(\:\sigma\:,\) and concatenation is denoted as \(\:\oplus\:\). The final layer identifies the structural fault probability using \(\:\widehat{Y}=Softmax({W}_{out}{F}_{fusion}+{b}_{out}),\) which is obtained from the fused features.
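The Kalman measurement update and the Gaussian adjacency construction of Eq. (14) can be sketched as follows; this is a NumPy illustration under the stated definitions, with placeholder dimensions and thresholds rather than the values used in the experiments.

```python
import numpy as np

def kalman_update(x_prev, P_prev, z, H, R):
    """One measurement update: K = P H^T (H P H^T + R)^-1, x = x_prev + K (z - H x_prev)."""
    S = H @ P_prev @ H.T + R
    K = P_prev @ H.T @ np.linalg.inv(S)
    x = x_prev + K @ (z - H @ x_prev)
    P = (np.eye(P_prev.shape[0]) - K @ H) @ P_prev  # updated state covariance
    return x, P

def gaussian_adjacency(points_xyz, sigma=1.0, d_thresh=2.0):
    """Adjacency of Eq. (14): A_ij = exp(-||xi-xj||^2 / (2 sigma^2)) if ||xi-xj|| < d_thresh, else 0."""
    dist = np.linalg.norm(points_xyz[:, None, :] - points_xyz[None, :, :], axis=-1)
    A = np.exp(-dist ** 2 / (2 * sigma ** 2))
    A[dist >= d_thresh] = 0.0
    return A
```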

Fault classification and localization

The final stage receives the fused features from the previous stage, consisting of LiDAR, voltage-sensor, and thermal features, as represented in Eq. (16). Several multimodal inputs were incorporated into the neural network model, including LiDAR point clouds (3D coordinates and intensity values), optical image properties (RGB intensities, gradient edges, and textures), and thermal sensor data (temperature pixels). These inputs were normalized, spatially aligned, and fused using Kalman filtering before being processed by the neural network for classification and fault localization.

$$\:{F}_{fused}={\alpha\:}_{1}{F}_{LiDAR}+{\alpha\:}_{2}{F}_{thermal}+{\alpha\:}_{3}{F}_{voltage}$$
(16)

In Eq. (16), the feature-importance weights are denoted as \(\:{\alpha\:}_{i}\) and are updated during training, and the fused features are represented as \(\:{F}_{fused}\in\:{R}^{n\times\:d}\), which the hybrid deep learning model uses to perform the classification. Fault classification is the process of assigning fault classes or situations based on the multimodal fused feature representations, whereas fault localization maps these faults onto real grid coordinates via GIS transformation. The activation function allows the network to represent complex patterns found in power distribution data by introducing non-linearity into the neural computations. The fusion weight matrix, a learnable parameter set, controls the contribution of each modality in the final decision-making layer to guarantee balanced representation of LiDAR, optical, and sensor-derived data. These concepts clarify how localization and classification accuracy are attained and simplify the proposed hybrid deep learning architecture. The working process of fault classification and localization is shown in Fig. 5.
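A small sketch of the weighted modality fusion in Eq. (16) follows; normalizing the learnable weights α_i with a softmax is an assumption made here to keep them positive and summing to one, not a detail stated in the text.

```python
import numpy as np

def weighted_modality_fusion(f_lidar, f_thermal, f_voltage, alpha_logits):
    """Eq. (16): F_fused = a1*F_LiDAR + a2*F_thermal + a3*F_voltage,
    with softmax-normalized modality weights (an illustrative choice)."""
    alpha = np.exp(alpha_logits) / np.exp(alpha_logits).sum()
    return alpha[0] * f_lidar + alpha[1] * f_thermal + alpha[2] * f_voltage
```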

Fig. 5. Neural structure for fault classification.

The input is transferred to the hidden layers, which apply a linear transformation followed by a non-linear activation function. A neural network comprising an input layer, three hidden layers, and an output layer is used in the proposed MDF-HDL model. The input layer processes 512 fused features from LiDAR, optical, and thermal data. The hidden layers (256, 128, and 64 neurons) use Tanh and ReLU activations with batch normalization and dropout (0.3) to enhance learning and avoid overfitting. Using a Softmax function, the output layer categorizes fault types, including conductor damage, vegetation-induced faults, and normal conditions. With an accuracy of 98.91% and an inference time of 12.5 ms, the network provides dependable and effective real-time fault detection after being trained using the Adam optimizer (learning rate 0.001) and cross-entropy loss. The linear transformation captures the complex relationships in the data through the layer function denoted as \(\:{h}_{k}=\sigma\:({W}_{k}{h}_{k-1}+{b}_{k})\); where \(\:{b}_{k}\in\:{R}^{{d}_{k}},{W}_{k}\in\:{R}^{{d}_{k}\times\:{d}_{k-1}}\) and \(\:\sigma\:\) is the activation function. The hidden layer output is fed into the output layer, which predicts the fault class probability estimated as \(\:P(y\mid\:{F}_{fused})=\frac{exp({W}_{o}{h}_{L}+{b}_{o})}{{\sum\:}_{j=1}^{C}exp({({W}_{o}{h}_{L}+{b}_{o})}_{j})}\). The computed \(\:P(y\mid\:{F}_{fused})\) value improves classification accuracy through training with a cross-entropy loss function. The training process measures the deviation between the outputs as \(\:L=-{\sum\:}_{i=1}^{N}{y}_{i}logP({y}_{i}\mid\:{F}_{fused})\). The deviations are reduced by updating the network parameters using the Adam optimizer, which is defined in Eq. (17)

$$m_{t}={\beta}_{1}m_{t-1}+\left(1-{\beta}_{1}\right)\nabla L,\qquad v_{t}={\beta}_{2}v_{t-1}+(1-{\beta}_{2})(\nabla L)^{2},\qquad {\theta}_{t}={\theta}_{t-1}-\frac{\eta}{\sqrt{v_{t}}+\varepsilon}m_{t}$$
(17)

In Eq. (17), the first- and second-moment gradient estimates are denoted as \(\:{m}_{t}\:and\:{v}_{t}\), the learning rate is \(\:\eta\:\), the decay factors are represented as \(\:{\beta\:}_{1},{\beta\:}_{2}\), and \(\varepsilon\) eliminates division by zero. The refinement process is then defined as \(\:{y}^{*}=D\left(P\right(y\mid\:{F}_{fused}\left)\right)\); \(\:{y}^{*}\) is the final refined fault label, the decision tree refinement function is \(\:D(\cdot\:)\), the probabilistic classification result is denoted as \(\:P(y\mid\:{F}_{fused})\), and the multimodal fused vector is \(\:{F}_{fused}\). Based on this process, the pseudocode for fault detection and localization is shown in Table 1.

Table 1 Fault detection and localization using a hybrid deep learning model.
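A minimal TensorFlow/Keras sketch of the classifier head described above (512 fused input features; 256-128-64 hidden units with batch normalization and dropout 0.3; softmax output; Adam with learning rate 0.001 and cross-entropy loss) is shown below. The exact layer ordering, activation placement, and number of output classes are assumptions, since only the overall configuration is reported.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(n_features: int = 512, n_classes: int = 3) -> tf.keras.Model:
    """Sketch of the described fault classifier head (n_classes is a placeholder)."""
    model = models.Sequential([
        tf.keras.Input(shape=(n_features,)),
        layers.Dense(256, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(128, activation="tanh"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="categorical_crossentropy",  # assumes one-hot encoded fault labels
        metrics=["accuracy"],
    )
    return model
```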

In this step, geospatial transformation is performed: the spatial grid representation of the fault probability and its contours, drawn in the computer-aided design tool, are converted into real geospatial coordinates, allowing accurate fault localization on the power grid.

Results and discussions

This section discusses the efficiency of the Multimodal Data Fusion and Hybrid Deep Learning Model (MDF-HDL) based fault classification and localization in power distribution networks. This work uses the ArcGIS Power Line Classification Project33 and the Awesome 3D LiDAR Datasets34 to build the multimodal data fusion system. The ArcGIS dataset provides geographically annotated point cloud data of power line environments (wires, poles, background) that serves as a realistic baseline for the classification tasks.

Dataset description

The ArcGIS Power Line Classification Project and the Awesome 3D LiDAR Datasets are the two main datasets used in this investigation. There are over 3 million annotated point cloud samples in the ArcGIS collection that depict actual power line settings, complete with cables, poles, and background objects. For applications requiring the location and categorization of faults in power distribution networks, this data offers a realistic geographic baseline.

The Awesome 3D LiDAR Datasets were chosen based on their applicability for urban odometry, localization, and segmentation applications, taking into account experimental parameters like scale, objectives, and sensor type. There are about 2 million high-resolution 3D point clouds in the collection. Both datasets were divided into 25-meter spatial blocks with a maximum of 8,192 points each during data processing. The application of normalization and noise filtering techniques enhanced the quality of the data.
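A sketch of the 25-meter block partitioning with an 8,192-point cap described above is given below (NumPy assumed); random subsampling of over-full tiles is an assumption about how the cap is enforced.

```python
import numpy as np

def split_into_blocks(points, block_size=25.0, max_points=8192, seed=0):
    """Tile an (N, 4) array of (x, y, z, intensity) points into block_size-metre blocks,
    randomly subsampling any block that exceeds max_points."""
    rng = np.random.default_rng(seed)
    keys = np.floor(points[:, :2] / block_size).astype(int)  # (x, y) tile index per point
    blocks = {}
    for key in np.unique(keys, axis=0):
        mask = np.all(keys == key, axis=1)
        block = points[mask]
        if len(block) > max_points:
            block = block[rng.choice(len(block), max_points, replace=False)]
        blocks[tuple(key)] = block
    return blocks
```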

The dataset was split into 80% training, 10% validation, and 10% testing subsets both geographically and temporally to guarantee reliable model assessment and avoid overfitting. Through stratified sampling, the partitioning technique ensured that each fault category was proportionately represented in each subset while maintaining constant class representation across all splits. Furthermore, to reduce imbalance and enhance model generalization, data augmentation methods were applied to underrepresented fault classes, including random rotations, horizontal and vertical mirroring, and controlled noise injection. To ensure that minority classes contributed appropriately to the optimization process, a weighted categorical cross-entropy loss function was also used during training. Together, these measures reduced class disparity and improved the model's reliability and fairness in fault identification across a range of operating conditions. After feature extraction with CNN layers for image data and 3D convolutional networks or RandLA-Net for LiDAR data, Kalman filtering was used to achieve robust multimodal fusion. For classification, Adam-optimized decision trees with early stopping (patience of 8 epochs) were used to avoid overfitting. This comprehensive description of the dataset and the preprocessing workflow contributes to the reproducibility and reliability of the fault detection results.
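The stratified 80/10/10 split and the class weights used in the weighted categorical cross-entropy can be sketched as follows (scikit-learn assumed); the exact weighting scheme is not specified in the text, so inverse frequency is shown as one plausible choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_split(X, y, seed=42):
    """80/10/10 stratified split so every fault class is represented in each subset."""
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

def class_weights(y):
    """Inverse-frequency weights for a weighted categorical cross-entropy loss."""
    classes, counts = np.unique(y, return_counts=True)
    w = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), w.tolist()))
```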

Experimental setup

Multimodal data fusion and reproducibility are ensured by the experimental setup. The study uses the ArcGIS Power Line Classification Project and the Awesome 3D LiDAR Datasets from ArcGIS Hub and GitHub. The ArcGIS dataset contains 3 million point cloud points, while the LiDAR dataset has 2 million. High-resolution 3D LiDAR point clouds and GIS power line images are included. Data are preprocessed by dividing them into 25-meter blocks with 8,192 points per block, noise filtering, and normalization. Modality-specific feature extraction uses CNN layers for images and 3D convolutional networks or RandLA-Net for LiDAR. The extracted features are integrated using Kalman filtering for robust multimodal fusion. Decision tree models optimized with the Adam optimizer (learning rate = 0.001, batch size = 64, 50 epochs) are used for classification, while GIS mapping with 1 m spatial accuracy is used for fault localization. The model achieves 98.9% accuracy, 98.7% precision, 98.3% recall, and a 98.5% F1-score. According to experimental results, the model ensures real-time performance at over 80 frames per second by achieving an average end-to-end latency of 12.5 ms per sample on a high-performance workstation (NVIDIA RTX 3090 GPU, Intel i9 CPU). Latency rises to 20–35 ms on mid-range GPUs such as the RTX 3060, which still satisfies real-time needs. Near-real-time requirements for field applications are met by efficient deployment using TensorRT and FP16 quantization, which keeps latency below 100 ms even on small embedded GPUs. To verify and maintain real-time operation under various hardware restrictions, optimization techniques (model pruning, quantization, and asynchronous execution) have been implemented, along with a thorough latency breakdown across the preprocessing, feature extraction, fusion, and inference phases. The implementation environment runs Python 3.9, TensorFlow 2.x, NumPy, and the ArcGIS API on an NVIDIA RTX 3090 GPU, Intel i9-10900K CPU, and 64 GB RAM. The dataset is split into 80% training, 10% validation, and 10% testing sets, with the Adam optimizer used for optimization and early stopping (patience = 8 epochs) to prevent overfitting.

Collecting and synchronizing disparate datasets like ArcGIS power line photos and Awesome 3D LiDAR point clouds ensures spatial alignment for consistent feature mapping in multimodal data fusion. Data quality is improved via noise filtering, normalization, and temporal-spatial alignment for each modality. Convolutional neural networks (CNN) for image data and 3D convolutional networks for LiDAR data extract rich, modality-specific characteristics. Kalman filtering iteratively merges complementing information from each source to improve resilience and accuracy. For accurate fault classification and GIS mapping fault localization, Adam-optimized decision trees are fed the fused multimodal feature representation. Power distribution network fault detection is accurate and real-time with this systematic fusion approach, which leverages the complementary strengths of different modalities and addresses data heterogeneity.

Figure 6 presents a complete data analysis after feature extraction and multimodal fusion. The comparison histogram (Fig. 6a), which plots feature value (x-axis) against frequency (y-axis) before and after fusion, shows that the fused feature distribution reduces variance while retaining the important bimodal features of the LiDAR and optical modalities, suggesting effective integration of complementary information. The feature variance plot (Fig. 6b), which plots feature index (x-axis) against variance (y-axis), shows that the fused features represent samples smoothly and uniformly, bridging the statistical qualities of the modalities and confirming the distribution homogeneity achieved by fusion. The spectral energy plot (Fig. 6c), which plots frequency (Hz) against spectral energy (dB), indicates a concentration of energy at lower frequencies, validating the conclusion that the fusion process suppressed high-frequency noise. The PCA results (Fig. 6d), which plot principal component (x-axis) against explained variance (%), show that the first two components explain virtually all of the variance, demonstrating that most relevant information is retained post-fusion and emphasizing dimensionality-reduction efficiency. The feature importance plot (Fig. 6e), which plots feature rank (x-axis) against importance score (0–1 scale), shows that the fused features have the greatest importance ratings, surpassing the separate modalities and confirming their decisive influence on classification accuracy. Finally, the t-SNE clustering (Fig. 6f), which compares t-SNE dimension 1 to dimension 2, shows well-separated clusters, proving that the fused feature space improves inter-class separability and classification over unimodal representations.

Fig. 6. Efficiency analysis of feature extraction and fusion: (a) feature distribution before/after fusion; (b) feature variance distribution; (c) spectral energy of fused features; (d) PCA explained variance; (e) feature importance ranking; (f) t-SNE feature cluster analysis.

The deep learning-based fault classification system achieves an accuracy of 98.91% with a 3-layer DNN architecture, remarkably close to the proposed ceiling of 99.4% shown in Fig. 7a. Subfigure (a) plots accuracy (%) against epoch, showing how the model's performance improves throughout training. The training convergence analysis (Fig. 7b), which plots loss value against epoch, shows that optimization is achieved without considerable overfitting, as demonstrated by the closely matching training and validation loss curves. The neural network training state plot, which displays the evolution of the gradient magnitude, validation checks, and learning rate scalar (µ), confirms effective validation during model training, stable convergence, and appropriate learning rate adaptation. The discriminative power of the model is illustrated by the ROC curve in Subfigure (c), which plots the false positive rate (x-axis) against the true positive rate (y-axis) and yields an AUC of 0.992 (Fig. 7c); together with the precision–recall characteristics (Fig. 7d), this affirms that the model maintains good discriminative ability across all operational thresholds.

Fig. 7. Efficiency analysis of deep learning-based fault classification: (a) accuracy; (b) loss convergence; (c) ROC curve (fault class B); (d) precision–recall curve; (e) confusion matrix; (f) classification confidence distribution.

The confusion matrix (Fig. 7e) shows that 485 of 500 samples were classified correctly, with five of the errors arising from confusion between fault subtypes B and C. The softmax distributions in Fig. 7f indicate the overwhelming dominance of high-confidence predictions: 89% of all samples were classified with over 95% certainty, affirming that the system can be reliably deployed in industrial settings. Subfigure (f) plots classification confidence (%) against sample count, showing the distribution of reliability across samples. To ensure visual consistency and analytical clarity, all probability and rate metrics are consistently expressed as percentages (%), and the legends in each subfigure clearly indicate training, validation, and testing outcomes.

Refining the decision tree yields notable performance gains, reducing misclassification by 60–80% across all fault types (Fig. 8a, which plots fault type on the x-axis against misclassification rate (%) on the y-axis). Maximum accuracy is attained at depth 5 (Fig. 8b, which plots tree depth against accuracy (%) to determine the ideal model complexity), and the probability distribution of classification confidence exhibits a steep increase (Fig. 8c, which plots confidence probability (0–1 scale) against fault class to show the degree of categorization certainty). Gini impurity analysis reveals a strong negative correlation with decision confidence (Fig. 8d), and the feature-importance ranking (Fig. 8e, which plots feature index against normalized importance value) identifies voltage and current measurements as the most relevant features. The post-refinement accuracy comparison (Fig. 8f) shows an average absolute improvement of 5–8% across all fault types, confirming that the optimization process worked effectively. After refinement, the model attains an accuracy between 93 and 98% while retaining a transparent structure thanks to the decision tree.

Fig. 8. Refinement efficiency analysis: (a) misclassification rate; (b) tree depth vs. accuracy; (c) classification confidence; (d) Gini impurity vs. confidence; (e) feature importance; (f) accuracy comparison.

In addition, the efficiency of the Multimodal Data Fusion and a Hybrid Deep Learning Model (MDF-HDL) is compared with the existing approaches such as capsule network (CapsNet)15, Neural Architecture Search (NAS)17, Spatial-temporal recurrence neural network (STRGNN)35 and 1D convolutional neural networks (CNN)36 and the obtained results are shown in Table 2.

Table 2 Comparative analysis of MDF-HDL.

Table 2 shows that the MDF-HDL framework outperforms previous techniques in all important performance criteria. MDF-HDL outperforms CapsNet (95.24%), NAS (96.27%), STRGNN (94.35%), and 1D-CNN (92.98%) in fault classification and localization at 98.91%. Beyond accuracy, MDF-HDL has higher precision (0.9837) and recall (0.9931), resulting in an F1-score of 0.9893, indicating better false positive and negative balance. This performance advantage shows its ability to find flaws under difficult settings. MDF-HDL also reduces training time (65.1 min) compared to NAS (210.31 min) and CapsNet (120.3 min) while retaining a competitive inference performance of 12.5 ms for real-time applications. In terms of data efficiency, MDF-HDL achieves 86% accuracy with 10k samples, surpassing benchmark models that vary from 64 to 77% under the same data limitations. These results demonstrate that the proposed framework is practical for real-world multimodal fault diagnosis and localization tasks due to its state-of-the-art accuracy, faster training convergence, lower computational cost, and efficient use of limited training data.

The claim of maintaining low computational complexity even when integrating several techniques, including k-means clustering, deep learning layers, Kalman filtering, and decision trees, has been validated by a thorough computational complexity analysis. The analysis breaks down each component's time and space complexity: k-means clustering generally runs in \(\:O(n\:\times\:\:k\:\times\:\:t\:\times\:\:d),\) where n is the number of samples, k the number of clusters, t the iterations, and \(\:d\) the dimensionality; the deep learning layers are optimized to control complexity by balancing model depth and parameter count; Kalman filtering incurs a complexity of about \(\:O\left({m}^{2}\right)\) in the fused feature dimension m; and decision trees optimized via Adam maintain effective classification with a per-sample complexity of approximately \(\:O\left(log\:n\right)\). The low computational cost claim was supported by experimental runtime profiling on a realistic hardware setup (NVIDIA RTX 3090 GPU, Intel i9-10900K CPU), which verified an inference time of roughly 12.5 ms per sample. These outcomes demonstrate the system's suitability for detecting realistic, real-time problems in complex power distribution networks. The supplementary resources provide full computational details and profiling.
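A simple way to reproduce such per-sample latency figures on a given machine is to time repeated inference calls; the sketch below (with a hypothetical `predict_fn`) reports the mean and 95th-percentile latency in milliseconds.

```python
import time
import numpy as np

def measure_latency(predict_fn, sample, n_runs=100, warmup=10):
    """Per-sample latency profile: mean and 95th percentile in milliseconds."""
    for _ in range(warmup):          # warm-up runs excluded from timing
        predict_fn(sample)
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        predict_fn(sample)
        times.append((time.perf_counter() - t0) * 1000.0)
    return float(np.mean(times)), float(np.percentile(times, 95))
```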

Statistical results

The benchmark datasets underwent additional tests using 5-fold cross-validation. To accurately portray the consistency of the model, performance metrics such as accuracy, precision, recall, and F1-score are reported as the mean ± standard deviation across all folds. As an illustration, the MDF-HDL model achieved an accuracy of 98.91% ± 0.23%, precision of 98.7% ± 0.27%, recall of 98.3% ± 0.30%, and F1-score of 98.5% ± 0.25%. Additionally, sensitivity assessments were conducted on critical hyperparameters, including the learning rate and batch size, to validate the model's stable performance within practical parameter ranges. These additional statistical evaluations provide a rigorous examination of the model's reliability and reinforce confidence in its application for fault detection in power distribution networks. To further strengthen the statistical validity and interpretability of the performance evaluation, 95% confidence intervals and error bars were incorporated into all reported performance metrics. Specifically, for each of the five cross-validation folds, the mean and standard deviation were computed for accuracy, precision, recall, and F1-score, and the corresponding error bars were plotted to visually represent the variability across folds. The MDF-HDL model achieved an average accuracy of 98.91% ± 0.23% (95% CI: [98.68%, 99.14%]), precision of 98.70% ± 0.27% (95% CI: [98.43%, 98.97%]), recall of 98.30% ± 0.30% (95% CI: [98.00%, 98.60%]), and F1-score of 98.50% ± 0.25% (95% CI: [98.25%, 98.75%]). These confidence intervals and error bars, reflected in Figs. 7a and 8f, provide a clearer depiction of statistical variability and confirm the robustness and consistency of the proposed model across multiple validation folds. This refinement ensures a transparent representation of performance stability and reinforces the reliability of the MDF-HDL framework for real-world fault detection and localization tasks. The MDF-HDL and baseline models were also compared using paired t-tests and Wilcoxon signed-rank tests to confirm that the observed gains are statistically reliable rather than due to chance; significant performance differences were found (p < 0.01 and p < 0.05).
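The fold-wise mean and 95% confidence intervals reported above can be computed as in the following sketch (SciPy assumed); the per-fold accuracies shown are hypothetical placeholders, not the study's actual values.

```python
import numpy as np
from scipy import stats

def mean_ci(values, confidence=0.95):
    """Mean and t-based confidence interval across cross-validation folds."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sem = stats.sem(values)  # standard error of the mean
    half = sem * stats.t.ppf((1 + confidence) / 2, len(values) - 1)
    return mean, (mean - half, mean + half)

# Hypothetical per-fold accuracies:
print(mean_ci([98.7, 99.1, 98.9, 99.0, 98.85]))
```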

To guarantee dependability, an experimental uncertainty analysis was conducted using multiple trials and five-fold cross-validation. To demonstrate the transparency and robustness of the reported results, performance metrics are displayed as mean ± standard deviation (e.g., accuracy: 98.91% ± 0.23%), which reflects the experimental uncertainty arising from data and model variance.

Ablation study

Model performance improved significantly in the Kalman filtering ablation study on feature fusion. Without Kalman filtering, the MDF-HDL model achieved 96.23% ± 0.45% accuracy and 95.89% ± 0.47% F1-score. Using Kalman filtering for multimodal feature integration significantly improved accuracy to 98.91% ± 0.23% and F1-score to 98.50% ± 0.25%. Kalman filtering is essential for merging complementary modality features, which improves defect detection precision and reliability.

Compared to CNN-only feature extraction, Graph Neural Networks (GNN) improved performance. The CNN-only model had an accuracy of 97.12% ± 0.38% and an F1-score of 96.85% ± 0.40%, whereas the GNN-enhanced model had 98.18% ± 0.29% and 97.75% ± 0.31%. This suggests that GNN’s capacity to capture relational and structural information between features improves power grid fault detection categorization.

Further investigation showed that decision tree refinement corrected misclassifications from raw deep neural network (DNN) output. The raw DNN predictions have an accuracy of 97.45% ± 0.42% and an F1-score of 97.10% ± 0.44%. After decision tree post-processing, the accuracy and F1-score improved to 98.91% ± 0.23% and 98.50% ± 0.25%, respectively, highlighting the impact of the refinement on classification robustness and error reduction. These results confirm that the MDF-HDL model’s integrated approach maximizes fault classification and localization.

Limitations

Critical limitations are acknowledged in this study. First, the evaluation uses only publicly available benchmark datasets, not real-world or field-collected data. Sensor noise, fault characteristics, and data gaps can affect model robustness and accuracy in power grid fault detection. Second, high-quality multimodal data, especially LiDAR and optical images, are difficult to acquire and expensive, limiting implementation. Obtaining high-quality LiDAR and optical data for the identification of power distribution problems presents both technological and economic challenges. Centimeter-level accuracy requires expensive sensors, aerial platforms, and precise calibration; costs rise with larger grid areas and more frequent scans. Environmental variables such as changes in cloud cover, vegetation, and light can further reduce the reliability of data and necessitate extensive preprocessing and alignment. Moreover, synchronization between multimodal sources requires specialized equipment and skilled personnel, and massive data volumes demand strong servers, storage, and GPUs. Combined, these factors lead to scaling problems, lengthy implementation times, and high running expenses. Third, while the MDF-HDL model has low computational complexity, integrating deep learning layers, Kalman filtering, and decision tree refinement may limit scalability and real-time operation in large-scale or resource-limited grid environments. Finally, dataset bias and class distribution constraints may limit the model's applicability to varied geographic and operational contexts. To improve practicality, future research will validate the model on operational grid data, optimize computing efficiency, and address dataset diversity.

Furthermore, because of their labeling procedures and data collection settings, the ArcGIS Power Line Classification Project and the Awesome 3D LiDAR Datasets may introduce intrinsic biases despite being comprehensive and well-annotated. Because these datasets mostly cover certain geographic regions under controlled imaging conditions, they may not adequately capture the variability of real-world power distribution networks with changing weather, terrain, and sensor calibration settings. These dataset-specific biases might leave the model over-optimized for benchmark conditions and limit its usefulness when applied to unfamiliar situations. To mitigate these effects and ensure a balanced class distribution and broader representation, the study employed stratified sampling, cross-validation, and multimodal data normalization. Further validation using a variety of field-collected datasets is required to confirm the robustness and adaptability of the proposed MDF-HDL architecture in operational grid settings.

Conclusion

The MDF-HDL system improves power distribution network fault detection and localization accuracy and efficiency by combining multimodal LiDAR and optical data fusion, deep hierarchical learning, decision tree refinement, and GIS mapping. This study enhances fault classification accuracy, reduces false alarms, and accelerates inference for real-time applicability. Deep neural network-based feature extraction and classification with decision tree post-processing outperformed CapsNet, NAS, STRGNN, and 1D-CNN on benchmark datasets with 98.91% accuracy. The model showed 86% data efficiency at 10k samples, low computational cost (4.2 M parameters), and a 12.5 ms inference time, demonstrating potential scalability for real-world deployment. MDF-HDL also performed well in noisy environments and across various settings. However, high-quality LiDAR and optical data are difficult to obtain in low-visibility situations and increase computational overhead. Future work on self-supervised learning will enhance flexibility and processing speed, and real-world validation will ensure generalizability beyond synthetic datasets. This study shows promise for effective, scalable fault management in emerging smart grid systems, but further empirical evaluation is needed.