Introduction

Transmission lines are the main carriers of power transmission, and their safety and stability directly determine the reliability of the power supply. However, because transmission lines are widely distributed and exposed to complex natural environments, foreign objects such as balloons, kites, plastic debris, and bird nests easily attach to the lines. If these foreign objects are not detected and removed in time, they affect the normal operation of the transmission lines and can even cause power outages or safety accidents1,2. Traditional manual inspection is costly and inefficient, is easily affected by the environment and climate, and suffers from a high miss rate. Therefore, to improve the operational efficiency and safety of transmission lines, intelligent inspection has become an important means, in which image recognition technology is widely used for tasks such as foreign object detection and line damage identification3. In practical applications, however, transmission line images captured for foreign object detection are often degraded by adverse factors such as rain, fog, and wind disturbance, which reduce image quality and thus the accuracy of foreign object detection. Improving image quality through image enhancement is therefore of great significance for improving the accuracy and reliability of foreign object detection4,5,6.

Current image enhancement methods fall into traditional image processing methods and deep learning-based methods. Traditional deraining and defogging techniques usually rely on physical models: they analyze how raindrops or fog affect the image and use inversion algorithms to restore a clear image7,8,9. Deblurring techniques often use motion compensation to reduce blur caused by camera shake10,11. Although these traditional methods perform well on specific problems, they typically depend on hand-crafted features and rules, lack flexibility, and struggle to meet requirements for efficiency and accuracy. With the rapid development of deep learning, image enhancement methods based on deep neural networks have become a research hotspot12. Deep learning methods can handle more complex enhancement tasks without manually designed features by automatically learning complex patterns in images. They are highly automated and adaptable, and can jointly account for multiple interference factors such as rain, fog, and wind disturbance; by learning multi-level features, they model the interactions among these factors, comprehensively optimize the image, and restore clearer, more detailed results13,14,15. However, current deep learning methods are not sufficiently robust, and the generated images are prone to unclear details.

To address these problems, this paper analyzes existing image enhancement methods and, combining the characteristics of power line images, proposes a texture guided transmission line image enhancement (TGTLIE) method that removes rain, fog and blur from a single input image as far as possible while retaining image detail. The novelty of TGTLIE lies in the design of a texture inference network (TINet) and a texture-based conditional generative adversarial network (TCGAN). First, TINet learns a texture guidance map of the transmission line image. On this basis, TCGAN uses the guidance map to remove rain, fog and blur from the image. In the generative network of the GAN, a neural gradient algorithm is designed to guide the network to generate finer image details by computing gradient information for each pixel, and a dual path attention mechanism is designed to improve the accuracy of image generation by focusing on important regions and features. The discriminator combines global and local feature discrimination to ensure that the generated image is realistic in both overall appearance and local detail. Finally, a multi-stage joint loss function is used to optimize the training process, which improves the accuracy of image generation while accelerating training and improving training efficiency. In summary, the main contributions of this paper are as follows:

  1. (1)

    TINet is designed to mine texture information from the input transmission line images, and then a powerful TCGAN is built to achieve adaptive image deraining, defogging and deblurring.

  2. (2)

    The neural gradient algorithm is proposed to make the network more robust in feature extraction and more stable and adaptive in extracting image details.

  3. (3)

    The dual path attention mechanism is proposed to enable the network to focus on important details and structures in the image at different spatial levels, thereby improving the sharpness of the enhanced image.

  4. (4)

    A global and local feature discrimination network is designed to help the generator optimize the global structure and local details simultaneously, improving the image enhancement effect and retaining image details so that the result is more natural and realistic.

  5. (5)

    A multi-stage loss function is designed to optimize the training process, combining Charbonnier loss, SSIM loss and a global-local generative adversarial loss, which progressively improves the detail recovery ability and yields higher quality image enhancement.

Related work

Image deraining

Li et al. first used a Gaussian mixture model to approximate the priors of the background image and the rain streak layer, and then removed the rain streaks based on the maximum a posteriori probability (MAP)16. Kim et al. first detected the rain streak region, and then used non-local mean filtering to remove the rain streaks17. Chen et al. first used a guided image filter to decompose the input rainy image into low- and high-frequency parts, then used a set of mixed handcrafted features to separate rain streaks from the high-frequency part18. Tao et al. proposed a hybrid feature fusion network (MFFDNet) for single image deraining. The network exploits the complementary strengths of CNN and Transformer networks to extract image features and fuse global and local features into more discriminative representations, which greatly improves the deraining performance19. Wang et al. proposed an effective pyramid feature decoupling network (PFDN), which simultaneously models rain-related features to remove rain streaks and rain-independent features to restore contextual detail from multiple angles20. Yan et al. proposed a joint high-frequency channel and Fourier frequency domain guided network (FPGNet), which uses a Fourier-based multi-scale feature extraction module to learn the rain layer information in the rainy image, while learning the natural distribution of rain streaks from the high-frequency channel to provide the model with rain streak priors21.

Image defogging

Li et al. divided the image into small regions based on histogram equalization and optimized the image with local parameters to defog cable tunnel images22. Li et al. proposed a method combining multi-scale Retinex and wavelet transform, achieving effective defogging by processing the I component of HSI space and applying a logarithmic transformation23. Xin et al. combined contrast-limited adaptive histogram equalization with dark channel, bright channel and other methods to remove fog from transmission line images24. Zhang et al. proposed a CycleGAN defogging method based on a residual attention mechanism: channel and spatial attention form an attention residual block that is added to the two generator networks of CycleGAN, combined with a cycle-perceptual consistency loss to achieve image dehazing25. Wang et al. proposed a defogging model based on feature separation and collaborative networks, using neural networks to extract spatial information and detail features at different depths so that the restored images have natural colors and good detail26. Zhou et al. proposed Diff-EaT, a defogging method for transmission line inspection images that combines a Transformer-based diffusion model with a hybrid-scale gated feedforward network27.

Image deblurring

Tan et al. incorporated non-local statistical priors and exploited the robustness of wavelet tight frames to effectively solve the ill-posed image deblurring problem28. Liang et al. achieved deblurring of power images with generative adversarial networks29. Liu et al. proposed a network based on channel and spatial attention mechanisms, combining L2 loss and edge loss into a comprehensive loss function for insulator image deblurring30. Wang et al. proposed an insulator target detection method based on WGAN image deblurring: by introducing a residual network module and the Wasserstein distance into WGAN training, higher quality insulator images are generated and the detection rate for blurred insulator images is improved31. Wang et al. proposed Uformer, a Transformer-based architecture for image deblurring that uses non-overlapping window-based self-attention to reduce computation, depthwise convolution in the feed-forward network to better capture local context, and an optimized skip connection mechanism to effectively transfer information from encoder to decoder32.

Network structure

The main purpose of the transmission line image enhancement method proposed in this paper is to use image texture information to guide the network in adaptive deraining, defogging and deblurring, so that the enhanced transmission line image looks more natural and realistic. To this end, a new transmission line image enhancement network, TGTLIE, is proposed, consisting of two sequential subnetworks: TINet and TCGAN. Its structure is shown in Fig. 1.

Fig. 1

Network structure diagram.

Texture inference network (TINet)

The texture inference network (TINet) plays an important role in TGTLIE: it extracts rich texture features from the input transmission line image to guide TCGAN during image enhancement. As shown in Fig. 1, TINet consists of an encoder and a decoder. The encoder takes the transmission line image \({O}_{s}\) as input and extracts features through a series of convolutional layers, deconvolution layers and multi-level feature attention layers, learning high-level contextual information and efficiently extracting the texture information in the image. This allows the network to better understand the structural information and detail variations in the image. Finally, the decoder outputs the texture feature map \({R}_{s1}\), which guides the subsequent network to restore clear texture structure in the image.

The encoder in TINet consists of 4 convolution blocks and 4 multi-level feature attention layers. Each convolution block contains a 3 × 3 convolution layer with stride 2 and a 3 × 3 convolution layer with stride 1, and each convolutional layer is followed by a BatchNorm layer and a SiLU activation function. The multi-level feature attention layer is introduced in Sect. 3.3. The decoder contains 4 deconvolution blocks and 4 multi-level feature attention layers.
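
For concreteness, the following is a minimal PyTorch sketch of one such encoder convolution block (a stride-2 3 × 3 convolution and a stride-1 3 × 3 convolution, each followed by BatchNorm and SiLU); the channel widths and the example input size are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class EncoderConvBlock(nn.Module):
    """One TINet encoder block: a stride-2 3x3 conv for downsampling
    followed by a stride-1 3x3 conv, each with BatchNorm and SiLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Example: the 4-block encoder halves the spatial size four times.
x = torch.randn(1, 3, 512, 512)      # input transmission line image O_s
channels = [3, 32, 64, 128, 256]     # illustrative channel widths
for cin, cout in zip(channels[:-1], channels[1:]):
    x = EncoderConvBlock(cin, cout)(x)
print(x.shape)                       # torch.Size([1, 256, 32, 32])
```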

In order to better mine the texture information of the input image, a multi-scale Charbonnier loss is introduced into the decoder to measure the distance between the output texture feature maps and the ground truth at different scales, as shown below:

$$\:\begin{array}{c}{L}_{mc}={\sum\:}_{i=1}^{N}\sqrt{\left({\varepsilon}^{2}+{\left({R}_{si}-\:{M}_{si}\right)}^{2}\right)}\end{array}$$
(1)

where \({R}_{si}\) represents the \(i\)-th scale output feature map of the decoder, \({M}_{si}\) represents the ground-truth feature map at the same scale as \({R}_{si}\), \(N\) is the number of scales, and \(\varepsilon\) is a small positive constant that keeps the loss numerically stable and differentiable near zero.
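
A minimal sketch of the multi-scale Charbonnier loss in Eq. (1); the per-scale averaging, the value of \(\varepsilon\) and the way the ground-truth maps are downsampled are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def multiscale_charbonnier(outputs, targets, eps=1e-3):
    """Multi-scale Charbonnier loss (Eq. 1).

    outputs: list of predicted maps R_si at N scales
    targets: list of ground-truth maps M_si at the same scales
    """
    loss = 0.0
    for r, m in zip(outputs, targets):
        loss = loss + torch.sqrt((r - m) ** 2 + eps ** 2).mean()
    return loss

# Usage with three illustrative scales (full, 1/2 and 1/4 resolution).
gt = torch.rand(1, 1, 256, 256)
targets = [F.interpolate(gt, scale_factor=s, mode="bilinear", align_corners=False)
           for s in (1.0, 0.5, 0.25)]
outputs = [t + 0.01 * torch.randn_like(t) for t in targets]
print(multiscale_charbonnier(outputs, targets))
```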

Texture-based conditional generative adversarial network (TCGAN)

The texture-based conditional generative adversarial network (TCGAN) uses the texture feature map generated by TINet to guide adaptive image deraining, defogging and deblurring, accurately identifying and processing detail information and generating a more natural and realistic transmission line image. TCGAN consists of a generator and a global-local discriminator. The generator takes the transmission line image \({O}_{s}\) and the TINet output texture map \({R}_{s1}\) as input, passes them through a series of convolutional layers, neural gradient algorithm layers, dual path attention layers and deconvolution layers, and finally generates the high-quality image \({P}_{s}\) using multi-scale contextual information through its encoder-decoder architecture.

The encoder in the generator consists of 5 convolution blocks and 5 neural gradient algorithm layers. Each convolution block contains a 3 × 3 depthwise separable convolution layer with stride 2 and a 3 × 3 convolution layer with stride 1, and each convolutional layer is followed by a BatchNorm layer and a ReLU activation function. The neural gradient algorithm layer is introduced in Sect. 3.4. The decoder has a similar structure, containing 5 deconvolution blocks, 5 dual path attention layers, and 5 neural gradient algorithm layers. Each deconvolution block consists of a deconvolution layer, a Concat layer and a 1 × 1 convolution layer: the decoder first upsamples the feature map by deconvolution, fuses it with low-level features through the Concat layer, and finally adjusts the channels with the 1 × 1 convolution. The dual path attention layer is introduced in Sect. 3.5.
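
A minimal sketch of one generator encoder block as described above (a stride-2 3 × 3 depthwise separable convolution followed by a stride-1 3 × 3 convolution, each with BatchNorm and ReLU); the channel widths and the assumption that \(O_s\) and \(R_{s1}\) are concatenated along the channel dimension are illustrative.

```python
import torch
import torch.nn as nn

class GeneratorEncoderBlock(nn.Module):
    """One generator encoder block: a stride-2 3x3 depthwise separable
    convolution followed by a stride-1 3x3 convolution, each followed
    by BatchNorm and ReLU (channel widths are illustrative)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            # depthwise 3x3 with stride 2 (one filter per input channel)
            nn.Conv2d(in_ch, in_ch, 3, stride=2, padding=1, groups=in_ch),
            # pointwise 1x1 to change the channel count
            nn.Conv2d(in_ch, out_ch, 1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Assumed input: image O_s concatenated with texture map R_s1 along channels.
o_s = torch.randn(1, 3, 512, 512)
r_s1 = torch.randn(1, 1, 512, 512)
x = torch.cat([o_s, r_s1], dim=1)
print(GeneratorEncoderBlock(4, 32)(x).shape)   # torch.Size([1, 32, 256, 256])
```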

Previous work that used the GAN architecture for image enhancement typically relied only on a global image discriminator to distinguish whether the images produced by the generator were real or fake. This is not enough to obtain a good generator: if some regions of the generated image contain artifacts, a single global discriminator cannot effectively suppress these local artifacts. We therefore add a local discriminator to the global discriminator to form a global-local discriminator structure. The global discriminator focuses on the overall visual quality of the image, while the local discriminator focuses on its local details. Joint training with both discriminators pushes the generator to improve the quality of the generated images.

The global image discriminator takes the generated image \({P}_{s}\) and the real image \({I}_{s}\) as inputs and contains 7 convolution blocks and 1 fully connected layer. Each convolution block contains two 3 × 3 convolution layers, each followed by a BatchNorm layer and a ReLU activation function. The local image discriminator takes randomly cropped patches of the generated image \({P}_{s}\) and the real image \({I}_{s}\) as input; because its input is small, it contains only 4 convolution blocks and 1 fully connected layer.
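
A minimal sketch of the global-local discriminator pair under the block counts given above; the channel widths, the global pooling before the fully connected layer and the crop size are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    """Two 3x3 convolutions, each followed by BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class ImageDiscriminator(nn.Module):
    """Stack of conv blocks followed by a fully connected real/fake score.
    The global discriminator uses 7 blocks on the whole image, the local
    discriminator uses 4 blocks on random crops (block counts from the
    text; channel widths and pooling are illustrative)."""
    def __init__(self, n_blocks, base_ch=32):
        super().__init__()
        blocks, in_ch = [], 3
        for i in range(n_blocks):
            out_ch = min(base_ch * 2 ** i, 512)
            blocks.append(conv_block(in_ch, out_ch, stride=2))
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)   # assumed pooling before the FC layer
        self.fc = nn.Linear(in_ch, 1)

    def forward(self, x):
        h = self.pool(self.features(x)).flatten(1)
        return torch.sigmoid(self.fc(h))

d_global = ImageDiscriminator(n_blocks=7)            # whole 512x512 image
d_local = ImageDiscriminator(n_blocks=4)             # e.g. 64x64 random crops
print(d_global(torch.randn(1, 3, 512, 512)).shape)   # torch.Size([1, 1])
print(d_local(torch.randn(1, 3, 64, 64)).shape)      # torch.Size([1, 1])
```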

To generate more realistic and sharper images, we train the network with a global-local generative adversarial loss, a multi-scale Charbonnier loss and an SSIM loss. The global-local generative adversarial loss is:

$$\begin{array}{c}{L}_{{D}_{local}}=\sum_{i=1}^{K}\left\{-{E}_{{I}_{s}\sim {P}_{real}}\left[\log {D}_{local}\left({I}_{si}\right)\right]-{E}_{{P}_{s}\sim {P}_{fake}}\left[\log \left(1-{D}_{local}\left({P}_{si}\right)\right)\right]\right\}\end{array}$$
(2)
$$\begin{array}{c}{L}_{{D}_{global}}=-{E}_{{I}_{s}\sim {P}_{real}}\left[\log {D}_{global}\left({I}_{s}\right)\right]-{E}_{{P}_{s}\sim {P}_{fake}}\left[\log \left(1-{D}_{global}\left({P}_{s}\right)\right)\right]\end{array}$$
(3)
$$\begin{array}{c}{L}_{G}=\sum_{i=1}^{K}\left\{-{E}_{{P}_{s}\sim {P}_{fake}}\left[\log {D}_{local}\left({P}_{si}\right)\right]\right\}-{E}_{{P}_{s}\sim {P}_{fake}}\left[\log {D}_{global}\left({P}_{s}\right)\right]\end{array}$$
(4)

where \({I}_{s}\) represents the real image, \({P}_{s}\) represents the image generated by the generator, \({I}_{si}\) represents the \(i\)-th random patch of the real image, \({P}_{si}\) represents the \(i\)-th random patch of the generated image, and \(K\) represents the number of random patches.
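
A minimal sketch of Eqs. (2)-(4) using binary cross-entropy, so that BCE(d, 1) = -log d and BCE(d, 0) = -log(1 - d); the random_crops helper and the choice of K are hypothetical illustrations, and the discriminators are assumed to output probabilities as in the sketch above.

```python
import torch
import torch.nn.functional as F

def random_crops(img, k, size=64):
    """Sample K random patches from an image batch (hypothetical helper)."""
    _, _, h, w = img.shape
    crops = []
    for _ in range(k):
        top = torch.randint(0, h - size + 1, (1,)).item()
        left = torch.randint(0, w - size + 1, (1,)).item()
        crops.append(img[:, :, top:top + size, left:left + size])
    return crops

def discriminator_loss(d_global, d_local, real, fake, k=4):
    """Global-local discriminator loss, Eqs. (2) and (3)."""
    ones, zeros = torch.ones_like, torch.zeros_like
    dg_r, dg_f = d_global(real), d_global(fake.detach())
    loss = F.binary_cross_entropy(dg_r, ones(dg_r)) + \
           F.binary_cross_entropy(dg_f, zeros(dg_f))
    for r_patch, f_patch in zip(random_crops(real, k), random_crops(fake.detach(), k)):
        dl_r, dl_f = d_local(r_patch), d_local(f_patch)
        loss = loss + F.binary_cross_entropy(dl_r, ones(dl_r)) + \
               F.binary_cross_entropy(dl_f, zeros(dl_f))
    return loss

def generator_adversarial_loss(d_global, d_local, fake, k=4):
    """Generator adversarial loss, Eq. (4): fool both discriminators."""
    dg = d_global(fake)
    loss = F.binary_cross_entropy(dg, torch.ones_like(dg))
    for patch in random_crops(fake, k):
        dl = d_local(patch)
        loss = loss + F.binary_cross_entropy(dl, torch.ones_like(dl))
    return loss
```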

The multi-scale Charbonnier loss and the SSIM loss are:

$$\:\begin{array}{c}{L}_{mc}={\sum\:}_{i=1}^{N}\sqrt{\left({\varepsilon}^{2}+{\left({G}_{si}-\:{I}_{si}\right)}^{2}\right)}\end{array}$$
(5)
$$\begin{array}{c}l\left({G}_{si},{I}_{si}\right)=\frac{2{\mu}_{{G}_{si}}{\mu}_{{I}_{si}}+{c}_{1}}{{\mu}_{{G}_{si}}^{2}+{\mu}_{{I}_{si}}^{2}+{c}_{1}}\\ c\left({G}_{si},{I}_{si}\right)=\frac{2{\sigma}_{{G}_{si}}{\sigma}_{{I}_{si}}+{c}_{2}}{{\sigma}_{{G}_{si}}^{2}+{\sigma}_{{I}_{si}}^{2}+{c}_{2}}\\ s\left({G}_{si},{I}_{si}\right)=\frac{{\sigma}_{{G}_{si}{I}_{si}}+{c}_{3}}{{\sigma}_{{G}_{si}}{\sigma}_{{I}_{si}}+{c}_{3}}\\ SSIM\left({G}_{si},{I}_{si}\right)=l\cdot c\cdot s\\ {L}_{ssim}=\sum_{i=1}^{N}\left(1-SSIM\left({G}_{si},{I}_{si}\right)\right)\end{array}$$
(6)

where \({G}_{si}\) represents the generator output at the \(i\)-th scale, \({I}_{si}\) represents the real image at the same scale as \({G}_{si}\), \(N\) is the number of scales, \(\varepsilon\) is a small positive constant for numerical stability, \(\mu\) denotes the mean, \(\sigma\) denotes the standard deviation, \({\sigma}_{{G}_{si}{I}_{si}}\) denotes the covariance, and \({c}_{1},{c}_{2},{c}_{3}\) are constants used to stabilize the computation.
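
A minimal sketch of the SSIM loss in Eq. (6), computed from image-level statistics rather than the sliding Gaussian window of standard SSIM, and using the common convention \(c_3 = c_2/2\) so that the contrast and structure terms merge; these simplifications are assumptions for illustration.

```python
import torch

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM loss following Eq. (6) with image-level statistics.
    With c3 = c2 / 2, the contrast and structure terms combine into
    (2*cov + c2) / (var_p + var_t + c2)."""
    mu_p = pred.mean(dim=(1, 2, 3))
    mu_t = target.mean(dim=(1, 2, 3))
    var_p = pred.var(dim=(1, 2, 3), unbiased=False)
    var_t = target.var(dim=(1, 2, 3), unbiased=False)
    cov = ((pred - mu_p.view(-1, 1, 1, 1)) *
           (target - mu_t.view(-1, 1, 1, 1))).mean(dim=(1, 2, 3))
    luminance = (2 * mu_p * mu_t + c1) / (mu_p ** 2 + mu_t ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (var_p + var_t + c2)
    ssim = luminance * contrast_structure
    return (1 - ssim).mean()

def multiscale_ssim_loss(outputs, targets):
    """Sum the SSIM loss over the N generator output scales."""
    return sum(ssim_loss(g, i) for g, i in zip(outputs, targets))
```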

Multi-level feature attention layer

In order to enhance the model's ability to capture image details and improve the accuracy of feature extraction, a multi-level feature attention layer is proposed to extract features at different scales and of different types, as shown in Fig. 2.

Fig. 2

Multi-level feature attention layer diagram.

In the multi-level feature attention layer, convolution kernels of different sizes are used to capture features at different scales and levels. Smaller kernels are better suited to capturing local image features, while larger kernels capture broader contextual information; fusing these multi-level, multi-scale features enhances the model's understanding of the image. To reduce the number of parameters, dilated convolution is used instead of standard large-kernel convolution: 3 × 3 convolutions with dilation rates of 2 and 3 replace the 5 × 5 and 7 × 7 convolutions. After the features are fused, an SE attention mechanism obtains global information from the feature map through global pooling, and fully connected layers then compute a weight for each channel, dynamically adjusting the importance of features at each scale so that the information most important for texture feature extraction receives adaptive attention, improving the performance of the model. The multi-level feature attention layer can thus focus on image features at different scales, capturing detail more precisely and extracting image features layer by layer. At the same time, the attention mechanism can follow changes in local frequency features, effectively retaining and enhancing details and textures, which helps TCGAN use texture information in the deraining, defogging and deblurring tasks. The calculation is as follows:

$$\begin{array}{c}{F}_{a}={C}_{1}\left(Concat\left({C}_{1}\left({F}_{i}\right),{C}_{3}\left({F}_{i}\right),{C}_{D2}\left({F}_{i}\right),{C}_{D3}\left({F}_{i}\right)\right)\right)\end{array}$$
(7)
$$\begin{array}{c}{F}_{M}={C}_{3}\left(SE\left(Concat\left({C}_{3}\left({F}_{a}\right),{F}_{i}\right)\right)\right)\end{array}$$
(8)

where \({F}_{i}\) represents the input feature, \({C}_{1}\) and \({C}_{3}\) represent 1 × 1 and 3 × 3 convolutions, \({C}_{D2}\) and \({C}_{D3}\) represent 3 × 3 dilated convolutions with dilation rates 2 and 3, \(Concat(\cdot)\) represents the concatenation operation, \(SE(\cdot)\) represents the SE attention operation, and \({F}_{M}\) represents the output feature.
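
A minimal sketch of the multi-level feature attention layer following Eqs. (7)-(8); the channel counts, the SE reduction ratio and the exact placement of the final convolution are assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global pooling plus two fully connected
    layers produce per-channel weights."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))
        return x * w.view(x.size(0), -1, 1, 1)

class MultiLevelFeatureAttention(nn.Module):
    """Multi-level feature attention layer (Eqs. 7-8): parallel 1x1, 3x3
    and dilated 3x3 (rates 2 and 3) branches, fused by a 1x1 convolution,
    then SE attention over the fused features concatenated with the input."""
    def __init__(self, ch):
        super().__init__()
        self.branch1 = nn.Conv2d(ch, ch, 1)
        self.branch3 = nn.Conv2d(ch, ch, 3, padding=1)
        self.branch_d2 = nn.Conv2d(ch, ch, 3, padding=2, dilation=2)
        self.branch_d3 = nn.Conv2d(ch, ch, 3, padding=3, dilation=3)
        self.fuse = nn.Conv2d(4 * ch, ch, 1)              # C_1 in Eq. (7)
        self.conv_a = nn.Conv2d(ch, ch, 3, padding=1)     # C_3 applied to F_a
        self.se = SEBlock(2 * ch)
        self.out = nn.Conv2d(2 * ch, ch, 3, padding=1)    # C_3 in Eq. (8)

    def forward(self, f_i):
        f_a = self.fuse(torch.cat([self.branch1(f_i), self.branch3(f_i),
                                   self.branch_d2(f_i), self.branch_d3(f_i)], dim=1))
        f_m = self.out(self.se(torch.cat([self.conv_a(f_a), f_i], dim=1)))
        return f_m

print(MultiLevelFeatureAttention(32)(torch.randn(1, 32, 64, 64)).shape)
```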

Neural gradient algorithm layer (NGAL)

Let \(\tau\) denote the number of layers of the neural network and \({m}_{t}\) the number of neurons in layer \(t\), \(t\in\{1,2,\dots,\tau\}\). The output of layer \(t\) is denoted \({x}^{\left(t\right)}=[{x}_{1}^{\left(t\right)},{x}_{2}^{\left(t\right)},\dots,{x}_{{m}_{t}}^{\left(t\right)}]\in{\text{R}}^{{m}_{t}}\), and \({x}^{\left(0\right)}\) is the input of the network.

Suppose the network has \(N\) inputs, denoted \({x}^{\left(0\right)}\left(n\right)\), \(n=1,2,\dots,N\). For the \(n\)-th input, the \(i\)-th output of the \(t\)-th layer is:

$$\:\begin{array}{c}{x}_{i}^{\left(t+1\right)}\left(n\right)=\phi\:\left({v}_{i}^{\left(t\right)}\right)\end{array}$$
(9)
$$\:\begin{array}{c}{v}_{i}^{\left(t\right)}=\sum\:_{j=0}^{{m}_{t}}{\theta\:}_{i,j}^{\left(t\right)}{x}_{j}^{\left(t\right)}\left(n\right)+{z}_{i}^{\left(t\right)}\left(n\right)\end{array}$$
(10)

where \({x}_{j}^{\left(t\right)}\left(n\right)\) represents the \(j\)-th input of the \(n\)-th data item at layer \(t\); \({\theta}_{i,j}^{\left(t\right)}\) represents the weight connecting the \(j\)-th input to the \(i\)-th neuron of layer \(t\); \({v}_{i}^{\left(t\right)}\) represents the \(i\)-th pre-activation output of layer \(t\); \(\phi\) is the activation function; and \({z}_{i}^{\left(t\right)}\left(n\right)\) is the Gaussian perturbation added to the \(i\)-th neuron of layer \(t\) for the \(n\)-th data item.

The loss function is denoted \(L\); for the \(n\)-th data item \({x}^{\left(0\right)}\left(n\right)\) with label \(Y\left(n\right)\), the loss value is \(L\left({x}^{\left(\tau\right)}\left(n\right),Y\left(n\right)\right)\). In our work, we optimize the noise level \({\sigma}_{i}^{\left(t\right)}\) of the zero-mean Gaussian perturbation for each neuron, namely \({z}_{i}^{\left(t\right)}\left(n\right)={\sigma}_{i}^{\left(t\right)}{\epsilon}_{i}^{\left(t\right)}\left(n\right)\), where \({\epsilon}_{i}^{\left(t\right)}\left(n\right)\) is a standard normal random variable. The residual propagated backward through the network for the \(n\)-th data item at the \(i\)-th neuron of layer \(t\) is defined as:

$$\begin{array}{c}{\delta}_{i}^{\left(t\right)}\left(n\right)=\left\{\begin{array}{ll}{e}_{i}^{\left(\tau\right)}\left(n\right)\,{\phi}^{\prime}\left({v}_{i}^{\left(\tau-1\right)}\left(n\right)\right) & t=\tau\\ {\phi}^{\prime}\left({v}_{i}^{\left(t-1\right)}\left(n\right)\right)\left(\sum_{j=0}^{{m}_{t+1}}{\theta}_{i,j}^{\left(t\right)}{\delta}_{j}^{\left(t+1\right)}\left(n\right)\right) & t<\tau\end{array}\right.\end{array}$$
(11)

Where \(\:{e}_{i}^{\left({\uptau\:}\right)}\left(n\right)\) is defined as:

$$\:\begin{array}{c}{e}_{i}^{\left({\uptau\:}\right)}\left(n\right)={\left.\frac{\partial\:L\left(x,Y\left(n\right)\right)}{\partial\:{x}_{i}}\right|}_{x={x}^{\left({\uptau\:}\right)}\left(n\right)}\end{array}$$
(12)

Backpropagation essentially provides, for all parameters \({\theta}_{i,j}^{\left(t\right)}\) (\(t=1,2,\dots,\tau-1\)), a pathwise stochastic estimate of the derivative of the loss function \(L\), as shown in Eq. (13), where \(j\in\{0,1,\dots,{m}_{t}\}\) and \(i\in\{0,1,\dots,{m}_{t+1}\}\).

$$\begin{array}{c}\frac{\partial L\left({x}^{\left(\tau\right)}\left(n\right),Y\left(n\right)\right)}{\partial {\theta}_{i,j}^{\left(t\right)}}={\delta}_{i}^{\left(t+1\right)}\left(n\right)\,{x}_{j}^{\left(t\right)}\left(n\right)\end{array}$$
(13)

The neural gradient algorithm proceeds as follows (a minimal code sketch follows the steps):

  1. a)

    Input the training data \(P={\{({x}^{\left(0\right)}\left(n\right),Y\left(n\right))\}}_{n=1}^{N}\) and the loss function \(L\).

  2. b)

    Construct neural network.

  3. c)

    Use Eq. (9) to calculate the output \(\:{x}^{\left({\uptau\:}\right)}\left(\text{n}\right)\).

  4. d)

    Calculate loss function \(\:\text{L}\left({x}^{\left({\uptau\:}\right)}\right(n), Y(n\left)\right)\).

  5. e)

    Use Eqs. (11)-(13) to estimate the gradients of the loss with respect to the weights and the Gaussian perturbation noise levels.

  6. f)

    Update weights and noise levels.

  7. g)

    Repeat steps c through f until the parameters meet the requirements of the model.
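
A minimal sketch of the idea behind steps (a)-(g) for a small fully connected network: each neuron receives a learnable Gaussian perturbation \(z_i = \sigma_i \epsilon_i\) on its pre-activation, and automatic differentiation stands in for the explicit residual formulas of Eqs. (11)-(13); the layer sizes, noise initialization and toy data are assumptions.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Linear layer with a learnable per-neuron Gaussian perturbation
    z_i = sigma_i * eps_i added to the pre-activation (Eq. 10)."""
    def __init__(self, in_f, out_f):
        super().__init__()
        self.linear = nn.Linear(in_f, out_f)
        self.log_sigma = nn.Parameter(torch.full((out_f,), -3.0))  # noise level per neuron

    def forward(self, x):
        eps = torch.randn(x.size(0), self.linear.out_features, device=x.device)
        v = self.linear(x) + self.log_sigma.exp() * eps
        return torch.relu(v)        # phi(v) in Eq. (9)

# Steps (a)-(g), with autograd providing the gradient estimates.
net = nn.Sequential(NoisyLinear(16, 32), NoisyLinear(32, 32), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = torch.randn(8, 16), torch.randn(8, 1)            # toy data (a)
for _ in range(100):                                     # repeat steps (c)-(f)
    loss = nn.functional.mse_loss(net(x), y)             # steps (c)-(d)
    opt.zero_grad()
    loss.backward()      # step (e): gradients w.r.t. weights and noise levels
    opt.step()           # step (f): update weights and noise levels
```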

Dual path attention layer (DPAL)

In order to enable the model to adaptively adjust its attention to features in different spatial regions and different channels, enhance the model's feature extraction capability, and achieve more efficient image enhancement, a dual path attention layer is proposed; its structure is shown in Fig. 3.

Fig. 3

Dual path attention layer diagram.

As shown in the figure, the layer consists of two attention paths. The first path applies average pooling and max pooling across all channels of the feature map and combines the two results to enhance the feature learning ability of the model. The two operations play different but complementary roles: average pooling captures the overall characteristics of the transmission line image, while max pooling captures its prominent features (edges, textures). Attention weights are then generated through a convolution operation and a Sigmoid activation function and used to weight the feature map. The calculation process is as follows.

$$\begin{array}{c}MeanPooling\left(x\right)=\frac{1}{C}\sum_{c=1}^{C}F\left(i,j,c\right)\end{array}$$
(14)
$$\:\begin{array}{c}MaxPooling\left(x\right)={Max}_{c=1}^{C}F\left(i,j,c\right)\end{array}$$
(15)
$$\:\begin{array}{c}{F}_{1}=Concat\left(MeanPooling\left(x\right),MaxPooling\left(x\right)\right)\end{array}$$
(16)
$$\:\begin{array}{c}D=\sigma\:\left({Conv}^{1\times\:1}\left({F}_{1}\right)\right)\times\:F\end{array}$$
(17)

where \(i,j\) are the spatial indices of the feature map, \(c\) indexes the channels and \(C\) is the number of channels, \(\sigma\) denotes the Sigmoid activation function, and \({Conv}^{1\times 1}\) denotes the 1 × 1 convolution operation.

The second path directly compresses the number of channels in the input feature map from C to 1 through a 1 × 1 convolution, fusing the information of all channels into a "global channel information" feature map. A Sigmoid activation function then turns this map into a channel attention map that highlights or suppresses different channel information. The calculation process is as follows.

$$\:\begin{array}{c}E=\sigma\:\left({Conv}^{1\times\:1}\left(F\right)\right)\end{array}$$
(18)

Then, the spatially weighted result of the upper path is concatenated with the channel attention of the lower path, the channel count of the concatenated map is compressed to 1 by a 1 × 1 convolution, the attention map is generated by a Sigmoid activation function, and finally the original feature map is weighted by this attention map. The calculation process is as follows.

$$\:\begin{array}{c}{F}_{final}=\sigma\:\left({Conv}^{1\times\:1}\left(Concat\left(D,E\right)\right)\right)\times\:F\end{array}$$
(19)

\({F}_{final}\) represents the final feature map, and \(D,E\) represent the outputs of the two paths. By concatenating the results of the two paths, the channel attention of the second path is combined directly with the spatially weighted feature map, forming a richer representation that contains both spatial and channel attention information without limiting their complementarity, thus enhancing the diversity of feature expression. Finally, the model generates the overall weight map through a 1 × 1 convolution and a Sigmoid activation. By combining these two attention mechanisms, the model can adaptively adjust the importance of different spatial regions and different channels, achieving more efficient and accurate image enhancement.
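
A minimal sketch of the dual path attention layer following Eqs. (14)-(19); the channel count used in the example is an assumption.

```python
import torch
import torch.nn as nn

class DualPathAttention(nn.Module):
    """Dual path attention layer (Eqs. 14-19): a spatial path built from
    channel-wise mean/max pooling and a channel-compression path, fused
    by a final 1x1 convolution and Sigmoid."""
    def __init__(self, channels):
        super().__init__()
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=1)           # Eq. (17)
        self.channel_conv = nn.Conv2d(channels, 1, kernel_size=1)    # Eq. (18)
        self.fuse_conv = nn.Conv2d(channels + 1, 1, kernel_size=1)   # Eq. (19)

    def forward(self, f):
        # Path 1: pool across channels, keep the spatial layout (Eqs. 14-16).
        mean_map = f.mean(dim=1, keepdim=True)
        max_map, _ = f.max(dim=1, keepdim=True)
        f1 = torch.cat([mean_map, max_map], dim=1)
        d = torch.sigmoid(self.spatial_conv(f1)) * f                  # Eq. (17)
        # Path 2: compress all channels into one "global channel" map.
        e = torch.sigmoid(self.channel_conv(f))                       # Eq. (18)
        # Fuse the two paths and re-weight the original features.
        attn = torch.sigmoid(self.fuse_conv(torch.cat([d, e], dim=1)))
        return attn * f                                               # Eq. (19)

print(DualPathAttention(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```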

Experiment

Dataset

In this work, we use a dataset collected by drones and extended with existing data, comprising 700 rainy images, 600 foggy images and 800 blurry images. Each image type is divided into training, validation and test sets in a 7:2:1 ratio, as shown in Table 1.

Table 1 Data set category and quantity.

Implementation details

In this paper, the TGTLIE method is trained in stages: TINet is trained first, its parameters are then frozen, and the TCGAN network is trained. The experiments use the PyTorch framework to train the model on an NVIDIA RTX 4090 under Ubuntu. For TINet training, the batch size was 2, the Adam optimizer was used, the initial learning rate was 1 × 10⁻⁴, and training ran for 50 epochs. For TCGAN training, the batch size was 4, the Adam optimizer was used, the initial learning rate was 5 × 10⁻³, and training ran for 200 epochs. Input images were scaled to 512 × 512 pixels for network training. Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) were used as evaluation metrics to assess network performance.

Ablation experiment

Network architecture analysis

To verify the effect of the texture inference network (TINet) in guiding image enhancement and the effect of the local discriminator in preventing local artifacts, we perform ablation experiments on TGTLIE variants: (1) TCGAN: the TINet network is removed from TGTLIE and no texture information is used for guidance; (2) TINet + TCGAN w/o LD: the local discriminator is removed from TCGAN. The quantitative results are shown in Table 2 and the qualitative results in Fig. 4.

Table 2 Results of network architecture ablation experiment.

As can be seen from Table 2, the image enhancement performance of TGTLIE and TINet + TCGAN w/o LD is consistently better than that of TCGAN. The PSNR and SSIM of TGTLIE are improved by 1.294 dB, 1.567 dB, 3.210 dB and 0.016, 0.013, 0.039 respectively compared with TCGAN, and those of TINet + TCGAN w/o LD are improved by 0.623 dB, 1.018 dB, 1.032 dB and 0.005, 0.005, 0.018 respectively, showing that texture guidance effectively improves the image enhancement performance of the model. The PSNR and SSIM of TGTLIE are improved by 0.671 dB, 0.549 dB, 2.178 dB and 0.011, 0.008, 0.021 respectively compared with TINet + TCGAN w/o LD, indicating that the local discriminator focuses on local details of the image and improves the enhancement effect.

Fig. 4

Visualization results of network architecture ablation experiment.

As can be seen from Fig. 4, TCGAN lacks the guidance of texture information, and object details in its generated images are relatively blurry: the texture structure of the tower is unclear in the derained result, surface artifacts remain in the defogged result, and the kite and cables are not clearly visible in the deblurred result. TINet + TCGAN w/o LD lacks the local discriminator, so local details in its outputs are blurry, such as local artifacts in the derained result, color shadows around the mountain fire in the defogged result, and partially missing kite and cables in the deblurred result. The images generated by TGTLIE have clearer texture structure and detail; rain streaks, fog and blurred regions are effectively removed, and local details are more complete and clear. This demonstrates the effectiveness of the texture inference network and the local discriminator in making the generated images clearer in overall visual effect.

Network module analysis

To verify the effectiveness of the components designed in TGTLIE, ablation experiments are performed on the dataset with different combinations of the neural gradient algorithm layer (NGAL) and the dual path attention layer (DPAL): (1) TGTLIE w/o NGAL, DPAL: neither NGAL nor DPAL is used in TGTLIE, only convolution operations; (2) TGTLIE w/o NGAL: NGAL is not used in TGTLIE; (3) TGTLIE w/o DPAL: DPAL is not used in TGTLIE. The results are shown in Table 3.

Table 3 Results of network module ablation experiment.

As can be seen from the table, when TGTLIE uses neither NGAL nor DPAL, the PSNR and SSIM of the model are 31.622 dB, 33.065 dB, 27.684 dB and 0.940, 0.947, 0.881, respectively. After adding DPAL, the PSNR and SSIM increase by 0.732 dB, 0.756 dB, 1.102 dB and 0.007, 0.005, 0.027 respectively, indicating that DPAL makes the model pay more attention to important regions of the image and improves the enhancement effect. After adding NGAL, the PSNR and SSIM increase by 1.103 dB, 1.189 dB, 1.661 dB and 0.011, 0.008, 0.030 respectively, indicating that NGAL improves the robustness of the model, helps it better retain structural details, and strengthens its feature extraction ability. When both NGAL and DPAL are used, the enhancement effect of TGTLIE is optimal, showing the effectiveness and necessity of each proposed component for transmission line image enhancement.

Attention module analysis

To verify the effectiveness of the dual path attention layer (DPAL) designed in TGTLIE in terms of computational efficiency and accuracy, ablation experiments were conducted with DPAL, SE, CoordAttention (CA), and CBAM, respectively. The results are shown in Table 4.

Table 4 Results of attention module ablation experiment.

As can be seen from the table, PSNR and SSIM improve after DPAL, SE, CA or CBAM is introduced into the model, showing that attention mechanisms improve the image enhancement performance of the model. The inference speed of DPAL is slightly lower than that of SE, but compared with SE, CA and CBAM, DPAL has fewer parameters, fewer floating point operations and better enhancement performance. Taking these factors together, DPAL offers the best overall trade-off and improves the enhancement effect without adding excessive complexity to the model.

Contrast experiment

To verify the advanced performance of TGTLIE in transmission line image enhancement and evaluate it accurately, quantitative experiments were conducted against CBDNet, Uformer, MSPFN and DGNL-Net; the results are shown in Table 5.

Table 5 Experimental results of different methods.

As can be seen from Table 5, TGTLIE achieves the best PSNR and SSIM under the different environmental interferences, with values of 33.218 dB, 34.921 dB, 30.725 dB and 0.956, 0.962, 0.926, respectively. Compared with DGNL-Net, the PSNR and SSIM of TGTLIE are improved by 0.777 dB, 0.667 dB, 3.410 dB and 0.006, 0.007, 0.049, respectively. Compared with Uformer, they are improved by 1.594 dB, 1.109 dB, 1.343 dB and 0.011, 0.011, 0.009, respectively. This shows that TGTLIE achieves good image enhancement and can recover clear images, verifying its effectiveness and advancement in transmission line image enhancement.

Qualitative experiment

To comprehensively evaluate the performance of the proposed method, qualitative visualization experiments are indispensable in addition to objective quantitative metrics. Direct visual inspection reveals how the enhanced images fare in terms of detail preservation, color restoration and noise suppression. We therefore conducted qualitative experiments with CBDNet, Uformer, MSPFN and DGNL-Net to demonstrate the performance of TGTLIE on different image types. The experimental results are shown in Fig. 5.

Fig. 5

Visualization results of different methods.

As can be seen from Fig. 5, the images enhanced by CBDNet show surface artifacts, blurry details and unclear texture. MSPFN reduces the surface artifacts but tends to introduce color deviation. The images enhanced by Uformer and DGNL-Net have no artifacts, but local details are missing, especially for blurred images. TGTLIE better removes rain streaks, fog and deformation; the enhanced images are free of artifact interference, accurately restore the true colors, and retain image detail, appearing clearer, more realistic and more visually pleasing. The consistently excellent quantitative and qualitative results demonstrate the effectiveness of TGTLIE in transmission line image enhancement and lay a good foundation for subsequent transmission line monitoring.

Object detection application experiment

To verify the effectiveness of TGTLIE for practical transmission line monitoring after image enhancement, object detection experiments were conducted before and after enhancement using ETSD-YOLO, the transmission line foreign object detection algorithm we developed previously33. The detection results in different environments are shown in Fig. 6.

Fig. 6

Object detection results before and after image enhancement.

As can be seen in Fig. 6, when the transmission line image is disturbed by rain, fog or blur, the false detection rate and miss rate increase, and the detection algorithm identifies non-existent targets: the nest is not detected in the rainy image, smoke is falsely detected in the foggy image, and the mountain fire and smoke are missed in the blurred image. After enhancement by TGTLIE, all objects in the images are detected and the detection accuracy does not decrease. This shows that TGTLIE effectively removes the interference of rain, fog and blur, improves image quality, makes the images clearer and more realistic, and supports the data analysis and application of transmission line monitoring.

Adaptability experiment

To verify the robustness and generalization of the proposed method for transmission line image enhancement, experiments were conducted in various transmission line infrastructure environments, in snowy weather, and with low-resolution images. The results are presented in Fig. 7.

Fig. 7

Experimental results of adaptability in different environments.

As shown in Fig. 7, panel (A) shows an inspection image of a transmission line in a rural area during rainy weather: after enhancement by the proposed method, the image is clearly displayed and the Trash foreign object is accurately detected. Similarly, panel (B) shows an inspection image in an urban area, where the Kite foreign object is also accurately detected. Panel (C) shows an inspection image under extreme snowy weather in winter; after enhancement, the Nest foreign object is accurately detected. Panel (D) shows a low-resolution inspection image captured in foggy weather; although some artifacts and slight deformation of foreign objects remain after enhancement, the Kite foreign object is still accurately detected. These results indicate that the proposed method achieves good enhancement effects in various practical scenarios, improves image quality and the accuracy of power line monitoring, and thus demonstrates strong robustness and generalization ability.

Conclusion

In this paper, a texture guided transmission line image enhancement method (TGTLIE) is proposed to improve image quality and the accuracy of foreign object detection under the interference of natural environmental factors (such as rain, fog and blur) in transmission line inspection. First, a texture inference network (TINet) is designed to extract texture information from the degraded input images; this texture information is then used as prior knowledge to guide the subsequent enhancement process. Next, a texture-based conditional generative adversarial network (TCGAN) performs adaptive deraining, defogging and deblurring. To further improve the detail quality and robustness of image generation, a neural gradient algorithm and a dual path attention mechanism are introduced into TCGAN, helping the model attend to important details in the image and reduce noise interference. To prevent local artifacts, a global-local discriminator structure inspects the generated images both globally and locally. Training uses a multi-stage joint loss function comprising Charbonnier loss, SSIM loss and the global-local generative adversarial loss, which helps the model improve image quality and detail while maintaining realism. Quantitative and qualitative experiments show that TGTLIE effectively removes noise interference under different meteorological conditions, significantly improves image quality, and provides effective technical support for intelligent inspection and fault warning of transmission lines. In future work, we will continue to optimize TGTLIE to improve its real-time performance, robustness and generalization to meet the challenges of more complex and changing natural environments.