In underground mines, poor lighting often leaves images captured by camera sensors under-illuminated, degrading video quality. The aesthetic quality of images captured by underground vision devices suffers as well, and the information they convey is unsatisfactory. Image enhancement for underground mining is therefore a worthwhile research subject: it can effectively improve the visibility and visual quality of underground scenes. Moreover, enhanced visibility makes scenes and objects stand out more prominently, providing a stronger foundation for advanced computer vision tasks in mining (e.g.,1,2,3,4), such as recognizing personnel behavior and detecting restricted areas.

Low-light enhancement is the primary task in processing underground mining images and a significant research direction in image processing. The latest advances in low-light enhancement are mainly deep learning solutions5, spanning learning strategies such as supervised learning, reinforcement learning, unsupervised learning, zero-shot learning, and semi-supervised learning, together with their training datasets (e.g.,6,7,8,9,10). These methods provide effective benchmarks for mining image enhancement. However, supervised learning (e.g.,11,12,13,14) and reinforcement learning (e.g.,15,16,17,18,19) require paired training datasets20, and unsupervised learning still depends on carefully curated training data. This requirement makes it challenging to acquire underground images that accurately depict the real scenes captured by mining visual equipment. In terms of generalization, these methods exhibit limited generalization ability and unstable training.

In addition, image enhancement models trained on synthetic images typically do not generalize well to real-world images. Many unsupervised enhancement methods aim to overcome the limitations of training on synthetic data: they take low-light images as input and produce enhancement curves as output. However, the estimated curves may reduce image contrast and distort surface colors, leaving the final image blurry. Existing methods applied in underground mines therefore still leave room for improvement.

To address these issues, we propose a lightweight method, named Z-DCE-DNet, for enhancing mine images that meets the practical requirements of mining operations. The Z-DCE-DNet framework consists of two stages: an unsupervised low-light enhancement stage and a dehazing-based sharpening stage. In the low-light enhancement stage, we train in an unsupervised manner on realistically captured underground images, relying on the DCE-Net backbone; a carefully designed loss function yields the most suitable enhancement parameters. Then, still in the context of underground image enhancement, we take the illumination-restored images and dehaze them with a lightweight CNN built on a physical scattering model. The effectiveness of our method is validated through multiple comprehensive sets of experiments.

To obtain clearer mine images, we combine two image enhancement strategies, low-light enhancement6 and image dehazing7, for image processing tasks in underground mines. The main contributions of this paper are as follows:

  • Images of Underground Mines Dataset (IUMD): Real underground images are scarce, and exposure levels vary widely across underground environments, so we gathered surveillance video data from diverse underground scenes to create a more extensive dataset.

  • Unsupervised Image Enhancement Strategies: To enhance the illumination of underground images, we improved the unsupervised learning method of DCE-Net. After enhancement, we revisited the color deviation issue and improved image illumination by optimizing the non-reference loss function. Subsequently, to address the blurriness introduced by low-light enhancement, we proposed a post-processing method based on AOD-Net to obtain clearer images.

  • Comparison Experiment of Image Enhancement Methods: An ablation experiment was conducted to validate the effectiveness of the proposed method. Additionally, we analyzed and compared Z-DCE-DNet with current mainstream image enhancement methods. The experimental results indicate that our method outperforms other models in subjective visual quality.

  • Comparison of Object Detectors: To validate the benefit of enhanced images for subsequent underground vision tasks, we employed popular object detection algorithms and compared the performance of a safety helmet detection model on images before and after enhancement. Results from multiple experimental comparisons showed that the enhanced images further improve object detection performance.

The structure of this article is as follows: Sect. 2 introduces our dataset and provides a review and summary of the relevant literature. Section 3 delves into the details of the proposed framework. Performance evaluation is presented in Sect. 4. The article concludes with a summary in Sect. 5.

Related work

In this section, we first introduce our dataset and then summarize the most common image enhancement and image dehazing methods. We review existing methods, including those built on multiscale feature representation, and compare them with our proposed approach.

Images of underground mines dataset, IUMD

Low-light image enhancement (LLIE) aims to improve the perception or interpretability of an image captured in a low-light environment21. Supervised learning-based image restoration and enhancement methods rely heavily on paired datasets of synthesized low-light and normal images, such as LOL18, Exclusively Dark22, SID9, and RELLISUR23, among others.

Typically, paired data is captured by automatically dimming lights, changing camera settings, or retouching synthetic data. Such data poses added difficulties in underground mine scenarios: (1) capturing paired images of the same visual scene at the same time in an underground mine is difficult; (2) synthesizing damaged images from clean images can sometimes help, but the results are usually not photorealistic enough, producing various artifacts when the trained model is applied to real low-light images.

To address this, we built the IUMD. We collected actual surveillance video from the control rooms of 12 mines in Hunan, Shandong, and Guangdong provinces in China, as well as images taken in the field under different light intensities. A sample of the IUMD is shown in Fig. 1. It covers several complex scenes, such as pit stops and tracks, with a total of 1647 images under different lighting conditions, all sized to 512 × 512. All volunteers who participated in photo documentation were informed about the data usage and agreed to the research content presented in this manuscript.

Fig. 1 Some IUMD images, drawn from underground surveillance video and field photographs, exhibit uneven lighting, single-light-source exposure, blurriness, and occlusion in complex scenes; such unprocessed images severely hamper subsequent intelligent detection.

Low light enhancement

Learning strategies generally require large volumes of paired training data, which are difficult to acquire underground and may not reflect the real scenes captured by mine vision equipment. Zero-shot learning strategies, by contrast, require neither paired nor unpaired training data.

ExCNet is the first zero-shot CNN method for backlit image restoration24: it estimates the S-curve of a backlit image directly on the test image itself, and the scheme applies to most backlit shooting scenes. RRDNet decomposes a low-quality image into three components, illumination, reflectance25, and noise, and enhances them by combining a Retinex reconstruction loss, a texture enhancement loss, and an illumination-guided noise estimation loss; it requires no pre-trained image samples or prior training and can restore underexposed images. Zero-DCE formulates low-light enhancement as image-specific curve estimation, taking a low-light image as input and producing higher-order curves as output26. These curves respect the pixel value range, monotonicity, and differentiability of the input image and are applied to obtain an enhanced image; Zero-DCE++ was later proposed as a lighter version27.
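For concreteness, the curve adjustment that Zero-DCE iterates can be sketched in a few lines of PyTorch. This follows the published formulation \(LE_n = LE_{n-1} + A_n LE_{n-1}(1 - LE_{n-1})\); the tensor shapes and function name are our assumptions:

```python
import torch

def apply_curves(image: torch.Tensor, curve_maps) -> torch.Tensor:
    # curve_maps: iterable of per-pixel parameter maps A_n, one per
    # iteration, each the same shape as `image`, values in [-1, 1].
    enhanced = image
    for A in curve_maps:
        # LE_n = LE_{n-1} + A_n * LE_{n-1} * (1 - LE_{n-1})
        enhanced = enhanced + A * enhanced * (1.0 - enhanced)
    return enhanced
```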

Image dehazing

Although the captured scene images are brightened, problems such as low contrast, low saturation, loss of detail, and color deviation after illumination remain severe in underground mines, so we consider dehazing as a post-processing step. Traditional image dehazing methods are mainly based on the dark channel prior28 and the color attenuation prior29, and they require favorable environmental conditions; deep learning-based methods are more flexible. MSCNN, one of the earliest CNN studies of haze removal, trains a network to estimate the transmission map in a coarse-to-fine manner30. DehazeNet is a trainable end-to-end system that explicitly learns the mapping from a raw hazy image to its medium transmission map31. DCPDN embeds the physical scattering model into the network, allowing it to jointly estimate the transmission map, atmospheric light, and the dehazed image32. AOD-Net produces recovered images by reformulating the physical scattering model, an idea later extended to video dehazing33. PSD is a principled synthetic-to-real dehazing framework that starts from a dehazing backbone pre-trained on synthetic data and fine-tunes it on real hazy images in an unsupervised manner23, yielding clearer images.

Low-light enhancement and dehazing are both crucial image processing techniques and have demonstrated significant effectiveness in real-world scenarios. Inspired by previous work, we combine them, applying dehazing after low-light enhancement, to obtain clearer underground images.

Method

This section discusses the underground image enhancement method proposed in Z-DCE-DNet in depth. As shown in Fig. 2, the algorithm comprises two stages. The first stage enhances the weak lighting in images captured in underground mines; the second stage sharpens the luminance-enhanced images, yielding a clearer final image. The detailed steps of each stage are elaborated in the following sections.

Framework overview

The overall framework of Z-DCE-DNet is shown in Fig. 2; it comprises two stages: low-light image enhancement and image dehazing.

(1) Low-light enhancement module (Fig. 2-a): The backbone is DCE-Net26, a deep network for zero-shot curve estimation. Its input is a set of low-light images acquired underground, which are enhanced iteratively (i.e., each iteration's enhanced image serves as the input for the next). The output is a set of pixel-wise curve parameter maps corresponding to higher-order curves, iteratively applied to obtain the best brightness-enhanced image.

(2) Image dehazing module (Fig. 2-b): A mine image processed only by low-light enhancement gains brightness at the cost of clarity. Inspired by the strong physical grounding of dehazing, we use AOD-Net for deblurring: it estimates \(K(x)\) from the input hazy image \(I(x)\) and then uses \(K(x)\) as an input-adaptive parameter to estimate \(J(x)\), i.e., to generate a clean image. The detailed network architecture is shown in Fig. 2.

Fig. 2 The framework of Z-DCE-DNet includes (a) a low-light enhancement module and (b) a dehazing enhancement module. The detailed structure of the network is described later.

Low light enhancement module

DCE-Net learns the mapping between underground images and their best-fit curve parameter maps using a plain CNN of seven convolutional layers with symmetric skip concatenations26. Each layer has 32 3 × 3 convolutional kernels with a stride of 1, followed by a ReLU activation. The last convolutional layer produces 24 parameter maps for 8 iterations (n = 8), with each iteration consuming three curve parameter maps, one per R, G, and B channel.
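A minimal PyTorch sketch of this seven-layer network is given below. It follows the public Zero-DCE design (32 filters, 3 × 3 kernels, stride 1, ReLU, symmetric skip concatenations, 24 output maps); the tanh that bounds the output maps to [-1, 1] follows the original Zero-DCE release and is our assumption about the variant used here:

```python
import torch
import torch.nn as nn

def conv3x3(cin: int, cout: int) -> nn.Conv2d:
    return nn.Conv2d(cin, cout, kernel_size=3, stride=1, padding=1)

class DCENet(nn.Module):
    def __init__(self, channels: int = 32, iterations: int = 8):
        super().__init__()
        self.c1 = conv3x3(3, channels)
        self.c2 = conv3x3(channels, channels)
        self.c3 = conv3x3(channels, channels)
        self.c4 = conv3x3(channels, channels)
        self.c5 = conv3x3(channels * 2, channels)        # input: cat(x4, x3)
        self.c6 = conv3x3(channels * 2, channels)        # input: cat(x5, x2)
        self.c7 = conv3x3(channels * 2, 3 * iterations)  # 24 curve maps
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1 = self.relu(self.c1(x))
        x2 = self.relu(self.c2(x1))
        x3 = self.relu(self.c3(x2))
        x4 = self.relu(self.c4(x3))
        x5 = self.relu(self.c5(torch.cat([x4, x3], dim=1)))
        x6 = self.relu(self.c6(torch.cat([x5, x2], dim=1)))
        # tanh bounds the curve parameters to [-1, 1] (original release).
        return torch.tanh(self.c7(torch.cat([x6, x1], dim=1)))
```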

Zero-DCE proposes a set of differentiable non-reference losses that let us assess the quality of enhanced images26. Four loss types are used to train DCE-Net: (1) spatial consistency loss; (2) exposure control loss; (3) color constancy loss; (4) illumination smoothness loss.

Figure 3 visualizes the impact of each loss function on underground image enhancement. To reduce the computational load of the model, we redefine the non-reference loss function.

Fig. 3 Contribution of each loss in the ablation study. The result without the spatial consistency loss is closest to the full model's output, so we judge that this loss contributes least to underground image enhancement. We therefore choose the following three losses to train our DCE-Net.

Exposure control loss

To suppress under-exposed and over-exposed regions, we design an exposure control loss \(L_{exp}\) to control the exposure level. \(L_{exp}\) can be expressed as

$$L_{exp}=\frac{1}{M}\sum_{k=1}^{M}\left|Y_{k}-E\right|$$
(1)

where \(M\) is the number of non-overlapping 16 × 16 local regions and \(Y_{k}\) is the average intensity of the \(k\)-th region in the enhanced image. We set \(E\) to 0.4 empirically.
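A hedged PyTorch sketch of Eq. (1), using average pooling to take the mean of each non-overlapping 16 × 16 region (function name is ours):

```python
import torch
import torch.nn.functional as F

def exposure_loss(enhanced: torch.Tensor, E: float = 0.4, patch: int = 16) -> torch.Tensor:
    gray = enhanced.mean(dim=1, keepdim=True)  # per-pixel intensity
    Y = F.avg_pool2d(gray, patch)              # mean of each 16x16 region
    return torch.abs(Y - E).mean()             # (1/M) * sum |Y_k - E|
```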

Color constancy loss

We follow the gray-world assumption of color constancy: averaged over the entire image, each sensor channel is gray. The color constancy loss \(L_{col}\) can be expressed as

$$L_{col}=\sum_{\forall (p,q)\in \varepsilon}\left(J^{p}-J^{q}\right)^{2},$$
$$\varepsilon=\left\{(R,G),(R,B),(G,B)\right\}$$
(2)

where \(J^{p}\) denotes the average intensity of channel \(p\) in the enhanced image and \((p,q)\) denotes a pair of channels.
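Eq. (2) reduces to three squared differences between channel means; a minimal sketch (function name is ours):

```python
import torch

def color_constancy_loss(enhanced: torch.Tensor) -> torch.Tensor:
    # Per-channel averages J^p over the spatial dimensions of [B, 3, H, W].
    mean_rgb = enhanced.mean(dim=(2, 3))
    r, g, b = mean_rgb[:, 0], mean_rgb[:, 1], mean_rgb[:, 2]
    return ((r - g) ** 2 + (r - b) ** 2 + (g - b) ** 2).mean()
```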

Illumination smoothness loss

To preserve the monotonicity relation between adjacent pixels, an illumination smoothness loss is added to each curve parameter map \(A\). \(L_{tvA}\) is defined as

$$L_{tvA}=\frac{1}{N}\sum_{n=1}^{N}\sum_{c\in \xi}\left(\left|\nabla_{x}A_{n}^{c}\right|+\left|\nabla_{y}A_{n}^{c}\right|\right)^{2},$$
$$\xi=\left\{R,G,B\right\}$$
(3)

where \(N\) is the number of iterations, \(\nabla_{x}\) is the horizontal gradient operator, and \(\nabla_{y}\) is the vertical gradient operator.
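A sketch of Eq. (3) as a total-variation penalty on the stacked parameter maps; penalizing the squared horizontal and vertical gradients separately is a common simplification of the exact formula, and the function name is ours:

```python
import torch

def illumination_smoothness_loss(A: torch.Tensor) -> torch.Tensor:
    # A stacks the N iterations' RGB curve maps: shape [B, 3N, H, W].
    grad_x = A[:, :, :, 1:] - A[:, :, :, :-1]   # horizontal differences
    grad_y = A[:, :, 1:, :] - A[:, :, :-1, :]   # vertical differences
    # Mean over batch, channels, and pixels absorbs the 1/N factor of
    # Eq. (3); squaring each gradient separately is a common simplification.
    return grad_x.pow(2).mean() + grad_y.pow(2).mean()
```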

Total loss

To reduce the computational cost of the model, we removed the spatial consistency loss from the original network and redefined the complete loss as

$$L_{total}=L_{exp}+W_{col}L_{col}+W_{tvA}L_{tvA}$$
(4)

where \(W_{col}\) and \(W_{tvA}\) are loss weights used to balance \(L_{exp}\), \(L_{col}\), and \(L_{tvA}\).
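Combining the loss sketches above, Eq. (4) becomes a weighted sum; the weights 5 and 20 are the values reported later in the ablation study:

```python
# Total loss of Eq. (4), built from the sketches above. W_col = 5 and
# W_tvA = 20 are this paper's ablation settings.
def total_loss(enhanced, A, w_col: float = 5.0, w_tva: float = 20.0):
    return (exposure_loss(enhanced)
            + w_col * color_constancy_loss(enhanced)
            + w_tva * illumination_smoothness_loss(A))
```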

Image dehazing module

The reformulated non-reference loss, which implicitly measures enhanced image quality and drives network learning, proves effective; however, purely loss-driven enhancement introduces problems such as overexposure, color bias, and aggravated blur in underground images.

To resolve these picture quality problems, we add a deblurring post-processing step after low-light enhancement of underground images. AOD-Net is adopted as the backbone of the dehazing enhancement module shown in Fig. 2-b: it comprises five convolutional layers with parallel filters of different sizes, and coarse-scale features are concatenated with intermediate layers of the fine-scale network. Fusing filters of different sizes yields multi-scale features33.

Based on a reformulated atmospheric scattering model, AOD-Net estimates \(K(x)\) from the input \(I(x)\); a clean-image generation module then uses \(K(x)\) as its input-adaptive parameter to estimate \(J(x)\). The K-estimation module is the key component of AOD-Net, responsible for estimating depth and relative haze level. The model is expressed as follows:

$$J(x)=K(x)\,I(x)-K(x)+b,$$
$$K(x)=\frac{\frac{1}{t(x)}\left(I(x)-A\right)+\left(A-b\right)}{I(x)-1},$$
(5)
$$t(x)=e^{-\beta d(x)}$$

where \(I(x)\) is the hazy image after low-light enhancement, \(J(x)\) is the scene radiance (i.e., the ideal "clean image"), \(b\) is a constant bias with default value 1, \(A\) is the global atmospheric light, \(t(x)\) is the transmission matrix, \(\beta\) is the atmospheric scattering coefficient, and \(d(x)\) is the distance between the object and the camera.
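The K-estimation module and the clean-image step of Eq. (5) can be sketched as follows. The filter sizes (1/3/5/7/3) and concatenation pattern follow the published AOD-Net design; any remaining details of this paper's variant are our assumption:

```python
import torch
import torch.nn as nn

class AODNet(nn.Module):
    def __init__(self, b: float = 1.0):
        super().__init__()
        self.b = b  # constant bias from Eq. (5)
        self.conv1 = nn.Conv2d(3, 3, kernel_size=1)
        self.conv2 = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(6, 3, kernel_size=5, padding=2)
        self.conv4 = nn.Conv2d(6, 3, kernel_size=7, padding=3)
        self.conv5 = nn.Conv2d(12, 3, kernel_size=3, padding=1)  # fuses all scales
        self.relu = nn.ReLU(inplace=True)

    def forward(self, I: torch.Tensor) -> torch.Tensor:
        x1 = self.relu(self.conv1(I))
        x2 = self.relu(self.conv2(x1))
        x3 = self.relu(self.conv3(torch.cat([x1, x2], dim=1)))
        x4 = self.relu(self.conv4(torch.cat([x2, x3], dim=1)))
        K = self.relu(self.conv5(torch.cat([x1, x2, x3, x4], dim=1)))
        # Clean-image step: J(x) = K(x) * I(x) - K(x) + b
        return self.relu(K * I - K + self.b)
```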

During the image dehazing stage, we employ Mean Square Error (MSE) as the loss function. The MSE loss directly calculates the pixel-level difference between the generated image and the target image, which can be mathematically expressed as:

$$L_{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(J\left(x_{i}\right)-\hat{J}\left(x_{i}\right)\right)^{2}$$
(6)

where \(\hat{J}(x)\) denotes the ground-truth image, \(N\) is the total number of pixels in the image, and \(i\) indexes the pixel position.
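In PyTorch, Eq. (6) is what nn.MSELoss computes; a minimal illustration with hypothetical placeholder tensors:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
dehazed = torch.rand(4, 3, 512, 512)       # stand-in for J(x)
ground_truth = torch.rand(4, 3, 512, 512)  # stand-in for the clean target
loss = criterion(dehazed, ground_truth)    # mean squared pixel error
```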

Experiment

This section presents the experimental configurations and results of Z-DCE-DNet for underground mine image enhancement. The PyTorch deep learning framework was used, running on an NVIDIA GeForce GTX 1650 GPU, on underground mine images at a resolution of 512 × 512. Several object detectors, including YOLOv5, YOLOv8, and SSD, were used to compare detection effectiveness on images before and after enhancement. Figures 4 and 5 show visual comparisons of enhanced images. Performance metrics such as PSNR and SSIM confirm the visual quality of the proposed enhancement technique, and the detection results from the object detectors further indicate its effectiveness.

Experiments settings

Z-DCE-DNet was trained on a dataset of 1,647 authentic underground mining images. Incorporating diverse underground scenes into the training set let us fully exploit wide dynamic range adjustment. The IUMD images were randomly partitioned into training, validation, and test sets at a 7:2:1 ratio, with all images standardized to a resolution of 512 × 512.

We implemented our framework on a Windows 10 system with PyTorch; the GPU is an NVIDIA GeForce GTX 1650 with 4 GiB of video memory, the batch size is 4, and training runs for 100 epochs. The filter weights of each layer are initialized with a standard Gaussian of zero mean and 0.02 standard deviation, and biases are initialized to a constant. We use the ADAM optimizer with default parameters and a fixed learning rate of 1e-4 for network tuning, with the loss weights chosen to balance the scales of the individual losses.
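A sketch of this configuration, reusing the DCENet sketch from the method section; the constant used for bias initialization is not specified in the text, so zero is our assumption:

```python
import torch
import torch.nn as nn

def init_weights(m: nn.Module) -> None:
    # Gaussian(0, 0.02) for conv weights; constant (assumed 0) for biases.
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.0)

model = DCENet()  # the sketch from the method section
model.apply(init_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```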

Ablation study

Ablation experiments based on the Zero-DCE network were conducted to obtain optimally luminance-enhanced underground images; a series of experiments supports the methodology of this paper. We fed the IUMD into the Z-DCE network for training, obtaining the model Epoch99.pth for verification and testing on underground images, and randomly verified a set of images shown in Fig. 4. Zero-DCE produces a significant brightening effect, but its output images exhibit a visible color cast.

To improve network performance and solve the color bias introduced by low-light enhancement, we made the following changes: (1) starting from the RGB color channels, we reduced the proportional value of the R channel and redefined the color function inside the Zero-DCE network; (2) for the group of non-reference loss functions in Eq. (4), we re-adjusted the weights \(W_{col}\) and \(W_{tvA}\), setting them to 5 and 20 respectively. The output of the resulting Improved-Zero-DCE resolves the color bias present in the original implementation.

Observing the output of Improved-Zero-DCE, we found that the color-corrected image was overly white and blurred; this lack of clarity is also related to dust-induced blur inherent in the underground environment itself. We therefore applied the AOD-Net deblurring scheme to obtain a cleaner, clearer image. The experiments are verified in Fig. 4, where Z-DCE-DNet shows superior ability to handle extreme lighting conditions.

The experimental results in Table 1 demonstrate the effectiveness of our proposed Z-DCE-DNet method through comprehensive quantitative evaluation. Compared with baseline methods, Z-DCE-DNet achieves the best performance with PSNR of 13.97 dB and SSIM of 0.643, indicating better preservation of image content and structural information. Furthermore, it obtains the lowest NIQE score of 4.14, suggesting superior perceptual quality of the enhanced images. Although Z-DCE shows better BRISQUE performance, the overall superiority of Z-DCE-DNet across multiple metrics validates its effectiveness for low-light image enhancement.

Fig. 4 Comparison of Z-DCE-DNet ablation experiments. Our method makes the image subjectively clearer.

Table 1 Quantitative comparison of image enhancement results in Fig. 4. PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure) are full-reference metrics where higher values indicate better quality: PSNR measures the ratio between the maximum possible signal power and the noise power, while SSIM assesses structural similarity between images. NIQE (Natural Image Quality Evaluator) and BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator) are no-reference metrics where lower values indicate better quality; they evaluate perceptual image quality without requiring reference images, which is particularly useful for real-world enhancement tasks.

Contrast experiments

We compared our method with several advanced methods: two conventional methods (BIMEF34 and LIME35), a CNN-based method (RetinexNet15), and a GAN-based method (EnlightenGAN20).

Three underground images from each scene were randomly selected for verification, with yellow, pink, and blue boxes used to compare the effect of each algorithm. As shown in Fig. 5, the BIMEF model handles color well in low-light enhancement, whereas both Retinex-Net and EnlightenGAN suffer from a severe color bias after enhancement. Across multiple experimental comparisons, our method achieves better visual results for every mine scene.

Table 2 presents a quantitative comparison of the enhancement methods, covering both objective metrics and architectural characteristics. The experimental results reveal notable performance variations across the assessment criteria.

Quantitatively, while Z-DCE achieves optimal PSNR performance (32.45) and LIME demonstrates superior SSIM metrics (0.8870), our proposed Z-DCE-DNet exhibits balanced performance across all evaluation metrics. Particularly noteworthy is its exceptional performance in no-reference metrics, achieving the optimal NIQE score (6.22) and BRISQUE value (44.89), which are crucial for real-world mining applications where reference images are unavailable.

From an architectural perspective, the methods exhibit distinct computational characteristics. Conventional approaches such as BIMEF and LIME employ non-neural methodologies with iterative optimization processes. In contrast, deep learning-based methods demonstrate varying degrees of architectural complexity. EnlightenGAN incorporates a sophisticated GAN framework with U-Net architecture and self-attention mechanisms, necessitating substantial computational resources. RetinexNet implements a multi-stage network architecture with deep reconstruction components, resulting in increased memory requirements.

Our proposed Z-DCE-DNet distinguishes itself through its efficient architectural design, combining DCE-Net and AOD-Net in a streamlined framework. This approach achieves competitive enhancement performance while maintaining minimal computational overhead, as evidenced by the ‘Network Architecture’ and ‘Parameters’ specifications in Table 2. The lightweight design principles employed in Z-DCE-DNet make it particularly advantageous for deployment in practical mining environments where computational efficiency is paramount.

Table 2 Comprehensive comparison of image enhancement methods in terms of quantitative metrics and computational characteristics. The best and second-best results are marked in red and blue, respectively.

Object detection experiment

To verify the downstream effectiveness of enhanced images in underground mine video surveillance, we conducted further target detection: a trained underground safety helmet detection model36 was used to compare detection on original images against images enhanced by our method. The results are shown in Fig. 6. Our method yielded superior visual effects, making the target safety helmets more visible. Moreover, comparing the two sets of experimental charts, the enhanced images exhibited superior detection accuracy under the same safety helmet detector, with notable gains in detector confidence and precise detection.

Building on these results, we further investigated the impact of enhanced images on target detection in underground mines: we trained YOLOv5, YOLOv8, and SSD target detectors on the enhanced images and compared the Precision, Recall, mAP, and F1-score of each model.

Performance metrics

Average Precision (AP), mean Average Precision (mAP), and the harmonic mean (F1) are commonly used standards for evaluating the accuracy of detection models. Their calculation formulas are given in the following equations:

$$AP=\int_{0}^{1}P\,dR$$
(7)
$$mAP=\frac{\sum_{i=1}^{k}AP_{i}}{k}$$
(8)
$$F1=2\times \frac{P\times R}{P+R}$$
(9)

In these formulas, AP is the average precision of a single class, mAP is the mean of all class AP values, and F1 is the harmonic mean of P and R, where P is precision, R is recall, and k is the number of detected categories. The formulas for P and R are:

$$P=\frac{TP}{TP+FP}$$
(10)
$$R=\frac{TP}{N_{GT}}$$
(11)

where \(TP\) is the number of true positive prediction boxes, \(FP\) is the number of false positive prediction boxes, and \(N_{GT}\) is the number of ground-truth boxes.
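Given per-class AP values and TP/FP counts, Eqs. (8)-(11) reduce to a few lines; a minimal sketch (function names are ours):

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0   # Eq. (10)

def recall(tp: int, n_gt: int) -> float:
    return tp / n_gt if n_gt > 0 else 0.0             # Eq. (11)

def f1_score(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0  # Eq. (9)

def mean_ap(per_class_ap):
    # mAP: mean of the per-class average precisions, Eq. (8).
    return sum(per_class_ap) / len(per_class_ap)
```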

Discussion and analysis

Based on the above experiments, we further investigated the influence of enhanced images on target detection models in underground mines. The enhanced images were used to train the YOLO-series and SSD detection models, and the performance of each model is compared in Table 3. The experiments show that object features are more salient in enhanced images, which benefits vision equipment and supports the development of downstream vision software. The training loss curves of the models are compared in Fig. 7.

Fig. 5 Visual comparison of typical image enhancement methods. The yellow dotted box in the third row shows that the BIMEF algorithm introduces some artifacts while enhancing illumination, and the fourth column shows that the LIME algorithm provides only weak illumination enhancement for underground images.

Fig. 6 Comparison of safety helmet detection results before and after underground image enhancement. The blue boxes show that detection confidence improves significantly after enhancement, and the yellow boxes show targets missed in the original image but accurately detected in the enhanced image. The results demonstrate that underground image enhancement strengthens target object features, which benefits the monitoring of underground vision equipment.

Table 3 Performance comparison of target detection models trained on images before and after enhancement. Red indicates the better result.

During training, we often examine the detector's loss curve to tune training parameters toward a more robust model with better generalization. The safety helmet detectors based on enhanced underground images were trained for 100 epochs, and the loss curves of each model are shown in Fig. 7. All models converge, but differences in network complexity lead to different loss values; moreover, across all models, the enhanced images yield better convergence.

Fig. 7 Comparison of the training loss curves of each model. The enhanced images are more conducive to the convergence of the target detection models' loss curves.

Conclusion

This paper draws on prior knowledge of deep convolutional neural networks and investigates Z-DCE-DNet for the restoration of underground images, combining low-light image enhancement with dehazing post-processing. The pixel range of the image is adjusted by improving the network parameters and the non-reference loss function. Extensive experiments show that our method not only achieves good visual results but also strengthens the features of detected objects in subsequent target detection, benefiting the training of target detection models.