Introduction

Hazy weather is a common atmospheric phenomenon that adversely affects both human activities and machine vision systems. When captured in foggy conditions, images generally appear visually hazy and blurred. Such degraded images are unsuitable for applications that require accurate environmental information for safe operation1, such as unmanned aerial drones, autonomous vehicles, and intelligent infrastructure. As a result, single-image haze removal has attracted growing interest in recent years as a critical low-level computer vision task. The atmospheric scattering model provides a simplified approximation of the haze imaging process, written as:

$$\begin{aligned} I(x) = J(x)t(x) + A[1 - t(x)] \end{aligned}$$
(1)

To restore the haze-free image, the atmospheric scattering model can be converted into the following form:

$$\begin{aligned} J(x)= & A + \frac{{I\left( x \right) - A}}{{t\left( x \right) }} \end{aligned}$$
(2)
$$\begin{aligned} t(x)= & {e^{ - \beta d(x)}} \end{aligned}$$
(3)

where I(x) and J(x) denote the haze image and the clear image, respectively; t(x) is the transmission map, \(\beta\) is the atmospheric scattering coefficient, and d(x) is the scene depth; A represents the global atmospheric light value.
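As a concrete illustration, the model in Eqs. (1)-(3) can be simulated and inverted in a few lines of NumPy. The function names and toy values below are ours, for illustration only; when t and A are known exactly, the inversion in Eq. (2) recovers the clear image perfectly:

```python
import numpy as np

def synthesize_haze(J, d, A=0.9, beta=1.0):
    """Apply the atmospheric scattering model (Eqs. 1 and 3):
    I = J*t + A*(1 - t), with t = exp(-beta * d)."""
    t = np.exp(-beta * d)
    return J * t + A * (1.0 - t), t

def recover_clear(I, t, A=0.9, t_min=0.1):
    """Invert the model (Eq. 2): J = A + (I - A) / t.
    t is clamped from below to avoid division blow-up in dense haze."""
    return A + (I - A) / np.maximum(t, t_min)

# Round trip on a toy "scene": recovery is exact when t and A are known.
J = np.random.rand(4, 4)          # clear image, values in [0, 1]
d = np.full((4, 4), 0.5)          # constant scene depth
I, t = synthesize_haze(J, d)
J_hat = recover_clear(I, t)
print(np.allclose(J, J_hat))      # → True
```

In practice neither t(x) nor A is known, which is exactly the estimation problem the following paragraphs discuss.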

The main challenge in atmospheric scattering model-based haze restoration algorithms is how to estimate the transmission map t(x) and atmospheric light A accurately. Existing approaches generally calculate these intermediate parameters via prior knowledge2,3,4. However, such methods are limited because prior knowledge cannot accurately describe the characteristics of all scenes, leading to inaccurate parameter estimation and, in turn, blurring and color distortion in the outputs.

Recent years have witnessed the significant success of convolutional neural networks (CNN) in high-level computer vision applications5,6,7. Building on this prior work, Cai et al.8 introduced a CNN-based dehazing algorithm. Since then, CNN has become the predominant method in the field of image haze removal9,10,11. While efforts have been made to improve performance, current dehazing methods still have some limitations. Firstly, many studies prioritize performance over computational overhead, leading to algorithms that may be too computationally intensive for practical deployment. Secondly, training deeper networks requires more advanced techniques to avoid numerical instability. Thirdly, unlike high-level visual tasks such as image segmentation and object detection that extract semantic features through CNN, image dehazing requires the prediction of fine-grained pixel-level details, making direct use of state-of-the-art (SOTA) CNN architectures suboptimal.

Fig. 1

The dehazing results of the proposed method. (a) Aerial haze images; (b) The enhanced results of RKNet.

Compared with RGB space, YUV space can provide a more accurate description of haze degradation characteristics12. To address the aforementioned issues, we first analyze the haze distribution and explore the feasibility of a dehazing strategy in YUV space. Then, we integrate a Runge-Kutta (RK) method-inspired scheme into the haze removal network design and name the model RKNet. Experimental results on benchmarks demonstrate that RKNet outperforms existing SOTA methods, achieving a superior balance between computing cost and performance. Figure 1 presents the dehazing results of RKNet on the LHID dataset13; compared with the original haze images, the reconstructed images show a significant improvement in visual clarity. To the best of our knowledge, this is the first attempt to directly incorporate an RK method-inspired scheme into the design of an image dehazing network.

The main contributions of this study are summarized as follows:

  • We explore the distribution characteristics of haze in YUV space and propose a joint processing method for Y-channel dehazing and UV-channel enhancement.

  • Inspired by the Runge-Kutta method, we introduce a novel aerial image dehazing network named RKNet, which achieves a remarkable balance between performance and computational cost.

  • Extensive experiments are performed on both synthetic and real-world haze images, demonstrating the superiority of the proposed RKNet over the current SOTAs.

Related work

Image haze removal

Single image haze removal, as a classical computer vision task, has garnered significant interest from researchers. Existing dehazing algorithms can be broadly categorized into two groups: prior knowledge-driven methods and CNN-based methods.

Prior knowledge-driven methods

The prior knowledge-driven methods follow the reverse procedure of the atmospheric scattering model described by Eq. (2). They estimate the necessary parameters of the atmospheric scattering model using prior assumptions derived from the statistical characteristics of haze images. He et al.2 established the dark channel prior (DCP) theory, built on the statistical observation that, within the non-sky area of an outdoor haze-free image, some pixels are close to zero in at least one color channel. Bui et al.3 introduced the color ellipsoid prior (CEP), where color ellipsoids are statistically fitted to haze pixel clusters in RGB space and the transmission values are then calculated from the geometry of these ellipsoids. Yadav et al.14 proposed a hazy image restoration method that estimates the approximate depth map of hazy images to derive scene-specific dark channels and transmissions, and incorporates histogram normalization as post-processing to enhance image quality. Kaur et al.4 proposed the gradient channel prior (GCP), which estimates the transmission map via the image gradient. Liu et al.15 presented a straightforward method for estimating the transmission map based on the rank-one prior (ROP). Following the proposed ROP theory, they16 took both the foreground and the background into consideration, developing ROP\(^+\) to further improve performance.

Although these algorithms can improve the clarity of haze images to some extent, there are still some issues worth considering. Prior knowledge-driven methods rely on fixed assumptions grounded in mathematical statistics, which may not be effective in some real-world scenarios. In addition, inaccurate parameter estimation will lead to artifacts, halos, and color distortion in the restored results.

CNN-based methods

Early CNN-based dehazing algorithms still rely on the atmospheric scattering model17,18,19. These methods employ CNN to estimate the transmission map and atmospheric light value from haze images, which are then used to invert the model and obtain clear restored images. For instance, Cai et al.8 proposed a dehazing network embedded with bilateral rectified linear units to estimate the medium transmission map. Ren et al.20 developed a coarse-to-fine multiscale deep neural network to predict the refined transmission map. Su et al.21 proposed a fusion network integrating a physical model and image translation for image dehazing, incorporating a deep-network estimate of the transmission map to guide the translation. Zhang et al.22 introduced a pyramid densely connected transmission map estimation network based on generative adversarial networks (GAN). However, the transmission map is susceptible to noise interference, which weakens the robustness of these algorithms.

To address this issue, researchers have started to focus on end-to-end image dehazing methods that directly establish the mapping between haze images and clear images without relying on the physical model. Qin et al.23 introduced a feature attention module (FA) that combines channel attention with a pixel attention mechanism, embedding it into the dehazing network. Wu et al.9 proposed an autoencoder-like dehazing network inspired by contrastive learning. Wang et al.24 introduced an unsupervised contrastive learning paradigm called UCL-Dehaze, which leverages unpaired real-world hazy and clean images through adversarial training and a novel pixel-wise self-contrastive perceptual loss. Bai et al.11 extracted feature information from the haze image itself to guide network training. Su et al.25 proposed an end-to-end dehazing network with a parameter-shared architecture that trains on synthetic and real haze images simultaneously, using a multi-prior pseudo clean image and a physical-model-guided domain transfer mechanism to reduce domain gaps. Although CNN-based haze removal methods have achieved notable performance, SOTA designs heavily rely on empirical structures involving numerous attempts and tricks. Image enhancement, as a foundational low-level vision task, hinges on recovering fine-grained details such as edges, contours, and textures obscured by haze; over-reliance on abstract high-level features undermines sharpness and clarity, making the preservation of shallow low-level features critical for visually faithful results26,27. We thus consider that treating dehazing as a low-level task would be a beneficial approach.

ODE-inspired network design

With the advancement of CNN, scholars have started to explore the theoretical properties of CNN from the perspective of ordinary differential equations (ODE). Weinan et al.28 were the first to observe the connection between residual network (ResNet)29 and ODE. They demonstrated that deep neural networks can be treated as discrete dynamical systems, and established the similarities between ResNet and the discretization of ODE. Chang et al.30 went beyond mere explanation and utilized numerical ODEs to construct reversible neural networks while conducting stability analysis. Inspired by the linear multi-step method for solving ODEs, Lu et al.31 proposed a linear multi-step architecture (LM-architecture). They integrated the LM-architecture into ResNet and ResNeXt32, and observed significantly improved accuracy compared to the original network on public benchmarks.

Fig. 2

Comparisons of clear image and RS haze image in RGB and Y, U, V spaces. (a) Clear image in RGB space; (b) Y component of clear image; (c) U component of clear image; (d) V component of clear image; (e) Haze image in RGB space; (f) Y component of haze image; (g) U component of haze image; (h) V component of haze image.

In the field of low-level image translation, recent studies33,34,35,36 have established a connection between ResNet and the explicit/implicit Euler approximations of ODEs. However, research on ODE-based dehazing algorithms is still in its infancy, and numerous issues warrant further exploration. Given that image haze removal is a low-level computer vision task in which haze and haze-free images share highly similar structures, intuitive designs are often effective. Building on the inspiration from the aforementioned work, we introduce a simple yet effective image dehazing algorithm.

The proposed method

The progress in single image dehazing can be attributed to advancements in deep learning, which provides a powerful framework for the nonlinear computations in haze removal algorithms. In general, CNN-based algorithms consist of multiple cascaded layers and can effectively transform a complex information flow after iterative training. In a dynamical system, there exists a fixed rule that describes the evolution of input states in a geometric space over time. Similarly, the adaptively selected layers in a CNN can be treated as time nodes in a dynamical system, with the final output state constrained by the ground truth (GT). Previous work28 has characterized this as a control problem and conducted a simplified analysis of its low-dimensional version, concluding that a solution can be obtained using numerical methods if the problem is sufficiently smooth. Encouraged by this, we apply the Runge-Kutta (RK) method to our haze removal network design from the perspective of dynamical systems. In this section, we first analyze the distribution of haze in YUV space and then discuss the details of RKNet.

YUV space analysis and the improved strategy

In accordance with the ITU-R BT.601-7 standard established by the Radiocommunication Sector of the International Telecommunication Union (ITU-R), the conversion from RGB to YUV in full range37 is given as follows:

$$\begin{aligned} Y= & 0.299R+0.587G+0.114B \end{aligned}$$
(4)
$$\begin{aligned} U= & \frac{B-Y}{1.772}=-0.169R-0.331G+0.500B \end{aligned}$$
(5)
$$\begin{aligned} V= & \frac{R-Y}{1.402}=0.500R-0.419G-0.081B \end{aligned}$$
(6)
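The conversion in Eqs. (4)-(6) can be written compactly in matrix form. The NumPy sketch below (our own illustration, using the rounded coefficients above) also verifies numerically that the transform is invertible:

```python
import numpy as np

# BT.601 full-range RGB -> YUV matrix from Eqs. (4)-(6);
# rows correspond to Y, U, V and columns to R, G, B.
M = np.array([[ 0.299,  0.587,  0.114],
              [-0.169, -0.331,  0.500],
              [ 0.500, -0.419, -0.081]])

def rgb_to_yuv(rgb):
    """rgb: (..., 3) array with channels in [0, 1]."""
    return rgb @ M.T

def yuv_to_rgb(yuv):
    """Inverse transform; the matrix is invertible, so the round
    trip is lossless up to floating-point precision."""
    return yuv @ np.linalg.inv(M).T

rgb = np.random.rand(8, 8, 3)
print(np.allclose(yuv_to_rgb(rgb_to_yuv(rgb)), rgb))   # → True
```

This reversibility is what later allows the network to operate entirely in YUV space without losing input information.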

Recent research has shown that YUV space provides a more effective basis for image dehazing12. Specifically, separating the luminance (Y) and chrominance (UV) components reveals that the energy of the haze is concentrated in the Y channel, while the UV channels are less affected. Figure 2 presents a comparison of metrics in RGB and YUV space for RS haze images from the SateHaze1k dataset. In the upper and lower rows, from left to right, are the RGB views of the clear and haze images, respectively, along with the corresponding decomposed Y, U, and V components. It can be observed that the RGB haze image suffers from low contrast and local texture blur. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) between the clear and degraded images are 13.31 dB and 71.92\(\%\), respectively. Converting the images to YUV space and computing objective quality metrics for each component reveals that the pixel and structural differences caused by haze are predominantly manifested in the Y component, whose PSNR and SSIM are 14.67 dB and 75.02\(\%\), respectively. Visual comparison further shows that the Y channel of the haze image is brighter, presenting an overall misty appearance, while the UV color channels are less affected, as evidenced by their higher objective metric values.

Fig. 3

Demonstration of color quantification differences. (a) Clear image; (b) Haze image; (c) Concatenated image; (d) Color quantization of clear image; (e) Color cluster of clear image; (f) Color quantization of haze image; (g) Color cluster of haze image; (h) Color quantization of concatenated image; (i) Color cluster of concatenated image.

On the basis of the preceding analysis, the method38 only removes the haze in the Y channel and retains the UV channels of the original image. Such a Y-channel-based strategy can accurately extract the haze distribution via variations in the luminance component, achieving superior dehazing performance. However, existing Y-channel dehazing algorithms are not optimal. In detail, enhancing only the Y channel while directly inheriting the UV channel information from the original image may disrupt the synergy between the Y, U, and V channels, which weakens the chromaticity features and compresses the image's contrast.

When we concatenate the Y channel of the clear image in Fig. 3 (a) with the UV channels of the haze image in Fig. 3 (b) and convert the result to an RGB image, we obtain the low-colorfulness haze-free image shown in Fig. 3 (c). It is evident that Fig. 3 (a) and Fig. 3 (c) differ significantly. To further visualize these differences, we perform color quantization on the images and apply K-means clustering to the quantized results to extract the 15 dominant color points of each image. Compared to Fig. 3 (d), Fig. 3 (f) and Fig. 3 (h) have a more concentrated quantization distribution due to lower color richness and contrast. To further discuss the impacts of haze, we conduct a mathematical analysis of the atmospheric scattering model to elucidate the degradation characteristics observed in the Y and UV channels. We consolidate Eqs. (4)-(6) into matrix form:

$$\begin{aligned} \begin{bmatrix}Y\\ U\\ V\end{bmatrix}=\begin{bmatrix}0.299& 0.587& 0.114\\ -0.169& -0.331& 0.500\\ 0.500& -0.419& -0.081\end{bmatrix}\begin{bmatrix}R\\ G\\ B\end{bmatrix} \end{aligned}$$
(7)

The expression of Eq. (1) can be reformulated as:

$$\begin{aligned} I_{c}=J_{c}t+A_{c}(1-t),c\in \{R,G,B\} \end{aligned}$$
(8)

As the atmospheric scattering coefficient \(\beta\) exhibits minimal variation across channels, t (i.e., \(e^{-{\beta }d}\)) can be treated as identical for all three color channels at each pixel. In the case of a color-balanced haze image, where the principle \(A_R\approx A_G\approx A_B\) is consistently observed39, substituting Eq. (7) into Eq. (8) and rearranging yields the mapping relationship:

$$\begin{aligned} \begin{aligned}I_Y&=(0.299J_R+0.587J_G+0.114J_B)t+A(1-t)\\ &=J_Yt+A(1-t),\\I_U&=(-0.169J_R-0.331J_G+0.500J_B)t=J_Ut,\\I_V&=(0.500J_R-0.419J_G-0.081J_B)t=J_Vt.\end{aligned}\end{aligned}$$
(9)

Eq. (9) reveals that the atmospheric light A is exclusively associated with the luminance component Y, while the chrominance components U and V undergo linear attenuation by the transmission map t and have no direct dependence on A. This result arises naturally from the mathematical derivation, rooted in the color space transformation and the atmospheric scattering model, rather than being an a priori assumption. It also aligns with the physical nature of haze: atmospheric light primarily contributes to visual haze characteristics such as overall brightness obscuration and lacks significant chromaticity2, consistent with the observed behavior of the chrominance components in hazy scenes visualized in Fig. 2. We then depict the conversion process between the haze image \(I_{YUV}\) and its corresponding clear image \(J_{YUV}\) as follows:

$$\begin{aligned} \begin{aligned} I_{_{YUV}}=[I_{_Y},I_{_U},I_{_V}]=[J_{_Y}t+A(1-t),J_{_U}t,J_{_V}t]=J_{_{YUV}}t+[A(1-t),0,0] \end{aligned} \end{aligned}$$
(10)

As discussed, haze-induced alterations in luminance are global and driven by atmospheric light, whereas chrominance degradation stems from transmission-dependent attenuation. This analysis underscores that an effective YUV-based dehazing algorithm must address haze interference in the Y channel while accounting for the collaborative dynamics across the three channels.
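The derivation in Eqs. (7)-(10) can be checked numerically. The toy sketch below (our own illustration, assuming a gray atmospheric light and a constant transmission, per the assumptions above) confirms that only the Y component picks up the airlight term while U and V are purely attenuated:

```python
import numpy as np

# BT.601 full-range matrix from Eq. (7)
M = np.array([[ 0.299,  0.587,  0.114],
              [-0.169, -0.331,  0.500],
              [ 0.500, -0.419, -0.081]])

rng = np.random.default_rng(0)
J_rgb = rng.random(3)        # one clear pixel
A, t = 0.9, 0.6              # gray atmospheric light, constant transmission

I_rgb = J_rgb * t + A * (1 - t)          # haze applied per RGB channel (Eq. 8)
I_yuv, J_yuv = M @ I_rgb, M @ J_rgb      # convert both pixels to YUV (Eq. 7)

# Eq. (9): only Y picks up the airlight term; U and V are scaled by t,
# because the U and V rows of M sum to zero while the Y row sums to one.
expected = np.array([J_yuv[0] * t + A * (1 - t),
                     J_yuv[1] * t,
                     J_yuv[2] * t])
print(np.allclose(I_yuv, expected))      # → True
```

The check works because the airlight offset A(1-t) is identical in R, G, and B, so it vanishes under any zero-sum row of the conversion matrix.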

RK method inspired dehazing network

As illustrated in Fig. 4, the proposed RKNet consists of two parts: haze removal and chrominance enhancement. The haze removal sub-net utilizes a pyramid structure, performing 2x upsampling and 2x downsampling on the original luminance component \(I_Y\in \mathbb {R}^{H\times W\times 1}\), which aims to capture both local and global haze distribution features in the Y domain. We embed an RK3 block in each scale branch to model the mapping between state nodes in the latent space, after which a pixel attention module23 adaptively allocates weights to the output states of the RK3 block. Finally, the feature information flow from the three branches is integrated via element-wise addition, yielding the haze-free map \(J^{\prime }_{Y}\in \mathbb {R}^{H\times W\times 1}\).

Fig. 4

The architecture of the proposed RKNet.

For the attenuated chrominance component \(I_{UV}\in \mathbb {R}^{H\times W\times 2}\), directly splicing it with \(J^{\prime }_{Y}\) means the channel features may not be well fused; the converted RGB image then suffers from color degradation and visually tends towards greyscale. To address this issue, we introduce the chrominance enhancement (CE) block to balance the chromaticity component. Specifically, we embed an RK3 block and a channel attention module23 into the CE block to dynamically assign weight ratios for the associated representations. The output of the CE block is denoted \(J^{\prime }_{UV}\in \mathbb {R}^{H\times W\times 2}\); we finally splice it with \(J^{\prime }_{Y}\) and convert the spliced result back to RGB space. Overall, the proposed RKNet has a notably concise structure without redundant branches. Such a single-channel haze removal strategy greatly reduces the computational complexity of the algorithm and presents a promising direction for future lightweight dehazing research.

Converting RK method to component block

The explicit RK method is a common approach in the numerical solution of ODEs; an arbitrary s-stage scheme can be described mathematically as:

$$\begin{aligned} y_{k+1}=y_{k}+\sum _{i=1}^{s}c_{i}G_{i} \end{aligned}$$
(11)
$$\begin{aligned} \left\{ \begin{matrix} {{G}_{1}}=hf\left( {{x}_{k}},{{y}_{k}} \right) \\ {{G}_{i}}=hf\left( {{x}_{k}}+{{\lambda }_{i}}h,{{y}_{k}}+\sum \limits _{j=1}^{i-1}{{{\beta }_{ij}}{{G}_{j}}} \right) \\ \sum \limits _{i=1}^{s}{{{c}_{i}}=1,\quad \sum \limits _{j=1}^{i-1}{{{\beta }_{ij}}={{\lambda }_{i}}}} \\ \end{matrix} \right. \end{aligned}$$
(12)

where G is the adaptively selected nonlinear representation unit; h is the step size; c, \(\beta\), and \(\lambda\) are constants that can be determined via the Taylor-expansion formula.

In the Runge-Kutta framework, higher-order methods generally reduce local truncation errors by incorporating more intermediate computations, but at the expense of increased computational load40,41. The 3rd-order Runge-Kutta (RK3) method emerges as an effective compromise: it offers substantially lower truncation errors than lower-order alternatives while avoiding the excessive computational demands of higher-order schemes. This balance makes RK3 particularly well-suited for applications where both precision and computational feasibility are critical. Based on Eq. (11), the RK3 method can be formulated as:

$$\begin{aligned} {{y}_{k+1}}={{y}_{k}}+\frac{1}{6}{{G}_{1}}+\frac{2}{3}{{G}_{2}}+\frac{1}{6}{{G}_{3}} \end{aligned}$$
(13)
$$\begin{aligned} \left\{ \begin{matrix} {{G}_{1}}=hf({{x}_{k}},{{y}_{k}}) \\ {{G}_{2}}=hf({{x}_{k}}+\frac{h}{2},{{y}_{k}}+\frac{1}{2}{{G}_{1}}) \\ {{G}_{3}}=hf({{x}_{k}}+h,{{y}_{k}}-{{G}_{1}}+2{{G}_{2}}) \\ \end{matrix} \right. \end{aligned}$$
(14)

We transform Eqs. (13) and (14) into an intuitive network module. As depicted in Fig. 5, we denote the input state at the current moment as \(y_k\) and use \(y_{k+1}\) to represent the output state after a series of numerical computations. The specific structure of G is further discussed in section "Discussion on the structure of G".
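For reference, the update in Eqs. (13)-(14) is the standard explicit 3rd-order Runge-Kutta step. The sketch below implements it for a scalar ODE; in the RK3 block, f is replaced by the learned nonlinear unit G rather than a hand-written function:

```python
import math

def rk3_step(f, x, y, h):
    """One explicit 3rd-order Runge-Kutta step, Eqs. (13)-(14):
    y_{k+1} = y_k + (1/6)G1 + (2/3)G2 + (1/6)G3."""
    G1 = h * f(x, y)
    G2 = h * f(x + h / 2, y + G1 / 2)
    G3 = h * f(x + h, y - G1 + 2 * G2)
    return y + (G1 + 4 * G2 + G3) / 6

# Sanity check: integrate y' = y from y(0) = 1 to x = 1; exact answer is e.
x, y, h = 0.0, 1.0, 0.01
for _ in range(100):
    y = rk3_step(lambda x, y: y, x, y, h)
    x += h
print(abs(y - math.e))   # tiny residual: the scheme has 3rd-order accuracy
```

Stacking such steps, with each G realized as a small convolutional unit, is what turns the numerical scheme into the network block of Fig. 5.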

Fig. 5

The framework of the RK3 block based on the 3rd-order explicit Runge-Kutta method. G1, G2, and G3 are instances of the pre-defined nonlinear unit G.

Overall loss function

To optimize RKNet, we employ a combination of the \(\mathcal {L}_{1}\) loss and the contrastive regularization loss9 as the objective function. The \(\mathcal {L}_{1}\) loss is commonly employed in low-level vision tasks; given N training sample pairs \(\{{I}_i,{J}_i\}_{i=1}^N\), we describe it as:

$$\begin{aligned} { \mathcal {L}_1}=\dfrac{1}{N}\sum _{i=1}^N\Vert J_{i}-Net(I_{i})\Vert \end{aligned}$$
(15)

Furthermore, we incorporate the contrastive regularization loss at the perceptual level to enhance the network’s capability in extracting latent representations. Mathematically, it can be expressed as:

$$\begin{aligned} { \mathcal {L}_{cr}}=\sum \limits _{i=1}^{M}{{{\alpha }_{i}}F\left\{ \Phi _i \left( I \right) ,\Phi _i \left( J \right) ,\Phi _i \left[ Net(I) \right] \right\} } \end{aligned}$$
(16)
$$\begin{aligned} F=\frac{D\left\{ \Phi (J),\Phi \left[ Net(I) \right] \right\} }{D\left\{ \Phi (I),\Phi \left[ Net(I) \right] \right\} } \end{aligned}$$
(17)

here \(\Phi _i\) denotes the features extracted from the i-th hidden layer of a fixed pre-trained VGG19 model; M is the number of hidden layers used; \(\alpha _i\) are the trade-off weights; D(x, y) denotes the \(\mathcal {L}_{1}\) distance between x and y.

Based on the above considerations, the total loss function of RKNet is formulated as the following:

$$\begin{aligned} \mathcal {L}_T={\lambda _1}{\mathcal {L}_{1}}+{\lambda _2}{\mathcal {L}_{cr}} \end{aligned}$$
(18)

here \(\lambda _1\), \(\lambda _2\) are the hyperparameters for balancing the pixel loss and contrastive regularization loss.
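A minimal sketch of the objective in Eqs. (15)-(18), with the frozen VGG19 extractor replaced by toy stand-in functions (the helper names and toy values are ours, for illustration only):

```python
import numpy as np

def l1(x, y):
    # L1 distance D(x, y) used in Eqs. (15) and (17)
    return np.mean(np.abs(x - y))

def contrastive_reg(I, J, out, phis, alphas):
    """Eqs. (16)-(17): the restored output is pulled toward the clear
    image J (numerator) and pushed away from the hazy input I
    (denominator) in each feature space phi_i."""
    return sum(a * l1(phi(J), phi(out)) / l1(phi(I), phi(out))
               for a, phi in zip(alphas, phis))

def total_loss(I, J, out, phis, alphas, lam1=0.7, lam2=0.3):
    # Eq. (18); the default lambda weights match those used in the paper
    return lam1 * l1(J, out) + lam2 * contrastive_reg(I, J, out, phis, alphas)

# Toy example: identity "features" stand in for frozen VGG19 layers.
I, J = np.zeros((4, 4)), np.ones((4, 4))
out = np.full((4, 4), 0.9)   # a fairly good restoration
print(total_loss(I, J, out, phis=[lambda x: x], alphas=[1.0]))
```

Note that minimizing the ratio in Eq. (17) simultaneously decreases the distance to the clear image and increases the distance to the hazy input, which is the intended contrastive behavior.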

Experiments

Implementation details

In the training stage, we adopt 5000 pairs of synthetic haze images from RESIDE42 as the training data. All network inputs are cropped to a size of \(256\times 256\times 3\) and their pixel values are normalized. Training is performed using PyTorch 1.9.0 on an NVIDIA GeForce RTX 2070 GPU for a total of 100 epochs. To optimize RKNet, we set \(\lambda _1\) and \(\lambda _2\) to 0.7 and 0.3, respectively. The trade-off weights \(\alpha _i\) are determined using the approach proposed in9. We utilize the Adam optimizer with an initial learning rate of \(1\times 10^{-3}\) to update the network parameters throughout training. It is worth noting that the conversion between RGB and YUV spaces is fully reversible, ensuring that the valuable features of the input are preserved.

Fig. 6

The visual comparison on aerial haze images43. CEEF44, BCDP45 and FADE46 are the traditional dehazing methods; PMD-Net47, T-Net48 and IHRNet49 are the learning-based dehazing methods.

Comparison with SOTA algorithms

Evaluating on synthetic datasets

We compare the proposed RKNet with several SOTA haze removal algorithms, including CEEF44, BCDP45, FADE46, PMD-Net47, T-Net48, and IHRNet49. The experiments are performed on two synthetic hazy benchmarks, RICE43 and RESIDE42.

Figure 6 provides the visual comparison on aerial haze images. It is evident that previous algorithms have primarily focused on general scenes and may not be well-suited for aerial haze images. CEEF44 is capable of eliminating the haze effect; however, excessive enhancement leads to a lack of visual hierarchy in the results. The contrast of BCDP's results45 is suppressed, and the outputs of FADE46 lose many color features. PMD-Net47 and IHRNet49 exhibit varying degrees of residual haze in a visually grey-masked manner. T-Net48 performs poorly on aerial haze images, displaying evident color shifts in its outputs. The visual comparison on general daytime haze images is presented in Fig. 7. It is evident that CEEF44 exhibits underexposure, BCDP45 exposes color distortion in the sky area, and the results of FADE46 appear visually dimmed. Among the learning-based methods, PMD-Net47, T-Net48, and IHRNet49 fail to effectively eliminate the haze effect. Compared to these algorithms, RKNet effectively eliminates haze interference while preserving the original brightness and color features of the images, which is credited to the reasonable design of the improved dehazing strategy in YUV space.

Fig. 7

The visual comparison on daytime haze images42. Please zoom in for a better illustration.

Table 1 The comparison of the metrics on aerial hazy dataset.
Table 2 The comparison of the metrics on daytime hazy dataset.

To evaluate the performance of the algorithms, three evaluation metrics are utilized: PSNR, SSIM, and CIEDE2000, which measure the pixel error, structure error, and color error of the images, respectively. Higher PSNR and SSIM values and lower CIEDE2000 values indicate better algorithm performance. The metrics on RICE43 are shown in Table 1, and the objective results on RESIDE42 are presented in Table 2. For comparison purposes, the best metrics are marked in bold and the second-best are underlined. Remote sensing images generally exhibit complex and diverse texture patterns, which makes aerial image haze removal more challenging. Comparing Table 1 and Table 2, we notice that the overall performance of the algorithms on RICE is relatively lower. For T-Net48, the metrics are not satisfactory due to its limited generalization ability. In contrast, RKNet demonstrates better adaptability to various scenarios and achieves optimal performance in the quantitative evaluation.
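Of these metrics, PSNR is straightforward to compute directly (SSIM and CIEDE2000 are more involved and are typically taken from standard libraries such as scikit-image); a minimal sketch:

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means the restored
    image is pixel-wise closer to the ground truth."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(data_range ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 0.1          # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(ref, noisy), 2))   # → 20.0
```

The `data_range` argument must match the pixel scale (1.0 for normalized images, 255 for 8-bit images), otherwise reported PSNR values are not comparable across papers.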

Evaluating on real-world haze images

Despite being trained on synthetic datasets, RKNet demonstrates excellent generalization on real-world haze images. Figure 8 showcases the visual comparison of the algorithms. Although CEEF44, BCDP45, and FADE46 can partially remove haze, their over-enhancement tends to distort the color features of the outputs. In the bottom row of Fig. 8, T-Net48 and IHRNet49 expose aberrant color blocks in the sky region, along with residual haze. In comparison, RKNet not only comprehensively eliminates haze but also generates visually natural results with fewer artifacts.

Fig. 8

The visual comparison on real-world haze images42.

Table 3 The comparison of model complexity.

Time complexity analysis

In this section, we analyze the time complexity of the methods. Table 3 presents the multiply-accumulate operations (MACs) and the number of parameters of the algorithms. Among the compared models, RKNet has the lowest MACs and the fewest parameters. To further evaluate computational efficiency, we then perform runtime experiments on a PC with an Intel(R) Core(TM) i5-9400 CPU @ 2.90 GHz, 16 GB RAM, and an NVIDIA GeForce RTX 2070 GPU. As depicted in Table 4, we calculate the average runtime of the methods in handling 540p, 720p, and 1080p haze images. One can see that, compared with the prior algorithms, RKNet demonstrates higher computational efficiency.

Table 4 Comparison of average runtime for different image sizes (in seconds).

Discussion on the structure of G

As the adaptively selected nonlinear representation unit, G leaves considerable space for structural exploration. In this section, we explore the performance of RKNet by experimenting with different designs of G.

Figure 9 illustrates the three structural schemes we consider for the ablation study. Each scheme consists of two convolutional layers, and each version is defined by adjusting the positions of the convolution and activation function. To determine the optimal strategy, we evaluate the candidates and compute the objective indicators on RESIDE42. As shown in Table 5, RKNet(v3) achieves the best metrics while maintaining the same computational complexity and number of parameters. Therefore, we adopt G-v3 as the adaptively selected nonlinear representation unit in RKNet.

Fig. 9

The candidate structures of G, where ’Conv’ denotes the convolutional layer and ’PReLU’ represents the activation function.

Table 5 Metrics comparison of the candidate structures for G.
Fig. 10

The visual comparison of the dehazing strategies. (a) Haze images. (b) Only processing Y-channel. (c) Co-processing YUV channel. (d) Separate processing Y-channel and UV-channel.

Discussion on the dehazing strategy in YUV space

To explore the optimal dehazing strategy in YUV space, we conduct an ablation experiment in this section. Specifically, we focus on the following candidate schemes: 1) enhancing the Y-channel using the haze removal network and inheriting the UV-channels from the inputs; 2) training the haze removal network to directly reconstruct haze-free images by taking the Y, U, and V channels as inputs; 3) enhancing the Y-channel using the haze removal network, refining the UV-channels via the CE block, and then splicing the channels.

Figure 10 presents the visual comparison of the candidate strategies. In Fig. 10 (b), when we exclusively remove the haze in the Y-channel while directly inheriting the UV-channels from the original images, the results appear dim and colorless. Figure 10 (c) indicates that when the Y, U, and V channels are processed cooperatively as inputs to the haze removal network, the brightness features are compromised, leading to visually darker outputs. Figure 10 (d) demonstrates that independently enhancing the Y and UV channels generates visually more natural results with richer color information. We employ BIQI, NIQE, and SSEQ as objective metrics to quantitatively analyze the average performance of the three schemes on 200 real haze images, where lower metric values indicate better performance. As shown in Table 6, the scheme that performs dehazing on the Y channel and collaborative enhancement on the UV channels achieves the best quantitative results.

Table 6 Metrics comparison of the candidate schemes.
Fig. 11

The structure of the ODE-inspired components. (a) Residual block. (b) RK2 block.

Comparative analysis of ODE-inspired component

To investigate the influence of different ODE-inspired architectures on network outputs, we conduct an ablation study by individually incorporating two additional candidate blocks into the backbone model. As shown in Fig. 11 (a), the Residual block is a classic residual structure interpretable as a discrete (forward-Euler) approximation of an ODE; the RK2 block, presented in Fig. 11 (b), is constructed based on the second-order Runge-Kutta method to model continuous feature evolution via iterative updates.
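To make the distinction concrete, the sketch below writes the three update rules for a scalar ODE dy/dt = f(y): a residual block corresponds to a forward-Euler step, while the RK2 and RK3 blocks correspond to second- and third-order Runge-Kutta steps. Heun's method and Kutta's classic third-order scheme are used here as stand-ins; the paper's exact coefficients may differ.

```python
import math

def euler_step(f, y, h):   # residual-block analogue: y + f(y)
    return y + h * f(y)

def rk2_step(f, y, h):     # Heun's method (2nd-order Runge-Kutta)
    k1 = f(y)
    k2 = f(y + h * k1)
    return y + h * (k1 + k2) / 2

def rk3_step(f, y, h):     # Kutta's classic 3rd-order method
    k1 = f(y)
    k2 = f(y + h * k1 / 2)
    k3 = f(y - h * k1 + 2 * h * k2)
    return y + h * (k1 + 4 * k2 + k3) / 6

def integrate(step, n=100):
    """Integrate dy/dt = y from y(0) = 1 to t = 1; exact answer is e."""
    y, h = 1.0, 1.0 / n
    for _ in range(n):
        y = step(lambda v: v, y, h)
    return y

for name, step in [("Euler", euler_step), ("RK2", rk2_step), ("RK3", rk3_step)]:
    print(f"{name}: error = {abs(integrate(step) - math.e):.2e}")
```

With the same step count, the error shrinks by orders of magnitude from Euler to RK2 to RK3, which mirrors why a higher-order block can extract more accurate feature updates at comparable depth.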

Visual comparisons of these ODE-inspired components on real haze images are provided in Fig. 12. Specifically, Fig. 12 (b) shows that the network with embedded Residual blocks fails to effectively mitigate the effects of haze, resulting in dim and blurry outputs. In Fig. 12 (c), although the network with embedded RK2 blocks reduces haze to some extent, the corresponding outputs exhibit noticeable artifacts, particularly in sky regions. Figure 12 (d) demonstrates that the proposed RK3 module not only effectively removes haze but also balances the synergistic relationship between brightness and chromaticity, yielding visually more natural reconstructed results.

Fig. 12

Visual comparison of the components with different ODE-inspired structures on real-world haze images. (a) Haze images. (b) Results of the backbone model with embedded Residual blocks. (c) Results of the backbone model with embedded RK2 blocks. (d) Results of the backbone model with embedded RK3 blocks.

Extension applications

Application in object detection

In hazy weather, the low-quality images captured by imaging facilities seriously affect the accuracy of high-level vision algorithms. Here, we present an experimental analysis to demonstrate the effectiveness of incorporating RKNet in improving the accuracy of an object detection algorithm under hazy conditions. For this experiment, we utilize the popular object detection algorithm YOLOv750 as the baseline for our evaluation.

Fig. 13

Effect of RKNet on the performance of YOLOv750. (a) Detection results of the haze images. (b) Detection results of the clear images reconstructed by RKNet.

Figure 13 (a) shows the object detection results in hazy scenes, while Fig. 13 (b) displays the detection results after processing by RKNet. The number of detected objects increases noticeably, indicating that haze removal boosts visibility and makes previously undetectable objects discernible. The results are particularly significant in high-density haze regions, where objects were previously completely obscured. Furthermore, the confidence scores of the detected objects are generally higher after dehazing, reflecting the algorithm's increased confidence in the detected object categories. This enhancement can be attributed to RKNet effectively eliminating the visual artifacts caused by haze, resulting in more accurate object detection outcomes. The experimental results underscore the potential of RKNet in addressing the challenges posed by hazy weather conditions in real-world object detection applications.

Fig. 14

The reconstructed examples of sandstorm images39 using the fusion strategy, where (a) are the original sandstorm images, (b) are the color balanced images and (c) are the clear images enhanced by RKNet.

Sandstorm image reconstruction

The fusion strategy51 has proven feasible for the sandstorm image enhancement task. Following that line, we apply a simple color equalization pre-processing step51 and explore the feasibility of utilizing RKNet as the dust removal module in the fusion strategy-based sand-dust image enhancement method. Given the sandstorm images shown in Fig. 14 (a), we can see from Fig. 14 (b) that the color equalization method effectively balances the hue of the images and removes the color cast of the sand, but the dust itself remains. In Fig. 14 (c), RKNet further improves the clarity of the images by eliminating the dust interference while retaining local details.
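One simple way to realize such a color equalization step is a gray-world balance, sketched below. This is an illustrative stand-in under the gray-world assumption, not necessarily the exact pre-processing of the fusion strategy51.

```python
# Gray-world color balance: rescale each channel so its mean matches
# the global mean, removing a uniform color cast (e.g., a yellow
# sandstorm tint). Pixels are (r, g, b) tuples in [0, 1].

def gray_world_balance(pixels):
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    target = sum(means) / 3
    gains = [target / m if m > 0 else 1.0 for m in means]
    return [tuple(min(1.0, p[c] * gains[c]) for c in range(3))
            for p in pixels]

# A tiny "sand-tinted" image: red/green dominate, blue is suppressed.
cast = [(0.8, 0.6, 0.3), (0.6, 0.4, 0.2)]
balanced = gray_world_balance(cast)
print(balanced)
```

After balancing, the per-channel means coincide, which neutralizes the cast; a dedicated dehazing/dust-removal network is still needed afterwards to restore contrast and detail.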

Underwater image enhancement

Although the proposed algorithm is designed for haze image reconstruction, it has potential applications in complex underwater environments. As shown in Fig. 15, we provide several enhanced examples of underwater images, where the top row shows the original underwater images and the bottom row shows the results enhanced by the proposed method. It is evident that RKNet successfully eliminates the haziness in the images and restores the color characteristics of the scene. From an aesthetic perspective, RKNet effectively improves the visual quality of the degraded underwater images.

Fig. 15

The enhanced examples of underwater images52 using RKNet, where (a) are the underwater images and (b) are the clear images reconstructed by RKNet.

Conclusion

In this paper, we first analyzed the weaknesses of current YUV-based dehazing methods and introduced an improved haze removal strategy in YUV space. Following the optimized strategy, we proposed a novel image dehazing network called RKNet, which consists of two parts: Y-domain haze removal and UV-domain chrominance enhancement. Based on the third-order Runge-Kutta method, we designed an RK3 block and integrated it into RKNet to reduce parameter redundancy and computational cost. Experimental results on synthetic and real-world benchmarks demonstrate that RKNet outperforms SOTA algorithms both qualitatively and quantitatively. Moreover, the proposed algorithm has the potential to improve the performance of object detection models and also generalizes well to other low-level image processing tasks, such as sandstorm image reconstruction and underwater image enhancement.