Abstract
Underwater imagery frequently exhibits low clarity and significant color distortion as a result of the inherent conditions of the marine environment and variations in illumination. Such degradation in image quality fundamentally undermines marine ecological monitoring and underwater target detection. To address this issue, we present a Mamba-Convolution network for Underwater Image Enhancement (MC-UIE). Concretely, we first use a standard convolution layer with a 3\(\times\)3 kernel to obtain initial image feature maps. Then, we develop an iterable Mamba-Convolution Hybrid Block (M-C HB) to enhance the global and local dependencies of image feature maps. The core of the M-C HB is the 2D Selective Scan (SS2D) and the Feature Attention Module (FAM), which together learn the global and local dependencies of images more efficiently. After that, a Cross Fusion Mamba Block (CFMB) is designed to fuse image feature maps of different levels. Finally, extensive qualitative and quantitative experiments on mainstream datasets demonstrate that the proposed method significantly outperforms existing methods in color, illumination, and detail restoration. Our code and results are available at: https://github.com/WYJGR/MC-Net/.
Introduction
As human exploration and resource development of the oceans continue to expand, underwater robots have become a popular research field1,2. However, images obtained by underwater robots show severe degradation, such as color casts, blurred structures and details, and poor visibility, and this complex wavelength-dependent degradation seriously hinders the application of underwater robots in ocean exploration3,4.
In the underwater environment, the light captured by an imaging camera is primarily composed of three components: the direct component (light reflected from the target without scattering), the forward-scattering component (light reflected from the target and scattered at a small angle), and the backward-scattering component (light reflected from floating particles). An underwater image is regarded as a linear combination of these three components. The forward-scattering component blurs image structures, while the backward-scattering component masks edges and details. Meanwhile, the color casts of underwater images mainly derive from the different absorption rates of different wavelengths: visible light is attenuated at wavelength-dependent rates. Specifically, red light disappears fastest due to its longest wavelength and minimal energy, whereas blue and green light exhibit the opposite behavior. Consequently, captured underwater images commonly present bluish or greenish tones.
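For reference, this three-component composition is often summarized in the UIE literature by a simplified image formation model; the notation below is generic and illustrative rather than part of our method:

\( I_\lambda(p) = J_\lambda(p)\, t_\lambda(p) + B_\lambda \left( 1 - t_\lambda(p) \right), \quad \lambda \in \{r, g, b\}, \)

where \(I_\lambda\) is the captured image, \(J_\lambda\) the scene radiance, \(B_\lambda\) the background (veiling) light, and \(t_\lambda(p) = e^{-\beta_\lambda d(p)}\) the wavelength-dependent transmission over distance \(d(p)\); the rapid decay of the red-channel transmission explains the bluish or greenish tones noted above.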
To improve the visual quality of underwater images, numerous Underwater Image Enhancement (UIE) methods have been proposed. Fu et al.5 proposed a lightweight UIE method based on probabilistic networks, which enhanced underwater images by learning the enhancement distribution of degraded underwater images. Li et al.6 proposed a lightweight convolutional neural network (CNN) for UIE, and reconstructed clear underwater images based on the prior knowledge of underwater scenes. Wang et al.7 designed UIEC\(^2\)-Net, driven by both the RGB and HSV color spaces, which performed denoising and color correction in the RGB color space and adjusted brightness and saturation in the HSV color space. Li et al.8 presented a UIE network (Ucolor) via medium transmission-guided multi-color space embedding. This method integrated a multi-color space encoder and a medium transmission-guided decoder, which incorporated the characteristics of different color spaces into a unified structure and selected representative color features to enrich the network input. Although the above-mentioned methods achieve obvious enhancement improvements, their performance is limited by the restricted receptive field of the CNN architecture. To address this issue, we present a novel Mamba-Convolution network for UIE (MC-UIE), where the Mamba architecture provides efficient global dependencies, while the CNN architecture offers efficient local dependencies. Concretely, we use a standard convolution layer with a 3\(\times\)3 kernel to obtain initial image feature maps. Then, we develop an iterable Mamba-Convolution Hybrid Block (M-C HB) to enhance the global and local dependencies of image feature maps. After that, a Cross Fusion Mamba Block (CFMB) is designed to effectively fuse the image feature maps of different levels. Finally, we also utilize a standard convolution layer with a 3\(\times\)3 kernel to obtain the enhanced underwater image.
The main contributions of our work are summarized as follows:
1. We develop a Mamba-Convolution network for UIE (MC-UIE), which effectively combines the advantages of the Mamba and CNN architectures for learning the enhancement mapping between raw underwater images and clear underwater images.
2. We design an iterable Mamba-Convolution Hybrid Block (M-C HB) and a Cross Fusion Mamba Block (CFMB), where the M-C HB provides richer local dependency information than the naive Mamba module, while the CFMB offers a more effective fusion of feature maps from different levels.
3. Extensive qualitative and quantitative experiments on mainstream datasets demonstrate the impressive performance of our proposed method.
Related works
CNN-based UIE methods
CNN-based UIE methods are characterized by linear computational complexity and general-purpose modeling capabilities. Fu et al.9 introduced an Unsupervised Underwater Image Restoration method (USUIR) that utilizes the relationship between a raw underwater image and a re-degraded version of the same image. Fu et al.10 proposed a UIE method, called SCNet, that applies normalization schemes across both spatial and channel dimensions to enhance image quality while addressing the diverse degradation caused by water. Li et al.8 presented a multi-color space embedding network guided by medium transmission (Ucolor), which improves the visual quality of underwater images by leveraging the advantages of multi-color space embedding. Zhou et al.11 developed a Hybrid Contrastive Learning Regularization Network (HCLR-Net) for UIE, whose hybrid contrastive learning regularization allows the network to establish a more resilient sample distribution. Zhang et al.12 proposed Synergistic Multiscale Detail Refinement via Intrinsic Supervision (SMDR-IS) for UIE, which addresses the limited scale-related features in current UIE methods. However, CNN-based UIE methods are limited in modeling long-range dependencies, resulting in poor generalizability.
Transformer-based UIE methods
Transformer-based UIE methods achieve excellent global enhancement capabilities, with self-attention mechanisms and image serialization at their core. Huang et al.13 proposed a Swin Transformer module based on adaptive group attention (AGA) for UIE, which dynamically selects visually complementary channels based on their dependencies and reduces the number of attention parameters. Peng et al.14 proposed a U-shape Transformer and a multi-color space loss function that combines the advantages of the RGB, LCH, and LAB color spaces to enhance underwater contrast and saturation. Ren et al.15 designed a U-Net-based reinforced Swin-Convs Transformer for simultaneous underwater image enhancement and super-resolution. Shen et al.16 proposed a dual-attention Transformer-based UIE method to address the uneven degradation and loss of color channels in underwater images. Although Transformer-based UIE methods excel at modeling long-range dependencies, they often involve a large number of parameters and complex self-attention mechanisms, which poses efficiency challenges due to the quadratic computational complexity with respect to image size.
Mamba-based UIE methods
The Mamba architecture, based on state space models (SSM), excels at modeling long-range dependencies while maintaining linear complexity, which enables Mamba-based UIE methods to better map the original underwater image to the reference underwater image. As a pioneering work, Chen et al.17 proposed the first Mamba network for UIE, which is extremely lightweight. Guan et al.18 proposed a Mamba network with linear computational complexity for UIE to address the non-uniform degradation and color channel loss in underwater image processing. Lin et al.19 presented PixMamba, built on a patch-level Efficient Mamba Net (EMNet) and a pixel-level PixMamba Net (PixNet), to improve the overall clarity, color balance, and global consistency of underwater images. An et al.20 proposed a hybrid network called UWMamba that integrates SSM and convolution for enhancing underwater images. Dong et al.21 proposed a novel UIE framework called O-Mamba to deal with the cross-color-channel dependency problem in underwater images caused by the differential attenuation of light wavelengths. Although these Mamba-based methods attain significant improvements, the 2D selective scan (SS2D) used in these methods cannot fully consider the spatial information of the image, resulting in limited performance.
In contrast, the proposed MC-UIE is designed to extract richer local dependency information by combining the Mamba module and convolution-based feature attention module.
Fig. 1 The overview of MC-UIE. First, an underwater image passes through a convolution layer with a 3\(\times\)3 kernel (Conv3) to produce initial feature maps. Then, the initial feature maps are fed into the Mamba-Convolution Hybrid Block (M-C HB) to enhance the global and local dependencies of image feature maps. Finally, the enhanced image is produced by a Conv3 module.
Preliminaries
The core of the Mamba architecture is the State Space Model (SSM), which is inspired by continuous linear time-invariant (LTI) systems. The SSM is renowned for its powerful sequence-to-sequence modeling capability, mapping a 1D input signal \(x(t) \in \mathbb {R}\) to a 1D output signal \(y(t) \in \mathbb {R}\) through an N-D latent state \(h(t) \in \mathbb {R}^N\), as follows:

\( h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t), \)
where \(A \in \mathbb {R}^{N \times N}\), \(B \in \mathbb {R}^{N \times 1}\), and \(C \in \mathbb {R}^{1 \times N}\) are parameters learned by the neural network. To deal with a discrete input \(X = (x_0, x_1, \ldots)\), such as an image sequence, Eq. (2) can be discretized by the zero-order hold:

\( \overline{A} = \exp (\Delta A), \qquad \overline{B} = (\Delta A)^{-1}\left( \exp (\Delta A) - I \right) \Delta B, \)
where \(\Delta\) is the step size. After that, Eq. (2) can be reformulated as:

\( h_k = \overline{A}\,h_{k-1} + \overline{B}\,x_k, \qquad y_k = C\,h_k. \)
The SSM can then be efficiently computed as an RNN. Eq. (2) can also be reformulated and computed as a convolution:

\( \overline{\textbf{K}} = \left( C\overline{B},\; C\overline{A}\,\overline{B},\; \ldots,\; C\overline{A}^{L-1}\overline{B} \right), \qquad y = x * \overline{\textbf{K}}, \)
where L is the length of the image sequence x, \(\overline{\textbf{K}}\) represents the SSM convolution kernel, and \(*\) denotes convolution operation.
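To make the recurrence-convolution equivalence concrete, the following NumPy/SciPy sketch (illustrative only; the function names and toy dimensions are our own) evaluates a small SSM both ways and checks that the outputs agree:

```python
import numpy as np
from scipy.linalg import expm

def discretize(A, B, delta):
    # Zero-order hold: A_bar = exp(dA), B_bar = (dA)^{-1} (exp(dA) - I) dB
    dA = delta * A
    A_bar = expm(dA)
    B_bar = np.linalg.inv(dA) @ (A_bar - np.eye(A.shape[0])) @ (delta * B)
    return A_bar, B_bar

def ssm_recurrence(A_bar, B_bar, C, x):
    # h_k = A_bar h_{k-1} + B_bar x_k ;  y_k = C h_k
    h = np.zeros((A_bar.shape[0], 1))
    ys = []
    for xk in x:
        h = A_bar @ h + B_bar * xk
        ys.append((C @ h).item())
    return np.array(ys)

def ssm_convolution(A_bar, B_bar, C, x):
    # K_bar = (C B_bar, C A_bar B_bar, ..., C A_bar^{L-1} B_bar) ; y = x * K_bar (causal)
    L = len(x)
    K = np.array([(C @ np.linalg.matrix_power(A_bar, k) @ B_bar).item() for k in range(L)])
    return np.array([np.dot(K[: k + 1][::-1], x[: k + 1]) for k in range(L)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, L = 4, 16
    A = -np.diag(rng.uniform(0.5, 1.5, N))   # stable continuous dynamics
    B = rng.standard_normal((N, 1))
    C = rng.standard_normal((1, N))
    x = rng.standard_normal(L)
    A_bar, B_bar = discretize(A, B, delta=0.1)
    print(np.allclose(ssm_recurrence(A_bar, B_bar, C, x),
                      ssm_convolution(A_bar, B_bar, C, x)))  # True: both views agree
```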
Proposed method
The overall architecture of the proposed Mamba-Convolution hybrid network for Underwater Image Enhancement (MC-UIE) is shown in Fig. 1. MC-UIE is composed of two 3\(\times\)3 convolution layers and seven Mamba-Convolution Hybrid Blocks (M-C HB). In Fig. 1a, the output size is shown next to each module. MC-UIE is a multi-scale encoder-decoder network whose skip connections across encoder-decoder layers at different levels are realized by Cross Fusion Mamba Blocks (CFMB). This multi-scale encoder-decoder network is inspired by the UNet architecture, which allows the model to effectively capture both local fine details and global contextual cues, both of which are crucial for restoring realistic underwater scenes. Specifically, we adopt a four-scale structure to progressively extract features from high to low resolution. This configuration enables sufficient receptive-field expansion and hierarchical feature learning, while avoiding the excessive downsampling that may lead to spatial detail loss. Compared to shallower designs, it improves the model’s ability to handle varying degradation patterns. On the other hand, introducing more scales would inevitably increase model complexity and computational cost, with diminishing returns in feature utility due to over-compression or over-aggregation. Thus, the adopted scale number represents a balanced trade-off between enhancement effectiveness and computational feasibility, supporting the overall goal of practical and robust underwater image enhancement. In addition, the proposed CFMB is specifically designed for the fusion of shallow spatial and deep semantic features, dynamically aggregating multi-scale features for image restoration. Given an underwater image, we first use a 3\(\times\)3 convolution layer (Conv3) to build initial image feature maps. Then, Mamba-Convolution Hybrid Blocks (M-C HB) are applied iteratively to enhance the global and local dependencies of image feature maps. After upsampling, we use CFMB to fuse the image feature maps of encoder-decoder layers at different levels. Finally, the enhanced image is produced by a Conv3 module.
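The following PyTorch skeleton illustrates this data flow under our reading of Fig. 1; the MCHB and CFMB classes are simplified stand-ins (plain convolutions instead of SS2D, FAM, and Mamba fusion), and the channel widths and block placement are assumptions rather than the released configuration:

```python
import torch
import torch.nn as nn

class MCHB(nn.Module):
    """Stand-in for the Mamba-Convolution Hybrid Block (SS2D + FAM in the paper);
    a plain convolutional residual block is used so the skeleton runs end to end."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.SiLU(),
                                  nn.Conv2d(dim, dim, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class CFMB(nn.Module):
    """Stand-in for the Cross Fusion Mamba Block: fuses a skip feature and a
    decoder feature of the same resolution into a single feature map."""
    def __init__(self, dim):
        super().__init__()
        self.fuse = nn.Conv2d(2 * dim, dim, 1)
    def forward(self, skip, up):
        return self.fuse(torch.cat([skip, up], dim=1))

class MCUIE(nn.Module):
    """Hypothetical four-scale encoder-decoder wiring of MC-UIE (two Conv3 layers,
    seven hybrid blocks: 3 encoder + 1 bottleneck + 3 decoder)."""
    def __init__(self, dim=32):
        super().__init__()
        dims = [dim, dim * 2, dim * 4, dim * 8]
        self.head = nn.Conv2d(3, dims[0], 3, padding=1)
        self.enc = nn.ModuleList([MCHB(d) for d in dims[:3]])
        self.down = nn.ModuleList([nn.Conv2d(dims[i], dims[i + 1], 2, stride=2) for i in range(3)])
        self.bottleneck = MCHB(dims[3])
        self.up = nn.ModuleList([nn.ConvTranspose2d(dims[i + 1], dims[i], 2, stride=2) for i in range(3)])
        self.fuse = nn.ModuleList([CFMB(d) for d in dims[:3]])
        self.dec = nn.ModuleList([MCHB(d) for d in dims[:3]])
        self.tail = nn.Conv2d(dims[0], 3, 3, padding=1)

    def forward(self, x):
        f = self.head(x)
        skips = []
        for blk, down in zip(self.enc, self.down):   # encoder: block, keep skip, downsample
            f = blk(f)
            skips.append(f)
            f = down(f)
        f = self.bottleneck(f)
        for i in reversed(range(3)):                 # decoder: upsample, fuse skip, block
            f = self.up[i](f)
            f = self.fuse[i](skips[i], f)
            f = self.dec[i](f)
        return self.tail(f)

if __name__ == "__main__":
    print(MCUIE()(torch.rand(1, 3, 256, 256)).shape)  # torch.Size([1, 3, 256, 256])
```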
In what follows, we detail key components of MC-UIE, including the Mamba-Convolution Hybrid Block (M-C HB), Cross Fusion Mamba Blocks (CFMB), and loss function, respectively.
Mamba-convolution hybrid block
The naive Mamba block has limitations when applied to image processing tasks, as its local dependency modeling is inferior to that of convolutional architectures22. To make Mamba blocks more effective for image processing, we propose a Mamba-Convolution Hybrid Block (M-C HB), as shown in Fig. 1b. The proposed M-C HB integrates the global modeling capability of Mamba blocks and the local modeling capability of the feature attention module (FAM)23. The executive process of the M-C HB can be mathematically expressed as:
where \(Linear(\cdot )\) is the linear projection layer, \(Conv3(\cdot )\) represents the convolution operation with a kernel size of 3\(\times\)3, \(SiLU(\cdot )\) is the Sigmoid Linear Unit, and \(FAM(\cdot )\) is the Feature Attention Module23, as shown in Fig. 1c. The \(FAM(\cdot )\)23 treats different features and pixel regions unequally, providing additional flexibility in dealing with different types of information, and can be formulated as follows:
where x is the input feature map, AP denotes an average pooling layer, \(ReLU(\cdot )\) is the Rectified Linear Unit, and \(\otimes\) denotes element-wise multiplication. SS2D is the naive 2D selective scan, as shown in Fig. 1d.
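As a concrete reference, a PyTorch sketch of \(FAM(\cdot)\) in the spirit of FFA-Net23 (channel attention followed by pixel attention) is given below; the reduction ratio and 1\(\times\)1 kernel sizes are our assumptions rather than the exact configuration of Fig. 1c:

```python
import torch
import torch.nn as nn

class FAM(nn.Module):
    """Feature Attention Module in the spirit of FFA-Net: channel attention
    followed by pixel attention, so channels and pixel regions are weighted
    unequally. The reduction ratio r is an assumption for illustration."""
    def __init__(self, dim, r=8):
        super().__init__()
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # AP: B x C x 1 x 1
            nn.Conv2d(dim, dim // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim // r, dim, 1), nn.Sigmoid())
        self.pixel_att = nn.Sequential(
            nn.Conv2d(dim, dim // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim // r, 1, 1), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel_att(x)                       # re-weight channels
        x = x * self.pixel_att(x)                         # re-weight spatial positions
        return x

if __name__ == "__main__":
    print(FAM(64)(torch.rand(2, 64, 32, 32)).shape)       # torch.Size([2, 64, 32, 32])
```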
Cross fusion mamba block
In previous literature24, a 1\(\times\)1 convolution layer was often used to fuse the image feature maps of encoder-decoder layers at different levels. Although this fusion strategy is fast, it cannot fully utilize the image feature maps of different levels. Therefore, we propose a Cross Fusion Mamba Block (CFMB) to effectively fuse the image feature maps of different levels, as shown in Fig. 1e. Concretely, we first adopt a random channel mix (RCM) on \(F_1\) and \(F_2\), which requires no additional parameters or computational operations, to achieve a fast exchange of features, as expressed:
where M(B, C) denotes the 0-1 mask used for channel exchange, B is the batch size, and C refers to the channels. \(M(B, C)=0\) indicates no exchange, while \(M(B, C)=1\) indicates that the feature channels of \(F^i_1\) and \(F^i_2\) are exchanged. The superscript i is a random index of the feature maps. The core idea of RCM is to introduce a stochastic yet structure-preserving perturbation by randomly exchanging feature channels between the two inputs. This randomness improves the robustness of the model by encouraging it to rely on more global and discriminative cues rather than fixed channel dependencies. The RCM helps the model explore diverse channel-wise interactions between the two inputs, leading to more effective feature fusion and better recovery of semantic and structural information from underwater images. The exchanged features then undergo processing through their respective M-C HBs. After repeating the above steps, we concatenate these feature maps and then utilize a 1\(\times\)1 convolution layer to obtain the desired number of channels.
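A minimal PyTorch sketch of the random channel mix is shown below; the per-channel exchange probability is an assumption, since the exact sampling of M(B, C) is not specified above:

```python
import torch

def random_channel_mix(f1: torch.Tensor, f2: torch.Tensor, p: float = 0.5):
    """Randomly exchange channels between two feature maps of shape (B, C, H, W).
    M(B, C) = 1 means the channel is swapped, 0 means it is kept; no learnable
    parameters are involved. The exchange probability p is an assumption."""
    mask = torch.rand(f1.shape[0], f1.shape[1], 1, 1, device=f1.device) < p
    mixed1 = torch.where(mask, f2, f1)
    mixed2 = torch.where(mask, f1, f2)
    return mixed1, mixed2

if __name__ == "__main__":
    a, b = torch.zeros(2, 8, 4, 4), torch.ones(2, 8, 4, 4)
    m1, m2 = random_channel_mix(a, b)
    print(m1.mean().item(), m2.mean().item())  # complementary fractions of swapped channels
```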
Loss function
We use the L1 loss function \(L_{1}\) to measure the difference between the enhanced underwater image x and the reference underwater image y as follows:

\( L_{1} = \frac{1}{H \times W} \sum _{i=1}^{H} \sum _{j=1}^{W} \left| x_{i,j} - y_{i,j} \right|, \)
where H and W represent the height and width of an underwater image.
Experimental results
To demonstrate the superior performance of our MC-UIE, we present the qualitative evaluation, quantitative assessment, ablation study, application tests, and complexity analysis.
Experiment settings
Implementation details
We implement the proposed MC-UIE in the PyTorch 2.1 framework with an Intel(R) i9-12900K CPU, 64 GB of RAM, and an NVIDIA RTX 4090 GPU. We set the learning rate to \(10^{-4}\) and use the Adam optimizer for network optimization. The model is trained for 50 epochs with a batch size of 4, and all input images are resized to \(256\times 256\).
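A self-contained sketch of this training setup is given below; the random tensors and single-convolution model are placeholders so the loop runs, and in practice they would be replaced by the LSUI/UIEB data loaders and the MC-UIE network:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins so the loop runs: random 256x256 image pairs and a one-layer model.
pairs = TensorDataset(torch.rand(8, 3, 256, 256), torch.rand(8, 3, 256, 256))
loader = DataLoader(pairs, batch_size=4, shuffle=True)
model = torch.nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # lr = 10^-4 as above

for epoch in range(50):                                      # 50 epochs, batch size 4
    for raw, reference in loader:
        loss = F.l1_loss(model(raw), reference)              # L1 training objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```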
Compared methods
We compare the proposed method with eight leading UIE methods, including two physical model-based methods (BRUIE25 and HLRP26), two CNN-based methods (USUIR9 and PUIE-Net5), one Transformer-based method (U-Shape14), and three Mamba-based methods (WaterMamba18, UWMamba20, and PixMamba19). For a fair comparison, we employ the source code provided by the authors, retrain each method on our training set, and save the best enhancement results.
Benchmark datasets
We utilize two publicly available real-world UIE datasets (LSUI14 and UIEB27) for training and testing. The LSUI dataset provides a large collection of real underwater images across diverse scenarios, making it suitable for training data-driven enhancement models. The UIEB dataset includes real underwater images along with a curated subset of reference images, facilitating full-reference quality evaluation. For training, we adopt 3794 images from the LSUI14 dataset and 800 images from the UIEB27 dataset. For testing, we use the remaining 485 images from the LSUI14 dataset (Test-485) and the remaining 90 images from the UIEB27 dataset (Test-90).
Evaluation metrics
We employ two full-reference and two no-reference metrics to measure the performance of different methods on Test-485 and Test-90. The two widely used full-reference image quality metrics are the Peak Signal-to-Noise Ratio (PSNR)28 and the Structural Similarity Index Measure (SSIM)29. PSNR quantifies the pixel-level differences between the enhanced image and the ground truth, while SSIM evaluates perceptual similarity by incorporating luminance, contrast, and structural information. In both cases, higher scores indicate better fidelity to the reference image in terms of content and structure. Meanwhile, two no-reference metrics, the Underwater Color Image Quality Evaluation (UCIQE)30 and the Underwater Image Quality Measure (UIQM)31, are used to assess enhanced underwater image quality without reference images. UCIQE measures image quality through a combination of chroma, saturation, and contrast statistics, while UIQM integrates colorfulness, sharpness, and contrast into a single quality score. For both metrics, higher values suggest better perceptual quality in underwater environments.
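For the two full-reference metrics, the standard scikit-image implementations can be used as follows (UCIQE and UIQM usually require custom implementations and are omitted from this sketch):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def full_reference_scores(enhanced: np.ndarray, reference: np.ndarray):
    """PSNR and SSIM for uint8 RGB images of identical size."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
    return psnr, ssim

if __name__ == "__main__":
    ref = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
    enh = np.clip(ref.astype(int) + np.random.randint(-5, 6, ref.shape), 0, 255).astype(np.uint8)
    print(full_reference_scores(enh, ref))
```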
Qualitative evaluation
We show visual comparisons of different UIE methods on Test-485 and Test-90 in Figs. 2 and 3. Raw underwater images are shown in Figs. 2a and 3a. As shown in Figs. 2b, 3b, 2c, and 3c, BRUIE25 and HLRP26 handle various color casts, but the enhanced images of BRUIE25 tend to lack authentic underwater color, while HLRP26 introduces noticeable over-brightness. USUIR9 improves image contrast, but its results exhibit an unnatural color balance in Figs. 2d and 3d. PUIE-Net5 and U-Shape14 effectively improve the visibility of underwater images, but their results show poor clarity, as evidenced in Figs. 2e, 3e, 2f, and 3f. As shown in Figs. 2g, 3g, 2h, and 3h, WaterMamba18 and UWMamba20 significantly restore the details of low-light underwater images, but WaterMamba18 fails to eliminate color casts, while UWMamba20 introduces local color casts. PixMamba19 achieves satisfactory visual results in restoring underwater image visibility and eliminating color casts, but it also produces local color casts, as shown in Figs. 2i and 3i. In contrast, as shown in Figs. 2j and 3j, the proposed MC-UIE effectively handles color casts, restores image details, and improves image visibility. To sum up, our MC-UIE can effectively handle both conventionally degraded underwater images and extremely degraded low-light underwater images.
Application examples of underwater depth estimation (top row), underwater edge detection (second row), underwater keypoint detection (third row), underwater saliency detection (fourth row), and underwater image segmentation (bottom row). The numbers at the top of each row refer to each image’s REL32, AP33, keypoint number34, MAE35, and GCE36, respectively. The compared methods are BRUIE25, HLRP26, USUIR9, PUIE-Net5, U-Shape14, WaterMamba18, UWMamba20, PixMamba19, and the proposed MC-UIE.
Quantitative assessment
We evaluate the performance of different UIE methods using PSNR28, SSIM29, UCIQE30, and UIQM31 on Test-485 and Test-90, as shown in Table 1. For PSNR28 and SSIM29, the proposed MC-UIE yields the best scores, which indicates that our results closely resemble the reference images in terms of both content and structure. For UCIQE30, MC-UIE achieves the best score, which demonstrates that the proposed approach mitigates non-uniform color bias, reduces blurriness, and enhances contrast. Besides, MC-UIE obtains the best UIQM31 scores, which indicates exceptional performance in terms of colorfulness, sharpness, and contrast enhancement.
Ablation study
As shown in Fig. 4, we analyze the importance of the M-C HB and CFMB modules through comprehensive ablation studies on bluish, greenish, yellowish, and low-visibility degradation scenes, using three ablated settings and the full method: (1) replacing the M-C HB with a naive Mamba block (Our-settingI), (2) replacing the CFMB with a 1\(\times\)1 convolution layer (Our-settingII), (3) replacing the L1 loss function with the L2 and structural similarity loss functions in8 (Our-settingIII), and (4) the full method (Our). Our-settingI underperforms in mitigating color casts and improving visibility. Our-settingII improves color and increases visibility but fails to fully remove color casts. Our-settingIII restores more image details but fails to efficiently remove color casts. Our full method achieves true-to-life colors while enhancing sharpness and visibility. Furthermore, we show a quantitative comparison in Table 2. As shown, there is a significant improvement in the PSNR28 and SSIM29 scores from Our-settingI to Our-settingII. Compared with Our-settingII and Our-settingIII, our full method (Our) achieves the highest PSNR28 and SSIM29 scores, underscoring the superior performance of our MC-UIE in restoring underwater content and structure.
Application tests
We demonstrate the utility of our MC-UIE on several underwater application tests, including depth estimation, edge detection, keypoint detection, saliency detection, and image segmentation. We employ the non-local prior37 for underwater depth estimation, utilize the Canny operator38 for underwater edge detection, employ SIFT keypoint detection39 for underwater keypoint detection, adopt BASNet40 for underwater saliency detection, and apply a superpixel-based clustering algorithm41 for underwater image segmentation. We evaluate the performance of the different methods on these application tests using the following metrics: absolute relative error (REL) for depth estimation32, average precision (AP) for edge detection33, the number of detected keypoints for keypoint detection34, mean absolute error (MAE) for saliency detection35, and global consistency error (GCE) for image segmentation36. A lower REL32 score reflects more accurate depth predictions, with fewer discrepancies from the reference depth map. Conversely, a higher AP33 score suggests that the detected edges align more closely with the ground-truth edges. Similarly, a lower MAE35 score indicates that the predicted saliency map deviates less from the ground-truth mask. For image segmentation, a lower GCE36 score means the predicted segmentation results are more consistent with the reference annotations. All results are shown in Fig. 5. Compared to other competitors, our enhanced results achieve more accurate depth maps, indicating the superiority of our MC-UIE in restoring reliable depth structures. Moreover, our MC-UIE yields more edges and keypoints, which suggests that it recovers richer local details and sharper structures. Compared to other methods, the segmentation results of our MC-UIE are more consistent and accurate, and its saliency maps contain more salient objects and better boundaries. These results suggest that MC-UIE can effectively boost underwater image segmentation and saliency detection.
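As an illustration of how the edge and keypoint tests can be run on an enhanced image, the sketch below uses the corresponding OpenCV operators; the thresholds and the file name are placeholders rather than the settings used in the paper:

```python
import cv2

def edge_and_keypoint_test(image_path: str):
    """Run Canny edge detection and SIFT keypoint detection on an enhanced image."""
    bgr = cv2.imread(image_path)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                 # Canny operator for edge detection
    keypoints = cv2.SIFT_create().detect(gray, None)  # SIFT keypoints; more usually means richer detail
    return edges, len(keypoints)

if __name__ == "__main__":
    edges, num_kp = edge_and_keypoint_test("enhanced_underwater_image.png")
    print(edges.shape, num_kp)
```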
Complexity analysis
We compare the complexity of all UIE methods, including running time (s), trainable parameters (M), and FLOPs (G). The results are shown in Table 3. It can be seen that our MC-UIE has relatively few parameters, low FLOPs, and a short running time. WaterMamba18 has the fewest parameters, but its FLOPs are relatively high and its running time (0.0244 s) is slower than that of our MC-UIE (0.0126 s). Although USUIR9 has fewer FLOPs and a shorter running time, its enhancement performance is worse than that of our MC-UIE. Overall, our MC-UIE achieves the best enhancement performance with relatively low complexity and a short running time.
Conclusion
This work proposes a Mamba-Convolution network for Underwater Image Enhancement (MC-UIE). MC-UIE inherits the powerful global dependency modeling of the Mamba architecture and the local dependency modeling of the convolution architecture to improve enhancement performance. To obtain richer global and local dependencies of image features, a Mamba-Convolution Hybrid Block (M-C HB) is proposed, which integrates the global modeling capability of Mamba blocks and the local modeling capability of the CNN-based feature attention module. Moreover, a Cross Fusion Mamba Block (CFMB) is proposed to fuse the image feature maps of encoder-decoder layers at different levels. Extensive experiments show the superior performance of our MC-UIE in restoring the color, illumination, and details of underwater images.
Data availability
The dataset used in this article is publicly available and can be accessed at https://li-chongyi.github.io/proj_benchmark.html and https://github.com/LintaoPeng/U-shape_Transformer_for_Underwater_Image_Enhancement.
References
Chen, W., Cai, B., Zheng, S., Zhao, T. & Gu, K. Perception-and-cognition-inspired quality assessment for sonar image super-resolution. IEEE Trans. Multimed. 10, 1–13 (2024).
Chen, W. et al. Reference-free quality assessment of sonar images via contour degradation measurement. IEEE Trans. Image Process. 28, 5336–5351 (2019).
Xu, X. et al. Exploring underwater image quality: A review of current methodologies and emerging trends. Image Vis. Comput. 14, 105389 (2024).
Wang, M., Zhang, K., Wei, H., Chen, W. & Zhao, T. Underwater image quality optimization: Researches, challenges, and future trends. Image Vis. Comput. 146, 104995 (2024).
Fu, Z., Wang, W., Huang, Y., Ding, X. & Ma, K. Uncertainty inspired underwater image enhancement. In Proc. Eur. Conf. Comput. Vis. (ECCV) 10, 465–482 (2022).
Li, C., Anwar, S. & Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 98, 107038 (2020).
Wang, Y., Guo, J., Gao, H. & Yue, H. UIEC\(^2\)-Net: CNN-based underwater image enhancement using two color space. Signal Process. Image Commun. 96, 116250 (2021).
Li, C. et al. Underwater image enhancement via medium transmission-guided multi-color space embedding. IEEE Trans. Image Process. 30, 4985–5000 (2021).
Fu, Z. et al. Unsupervised underwater image restoration: From a homology perspective. Proc. AAAI Conf. Artif. Intell. (AAAI) 36, 643–651 (2022).
Fu, Z., Lin, X., Wang, W., Huang, Y. & Ding, X. Underwater image enhancement via learning water type desensitized representations. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2764–2768 (2022).
Zhou, J. et al. HCLR-Net: Hybrid contrastive learning regularization with locally randomized perturbation for underwater image enhancement. Int. J. Comput. Vis. 1–25 (2024).
Zhang, D., Zhou, J., Guo, C., Zhang, W. & Li, C. Synergistic multiscale detail refinement via intrinsic supervision for underwater image enhancement. Proc. AAAI Conf. Artif. Intell. 38, 7033–7041 (2024).
Huang, Z., Li, J., Hua, Z. & Fan, L. Underwater image enhancement via adaptive group attention-based multiscale cascade transformer. IEEE Trans. Instrum. Meas. 71, 1–18 (2022).
Peng, L., Zhu, C. & Bian, L. U-shape transformer for underwater image enhancement. IEEE Trans. Image Process. 32, 3066–3079 (2023).
Ren, T. et al. Reinforced swin-convs transformer for simultaneous underwater sensing scene image enhancement and super-resolution. IEEE Trans. Geosci. Remote Sens. 60, 1–16 (2022).
Shen, Z., Xu, H., Luo, T., Song, Y. & He, Z. Udaformer: Underwater image enhancement based on dual attention transformer. Comput. Graphics 111, 77–88 (2023).
Chen, Z. & Ge, Y. MambaUIE&SR: Unraveling the ocean’s secrets with only 2.8 FLOPs. ArXiv arXiv:2404.13884 (2024).
Guan, M. et al. Watermamba: Visual state space model for underwater image enhancement. ArXiv arXiv:2405.08419 (2024).
Lin, W.-T., Lin, Y.-X., Chen, J.-W. & Hua, K.-L. PixMamba: Leveraging state space models in a dual-level architecture for underwater image enhancement. In Proc. Asian Conf. Comput. Vis. (ACCV), 176–191 (2025).
An, G., He, A., Wang, Y. & Guo, J. Uwmamba: Underwater image enhancement with state space model. IEEE Signal Process. Lett. 31, 2725–2729 (2024).
Dong, C., Zhao, C., Cai, W. & Yang, B. O-mamba: O-shape state-space model for underwater image enhancement. ArXiv arXiv:2408.12816 (2024).
Yu, W. & Wang, X. MambaOut: Do we really need Mamba for vision? ArXiv (2024).
Qin, X., Wang, Z., Bai, Y., Xie, X. & Jia, H. Ffa-net: Feature fusion attention network for single image dehazing. Proc. AAAI Conf. Artif. Intell. (AAAI) 34, 11908–11915 (2020).
Zamir, S. W. et al. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition (CVPR), 5728–5739 (2022).
Zhuang, P., Li, C. & Wu, J. Bayesian retinex underwater image enhancement. Eng. Appl. Artif. Intell. 101, 104171 (2021).
Zhuang, P., Wu, J., Porikli, F. & Li, C. Underwater image enhancement with hyper-laplacian reflectance priors. IEEE Trans. Image Process. 31, 5442–5455 (2022).
Li, C. et al. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 29, 4376–4389 (2020).
Korhonen, J. & You, J. Peak signal-to-noise ratio revisited: Is simple beautiful? In Fourth International Workshop on Quality of Multimedia Experience, 37–38 (2012).
Horé, A. & Ziou, D. Image quality metrics: PSNR vs. SSIM. In International Conference on Pattern Recognition, 2366–2369 (2010).
Yang, M. & Sowmya, A. An underwater color image quality evaluation metric. IEEE Trans. Image Process. 24, 6062–6071 (2015).
Panetta, K., Gao, C. & Agaian, S. Human-visual-system-inspired underwater image quality measures. IEEE J. Oceanic Eng. 41, 541–551 (2016).
Hambarde, P., Murala, S. & Dhall, A. UW-GAN: Single-image depth estimation and image enhancement for underwater images. IEEE Trans. Instrum. Meas. 70, 1–12 (2021).
Jing, J., Liu, S., Wang, G., Zhang, W. & Sun, C. Recent advances on image edge detection: A comprehensive review. Neurocomputing 503, 259–271 (2022).
Lowe, D. G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004).
Qin, X. et al. BASNet: Boundary-aware salient object detection. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 7471–7481 (2019).
Lei, T. et al. Superpixel-based fast fuzzy C-Means clustering for color image segmentation. IEEE Trans. Fuzzy Syst. 27, 1753–1766 (2019).
Berman, D., treibitz, T. & Avidan, S. Non-local image dehazing. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1674–1682 (2016).
Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–698 (1986).
Lowe, D. G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60, 91–110 (2004).
Qin, X. et al. BASNet: Boundary-aware salient object detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7471–7481 (2019).
Lei, T. et al. Superpixel-based fast fuzzy C-means clustering for color image segmentation. IEEE Trans. Fuzzy Syst. 27, 1753–1766 (2019).
Acknowledgements
This work is supported by Natural Science Foundation of Fujian Province (2023J01807, 2023J05250), School Start-up Fund of Jimei University (ZQ2021028), 2024 University-level Special Project of Minjiang University (K-MJKJ24006), Fujian Association for Science and Technology (FJKX-2024XKB012), Fuzhou Science and Technology Project of China (2024-S-003), Science and Education Joint Special Project of Minjiang University (MJKJ24001).
Author information
Contributions
Conceived and designed the experiments: H.C., L.W., J.Y., G.L., and H.X.; Performed the experiments: Y.W. and H.H.; Analyzed the data: H.C., L.W., J.Y., G.L., and H.X.; Wrote and reviewed the paper: H.C., Y.W., L.W., H.H., J.Y., G.L., and H.X.;
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chen, H., Wang, Y., Wu, L. et al. Mamba-convolution hybrid network for underwater image enhancement. Sci Rep 15, 31975 (2025). https://doi.org/10.1038/s41598-025-15404-y