Abstract
When strong light enters the lens, multiple internal reflections and scattering can cause flare, significantly degrading image quality and affecting the performance of downstream vision tasks. In practical photography, flare is often caused by multiple strong light sources, producing artifacts such as bright streaks or diffuse halos that frequently cover large areas of the image. Effective flare removal therefore requires a network with a large receptive field. Although the native Transformer architecture has global modeling capability, its computational complexity grows quadratically with image resolution, making it difficult to deploy on resource-constrained devices. Windowed attention, as a compromise, improves computational efficiency but restricts the receptive field to each window, preventing true global perception. To address these issues, we propose a simple multi-domain image flare removal network, SMFR-Net, which achieves state-of-the-art (SOTA) performance with only 7.981M parameters. Specifically, SMFR-Net consists of an encoder that jointly models the frequency and spatial domains and a decoder with a simplified structure. The encoder first enhances global contextual awareness using a frequency-domain module based on the Fourier transform, then further expands the receptive field through a spatial-domain module built on multi-scale dilated convolutions, and introduces a Channel-Spatial Attention Mechanism to precisely locate flare regions. Building on this, the decoder discards frequency-domain modeling and simplifies the structure to reduce redundant computation. Furthermore, we design a structure-aware composite loss function to improve the network's overall performance. Experimental results show that SMFR-Net outperforms existing methods on the Flare7K++ real-world test set, the synthetic test set, and several real-world scenes across most metrics, demonstrating superior flare removal performance and strong application potential with its simple and efficient structure.
Introduction
In natural scene photography and computer vision perception, lens flare is a common and complex imaging degradation phenomenon, usually caused by strong light sources (such as the sun or artificial lights) entering the camera, resulting in unwanted reflections and scattering within the lens system. These interfering light rays do not participate in the normal imaging process but are projected onto the sensor surface along abnormal paths, disrupting the structure and brightness distribution of the image. Depending on the manifestation and cause of the flare in the image, it is typically classified into two types: stray flare and reflected flare1. The former is usually caused by scattering phenomena due to dust, stains, or scratches on the lens surface, often appearing as bright streaks or overexposed regions extending along the light path; the latter is typically caused by multiple reflections within the lens group, forming bright spots with regular geometric shapes, such as polygonal halos or star-shaped light spots. Both types of flare interfere with the structural information of the image, degrade visual quality, and significantly affect downstream tasks such as semantic segmentation2, object detection3, and monocular depth prediction4. As shown in Fig. 1, the presence of flare causes severe interference with downstream tasks: it induces significant structural recognition errors in semantic segmentation and results in a loss of depth information in depth estimation, underscoring the great importance of its effective suppression for downstream applications.
The negative impact of lens flare on different downstream tasks. The left panel shows semantic segmentation results generated by Segment Anything5, and the right panel shows monocular depth estimation results from Vision Transformers for Dense Prediction4. In both tasks, the flare-corrupted input (top-left of each set) leads to an erroneous output (bottom-left), while the output from the flare-free ground truth (bottom-right) is accurate.
To mitigate the impact of lens flare on image quality, early research primarily focused on optical optimization at the hardware level. Camera systems typically introduce anti-reflective coatings (AR coatings) on optical element surfaces to reduce reflectivity, thereby suppressing multiple reflections and interference caused by strong light sources within the lens system. These coatings are based on the principle of phase-cancellation interference and can effectively reduce the intensity of reflected light within specific wavelength ranges, thus enhancing image contrast. However, AR coatings generally function only under specific incident angles and spectral bands, and their high cost limits large-scale deployment. Another common approach is the use of lens hoods or the optimization of lens barrel designs to physically block off-axis incoming light, thereby reducing the interference of stray light in the imaging process. These methods can partially suppress the occurrence of bright artifacts, but their effectiveness is constrained by scene composition and light source positions, making them less adaptable to complex and dynamic natural lighting conditions. Moreover, hardware-based approaches are essentially pre-capture suppression strategies, which are incapable of addressing flare artifacts in already captured images. As a result, they exhibit inherent limitations in practical applications.
To overcome the limitations of hardware-based methods in adaptability and post-processing, researchers have proposed a variety of software-based approaches for image flare removal. Most traditional methods adopt a two-stage strategy: first detecting potential flare regions in the image, followed by restoration and reconstruction of the affected areas. Early works typically relied on explicit modeling based on image brightness, shape, or spatial features. For example, Chabert et al.6 constructed candidate flare regions using multi-thresholding and contour feature extraction, and completed reconstruction via sample-based image inpainting; Vitoria et al.7 detected overexposed local features to generate masks for flare suppression; Asha et al.8 focused on strong highlights in the background caused by sunlight or flickering sources to formulate a targeted flare-filling strategy. Although such methods perform reasonably well in handling flare with regular shapes or single-type artifacts, they rely on handcrafted features and struggle to handle real-world flare phenomena that are complex, asymmetric, and spatially variant. In addition, some approaches attempted to model the point spread function (PSF) and restore occluded regions via deconvolution9. However, these methods usually assume spatial invariance and circular symmetry of the flare patterns, which limits their applicability in practice. Due to the diversity of flare in terms of intensity, shape, and position, as well as the ambiguous boundaries between flare and naturally bright regions, traditional image-processing-based methods often suffer from high false detection rates and weak generalization, making them insufficient for robust image quality restoration in complex scenes.
In recent years, deep learning methods have achieved remarkable progress in image restoration and other visual tasks, providing new solutions to the problem of lens flare removal. Wu et al.10 combined physical modeling to synthesize the first training dataset for flare removal and proposed the SIFR method based on the U-Net architecture11, enabling end-to-end training. However, due to the relatively simplified data generation rules, there exists a significant domain gap between the synthesized samples and real-world scenes, which limits the generalization capability of the model. To alleviate the difficulty of acquiring paired data, Qiao et al.12 proposed an unsupervised generative training framework, which employs a dual-mask prediction mechanism to separately model the light source and flare regions, and incorporates light source information to guide the flare removal process. This enables effective training on unpaired data. Subsequently, Dai et al.1 constructed the widely used nighttime flare removal dataset Flare7K, and further extended it to Flare7K++13 by incorporating real captured flare patterns, significantly enhancing the model’s adaptability to multiple types of strong scattering degradations. With the Transformer architecture14 and its variant Swin Transformer15 gaining widespread attention in image modeling tasks, a series of image restoration networks such as Uformer16 and Restormer17 have been developed. These methods have achieved outstanding performance in tasks such as image denoising, deraining, and general image restoration. Building upon this, Zhang et al. proposed FF-Former18, which introduces the Fast Fourier Convolution (FFC) module to construct Spatial Frequency Blocks (SFB). Through frequency-domain modeling, the method enhances the model’s ability to perceive global dependencies and effectively improves restoration quality in nighttime scenes with strong lens flare. In another direction, Kotp et al.19 proposed a two-stage architecture that integrates depth estimation and image restoration. They utilize the scene depth map predicted by the DPT network as structural guidance and feed it together with the input image into the Uformer16 network, thereby enhancing the model’s ability to distinguish between real image content and flare artifacts. This approach improves image reconstruction accuracy and generalization in real-world scenes, demonstrating the potential of depth-aware guidance in lens flare removal tasks. In addition to multimodal guidance, other advanced paradigms have also shown great potential in the field of image restoration. Among them, diffusion models, which have attracted significant attention in recent years20, are increasingly being applied to low-level visual tasks due to their powerful generative priors and high-quality sample generation capabilities. For instance, WaveDM21 innovatively combines wavelet transforms with the diffusion process, enabling more effective restoration of image structure and texture details by denoising across different frequency sub-bands. Furthermore, advanced attention mechanisms have also demonstrated their importance in specific removal tasks. DeSeal22 serves as a case in point, designing a semantic-aware attention mechanism that can precisely locate and remove seals from document images while preserving the background content.
Although existing methods have achieved certain success, balancing large-scale flare modeling with computational efficiency remains a core challenge in current research. On one hand, Convolutional Neural Networks (CNNs), due to the limitation of local receptive fields, typically require deep stacking or multi-scale strategies to expand their perceptual range, which is inefficient and yields limited effectiveness when dealing with large-area, diffuse flares. On the other hand, some advanced architectures with powerful global modeling capabilities also face computational efficiency bottlenecks. For instance, diffusion models, which are based on an iterative sampling process, often require hundreds to thousands of inference steps to generate high-quality results, leading to extremely high inference latency. Meanwhile, the standard Transformer architecture14, despite its capability for single-step global modeling and long-range dependency capture, suffers from a self-attention mechanism with \(O(N^2)\) quadratic computational complexity, making it difficult to apply to high-resolution images and severely restricting its practical deployment. To reduce computational costs, Swin Transformer15 restricts attention computation to local windows. While this alleviates the computational bottleneck to some extent, it also sacrifices the global receptive field, making it difficult to perform interaction and modeling across the entire image. To address these issues, this paper proposes a simple multi-domain flare removal network, SMFR-Net (Simple Multi-domain Flare Removal Network). The model is designed with simplicity and efficiency in mind, and enhances the receptive field through collaborative modeling in both the frequency and spatial domains. While maintaining a parameter count of only 7.981M, it achieves high-quality image reconstruction and superior performance. The main contributions of this paper are as follows:
-
This paper proposes a structurally concise multi-domain architecture for image flare removal–SMFR-Net. The encoder integrates a Frequency Domain Modulation (FDM) module and a Multi-Scale Grouped Dilated Convolution (MGDC) module to achieve joint modeling of frequency and spatial domain features. Additionally, a lightweight Channel-Spatial Attention Module (CSAM) is designed to enhance the model’s responsiveness to flare regions. The decoder adopts an asymmetric structure and is simplified at this stage by retaining only the MGDC module and introducing components such as the Simple Channel Attention (SCA) module, effectively controlling model complexity.
-
A novel structure-aware composite loss function tailored for flare removal is proposed, which leads to significant improvements in quantitative evaluation metrics.
-
SOTA performance: Extensive benchmark results demonstrate that the proposed method outperforms existing approaches and exhibits strong generalization ability in real-world scenarios.
Methods
In this section, we provide a detailed introduction to the proposed SMFR-Net. First, we present its overall encoder-decoder architecture. Then, we delve into the core building block of the network: the Simple Multi-domain Encoder Block (SMEBlock), specifically designed for flare removal, and its key components. Finally, we introduce the simplified decoder module, the Simple Multi-scale Decoder Block (SMDBlock), along with the composite loss function used for optimization.
Overall architecture
As shown in Fig. 2, SMFR-Net adopts a structurally simple encoder-decoder architecture and introduces a global residual learning strategy23 to stabilize the training process and improve image reconstruction accuracy. Given an input flare image \(I_{\text {flare}}\), the network learns a residual mapping function \(F(\cdot )\), and the restored clean image \(I_{\text {clean}}\) is defined as:
\(I_{\text {clean}} = I_{\text {flare}} + F(I_{\text {flare}})\)
Unlike traditional symmetric architectures, SMFR-Net adopts a task-oriented, modularly differentiated design in its encoder and decoder. The encoder focuses on enhancing feature representation capabilities, while the decoder is dedicated to efficient structural reconstruction and detail restoration, thereby achieving a good balance between model performance and computational efficiency. In the encoding path, SMFR-Net stacks multiple SMEBlocks to progressively extract features, with a bottleneck module placed in the middle of the backbone to integrate high-level semantic information. The decoding path, in turn, uses multiple SMDBlocks to gradually restore the spatial structure. Skip connections are introduced to fuse shallow and deep features, effectively mitigating the problems of gradient vanishing and information degradation. Finally, the output features are spatially upsampled via PixelShuffle operations to restore the original image resolution.
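To make this layout concrete, the following PyTorch-style skeleton assembles an encoder-decoder with skip connections, a bottleneck, PixelShuffle upsampling, and a global residual connection. It is a minimal sketch: the placeholder `ConvBlock`, the strided-convolution downsampling, the channel widths, and the additive form of the global residual are assumptions for illustration rather than the exact SMFR-Net implementation (the actual SMEBlock and SMDBlock are detailed in the following subsections).

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Placeholder residual block standing in for SMEBlock/SMDBlock."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.GELU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)


class EncoderDecoderSkeleton(nn.Module):
    """Encoder-decoder with skips, a bottleneck, PixelShuffle upsampling,
    and a global residual back to the input (all layout details assumed)."""
    def __init__(self, width=64, enc_blocks=(1, 2, 3), dec_blocks=(3, 1, 1)):
        super().__init__()
        self.intro = nn.Conv2d(3, width, 3, padding=1)
        self.encoders, self.downs = nn.ModuleList(), nn.ModuleList()
        ch = width
        for n in enc_blocks:
            self.encoders.append(nn.Sequential(*[ConvBlock(ch) for _ in range(n)]))
            self.downs.append(nn.Conv2d(ch, ch * 2, 2, stride=2))  # assumed downsampling
            ch *= 2
        self.middle = ConvBlock(ch)  # bottleneck integrating high-level semantics
        self.ups, self.decoders = nn.ModuleList(), nn.ModuleList()
        for n in dec_blocks:
            self.ups.append(nn.Sequential(nn.Conv2d(ch, ch * 2, 1), nn.PixelShuffle(2)))
            ch //= 2
            self.decoders.append(nn.Sequential(*[ConvBlock(ch) for _ in range(n)]))
        self.ending = nn.Conv2d(width, 3, 3, padding=1)

    def forward(self, x):
        inp = x
        x = self.intro(x)
        skips = []
        for enc, down in zip(self.encoders, self.downs):
            x = enc(x)
            skips.append(x)
            x = down(x)
        x = self.middle(x)
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            x = up(x) + skip           # skip connection fusing shallow and deep features
            x = dec(x)
        return inp + self.ending(x)    # global residual learning


if __name__ == "__main__":
    net = EncoderDecoderSkeleton()
    print(net(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 3, 256, 256])
```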
SMEBlock
This section mainly introduces the SMEBlock, the core unit in the encoding path of SMFR-Net (as shown in Fig. 3(a)). The SMEBlock is composed of three modules: FDM, MGDC, and CSAM. It adopts a two-stage design: first, it performs a preliminary extraction of global features in the frequency domain through the FDM module; then, it further extracts global features and local details in the spatial domain by combining the MGDC and CSAM modules.
The structure of core components in SMFR-Net. (a) The Simple Multi-domain Encoder Block (SMEBlock). (b) The Simple Multi-scale Decoder Block (SMDBlock). (c) The Frequency Domain Modulation Module (FDM). (d) The Simple Gate Module (SG). (e) The Multi-scale Grouped Dilated Convolution Module (MGDC). (f) The Channel-Spatial Attention Module (CSAM).
Frequency Domain Modulation module (FDM)
The Fast Fourier Transform (FFT) maps an image from the spatial domain to the frequency domain, such that each frequency component contains the image’s global information. Therefore, FFT naturally possesses an infinite theoretical receptive field, which gives it great potential for modeling long-range dependencies and global artifacts (such as large-scale flare). To fully leverage the advantage of global perception in the frequency domain while ensuring model simplicity, we propose a Frequency Domain Modulation (FDM) module that operates entirely in the frequency domain and is applied directly to the input features. As shown in Fig. 3(c), this module aims to perform global modeling on the input features at a low computational cost. The core idea of FDM is to process only the magnitude spectrum, which primarily encodes content and contrast information, while keeping the phase spectrum unchanged to preserve the crucial structural and positional information. The module first applies a 2D Fast Fourier Transform to the input features \(X \in \mathbb {R}^{B \times C \times H \times W}\) to obtain their complex frequency-domain representation:
where M represents the magnitude spectrum, and \(\Phi\) is the phase spectrum. Subsequently, we perform adaptive channel re-weighting on the magnitude spectrum M through a simple channel attention mechanism to focus on feature channels with more abundant information. The enhanced magnitude spectrum \(\hat{M}\) is calculated as follows:
where GAP(\(\cdot\)) represents global average pooling, \(\text {Conv}_{1\times 1}(\cdot )\) refers to a \(1 \times 1\) convolution, \(\odot\) denotes element-wise multiplication, and \(\sigma (\cdot )\) represents the Sigmoid function. The enhanced magnitude spectrum \(\hat{M}\) is then passed into a lightweight MLP composed of two \(1 \times 1\) convolutions and a LeakyReLU activation function \(\delta\) to extract deeper features:
In order to adaptively adjust the response based on frequency position, we designed the Frequency Distance Adjustment Mechanism (FDAM). This mechanism first defines a fixed, normalized frequency distance map \(D_{freq}\), which represents the distance from each frequency point (u, v) to the spectral center (DC component). Then, a very lightweight convolutional network \(f_{\theta }\) is used to learn from this distance map, generating a learnable modulation weight map \(W_{freq} = f_{\theta }(D_{freq})\). The final magnitude spectrum is fine-tuned via the following equation:
where the hyperparameter \(\gamma\) (set to 0.1 in this paper) controls the adjustment strength. Finally, we reconstruct the complex spectrum with the modulated magnitude \(M_{\text {out}}\) and the original phase \(\Phi\), and then transform it back to the spatial domain via the inverse Fourier Transform \(\mathscr {F}^{-1}\) to obtain the frequency-enhanced features \(x_{\text {freq}}\):
This feature is then passed through a learnable gating fusion mechanism to be combined with the original input x, yielding the final output of the FDM, \(x_{\text {out}}\):
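To make the data flow explicit, a minimal PyTorch sketch of the FDM is given below. Only the overall pipeline (FFT, magnitude-only processing with channel attention and a 1×1-convolution MLP, the distance-based FDAM adjustment, phase-preserving inverse FFT, and gated fusion) follows the description above; the layer widths of \(f_{\theta }\), the concrete modulation rule \(M_{\text {out}} = \hat{M}\,(1 + \gamma W_{\text {freq}})\), and the sigmoid-gated fusion are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class FDM(nn.Module):
    """Sketch of Frequency Domain Modulation: FFT -> modulate the magnitude
    spectrum (channel attention + 1x1-conv MLP + frequency-distance adjustment)
    -> recombine with the untouched phase -> inverse FFT -> gated fusion."""
    def __init__(self, channels, gamma=0.1):
        super().__init__()
        self.gamma = gamma
        # channel attention on the magnitude spectrum
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        # lightweight MLP: two 1x1 convolutions with LeakyReLU in between
        self.mlp = nn.Sequential(nn.Conv2d(channels, channels, 1),
                                 nn.LeakyReLU(0.1, inplace=True),
                                 nn.Conv2d(channels, channels, 1))
        # f_theta of the Frequency Distance Adjustment Mechanism (assumed width)
        self.f_theta = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1),
                                     nn.LeakyReLU(0.1, inplace=True),
                                     nn.Conv2d(8, 1, 3, padding=1))
        # learnable gate balancing frequency-enhanced and original features (assumed form)
        self.gate = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        B, C, H, W = x.shape
        spec = torch.fft.fft2(x, norm="ortho")           # complex spectrum
        mag, phase = torch.abs(spec), torch.angle(spec)

        mag_hat = mag * self.ca(mag)                     # channel re-weighting
        mag_hat = self.mlp(mag_hat)

        # normalized distance of every frequency point (u, v) to the DC component
        u = torch.fft.fftfreq(H, device=x.device).view(1, 1, H, 1)
        v = torch.fft.fftfreq(W, device=x.device).view(1, 1, 1, W)
        d_freq = torch.sqrt(u ** 2 + v ** 2)
        d_freq = (d_freq / d_freq.max()).expand(B, 1, H, W)
        w_freq = self.f_theta(d_freq)
        mag_out = mag_hat * (1.0 + self.gamma * w_freq)  # distance-aware fine-tuning

        spec_out = torch.polar(mag_out, phase)           # recombine with original phase
        x_freq = torch.fft.ifft2(spec_out, norm="ortho").real

        g = torch.sigmoid(self.gate)
        return g * x_freq + (1.0 - g) * x                # gated fusion with the input


if __name__ == "__main__":
    print(FDM(32)(torch.randn(2, 32, 64, 64)).shape)
```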
Multi-scale Grouped Dilated Convolution (MGDC)
After processing through the FDM, this paper further designs a multi-scale grouped dilated convolution module to simultaneously expand the receptive field and enhance the perception of local fine-grained details. This module is constructed based on dilated convolution24 and achieves multi-scale structural information extraction by setting different dilation rates for parallel convolutional branches. The overall structure is shown in Fig. 3(e). The MGDC module first introduces a learnable channel-wise bias term \(\beta \in \mathbb {R}^{C}\) to the features \(X \in \mathbb {R}^{B \times C \times H \times W}\) processed by the FDM, obtaining the shifted features \(\tilde{X} = X + \beta\). Subsequently, \(\tilde{X}\) is simultaneously sent into three parallel sets of \(3 \times 3\) grouped dilated convolutions. The dilation rates for these convolutions are set to 1, 3, and 5, respectively, aiming to capture multi-scale structural information from local details to a larger context. To effectively control computational complexity, we set the number of groups for each convolution to a fixed value \(g=4\), which means each convolutional kernel only operates on its assigned channel subset, thereby reducing computational overhead. The outputs of the convolutions are then processed by the RPRELU25 activation function, fused, and finally passed into a Layer Normalization layer to further enhance feature stability and network convergence speed. The overall expression for this module can be simplified as:
where \(d_i \in \{1, 3, 5\}\) represents the dilation rate of different branches, and \(\text {Conv}_{3\times 3}^{(d_i, g)}\) denotes the convolution operation with dilation rate \(d_i\) and number of groups g. After the MGDC processing, we use SimpleGate (SG)26 to replace the activation function.
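A compact sketch of MGDC under stated assumptions is shown below: the three branches are fused by summation, `GroupNorm(1, C)` stands in for 2D Layer Normalization, and RPReLU is re-implemented minimally following ReActNet25.

```python
import torch
import torch.nn as nn


class RPReLU(nn.Module):
    """Minimal RPReLU: channel-wise learnable shifts around a PReLU."""
    def __init__(self, channels):
        super().__init__()
        self.shift_in = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.shift_out = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.prelu = nn.PReLU(channels)

    def forward(self, x):
        return self.prelu(x - self.shift_in) + self.shift_out


class MGDC(nn.Module):
    """Sketch of Multi-scale Grouped Dilated Convolution: a learnable channel
    bias, three parallel 3x3 grouped convolutions with dilation 1/3/5 (groups=4),
    RPReLU, summation fusion (assumed), and a LayerNorm stand-in."""
    def __init__(self, channels, dilations=(1, 3, 5), groups=4):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))  # channel-wise bias
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d, groups=groups)
            for d in dilations
        ])
        self.acts = nn.ModuleList([RPReLU(channels) for _ in dilations])
        self.norm = nn.GroupNorm(1, channels)  # normalizes over (C, H, W) per sample

    def forward(self, x):
        x = x + self.beta
        out = sum(act(branch(x)) for branch, act in zip(self.branches, self.acts))
        return self.norm(out)


if __name__ == "__main__":
    print(MGDC(64)(torch.randn(1, 64, 64, 64)).shape)
```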
Channel-Spatial Attention Module (CSAM)
To enhance the model’s perceptual capability for flare regions, this paper proposes a lightweight Channel-Spatial Attention Module (CSAM). As shown in Fig. 3(f), this module combines the lightweight design of SCA26 with the spatial modeling capability of CBAM27, employing parallel channel and spatial attention branches. This design ensures computational efficiency while effectively enhancing the modeling ability for flare regions, thereby improving the model’s adaptability to complex lighting scenes. The channel attention branch is based on the SCA structure, introducing a \(1 \times 1\) convolution and a Sigmoid gating mechanism, which makes the response weight of each channel learnable, expressed as:
The spatial attention branch borrows from the design of CBAM27. It employs channel-wise average pooling and max pooling for feature compression. The concatenated results are then passed through a \(3 \times 3\) convolution to extract spatial saliency for generating the spatial attention map:
Finally, the two attention maps are applied to the input feature map through element-wise weighting, yielding the fused output:
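The sketch below illustrates CSAM; since the text specifies only element-wise weighting, the concrete fusion used here (multiplying both attention maps onto the input) is an assumption.

```python
import torch
import torch.nn as nn


class CSAM(nn.Module):
    """Sketch of the Channel-Spatial Attention Module: an SCA-style channel
    branch (GAP -> 1x1 conv -> Sigmoid) in parallel with a CBAM-style spatial
    branch (channel-wise avg/max pooling -> concat -> 3x3 conv -> Sigmoid)."""
    def __init__(self, channels):
        super().__init__()
        self.channel_att = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                         nn.Conv2d(channels, channels, 1),
                                         nn.Sigmoid())
        self.spatial_att = nn.Sequential(nn.Conv2d(2, 1, 3, padding=1),
                                         nn.Sigmoid())

    def forward(self, x):
        ca = self.channel_att(x)                          # B x C x 1 x 1
        avg_map = torch.mean(x, dim=1, keepdim=True)      # B x 1 x H x W
        max_map, _ = torch.max(x, dim=1, keepdim=True)
        sa = self.spatial_att(torch.cat([avg_map, max_map], dim=1))
        return x * ca * sa                                # element-wise weighting


if __name__ == "__main__":
    print(CSAM(64)(torch.randn(1, 64, 32, 32)).shape)
```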
SMDBlock
Compared to the encoder, which focuses on expanding the receptive field, our decoder design places greater emphasis on feature reconstruction and detail restoration, while also striving for simplicity and efficiency. To this end, we drew inspiration from the block design in NAFNet26. This design deconstructs complex processing modules into two more concise core components: a global attention module and a feed-forward network (FFN). Its computation process can be represented by the following equations:
where x is the input feature and \(z_2\) is the output feature. On this basis, we further incorporate functionally specific enhancement modules to improve the model's performance and adaptability. As shown in Fig. 3(b), the input feature \(X \in \mathbb {R}^{B \times C \times H \times W}\) first undergoes normalization and then enters the global branch pathway. Initially, a \(1 \times 1\) convolution is applied to expand the channel dimension, followed by the MGDC module to achieve multi-scale global contextual modeling. The output is then passed through the SG module26 and combined with the SCA for channel re-weighting, before a final \(1 \times 1\) convolution restores the original dimension, forming the first residual branch. The FFN path adopts the intermediate features after normalization and sequentially applies a \(1 \times 1\) convolution, the SG module, and another \(1 \times 1\) convolution to construct the pre-FFN path, forming the second residual branch. These two residual branches are scaled by learnable factors \(\beta\) and \(\lambda\), respectively, and then added to the input feature X as residual connections to obtain the final output Y. The computation process of the decoder module is formulated as:
Here, \(F_{\text {global}}(\cdot )\) and \(F_{\text {ffn}}(\cdot )\) represent the mapping functions of the global and FFN branches, respectively, defined as:
where \(\text {LN}(\cdot )\) denotes 2D Layer Normalization and \(\text {Conv}_{1\times 1}(\cdot )\) denotes a \(1 \times 1\) convolution. The learnable scaling factors \(\beta\) and \(\lambda\) are used to balance the contribution weights of the global and FFN branches in the final output.
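A self-contained sketch of the SMDBlock is given below. It follows the two-branch layout described above, but uses a compact stand-in for MGDC, `GroupNorm(1, C)` as the 2D Layer Normalization, and an assumed channel-expansion ratio of 2; these details may differ from the actual implementation.

```python
import torch
import torch.nn as nn


class SimpleGate(nn.Module):
    """SimpleGate from NAFNet: split along channels and multiply the halves."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b


class SMDBlock(nn.Module):
    """Sketch of the decoder block: a global branch (1x1 expand -> multi-scale
    grouped dilated convolutions standing in for MGDC -> SimpleGate -> SCA ->
    1x1 restore) plus an FFN branch, each scaled by a learnable factor."""
    def __init__(self, channels, expand=2):
        super().__init__()
        hidden = channels * expand
        self.norm1 = nn.GroupNorm(1, channels)           # 2D LayerNorm stand-in
        self.expand = nn.Conv2d(channels, hidden, 1)
        self.mgdc = nn.ModuleList([                      # compact MGDC stand-in
            nn.Conv2d(hidden, hidden, 3, padding=d, dilation=d, groups=4)
            for d in (1, 3, 5)])
        self.sg1 = SimpleGate()
        self.sca = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                 nn.Conv2d(hidden // 2, hidden // 2, 1))
        self.restore = nn.Conv2d(hidden // 2, channels, 1)

        self.norm2 = nn.GroupNorm(1, channels)
        self.ffn_in = nn.Conv2d(channels, hidden, 1)
        self.sg2 = SimpleGate()
        self.ffn_out = nn.Conv2d(hidden // 2, channels, 1)

        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.lam = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        y = self.expand(self.norm1(x))
        y = sum(conv(y) for conv in self.mgdc)           # multi-scale context
        y = self.sg1(y)
        y = self.restore(y * self.sca(y))                # SCA channel re-weighting
        x = x + y * self.beta                            # first residual branch

        z = self.ffn_out(self.sg2(self.ffn_in(self.norm2(x))))
        return x + z * self.lam                          # second residual branch


if __name__ == "__main__":
    print(SMDBlock(64)(torch.randn(1, 64, 32, 32)).shape)
```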
Loss function design
To enhance the model’s reconstruction capability in regions affected by strong light interference and improve overall perceptual quality, we designed a structure-aware composite loss function for the training phase. Specifically, this loss function consists of three components: L1 loss, perceptual loss23, and multi-scale structural similarity loss (MS-SSIM)28. The final loss is defined as follows:
where \(\lambda _1=0.5\), \(\lambda _2=0.5\), and \(\lambda _3=0.2\) are the weighting coefficients for each respective loss term.
We employ the Mean Absolute Error (MAE) as the basic pixel-wise loss, encouraging the network to perform precise reconstruction in the pixel space. It is defined as:
where \(\hat{I}\) and I denote the predicted image and the corresponding ground truth (GT) image, respectively. To enhance the semantic consistency and subjective visual quality of the flare-removed results, we introduce a perceptual loss based on VGG19. This loss extracts features from multiple layers of the predicted and ground truth images, and computes the L1 distance in the feature space:
where \(\phi _l(\cdot )\) denotes the feature map from the l-th layer of the VGG network, and \(w_l\) is the weight for this specific feature layer. In this work, the 2nd, 7th, 12th, 21st, and 30th layers of VGG19 are selected as perceptual layers.
To further enhance texture and structure restoration, the MS-SSIM loss is introduced. It measures the structural similarity between images across multiple scales:
where an MS-SSIM value closer to 1 indicates a higher structural similarity. We use SSIM computed at five scales with weighted averaging, where the weights are set to [0.0448, 0.2856, 0.3001, 0.2363, 0.1333].
The composite loss described above collaboratively guides the training process, significantly improving the flare removal performance. Experimental results demonstrate that it outperforms single-loss training strategies.
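For clarity, a sketch of this composite objective is given below, assuming equal per-layer weights \(w_l\), the common \(1 - \text {MS-SSIM}\) formulation for the structural term, and the third-party `pytorch_msssim` package; ImageNet normalization before VGG19 feature extraction is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from pytorch_msssim import ms_ssim  # assumed third-party package for MS-SSIM


class CompositeLoss(nn.Module):
    """Sketch of the structure-aware composite loss: L1 + VGG19 perceptual loss
    (features from layers 2, 7, 12, 21, 30) + MS-SSIM, weighted 0.5/0.5/0.2."""
    def __init__(self, l1_w=0.5, perc_w=0.5, msssim_w=0.2,
                 vgg_layers=(2, 7, 12, 21, 30)):
        super().__init__()
        vgg = torchvision.models.vgg19(
            weights=torchvision.models.VGG19_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.vgg_layers = set(vgg_layers)
        self.l1_w, self.perc_w, self.msssim_w = l1_w, perc_w, msssim_w
        self.msssim_weights = [0.0448, 0.2856, 0.3001, 0.2363, 0.1333]

    def _vgg_feats(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.vgg_layers:
                feats.append(x)
            if i >= max(self.vgg_layers):
                break
        return feats

    def forward(self, pred, gt):
        l1 = F.l1_loss(pred, gt)
        perc = sum(F.l1_loss(fp, fg)                      # equal w_l assumed
                   for fp, fg in zip(self._vgg_feats(pred), self._vgg_feats(gt)))
        msssim = 1.0 - ms_ssim(pred, gt, data_range=1.0,
                               weights=self.msssim_weights)
        return self.l1_w * l1 + self.perc_w * perc + self.msssim_w * msssim


if __name__ == "__main__":
    loss_fn = CompositeLoss()
    x, y = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
    print(loss_fn(x, y).item())
```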
Experiments
Datasets
To train our flare removal model, we primarily used the Flare7K++13 dataset. This dataset consists of two parts, Flare7K and Flare-R: Flare7K contains 5,000 simulated scattering flare images and 2,000 simulated reflective flare images, while Flare-R supplements this with 962 real flare patterns. We utilize its dynamic synthesis pipeline to generate paired training samples by randomly selecting backgrounds from the 23,949 natural images in the Flickr24K29 dataset and sampling flare patterns and their corresponding light sources with equal probability from Flare7K and Flare-R. To enhance the model’s adaptability to real-world nighttime scenes, we also additionally introduced 600 real images from FlareReal60030. However, this portion of data serves only as a minor supplement, accounting for approximately 2.44% of the total training samples, which exceed 24,000. The vast majority of the training data still originates from Flare7K++.
Prior to training, we apply a series of complex data augmentation operations, with the detailed parameters shown in Table 1, to the images from the Flare7K13, Flare-R13, and FlareReal60030 datasets. The entire process strictly distinguishes the processing for base images and flare images: first, all base images and flare images undergo an initial random gamma correction (\(\gamma \sim U(1.8, 2.2)\)) and random flips (horizontal or vertical). Subsequently, to simulate diverse flare morphologies, only the flare images are subjected to a series of exclusive geometric and appearance transformations, including random rotation (\(0^{\circ }\) to \(360^{\circ }\)), translation (up to 50 pixels), scaling (0.8 to 1.1 times), shear (\(\pm 10^{\circ }\)), and Gaussian blur (\(\sigma \sim U(0.1, 3)\)), after which they are center-cropped to \(256 \times 256\). Meanwhile, only the base images are randomly cropped to the same size and are enhanced to simulate the physical characteristics of the sensor by adding Gaussian noise (\(\sigma \approx 0.01 \times \chi ^2(1)\)) and multiplying by a random gain (\(g \sim U(0.5, 1.2)\)). The processed flare component is added to the enhanced base image to generate the low-quality (LQ) input. Finally, both the LQ image and the enhanced base image, which serves as the ground truth (GT), are subjected to a reverse gamma correction, ultimately forming the training pair \(\langle \text {LQ, GT} \rangle\) normalized to the range [0, 1].
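The synthesis procedure can be summarized by the following sketch. The torchvision transform parameters only approximate Table 1 (for example, the translation fraction, blur kernel size, and flip handling are assumptions), so it should be read as an outline of the pipeline rather than the official Flare7K++ code.

```python
import random
import torch
from torchvision import transforms


def synthesize_pair(base, flare):
    """Sketch of paired-sample synthesis; `base` and `flare` are C x H x W
    tensors in [0, 1]. All constants are approximations of Table 1."""
    gamma = random.uniform(1.8, 2.2)
    base, flare = base ** gamma, flare ** gamma          # shared gamma correction
    if random.random() < 0.5:                            # random flips
        base = torch.flip(base, dims=[-1])
    if random.random() < 0.5:
        flare = torch.flip(flare, dims=[-2])

    # flare-only geometric and appearance transforms
    flare_tf = transforms.Compose([
        transforms.RandomAffine(degrees=(0, 360),
                                translate=(0.05, 0.05),  # approximates "up to 50 px"
                                scale=(0.8, 1.1), shear=(-10, 10)),
        transforms.GaussianBlur(kernel_size=21, sigma=(0.1, 3.0)),
        transforms.CenterCrop(256),
    ])
    flare = flare_tf(flare)

    # base-only crop, sensor noise, and gain
    base = transforms.RandomCrop(256)(base)
    sigma = 0.01 * torch.distributions.Chi2(1.0).sample()
    base = base + sigma * torch.randn_like(base)
    base = base * random.uniform(0.5, 1.2)

    lq = (base + flare).clamp(0, 1) ** (1.0 / gamma)     # add flare, reverse gamma
    gt = base.clamp(0, 1) ** (1.0 / gamma)
    return lq, gt


if __name__ == "__main__":
    lq, gt = synthesize_pair(torch.rand(3, 512, 512), torch.rand(3, 512, 512))
    print(lq.shape, gt.shape)
```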
During the testing phase, we evaluate our model on two standard test sets provided by Flare7K++13. The first is the Flare7K++ real test dataset, which contains 100 pairs of real nighttime images captured under diverse lighting conditions and flare patterns. The second is the Flare7K++ synthetic test dataset, which we use to further verify the model’s generalization ability on synthetically generated flare images. In addition to these public datasets, we captured our own real-world nighttime scenes using an iPhone 15 Pro and a Xiaomi 13 smartphone. This allows us to evaluate the robustness and practicality of our model on unlabeled images from real-world scenarios.
Training settings
The entire training process is conducted on a single NVIDIA TITAN RTX GPU with 24 GB of memory. Our model is an image restoration network based on an encoder-decoder architecture, where the encoder and decoder are composed of [1, 2, 3] and [3, 1, 1] residual blocks, respectively. We train the model using the AdamW optimizer with an initial learning rate of \(1 \times 10^{-3}\), a batch size of 8, and for a total of 100,000 steps. To enhance stability during the initial training phase, we employ a warm-up strategy for the first 5,000 steps, gradually increasing the learning rate. Subsequently, we use the MultiStepLR scheduler to decay the learning rate by a factor of 0.5 at three predefined milestones (30k, 60k, and 90k steps). This strategy helps mitigate training fluctuations and improves the final convergence accuracy.
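A minimal sketch of this optimization schedule is shown below; a single `LambdaLR` emulates the 5,000-step linear warm-up followed by the MultiStepLR decay, and the stand-in model and omitted loop body are placeholders.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # placeholder standing in for SMFR-Net
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)


def lr_lambda(step):
    """5k-step linear warm-up, then halve the LR at 30k / 60k / 90k steps."""
    warmup = min(1.0, (step + 1) / 5_000)
    decay = 0.5 ** sum(step >= m for m in (30_000, 60_000, 90_000))
    return warmup * decay


scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(100_000):
    # forward pass on a batch of 8 synthesized <LQ, GT> pairs, composite loss,
    # loss.backward(), and gradient zeroing would go here
    optimizer.step()
    scheduler.step()   # per-iteration schedule over 100k total steps
```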
Evaluation metrics
Most existing flare removal methods evaluate on images with a resolution of \(512 \times 512\). To ensure a fair and consistent comparison, all test images are uniformly cropped or resized to \(512 \times 512\) and normalized to the range [0, 1]. We adopt three widely-used metrics to evaluate image restoration quality: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM)31, and Learned Perceptual Image Patch Similarity (LPIPS)32. To more comprehensively assess the model’s performance in removing different types of flare components, we also introduce two local evaluation metrics proposed by Dai et al.13: S-PSNR and G-PSNR. These metrics independently evaluate the regions of strong glare and stripe diffusion, respectively. In addition to these quantitative metrics, we report the number of parameters and FLOPs for each model to assess their computational cost. We also conduct a qualitative analysis on real-world nighttime images to demonstrate the model’s ability to suppress complex lighting interference while preserving structural details.
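The sketch below shows how these metrics can be computed with common open-source tools (`scikit-image` for PSNR/SSIM and the `lpips` package); the masked PSNR used here as a stand-in for S-PSNR/G-PSNR is an assumption, since the official evaluation relies on the region masks released with Flare7K++13.

```python
import numpy as np
import torch
import lpips                          # official LPIPS package
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="alex")    # AlexNet backbone, as commonly used


def evaluate(pred, gt, mask=None):
    """PSNR / SSIM / LPIPS on H x W x 3 float images in [0, 1]; an optional
    region mask yields a masked PSNR as a stand-in for S-PSNR / G-PSNR."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_fn(to_t(pred), to_t(gt)).item()
    out = {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}
    if mask is not None:                          # region-restricted PSNR (assumption)
        mse = np.mean(((pred - gt) ** 2)[mask > 0])
        out["masked-PSNR"] = 10 * np.log10(1.0 / max(mse, 1e-12))
    return out


if __name__ == "__main__":
    pred = np.random.rand(512, 512, 3).astype(np.float32)
    gt = np.random.rand(512, 512, 3).astype(np.float32)
    print(evaluate(pred, gt))
```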
Results
To comprehensively verify the effectiveness and superiority of the proposed SMFR-Net in the flare removal task, we selected several representative existing methods for performance comparison. These methods cover traditional image enhancement techniques, direct glare removal methods, and various recently proposed end-to-end image restoration network architectures. Specifically, the comparison methods include: the glare removal method proposed by Wu10; the end-to-end restoration network proposed by Dai13; the nighttime lighting enhancement method proposed by Sharma33; several well-known image restoration networks trained on the Flare7K++ or Flare7K datasets, such as U-Net11, HINet36, MPRNet35, Restormer17, Uformer16, and NAFNet26; the Uformer+ND method based on depth estimation by Kotp and Torki19; as well as recent state-of-the-art methods SPDDNet37 and LPFSformer38, ensuring a comprehensive comparison. Detailed evaluation results can be found in Tables 2 and 4.
The results indicate that SMFR-Net exhibits leading performance on both the real and synthetic test sets of Flare7K++, significantly outperforming most mainstream methods. Compared to the current state-of-the-art methods, SMFR-Net improves PSNR, G-PSNR, and S-PSNR by 0.114 dB, 0.048 dB, and 0.065 dB, respectively, on the real test set. It is worth noting that most models in Table 2 were trained only on the Flare7K++ dataset, whereas SMFR-Net incorporates both Flare7K++ and FlareReal600 to leverage additional diversity. To isolate the impact of data differences and objectively validate the effectiveness of the model architecture itself, we conducted a fair comparison by training a version of SMFR-Net using only the Flare7K++ dataset. As shown in Table 3, while the absence of diversity from FlareReal600 resulted in a slight decrease in some metrics compared to the dual-dataset training, this version of SMFR-Net still significantly outperforms most mainstream methods.
On the synthetic test set, SMFR-Net also demonstrates a clear advantage, achieving a PSNR of 30.276, which surpasses the Uformer and Kotp methods by 0.778 dB and 0.703 dB, respectively. Furthermore, the SSIM score increases to 0.966, while the LPIPS value decreases to 0.0177. These results strongly indicate that SMFR-Net is more effective at restoring image details in regions affected by strong glare and streak diffusion.
At the same time, SMFR-Net demonstrates excellent computational efficiency. The model only requires 7.981M parameters and 103.888G FLOPs, maintaining superior performance while significantly reducing computational costs compared to high-complexity networks such as Kotp (129.306M / 271.419G) and HINet (88.674M / 685.127G). It is worth noting that, due to the high computational cost of the original MPRNet and Restormer models, this study adopted their lightweight versions for comparison. We also propose a lightweight model, SMFR-Net-Light (SMFR-Net-L), whose network width is 32, with parameters and FLOPs of 2.152M and 31.228G, respectively. Although the performance of this model is slightly lower than the full version of SMFR-Net, it still significantly outperforms other comparison methods.
In qualitative analysis, SMFR-Net also demonstrates clear and stable visual results on the Flare7K++ real test images (Fig. 4) and the self-collected validation set (Fig. 5), particularly excelling in restoring details in large-scale glare regions, further confirming its practical application value and structural preservation capability.
Visual comparison of flare removal results by different methods on the Flare7K++ real test dataset. Red boxes highlight significant differences in artifact suppression among the methods. Under challenging conditions with multiple strong light sources, most existing methods struggle to effectively remove large-area flare artifacts. In contrast, SMFR-Net significantly suppresses glare interference and restores clear structural details, demonstrating superior capability in large-receptive-field modeling and image restoration.
Visual comparison of flare removal results on real-world images captured by Xiaomi 13 and iPhone 15 Pro. SMFR-Net is compared with NAFNet, Uformer, and Kotp et al.19 under various challenging lighting conditions. Red boxes highlight regions with significant visual differences, where SMFR-Net achieves the best glare removal performance.
Ablation study
In addition to validating the overall performance of the backbone model, we conducted a series of ablation experiments to explore the impact of individual components and training strategies. The first group of experiments mainly focuses on evaluating the effectiveness of the designed modules, specifically MGDC, FDM, CSAM, and the structure-aware composite loss function. To achieve this, we systematically removed each individual module or loss term, built corresponding control models, and compared their performance with the full SMFR-Net on the Flare7K++ real test dataset. The results are shown in Fig. 6 and Table 5.
Among all the compared models, the complete SMFR-Net achieves the best results on all metrics, fully demonstrating the synergistic effect of frequency-domain modeling, dilated convolutions, and attention mechanisms in improving glare-region modeling and image structure restoration. Furthermore, the structure-aware composite loss function significantly enhances perceptual consistency and subjective visual quality.
The second set of experiments aims to explore the impact of different encoder-decoder combinations on model performance and complexity. We construct multiple combinations using SMEBlock, NAFBlock, and SMDBlock, and compare them with the final model architecture (SMFR-Net). To improve training efficiency, the channel number was reduced from 64 to 16 during experiments, so the overall performance is slightly lower than the full configuration. As shown in Table 6, SMFR-Net (ours), which adopts the SMEBlock + SMDBlock combination under the full configuration, achieves the best performance, outperforming all other combinations across multiple key metrics, while maintaining a good balance between performance and efficiency with 7.981M parameters and 103.888G FLOPs.
In comparison, SMEBlock + NAFBlock exhibits a slight advantage in G-PSNR (24.279) but falls short of SMFR-Net in PSNR, LPIPS, and other subjective and objective metrics, while the all-NAFBlock configuration, despite having the lowest parameter count (6.439M) and computation (92.735G), shows a significant drop in performance. In summary, the combination of SMEBlock and SMDBlock can more effectively model image structures and restore details under strong light interference, achieving an ideal balance between performance and complexity, and validating its rationality and superiority as the final backbone architecture.
In the third group of experiments, we compare different combinations of loss functions to evaluate their impact on model performance. Specifically, we conducted a systematic ablation study to analyze each loss component and its corresponding weighting coefficient. As shown in Table 7, the experimental results illustrate our step-wise process to determine the final configuration. We begin with a baseline model using only L1 loss (\(\lambda _1=1.0\)). Upon introducing the perceptual loss, we drew from successful practices in the image deglaring field for balancing pixel-level fidelity with perceptual quality, such as in Flare7K++13, and adopted a balanced weighting of \(\lambda _1=0.5\) and \(\lambda _2=0.5\). As the data in the table shows, this combination improves PSNR while reducing LPIPS from 0.0430 to 0.0401. The next step is to introduce and fine-tune the MS-SSIM loss. The experiment shows that assigning a high weight (\(\lambda _3=0.5\)) leads to a decrease in all metrics (e.g., PSNR drops to 28.105 and SSIM to 0.894). Therefore, by reducing its weight to \(\lambda _3=0.2\), the model achieves the best values across all evaluation metrics, with a PSNR of 28.352 and an SSIM of 0.907. This study indicates that our final weighting coefficients (\(\lambda _1=0.5, \lambda _2=0.5, \lambda _3=0.2\)) are a well-justified combination of established practices and empirical fine-tuning.
Furthermore, we conducted an ablation study on the CSAM module to evaluate the role of its spatial attention (SA) mechanism in the deglaring task. As shown in Table 8, the results reveal a noteworthy phenomenon: on the Flare7K++ real test set, while removing the spatial attention branch led to a slight increase in PSNR (from 28.352 to 28.463), the perceptually-oriented metrics, such as LPIPS, G-PSNR, and S-PSNR, all exhibited a significant decline. We attribute this seemingly contradictory result to a trade-off where the model sacrifices structural details for a lower pixel-level mean squared error. For instance, G-PSNR decreased from 24.841 to 24.753, indicating that the model's global modeling capability for handling strong light interference was significantly compromised. On the Flare7K++ synthetic test dataset, this trend becomes even more pronounced: the model with spatial attention preserved outperforms the version without it in multiple metrics, including PSNR (30.276), SSIM (0.966), LPIPS (0.0177), and G-PSNR (25.561), further confirming the critical role of the spatial attention mechanism in structural perception and detail restoration.
Additional analyses
To validate the applicability and effectiveness of the proposed glare removal method in real-world visual tasks, we conducted experimental evaluations on two representative tasks: semantic segmentation and object detection.
For semantic segmentation, we employed the Segment Anything Model (SAM) proposed by Meta AI5. This model possesses zero-shot segmentation capability, enabling high-quality segmentation without fine-tuning, and is suitable for various visual scenarios. As shown in Fig. 7, we input the original image, the image processed by SMFR-Net, and the glare-free ground truth (GT) image into the SAM model to generate the corresponding semantic segmentation results. The results show that the strong glare region in the original image was misidentified as a semantic object, resulting in segmentation errors; whereas after SMFR-Net processing, the glare regions were effectively removed, image structural boundaries became clearer, and semantic partitioning was more accurate and closely aligned with the GT. This verifies the effectiveness of our method in restoring true semantic structures.
For object detection, we selected the medium-scale variant YOLOv11m of the YOLOv11 model39 for inference evaluation. This model achieves relatively high detection accuracy while maintaining good inference speed, making it suitable for multi-object detection tasks in complex nighttime environments. As shown in Fig. 8, strong light interference in the original image significantly affects the model’s perception ability, leading to missed detections or low confidence scores. For example, in the Input4 scene, due to glare occlusion, the motorcycle was detected as “motorcycle” with a confidence score of only 0.56. In contrast, in the image processed by SMFR-Net, the glare interference was effectively suppressed, the confidence of “motorcycle” increased to 0.76, and the detection bounding box aligned better with the object edges.
To further quantify the changes in detection performance, we collected confidence differences of eight targets across four scenes before and after processing. The results indicate that after SMFR-Net processing, target confidence scores generally improved, with an average increase of 0.089, further validating the applicability and effectiveness of our method in real-world visual tasks.
Limitation
Although the SMFR-Net proposed in this paper demonstrates effectiveness across various scenarios, as a model based on supervised learning, its performance is still limited in certain extreme cases. When the scale and intensity of the flare cause the underlying texture and structural information in vast regions of an image to be completely occluded, the model’s restoration capability is affected. In situations of complete information loss, a supervised learning model struggles to reconstruct complex, scene-consistent details due to the lack of effective input cues, and its output tends to converge towards blurry or overly-smooth results. To address this challenge, a research direction worth exploring is the combination of the efficient SMFR-Net architecture with generative models. By leveraging the prior knowledge of generative models, it is expected to enable plausible generative inpainting for regions with complete information loss, thereby enhancing the model’s restoration capabilities in extreme degradation scenarios.
Conclusion
This paper proposes a multi-domain flare removal network–SMFR-Net, aiming to provide a simple and efficient solution. By designing an encoder for multi-domain modeling and a decoder for efficient restoration, SMFR-Net expands the receptive field while maintaining a low computational cost. SMFR-Net contains only 7.981M parameters and achieves significant removal effects in various scenarios, demonstrating its advantages in performance. Furthermore, this paper also proposes a lightweight version of SMFR-Net–SMFR-Net-L, containing only 2.152M parameters, which effectively reduces the computational burden and is more suitable for resource-constrained devices. Experimental results show that although the performance of the lightweight version is slightly lower than the full version, SMFR-Net-L still exhibits good results in the flare removal task and surpasses most existing methods on both the Flare7K++ test set and in real-world application scenarios.
Data availability
The datasets used in this study are publicly available. Flare7K++ is available at https://github.com/ykdai/Flare7K under the S-Lab License 1.0. FlareReal600 is available at https://github.com/Zdafeng/FlareReal600.
Change history
10 January 2026
The original online version of this Article was revised: The Funding section in the original version of this Article was incorrect. The Funding section now reads: “This work was supported by the Graduate Student Innovation Program of Chongqing University of Science and Technology (grant no. YKJCX2421505) and the Chongqing Technology Innovation and Application Development Project (grant no. 2023TIAD-GPX0007).”
References
Dai, Y., Li, C., Zhou, S., Feng, R. & Loy, C. C. Flare7k: A phenomenological nighttime flare removal dataset. Adv. Neural Inf. Process. Syst. 35, 3926–3937 (2022).
Thoma, M. A survey of semantic segmentation. arXiv preprint arXiv:1602.06541 (2016).
Zou, Z., Chen, K., Shi, Z., Guo, Y. & Ye, J. Object detection in 20 years: A survey. Proc. IEEE 111, 257–276 (2023).
Ranftl, R., Bochkovskiy, A. & Koltun, V. Vision transformers for dense prediction. arXiv preprint arXiv:2103.13413 (2021).
Kirillov, A. et al. Segment anything. In Proceedings of the IEEE/CVF international conference on computer vision, 4015–4026 (2023).
Chabert, F. Automated lens flare removal. Technical report, Department of Electrical Engineering, Stanford University (2015).
Vitoria, P. & Ballester, C. Automatic flare spot artifact detection and removal in photographs. J. Math. Imaging Vis. 61, 515–533 (2019).
Asha, C., Bhat, S. K., Nayak, D. & Bhat, C. Auto removal of bright spot from images captured against flashing light source. In 2019 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), 1–6 (IEEE, 2019).
Seibert, J. A., Nalcioglu, O. & Roeck, W. Removal of image intensifier veiling glare by mathematical deconvolution techniques. Med. Phys. 12, 281–288 (1985).
Wu, Y. et al. How to train neural networks for flare removal. In Proceedings of the IEEE/CVF international conference on computer vision, 2239–2247 (2021).
Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, 234–241 (Springer, 2015).
Qiao, X., Hancke, G. P. & Lau, R. W. Light source guided single-image flare removal from unpaired data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 4177–4185 (2021).
Dai, Y. et al. Flare7k++: Mixing synthetic and real datasets for nighttime flare removal and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 46, 7041–7055 (2024).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).
Liu, Z. et al. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, 10012–10022 (2021).
Wang, Z. et al. Uformer: A general u-shaped transformer for image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 17683–17693 (2022).
Zamir, S. W. et al. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 5728–5739 (2022).
Zhang, D. et al. Ff-former: Swin fourier transformer for nighttime flare removal. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2824–2832 (2023).
Kotp, Y. & Torki, M. Flare-free vision: Empowering uformer with depth insights. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2565–2569 (IEEE, 2024).
Huang, Y. et al. Diffusion model-based image editing: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 47(6), 4409–4437 (2025).
Huang, Y. et al. Wavedm: Wavelet-based diffusion models for image restoration. IEEE Trans. Multimed. 26, 7058–7073 (2024).
Liu, Y., Huang, J. & Chen, S. Deseal: Semantic-aware seal2clear attention for document seal removal. IEEE Signal Process. Lett. 30, 1702–1706 (2023).
Johnson, J., Alahi, A. & Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision, 694–711 (Springer, 2016).
Yu, F. & Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015).
Liu, Z., Shen, Z., Savvides, M. & Cheng, K.-T. Reactnet: Towards precise binary neural network with generalized activation functions. In European conference on computer vision, 143–159 (Springer, 2020).
Chen, L., Chu, X., Zhang, X. & Sun, J. Simple baselines for image restoration. In European conference on computer vision, 17–33 (Springer, 2022).
Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV), 3–19 (2018).
Wang, Z., Simoncelli, E. P. & Bovik, A. C. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, vol. 2, 1398–1402 (IEEE, 2003).
Zhang, X., Ng, R. & Chen, Q. Single image reflection separation with perceptual losses. In Proceedings of the IEEE conference on computer vision and pattern recognition, 4786–4794 (2018).
Zhang, D. Flarereal600: Real-captured paired dataset for nighttime flare removal. https://github.com/Zdafeng/FlareReal600 (2024). Accessed: 2025-07-2.
Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
Zhang, R., Isola, P., Efros, A. A., Shechtman, E. & Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, 586–595 (2018).
Sharma, A. & Tan, R. T. Nighttime visibility enhancement by increasing the dynamic range and suppression of light effects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11977–11986 (2021).
Zhou, Y. et al. Improving lens flare removal with general-purpose pipeline and multiple light sources recovery. In Proceedings of the IEEE/CVF international conference on computer vision, 12969–12979 (2023).
Zamir, S. W. et al. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 14821–14831 (2021).
Chen, L., Lu, X., Zhang, J., Chu, X. & Chen, C. Hinet: Half instance normalization network for image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 182–192 (2021).
Qi, K., Wang, B. & Liu, Y. A self-prompt based dual-domain network for nighttime flare removal. Eng. Appl. Artif. Intell. 144, 110103 (2025).
Chen, G.-Y. et al. Lpfsformer: Location prior guided frequency and spatial interactive learning for nighttime flare removal. IEEE Trans. Circuits Syst. Video Technol. 35(4), 3706–3718 (2024).
Khanam, R. & Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725 (2024).
Funding
This work was supported by the Graduate Student Innovation Program of Chongqing University of Science and Technology (grant no. YKJCX2421505) and the Chongqing Technology Innovation and Application Development Project (grant no. 2023TIAD-GPX0007).
Author information
Authors and Affiliations
Contributions
Shaofeng Liu conceived the study, conducted the experiments, and contributed to writing. Guorong Chen supervised the project, identified issues, and revised the manuscript. Weijie Zhang assisted in writing. Qingru Zhang contributed to the evaluation and visualization. Jinmei Zhang and Jian Wang reviewed the manuscript. All authors approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, S., Chen, G., Zhang, W. et al. SMFR-Net: simple multi-domain flare removal network. Sci Rep 15, 37251 (2025). https://doi.org/10.1038/s41598-025-21378-8