Reparameterizable large kernel attention networks for infrared image super-resolution

Wei, Ran; Zuo, Linze; Wang, Xuesong; Wu, Xianyu

doi:10.1038/s41598-025-24193-3

Article
Open access
Published: 18 November 2025

Scientific Reports volume 15, Article number: 40612 (2025)

Abstract

To address the challenge of balancing reconstruction performance and inference speed in existing infrared image super-resolution algorithms, this paper introduces a novel Large Kernel Reparameterization Attention mechanism. Based on this, we propose the reparameterizable large kernel attention network for infrared image super-resolution. During training, a multi-branch large kernel network is employed to fully extract information, while at inference time, it is equivalently transformed into a single-branch large kernel network, achieving a trade-off between processing performance and inference speed. Compared to state-of-the-art methods, our approach improves the average PSNR on a self-constructed infrared dataset by 0.0008 dB. Additionally, on the RK3588 Neural Processing Unit, it requires only 37 ms to perform 4$\times$ super-resolution on 320$\times$180 images.

Introduction

Infrared imaging technology allows us to observe parts of the spectrum beyond visible light, which enables the conversion of invisible infrared radiation into visible images, expanding our perceptual range. However, infrared images obtained by detecting the heat radiated by objects often suffer from low resolution, insufficient contrast, and blurriness, posing challenges for research and applications. The low resolution affects clarity and detail representation, limiting their usefulness.

Improving imaging quality through hardware enhancements is costly in terms of industrial expenses and effort, and the ultimate performance improvements are constrained by insurmountable physical limitations. In contrast, infrared image super-resolution (SR) reconstruction technology offers a cost-effective solution by recovering high-resolution (HR) infrared images from low-resolution (LR) counterparts. This approach meets the practical demand for high-definition infrared images, opening up new possibilities for the application and dissemination of infrared imaging technology across various fields.

Infrared image super-resolution typically employs general image SR methods. Zhang et al.¹ were the first to use compressed sensing for SR image reconstruction. They downsampled the SR and high-resolution images to capture high-frequency noise information, which was then fed into Convolutional Neural Networks (CNNs) to learn nonlinear mappings. Experiments demonstrated that using CNNs to represent nonlinear mappings could enhance texture information in reconstructed infrared images. Kwasniewska et al.² developed a wide receptive field residual network using dense connections, confirming that a wide receptive field effectively enhances low-contrast infrared images. Yuan et al.³ proposed a gradient residual attention network based on CNNs, utilizing gradient operators to better extract features from infrared images. Zou et al.⁴ constructed an infrared image SR model similar to U-Net using residual networks, incorporating multi-receptive field modules to extract high-frequency and low-frequency features and achieving commendable results. Recent research has further explored the integration of physical models and deep learning: Zhang et al.⁵,⁶ significantly improved the spectral fidelity of image reconstruction by combining image degradation models with deep priors, offering new references for image restoration.

Although these models successfully reconstruct high-quality infrared images, they rely heavily on large datasets. In some fields, constructing datasets is challenging due to expensive equipment or a limited natural environment. While self-supervised learning⁷ and semi-supervised frameworks⁸ boost model robustness in data-scarce conditions, transferring knowledge from models pretrained on large datasets yields superior reconstruction performance in target domains. Consequently, transfer learning has been adopted by many methods across various image processing domains⁹,¹⁰,¹¹.

In the field of infrared image SR, Almasri et al.¹² used residual blocks to separately extract information from visible and infrared images, followed by fusion. Their experiments demonstrated that visible light images help improve the high-frequency details of infrared images. Huang et al.¹³ proposed PSRGAN (Progressive Super-Resolution Generative Adversarial Network), which leverages visible images and 100 pairs of infrared images to enhance the restoration performance of infrared images. The proposed PSRGAN achieved excellent infrared SR performance by fine-tuning a pretrained network using only 55 infrared images.

In practical applications, super-resolution algorithms often need to be deployed on resource-constrained devices. Due to their high computational complexity, existing algorithms struggle to balance performance and speed on such devices¹⁴,¹⁵. In recent years, model reparameterization has emerged as an effective network optimization strategy: it converts the complex modules of a trained model into simplified structures, significantly improving deployability in resource-constrained hardware environments. In the field of computer vision, the ACB¹⁶ (Asymmetric Convolution Block) technique integrates asymmetric convolution structures with standard convolution substrates, effectively enhancing the performance of convolutional neural networks. The RepVGG architecture, in turn, improves both accuracy and speed in image classification tasks through a stacking design of multi-layer reparameterized convolutions¹⁷. Building on reparameterization, lightweight super-resolution models have made significant progress. The FIMDN framework, proposed in AIM 2020, enhances the IMDN network with ACB, verifying the feasibility of improving super-resolution performance while maintaining inference efficiency¹⁸. In terms of feature extraction, DBB (Diverse Branch Block) adopts a multi-branch structure similar to Inception, effectively capturing diverse features during the training process. This design has been widely applied to various network architectures¹⁹. Inspired by RepVGG and DBB, the Edge-oriented Convolution Block (ECB), which targets mobile devices, balances computational efficiency and visual quality for real-time applications through a purpose-designed reparameterizable convolution²⁰. However, these reparameterization techniques still face challenges when applied to deep networks, mainly due to the high training complexity involved and the limitations imposed by the local receptive field.

To address the challenge of achieving a balance between performance and inference speed for infrared image SR algorithms on resource-constrained platforms, this paper proposes a large kernel reparameterization attention mechanism. Based on this, we introduce the reparameterizable large kernel attention network for infrared image super-resolution (REPLKASR). Inspired by RepVGG, the proposed method applies the reparameterization method to the large kernel convolution attention module. By employing a multi-branch large kernel network during training to fully extract infrared image features and equivalently transforming it into a single-branch large kernel network during inference, we achieve a trade-off between speed and performance. Unlike existing reparameterization techniques focusing on small kernel optimization (e.g., the 3$\times$3 convolution kernel in ECB/RepVGG), our method achieves fundamental expansion of the receptive field by fusing 5$\times$5 large convolution kernels, while maintaining the deployment efficiency of structural reparameterization. To reduce the dependence of model training on large-scale infrared datasets, this method first pretrains on visible light datasets to obtain basic feature representations, and then fine-tunes on infrared datasets to achieve cross-modal knowledge transfer. On the RK3588 Neural Processing Unit (NPU), our approach can perform 4$\times$ super-resolution on 320$\times$180 images in just 37 ms, meeting the requirements for real-time super-resolution of infrared images.

The main contributions of this paper are threefold:

1. This study innovatively designs a large kernel reparameterization unit (REPLKA), which extends existing reparameterization techniques (e.g., ECB). By expanding the convolutional kernel parameters from the standard 3$\times$3 to 5$\times$5, REPLKA dynamically integrates features using a multi-branch structure during training while converting to a single-branch architecture during inference. This approach not only effectively enhances static and dynamic feature extraction capabilities but also ensures deployment efficiency.
2. This study develops an infrared super-resolution network based on the attention mechanism using large kernel reparameterization. This network integrates the REPLKA module and ECB architecture to optimize infrared image reconstruction performance through multi-scale feature representation. Additionally, the training strategy incorporates knowledge transfer, effectively resolving the data scarcity problem.
3. To validate the feasibility of the proposed method, this study chooses the Rockchip RK3588 development board as the hardware deployment platform and performs comprehensive quantitative and qualitative evaluation of existing state-of-the-art methods on numerous benchmark datasets. The experimental results demonstrate that, in comparison to conventional image super-resolution approaches, REPLKASR attains superior peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) performance while employing fewer parameters.

The subsequent sections of this paper are structured as follows: Section 2 details the design principles of the REPLKA module and the large-kernel convolution reparameterization mechanism, while introducing the overall architecture of the REPLKASR network; Section 3 systematically presents the experimental design, including the training strategy, benchmark dataset evaluations, and deployment outcomes on the RK3588 NPU platform; Section 4 provides an in-depth discussion of the method’s limitations, ablation study results, and potential future extensions; finally, Section 5 concludes with the core contributions.

Proposed method

Network architecture

The structure of the proposed REPLKASR network is illustrated in Fig. 1 and consists of three main components: a shallow feature extraction module, a deep feature extraction module based on cascaded ECB modules and REPLKA modules, and a high-quality image reconstruction module. The ECB and REPLKA modules will be introduced in Sections 2.3 and 2.4, respectively.

The shallow feature extraction module consists of a single ECB module. The input is the low-resolution image ${{{I}}_{\text {LR}}}\in {{R}^{\left( 3\times \text {H}\times \text {W} \right) }}$, where H and W represent the height and width of the LR image, respectively. The shallow feature extraction module, denoted $\text {ECB}\left( \cdot \right)$, is used to extract shallow features.

The process is represented as follows:

$$\begin{aligned} {{{F}}_{\text {p}}}=\text {ECB}({{{I}}_{\text {LR}}}). \end{aligned}$$

(1)

Next, the shallow features are passed onto the deep feature extraction module to obtain deeper and more abstract high-level features. This process can be described as follows:

$$\begin{aligned} {{{F}}_{\text {r}}}\text {=}{{{f}}_{\text {DF}}}\text {(}{{{F}}_{\text {p}}}\text {)}. \end{aligned}$$

(2)

Where ${{F}_{\text {r}}}$ represents the deep feature maps, ${{f}_{\text {DF}}}\left( \cdot \right)$ represents the deep feature extraction module, which includes multiple cascaded ECBs and a REPLKA module. Intermediate features ${{F}_{1}}$, ${{F}_{2}}, \ldots , {{F}_{n}}$ are progressively extracted. The specific process is illustrated as follows:

$$\begin{aligned} & {{F}_{i}}={{f}_{{\text {ECB}}_{i}}}\left( {{F}_{i-1}} \right) ,\quad i=1,2,\ldots ,n. \end{aligned}$$

(3)

$$\begin{aligned} & {{{F}}_{{n}}}=\text {REPLKA(}{{{F}}_{{n-1}}}{)}. \end{aligned}$$

(4)

Where ${f}_{{\text {ECB}}_{i}}(\cdot )$ represents the i-th ECB, $\text {REPLKA}(\cdot )$ denotes the REPLKA module, and n indicates the number of ECBs. Subsequently, ${{F}_{\text {r}}}$ and ${{F}_{\text {p}}}$ are fed into the image reconstruction module to complete the super-resolution reconstruction of the image. This process can be described as:

$$\begin{aligned} {{{I}}_{\text {SR}}}\text {=}{{{f}}_{\text {RC}}}\text {(}{{{F}}_{\text {p}}}\text {+}{{{F}}_{\text {r}}}\text {)}. \end{aligned}$$

(5)

Where ${{I}_{\text {SR}}}$ represents the reconstructed SR image, ${{f}_{\text {RC}}}$ denotes the upsampling module, which consists of sub-pixel convolutional layers.
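The rearrangement step of a sub-pixel convolutional layer can be sketched in plain Python. This is a minimal illustration, assuming the common channel ordering in which channel `c*r*r + i*r + j` of the input supplies the pixel at offset `(i, j)` inside each `r`$\times$`r` output cell; the function name `pixel_shuffle` and the toy tensor are hypothetical and are not taken from the paper.

```python
def pixel_shuffle(features, scale):
    """Rearrange a (C*r*r, H, W) nested list into a (C, H*r, W*r) nested list."""
    r2 = scale * scale
    if len(features) % r2 != 0:
        raise ValueError("channel count must be divisible by scale**2")
    channels_out = len(features) // r2
    h, w = len(features[0]), len(features[0][0])
    out = [[[0.0] * (w * scale) for _ in range(h * scale)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(scale):          # vertical offset inside each output cell
            for j in range(scale):      # horizontal offset inside each output cell
                src = features[c * r2 + i * scale + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * scale + i][x * scale + j] = src[y][x]
    return out


# Toy input: 4 channels of size 1x2 become 1 channel of size 2x4 for scale 2.
toy = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
upsampled = pixel_shuffle(toy, 2)
```

The convolution that precedes this rearrangement produces the `C*r*r` channels; only the data movement is shown here.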

Structural reparameterization

During training, neural networks often utilize multi-branch models similar to the ResNet²¹ style, as illustrated in Fig. 2(a), where parallel branches generally enhance the model’s representational capacity. Each branch can learn different features and ultimately enhance the model’s performance by combining the features through fusion mechanisms. However, multi-branch network models require multiple memory reads and writes during inference, which costs significant time. Additionally, the time consumption increases when multiple branches are merged. By converting the multi-branch network into a single-path model with a VGG-style²² architecture during inference, these drawbacks can be overcome.

Figure 2(b) illustrates the multi-branch network structure used during model training, while Fig. 2(c) represents the network structure utilized during inference.

Structural Reparameterization²⁰ refers to the process of combining the biases and weights of a pretrained multi-branch network and storing them in a single-branch network model, ensuring consistency between the results obtained from multi-branch and single-branch networks during inference. Structural Reparameterization primarily takes three forms: merging convolutional and batch normalization (BN)²³ operators, expanding 1$\times$1 convolutional layers into 3$\times$3 convolutional layers, and merging 3$\times$3 convolutional layers on each branch into a single 3$\times$3 convolutional layer.

Since both convolution and BN operators perform linear operations, they can be merged into a single operator. For the BN layer, it mainly includes four parameters: $\mu$ (mean), ${{\sigma }^{2}}$ (variance), $\gamma$ and $\beta$, where $\mu$ and ${{\sigma }^{2}}$ are obtained statistically during the training process, while $\gamma$ and $\beta$ are learned during training. The calculation formula for the i-th channel of the feature map’s BN is shown in Eq. (6), where $\varepsilon$ is a very small constant to prevent the denominator from being zero.

$$\begin{aligned} {{y}_{i}}=\frac{{{x}_{i}}-{{\mu }_{i}}}{\sqrt{\sigma _{i}^{2}+\varepsilon }}{{\gamma }_{i}}+{{\beta }_{i}}. \end{aligned}$$

(6)

For the i-th channel of the feature map M input to the BN layer, the BN operation can be represented as Eq. (7):

$$\begin{aligned} bn{{(M,\mu ,\sigma ,\gamma ,\beta )}_{:,i,:,:}}=\left( {{M}_{:,i,:,:}}-{{\mu }_{i}} \right) \frac{{{\gamma }_{i}}}{{{\sigma }_{i}}}+{{\beta }_{i}}. \end{aligned}$$

(7)

The weights and biases of the new convolutional layer after the transformation can be calculated using Eq. (8), where i represents the i-th convolutional kernel, and ${W}'$ and ${b}'$ are the new weights and biases.

$$\begin{aligned} W_{i,:,:,:}^{\prime }=\frac{{{\gamma }_{i}}}{{{\sigma }_{i}}}{{W}_{i,:,:,:}},b_{i}^{\prime }={{\beta }_{i}}-\frac{{{\mu }_{i}}{{\gamma }_{i}}}{{{\sigma }_{i}}}. \end{aligned}$$

(8)
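The equivalence behind Eq. (8) can be checked numerically. The sketch below is a minimal single-channel illustration in plain Python, assuming a bias-free convolution followed by BN in inference mode, with $\sigma =\sqrt{\sigma ^{2}+\varepsilon }$; the helper names (`conv2d_same`, `batch_norm`, `fuse_conv_bn`) and all numeric values are hypothetical.

```python
import math


def conv2d_same(image, kernel, bias=0.0):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - kh // 2, c + j - kw // 2
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * image[rr][cc]
            row.append(acc)
        out.append(row)
    return out


def batch_norm(feature, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode BN for one channel, following Eq. (6)."""
    sigma = math.sqrt(var + eps)
    return [[(v - mean) / sigma * gamma + beta for v in row] for row in feature]


def fuse_conv_bn(kernel, mean, var, gamma, beta, eps=1e-5):
    """Fold BN statistics into the kernel and a new bias, following Eq. (8)."""
    sigma = math.sqrt(var + eps)
    fused_kernel = [[gamma / sigma * k for k in row] for row in kernel]
    fused_bias = beta - mean * gamma / sigma
    return fused_kernel, fused_bias


image = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.0, 1.5]]
kernel = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
bn_args = dict(mean=0.4, var=2.0, gamma=1.5, beta=-0.2)

two_step = batch_norm(conv2d_same(image, kernel), **bn_args)
fused_kernel, fused_bias = fuse_conv_bn(kernel, **bn_args)
one_step = conv2d_same(image, fused_kernel, fused_bias)

max_diff = max(abs(a - b) for ra, rb in zip(two_step, one_step)
               for a, b in zip(ra, rb))
```

With several output channels the same folding is applied per output channel, each with its own $\mu _{i}$, $\sigma _{i}$, $\gamma _{i}$ and $\beta _{i}$.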

The 1$\times$1 convolution can be transformed into a 3$\times$3 convolution by adding zeros around the original weights, as depicted in Fig. 3. This conversion results in a 3$\times$3 convolutional layer.

When all branches in the network consist of 3$\times$3 convolutions, as shown in Fig. 4, the addition operations performed after branching can be combined. By summing the trained biases and weights together and then reverting to a single convolution, the single-branch network can achieve results consistent with the multi-branch network, thereby improving inference speed.
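The padding and merging steps described above can be sketched as follows. This is a minimal single-channel illustration in plain Python with one 3$\times$3 branch and one 1$\times$1 branch; the helper names and the numeric values are hypothetical, and a real network would apply the same arithmetic to every (output channel, input channel) pair.

```python
def conv2d_same(image, kernel, bias=0.0):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - kh // 2, c + j - kw // 2
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * image[rr][cc]
            row.append(acc)
        out.append(row)
    return out


def pad_1x1_to_3x3(weight):
    """Place the 1x1 weight at the centre of an otherwise zero 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, weight, 0.0], [0.0, 0.0, 0.0]]


def merge_branches(kernels, biases):
    """Sum same-sized kernels element-wise and sum their biases."""
    size = len(kernels[0])
    merged = [[sum(k[i][j] for k in kernels) for j in range(size)]
              for i in range(size)]
    return merged, sum(biases)


image = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.0, 1.5]]
k3, b3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]], 0.05
k1, b1 = 0.7, -0.02

# Training-time view: two parallel branches whose outputs are added.
branch3 = conv2d_same(image, k3, b3)
branch1 = conv2d_same(image, [[k1]], b1)
multi_branch = [[a + b for a, b in zip(ra, rb)]
                for ra, rb in zip(branch3, branch1)]

# Inference-time view: one 3x3 convolution with merged parameters.
merged_kernel, merged_bias = merge_branches([k3, pad_1x1_to_3x3(k1)], [b3, b1])
single_branch = conv2d_same(image, merged_kernel, merged_bias)

max_diff = max(abs(a - b) for ra, rb in zip(multi_branch, single_branch)
               for a, b in zip(ra, rb))
```

The merge is exact (up to floating-point rounding) because convolution is linear in its kernel and bias.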

Network structure of the ECB module

As described in Section 2.2, multi-branch convolutions can be fused into a single convolution during inference. Inspired by ECBSR²⁰, this paper adopts the ECB module from ECBSR, using a multi-branch structure during training and merging it into a single-branch network during inference to enhance inference speed. The ECB module is illustrated in Fig. 5:

The ECB module, depicted in Fig. 5(a), comprises four types of operators.

A 3$\times$3 convolution: A standard 3$\times$3 convolution is used first to ensure basic performance. This conventional convolutional operation is expressed as:

$$\begin{aligned} {{{F}}_{{n}}}\text {=}{{{K}}_{{n}}}\times {X+}{{{B}}_{{n}}}. \end{aligned}$$

(9)

Where ${{F}_{n}}\text {, }X\text {, }{{K}_{n}}$ and ${{B}_{n}}$ respectively represent the output features, input features, weights, and biases of the standard convolution.

Dilated-Compress Convolution Combination: Wider features significantly enhance expressiveness and contribute to better performance in SR tasks. As the second component of the ECB, dilated-compress convolution is utilized. As depicted in Fig. 5(a)’s second column, it starts with a 1$\times$1 convolution as the dilated convolution, doubling the number of channels to enhance expressiveness. Subsequently, a 3$\times$3 convolution is employed as the compress convolution to restore the number of channels.

Using $\{{K}_{e}\text {, }{B}_{e}\}$ and $\{{K}_{s}\text {, }{B}_{s}\}$ to represent the weights and biases of the 1$\times$1 dilated convolution and 3$\times$3 compress convolution, respectively, the dilated-compress convolution is expressed as follows in Eq. (10):

$$\begin{aligned} {{F}_{es}}={{K}_{s}}\times \left( {{K}_{e}}\times X+{{B}_{e}} \right) +{{B}_{s}}. \end{aligned}$$

(10)

Convolution with scaled Sobel filters: Edge information has been proven to be highly beneficial for SR tasks²⁴. ECB incorporates the extraction of first-order derivatives into its design. Because sharp edge filters are difficult to learn automatically, ECB uses predefined edge filters and learns a scaling factor for each filter. Specifically, the input features first undergo a standard 1$\times$1 convolution, and two scaled Sobel filters are then used to extract the gradients of the intermediate features.

Let ${{D}_{x}}$ and ${{D}_{y}}$ represent the horizontal and vertical Sobel filters, respectively. They are expressed as shown in Eq. (11):

$$\begin{aligned} {{D}_{x}}=\left[ \begin{array}{lll} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \\ \end{array} \right] ,\quad {{D}_{y}}=\left[ \begin{array}{lll} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \\ \end{array} \right] . \end{aligned}$$

(11)

The extraction of edge information in both horizontal and vertical directions for each channel of the intermediate features involves processing with Sobel filters followed by scaling according to channel-specific scaling factors. The extraction of edge information in the horizontal and vertical directions is represented as shown in Eqs. (12) and (13):

$$\begin{aligned} & {{F}_{{{D}_{x}}}}=\left( {{S}_{{{D}_{x}}}}\cdot {{D}_{x}} \right) \otimes \left( {{K}_{x}}\times X+{{B}_{x}} \right) +{{B}_{{{D}_{x}}~}}. \end{aligned}$$

(12)

$$\begin{aligned} & {{F}_{{{D}_{y}}}}=({{S}_{{{D}_{y}}}}\cdot {{D}_{y}})\otimes ({{K}_{y}}\times X+{{B}_{y}})+{{B}_{{{D}_{y}}}}. \end{aligned}$$

(13)

Where $\{{K}_{x}\text {, }{B}_{x}\}$ and $\{{K}_{y}\text {, }{B}_{y}\}$ represent the weights and biases of the 1$\times$1 convolutions for the horizontal and vertical branches, respectively. $\{{S}_{{{D}_{x}}}\text {, }{B}_{{D}_{x}}\}$ and $\{{S}_{{{D}_{y}}}\text {, }{B}_{{{D}_{y}}}\}$ denote the scaling parameters and biases. The edge information extracted by the horizontal and vertical Sobel filters is directly summed to obtain the combined edge information ${{F}_{sob}}$, as shown in Eq. (14):

$$\begin{aligned} {{{F}}_{{sob}}}{=}{{{F}}_{{{{D}}_{{x}}}}}{+}{{{F}}_{{{{D}}_{{y}}}}}. \end{aligned}$$

(14)
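A small numerical sketch can make the Sobel branch of Eqs. (12) to (14) concrete. It is written in plain Python for a single channel, where the 1$\times$1 convolution reduces to a scalar multiply plus bias; it uses the standard Sobel kernels, unpadded cross-correlation, and hypothetical values for the scaling factors and biases.

```python
SOBEL_X = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
SOBEL_Y = [[1.0, 2.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -2.0, -1.0]]


def conv2d_valid(image, kernel, bias=0.0):
    """Single-channel cross-correlation without padding."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[bias + sum(kernel[i][j] * image[r + i][c + j]
                        for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


def sobel_branch(image, k, b, scale, sobel, b_d):
    """(scale * sobel) applied to (k * X + b), plus bias b_d, as in Eqs. (12)/(13)."""
    intermediate = [[k * v + b for v in row] for row in image]
    scaled = [[scale * s for s in row] for row in sobel]
    return conv2d_valid(intermediate, scaled, b_d)


# A vertical step edge: dark on the left, bright on the right.
step_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]

f_dx = sobel_branch(step_edge, k=1.0, b=0.0, scale=0.5, sobel=SOBEL_X, b_d=0.0)
f_dy = sobel_branch(step_edge, k=1.0, b=0.0, scale=0.5, sobel=SOBEL_Y, b_d=0.0)

# Eq. (14): the two directional responses are summed element-wise.
f_sob = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(f_dx, f_dy)]
```

On this input only the horizontal filter responds, since the image does not change along the vertical direction; the learned scale simply rescales that response.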

Convolution with a combination of scaled Laplacian filters: In addition to first-order derivatives, the ECB module also employs Laplacian filters to extract second-order spatial derivatives. The input features first undergo a standard 1$\times$1 convolution, followed by the extraction of second-order spatial derivatives using a Laplacian filter ${{D}_{lap}}$, which is represented as shown in Eq. (15):

$$\begin{aligned} {{D}_{lap}}=\left[ \begin{matrix} 0 & +1 & 0 \\ +1 & -4 & +1 \\ 0 & +1 & 0 \\ \end{matrix} \right] . \end{aligned}$$

(15)

The extraction of scaled second-order edge information is represented as shown in Eq. (16):

$$\begin{aligned} {{F}_{lap}}=({{S}_{lap}}\cdot {{D}_{lap}})\otimes ({{K}_{l}}\times X+{{B}_{l}})+{{B}_{lap}}. \end{aligned}$$

(16)

Where $\{{K}_{l}\text {, }{B}_{l}\}$ represent the weights and biases of the 1$\times$1 convolution, and $\{{S}_{lap}\text {, }{B}_{lap}\}$ are the scaling factors and biases of ${{D}_{lap}}$. The output of ECB consists of four parts:

$$\begin{aligned} F={{F}_{n}}+{{F}_{es}}+{{F}_{sob}}+{{F}_{lap}}. \end{aligned}$$

(17)

Then, the combined feature map is passed through a non-linear activation layer, specifically a Parametric Rectified Linear Unit (PReLU).
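The combination in Eq. (17) followed by the activation can be sketched in a few lines of plain Python. The branch outputs below are made-up numbers, and the PReLU slope of 0.25 is only an assumed starting value, since in practice the slope is a learned parameter.

```python
def prelu(value, slope):
    """PReLU: identity for non-negative inputs, scaled by `slope` otherwise."""
    return value if value >= 0.0 else slope * value


def ecb_output(f_n, f_es, f_sob, f_lap, slope=0.25):
    """Element-wise sum of the four branch outputs (Eq. 17), then PReLU."""
    return [[prelu(a + b + c + d, slope)
             for a, b, c, d in zip(ra, rb, rc, rd)]
            for ra, rb, rc, rd in zip(f_n, f_es, f_sob, f_lap)]


# Hypothetical 1x2 branch outputs for a single channel.
activated = ecb_output(
    f_n=[[1.0, -2.0]],
    f_es=[[0.5, 0.5]],
    f_sob=[[-2.0, -2.0]],
    f_lap=[[1.5, -0.5]],
)
```

Because the sum happens before the non-linearity, the four branches remain linear in the input and can still be merged into a single convolution at inference time.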

Network structure of the REPLKA module

Applying large convolutional kernels in networks can increase their receptive field without reducing the effective resolution of features. However, these kernels may introduce blank regions during the convolution process, leading to the loss of local information. To effectively capture information from the input feature maps, it is necessary to employ multiple large convolutional kernels in parallel. By adopting reparameterization strategies, the model’s ability to extract information can be enhanced without increasing computational costs. Inspired by RepVGG and ECBSR, this paper introduces for the first time the strategy of large kernel reparameterization. Based on large kernel reparameterization, the REPLKA module is proposed. The network structure of REPLKA is depicted in Fig. 6. The process during training of the REPLKA module can be represented as follows:

$$\begin{aligned} \left\{ \begin{matrix} Y & = & D{{W}_{1}}(X)+D{{W}_{2}}\left( X \right) +D{{W}_{3}}\left( X \right) +D{{W}_{4}}(X) , \\ Z & = & DW{{D}_{5\times 5}}(Y), \\ Z & = & Con{{v}_{1\times 1}}(Z) ,\\ Z & = & Z\otimes X . \\ \end{matrix} \right. \end{aligned}$$

(18)

Where $D{{W}_{1}}\left( \cdot \right)$, $D{{W}_{2}}\left( \cdot \right)$, $D{{W}_{3}}\left( \cdot \right)$, $D{{W}_{4}}\left( \cdot \right)$ denote four depthwise convolutions with kernel size of 5$\times$5, aiming to enhance the model’s expressive capacity. $DWD_{5\times 5}\left( \cdot \right)$ represents a depthwise dilated convolution with kernel size of 5$\times$5 and dilation rate of 3. $C\text {on}{{\text {v}}_{1\times 1}}\left( \cdot \right)$ denotes a convolution with kernel size of 1$\times$1. $\otimes$ denotes element-wise multiplication.
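As a worked example of how much context these layers cover, the receptive field of the stacked convolutions can be computed with the usual stride-1 formulas. The sketch below assumes stride 1 throughout and uses hypothetical helper names; it only restates the kernel sizes and dilation rate given above.

```python
def effective_kernel(kernel_size, dilation=1):
    """Spatial extent of a dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(layers):
    """Receptive field of stride-1 layers given as (kernel_size, dilation) pairs."""
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel(kernel_size, dilation) - 1
    return field


dilated_extent = effective_kernel(5, dilation=3)
# 5x5 depthwise conv, then 5x5 depthwise dilated conv (rate 3), then 1x1 conv.
replka_field = stacked_receptive_field([(5, 1), (5, 3), (1, 1)])
# For comparison: a single 3x3 convolution.
small_field = stacked_receptive_field([(3, 1)])
```

Under these assumptions the dilated 5$\times$5 kernel spans 13 pixels and the attention path as a whole sees a 17$\times$17 neighbourhood, compared with 3$\times$3 for one small-kernel convolution.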

During inference, the biases and weights trained in the processes of $D{{W}_{1}}\left( \cdot \right)$, $D{{W}_{2}}\left( \cdot \right)$, $D{{W}_{3}}\left( \cdot \right)$ and $D{{W}_{4}}\left( \cdot \right)$ can be summed together. The resulting biases and weights are then used as the biases and weights in the depthwise convolutions during inference. The specific process is illustrated in Eq. (19):

$$\begin{aligned} \left\{ \begin{matrix} K_d=K_1+K_2+K_3+K_4, \\ B_d=B_1+B_2+B_3+B_4. \end{matrix}\right. \end{aligned}$$

(19)

Where $\left\{ {{K}_{1}},{{K}_{2}},{{K}_{3}},{{K}_{4}} \right\}$ and $\{{{B}_{1}},{{B}_{2}},{{B}_{3}},{{B}_{4}}\}$ represent the weights and biases of $D{{W}_{1}}\left( \cdot \right)$, $D{{W}_{2}}\left( \cdot \right)$, $D{{W}_{3}}\left( \cdot \right)$ and $D{{W}_{4}}\left( \cdot \right)$. ${{K}_{d}}$ and ${{B}_{d}}$ denote the weights and biases in the depthwise convolutions during inference.
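The merge in Eq. (19) can be verified numerically. The following plain-Python sketch uses random 5$\times$5 depthwise kernels on a two-channel toy feature map; the helper names, the seed, and the tensor sizes are hypothetical, and it covers only the four parallel depthwise branches, not the dilated convolution or the attention multiplication.

```python
import random


def conv2d_same(image, kernel, bias=0.0):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - kh // 2, c + j - kw // 2
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * image[rr][cc]
            row.append(acc)
        out.append(row)
    return out


def depthwise_conv(features, kernels, biases):
    """Depthwise convolution: channel c is filtered only by kernels[c]."""
    return [conv2d_same(ch, k, b) for ch, k, b in zip(features, kernels, biases)]


rng = random.Random(0)
channels, size, ksize, n_branches = 2, 6, 5, 4


def rand_grid(n):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]


x = [rand_grid(size) for _ in range(channels)]
branch_k = [[rand_grid(ksize) for _ in range(channels)] for _ in range(n_branches)]
branch_b = [[rng.uniform(-1.0, 1.0) for _ in range(channels)]
            for _ in range(n_branches)]

# Training-time view: Y = DW1(X) + DW2(X) + DW3(X) + DW4(X).
outs = [depthwise_conv(x, k, b) for k, b in zip(branch_k, branch_b)]
y_train = [[[sum(o[c][r][col] for o in outs) for col in range(size)]
            for r in range(size)] for c in range(channels)]

# Inference-time view: K_d and B_d from Eq. (19), one depthwise convolution.
k_d = [[[sum(k[c][i][j] for k in branch_k) for j in range(ksize)]
        for i in range(ksize)] for c in range(channels)]
b_d = [sum(b[c] for b in branch_b) for c in range(channels)]
y_infer = depthwise_conv(x, k_d, b_d)

max_diff = max(abs(y_train[c][r][col] - y_infer[c][r][col])
               for c in range(channels) for r in range(size)
               for col in range(size))
```

The two outputs agree up to floating-point rounding, while the inference path performs a quarter of the depthwise multiply-adds.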

The pseudocode of the proposed method in this paper is presented in Table 1.

Table 1 Pseudocode of the REPLKASR network.

Experiments

Experimental setup

Because acquiring infrared data is challenging, this chapter adopts a transfer learning strategy. Initially, the RepLKASR network is trained using 2650 pairs of visible images from Flickr2K²⁵ and 800 pairs of visible images from DIV2K²⁶.

Subsequently, 100 infrared images are selected from the M3FD²⁷ public infrared dataset, denoted as M3FD-100. Among these, 70 images are used for training, 15 for validation, and 15 for testing (referred to as M3FD-15). Additionally, 15 images are chosen from the Iray infrared super-resolution dataset²⁸ for testing (referred to as Iray-15). Furthermore, 15 images each are selected from the Iray infrared boat target recognition dataset (abbreviated as Iray-boat) and the Iray infrared ship traffic dataset (abbreviated as Iray-traffic) for testing. All the aforementioned infrared images are generated using bicubic degradation to form paired data.

In addition to the aforementioned datasets, this section includes 15 infrared images of traffic scenes captured around the campus for no-reference testing, named as “self-built”. For data augmentation in the training dataset, random combinations of rotations ($0^\circ$, $90^\circ$, $180^\circ$, $270^\circ$) and horizontal flips are applied. The evaluation metrics utilize the average PSNR and SSIM on the luminance channel.
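For reference, PSNR on the luminance channel can be computed as in the sketch below. It assumes 8-bit images stored as nested lists and the ITU-R BT.601 luma conversion that is commonly used in SR evaluation; the paper does not state which conversion or border handling it uses, so these choices and the function names are assumptions.

```python
import math


def rgb_to_luma(r, g, b):
    """BT.601 luma for 8-bit RGB values (assumed conversion, range 16..235)."""
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    squared = [(a - b) ** 2
               for ra, rb in zip(reference, estimate)
               for a, b in zip(ra, rb)]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


# Toy luminance images that differ by exactly one grey level everywhere.
hr = [[10.0, 20.0], [30.0, 40.0]]
sr = [[11.0, 21.0], [31.0, 41.0]]
score = psnr(hr, sr)
```

Dataset-level figures such as those reported in the tables are averages of this per-image value over all test images.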

In RepLKASR, $n=8$ ECBs and 1 REPLKA module are employed, with the channel width set to 32.

The model is trained using the Adam²⁹ optimizer with parameters $\beta _{1}=0.9$ and $\beta _{2}=0.99$. The learning rate is initialized to 5e-4 and scheduled using cosine annealing throughout the entire training process of 1e6 iterations. For ablation studies, all models are trained for 4e5 iterations. The exponential moving average (EMA)³⁰ weight is set to 0.999. Only L1 loss is utilized for optimizing the model. The patch size and batch size for RepLKASR are set to 192$\times$192 and 64, respectively. The same training strategy is applied during transfer learning. The compared methods in this chapter also undergo transfer learning using the same strategy.
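The learning-rate schedule and the EMA update can be written out explicitly. The sketch below uses the standard cosine annealing formula with the stated initial rate of 5e-4 over 1e6 iterations and an EMA decay of 0.999; the minimum learning rate of 0 is an assumption, since the text does not give one, and the function names are hypothetical.

```python
import math


def cosine_lr(step, total_steps=1_000_000, base_lr=5e-4, min_lr=0.0):
    """Cosine annealing from base_lr to min_lr (min_lr = 0 is an assumption)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def ema_update(shadow, params, decay=0.999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * params."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


lr_start = cosine_lr(0)
lr_middle = cosine_lr(500_000)
lr_end = cosine_lr(1_000_000)

# Toy parameters: the EMA copy moves only slightly toward the new weights.
shadow = ema_update([1.0, -2.0], [0.0, 0.0])
```

With a decay of 0.999 the averaged weights have an effective memory of roughly a thousand iterations, which smooths out step-to-step noise in the evaluated model.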

The experiments in this section are conducted using the Ubuntu 20.04 operating system and the PyTorch 1.9.0 training framework.

Ablation experiment of the REPLKA module

To verify the effectiveness of the proposed large kernel reparameterization module in REPLKASR, ablation experiments are conducted in this section. To conserve computational resources, the ablation experiments are uniformly trained for 400,000 iterations. The performance of the model is evaluated on benchmark datasets including Set5³¹, Set14³², BSD100³³, Urban100³⁴, and Manga109³⁵.

As shown in Table 2, training with the four-branch module achieves better performance. During inference, the reparameterization strategy ensures that the computational complexity of the four-branch network matches that of the single-branch network, demonstrating the effectiveness of the proposed REPLKA module.

From “Single-Branch” to “Four-Branch,” as the number of branches increases, the PSNR and SSIM values gradually improve on different datasets. For example, on the Set5 dataset, the PSNR/SSIM increases from 31.60/0.8867 to 31.66/0.8882, indicating that more branches contribute to improving the image quality.

In the “Multi-Adds (G)” column, it can be seen that the computational complexity of all configurations containing REPLKA modules is 6.6 GOPs (Giga Operations), indicating that the network’s total computational complexity remains unchanged despite the addition of REPLKA module branches. This is because the structural reparameterization method optimizes the internal structure to maintain computational efficiency.
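Why the branch count does not change the inference cost can be seen from a back-of-the-envelope count of multiply-accumulate operations for the 5$\times$5 depthwise stage. The feature-map size of 640$\times$360 used below is a hypothetical choice for illustration (the channel width of 32 comes from the experimental setup); the resulting numbers are not taken from Table 2.

```python
def depthwise_macs(height, width, channels, kernel_size):
    """Multiply-accumulates of one stride-1 depthwise convolution."""
    return height * width * channels * kernel_size * kernel_size


# Hypothetical feature-map size; channel width 32 as in the setup above.
H, W, C, K = 360, 640, 32, 5

one_branch = depthwise_macs(H, W, C, K)
training_cost = {n: n * one_branch for n in (1, 2, 3, 4)}   # parallel branches
inference_cost = {n: one_branch for n in (1, 2, 3, 4)}      # after merging
```

Training cost grows linearly with the number of branches, whereas the merged inference cost stays at the single-branch value regardless of how many branches were trained.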

Table 2 Ablation Study of the REPLKA Module: The impact of different configurations, including the absence of the REPLKA module, single-branch module, two-branch module, three-branch module, and four-branch module, on $\times$2 Super-Resolution tasks performed by REPLKASR is investigated. The best metrics are highlighted in bold.

Results of the infrared image experiment

Since there are few dedicated lightweight models for infrared image super-resolution reconstruction, this paper evaluates the proposed RepLKASR against classical lightweight SR methods with comparable parameter counts, all of which undergo transfer learning with the same infrared data. The methods compared include ESPCN³⁶, FSRCNN³⁷, IMDN-RTC³⁸, ECBSR-M10C32²⁰, MAN-tiny³⁹ and SMFANet⁴⁰. Table 3 presents the quantitative comparison on the M3FD-15, Iray-15, Iray-boat and Iray-traffic datasets with upsampling factors of $\times$2 and $\times$4. It also provides the Params, Multi-Adds and FLOPs for an output resolution of 1280$\times$720.

Table 3 Quantitative comparison with state-of-the-art image super-resolution methods on the infrared image benchmark datasets. The best and second-best performances are highlighted in italic and bold, respectively.

As shown in Table 3, REPLKASR demonstrates significant advantages in multiple key metrics. It achieves the best or second-best performance in terms of PSNR and SSIM for both $\times$2 and $\times$4 upscaling factors. In particular, at the $\times$4 upscaling factor, REPLKASR achieves the highest PSNR and SSIM values on the M3FD-15, Iray-boat, and Iray-traffic datasets. Although REPLKASR slightly exceeds some methods in parameter count and computational complexity (Multi-Adds and FLOPs), its improvement in image quality supports its efficiency and effectiveness in infrared image super-resolution tasks. REPLKASR therefore provides higher-quality super-resolution images while maintaining high computational efficiency, demonstrating important practical value.

In addition to quantitative evaluations, visual comparisons between the proposed RepLKASR and six state-of-the-art lightweight SR methods, including ESPCN, FSRCNN, IMDN-RTC, and ECBSR, are provided. Figures 7 and 8 present visual comparisons on the $\times$4 M3FD-15 dataset and Iray-15 dataset with the state-of-the-art methods. The images within the red boxes are cropped and magnified. Figures 9 and 10 show visual comparisons on the $\times$4 Iray-boat and Iray-traffic datasets, respectively. To further demonstrate the effectiveness of the proposed RepLKASR, the confidence levels detected by YOLOv5 are also presented in Figs. 9 and 10.

In Fig. 7, for img010 in the M3FD-15 dataset and img1866 in the Iray-traffic dataset, the proposed RepLKASR method can restore the window image to a level almost indistinguishable from the HR image, while other methods still produce images with blur and artifacts, failing to restore straight lines, resulting in unacceptable reconstructions. In Fig. 8, for img005 in the Iray-15 dataset and img6415 in the Iray-boat dataset, the image restored by the proposed RepLKASR method is clear and clean, while the reconstructions by other methods exhibit broken contours or distorted contours, leading to unacceptable results.

In the visual results of Figs. 7 and 8, the proposed method achieves the best visual effects among the results, and the results of the ECBSR method come closest to those of the proposed method. This is because ECBSR utilizes multiple parallel small kernel convolution modules, known as ECB. In contrast, the proposed method introduces large kernel convolution blocks based on the ECBSR network, which increases the receptive field for feature learning compared to ECB. As a result, the proposed method can extract more details and has a larger image restoration range compared to ECBSR.
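The receptive-field argument above can be illustrated with a short calculation. The sketch below uses the standard formula for a stack of stride-1 convolutions, in which each layer adds (kernel size - 1) x dilation pixels to the receptive field. The layer configurations are hypothetical and chosen only for illustration; they are not the actual ECBSR or RepLKASR layouts.

```python
def receptive_field(layers):
    """Receptive field (in pixels, per axis) of stacked stride-1 convolutions.

    layers: iterable of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

# Hypothetical stacks, for illustration only.
small_kernel_stack = [(3, 1)] * 4              # four 3x3 convolutions
large_kernel_stack = [(7, 1)] + [(3, 1)] * 3   # one 3x3 replaced by a 7x7

rf_small = receptive_field(small_kernel_stack)  # 1 + 4 * 2 = 9
rf_large = receptive_field(large_kernel_stack)  # 1 + 6 + 3 * 2 = 13
```

At the same depth, swapping a single 3$\times$3 layer for a 7$\times$7 layer widens the receptive field from 9 to 13 pixels per axis in this toy configuration.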

As shown in Fig. 9 and Fig. 10, REPLKASR achieved the highest confidence scores in multiple object detection tasks. No instances of false negatives were observed in any of the test images for REPLKASR, while most other methods exhibited false negatives in images img6506 and img6502. This indicates that REPLKASR has higher robustness and reliability in object detection tasks. Additionally, REPLKASR did not exhibit any instances of false positives. This suggests that REPLKASR is able to better preserve the structural information of the objects and reduce interference from background noise when generating super-resolution images.

In conclusion, the proposed RepLKASR algorithm achieves the best super-resolution performance in various scenarios, yielding the highest target detection confidence, thus demonstrating the effectiveness of the proposed method.

Neural network processor inference results

The neural network processor chosen for this study is the Rockchip RK3588 development board. This chip features an eight-core CPU with four A76 cores and four A55 cores, as well as an ARM G610MP4 GPU. It also includes an integrated NPU with a computational power of 6 TOPS (tera operations per second), i.e., up to 6 trillion operations per second. This processor is characterized by high computational power, low power consumption, and multiple interfaces, making it well-suited to the system requirements.

Moreover, the platform runs on the Ubuntu 20.04 operating system, allowing for direct configuration of necessary code libraries and open-source deep learning frameworks, significantly simplifying the network deployment process. For the infrared sensor, the study employs the X162O-F180W uncooled infrared detector from Chengdu Jinglin, which connects via USB. This setup enables video stream reading through the software dependency library OpenCV (a cross-platform computer vision and machine learning software library). The real-time infrared image super-resolution system constructed in this study is illustrated in Fig. 11.

Table 4 Performance comparison of lightweight SR methods on the RK3588NPU. Tested on public datasets. Inference time is measured based on the output image size of 1280$\times$720. The best and second-best performances are highlighted in Italic and bold, respectively.

This section presents the inference speed and accuracy of the quantized REPLKASR model (abbreviated as REPLKASR-uint8) on the RK3588 platform. To further demonstrate the superiority of the proposed REPLKASR-uint8, comparisons are made with ESPCN-uint8, FSRCNN-uint8, IMDN-RTC-uint8, and ECBSR-M10C32-uint8 (abbreviated as ECBSR-uint8). Table 4 shows the accuracy and inference times of each algorithm on the RKNPU. The proposed REPLKASR-uint8 achieves real-time super-resolution for 320$\times$180 images on the RK3588NPU.
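To make the uint8 suffix concrete, the following is a minimal sketch of asymmetric per-tensor uint8 quantization. The text above does not state which quantization scheme the RK3588 toolchain applies, so the scheme, the function names, and the random test tensor are all assumptions made for illustration.

```python
import numpy as np

def quantize_uint8(x):
    """Asymmetric per-tensor quantization: float array -> (uint8 array, scale, zero_point)."""
    # Include 0.0 in the range so that zero is exactly representable.
    lo = min(float(x.min()), 0.0)
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map uint8 codes back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical activation tensor, for illustration only.
x = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
q, scale, zero_point = quantize_uint8(x)
max_error = float(np.max(np.abs(dequantize(q, scale, zero_point) - x)))
```

In this scheme the round-trip error per value stays within about one quantization step (`scale`), which is the kind of precision loss that the accuracy figures in Table 4 account for.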

Inference results (Table 4) of the quantized algorithms on the RK3588NPU show that compared to the IMDN-RTC-uint8$\times$4 method, the proposed RepLKASR-uint8$\times$4 method achieves an average improvement of 0.05 dB in PSNR and 0.0056 in SSIM across four infrared datasets. Compared to the ECBSR-M10-uint8$\times$4, the RepLKASR-uint8$\times$4 method achieves an average improvement of 0.035 dB in PSNR and 0.0031 in SSIM across the same four infrared datasets.
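For readers less familiar with decibel-scale differences, the sketch below shows how PSNR is conventionally computed and what a gain of a given size means in terms of mean squared error (MSE). It assumes 8-bit images with a peak value of 255; the helper names are illustrative and are not taken from the evaluation code used in this study.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain at the same peak value."""
    return 10.0 ** (-gain_db / 10.0)

# A 0.05 dB PSNR gain corresponds to an MSE that is roughly 1.1% lower.
ratio = mse_ratio_for_gain(0.05)
```

Because PSNR is logarithmic in the MSE, differences of a few hundredths of a decibel correspond to relative MSE changes on the order of one percent.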

In addition to the referenced dataset comparisons, we also conducted a four-fold super-resolution experiment on the self-built dataset because the real dataset lacks references. Table 5 compares the no-reference evaluation metrics, namely Natural Image Quality Evaluator (NIQE)⁴¹ and Perception based Image Quality Evaluator (PIQE)⁴². NIQE, based on human visual perception, indicates that a lower value corresponds to a higher visual quality of the image. PIQE is sensitive to image noise, with a lower score indicating higher image quality.

Table 5 Performance comparison of lightweight SR methods on the RK3588NPU. Tested on Self-built datasets. Inference time is measured based on the output image size of 1280$\times$720. The best and second-best performances are highlighted in Italic and bold, respectively.

As shown in Table 5, the proposed RepLKASR-uint8$\times$4 outperforms all compared methods in terms of the NIQE and PIQE metrics on the self-built dataset. This indicates that the proposed method restores image quality better than the other methods, generating outputs that are closer to high-quality natural images. At the same time, its inference time is close to the best value among the compared methods, which demonstrates that the proposed method achieves higher image quality while maintaining efficient inference.

In addition to quantitative evaluations, visual comparisons of the proposed RepLKASR-uint8 with state-of-the-art lightweight SR methods on the RK3588NPU are provided, including ESPCN-uint8, FSRCNN-uint8, IMDN-RTC-uint8, and ECBSR-M10-uint8. Figures 12 and 13 show visual comparisons on the four infrared datasets with $\times$4 upscaling against the state-of-the-art methods. The images in the red boxes are cropped and magnified.

Figures 14, 15, and 16 respectively display visual comparisons on the $\times$4 upscaled Iray-boat dataset, Iray-traffic dataset, and self-built dataset against the state-of-the-art methods. To further demonstrate the effectiveness of the proposed RepLKASR-uint8, the confidence scores from YOLOv5 detection are also presented.

For img1866 in Iray-traffic, the RepLKASR-uint8 method proposed can restore the fence image almost indistinguishable from HR, while other methods still produce images containing blur and artifacts, and may even fail to restore normal straight lines, resulting in unacceptable reconstruction. For img6412 in Iray-boat, the RepLKASR-uint8 method restores a clear and clean image, whereas other methods produce distorted and blurry lines, leading to unacceptable reconstruction.

The REPLKASR method performed excellently in the object detection tasks of all test images. As shown in Fig. 15 and Fig. 16, the proposed method in this paper achieved the highest confidence in vehicle detection tasks, demonstrating stronger stability and robustness. Particularly in the water scene shown in Fig. 14, REPLKASR exhibited significantly higher detection confidence for ships compared to other methods. This is because of the higher contrast between the background and the target in water scenes, and the convolutional kernels used in the REPLKASR method have a larger receptive field, allowing for more effective differentiation between the background and the target.

In summary, the RepLKASR-uint8 algorithm proposed in this study achieves superior super-resolution results in various scenarios, attaining the highest confidence in object detection. This validates the effectiveness of the proposed method.

Discussion

Analysis of reconstruction model limitations under noise and low-contrast conditions

There are performance variations among different datasets for each model. The proposed method in this paper showed outstanding performance on the Iray-boat and Iray-traffic datasets, which mainly focus on ship and vehicle scenes. In contrast, the performance of the models was slightly inferior on the M3FD-15 and Iray-15 datasets, which primarily consist of building and road scenes. This phenomenon may be related to the characteristics of the fine-tuning dataset (M3FD). Although the proposed method exhibited good scene generalization performance, its performance in specific scenes was not as good as models with smaller convolutional kernels.

As shown in the red box region in Fig. 17, the proposed method reconstructs the second letter “O” less effectively than the ECBSR method. This arises from the large receptive field of the large-kernel convolution: when the input low-resolution image contains excessive noise, the model tends to misidentify noise as valid features for reconstruction. Additionally, the first letter “O” is not reconstructed because of insufficient contrast. Notably, in the green box region, where the input image contains ambiguous information, the proposed method partially restores high-frequency details that ECBSR does not, indicating that in areas with lower noise levels, the large receptive field helps enhance reconstruction performance.

Performance gain bottlenecks with increasing model complexity

From Table 4 and Table 5, it can be observed that as the model complexity increases, the marginal effect of performance improvement gradually diminishes. On one hand, due to the lack of high-quality infrared datasets, the models may not have been sufficiently trained, resulting in limited performance improvement. On the other hand, the increase in model complexity significantly increases the consumption of computational resources, which may limit the practicality of the model in real-time applications or resource-constrained devices. For example, at a magnification factor of $\times$2, the improvement in PSNR and SSIM metrics for SMFANet and the proposed method compared to ECBSR and MAN-tiny is not significant. This indicates that in certain cases, further increasing model complexity may not lead to significant performance improvement but instead increase the consumption of computational resources.

Therefore, in future work, we plan to explore more efficient model architectures and training strategies to reduce computational costs while maintaining or improving performance. Specifically, we consider introducing lightweight modules (such as depth-wise separable convolution) or knowledge distillation techniques to optimize model design, while enhancing model generalization and training efficiency through data augmentation or transfer learning.
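As a rough indication of the savings that depth-wise separable convolution can offer, the sketch below compares weight counts for a standard convolution and its depth-wise separable counterpart. Biases are ignored, and the channel count and kernel size are hypothetical values chosen for illustration, not the configuration of the proposed network.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dw_separable_params(c_in, c_out, k):
    """Weights in a k x k depth-wise convolution followed by a 1 x 1 point-wise convolution."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 32 input channels, 32 output channels, 7x7 kernel.
standard = conv_params(32, 32, 7)           # 49 * 32 * 32 = 50176
separable = dw_separable_params(32, 32, 7)  # 49 * 32 + 32 * 32 = 2592
fraction = separable / standard             # about 0.052
```

For this toy layer the separable form needs about 5% of the weights, and the saving grows with kernel size, which is why it is a natural candidate for large-kernel designs.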

Conclusions

Existing infrared image super-resolution reconstruction networks often struggle to balance reconstruction performance and inference speed, making real-time processing challenging on resource-constrained edge computing platforms. To address this issue, this paper introduces, for the first time, the Large Kernel Reparameterization Attention mechanism. During training, it uses a multi-branch large kernel network to fully extract information; at inference time, this network is equivalently converted into a single-branch large kernel network, achieving a balance between processing performance and inference speed.
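The equivalence between the multi-branch and single-branch forms follows from the linearity of convolution. The minimal single-channel sketch below folds a small-kernel branch into a large-kernel branch by zero-padding the small kernel to the large kernel's size and adding the two. The kernel sizes, the two-branch layout, and the function names are assumptions for illustration; the actual REPLKA module contains more branches than shown here.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2-D cross-correlation with zero padding (same output size)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def merge_branches(k_large, k_small):
    """Fold a small odd-sized kernel into a large one by centring it with zero padding."""
    pad = (k_large.shape[0] - k_small.shape[0]) // 2
    return k_large + np.pad(k_small, pad)

rng = np.random.default_rng(0)
x = rng.standard_normal((12, 12))
k7 = rng.standard_normal((7, 7))  # hypothetical large-kernel branch
k3 = rng.standard_normal((3, 3))  # hypothetical small-kernel branch

y_train = conv2d_same(x, k7) + conv2d_same(x, k3)  # multi-branch (training form)
y_infer = conv2d_same(x, merge_branches(k7, k3))   # single-branch (inference form)
max_abs_diff = float(np.max(np.abs(y_train - y_infer)))
```

The two outputs agree up to floating-point rounding, so the merged kernel runs as a single convolution at inference time without changing the result.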

Compared to state-of-the-art SR methods with similar Params and FLOPs, REPLKASR improves PSNR on infrared datasets by 0.08 dB and SSIM by 0.0004. The REPLKASR model is deployed on the RK3588 neural network processor and combined with an uncooled infrared detector to build a real-time super-resolution reconstruction system for infrared scenes. This system achieves four-fold real-time super-resolution for 320$\times$180 images.

Data availability

The public data used in this work are listed here: 1. Flickr2K: https://openaccess.thecvf.com/content_cvpr_2017_workshops/w12/html/Lim_Enhanced_Deep_Residual_CVPR_2017_paper.html?ref=https://githubhelp.com; 2. DIV2K: https://openaccess.thecvf.com/content_cvpr_2017_workshops/w12/html/Agustsson_NTIRE_2017_Challenge_CVPR_2017_paper.html; 3. Set5: http://eprints.imtlucca.it/2412/; 4. Set14: https://link.springer.com/chapter/10.1007/978-3-642-27413-8_47; 5. Urban100: https://www.cv-foundation.org/openaccess/content_cvpr_2015/html/Huang_Single_Image_Super-Resolution_2015_CVPR_paper.html; 6. BSD100: https://doi.org/10.1109/ICCV.2001.937655; 7. Manga109: https://doi.org/10.1007/s11042-016-4020-z; 8. M3FD: https://openaccess.thecvf.com/content/CVPR2022/html/Liu_Target-Aware_Dual_Adversarial_Learning_and_a_Multi-Scenario_Multi-Modality_Benchmark_To_CVPR_2022_paper.html; 9. Iray infrared super-resolution dataset: http://openai.raytrontek.com/apply/Sea_shipping.html/.

References

Zhang, X. et al. Infrared image super resolution by combining compressive sensing and deep learning. Sensors 18, 2587 (2018).
Kwasniewska, A., Ruminski, J., Szankin, M. & Kaczmarek, M. Super-resolved thermal imagery for high-accuracy facial areas detection and analysis. Eng. Appl. Artif. Intell. 87, 103263 (2020).
Yuan, X. et al. Gradient residual attention network for infrared image super-resolution. Opt. Lasers Eng. 175, 107998 (2024).
Zou, Y. et al. Super-resolution reconstruction of infrared images based on a convolutional neural network with skip connections. Opt. Lasers Eng. 146, 106717 (2021).
Zhang, J., Ye, Y., Fang, F., Wang, T. & Zhang, G. Dmcsc: Deep multisource convolutional sparse coding model for pansharpening. IEEE Trans. Geosci. Remote. Sens. 61, 1–13 (2023).
Zhang, J., Fang, F., Wang, T., Zhang, G. & Song, H. Frdiff: Framelet-based conditional diffusion model for multispectral and panchromatic image fusion. IEEE Trans. Multimed. 27, 5989-6002 (2025).
Wang, D., Zhuang, L., Gao, L., Sun, X. & Zhao, X. Global feature-injected blind-spot network for hyperspectral anomaly detection. IEEE Geosci. Remote Sens. Lett. 21, 1-5 (2024).
Wang, D., Gao, L., Qu, Y., Sun, X. & Liao, W. Frequency-to-spectrum mapping gan for semisupervised hyperspectral anomaly detection. CAAI Trans. Intell. Technol. 8, 1258–1273 (2023).
Wertheimer, D. & Hariharan, B. Few-shot learning with localization in realistic settings. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 6558–6567 (2019).
Liu, Y., Zhang, Z., Niu, L., Chen, J. & Zhang, L. Mixed supervised object detection by transferring mask prior and semantic similarity. Adv. Neural Inf. Process. Syst. 34, 3978–3990 (2021).
Chen, J., Niu, L., Liu, L. & Zhang, L. Weak-shot fine-grained classification via similarity transfer. Adv. Neural Inf. Process. Syst. 34, 7306–7318 (2021).
Almasri, F. & Debeir, O. Multimodal sensor fusion in single thermal image super-resolution. In Computer Vision–ACCV 2018 Workshops: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers 14, 418–433 (Springer, 2019).
Huang, Y., Jiang, Z., Lan, R., Zhang, S. & Pi, K. Infrared image super-resolution via transfer learning and psrgan. IEEE Signal Process. Lett. 28, 982–986 (2021).
Li, Z., Sun, Y., Zhang, L. & Tang, J. Ctnet: Context-based tandem network for semantic segmentation. IEEE Trans. Pattern Analysis Mach. Intell. 44, 9904–9917 (2021).
Li, Z., Tang, H., Peng, Z., Qi, G.-J. & Tang, J. Knowledge-guided semantic transfer network for few-shot image recognition. IEEE Trans. Neural Netw. Learn. Syst. https://doi.org/10.1109/TNNLS.2023.3240195 (2023).
Ding, X., Guo, Y., Ding, G. & Han, J. Acnet: Strengthening the kernel skeletons for powerful cnn via asymmetric convolution blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 1911–1920 (Seoul, Korea (South), 2019).
Ding, X. et al. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13733–13742 (virtual, 2021).
Zhang, K. et al. AIM 2020 challenge on efficient super-resolution: Methods and results. In Computer Vision - ECCV 2020 Workshops, vol. 12537, 5–40 (Springer, Glasgow, UK, 2020).
Ding, X., Zhang, X., Han, J. & Ding, G. Diverse branch block: Building a convolution as an inception-like unit. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10886–10895 (virtual, 2021).
Zhang, X., Zeng, H. & Zhang, L. Edge-oriented convolution block for real-time super resolution on mobile devices. In Proceedings of the 29th ACM International Conference on Multimedia, 4034–4043 (2021).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778 (2016).
Simonyan, K. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
Ioffe, S. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015).
Ma, C. et al. Structure-preserving super resolution with gradient guidance. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 7769–7778 (2020).
Lim, B., Son, S., Kim, H., Nah, S. & Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 136–144 (2017).
Agustsson, E. & Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 126–135 (2017).
Liu, J. et al. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 5802–5811 (2022).
Chunliu, L. & Wang, S. Iray-ship image database. http://openai.raytrontek.com/apply/Sea_shipping.html (2021). Accessed 10 December 2021.
Diederik, P. K. Adam: A method for stochastic optimization. CoRR arXiv:1412.6980 (2014).
Athiwaratkun, B., Finzi, M., Izmailov, P. & Wilson, A. G. There are many consistent explanations of unlabeled data: Why you should average. arXiv preprint arXiv:1806.05594 (2018).
Bevilacqua, M., Roumy, A., Guillemot, C. & Alberi-Morel, M. L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the 23rd British Machine Vision Conference (BMVC). BMVA Press, 135.1–135.10 (2012).
Zeyde, R., Elad, M. & Protter, M. On single image scale-up using sparse-representations. In Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7, 711–730 (Springer, 2012).
Martin, D., Fowlkes, C., Tal, D. & Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings eighth IEEE international conference on computer vision. ICCV 2001, vol. 2, 416–423 (IEEE, 2001).
Huang, J.-B., Singh, A. & Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE conference on computer vision and pattern recognition, 5197–5206 (2015).
Matsui, Y. et al. Sketch-based manga retrieval using manga109 dataset. Multimed. tools applications 76, 21811–21838 (2017).
Shi, W. et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition, 1874–1883 (2016).
Dong, C., Loy, C. C. & Tang, X. Accelerating the super-resolution convolutional neural network. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, 391–407 (Springer, 2016).
Hui, Z., Gao, X., Yang, Y. & Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th acm international conference on multimedia, 2024–2032 (2019).
Wang, Y., Li, Y., Wang, G. & Liu, X. Multi-scale attention network for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5950–5960 (2024).
Zheng, M., Sun, L., Dong, J. & Pan, J. Smfanet: A lightweight self-modulation feature aggregation network for efficient image super-resolution. In European Conference on Computer Vision, 359–375 (Springer, 2024).
Mittal, A., Soundararajan, R. & Bovik, A. C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 20, 209–212 (2012).
Venkatanath, N., Praneeth, D., Bh, M. C., Channappayya, S. S. & Medasani, S. S. Blind image quality evaluation using perception based features. In 2015 twenty first national conference on communications (NCC), 1–6 (IEEE, 2015).

Acknowledgements

The authors would like to express their gratitude to the anonymous reviewers and editors who worked selflessly to improve our manuscript.

Funding

The project was supported by the Natural Science Foundation of Fujian Province of China (No.2023J01130137) and Fuzhou University (GXRC-18066).

Author information

Authors and Affiliations

Department of Rehabilitation Engineering, China Civil Affairs University, 102600, Beijing, China
Ran Wei
College of Mechanical Engineering and Automation, Fuzhou University, 350108, Fuzhou, China
Linze Zuo, Xuesong Wang & Xianyu Wu

Contributions

Conceptualization, X. Wu; Formal Analysis, R.W. and X. Wang; Methodology, L.Z. and X. Wu; Software, L.Z.; Validation, L.Z.; Investigation, R.W. and X. Wang; Data Curation, R.W.; Visualization, L.Z.; Writing – Original Draft Preparation, R.W. and L.Z.; Writing – Review and Editing, X. Wu; Supervision, X. Wu; Project Administration, X. Wu; Funding Acquisition, X. Wu. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Xianyu Wu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wei, R., Zuo, L., Wang, X. et al. Reparameterizable large kernel attention networks for infrared image super-resolution. Sci Rep 15, 40612 (2025). https://doi.org/10.1038/s41598-025-24193-3

Received: 30 August 2024
Accepted: 13 October 2025
Published: 18 November 2025
Version of record: 18 November 2025
DOI: https://doi.org/10.1038/s41598-025-24193-3
