Introduction

Climate profoundly impacts social and economic activities, making high-resolution climate prediction a crucial research area. Radar reflectivity data serve as fundamental information for analyzing and forecasting catastrophic weather events: higher radar resolution yields more detailed image structures, enabling earlier detection and warning of severe weather. Downscaling refers to making estimations at a finer spatial scale than that of the original datasets, with the aim of enhancing information and detail. Within the climate sciences, downscaling methods generally fall into two categories: dynamical climate downscaling models and statistical downscaling methods. Recent studies have applied Super-Resolution (SR) techniques to the climate downscaling problem. SR1,2,3 has long been a prominent topic in computer vision, aiming to generate high-resolution images from low-resolution inputs. Various SR methods have been introduced for reconstructing high-resolution images, with applications spanning diverse fields such as medical diagnostics4, transportation5 and the enhancement of precipitation data in climatology. SR models are mainly built from Convolutional Neural Networks (CNNs), which reconstruct High-Resolution (HR) images from Low-Resolution (LR) inputs and constitute a significant improvement over traditional interpolation methods. Equipped with a large number of learnable weights, CNNs can effectively model and capture the complex spatial patterns embedded in LR images. However, these algorithms primarily optimize the mean squared error, so the generated images tend to lack high-frequency content. Moreover, convolution is a local operation, which hinders the establishment of global dependencies and limits overall model performance. Very recently, Transformer-based models have been proposed for the SR problem. The central component of the Transformer is the Self-Attention (SA) mechanism6, which facilitates the establishment of global dependencies and thus alleviates the limitations of CNN-based algorithms. Several studies have further shown how the high complexity of global self-attention in Transformers can be mitigated.

Moreover, utilizing multiple types of information in deep learning, known as multimodal learning, is drawing growing interest7. With the rapid growth in the types of indicators available in the meteorological field, deep learning-based climate downscaling should also address the problem of multimodal learning. Radar data are more complex than standard images owing to their multiscale, multidimensional nature and susceptibility to factors such as illumination, cloud cover and sensor noise. Dual polarization radar provides rich polarimetric information, including the horizontal reflectivity factor \(\text {Z}_\text {H}\) indicating precipitation intensity, \(\text {Z}_\text {DR}\) representing the difference between horizontal and vertical echo intensity, and \(\text {K}_\text {DP}\) measuring the liquid water content of the atmosphere8,9. The patterns of \(\text {Z}_\text {DR}\), \(\text {K}_\text {DP}\) and \(\text {Z}_\text {H}\) reveal distinct characteristics of raindrop size and distribution, which may change dramatically during different evolution stages of storms and thus provide information about storm evolution. Inspired by this behaviour of polarimetric radar, we propose to integrate dual polarization radar data into a unified climate image to enhance model training. In this paper, we therefore propose a new model, the Climate Downscaling Dual Aggregation Transformer (CDDAT), which exploits multiple input variables to achieve more precise climate downscaling. Specifically, we first construct a multimodal fusion structure that uses CNNs to merge multiple radar variables (\(\text {Z}_\text {DR}\), \(\text {K}_\text {DP}\) and \(\text {Z}_\text {H}\)) into climate images. Shallow feature extraction is then implemented by a convolutional layer that maps the low-resolution input to latent vectors. Furthermore, we design a “CNN+Transformer” pattern for deep feature extraction, which is divided into two parts: a Lightweight CNN Backbone (LCB) and a DATB. On the one hand, the LCB reduces the size of intermediate feature maps and dynamically adjusts the feature-map size to extract high-frequency features while maintaining network depth. Its basic feature extraction unit is the Adaptive Residual Feature Block (ARFB)10, which adaptively adjusts the residual weights; the LCB thereby significantly reduces computational resource consumption and the number of parameters while enabling efficient visual feature extraction. On the other hand, we design the DATB to aggregate spatial and channel features by applying spatial self-attention and channel self-attention alternately in successive Transformer blocks. Because the DATB's self-attention uses shifted-window operations, it not only efficiently learns the relationships between similar local blocks but also captures additional spatial and channel information, providing more references for the super-resolved region. Finally, we use a pixel shuffle (pixel rearrangement) layer for image upscaling to achieve image reconstruction. Figure 1 gives the flowchart of our method. 
First, the multi-radar data are extracted from the source dataset, outliers are processed, and samples are selected according to the experimental standards. The multi-radar data are then preprocessed, including normalization, fused using a CNN, and fed into the proposed model as input. The LCB and DATB of our model transform the input from shallow features into deep features, and the HR image is reconstructed by the pixel shuffle layer. The main contributions are as follows:

  • We propose a CDDAT model based on dual aggregation Transformer to generate high-resolution climate predictions. To our knowledge, it is the first attempt to adopt the Transformer model in the SR domain to realize climate downscaling.

  • We propose a multimodal fusion method based on CNN modules that fuses data from multiple radar metrics. The dual polarization radar variables (\(\text {K}_\text {DP}\) and \(\text {Z}_\text {DR}\)) are fused as multimodal model input, which can mine dynamic structural information about convective precipitation beyond what a single variable provides.

  • We conduct extensive comparative experiments on the NJU-CPOL dataset to verify the effectiveness and superiority of CDDAT.

Fig. 1
figure 1

The flowchart of the proposed method.

Related work

Statistical Climate Downscaling (SD) is a classic tool for obtaining small-scale climate information11. The method relies on statistical relationships between General Circulation Model (GCM) outputs and regional historical observations, assumes that these relationships remain constant, and defines the predictand as a function of the predictors. The approach consists of two main components. First, empirical relationships between large-scale climate factors (predictors) and local climate factors (predictands) must be discovered and established12,13. Second, these empirical relationships are applied to outputs from global or regional models, including precipitation, temperature or barometric pressure. The methods for implementing SD tasks have gradually been enriched: regression models have been widely used to fit probability distributions14,15, and the non-homogeneous Hidden Markov Model (NHMM)16 was introduced to address the rainfall problem17,18. However, SD methods rely on statistical analysis of long historical records and obtain finer spatial information at the cost of ignoring spatial and temporal inconsistencies; their application is also limited in areas lacking measured data. In recent years, more and more climate models have adopted super-resolution techniques for further analysis of climate datasets. Deep learning algorithms for precipitation forecasting in the meteorological field mainly utilize CNNs and Generative Adversarial Networks (GANs). For example, DeepSD19 was first applied to the spatial downscaling task, integrating complex precipitation data into a single image using computer vision methods and leveraging the latent information of precipitation variables; it also uses topographic data as an input channel to account for geographic influences. Kumar et al.20 then obtained the optimal spatial distribution of rainfall magnitude over India based on DeepSD downscaling results. Models in the super-resolution domain use series of residual blocks and attention mechanisms to progressively alleviate the local receptive-field limitations of CNNs, and climate downscaling has closely followed these advances. Cheng et al.21 introduced a residual dense block into the LapSRN network, collecting high-resolution results at multiple scales from the corresponding levels, and conducted a detailed study of checkerboard-artifact elimination in the parameters of the deconvolutional layers. Sharma et al.22 introduced ResDeepD with a series of skip connections across residual blocks, yielding faster output and better results. Chiang et al.23 proposed networks with skip connections, attention blocks and auxiliary-data cascades for bias-corrected heterogeneous precipitation simulation data.

Recent climate downscaling methods include SRGAN-based models such as DeepDT24, PhIRE GAN25 and ProGAN26, in which a perceptual loss is designed to improve the subjective quality of reconstructed images; these have been shown to outperform other types of models, such as Augmented Convolutional Long Short-Term Memory (ConvLSTM) and U-Net27. Li et al.28 proposed an unsupervised model-guided coarse-to-fine fusion model for the hyperspectral image super-resolution task. By fusing deep image priors with degradation-model information, Li et al.29 presented a model-informed unsupervised method for the hyperspectral image super-resolution problem. Yao et al.30 built a multi-graph neural network for hyperspectral image classification. Ding et al.31 constructed a multi-scale receptive field GAT to classify hyperspectral images. Chen et al.32 designed a neural network to detect hyperspectral image changes. Recently, SCNet33 has emerged as a remarkable model, adept at extracting intricate structural information from data, which significantly enhances performance in tasks such as feature extraction and pattern recognition. Concurrently, OSEDiff34, a cutting-edge generative model, proficiently integrates multiple types of information to generate realistic, high-quality outputs, demonstrating great potential for complex data generation challenges.

However, the above models all select individual rainfall data or add terrain, temperature and other climate factors to assist downscaling. Although relevant data are added during training, these models lack the ability to reassign the importance of multiple variables. Radar indicators are subject to the inherent defects of radar systems, and the data fed into the model sometimes contain noise or unexpected deviations; the presence of low-quality variables reduces prediction accuracy. In contrast, our model can adaptively reassign the importance of multiple variables; that is, it decreases the weights of features that negatively affect the prediction and increases the weights of features that contribute positively.

Materials

Data

This study uses data from a C-band dual polarization weather radar operated by Nanjing University. The dataset (termed NJU-CPOL) was collected during 2014–2019 and covers 268 precipitation events. The original radar base data (\(\text {Z}_\text {H}\), \(\text {Z}_\text {DR}\) and \(\text {K}_\text {DP}\)) were obtained from NJU-CPOL for training, validation and testing. During model training, we use a vertical flip as data augmentation for the radar data. As shown in Fig. 2, bilinear interpolation is used to downsample each HR image to obtain the corresponding LR image: for the 2\(\times\) super-resolution experiment, the height and width of the downsampled LR image are 1/2 of those of the original HR image, while for the 4\(\times\) experiment they are 1/4 of the original. In addition, to increase the quality and diversity of the dataset and to verify the algorithm under rainfall conditions, samples in which part of the radar reflectivity information is invalid are discarded when constructing the training set. Note that each pixel of an image records the climate level at a certain time.
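
As a minimal illustration of the downsampling in Fig. 2, the LR input can be produced from an HR field with bilinear interpolation in PyTorch; the tensor shapes and the helper name make_lr are illustrative rather than part of the released code.

```python
import torch
import torch.nn.functional as F

def make_lr(hr: torch.Tensor, scale: int) -> torch.Tensor:
    """Downsample an HR radar field (B, C, H, W) by `scale` using bilinear interpolation."""
    return F.interpolate(hr, scale_factor=1.0 / scale, mode="bilinear", align_corners=False)

hr = torch.rand(1, 3, 256, 256)   # hypothetical HR tile with Z_H, Z_DR and K_DP channels
lr_x2 = make_lr(hr, 2)            # 2x experiment: height and width become 1/2 -> (1, 3, 128, 128)
lr_x4 = make_lr(hr, 4)            # 4x experiment: height and width become 1/4 -> (1, 3, 64, 64)
```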

Fig. 2
figure 2

Schematic diagram of downsampling process of rainfall image.

Data preprocessing

In this paper, a randomization strategy is adopted so that the data order differs after each shuffle, ensuring that the combination of training samples is sufficiently random. We apply 0–1 normalization, which maps the raw data into [0,1] through a linear transformation. The calculation is given by:

$$\begin{aligned} {{\textbf {X}}}^{*} = \frac{{\textbf {X}}-{{\textbf {X}}}_{min}}{{\textbf {X}}_{max}-{\textbf {X}}_{min}} \end{aligned}$$
(1)

where \(\textbf{X}_{min}\) and \({\textbf {X}}_{max}\) are the minimum and maximum values of the input data, respectively. Normalization ensures that the input data have a consistent distribution, which improves the training efficiency and performance of the model. In addition, some of the original radar data are treated as outliers: part of the radar reflectivity information is invalid, contributes little to the radar-image downscaling task and aggravates the imbalance of the dataset. For example, if the radar echoes of a single sample are all below 5 dBZ, these clutter data are discarded when forming the training set. In practice, especially in meteorology, the amount of data collected by radar observation stations is limited. Data augmentation transforms images or other types of data algorithmically, thereby increasing the diversity and quantity of the data. In this paper, a vertical flip is adopted to increase the number of valid samples, which both helps avoid overfitting and enhances the model's ability to predict and adapt to unseen data.
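
The two preprocessing operations above can be sketched in a few lines of PyTorch; the epsilon guard and the tensor shapes are assumptions added for this example, not part of the original pipeline.

```python
import torch

def min_max_normalize(x: torch.Tensor) -> torch.Tensor:
    """Eq. (1): map raw radar values linearly into [0, 1]."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min + 1e-8)   # epsilon avoids division by zero on flat fields

def vertical_flip(x: torch.Tensor) -> torch.Tensor:
    """Data augmentation: flip a (C, H, W) sample along the height axis."""
    return torch.flip(x, dims=[-2])

sample = torch.rand(3, 256, 256) * 60.0           # hypothetical raw reflectivity-like values
sample = min_max_normalize(sample)
augmented = vertical_flip(sample)
```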

Model implementation

We implemented our model in the PyTorch framework. There are a total of 7310 precipitation data samples, randomly split into 80% for training, 10% for validation and 10% for testing. Training is optimized by the Adam optimizer35 with default settings and a learning rate of \(10^{-4}\), and was performed on an NVIDIA GeForce RTX 2080Ti. CDDAT can be optimized with the loss functions commonly used for SR, such as the \(L_2\)1,36,37, \(L_1\)38,39,40 and perceptual losses41,42. For simplicity, given N ground-truth HR images \(\{{\textbf {X}}_{t, i}\}_{i=1}^{N}\), we optimize the parameters of CDDAT by minimizing the pixel-wise \(L_1\) loss between the reconstructed images \({\textbf {X}}_{h,i}\) and the ground truth:

$$\begin{aligned} L_1 =\frac{1}{N} \sum _{i=1}^{N} \Vert {\textbf {X}}_{h,i}-{\textbf {X}}_{t,i} \Vert _{1} \end{aligned}$$
(2)
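
A minimal sketch of this training setup is given below; the placeholder network and synthetic batch only stand in for CDDAT and the NJU-CPOL loader, and are not the released implementation.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))       # placeholder standing in for CDDAT
criterion = nn.L1Loss()                                     # pixel-wise L1 loss of Eq. (2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam with default settings, lr = 1e-4

lr_batch = torch.rand(4, 3, 64, 64)                         # synthetic LR batch (B, C, H, W)
hr_batch = torch.rand(4, 3, 64, 64)                         # matching ground-truth batch X_t

sr_batch = model(lr_batch)                                  # reconstructed images X_h
loss = criterion(sr_batch, hr_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```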

Methods

This section describes the proposed CDDAT. We first introduce the image preprocessing and downsampling procedures, and then elaborate on the core components of CDDAT: the Lightweight CNN Backbone (LCB) and the Dual Aggregation Transformer Backbone (DATB). These parts are introduced in the following subsections.

Fig. 3
figure 3

The fusion process of the multi-mode input image.

CNN for image preprocessing

As mentioned above, dual-polarization radar provides several measurement dimensions for weather forecasting. In this paper, the reflectivity factor (\(\text {Z}_\text {H}\)), differential reflectivity factor (\(\text {Z}_\text {DR}\)) and differential phase factor (\(\text {K}_\text {DP}\)) are used. A standard color image comprises three channels (red, green and blue), each representing one feature of the picture. Inspired by this structure, we populate each image channel with one radar variable to obtain a multi-channel climate image as model input. The generation of the model input proceeds as follows. First, we obtain the \(\text {Z}_\text {H}\), \(\text {Z}_\text {DR}\) and \(\text {K}_\text {DP}\) data from the NJU-CPOL dataset; after outlier processing of the raw radar data, a standard rainfall dataset is generated, with the precipitation of each sample recorded over the region. Then, we treat each radar indicator as a channel to obtain the three-channel climate image Ic. As shown in Fig. 3, \(\text {Z}_\text {H}\), \(\text {Z}_\text {DR}\) and \(\text {K}_\text {DP}\) are transformed from the radar measurements and fed into a 1\(\times\)1 convolution layer to produce the three-channel input of the proposed model. Transforming the radar measurements into multiple channels in this way reduces the complexity of the radar image input to the model and captures a richer set of non-linear features in the climate image.
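
A minimal sketch of this channel-stacking and 1\(\times\)1 fusion step is shown below; the module name and tensor sizes are illustrative.

```python
import torch
import torch.nn as nn

class RadarFusion(nn.Module):
    """Stack Z_H, Z_DR and K_DP as channels and mix them with a 1x1 convolution."""
    def __init__(self, out_channels: int = 3):
        super().__init__()
        self.mix = nn.Conv2d(3, out_channels, kernel_size=1)

    def forward(self, z_h, z_dr, k_dp):
        ic = torch.stack([z_h, z_dr, k_dp], dim=1)   # three-channel climate image Ic: (B, 3, H, W)
        return self.mix(ic)                          # channel mixing before the SR network

z_h, z_dr, k_dp = (torch.rand(2, 128, 128) for _ in range(3))
fused = RadarFusion()(z_h, z_dr, k_dp)               # -> (2, 3, 128, 128)
```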

Fig. 4
figure 4

The network architecture of our method.

Network structure

As shown in Fig. 4, the aim of this work is to obtain a high-resolution result \({{\textbf {I}}}_{SR}\) from a given low-resolution input \({{\textbf {I}}}_{LR}\). This can be written as \({{\textbf {I}}}_{SR}=Model({{\textbf {I}}}_{LR};\varvec{\theta })\), where \(\varvec{\theta }\) denotes the model parameters. The proposed CDDAT comprises three modules: shallow feature extraction, deep feature extraction and image reconstruction. Shallow feature extraction is performed by a convolution layer. Deep feature extraction is accomplished by the LCB and the DATB, where the LCB consists of a group of HPBs and the DATB is mainly composed of a Dual Spatial Transformer Backbone (DSTB) and a Dual Channel Transformer Backbone (DCTB) with global residual learning. Image reconstruction is achieved with a PixelShuffle layer. Initially, we employ a convolution layer on the LR input to generate the shallow feature \({{\textbf {F}}}_{0}\).

$$\begin{aligned} {{\textbf {F}}}_{0} = {f}_{s}({{\textbf {I}}}_{LR}) \end{aligned}$$
(3)

where \({f}_{s}\) denotes the shallow feature extraction layer and \({{\textbf {F}}}_{0}\) is the extracted shallow feature, which is then used as the input of the LCB with several High Preserving Blocks (HPBs).

$$\begin{aligned} {{\textbf {F}}}_{n} = {f}^{n}_{HPB} ({f}^{n-1}_{HPB}(...({f}^{1}_{HPB}({{\textbf {F}}}_{0})))) \end{aligned}$$
(4)

where \({f}^{n}_{HPB}\) denotes the mapping of the \(n\)-th HPB and \({{\textbf {F}}}_{n}\) represents the output of the \(n\)-th HPB. The outputs of all HPBs are then aggregated by the DATB:

$$\begin{aligned} {{\textbf {F}}}_{D} = {f}^{n}_{D} ({f}^{n-1}_{D}(...({f}^{1}_{D}[{{\textbf {F}}}_{1},{{\textbf {F}}}_{2},...,{{\textbf {F}}}_{n}]))) \end{aligned}$$
(5)

where \({{\textbf {F}}}_{D}\) is the output of DATB and \({f}_{D}\) stands for the operation of DATB. Finally, \({{\textbf {F}}}_{D}\) and \({{\textbf {F}}}_{0}\) are simultaneously fed into the reconstruction module to get the SR image \({{\textbf {I}}}_{SR}\).

$$\begin{aligned} {{\textbf {I}}}_{SR} = f({f}_{P} (f({{\textbf {F}}}_{D}+{{\textbf {F}}}_{0}))) \end{aligned}$$
(6)

where f and \({f}_{P}\) stand for the convolutional layer and PixelShuffle layer, respectively.
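
The overall pipeline of Eqs. (3)–(6) can be sketched structurally as follows; the HPB and DATB bodies are replaced by simple convolutional stand-ins, and the widths, block counts and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CDDATSkeleton(nn.Module):
    """Structural sketch: shallow conv (Eq. 3) -> HPB chain (Eq. 4) -> DATB (Eq. 5) -> reconstruction (Eq. 6)."""
    def __init__(self, channels=3, width=64, scale=2, n_hpb=3):
        super().__init__()
        self.shallow = nn.Conv2d(channels, width, 3, padding=1)     # f_s
        self.hpbs = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU()) for _ in range(n_hpb)]
        )                                                           # stand-ins for the HPBs
        self.datb = nn.Conv2d(width * n_hpb, width, 1)              # stand-in for the DATB on [F_1, ..., F_n]
        self.pre = nn.Conv2d(width, width * scale * scale, 3, padding=1)   # inner conv f of Eq. (6)
        self.shuffle = nn.PixelShuffle(scale)                       # f_P
        self.out = nn.Conv2d(width, channels, 3, padding=1)         # outer conv f of Eq. (6)

    def forward(self, i_lr):
        f0 = self.shallow(i_lr)
        feats, f = [], f0
        for hpb in self.hpbs:
            f = hpb(f)
            feats.append(f)                                         # F_1 ... F_n
        fd = self.datb(torch.cat(feats, dim=1))                     # F_D
        return self.out(self.shuffle(self.pre(fd + f0)))            # I_SR

sr = CDDATSkeleton()(torch.rand(1, 3, 64, 64))                      # -> (1, 3, 128, 128)
```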

Skip connections

To avoid gradient vanishing when training a deep model, skip connections are widely used, both for residual learning and for reducing the number of training parameters. As shown in Fig. 5, we combine a non-linear mapping \(f({{\varvec{x}}})\) with its input \({{\varvec{x}}}\) to form residual learning, i.e., the original mapping is rewritten as \(f({{\varvec{x}}})+{{\varvec{x}}}\). The residual is generally easier to optimize than the original mapping; in the extreme case where the desired mapping is the identity, learning a zero residual is easier than fitting the identity with stacked non-linear layers43. We also apply several skip connections in the subsequent Transformer modules, for example in the High Preserving Block (HPB) and its core component, the Adaptive Residual Feature Block (ARFB). The HPB is designed to maximize the utilization of feature information under a low computational budget, thereby avoiding the sharp performance drop caused by reducing network depth, and the adaptive mechanism of the ARFB enables the model to adjust to different input contents. In addition, inspired by VDSR1, this paper uses skip connections for global residual learning. Different from1,43, our skip connections are combined with Transformer modules to facilitate global modeling in Transformer architectures.

Fig. 5
figure 5

Illustration of skip connections.

Lightweight CNN backbone (LCB)

The constructed radar image dataset is multidimensional data with spatial correlation, and dual-polarization radar images exhibit high local spatial correlation and rich texture features. To avoid occupying a large amount of GPU memory, a lightweight CNN backbone is used to extract shallow features from the input image \({\textbf {I}}_{LR}\), giving the model an initial super-resolution capability. As shown in Fig. 4, the LCB is composed of a series of HPBs, whose main building block is the ARFB. As displayed in Fig. 6b, the ARFB consists of two residual units (RUs) and two convolutional layers. Assuming that \({{\textbf {x}}}_{ru}\) is the input of an RU, the RU can be formulated as:

$$\begin{aligned} {\textbf{y}}_{ru} = \lambda _{res} \cdot {f}_{ex}({f}_{re}({{\textbf {x}}}_{ru})) + \lambda _{x}\cdot {{\textbf {x}}}_{ru} \end{aligned}$$
(7)
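
A minimal sketch of the residual unit of Eq. (7) and of the ARFB in Fig. 6b follows; treating \(f_{re}\) and \(f_{ex}\) as channel-reducing and channel-expanding convolutions, the 1\(\times\)1/3\(\times\)3 layer choices and the concatenation of the two RU outputs are assumptions of this example, while \(\lambda _{res}\) and \(\lambda _{x}\) are the learnable (adaptive) residual weights mentioned above.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Eq. (7): y = lambda_res * f_ex(f_re(x)) + lambda_x * x, with learnable scalar weights."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, channels // 2, 3, padding=1)   # assumed f_re
        self.expand = nn.Conv2d(channels // 2, channels, 3, padding=1)   # assumed f_ex
        self.lambda_res = nn.Parameter(torch.ones(1))                    # adaptive residual weight
        self.lambda_x = nn.Parameter(torch.ones(1))                      # adaptive identity weight

    def forward(self, x):
        return self.lambda_res * self.expand(self.reduce(x)) + self.lambda_x * x

class ARFB(nn.Module):
    """Two residual units followed by two convolutions, as in Fig. 6b."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.ru1, self.ru2 = ResidualUnit(channels), ResidualUnit(channels)
        self.conv1 = nn.Conv2d(2 * channels, 2 * channels, 1)            # assumed 1x1 mixing conv
        self.conv2 = nn.Conv2d(2 * channels, channels, 3, padding=1)     # assumed 3x3 output conv

    def forward(self, x):
        y1 = self.ru1(x)
        y2 = self.ru2(y1)
        y = torch.cat([y1, y2], dim=1)
        return self.conv2(self.conv1(y)) + x                             # block-level skip connection

out = ARFB()(torch.rand(1, 64, 32, 32))                                  # -> (1, 64, 32, 32)
```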

The HPB, built mainly from ARFBs, is used to extract the high-frequency features of rainfall maps. As shown in Fig. 6a, the input feature map \({\textbf {Q}}_{s}\) is average-pooled and then upsampled to obtain \({\textbf {Q}}_{u}\), which represents the average (low-frequency) information of \({\textbf{Q}}_{s}\) and is used to compute the high-frequency information \(\textbf{P}_{h}\). The purpose of this operation is to preserve the details and edges of the feature map before the downsampled features are processed further. This process is formulated as:

$$\begin{aligned} {\textbf{P}}_{h} ={\textbf{Q}}_{s} - {\textbf{Q}}_{u} \end{aligned}$$
(8)

The downsampled feature maps are denoted as \({{\textbf {F}}}^{'}_{n-1}\). Several ARFBs are used to explore the latent information needed to complete the SR image; note that these ARFBs share weights to reduce the number of parameters. Meanwhile, a single ARFB processes \({{\textbf{P}}}_{h}\) to align its feature space with \({{\textbf{F}}}^{'}_{n-1}\), yielding \({{\textbf{P}}}^{'}_{h}\). After feature extraction, \({{\textbf{F}}}^{'}_{n-1}\) is upsampled to the original size by bilinear interpolation and fused with \({{\textbf{P}}}^{'}_{h}\) to preserve the initial details, producing the feature \({{\textbf{F}}}^{''}_{n-1}\). This operation can be expressed as

$$\begin{aligned} {\textbf{F}}^{''}_{n-1} =[f_{a}({{\textbf{P}}}_{h}), u(f^{4}_{a}(d({{\textbf{F}}}^{'}_{n-1})))] \end{aligned}$$
(9)

where u and d denote the upsampling and downsampling operations, respectively, and \(f_{a}\) denotes the ARFB operation. A 1\(\times\)1 convolution layer is then used to reduce the number of channels, and a channel attention module is employed to highlight channels with high activation values. Finally, an ARFB extracts the final features, and a global residual connection adds the original features \({{\textbf{F}}}^{'}_{n-1}\) to obtain \({{\textbf{F}}}_{n}\).
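
The high-frequency separation of Eq. (8) can be sketched as follows; the pooling window and bilinear upsampling are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def high_frequency_split(q_s: torch.Tensor, pool_size: int = 2):
    """Eq. (8): P_h = Q_s - Q_u, where Q_u is the average-pooled and re-upsampled component of Q_s."""
    q_u = F.interpolate(F.avg_pool2d(q_s, pool_size), size=q_s.shape[-2:],
                        mode="bilinear", align_corners=False)
    p_h = q_s - q_u            # high-frequency residual: edges and fine rainfall structure
    return p_h, q_u

q_s = torch.rand(1, 64, 64, 64)        # (B, C, H, W) feature map entering the HPB
p_h, q_u = high_frequency_split(q_s)
```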

Fig. 6
figure 6

(a) The architecture of the High Preserving Block (HPB). (b) The Adaptive Residual Feature Block (ARFB).

Dual aggregation transformer block based on BSConv

Blueprint Separable Convolution (BSConv) decomposes a standard convolution into a pointwise convolution followed by a depthwise convolution, i.e., the reverse ordering of the depthwise separable convolution (DSConv). A study44 shows that BSConv separates the standard convolution more effectively. Compared with DSConv, BSConv applies the pointwise convolution first to achieve interaction between channels before spatial filtering.
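
A BSConv layer in this ordering can be written directly with grouped convolutions; the kernel size and channel counts below are illustrative.

```python
import torch
import torch.nn as nn

class BSConv(nn.Module):
    """Blueprint separable convolution: 1x1 pointwise conv followed by a depthwise conv."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)                         # cross-channel mixing
        self.depthwise = nn.Conv2d(out_ch, out_ch, kernel_size,
                                   padding=kernel_size // 2, groups=out_ch)  # per-channel spatial filtering

    def forward(self, x):
        return self.depthwise(self.pointwise(x))

y = BSConv(64, 64)(torch.rand(1, 64, 32, 32))   # same spatial size, 64 output channels
```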

The Dual Aggregation Transformer Block (DATB) based on blueprint convolution is equipped with Adaptive Self-Attention (ASA) and a Blueprint Spatial Gate Feed-forward Network (BSGFN)45, which enhances the modeling of spatial relationships and feature interactions within the data. As shown in Fig. 7, when an input feature is fed into the n-th DATB block, the computation is defined as follows:

$$\begin{aligned} {\textbf{X}}^{'}_{n} = \text {ASA}(\text {LN}({\textbf{X}}_{n-1})) + {\textbf{X}}_{n-1} \end{aligned}$$
(10)
$$\begin{aligned} {\textbf{X}}_{n} = \text {BSGFN}(\text {LN}({\textbf{X}}^{'}_{n})) + {\textbf{X}}^{'}_{n} \end{aligned}$$
(11)

where \({\textbf{X}}_{n}\) is the output feature and LN(\(\cdot\)) is the LayerNorm layer. ASA denotes adaptive self-attention (see Fig. 8), which includes Adaptive Spatial Self-Attention (AS-SA) and Adaptive Channel Self-Attention (AC-SA). Spatial Window Self-Attention (SW-SA): as shown in Fig. 9a, the query, key and value matrices (denoted \(\textbf{Q}\), \(\textbf{K}\) and \(\textbf{V}\), respectively) are generated by linear projection, where all matrices lie in \(\textbf{R}^{H \times W \times C}\). SW-SA is defined as:

$$\begin{aligned} \text {SW}-\text {SA}(\textbf{X})={\textbf{Y}}_{s}{\textbf{W}}_{p} \end{aligned}$$
(12)

where \({\textbf{W}}_{p} \in {{\textbf{R}}^{C \times C}}\) is the linear projection that fuses all features. The feature \({\textbf{Y}}_{s} \in {{\textbf{R}}^{H \times W \times C}}\) is obtained by reshaping and concatenating all \({\textbf{Y}}_{s}^{i}\), where \({\textbf{Y}}_{s}^{i}\) is the spatial self-attention output of the i-th head.
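
The window mechanics behind SW-SA can be sketched as below, assuming the window size divides the feature-map height and width; a plain multi-head attention stands in for the per-window attention (its output projection plays the role of \({\textbf{W}}_{p}\) in Eq. (12)), and the shifted-window variant additionally offsets the window grid between successive blocks.

```python
import torch
import torch.nn as nn

def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """Split a (B, H, W, C) map into non-overlapping ws x ws windows: (B * nW, ws*ws, C)."""
    b, h, w, c = x.shape
    x = x.view(b, h // ws, ws, w // ws, ws, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, c)

def window_reverse(windows: torch.Tensor, ws: int, h: int, w: int) -> torch.Tensor:
    """Inverse of window_partition: (B * nW, ws*ws, C) -> (B, H, W, C)."""
    b = windows.shape[0] // ((h // ws) * (w // ws))
    x = windows.view(b, h // ws, w // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, -1)

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)  # stand-in for SW-SA
feat = torch.rand(2, 32, 32, 64)                  # (B, H, W, C)
wins = window_partition(feat, ws=8)               # (2 * 16, 64, 64)
y, _ = attn(wins, wins, wins)                     # window-local self-attention
out = window_reverse(y, ws=8, h=32, w=32)         # back to (2, 32, 32, 64)
```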

Channel-Wise Self-Attention (CW-SA): as shown in Fig. 9b, the self-attention in CW-SA is performed along the channel dimension. Following the principle of SW-SA, the channels are split into heads and attention is applied to each head individually. Finally, the attention features \({\textbf{Y}}_{c} \in {{\textbf{R}}^{H \times W \times C}}\) are obtained by concatenating and reshaping all \({\textbf{Y}}_{c}^{i}\).
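
A sketch of the channel-wise branch is given below; it computes attention maps of size \(C_h \times C_h\) per head instead of \(N \times N\), and the scaling factor and head count are choices made for this example.

```python
import torch
import torch.nn as nn

class ChannelWiseSelfAttention(nn.Module):
    """CW-SA sketch: attention across channels, so the cost stays linear in the number of pixels."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)            # output projection, analogous to W_p

    def forward(self, x):                          # x: (B, N, C) tokens from an H x W map
        b, n, c = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (B, heads, C_h, N) so channels attend to channels
        q = q.transpose(1, 2).reshape(b, self.heads, c // self.heads, n)
        k = k.transpose(1, 2).reshape(b, self.heads, c // self.heads, n)
        v = v.transpose(1, 2).reshape(b, self.heads, c // self.heads, n)
        attn = (q @ k.transpose(-2, -1)) * (n ** -0.5)    # (B, heads, C_h, C_h)
        y = attn.softmax(dim=-1) @ v                      # (B, heads, C_h, N)
        y = y.reshape(b, c, n).transpose(1, 2)            # back to (B, N, C)
        return self.proj(y)

y = ChannelWiseSelfAttention(64)(torch.rand(2, 32 * 32, 64))
```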

As depicted in Fig. 8a,b, the self-attention mechanisms AS-SA and AC-SA are built on SW-SA and CW-SA, respectively. AS-SA enhances the model's ability to capture contextually relevant spatial dependencies while maintaining computational efficiency, and AC-SA enables context-aware feature refinement across channels. The process is defined as

$$\begin{aligned} \text {AS} - \text {SA}(\textbf{X}) = \left( \text {C} - \text {I}({\textbf{Y}}_{s},{\textbf{Y}}_{w}) + \text {S} - \text {I}({\textbf{Y}}_{w},{\textbf{Y}}_{s}) \right) {\textbf{W}}_{p} \end{aligned}$$
(13)
$$\begin{aligned} \text {AC} - \text {SA}(\textbf{X}) = \left( \text {S} - \text {I}({\textbf{Y}}_{c},{\textbf{Y}}_{w}) + \text {C} - \text {I}({\textbf{Y}}_{w},{\textbf{Y}}_{c}) \right) {\textbf{W}}_{p} \end{aligned}$$
(14)

where \({\textbf{Y}}_{s}\), \({\textbf{Y}}_{c}\) and \({\textbf{Y}}_{w}\) are the outputs of SW-SA, CW-SA and BSConv defined above, \({\textbf{W}}_{p}\) is the same projection matrix as in Eq. (12), and S-I and C-I are shown in Fig. 10. Compared with serial or parallel designs of attention mechanisms, AS-SA and AC-SA have their own advantages. First, they effectively combine local convolutional information with global self-attention information, improving the quality of feature fusion; a simple additive combination can misalign features, whereas the adaptive interaction makes the outputs of the two branches adapt to each other and thereby optimizes the fusion. Second, AS-SA uses complementary cues from channel interaction to improve channel modeling, while AC-SA enhances representation capability through spatial interaction. Importantly, with the adaptive interactions the convolutional branch can capture global information in the same way as self-attention, improving the quality of the convolutional output. Finally, we use the pixel shuffle method from the efficient sub-pixel convolutional neural network (ESPCN) and add another convolution layer to refine the result, as shown in Fig. 11, so that the detailed spatial layout of rainfall in radar images can be further restored.
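
Putting these pieces together, the wiring of a single DATB block (Eqs. (10)–(11)) can be sketched as follows; plain multi-head attention and a two-layer MLP merely stand in for ASA and BSGFN, so only the residual/LayerNorm structure is taken from the text.

```python
import torch
import torch.nn as nn

class SelfAttentionStandIn(nn.Module):
    """Plain multi-head self-attention, used here only as a stand-in for the ASA module."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return out

class DATBlock(nn.Module):
    """Eqs. (10)-(11): X' = ASA(LN(X)) + X, then X_out = BSGFN(LN(X')) + X'."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.asa = SelfAttentionStandIn(dim)                 # the real ASA alternates AS-SA and AC-SA
        self.bsgfn = nn.Sequential(nn.Linear(dim, 2 * dim),  # stand-in for BSGFN
                                   nn.GELU(),
                                   nn.Linear(2 * dim, dim))

    def forward(self, x):                                    # x: (B, N, C) tokens
        x = self.asa(self.norm1(x)) + x                      # Eq. (10)
        return self.bsgfn(self.norm2(x)) + x                 # Eq. (11)

out = DATBlock(64)(torch.rand(2, 16 * 16, 64))               # -> (2, 256, 64)
```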

Fig. 7
figure 7

DATB based on BSConv. (a) Dual Spatial Transformer Backbone (DSTB) based on BSConv. (b) Dual Channel Transformer Backbone (DCTB) based on BSConv.

Fig. 8
figure 8

Diagram of attention blocks. (a) Adaptive Spatial Self-Attention (AS-SA). (b) Adaptive Channel Self-Attention (AC-SA).

Fig. 9
figure 9

The complete architecture of self-attention. (a) Spatial Window Self-Attention (SW-SA). (b) Channel-Wise Self-Attention (CW-SA).

Fig. 10
figure 10

Illustration of adaptive self-attention. (a) S-I. (b) C-I. (c) BSGFN.

Fig. 11
figure 11

Pixel shuffle from ESPCN. This one-step upscaling layer rearranges the pixels along the channel axis into a larger image.

Data availability

The core code of our method is available on GitHub (https://github.com/MM-miao11/CDDAT) and archived via Zenodo (https://doi.org/10.5281/zenodo.16784015).

Results

The performance of the generated climate images

The Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) were selected to assess the reconstruction performance on the weather radar data and the similarity to the original high-resolution images. The higher the PSNR value, the better the reconstruction performance of the algorithm. SSIM ranges between 0 and 1, and the higher the SSIM value, the closer the reconstructed climate image is to the original in structure.
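
For reference, PSNR can be computed directly from the mean squared error of normalized images, as sketched below; SSIM is typically evaluated with an off-the-shelf implementation such as skimage.metrics.structural_similarity.

```python
import torch

def psnr(sr: torch.Tensor, hr: torch.Tensor, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images with values in [0, data_range]."""
    mse = torch.mean((sr - hr) ** 2)
    return float(10.0 * torch.log10(data_range ** 2 / mse))

print(psnr(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))
```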

Comparison of subjective effects of radar precipitation data reconstruction

To assess the performance of CDDAT on radar images, the effectiveness of the proposed method can first be evaluated by directly inspecting the output images. In the overall subjective comparison at a magnification factor of 2, shown in Fig. 12, significant differences between the super-resolution models can be observed. The images reconstructed by the Bicubic model are the most blurred, with unclear edges, overall dim brightness and the largest amount of accompanying noise. In comparison, the reconstructions produced by the SRCNN algorithm show improvements in clarity and brightness recovery, with some enhancement of contour details. The high-resolution precipitation distribution maps generated by the CDDAT model are visually similar to those of DAT, SRGAN, SCNet and OSEDiff. As shown in Fig. 13, the images reconstructed by the Bicubic model remain the worst at a magnification factor of 4. Unlike the \(\times\)2 case, the SRGAN results at a factor of 4 are fuzzy, with severe artifacts and unclear contour details, and the SCNet results are slightly fuzzy. The reconstructions of the DAT, OSEDiff and CDDAT models are close to the high-resolution rainfall map (HR), indicating that these models give the best rainfall prediction; the objective indicators of the CDDAT and DAT models are compared in the following section. To evaluate the models more clearly, this paper focuses on selected rainfall images generated during severe storm weather. In these images the color depth directly reflects rainfall, and most models tend to produce overly smooth edge boundaries and exaggerated dark patterns when reconstructing dark areas, that is, the parts of the image where red and orange regions merge. We therefore conduct a detailed comparative analysis of this problem to comprehensively evaluate the performance of each model in rainfall-map reconstruction. Faced with complex precipitation distributions, such as localized heavy rainfall, some small echo information may be neglected, as highlighted in Fig. 14 with a black box. Within the black box, during the downscaling process, SRCNN, SRGAN, SCNet and OSEDiff lose small amounts of rainfall information, whereas the Bicubic algorithm, despite producing blurry images, does not lose radar information. Here, the DAT and CDDAT models perform well in heavy rainfall scenes, preserving the small-rainfall information; the rainfall images generated by CDDAT have the highest overall brightness and clarity and effectively suppress unclear texture noise. During downscaling, the generated rainfall images contain more dark areas, which indicate regions of high rainfall; although noise is reduced during generation, noise points are still present in these dark areas. A higher magnification factor better highlights the differences between models, so we selected strong-precipitation images downscaled by a factor of 4, i.e., images with extensive dark rainfall areas, to compare the outputs of the SRGAN, DAT, SRCNN, SCNet, OSEDiff and CDDAT models. As shown in Fig. 15, SRGAN, SCNet and OSEDiff produce ghosting, which affects the representation of dark areas.
The proposed CDDAT model, with its high-frequency extraction module added on top of the basic DAT model, generates more detailed high-frequency information, which benefits the representation of dark areas and yields the best results.

Fig. 12
figure 12

Comparison of subjective visual effects of rainfall data super-resolution reconstruction results by different methods. The downscaling scale is ×2.

Fig. 13
figure 13

Visual comparison of precipitation prediction for rainfall data super-resolution reconstruction results by different methods. The downscaling scale is ×4.

Fig. 14
figure 14

Comparison of subjective visual effects of rainfall data super-resolution reconstruction results by different methods. The downscaling scale is ×2.

Fig. 15
figure 15

Comparison of subjective visual local effects of different methods on super-resolution reconstruction of rainfall data. The downscaling scale is ×4.

Table 1 Average peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) results of the reconstructed echo of the NJU-CPOL radar.
Fig. 16
figure 16

Comparison of super-resolution results obtained for dew point.

Comparison of objective indexes of radar precipitation data reconstruction

As shown in Table 1, the results indicate that Transformer-based models outperform traditional super-resolution methods at all scales and perform the best overall. Specifically, CDDAT, which incorporates the high-frequency extraction module and the dual-aggregation Transformer blocks proposed in this paper, achieves the best performance. This suggests that, compared with CNN-based models, Transformer-based models are also effective and show further improvement over models based on adversarial networks. For a scaling factor of 2, the proposed CDDAT network achieves the highest PSNR: the PSNR of DAT is 0.16 dB higher than that of SRGAN, while CDDAT is 0.27 dB higher than SRGAN (an increase of 0.82%). Compared with the SRCNN model, the PSNR improves significantly from 35.52 to 38.02 when using CDDAT. In comparison to the baseline model DAT46, CDDAT achieves a 0.11 dB increase in PSNR (a 0.29% improvement), together with a minor increase in SSIM, further confirming the performance gain of CDDAT. For a scaling factor of 4, CDDAT's PSNR is 0.15 dB higher than SRGAN and 0.09 dB higher than DAT. In conclusion, the CDDAT super-resolution network demonstrates clear advantages for meteorological rainfall data, performing strongly on the rainfall datasets and confirming the effectiveness of the algorithm for the rainfall downscaling task.

Ablation experiments

Next, we study the effectiveness of the downscaling scheme proposed in this paper.

  • To test the effectiveness of the multi-radar indicator fusion scheme proposed in this paper for downscaling methods, different fusion schemes are selected for comparison for the three indicators \(Z_{H}\), \(Z_{DR}\), and \(K_{DP}\).

  • To test the effectiveness of the high-frequency extraction module, the DAT and MC-DAT models are selected for the comparative experiments. Since the subjective demonstrations are too similar, the comparison mainly focuses on the performance of objective indicators.

The results of the ablation experiments are shown in Table 2. In both the DAT and MC-DAT models, the scheme that fuses all three indicators demonstrates the best overall performance, with the highest PSNR and SSIM values. However, for down-sampling by a factor of 4, training the MC-DAT model with the fused \(Z_{H}\) and \(Z_{DR}\) yields the highest PSNR value, which reaches 32.93. The comparison shows that adding \(K_{DP}\) to the input data has little impact on the overall distribution predicted by the model. For both the DAT and MC-DAT models, the PSNR values of the (\(Z_{H}\), \(Z_{DR}\)) combination are higher than those of the (\(Z_{H}\), \(K_{DP}\)) combination. For the DAT model, the PSNR results of the (\(Z_{H}\), \(Z_{DR}\)) and (\(Z_{H}\), \(Z_{DR}\), \(K_{DP}\)) fusions are identical, at 32.83.

However, we find that \(K_{DP}\) has a certain ability to enhance structural details. When down-sampling by a factor of 4, the SSIM value obtained by training the MC-DAT model with the fused dataset (\(Z_{H}\), \(Z_{DR}\), \(K_{DP}\)) is the highest, at 0.8728. For both the DAT and MC-DAT models, at down-sampling factors of 2 and 4, the SSIM results of the models trained with the three-indicator fusion are higher than those trained with the single \(Z_{H}\) indicator or with the fused datasets (\(Z_{H}\), \(Z_{DR}\)) and (\(Z_{H}\), \(K_{DP}\)). The results show that the \(K_{DP}\) indicator contributes little to the prediction of the overall rainfall distribution but helps to increase the detailed information in rainfall images. On balance, the \(K_{DP}\) indicator can therefore be retained as one of the model inputs to enhance the detailed edge information of refined rainfall.

In conclusion, on the one hand, the above results demonstrate the effectiveness of the multi-radar-indicator fusion scheme proposed in this paper: the inclusion of the \(Z_{DR}\) and \(K_{DP}\) indicators adds micro-physical information about raindrop particles, which is beneficial for generating the detailed edges of rainfall images. On the other hand, the results also show that MC-DAT improves performance thanks to the added high-frequency extraction module, which enhances the model's ability to capture and extract high-frequency rainfall information.

Table 2 Comparison results of different indicators fusion.

Comparative study on additional dataset

To further verify the generalization of our model, we carried out additional experimental evaluations on a different dataset47, which contains urban climate information at 2.5 km (LR) and 250 m (HR) resolutions obtained by physical modelling paradigms. The Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are used to evaluate the reconstruction of dew point data for the different models:

$$\begin{aligned} MAE = \frac{1}{N}\sum _{n = 1}^{N} \left| G_{n}^{HR} - y_{n}^{HR} \right| \end{aligned}$$
(15)
$$\begin{aligned} RMSE = \sqrt{\frac{1}{N} \sum _{n = 1}^{N} \left( G_{n}^{HR} - y_{n}^{HR} \right) ^2} \end{aligned}$$
(16)

where \(G_{n}^{HR}\) denotes the n-th ground truth, \(y_{n}^{HR}\) denotes the n-th SR model output and N is the number of samples.
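
Both metrics are straightforward to compute over the reconstructed fields, for example:

```python
import torch

def mae(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Eq. (15): mean absolute error over all samples and pixels."""
    return float(torch.mean(torch.abs(pred - target)))

def rmse(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Eq. (16): root mean squared error over all samples and pixels."""
    return float(torch.sqrt(torch.mean((pred - target) ** 2)))
```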

From Fig. 16 it is clear that the SR results from CDDAT contain more refined small-scale features than those of the other models. Table 3 shows that CDDAT outperforms the other methods with respect to MAE and RMSE. The better performance of CDDAT arises because its multimodal fusion architecture provides more physically consistent information.

Table 3 MAE and RMSE for the reconstruction performance of dew point.
Table 4 Comparison of model complexity.

Complexity analysis

From Tables 1, 2, 3 and 4, it is clear that the proposed model (33.9M parameters, 100.76G FLOPs) performs better than DAT (33.7M parameters, 100.12G FLOPs) while being only slightly larger in parameters and FLOPs, which shows the efficiency of our proposed model. Compared with SRCNN (0.068M parameters, 0.055G FLOPs) and SRGAN (1.55M parameters, 53.67G FLOPs), our method shows a good trade-off between performance and model complexity.
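
For reference, parameter counts such as those in Table 4 can be read directly from a PyTorch module as sketched below; FLOPs are usually measured with an external profiler (for example thop or fvcore) on a fixed input size, which is an assumption of this note rather than a detail from the paper.

```python
import torch.nn as nn

def count_parameters_m(model: nn.Module) -> float:
    """Number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```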

Conclusions

In this paper, we propose a novel Climate Downscaling Dual Aggregation Transformer (CDDAT) for dual polarization radar precipitation data from convective storm weather. We first adopt a Lightweight CNN Backbone (LCB) that extracts high-frequency features from rainfall images and then utilize a Dual Aggregation Transformer Backbone (DATB) that models the spatial and channel dimensions of rainfall images by alternately applying self-attention for spatial and channel feature aggregation. Finally, we chose NJU-CPOL dual-polarization weather radar data to evaluate the proposed method using PSNR and SSIM. Extensive experimental results show that our model recovers the echo edges and details of severe convective weather more accurately than existing methods, which is of great significance for the early warning of severe convective weather. The proposed model can enhance disaster preparedness by providing detailed precipitation information. A potential limitation is that the model requires high-resolution training data, which may be sparse in developing regions.