Abstract
Land Surface Temperature (LST) is widely recognized as a sensitive indicator of climate change and plays a significant role in ecological research. The ERA5-Land LST dataset, developed and managed by the European Centre for Medium-Range Weather Forecasts (ECMWF), is extensively used for global and regional LST studies. However, its fine-scale application is limited by its low spatial resolution. Therefore, to improve the spatial resolution of ERA5-Land LST data, this study proposes an Attention Mechanism U-Net (AMUN) method, combined with data acquisition and preprocessing on the Google Earth Engine (GEE) cloud computing platform, to downscale the hourly monthly mean reanalysis LST data of ERA5-Land across China’s territory from 0.1° to 0.01°. The method comprehensively considers the relationship between LST and surface features, organically combining multiple deep learning modules, including the Global Multi-Factor Cross-Attention (GMFCA) module, the Feature Fusion Residual Dense Block (FFRDB) connection module, and the U-Net module. In addition, the Bayesian global optimization algorithm is used to select the optimal hyperparameters of the network in order to enhance its predictive performance. Finally, the downscaling accuracy of the network was evaluated through simulated and real data experiments and compared with the Random Forest (RF) method. The results show that the proposed network outperforms the RF method, with RMSE reduced by approximately 32–51%. The downscaling method proposed in this study can effectively improve the accuracy of ERA5-Land LST downscaling, providing new insights for LST downscaling research.
Introduction
Land Surface Temperature (LST) is an indispensable parameter in the study of the exchange of matter and energy between the atmosphere and the surface and in the study of climate change. It is one of the important indicators of global climate change and of the water–heat balance of the Earth system1,2. Ground-based observations can obtain temporally continuous LST data, but they are relatively discrete in spatial distribution. The thermal infrared band of remote sensing satellites can be used to retrieve LST3, but its temporal and spatial resolutions constrain each other, and the imaging time and cloud cover also affect data validity. ERA5-Land reanalysis data can provide continuous global LST information every hour4, but their spatial resolution is only 0.1°. To meet the demand for LST with high spatio-temporal resolution and to adapt to the diverse requirements of fine-scale earth science applications, downscaling of LST data is required.
In recent years, scholars have proposed many downscaling algorithms, including image fusion5, statistical regression models6, and algorithms based on machine learning7. Among them, image fusion includes the Spatio-Temporal Adaptive Reflectance Fusion Model (STARFM)8, Enhanced STARFM (ESTARFM)9, the Spatio-Temporal Integrated Temperature Fusion Model10, and the Spatio-Temporal Adaptive Data Fusion Algorithm for Temperature Mapping5. These models obtain continuous LST information by fusing the LST data of different time frequencies, while maintaining the good spatial texture of the downscaled LST data11. However, they have problems such as unsatisfactory inversion accuracy of heterogeneous surfaces, high model training costs, the smoothing of details, and unclear physical mechanisms.
Compared with the above methods, the statistical regression downscaling model based on the “scale invariance” assumption fully considers the physical mechanism of the energy balance of surface thermal radiation information before and after LST downscaling and retains the detail information of LST after downscaling12. This method is simple and efficient, and it currently dominates in the field of LST downscaling research. For example, the DisTrad algorithm13 realizes LST downscaling by establishing a linear regression relationship between LST and the Normalized Difference Vegetation Index (NDVI), and the TsHARP algorithm14 realizes LST downscaling by using vegetation coverage and LST to establish a linear regression model. In addition, the Geographically Weighted Regression (GWR) method15 is also widely used in LST downscaling. However, due to the complex features of high-dimensional data that cannot be fully characterized with traditional statistical regression models, downscaling methods based on machine learning have emerged16; due to their superior nonlinear fitting capabilities, they are widely used in downscaling work and have improved the performance of downscaling. For example, Random Forest (RF), Artificial Neural Networks (ANN), Support Vector Machines (SVM), etc., use NDVI, surface reflectance, geographical/terrain factors, and other remote sensing indicators as downscaling factors to downscale LST17,18,19. However, machine learning methods have certain limitations when used to extract higher-dimensional information, and they cannot deeply characterize the intrinsic relationship between multiple environmental factors and target quantities.
In order to explore the potential relationships among multiple environmental factors and to capture the complex multilevel information in high-dimensional data, deep learning has emerged as a new research direction following machine learning. Convolutional Neural Networks (CNNs) can deeply mine multiscale features of high-dimensional images20,21,22,23; for example, the Super-Resolution Convolutional Neural Network (SRCNN) directly learns an end-to-end mapping between coarse-scale and fine-scale images and has achieved significant results in the field of image super-resolution reconstruction. However, LST data at different scales do not differ merely in resolution. Traditional super-resolution methods lack the capacity to capture the surface details necessary for LST downscaling, so a large amount of auxiliary data must be introduced to supplement the details of LST. The attention mechanism therefore becomes an entry point: it helps the model find the features most relevant to LST among a large amount of auxiliary data and integrates these features into the prediction of LST, thereby achieving more accurate downscaling. However, research in this direction remains limited.
The establishment of a downscaling model based on deep learning requires the support of a large amount of data. At the same time, in order to adapt to the model input and standardize the geographic references, a huge amount of data preprocessing work needs to be conducted. Traditional remote sensing methods cannot meet the needs of the model construction due to their high computational complexity and high processing costs. With the emergence of cloud computing platforms such as Google Earth Engine (GEE), this problem has been solved. GEE combines Google’s cloud computing capabilities and the Earth observation data of major institutions to solve important global social problems. It provides a large amount of geospatial data, including satellite remote sensing images and basic geographic information24,25,26,27. The ERA5-Land LST data and the seven auxiliary factors needed for the downscaling model research in this study are all retrieved through GEE and preprocessed through cloud computing.
In order to achieve more accurate LST downscaling and to explore the application of the attention mechanisms of deep learning methods to downscaling, this study proposes an Attention Mechanism U-Net (AMUN) method to downscale the hourly and monthly average LST reanalysis data of ERA5-Land across China. This method transfers the U-Net network used for image semantic segmentation28,29,30 to the field of LST downscaling and calibrates the feature maps through the Global Multi-Factor Cross-Attention (GMFCA) module. In addition, the Feature Fusion Residual Dense Block (FFRDB) connection module is introduced to deeply extract the features of coarse-scale LST data. Finally, the calibrated feature maps, deeply extracted feature maps, and shallowly extracted feature maps are fused as the input of U-Net to obtain fine LST data. At the same time, the Bayesian global optimization algorithm31,32 is used to search for the optimal combination of the hyperparameters of the model to enable the best performance of the model and to improve the model’s prediction accuracy.
The main innovations and contributions of this study are as follows:
1. This study proposes a novel downscaling network that combines image super-resolution reconstruction with the concept of scale-invariant effects to achieve downscaling.
2. The study introduces an attention mechanism, which is integrated with U-Net, to deeply extract the complex relationships between auxiliary factors and LST, significantly improving downscaling accuracy.
3. The proposed method enhances the spatial resolution of ERA5-Land LST data, providing a new reference for acquiring high spatiotemporal LST. This is of great importance for refined surface temperature applications, such as agricultural planning, urban heat management, and climate change research.
The rest of this paper is arranged as follows. Section "Materials" introduces the acquisition and preprocessing of the ERA5-Land LST data and auxiliary factor data based on GEE. Section "Methods" introduces the AMUN downscaling method proposed in this study. Section "Results and discussion" evaluates and discusses the effectiveness of the AMUN method. Finally, Section "Conclusion" summarizes the contributions of this study.
Materials
Data acquisition based on GEE
In this study, the territory of China is taken as the research area, as it has rich terrain features and significant spatial variability in LST (Fig. 1). The LST data come from the ERA5-Land hourly monthly average reanalysis dataset with a spatial resolution of 0.1°; this dataset provides LST data at all times of the day on a monthly basis. It is obtained from the skin_temperature band of the ECMWF/ERA5_LAND/MONTHLY_BY_HOUR dataset in GEE. The auxiliary remote sensing dataset directly obtains the NDVI, land cover type, and surface reflectance using the Moderate Resolution Imaging Spectroradiometer (MODIS) MOD13Q1, MCD12Q1, and MOD09Q1 products. The Normalized Difference Building Index (NDBI) and the Modified Normalized Difference Water Index (MNDWI) are calculated using the MOD09A1 product. The slope and aspect are calculated using the Digital Elevation Model (DEM) from the Shuttle Radar Topography Mission (SRTM) (Table 1). Finally, the accuracy is evaluated using the temperature data measured at meteorological stations.
Topographic map of China (using DEM data from SRTM [https://lpdaac.usgs.gov/products/srtmgl1v003/] and produced with ArcGIS 10.8 software).
Data processing based on GEE
In order to adapt to the model input and to unify data from different sources, data preprocessing is performed. The preprocessing involved in this study is completed within the GEE platform, which includes mean synthesis, ROI clipping, resampling, and reprojection. All the auxiliary data are resampled to 0.01° and 0.1° under the WGS84 coordinate system using bilinear interpolation, and the LST data are resampled to 1° based on 0.1°33. Additional processing is required for individual data items, such as the extraction of the slope and aspect from the DEM data obtained from the USGS/SRTMGL1_003 dataset in GEE, which is achieved through the Slope and Aspect functions, respectively. The sur_refl_b02, sur_refl_b04, and sur_refl_b06 bands obtained from the MODIS/061/MOD09A1 dataset are used to calculate the normalized difference to obtain NDBI and MNDWI; this is implemented in GEE through the NormalizedDifference functions. It can be expressed as:

$$NDBI = \frac{sur\_refl\_b06 - sur\_refl\_b02}{sur\_refl\_b06 + sur\_refl\_b02}$$

$$MNDWI = \frac{sur\_refl\_b04 - sur\_refl\_b06}{sur\_refl\_b04 + sur\_refl\_b06}$$
where \(sur\_refl\_b02\), \(sur\_refl\_b04\), and \(sur\_refl\_b06\) represent the near-infrared, green light, and shortwave infrared bands of MODIS surface reflectance, respectively. It should be noted that the Mean functions in GEE can synthesize data into a monthly time scale. In addition, for data that do not change on a monthly basis, the Mean method also has the implicit function of converting the ImageCollection type to the Image type, so as to crop the image data (GEE only supports the Clip functions for the Image type). The corresponding datasets and processing flow of this study are shown in Fig. 2. Among them, the 1° LST and 0.1° auxiliary data are used for model training and simulation data experiments, while the original 0.1° LST and 0.01° auxiliary data are used for real data experiments to achieve downscaling.
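The normalized-difference operation applied in GEE can be illustrated outside the platform with a minimal numpy sketch; the reflectance values below are hypothetical, not real MOD09A1 pixels.

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b), mirroring GEE's normalizedDifference."""
    band_a = np.asarray(band_a, dtype=float)
    band_b = np.asarray(band_b, dtype=float)
    return (band_a - band_b) / (band_a + band_b)

# Hypothetical surface reflectance values for two pixels
nir = np.array([0.30, 0.25])    # sur_refl_b02 (near-infrared)
green = np.array([0.10, 0.12])  # sur_refl_b04 (green)
swir = np.array([0.20, 0.35])   # sur_refl_b06 (shortwave infrared)

ndbi = normalized_difference(swir, nir)     # (SWIR - NIR) / (SWIR + NIR)
mndwi = normalized_difference(green, swir)  # (Green - SWIR) / (Green + SWIR)
```

In GEE itself the same computation is a single `normalizedDifference` call on the mean-composited image, with the band pair passed as an argument.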
Dataset and processing procedure.
Methods
LST downscaling framework
In this study, a total of seven environmental and geographical factors are used as auxiliary factors for downscaling LST. The environmental factors are NDVI, land cover type, surface reflectance, MNDWI, and NDBI34, which are used to provide vegetation cover conditions, surface physical characteristics, inland water conditions, and building cover conditions, respectively. The geographical factors are slope and aspect. The framework for the LST downscaling in this study can be expressed as:
$$LST_{H} = f\left( {LST_{L} ,A_{H} } \right)$$

where \(LST_{H}\) and \(LST_{L}\) represent high-resolution and low-resolution LST data, respectively; \(A_{H}\) represents the high-resolution auxiliary factor data; and \(f\) represents the downscaling model proposed in this study.
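The "scale invariance" idea behind this framework — fit the relationship between LST and auxiliary factors at the coarse scale, then apply it at the fine scale — can be sketched with ordinary least squares on synthetic data (all arrays and coefficients below are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic coarse-scale samples: 7 auxiliary factors -> LST (hypothetical data)
X_coarse = rng.random((100, 7))
true_w = np.arange(1.0, 8.0)
lst_coarse = X_coarse @ true_w + 280.0  # noiseless linear relation, in kelvin

# Fit f at the coarse scale (least squares with an intercept column)
A = np.hstack([X_coarse, np.ones((100, 1))])
coef, *_ = np.linalg.lstsq(A, lst_coarse, rcond=None)

# Scale invariance: apply the same fitted f to fine-scale auxiliary factors
X_fine = rng.random((400, 7))
lst_fine = np.hstack([X_fine, np.ones((400, 1))]) @ coef
```

The AMUN network plays the role of `f` here, replacing the linear regression with a deep nonlinear mapping that also ingests the coarse LST itself.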
AMUN Method
In order to delve into the intricate characteristics of LST and thoroughly examine the association between LST and auxiliary factors, this study introduces the AMUN method (Fig. 3). In the network, the low-resolution LST data and 7 high-resolution auxiliary data are used as input. To utilize the precise magnitude data of the coarse LST and the abundant attributes of diverse fine auxiliary factors, this study proposes the GMFCA module, which mutually adjusts the feature maps. Furthermore, the FFRDB connection module is employed to derive multi-tiered feature data from the provided LST map. Finally, through global residual learning, the outputs of GMFCA, FFRDB, and shallow feature extraction are fused as the input of U-Net, to extract texture information from the feature fusion map, and retain its high-frequency details when they are mapped to LST.
Overall structure of AMUN method. (a) Residual Dense Block (RDB), (b) Channel Weight Extraction (CWE), (c) Encoder-Bridge-Decoder (E-B-D). (To highlight the characteristics of U-Net, the number of convolution kernels in each layer of the E-B-D module is marked.)
Detailed information about each module is provided in the following subsections.
GMFCA Module
In order to consider multi-feature input and its complementary information, this study proposes the GMFCA module based on the global cross-attention mechanism and multi-factor cross-attention mechanism33. This approach, while maintaining the effective performance of both aspects, decreases the model’s training parameters and enhances the efficiency of the model’s training process.
The GMFCA module contains eight inputs, which are extracted and weighted using the CWE block to obtain one LST channel weight and seven auxiliary data spatial weights. For the channel weights, the convolutional layer extracts the feature map containing its amplitude information from the sampled LST data. It employs the sigmoid layer for feature map normalization to derive the channel weight, amplifies the auxiliary data feature map through multiplication, adjusts the feature map extracted from the auxiliary data via an element-wise weighting operation, and subsequently merges the seven adjusted auxiliary data feature maps using element-wise addition. For the spatial weights, each of the seven auxiliary data’s spatial weights is multiplied with the LST feature map to amplify the detailed features of the LST data. Ultimately, the channel-weighted auxiliary data feature map and the spatially weighted LST feature map, which have been combined through addition, are integrated via the cascade function. The high-dimensional feature space is then condensed using a 1 × 1 convolutional layer to project onto the low-dimensional feature space, yielding a recalibrated feature map abundant in details and magnitude. Taking one of the auxiliary data channels as an example, its mechanism of action with the LST channel can be expressed as:
$$F_{W} = \sigma \left( {W \otimes F_{T} } \right) \times F_{A} + \sigma \left( {W \otimes F_{A} } \right) \times F_{T}$$

where \(F_{W}\) represents the feature map after weighted calibration; \(F_{A}\) and \(F_{T}\) represent the feature maps extracted from the input auxiliary data and LST data, respectively; \(W\) represents the convolution kernel in the CWE block; \(\times\) and \(\otimes\) represent the element-wise multiplication operation and convolution operation, respectively; and \(\sigma\) represents the sigmoid function. The effectiveness of this module is discussed in Section "Reliability experiment of three network modules".
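The mutual weighting described above can be sketched in numpy. This is a simplified stand-in: the convolutional weight extraction of the CWE block is replaced by a scalar mean, and the feature maps are random placeholders, purely to show the sigmoid-gated cross-recalibration pattern.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
H, W = 8, 8
f_t = rng.standard_normal((H, W))  # LST feature map (stand-in for a conv output)
f_a = rng.standard_normal((H, W))  # one auxiliary-factor feature map

# A channel-style weight derived from the LST features recalibrates the
# auxiliary map; a per-pixel spatial weight derived from the auxiliary
# features recalibrates the LST map. The two are then fused by addition.
w_channel = sigmoid(f_t.mean())          # scalar stand-in for the channel weight
w_spatial = sigmoid(f_a)                 # per-pixel spatial weight in (0, 1)
f_w = w_channel * f_a + w_spatial * f_t  # recalibrated feature map
```

In the full module, seven such auxiliary channels are recalibrated, summed, concatenated with the weighted LST features, and compressed by a 1 × 1 convolution.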
FFRDB connection module
The objective of the FFRDB connection module is to derive layered data from the LST input attributes that have been subjected to preliminary feature extraction. It first concatenates three RDBs35 through global feature fusion; then, it goes through a 1 × 1 convolutional layer to adaptively fuse local features at different levels; finally, it extracts features through convolution. Each RDB consists of three layers of dense blocks, local feature fusion, and local residual learning, forming a continuous memory mechanism. Specifically, the cascade function deeply connects the output of the previous RDB with the output of each layer of the current RDB. Through element-wise addition, the output of the previous RDB is directly fused into the output of the 1 × 1 convolution, further enhancing the feature information flow. After such an operation, an LST feature map rich in local features is obtained.
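The dense connections and local residual learning inside one RDB can be sketched as a small PyTorch module; the channel counts and growth rate below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RDB(nn.Module):
    """Sketch of one Residual Dense Block: three densely connected conv
    layers, 1x1 local feature fusion, and local residual learning."""
    def __init__(self, channels=32, growth=16):
        super().__init__()
        # Each layer sees the block input plus all previous layer outputs
        self.convs = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, 3, padding=1)
            for i in range(3)
        )
        # 1x1 conv fuses the concatenated local features back to `channels`
        self.fuse = nn.Conv2d(channels + 3 * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        # Local residual learning: add the block input to the fused features
        return x + self.fuse(torch.cat(feats, dim=1))

x = torch.randn(1, 32, 16, 16)
y = RDB()(x)  # same shape as the input
```

The FFRDB connection module chains three such blocks, concatenates their outputs for global feature fusion, and again compresses with a 1 × 1 convolution.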
U-Net module
The primary function of U-Net in AMUN is to conduct advanced feature extraction and fusion on the combined outputs of GMFCA, FFRDB, and shallow feature extraction, so that the network can fully utilize the features of each level at the input end and enhance the network’s expressive power. Its structure consists of a set of convolutional layers for extracting the output features of global residual learning, a transposed convolutional layer for upsampling to the original input size, and an E-B-D module. The E-B-D module has a total of 5 layers: 4 symmetrical encoder and decoder layers and 1 bridge layer. The corresponding encoders and decoders are skip-connected to realize the fusion of shallow information and deep information.
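The encoder-bridge-decoder pattern with skip connections can be reduced to a one-level PyTorch sketch (the paper's E-B-D module uses 4 encoder/decoder levels; the channel counts and input size here are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MiniEBD(nn.Module):
    """One-level sketch of the Encoder-Bridge-Decoder pattern with a
    skip connection between the matching encoder and decoder levels."""
    def __init__(self, c=16):
        super().__init__()
        self.enc = nn.Conv2d(8, c, 3, padding=1)          # encoder conv
        self.pool = nn.MaxPool2d(2)                        # downsample
        self.bridge = nn.Conv2d(c, c, 3, padding=1)        # bridge layer
        self.up = nn.ConvTranspose2d(c, c, 2, stride=2)    # upsample
        self.dec = nn.Conv2d(2 * c, 1, 3, padding=1)       # skip concat doubles channels

    def forward(self, x):
        e = torch.relu(self.enc(x))
        b = torch.relu(self.bridge(self.pool(e)))
        u = self.up(b)
        # Skip connection: fuse shallow encoder features with deep decoder features
        return self.dec(torch.cat([u, e], dim=1))

out = MiniEBD()(torch.randn(1, 8, 32, 32))  # 8-channel fused input -> 1-channel LST
```

The concatenation across the skip connection is what lets the decoder recover the high-frequency detail lost during pooling.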
Bayesian optimization of hyperparameters
Bayesian optimization is a global optimization method that builds a probabilistic model of the objective function and uses this model to select promising hyperparameter combinations that have not yet been evaluated, balancing exploration and exploitation. The Bayesian optimization rule32 can be expressed as:
$$p\left( {w|D} \right) = \frac{P\left( {D|w} \right)p\left( w \right)}{p\left( D \right)}$$

where \(p\left( {w{|}D} \right)\) represents the posterior probability, \(P\left( {D{|}w} \right)\) represents the likelihood, \(p\left( w \right)\) and \(p\left( D \right)\) represent the prior probabilities of the corresponding events, \(w\) represents the value of the parameter to be optimized, and \(D\) represents the data that have been observed.
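As a minimal numeric illustration of the Bayesian update, consider two candidate hyperparameter values with a uniform prior (all probabilities below are hypothetical):

```python
import numpy as np

# Two candidate hyperparameter values w1, w2 with a uniform prior p(w)
prior = np.array([0.5, 0.5])

# Hypothetical likelihoods P(D | w): how well each candidate explains
# the validation losses observed so far
likelihood = np.array([0.8, 0.2])

evidence = np.sum(likelihood * prior)       # p(D), the normalizing constant
posterior = likelihood * prior / evidence   # p(w | D) by Bayes' rule
```

In practice the optimizer maintains a continuous surrogate (e.g. a Gaussian process) over the hyperparameter space rather than a two-point discrete posterior, but the update follows the same rule.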
This study uses this method to optimize the selection of five parameters: solver, learning rate, number of iterations, batch size, and regularization strength. In addition, the learning rate decay factor is set to 0.5, and the learning rate decay period is set to half of the number of iterations. The detailed procedure of the experiment and its outcomes are presented in section "Bayesian optimization experiment".
Model training and data preparation
1. Training and validation data: The network is trained using 24 datasets covering the 12 months of each year in 2020–2021 (Table 2). The initial LST data serve as the label dataset. The slice sizes of the input and labels are 32 × 32 × 8 and 32 × 32 × 1, respectively. After processing the 24 sets of data, a total of 36,387 patches were generated (patches containing no valid data are excluded). Among these, 85% serve as the training set for network training, while the remaining 15% constitute the validation set used to assess the model’s performance during training.
2. Simulated and real data test sets: The efficacy of the AMUN method is assessed via experiments with both simulated and real data. In the simulation experiment, image data from August 2019 and February 2022 are used to test the downscaling model and to evaluate the model’s performance in the summer and winter of different years. In the real data experiment, image data from four distinct seasons, namely February 2019, May 2020, August 2021, and November 2022, are employed. This is done to put the downscaling model to the test, with the objective of assessing the model’s performance across different seasons, both within and beyond the temporal scope of the training set. The image resolution is 10 times that of the simulated test set (Table 2).
3. Parameter settings: After Bayesian hyperparameter optimization, the Rmsprop solver is finally used as the gradient descent optimization algorithm, with a total of 99 rounds of training, a batch size of 34, a regularization strength of 0.0001, an initial learning rate of 0.0001, and learning rate halving every 50 rounds.
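The patch extraction in item 1 — slicing the stacked inputs into 32 × 32 tiles and discarding worthless (no-data) areas — can be sketched as follows; the array sizes and the NaN-based no-data convention are assumptions for illustration.

```python
import numpy as np

def extract_patches(stack, size=32):
    """Slice an (H, W, C) stack into non-overlapping size x size patches,
    skipping patches that contain no valid data at all."""
    h, w, _ = stack.shape
    patches = []
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            patch = stack[i:i + size, j:j + size, :]
            if not np.isnan(patch).all():  # exclude worthless areas
                patches.append(patch)
    return patches

stack = np.random.rand(64, 96, 8)  # hypothetical 8-band input stack
stack[:32, :32, :] = np.nan        # one entirely worthless area
patches = extract_patches(stack)   # 6 candidate tiles, 1 excluded
```

The label stacks would be tiled with the same indices at 32 × 32 × 1, keeping inputs and labels aligned.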
Results and discussion
LST downscaling evaluation
To ascertain the efficacy of the suggested AMUN approach in the downscaling estimation of LST, this study takes the territory of China as the research area and carries out downscaling experiments comprising two aspects: simulated data and real data, which are described in this section. In both experiments, the proposed method is compared longitudinally with RF. This study uses three quantitative indicators, Mean Absolute Error (MAE), Coefficient of Determination (R2), and Root Mean Square Error (RMSE), to evaluate the accuracy of the LST downscaling results.
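The three evaluation metrics have standard definitions and can be computed directly; the sample temperature values below are hypothetical.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of Determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference and downscaled LST values (kelvin)
y_true = np.array([280.0, 285.0, 290.0, 295.0])
y_pred = np.array([281.0, 284.0, 291.0, 295.0])
```

All three are computed per scene over the valid pixels, with MAE and RMSE in kelvin and R2 dimensionless.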
Simulated data experiment
In the simulated data experiment, the original ERA5-Land data (0.1°) are downsampled to a spatial resolution of 1° as the simulated LST data to be downscaled. Subsequently, with this simulated LST and seven auxiliary datasets of 0.1° spatial resolution, the trained model is used to downscale the 1° simulated LST data to 0.1°. Consequently, the original ERA5-Land data can serve as a benchmark to contrast with the LST data derived from downscaling, allowing the model’s performance to be assessed in terms of visual impact and quantitative metrics.
The downscaling results and quantitative indicator data graphs of the AMUN and RF methods in August 2019 and February 2022 are shown in Figs. 4 and 5, respectively. It can be observed that due to the higher terrain in the southwest, the LST of the Qinghai-Tibet Plateau is relatively lower compared to other regions. Both AMUN and RF’s downscaling outcomes exhibit strong spatial coherence when juxtaposed with the reference data. However, the fitting line of the scatter plot associated with AMUN is noticeably nearer to the reference line, indicating that AMUN’s downscaling result has superior spatial correlation compared to RF (Figs. 4b2, c2 and 5b2, c2). The bias of the AMUN downscaling results in August 2019 is generally low, with only a very small number of areas showing values greater than 2, and the distribution is relatively random, which is visually significantly better than the RF downscaling results (Figs. 4b3, c3 and 5b3, c3). Although the AMUN downscaling results in February 2022 show a larger number of areas with bias values greater than 2, they are still visually superior to the RF downscaling results overall.
LST visual effects and quantitative indicator data for August 2019. (a1,a2) are the original (reference) image and input LST data, respectively. (b1–b3) represent the AMUN downscaling results, scatter plot, and bias level stereogram, respectively. (c1–c3) represent the RF downscaling results, scatter plot, and bias level stereogram, respectively. The elliptical red circles in the figures indicate areas of significant contrast between the reference image and the prediction results from the two methods, highlighting that the AMUN method’s predictions are closer to the reference image in terms of spatial details.
LST visual effects and quantitative indicator data for February 2022. (a1,a2) are the original (reference) image and input LST data, respectively. (b1–b3) represent the AMUN downscaling results, scatter plot, and bias level stereogram, respectively. (c1–c3) represent the RF downscaling results, scatter plot, and bias level stereogram, respectively. The elliptical red circles in the figures indicate areas of significant contrast between the reference image and the prediction results from the two methods, highlighting that the AMUN method’s predictions are closer to the reference image in terms of spatial details.
For a more robust validation of the AMUN method’s effectiveness, the LST downscaling outcomes were assessed quantitatively (Table 3). As inferred from the coefficient of determination, both the AMUN and RF downscaling outcomes demonstrate commendable performance, with R2 values of 0.986 and 0.936, respectively. However, when considering the MAE and RMSE outcomes, the AMUN method’s downscaling outcomes significantly outperform those of RF, with MAE values of 0.731 K and 1.637 K, respectively, and RMSE values of 1.112 K and 2.292 K. This suggests that the AMUN method exhibits superior performance in terms of expressing spatial details.
To sum up, in the simulation experiment, both the AMUN and RF methods are employed to downscale the 1° LST data to 0.1°. When juxtaposed with the reference data, the spatial coherence and details of the AMUN downscaling outcomes outshine those of the RF downscaling outcomes.
Real data experiment
In the real data experiment, the original ERA5-Land data (0.1°) are downscaled to a spatial resolution of 0.01°. The downscaling results are then validated in two ways: at the area scale, the downscaling results are downsampled to match the original LST data for validation; at the point scale, the downscaling results are validated against measured meteorological station data.
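The area-scale validation requires aggregating the 0.01° result back to the 0.1° grid (a factor of 10). A simple block-mean aggregation sketch (grid sizes hypothetical; the paper's downsampling uses bilinear interpolation, so this is only an illustration of the re-gridding idea):

```python
import numpy as np

def block_mean(arr, factor):
    """Aggregate a 2-D grid by averaging non-overlapping factor x factor
    blocks (e.g. 0.01 deg -> 0.1 deg with factor=10)."""
    h, w = arr.shape
    return arr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

fine = np.arange(400, dtype=float).reshape(20, 20)  # hypothetical 0.01-deg LST
coarse = block_mean(fine, 10)                       # back on the 0.1-deg grid
```

Once both fields sit on the same grid, the MAE/R2/RMSE comparison against the original ERA5-Land LST is direct.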
In the area scale validation, the performances of the RF and AMUN methods are compared using data from four time points (February 2019, May 2020, August 2021, and November 2022), with the original LST data as the benchmark (Fig. 6). Visually, the downscaling results of the AMUN method accurately capture the spatial distribution of LST. For small areas with abrupt temperature changes, the AMUN method can capture them well using the auxiliary data, whereas the RF method struggles to handle these areas. This reflects the fact that the AMUN method can better restore spatial details.
Visual comparison of the downscaling results of the AMUN and RF methods with the original LST data in February 2019, May 2020, August 2021, and November 2022. (a1–a4) represent the original (reference) images. (b1–b4) represent the downscaling results of the AMUN method. (c1–c4) represent the downscaling results of the RF method. (d) is the schematic diagram of the area for February 2019 and May 2020. (e) is the schematic diagram of the area for August 2021 and November 2022. The elliptical red circles in the figures indicate areas of significant contrast between the reference image and the prediction results from the two methods, highlighting that the AMUN method’s predictions are closer to the reference image in terms of spatial details.
The spatial correlation and bias of the AMUN method are superior to those of the RF method (Figs. 7 and 8). The downscaling results of both methods perform well in flat areas, with errors mainly concentrated in mountainous areas, where the AMUN method proves significantly more resistant to these errors. Therefore, the AMUN method proposed in this study is effective for the downscaling of the original LST data.
Scatter plots of the original LST data compared separately with the AMUN and RF methods in February 2019, May 2020, August 2021, and November 2022. (a1–a4) represent the scatter plots corresponding to the AMUN method. (b1–b4) represent the scatter plots corresponding to the RF method.
Bias level diagrams of the original LST data compared separately with the AMUN and RF methods in February 2019, May 2020, August 2021, and November 2022. (a1–a4) represent the bias level diagrams corresponding to the AMUN method. (b1–b4) represent the bias level diagrams corresponding to the RF method.
In the point scale validation, the same four time points (February 2019, May 2020, August 2021, and November 2022) are used for accuracy validation. The measured station temperature data corresponding to the two regions at the four time points are compared with the original data, the AMUN downscaling results, and the RF downscaling results (Fig. 9). In order to reduce the difference between air temperature and LST, the average monthly maximum temperature of the stations is used in area (a), and the average monthly temperature of the stations is used in area (b). The results show that all three have good spatial correlation with the station data, and the trends of the line charts are highly consistent with those of the station data. The AMUN results even match the station data better than the original data, indicating that the AMUN method both retains the accuracy of the original data and effectively supplements spatial details.
Schematic diagram of the station validation area and line chart. (a,a1,a2) correspond to the validation area and line chart for February 2019 and May 2020. (b,b1,b2) correspond to the validation area and line chart for August 2021 and November 2022.
Apart from the visual assessment of the downscaling outcomes, this experiment also carried out a quantitative evaluation (Table 4). Both the RF and AMUN methods exhibit a strong correlation with the original image, with R2 values exceeding 0.94. In terms of MAE and RMSE, the AMUN method outperforms the RF method, with MAE values of 0.613 K and 0.956 K and RMSE values of 0.989 K and 1.444 K, respectively. Judging from the quantitative values alone, the AMUN method’s advantage over the RF method is less pronounced than in the simulation experiment; however, as the visual comparison in Fig. 6 shows, this is because the AMUN method introduces more spatial details, which make the downsampled results slightly rougher than the smooth original data. Moreover, the bilinear interpolation used for downsampling introduces a certain systematic error of its own, which narrows the gap between the two methods to some extent.
In the evaluation based on stations, the temperature data of the meteorological stations are compared with the AMUN downscaling results, the RF downscaling results, and the original data. Due to factors such as the different rates at which the air and ground absorb and release solar heat, there is a systematic error of 2–3 K between air temperature and LST, but this does not affect the evaluation of the spatial correlation of the downscaling results. As shown in Table 4, the R2 values of AMUN, RF, and the original data are 0.894, 0.861, and 0.878, respectively; the MAE values are 2.848 K, 2.927 K, and 2.776 K, respectively; and the RMSE values are 4.307 K, 4.933 K, and 4.625 K, respectively. All three indicators of the AMUN method are superior to those of the RF method, and its R2 and RMSE are also superior to those of the original data. This once again proves that the downscaling results of the AMUN method not only maintain a high consistency with the original data but also enrich the real LST details to a certain extent.
Discussion
Reliability experiment of three network modules
To verify the effectiveness of the GMFCA module and of the three-module AMUN combination proposed in this method, four configurations are compared: (1) the GMFCA module alone; (2) the GMFCA module combined with the FFRDB connection module; (3) the GMFCA module combined with the U-Net module; and (4) AMUN, the combination of all three modules. The performance of the four configurations is evaluated using the validation-set RMSE during training (the input data were normalized), and the same hyperparameters are used for all four. At the end of training, the validation RMSEs of all four configurations are below 0.5. The GMFCA module alone finishes with a validation RMSE of 0.44. Introducing the FFRDB connection module slightly increases the validation RMSE, whereas introducing the U-Net module decreases it significantly. When the FFRDB connection module and the U-Net module are introduced together to form AMUN, the final validation RMSE is 0.28, the best of the four configurations and significantly lower than the other three (Fig. 10). A likely explanation is that the GMFCA and FFRDB modules are parallel in structure: introducing the FFRDB module lets the model capture more feature information, but the model cannot fully exploit this information on its own. The U-Net module then effectively extracts and integrates the combined features, enabling more efficient use of the extracted information. Overall, the GMFCA module and the AMUN method proposed in this study are clearly effective. After training, the final AMUN model is 84.1 MB in size.
RMSEs of the validation set during the training process of the four combinations. Due to the normalization of the data, RMSE has no units.
Bayesian optimization experiment
To better ensure the performance of the downscaling model and enhance the robustness of the AMUN method, this experiment uses Bayesian global optimization to tune five hyperparameters, namely the solver, learning rate, number of iterations, batch size, and regularization strength, and searches for the best hyperparameter combination of the downscaling model over a total of 30 trials. Well-performing combinations tend toward lower learning rates, regularization strengths, and batch sizes, and toward higher numbers of iterations. Among the solvers, RMSProp trains the AMUN method best; Adam is second; and SGDM suffered gradient explosion twice, causing model training to fail (Fig. 11). After optimization, the RMSProp solver was adopted as the gradient-descent optimization algorithm, with 99 rounds of training, a batch size of 34, a regularization strength of 0.0001, and an initial learning rate of 0.0001. Overall, Bayesian optimization selects the best hyperparameters for the model, reducing the validation-set RMSE from 5.6863 K to 1.4892 K. Compared with grid search and random search, Bayesian optimization finds the optimal hyperparameter combination more quickly, reducing computational cost31.
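To illustrate the idea behind this step, below is a minimal, self-contained sketch of Bayesian optimization with a Gaussian-process surrogate and a lower-confidence-bound acquisition, reduced to a single continuous hyperparameter (e.g. the learning rate). This is not the authors' implementation, which tunes five hyperparameters including a categorical solver choice; every name, kernel setting, and constant here is an illustrative assumption.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    d = np.asarray(a)[:, None] - np.asarray(b)[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def bayes_opt(objective, candidates, n_init=4, n_iter=12,
              noise=1e-4, kappa=2.0, seed=0):
    """Minimize `objective` over a discrete candidate grid using a GP
    surrogate and a lower-confidence-bound acquisition function."""
    rng = np.random.default_rng(seed)
    X = list(rng.choice(candidates, size=n_init, replace=False))
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        Xa, ya = np.array(X), np.array(y)
        K_inv = np.linalg.inv(rbf(Xa, Xa) + noise * np.eye(len(Xa)))
        k_star = rbf(candidates, Xa)                      # (n_cand, n_obs)
        mu = k_star @ K_inv @ ya                          # posterior mean
        var = 1.0 - np.einsum("ij,jk,ik->i", k_star, K_inv, k_star)
        std = np.sqrt(np.maximum(var, 1e-12))
        # pick the candidate with the lowest optimistic estimate: either a
        # low predicted loss (exploitation) or high uncertainty (exploration)
        x_next = candidates[int(np.argmin(mu - kappa * std))]
        X.append(x_next)
        y.append(objective(x_next))
    best = int(np.argmin(y))
    return X[best], y[best]
```

In the paper's setting, `objective` would train AMUN with a given hyperparameter value and return the validation RMSE; the surrogate then concentrates the 30 available trials near promising settings instead of sweeping a grid.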
Bayesian hyperparameter optimization process. Units of RMSE: K.
Adaptability of AMUN to different terrains
To further evaluate the effectiveness of the AMUN method, experiments were conducted under different terrain conditions (plateau, plain, and mountainous regions) (Fig. 12). A quantitative analysis of AMUN’s performance in these terrains was conducted (Table 5). Based on the quantitative results, the AMUN method achieved the highest spatial correlation with the original map in the mountainous region, with an R2 of 0.976, followed by the plateau region with an R2 of 0.958. However, in terms of the error between predicted and actual values, the plain region performed best, with an MAE of 0.279 K and an RMSE of 0.357 K. The plateau region followed, with an MAE of 0.486 K and an RMSE of 0.620 K. In summary, the AMUN method performs well across all three terrain types. Although the R2 performance in the plain region is less pronounced due to weaker spatial variability of LST, the method still shows strong results in terms of MAE and RMSE.
Comparison between the experimental results of AMUN and the original maps in different terrains (a1–a6 represent the original maps, b1–b6 represent the AMUN results).
Adaptability of AMUN to different seasons
To investigate the performance of the AMUN method across seasons, this study evaluated the model's downscaling results for all four seasons (Fig. 13). The seasons are divided as follows: spring (March–May), summer (June–August), autumn (September–November), and winter (December–February). The experimental results show that the model maintains strong spatial correlation with the original values in all four seasons, with R2 values consistently around 0.98 and no significant differences between seasons. In terms of error, the model performs best in summer, with an MAE of around 0.5 K and an RMSE of around 0.8 K. The largest errors occur in spring and winter, where the MAE is approximately 0.35 K higher and the RMSE about 0.65 K higher than in summer. In summary, the AMUN method performs best in summer, followed by autumn, with spring and winter comparable to each other though slightly behind summer and autumn.
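The seasonal grouping used above can be sketched as a month-to-season mapping plus a per-season average of a monthly metric. These helper names are hypothetical, and `seasonal_mean` assumes a metric value is available for every month:

```python
# Map calendar months to the seasons used in the paper, then average a
# per-month metric (e.g. monthly RMSE in K) within each season.
SEASONS = {
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
    "winter": (12, 1, 2),   # winter wraps around the year boundary
}

def season_of(month):
    for name, months in SEASONS.items():
        if month in months:
            return name
    raise ValueError(f"invalid month: {month}")

def seasonal_mean(metric_by_month):
    """Average a monthly metric within each season (all 12 months required)."""
    return {
        name: sum(metric_by_month[m] for m in months) / len(months)
        for name, months in SEASONS.items()
    }
```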
Performance of the AMUN method across three evaluation metrics in different seasons.
Conclusion
In this study, deep learning methods are used to explore LST downscaling. The proposed method is an attention-based U-Net consisting of the GMFCA, FFRDB, and U-Net modules. The GMFCA module is an attention mechanism that fully extracts the complementary information between the LST data and the seven auxiliary datasets. The FFRDB module then captures rich features from the input LST data, and the U-Net module further extracts multilevel feature information and mines deeper latent relationships among the data. The effectiveness of the AMUN method is demonstrated by comparing combinations of the three modules. The GEE platform is used to acquire and preprocess the ERA5-Land hourly monthly mean reanalysis LST dataset and the seven auxiliary datasets, improving the efficiency of the downscaling workflow. In addition, the training hyperparameters of the AMUN method are selected with a Bayesian optimization algorithm, improving the accuracy and robustness of the downscaling model. In both the simulated and real data experiments, the AMUN method is compared with the RF method: AMUN achieves better spatial correlation and accuracy and matches the original LST data and station measurements more closely, whereas the RF method cannot fully describe the spatial variability of LST or accurately capture areas of abrupt temperature change. In summary, the methods and optimization measures proposed in this study can effectively handle LST downscaling.
Beyond ERA5-Land reanalysis LST data, with a reasonable selection of auxiliary factors and an adjusted number of input heads, the AMUN method can accommodate similar downscaling tasks based on the scale-invariance effect. However, the method still has some limitations. For example, its prediction stability under different terrain and seasonal conditions requires further improvement, and because the model is trained on sliced inputs, stitching artifacts appear in a few areas. In future research, we plan to further improve the model structure and training mechanism to enhance prediction stability, and to explore applying downscaling methods inspired by image super-resolution reconstruction to existing machine learning models (such as SVM and ANN) to investigate their predictive performance.
Data availability
The sources of the datasets used in this study have been acknowledged in the main text. Specifically, all auxiliary data and LST sources, as well as the generated intermediate data, were processed using the Google Earth Engine (GEE) cloud computing platform (https://earthengine.google.com/). The vector data used during the mapping process were obtained from: https://datav.aliyun.com/. The data supporting the results of this study are available from the author, Shiyu Li, upon request.
References
Duan, S. et al. Estimation of diurnal cycle of land surface temperature at high temporal and spatial resolution from clear-sky MODIS data. Remote Sens. 6, 3247–3262 (2014).
Li, Z. et al. Satellite-derived land surface temperature: Current status and perspectives. Remote Sens. Environ. 131, 14–37 (2013).
Jimenez-Munoz, J. C. et al. Revision of the single-channel algorithm for land surface temperature retrieval from Landsat thermal-infrared data. IEEE Trans. Geosci. Remote Sens. 47, 339–349 (2008).
Liu, J., Hagan, D. F. T. & Liu, Y. Global land surface temperature change (2003–2017) and its relationship with climate drivers: AIRS, MODIS, and ERA5-Land based analysis. Remote Sens. 13, 44 (2020).
Weng, Q., Fu, P. & Gao, F. Generating daily land surface temperature at Landsat resolution by fusing Landsat and MODIS data. Remote Sens. Environ. 145, 55–67 (2014).
Li, Y. & Yan, X. Statistical downscaling of monthly mean temperature for Kazakhstan in Central Asia. Clim. Res. 72, 101–110 (2017).
Li, W., Ni, L., Li, Z., Duan, S. & Wu, H. Evaluation of machine learning algorithms in spatial downscaling of MODIS land surface temperature. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 12, 2299–2307 (2019).
Gao, F., Masek, J., Schwaller, M. & Hall, F. On the blending of the Landsat and MODIS surface reflectance: Predicting daily Landsat surface reflectance. IEEE Trans. Geosci. Remote Sens. 44, 2207–2218 (2006).
Zhu, X., Chen, J., Gao, F., Chen, X. & Masek, J. G. An enhanced spatial and temporal adaptive reflectance fusion model for complex heterogeneous regions. Remote Sens. Environ. 114, 2610–2623 (2010).
Wu, P., Shen, H., Zhang, L. & Göttsche, F. Integrated fusion of multi-scale polar-orbiting and geostationary satellite observations for the mapping of high spatial and temporal resolution land surface temperature. Remote Sens. Environ. 156, 169–181 (2015).
Wang, N., Tian, J., Su, S. & Tian, Q. A downscaling method based on MODIS product for hourly ERA5 reanalysis of land surface temperature. Remote Sens. 15, 4441 (2023).
Mascaro, G., Ko, A. & Vivoni, E. R. Closing the loop of satellite soil moisture estimation via scale invariance of hydrologic simulations. Sci. Rep. 9, 16123 (2019).
Essa, W., Verbeiren, B., van der Kwast, J., Van de Voorde, T. & Batelaan, O. Evaluation of the DisTrad thermal sharpening methodology for urban areas. Int. J. Appl. Earth Obs. Geoinf. 19, 163–172 (2012).
Agam, N., Kustas, W. P., Anderson, M. C., Li, F. & Neale, C. M. A vegetation index based technique for spatial sharpening of thermal imagery. Remote Sens. Environ. 107, 545–558 (2007).
Chao, L. et al. Geographically weighted regression based methods for merging satellite and gauge precipitation. J. Hydrol. 558, 275–289 (2018).
Sebbar, B. et al. Machine-learning-based downscaling of hourly ERA5-Land air temperature over mountainous regions. Atmosphere 14, 610 (2023).
Anandhi, A., Srinivas, V. V., Kumar, D. N. & Nanjundiah, R. S. Role of predictors in downscaling surface temperature to river basin in India for IPCC SRES scenarios using support vector machine. Int. J. Climatol. 29, 583–603 (2009).
Hutengs, C. & Vohland, M. Downscaling land surface temperatures at regional scales with random forest regression. Remote Sens. Environ. 178, 127–141 (2016).
Kolios, S., Georgoulas, G. & Stylios, C. Achieving downscaling of Meteosat thermal infrared imagery using artificial neural networks. Int. J. Remote Sens. 34, 7706–7722 (2013).
Dong, C., Loy, C. C., He, K. & Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38, 295–307 (2015).
Qiao, C. et al. Evaluation and development of deep neural networks for image super-resolution in optical microscopy. Nat. Methods 18, 194–202 (2021).
Shen, H., Lin, L., Li, J., Yuan, Q. & Zhao, L. A residual convolutional neural network for polarimetric SAR image super-resolution. ISPRS J. Photogramm. Remote Sens. 161, 90–108 (2020).
Wu, J. et al. Fusing Landsat 8 and Sentinel-2 data for 10-m dense time-series imagery using a degradation-term constrained deep network. Int. J. Appl. Earth Obs. Geoinf. 108, 102738 (2022).
Gorelick, N. et al. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27 (2017).
Kumar, L. & Mutanga, O. Google Earth Engine applications since inception: Usage, trends, and potential. Remote Sens. 10, 1509 (2018).
Onačillová, K. et al. Combining Landsat 8 and Sentinel-2 data in Google Earth Engine to derive higher resolution land surface temperature maps in urban environment. Remote Sens. 14, 4076 (2022).
Zhao, Q. et al. Progress and trends in the application of Google Earth and Google Earth Engine. Remote Sens. 13, 3778 (2021).
Ye, H., Tang, S., Yang, C. & Chen, C. Reconstruction of daily MODIS/Aqua chlorophyll-a concentration in turbid estuarine waters based on attention U-Net. Remote Sens. 15, 546 (2023).
Zhang, Z., Liu, Q. & Wang, Y. Road extraction by deep residual U-Net. IEEE Geosci. Remote Sens. Lett. 15, 749–753 (2018).
Zhang, G., Roslan, S. N. A. B., Wang, C. & Quan, L. Research on land cover classification of multi-source remote sensing data based on improved U-Net network. Sci. Rep. 13, 16275 (2023).
Lian, X. et al. A multiscale local-global feature fusion method for SAR image classification with Bayesian hyperparameter optimization algorithm. Appl. Sci. 13, 6806 (2023).
Uçar, M. Deep neural network model with Bayesian optimization for tuberculosis detection from X-ray images. Multimed. Tools Appl. 1–22 (2023).
Jing, Y., Lin, L., Li, X., Li, T. & Shen, H. An attention mechanism based convolutional network for satellite precipitation downscaling over China. J. Hydrol. 613, 128388 (2022).
Yang, G. J., Pu, R. L., Zhao, C. J., Huang, W. J. & Wang, J. H. Estimation of subpixel land surface temperature using an endmember index based technique: A case examination on ASTER and MODIS temperature products over a heterogeneous area. Remote Sens. Environ. 115, 1202–1219 (2011).
Zhang, Y., Tian, Y., Kong, Y., Zhong, B. & Fu, Y. Residual dense network for image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 43, 2480–2495 (2020).
Acknowledgements
This research was funded by National Key Research and Development Program of China, grant number 2022YFE0203800.
Author information
Authors and Affiliations
Contributions
S.L. and H.W. conceptualized the study and developed the methodology. S.L. validated the study, curated the data, prepared the original draft, and created the visualizations. H.W. provided the resources and supervised the study. H.W., Q.Y., and X.W. reviewed and edited the writing. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, S., Wan, H., Yu, Q. et al. Downscaling of ERA5 reanalysis land surface temperature based on attention mechanism and Google Earth Engine. Sci Rep 15, 675 (2025). https://doi.org/10.1038/s41598-024-83944-w
This article is cited by
- A super-resolution framework for downscaling machine learning weather prediction toward 1-km air temperature. npj Climate and Atmospheric Science (2026).
- Future water stress in arid landscapes projected with GeoAI. Theoretical and Applied Climatology (2026).