Main

Cities are the cradle of human civilization, ensuring human progress, scientific innovations and economic advancements. Despite the constructive development activities within cities, they have also created and intensified certain environmental challenges1,2. The Urban Heat Island (UHI) effect causing urban overheating is a prominent example of these concerns resulting from rising urbanization and anthropogenic activities3,4. It has seriously endangered human lives, well-being and ecosystems, ultimately leading to economic consequences5. In July 2023, the world experienced the hottest month on record, with widespread heatwaves across many countries6. Moreover, temperature extremes on land will increase even faster compared to the increase of global mean temperature (land and ocean) due to climate change from human activities4. Hence, holistically formulating an effective adaptation and mitigation strategy for the UHI effect has become a focal issue for sustainable urban development7,8.

The existing literature on mitigating the UHI effect focuses primarily on strategies that seek solutions within the city limits, such as the provision of green, blue or gray infrastructure including trees9, grass10, parks11, green walls12, green roofs13, lakes14 and so on. These function as urban cooling sources/heat sinks, reducing the temperature in surrounding areas through processes such as heat absorption, evapotranspiration, convection and circulation15,16. However, urban land available for such interventions is scarce and valuable. These interventions also have limited capacity to reduce UHI intensity in a specific urban district17,18, making it challenging to sufficiently reduce the UHI at the city scale. Urban heat is not confined within the physical boundaries of a city. Instead, it can diffuse to the neighboring rural areas, which have more natural land cover than the city, including trees, rivers, grassland, cropland and so on19,20,21. This suggests that rural land cover may also absorb urban heat and therefore holds significant potential for mitigating urban heat islands. It offers a unique opportunity to tackle the UHI challenge through the utilization of neighboring rural land cover (NRLC).

The existing literature discusses the advantages and implications of NRLC in mitigating the UHI22,23. Yao et al.24 reported that the effect of greening in rural areas was an important and widespread driver of diurnal surface urban heat island intensity variability, responsible for 22.5%. Two cities with relatively comparable urban configuration and population density may have different UHI intensities merely due to different surrounding rural land cover characteristics25. Given the limited number of investigations on UHI mitigation in rural areas, there remains a dearth of knowledge about the quantified impact of rural land cover on UHI. Specifically, there is a lack of understanding regarding where NRLC influences UHI and through which landscape patterns. It is therefore important to explore the impact of rural land cover patterns on UHI mitigation so that they can be implemented effectively.

The innovations of this study lie in: (1) systematically quantifying the spatial extent (location) of rural land cover that affects UHI, along with a discussion of the physical mechanisms, and (2) identifying key factors of rural land cover and evaluating their potential in UHI mitigation. Both qualitative and quantitative relationships between rural land cover and UHI mitigation are revealed based on Chinese cities, considering that China has undergone rapid urbanization in the past 30 years. These cities are highly diverse in geographic features, climatic conditions, urban forms, development level and so on, which supports the feasibility and validity of technology deployment. This study can significantly contribute to the development of UHI mitigation strategies and sustainable urban development.

Results

Locations of rural land cover for UHI mitigation

The quantitative influence of rural land cover on the Urban Heat Island Intensity (UHII) was analyzed as shown in Fig. 1, using the regression model described in Methods. Two important parameters of rural land cover were considered, that is, the distance from the urban boundary (locations) and land cover types. Here the urban areas were divided into five different rings, namely urban ladders ULi (i = 1–5). UL1 represented the area of urban center, whereas UL2–5 gradually expanded outwards (i = 5 corresponding to the urban boundary area), as shown in Fig. 2. The urban boundary was taken as the baseline and was extended outwards to obtain four rings, that is, rural ladders RLj (j = 1–4), of different radii. The inner boundary of each RL was the urban boundary. The average equivalent diameter of the selected cities was 22 km. Rural land cover was classified into five types: woodland, cropland, impervious surface, water body and grassland (grassland was not analyzed independently; explanation in Methods).

Fig. 1: Impact of rural land cover types in the different rural regions on the surface UHII in the different urban regions.

a–e, From left to right, respectively: NRLC, woodland, cropland, impervious surface and water body. The horizontal coordinates represent the different urban ladders (ULi, i = 1–5). The vertical coordinates represent the explanation degrees (R2) of different cover types to the surface UHI. f, The schematic representation of urban regions and rural land cover (the variation of color range standing for different urban regions). g, The specific locations of various urban and rural regions.

Fig. 2: Generation process of urban ladders and rural ladders.

a, Urban ladders. b, Rural ladders.

Figure 1a shows that NRLC (considering the totality of five types) can achieve cooling for all ULs. Specifically, NRLC had the largest explanation degree, nearly 30%, on the surface UHII variance in UL1, due to the effect of heat island circulation and convection (Supplementary Information B). The NRLC in RL3, RL2 and RL4 had the highest explanation degree on the surface UHII variance in UL1,2,5, UL4 and UL3, respectively. Given that NRLC in RL3 still had the second largest explanation degree for the surface UHII variance in UL3,4, the NRLC in RL3 potentially had the greatest UHI mitigation capacity for the entire urban area compared with RL1,2,4.

Figure 1b–e shows that impervious surfaces had the most significant influence on surface UHII, followed by cropland, and finally woodland and water body. The explanation degrees of impervious surface in RL3 and RL4 for surface UHII variance differed by less than 0.01. The UHI mitigation capacity of impervious surface in RL3 and RL4 was greater than that in RL1 and RL2. Cropland and woodland in RL1 presented a markedly lower explanation degree to UHI variance compared with other RLs. Cropland in RL3 explained the greatest degree of the UHI variance in UL3,5 and exhibited a slightly lower explanation degree compared with the maximum degree in UL1,2,4 (that is, difference of 0.015, 0.005, 0.005 for UL1, UL2 and UL4, respectively). Water body in RL2 explained the UHI variations of all ULs to a higher extent than other RLs. Supplementary Fig. A2 shows the corresponding combinations ULi–RLj, that is, the combinations in which NRLC or a land cover type in RLj has the largest explanation degree of surface UHII variance in ULi compared with other RLs. Taking NRLC as an example, the corresponding combinations were UL1–RL3, UL2–RL3, UL3–RL4, UL4–RL2, UL5–RL3.

Key parameters of rural land cover for UHI mitigation

This section aimed to rank the landscape-level parameters (LLPs, used for NRLC) by their correlation with UHI mitigation. These parameters were determined through the calculation of SHapley Additive exPlanations (SHAP) values and Pearson correlation analyses. In the previous section, the corresponding RL for each UL (the RL whose rural land cover explained most of the surface UHII variance in that UL) was obtained. On this basis, Fig. 3 illustrates the ranking of the SHAP values for the LLPs in rural regions, which reflects the capability to mitigate the surface UHII of the corresponding urban areas (that is, ULi, i = 1–5). The higher the SHAP value, the better the UHI mitigation. It can be noted that AI (aggregation index), COHESION (patch cohesion index), PR (patch richness) and DIVISION (landscape division index) were almost always at the top of the SHAP rankings. To sum up, the key LLPs for different corresponding combinations on UHI mitigation were: UL1–RL3 (PR, DIVISION, NP (number of patches)); UL2–RL3 (AI, IJI (interspersion & juxtaposition index), PR, NP); UL3–RL4 (PR, LSI (landscape shape index), COHESION, NP, IJI); UL4–RL2 (LSI, COHESION, IJI, PR); and UL5–RL3 (COHESION, LSI, PR, NP) (more details can be found in Supplementary Table A1).

Fig. 3: SHAP ranking of LLPs for different combinations of RL and UL.

Columns from left to right represent UL1–RL3, UL2–RL3, UL3–RL4, UL4–RL2 and UL5–RL3.

Supplementary Figs. A3 and A4 and Supplementary Table A1 show the extraction process and final results regarding the key parameters of four land cover types. The key parameters of woodland were mainly NP, IJI and AI. The key parameters of cropland were mainly CIRCLE_AM (circle index distribution) and IJI. The key parameters of impervious surface were mainly IJI, SPLIT (Splitting Index) and CIRCLE_AM. The key parameters for water body were mainly IJI, PD (patch density), CA (total (class) area), SHAPE_AM (shape index distribution) and CLUMPY (clumpiness index).

Impact of key parameters of rural land cover on UHI

This section aimed to elaborate the individual impact of the key LLPs on UHII for the aforementioned combinations (that is, UL1–RL3, UL2–RL3, UL3–RL4, UL4–RL2 and UL5–RL3), as presented in Fig. 4. Most of the key landscape parameters and the surface UHII were close to a simple monotonic relationship. For instance, taking the urban region UL1 (the hottest region) as an example (that is, UL1–RL3; Fig. 4a): (1) the value of surface UHII decreased as PR increased (PR, namely patch richness, indicates the total number of land cover types); (2) the value of surface UHII decreased as DIVISION increased (DIVISION, that is, landscape division index, indicates the probability that two randomly chosen pixels in the landscape are not situated in the same patch) and (3) the value of surface UHII decreased as NP decreased.

Fig. 4: The accumulated local effects of each key parameter of LLPs on the surface UHII.

Each accumulated local effects (ALE) plot is centered so that the mean effect is zero. The value of the ALE can be interpreted as the main effect of the feature at a certain value compared to the average prediction of the data, that is, the smaller the value, the more effective it is in UHI mitigation. Panels a–e correspond to the five combinations of urban and rural ladders, that is, UL1–RL3, UL2–RL3, UL3–RL4, UL4–RL2 and UL5–RL3. The effects of key parameters of LLPs for each combination on the surface UHI are shown by subpanels under the panel of each combination, that is, subpanels (i)–(iii) for PR, DIVISION, NP in a; subpanels (i)–(iv) for AI, IJI, PR, NP in b; subpanels (i)–(v) for PR, LSI, COHESION, NP, IJI in c; subpanels (i)–(iv) for LSI, COHESION, IJI, PR in d and subpanels (i)–(iv) for COHESION, LSI, PR, NP in e.
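The centering described in this caption can be illustrated with a minimal first-order ALE sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the function name ale_1d, the toy linear model toy_predict, the bin edges and the sample rows are all hypothetical, and centering here uses the count-weighted mean of the per-bin values, which is one common simplification.

```python
def ale_1d(predict, rows, feature, edges):
    """Centred first-order ALE of `feature`, one value per bin (at its upper edge)."""
    effects, counts = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Rows whose feature value falls in (lo, hi]; the first bin also includes lo.
        in_bin = [r for r in rows
                  if lo < r[feature] <= hi or (lo == edges[0] and r[feature] == lo)]
        # Local effect: mean prediction change when only `feature` moves lo -> hi.
        diffs = [predict({**r, feature: hi}) - predict({**r, feature: lo})
                 for r in in_bin]
        effects.append(sum(diffs) / len(diffs) if diffs else 0.0)
        counts.append(len(in_bin))
    # Accumulate the local effects across bins.
    accumulated, running = [], 0.0
    for effect in effects:
        running += effect
        accumulated.append(running)
    # Centre so that the count-weighted mean effect is zero.
    mean_effect = sum(a * c for a, c in zip(accumulated, counts)) / sum(counts)
    return [a - mean_effect for a in accumulated]


# Hypothetical toy model: predicted surface UHII rises with NP and falls with PR.
def toy_predict(row):
    return 2.0 * row["NP"] - 0.5 * row["PR"]


toy_rows = [{"NP": 1.0, "PR": 3.0}, {"NP": 2.0, "PR": 1.0},
            {"NP": 3.0, "PR": 2.0}, {"NP": 4.0, "PR": 5.0}]
ale_np = ale_1d(toy_predict, toy_rows, "NP", edges=[1.0, 2.5, 4.0])
print(ale_np)
```

In this toy case the lower NP bin receives a negative centered value and the upper bin a positive one, which matches the reading given in the caption: values below zero indicate a lower-than-average predicted surface UHII.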

Figure 4a–e shows that some key LLPs were shared by more than one UL and had a similar relationship with the surface UHII of different ULs. For example, the surface UHII at UL1,2,3,5 decreased with the decrease of NP, the surface UHII at UL3,4,5 decreased with the increase of LSI and COHESION and the surface UHII at UL2,3,4 decreased with the decrease of IJI. This finding provided an opportunity to achieve effective cooling of a large part of the urban area through simple regulation of the same landscape parameters. Key parameters that were shared by three or more ULs and had the same influence pattern on each of them were recommended for generating key strategies on UHI mitigation. The other key LLPs, which can only mitigate the surface UHII of a particular UL or have different influencing mechanisms on the surface UHII of different ULs, were used for generating supplementary strategies for refined UHI mitigation in localized areas. As shown in Fig. 5, the influence patterns of NP, IJI, LSI and COHESION were considered in key strategies to mitigate UHI, that is, (1) decreasing the number of patches (NP); (2) decreasing the even distribution of adjacencies among patch types (IJI); (3) avoiding square patch shapes (LSI) and (4) increasing the connectedness of the patches (COHESION). AI, DIVISION and PR were selected in supplementary strategies. Taking the combination of UL2–RL3 as an example, these were: (1) increasing the connectedness of the patches (AI) and (2) increasing the number of patch types (PR).

Fig. 5: Strategies and suggestions.

Strategies and suggestions for the key LLPs on UHI mitigation.

The impacts of key LCPs (landscape class parameters) for four cover types on surface UHII and the process of generating key strategies can be seen in Supplementary Figs. A5–A12. The influencing patterns of IJI, NP and AI for woodland were selected in key mitigation strategies on UHI. The influencing patterns of CIRCLE_AM and IJI for cropland; CIRCLE_AM, SPLIT and IJI for impervious surface; and NP, CA, CLUMPY, SHAPE_AM and IJI for water body were also selected in main strategies. All the main strategies are summarized in Table 1.

Table 1 Strategies and recommendations for NRLC to mitigate UHI

Discussion

The rural regions, with their rich natural land cover and simpler functional patterns, hold great potential for mitigating UHI24. This study aims to bridge this knowledge gap by investigating both the quantitative and qualitative influence of NRLC on UHI mitigation in China from 2000 to 2020, as shown in Extended Data Figs. 1 and 2. Results indicate that NRLC has the capacity to mitigate UHI for entire cities. Specifically, we find that NRLC can contribute to urban cooling, with the most pronounced impact occurring within a 10–15 km radius from the urban boundary, a zone that interacts closely with the urban area. The results further suggest that NRLC within this range can contribute up to 30% to the reduction of UHII in urban centers. The richness and density of landscape patches emerge as key factors in mitigating UHI, with the potential to reduce UHI by 0.5 °C through the modulation of key parameters. More suggestions are summarized in Table 1.

Why do we need rural land cover types at specific locations to mitigate UHI? We explain this through urban physics, leveraging the concept of heat island circulation (Supplementary Information B) and convection. Heat circulation is of paramount importance for urban ventilation and the exchange of energy between urban and rural environments. In this dynamic circulation process, air warmed within urban areas rises under the buoyancy force, creating a low-pressure zone near the ground. Subsequently the heated air is transported to the rural regions via convection and diffusion, while cooler air is drawn from rural areas to continuously replenish the urban core. Our study reveals that RL3 land cover can exert the most significant effect on UHI, with its ring spanning 10–15 km from the urban boundary, which is approximately half of the city’s equivalent diameter, similar to the finding from Fan et al.26.

During the heat circulation cycles between different ladders (ULi and RLj), the heat is absorbed by the rural landscapes at a different extent depending on the types and locations. Hence, well-designed landscape patches (including features such as the richness and density) should be promoted to realize self-cooling in rural regions.

The complexity and diversity of urban characteristics, including shape, development level, geographical location and climatic conditions, pose potential risks of introducing significantly deviated findings in this study. To address this, our research focuses exclusively on single-centered cities exceeding 200 km2, primarily characterized by plains interspersed with scattered terraces, hills and low mountains. This approach helps to reduce the impact of the shapes and geographical features of the cities under investigation. Furthermore, cities are divided into five concentric rings (ULi) based on their varying urban development intensities (UDI). This stratification enables a differentiated analysis of the impact of rural land cover on the UHI effect across different urban development intensities. By doing so, we group urban areas by development level, thereby limiting and quantifying the influence of urban development on our findings. To assess the influence of climate, cities are grouped according to their climatic zones, and separate analyses are conducted. The mechanisms of rural land cover in UHI mitigation differ between climate zones. However, a significant impact of climate is not observed in the results. The overlap of key landscape parameters between each climate zone and China as a whole is generally higher than 0.7. Additionally, the majority of mitigation strategies identified for China are transferable to different climate zones (Supplementary Information C). Consequently, our findings still have relatively high generalizability and applicability in different cities. In the future, this study holds promise in offering valuable methodological and strategic guidance for refinement studies at the city scale and the development of context-specific policy formulations.

The heat island circulation can cross the physical urban boundaries to facilitate heat exchange between cities and rural areas to mitigate UHI. However, the interaction and collision of energy and heat between cities and rural areas may at the same time cause a large amount of pollutants to flow back into the city with the heat island circulation, polluting the urban ecosystem. The local microcirculation between urban, suburban and urban–rural (buffer zone) areas can be improved by rationalizing the landscape patches of suburban and rural areas to provide spaces for heat exchange and pollutant filtration and sinking and to avoid pollutant refluxes. This study shows that different types and locations of rural landscapes may mitigate UHI to different degrees due to the different temperature gradients of the thermal cycles.

To sum up, rather than perceiving urbanization as an undesirable trend that opposes sustainable urban development, it is more constructive to embrace it as a continuous process. Unlike the intricate process of balancing urban development with sustainability, the regulation of rural land cover would yield numerous co-benefits for both urban and rural areas, including offering a nature-based solution without encroaching on urban land3, preserving rural landscape, boosting rural economy, assisting in mitigating the UHI and supporting ongoing urban prosperity and sustainability.

Methods

This study aims to explore the impacts of neighboring rural land cover (locations and landscape types) on urban heat island (UHI) mitigation. The method comprises three main steps: (1) investigating the influence of rural land cover in different locations on UHI at the urban scale; (2) extracting the key rural landscape parameters on UHI mitigation and (3) identifying the impact of individual key landscape parameters on UHI and proposing key mitigation strategies. The applied research framework is shown in Extended Data Fig. 1. First, 30 Chinese cities are selected as case studies. The data of the UHI intensity (UHII) and rural land cover for these cities are collected. Second, urban areas are divided into urban ladders (ULi) based on UDI27, and rural areas are divided into rural ladders (RLj) with different distances from the urban boundary28. The UHII values of different ULi are calculated and the land cover types of different RLj are categorized. Third, regression models are used to analyze the impact of different rural land cover from varying distances to the urban boundary29. Then, SHapley Additive exPlanations (SHAP) is employed to rank the key landscape parameters of rural land cover, including landscape-level parameters (LLPs) and landscape-class parameters (LCPs)30. Finally, accumulated local effects (ALE) plots are used to reveal the impact of individual key landscape parameters on UHII31.

Case studies

On the basis of the China Urban Statistical Yearbook 2020 (http://www.stats.gov.cn), 30 monocentric cities in China with an urban area of more than 200 km2 are selected for investigation32. Previous studies have reported that the urban shape substantially influences the UHI; monocentric cities are more likely to experience severe UHI phenomena33. As shown in Extended Data Fig. 2, except Urumqi, all the cities are evenly distributed in China’s monsoon climate zone, representing a high level of urbanization. The landforms of the sample cities are generally dominated by plains, with scattered terraces, hills and low mountains in the city. Climatic differences in different regions of the country have negligible effects on the results, as demonstrated by the data in Supplementary Information C.

Data collection of UHII and rural land cover

Land cover data for 2000, 2005 and 2010 are obtained through Landsat 5 TM (Thematic Mapper) and for 2015 and 2020 through Landsat 8 OLI (Operational Land Imager). According to the common reference system of remote sensing monitoring in China (National Land Use/Cover Classification System for Remote Sensing Monitoring), the rural land cover was divided into five types: impervious surface, woodland, grassland, cropland and water body34. Neighboring rural land cover (NRLC) includes the totality of all land cover, that is, impervious surface, woodland, grassland, cropland and water body. When analyzing landscape types independently, only four land cover types (impervious surface, woodland, cropland and water body) are chosen and grassland is excluded. Grassland has limited latent heat and low heat absorption efficiency35,36 and is not a common land cover type in rural area neighboring cities of China. The training sample points are equally distributed throughout the sample region obtained from high-resolution photos in Google Earth Pro. Landsat 5/8 Level 2, Collection 2, Tier 1 dataset and the Random Forest (RF) algorithm model are employed for land classification. The quality of categorization is assessed through Kappa values, which are all greater than 0.90.
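As an illustration of the accuracy check mentioned above, Cohen's kappa can be computed from a confusion matrix as (observed agreement minus chance agreement) divided by (1 minus chance agreement). The sketch below is hypothetical: the function name cohen_kappa and the two-class toy matrix are assumptions made for illustration and are not data from this study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, columns: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class validation sample (for example, woodland versus cropland).
toy_confusion = [[48, 2],
                 [3, 47]]
kappa = cohen_kappa(toy_confusion)
print(round(kappa, 3))
```

The same function applies unchanged to a five-class matrix covering impervious surface, woodland, grassland, cropland and water body.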

This study incorporates 18 LLPs such as total area (TA), contagion (CONTAG) and Shannon’s evenness index (SHEI) and 22 LCPs such as the largest patch index (LPI), edge density (ED) and interspersion and juxtaposition index (IJI)37. LLPs are indicators for the landscape as a whole (NRLC); LCPs are indicators for individual landscape types (impervious surface, cropland, water body, woodland). The selected LLPs and LCPs are listed in Supplementary Table A2. With the support of ArcGIS 10.2, the ArcGrid raster format images from 2000 to 2020 are imported in Fragstats 3.4. Background values of landscape types are filtered using the Class properties file. Finally, the LLPs and LCPs are selected and calculated.

The surface UHII is calculated based on remotely sensed land surface temperature (LST) data. LST is considered to be strongly connected to near-ground temperature and is commonly employed to investigate the geographic and temporal features of the UHI impact38. The LST dataset for the selected cities incorporates synthetic temperature data from MOD11A1 V6.1/LST_Day_1km in July and August. Due to its wide bandwidth, this dataset is well suited to regional-scale cross-sectional data analysis and modeling. Previous research has demonstrated the reliability of MODIS LST results, with errors typically within 1 K (ref. 39). The monthly average of the daily LST values is used to calculate the average summer LST for the selected cities in 2000, 2005, 2010, 2015 and 2020.

This study selects a reference line obtained by offsetting 20 km from the urban boundary as the baseline to calculate the surface UHII40. A ring equal to the urban area is obtained. The LST of this ring is considered to be unaffected by the UHI footprint and used to calculate the surface UHI. In this approach, the non-urban area used for calculating the surface UHII does not overlap with the RL area, to minimize the impact of UHII variation resulting from the temperature change of RLj.

$$\begin{array}{c}{\mathrm{surface}}\;{\mathrm{UHII}}_{{\mathrm{UL}}_{i}}={\mathrm{LST}}_{{\mathrm{UL}}_{i}}-{\mathrm{LST}}_{\mathrm{rural}}\\(i=1,2,3,4,5)\end{array}$$
(1)

where \({\mathrm{surface}}\;{\mathrm{UHII}}_{{\mathrm{UL}}_{i}}\) is the surface UHII of ULi in °C; \({\mathrm{LST}}_{{\mathrm{UL}}_{i}}\) is the average LST for ULi in June to August, °C; and \({\mathrm{LST}}_{\mathrm{rural}}\) is the average LST of the ring from June to August, °C.
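Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' processing chain: the function names, the toy LST values and the circular-city geometry are assumptions. In particular, reference_ring_outer_radius assumes an idealized circular city and reads the text as describing a ring that starts 20 km beyond the urban boundary and has the same area as the city.

```python
import math


def mean(values):
    return sum(values) / len(values)


def surface_uhii(lst_by_ul, lst_rural):
    """Equation (1): mean LST of each urban ladder minus mean LST of the reference ring."""
    rural_mean = mean(lst_rural)
    return {ul: mean(pixels) - rural_mean for ul, pixels in lst_by_ul.items()}


def reference_ring_outer_radius(city_radius_km, offset_km=20.0):
    """Outer radius of a ring that starts offset_km beyond a circular city
    and whose area equals the city's area (idealized geometry, an assumption)."""
    inner = city_radius_km + offset_km
    # pi * (outer^2 - inner^2) = pi * city_radius^2
    return math.sqrt(inner ** 2 + city_radius_km ** 2)


# Hypothetical summer-mean LST pixels in degrees Celsius.
toy_lst = {"UL1": [36.0, 38.0], "UL5": [32.0, 33.0]}
toy_rural = [30.0, 31.0, 29.0]
uhii = surface_uhii(toy_lst, toy_rural)
print(uhii)

# A city with the reported average equivalent diameter of 22 km (radius 11 km).
outer_km = reference_ring_outer_radius(11.0)
print(round(outer_km, 2))
```

Under these assumptions the reference ring for an average-sized city is only about 2 km wide, lying just outside the 20 km limit of the rural ladders, so it does not overlap with any RLj.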

Demarcation of ULi and RLj

To investigate the impact of rural land cover on UHI mitigation of different urban regions (at a city scale), both urban and rural areas are demarcated for the sake of cross analysis. The cities are divided into five ULi and the selected rural areas are divided into four RLj.

The urban development intensity indicator (UDIi), which typically exhibits a linear correlation with UHI41, is used for the division of urban areas. Different ULi correspond to different UDIi intervals. Each city is subdivided into five ULi to maximize the segmentation of the metropolitan area while preventing the ULi from becoming excessively small and fragmented42. There are significant differences in surface UHII between the different ULi, as shown in Supplementary Table A4. The city clustering algorithm (CCA) is utilized in this study to define city boundaries43. Initially, a city map with a resolution of 3,000 m is generated using a UDI threshold higher than 25% (ref. 44). UDI is calculated as the proportion of impervious grids within each 3,000 × 3,000 m pane45, as shown in equation (2). Subsequently, the urban area is identified using CCA with a clustering parameter of 3,000 m, corresponding to the spatial resolution of the initial urban map45. Therefore, two pixels in the UDI-thresholded city map whose centers are no more than 3,000 m apart (the clustering parameter of CCA) are assigned to the same city. Then, the complete shape of an urban area is obtained and the periphery of the urban area is extracted as the urban boundary. Finally, UDI intervals within the urban area are further subdivided to better represent the changes in UHI along the UDI. With a UDI interval of 15%, five ULi, namely UL1 (UDI = 85–100%), UL2 (UDI = 70–85%), UL3 (UDI = 55–70%), UL4 (UDI = 40–55%) and UL5 (UDI = 25–40%), are derived, as illustrated in Fig. 1.

$${\mathrm{UDI}}_{i}=\frac{{S}_{{\mathrm{Impervious}}\;{\mathrm{surface}},{\mathrm{i}}}}{{S}_{i}}\times 100 \% ,$$
(2)

where i denotes the ith image element on the raster map; UDIi denotes the urban development intensity value of the ith image element; Si denotes the total area of the ith image element; and SImpervious surface, i denotes the area of imperviousness in the ith image element.
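The demarcation steps can be sketched as follows. This is a simplified illustration rather than the CCA implementation used in the study: all function names and the toy grid are hypothetical, interval boundaries are assumed to be inclusive at the lower end, and clustering is done with 4-connectivity because, on a 3,000 m grid with a 3,000 m clustering parameter, only orthogonal neighbors have centers no more than 3,000 m apart (diagonal neighbors are about 4,243 m apart).

```python
from collections import deque

# Lower UDI bound (%) of each urban ladder, from the 15% intervals in the text.
UL_BINS = [(85, "UL1"), (70, "UL2"), (55, "UL3"), (40, "UL4"), (25, "UL5")]


def udi_percent(impervious_area, cell_area):
    """Equation (2): share of impervious surface in one cell, in percent."""
    return impervious_area / cell_area * 100.0


def urban_ladder(udi):
    """Map a UDI value (%) to its urban ladder; None means below the urban threshold."""
    for lower, label in UL_BINS:
        if udi >= lower:
            return label
    return None


def cluster_cells(udi_grid, threshold=25.0):
    """Label 4-connected groups of cells whose UDI reaches the threshold (0 = non-urban)."""
    rows, cols = len(udi_grid), len(udi_grid[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if udi_grid[r][c] < threshold or labels[r][c]:
                continue
            current += 1  # start a new cluster and flood-fill it
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and udi_grid[ny][nx] >= threshold):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels


# Hypothetical UDI grid (%): one main cluster and one isolated urban cell.
toy_grid = [[90, 60, 10, 0],
            [45, 30, 5, 0],
            [0, 0, 0, 35]]
toy_labels = cluster_cells(toy_grid)
print(toy_labels)
print(urban_ladder(udi_percent(4.5e6, 9.0e6)))
```

In a full workflow the largest cluster would be taken as the city, its periphery extracted as the urban boundary, and urban_ladder applied to each cell inside it.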

The RLj outside the urban area is delineated to ascertain the extent of rural land cover that can exert the most significant impact on the UHI. In previous studies, the widely adopted approach for determining RLj radius is to make the RLj area equal to the urban area46 or to employ a uniform RLj radius of 5 km or 10 km (refs. 47,48). However, when the RLj radius is too small, for instance, less than 1 km, it becomes challenging to reflect the true rural LST and the RLj is also likely to be influenced by the UHI footprint49. Therefore, in this study, we adopt a varying RLj radius methodology50. On the basis of the selection of the rural region in most UHI mitigation studies and the distances between urban boundaries of sample cities (approximately 40 km), we tentatively establish a maximum study area of 20 km from the urban boundaries for rural land cover. The rural area within this boundary is also in close proximity to the metropolitan area. The RLj radius, denoted as D, changes by one level every 5 km. This choice aligns with the 900-m resolution of rural land cover, and the observed land cover changes between different RLj are significant. Thus, the urban boundary is offset outward by four RLj, each with varying ring widths. As shown in Fig. 1, the minimum and maximum radii for these RLj are set as 2.5 km and 5 km (RL1), 5 km and 10 km (RL2), 10 km and 15 km (RL3) and 15 km and 20 km (RL4), respectively. The specific RLj radius of each city is calculated by equation (3).

$$D={D}_{\min }+\left({D}_{\max }-{D}_{\min }\right)\times \frac{\left(S-{S}_{\min }\right)}{\left({S}_{\max }-{S}_{\min }\right)},$$
(3)

where D is the radius of the RLj; Dmin is the minimum radius of the RLj; Dmax is the maximum radius of the RLj; S is the urban area of the sample city; Smin is the minimum urban area among all sample cities; and Smax is the maximum urban area among all sample cities.
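Equation (3) is a linear interpolation between the minimum and maximum radius of each rural ladder according to city size. The short sketch below illustrates it; the function name rl_radius and the city areas (200, 600 and 1,000 km2) are hypothetical values chosen for the example, whereas the radius bounds are those stated in the text.

```python
# Minimum and maximum radius (km) of each rural ladder, as stated in the text.
RL_BOUNDS_KM = {"RL1": (2.5, 5.0), "RL2": (5.0, 10.0),
                "RL3": (10.0, 15.0), "RL4": (15.0, 20.0)}


def rl_radius(area, area_min, area_max, d_min, d_max):
    """Equation (3): scale the RL radius linearly with the city's urban area."""
    return d_min + (d_max - d_min) * (area - area_min) / (area_max - area_min)


# Hypothetical sample: the smallest city is 200 km2, the largest 1,000 km2,
# and the city of interest lies halfway between them.
toy_radii = {name: rl_radius(600.0, 200.0, 1000.0, d_min, d_max)
             for name, (d_min, d_max) in RL_BOUNDS_KM.items()}
print(toy_radii)
```

The smallest city in the sample therefore receives the minimum radius of every ladder and the largest city the maximum, with all other cities scaled in between.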

Analytic method to determine rural land cover regions

After dividing urban and rural areas, the rural region (RLj) to maximize the surface UHI mitigation for each urban area can be determined by R2. R2 is a parameter reflecting the goodness of a regression model51. This statistic also indicates the percentage of variance in the dependent variable that is jointly explained by the independent variables. In other words, R2 offers a measure of how effectively the variations in surface UHII can be explained by the NRLC in the regression model52. The greater the extent to which rural land cover elucidates variations in the UHI, the more pronounced its influence on UHI dynamics and its efficacy in mitigating UHI effects. On the basis of this, seven machine learning models (Lasso regression, Ridge regression, ElasticNet regression, Random Forest regression, Support Vector regression, K-Nearest Neighbors regression and Multilayer Perceptron regression) are used to train the dataset (independent variable: LLPs and LCPs, dependent variable: surface UHII) in this study. These models are selected because they are significantly different in training methods and recognized for their ability to train high-dimensional data with various capabilities53. To achieve better training results, a tenfold cross validation is used to tune the model parameters. To prevent model overfitting, the dataset is divided into training (70%) and test (30%) sets. Random seeds may significantly affect the model training outcomes54. To ensure the robustness of the results, random seeds controlling the division of the training and test sets are not defined, and the model output parameters are averaged over multiple training sessions55. The average values of model output parameters reach equilibrium when the number of trainings reaches 100. The R2 of each regression model in this study is the average value obtained after training 100 times. 
A non-parametric test is also used to test for significant differences among the training results of the different regression models, and it shows significant variability between the models. In this context, a single case yields seven distinct R2 values, one from each of the seven regression models. The highest of these seven R2 values is taken as the degree to which rural land cover accounts for variations in the UHI, termed the ‘explanation degree’, as given in equation (4). The subsequent interpretability analyses are based on the model with the largest R2 (explanation degree).

$${\mathrm{Explanation}}\;{\mathrm{degree}}=\mathop{\mathrm{max}}\limits_{k}\left(\frac{1}{N}\mathop{\sum }\limits_{n=1}^{N}{R}_{n,\,k}^{\,2}\right),$$
(4)

where R2 denotes the extent to which the independent variables in regression model k explain changes in the dependent variable in training run n; k indexes the seven regression models; and N is the number of training runs (100 in this study).
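Equation (4) amounts to averaging R2 over the runs for each model and then taking the maximum over models. A minimal sketch follows; the function name `explanation_degree` and the R2 values are made up for illustration, and only three runs and three models are shown for brevity.

```python
from statistics import mean


def explanation_degree(r2_runs):
    """Return (best model, explanation degree) as in equation (4).

    r2_runs maps a model name to its list of test-set R2 values,
    one value per training run.
    """
    mean_r2 = {name: mean(scores) for name, scores in r2_runs.items()}
    best_model = max(mean_r2, key=mean_r2.get)
    return best_model, mean_r2[best_model]


# Illustrative values only, not results from this study.
r2_runs = {
    "lasso": [0.25, 0.35, 0.30],
    "random_forest": [0.5, 0.75, 0.25],
    "knn": [0.125, 0.25, 0.375],
}
best_model, degree = explanation_degree(r2_runs)
```

Returning the model name together with the value keeps track of which model the later interpretability analyses should use.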

By comparing the explanation degree (R2) of different rural regions (RLj), the rural region (RLj) that maximizes surface UHI mitigation for each urban area (ULi, i = 1–5) is obtained, and together these make up the corresponding combinations (RLj–ULi).

In this study, three linear regression models (Lasso regression, Ridge regression and ElasticNet regression) and four nonlinear regression models (Random Forest regression, Support Vector regression, K-Nearest Neighbors regression and Multilayer Perceptron regression) are used for the correlation analyses. Supplementary Fig. A1 shows the distribution of R2 for these regression models, and the appropriate model is determined by comparing the R2 values.

Ranking method of key landscape parameters for rural land cover

On the basis of the obtained corresponding combinations (RLj–ULi), the average marginal contribution of each landscape parameter to UHI changes is calculated by coupling the SHAP model with the best-fit machine learning model obtained in the previous section. SHAP is a post hoc interpretation method. The basic idea is to calculate the marginal contribution of a feature when it is added to the model and then to average these marginal contributions over all possible feature orderings; this mean is the SHAP value of the feature. The SHAP values of the LLPs and LCPs are sorted and accumulated, and the parameters accumulated by the point at which the running sum reaches 80% of the total are chosen. Because these parameters may not be independent, they may affect or even restrict each other when applied together. The parameters are therefore screened by correlation analysis. Most existing data analysis studies use 0.5 to 0.7 as the correlation threshold for high-dimensional parameter screening56,57, and we chose the middle value of 0.6 as the screening threshold for this study. Within each set of parameters with correlation coefficients greater than 0.6, the parameters with smaller SHAP values are eliminated, because the larger the SHAP value, the larger the effect of the parameter on the UHI within its own range of variation.
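The two screening steps can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: each parameter's importance is assumed to be summarized as a single non-negative number (for example, its mean absolute SHAP value), the correlation is assumed to be the absolute Pearson coefficient, and the parameter names and values are hypothetical.

```python
from math import sqrt
from statistics import mean


def select_by_cumulative_share(importance, share=0.8):
    """Keep top-ranked parameters until their sum reaches `share` of the total."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(importance.values())
    kept, running = [], 0.0
    for name, value in ranked:
        kept.append(name)
        running += value
        if running >= share * total:
            break
    return kept


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a and var_b else 0.0


def drop_correlated(kept, columns, threshold=0.6):
    """Walk from most to least important; drop a parameter if it is
    correlated above the threshold with one that is already retained."""
    survivors = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[s])) <= threshold
               for s in survivors):
            survivors.append(name)
    return survivors


# Hypothetical importances and observations, for illustration only.
importance = {"forest_pct": 0.5, "water_pct": 0.2, "cropland_pct": 0.15,
              "patch_density": 0.1, "edge_density": 0.05}
columns = {
    "forest_pct": [1.0, 2.0, 3.0, 4.0, 5.0],
    "water_pct": [3.0, 1.0, 2.0, 1.0, 3.0],
    "cropland_pct": [5.0, 4.0, 3.0, 2.0, 1.0],
}
kept = select_by_cumulative_share(importance)
key_parameters = drop_correlated(kept, columns)
```

In this toy case the three top-ranked parameters cover 85% of the total, and `cropland_pct` is then removed because it is perfectly anti-correlated with the higher-ranked `forest_pct`.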

Influence of the key rural landscape parameters responding to UHI

Accumulated local effects (ALE) plots are used to identify the patterns by which key landscape parameters influence UHI mitigation. ALE is a global explanation technique that describes how key parameters affect the predictions of a machine learning model, and it is a faster and unbiased alternative to partial dependence plots. In this study, ALE is used to examine the relationship between feature values (that is, landscape parameters) and the target variable (that is, UHII). ALE averages and accumulates the differences in predictions across the range of each key landscape parameter, thereby isolating the effect of each feature; this comes at the cost of requiring a greater number of observations and a nearly uniform distribution. Overall, ALE shows, in an easily understood form, the main effects of individual predictor variables and their second-order interactions in black-box supervised learning models. ALE plots are then generated from the fitted supervised learning model, taking the interactions between variables into account.
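A simplified first-order ALE computation for a single feature is sketched below to make the averaging and accumulation steps concrete. It is a minimal illustration, not the implementation used in this study: the bin edges are taken from observed quantiles, the curve is centred using bin midpoints, and the function `ale_1d`, the toy model and the feature names are all hypothetical.

```python
def ale_1d(predict, rows, feature, n_bins=4):
    """Return (edges, centred ALE values at those edges) for one feature.

    predict: callable taking a dict of feature values and returning a number.
    rows:    list of dicts, one per observation.
    """
    values = sorted(r[feature] for r in rows)
    n = len(values)
    # Quantile-style bin edges taken from observed values (duplicates merged).
    edges = sorted({values[round(i * (n - 1) / n_bins)]
                    for i in range(n_bins + 1)})

    effects, counts = [], []
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        # Observations in (lo, hi]; the first bin also includes its lower edge.
        in_bin = [r for r in rows
                  if lo < r[feature] <= hi or (k == 0 and r[feature] == lo)]
        # Local effect: prediction change when only this feature moves lo -> hi.
        diffs = [predict({**r, feature: hi}) - predict({**r, feature: lo})
                 for r in in_bin]
        effects.append(sum(diffs) / len(diffs) if diffs else 0.0)
        counts.append(len(in_bin))

    # Accumulate the averaged local effects across bins.
    ale = [0.0]
    for e in effects:
        ale.append(ale[-1] + e)

    # Centre so that the count-weighted mean effect over the data is zero.
    total = sum(counts)
    offset = (sum(c * (ale[k] + ale[k + 1]) / 2 for k, c in enumerate(counts))
              / total) if total else 0.0
    return edges, [a - offset for a in ale]


def toy_model(row):
    """Assumed stand-in for a fitted model: linear in both features."""
    return 2.0 * row["forest_pct"] + row["water_pct"]


rows = [{"forest_pct": float(i), "water_pct": float(i % 3)} for i in range(9)]
edges, ale = ale_1d(toy_model, rows, "forest_pct")
```

For this linear toy model the ALE curve of `forest_pct` rises by 2 per unit of the feature regardless of `water_pct`, which shows how ALE isolates the effect of one feature from the others.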

Data analysis

The data analysis process is shown in Supplementary Fig. A17 and can be considered as a nesting of three loops. The first loop trains seven machine learning models on the LLPs and LCPs (independent variables) of a specific RLj and the UHII (dependent variable) of a particular ULi. The regression model with the largest R2 is considered the best trained. This regression model and the corresponding R2, that is, the explanation degree, are the outputs of the first loop, which thus yields the best-trained model for the specified ULi and RLj (corresponding combination). The second loop, based on the results of the first, compares the extent to which the land cover of different RLj affects the UHI of a specified ULi and outputs the rural region that has the most significant effect on the UHI of that ULi. The first two loops allow the first objective of this study to be achieved. The criterion for deciding whether to continue analyzing the key parameters and patterns by which rural land cover affects the UHI (the third loop) is whether there is an RLj whose land cover has an impact greater than 0 on the intensity of the heat island in that urban ladder. The third loop performs the previous steps once for each of the five ULi to obtain the region of rural land cover that has the most significant impact on the UHI of the respective ULi and the extent of that impact, that is, the explanation degree. If the explanation degree is less than 0, it suggests that the UHI of this ULi is not affected by rural land cover. If the explanation degree is greater than 0, the LLPs and LCPs of rural land cover are ranked from largest to smallest by SHAP value. Accumulation starts with the first SHAP value and stops when the running sum reaches 80% of the total.
The accumulated LLPs and LCPs are then subjected to correlation analysis, and within each set of parameters with correlation coefficients greater than 0.6 the parameters with lower SHAP rankings are deleted, to identify the key parameters of rural land cover affecting the UHI. Finally, ALE plots of these key parameters are drawn to explain the pattern of UHI response to the key parameters. At this point, the last question of this study is answered.
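The selection logic of the nested loops can be summarized in a short sketch. It assumes that the run-averaged R2 of every model has already been computed for every RLj–ULi pair; the function `best_rural_region`, the labels and the numbers are hypothetical and serve only to illustrate the control flow.

```python
def best_rural_region(mean_r2):
    """Pick, for each ULi, the RLj and model with the largest averaged R2.

    mean_r2: {ul: {rl: {model: averaged R2}}}
    Returns  {ul: (rl, model, explanation degree)}, or None for a ULi whose
    best explanation degree is not greater than 0.
    """
    selected = {}
    for ul, regions in mean_r2.items():          # third loop: each ULi
        best = None
        for rl, models in regions.items():       # second loop: each RLj
            model = max(models, key=models.get)  # first loop: best model
            degree = models[model]
            if best is None or degree > best[2]:
                best = (rl, model, degree)
        # Continue to the SHAP and ALE steps only if the degree exceeds 0.
        selected[ul] = best if best is not None and best[2] > 0 else None
    return selected


# Illustrative values only, not results from this study.
mean_r2 = {
    "UL1": {"RL1": {"ridge": 0.10, "random_forest": 0.30},
            "RL2": {"ridge": 0.20, "random_forest": 0.45}},
    "UL2": {"RL1": {"ridge": -0.20, "random_forest": -0.05},
            "RL2": {"ridge": -0.10, "random_forest": -0.15}},
}
selected = best_rural_region(mean_r2)
```

Here `UL1` proceeds to the interpretability steps with `RL2` and the Random Forest model, whereas `UL2` is skipped because none of its rural regions yields a positive explanation degree.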

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.