Introduction

Rural settlements carry a myriad of critical functions related to the production and daily lives of rural residents, emerging from the long-term interaction between humans and their natural environments [1, 2]. Over time, these interactions have imbued rural settlements with archaeological values reflecting historical, cultural, and social significance [3]. The rapid development of the global economy has induced profound changes in urban-rural dynamics, leading to a global phenomenon of rural decline and increasing concern for the simultaneous erosion of rural heritage. In 2017, ICOMOS proposed that all rural areas could be interpreted as heritage sites and encouraged nations to establish rural heritage inventories at various levels to enhance the protection of these heritages [4]. As a major agricultural country with a large rural population and extensive geographic area, China has approximately 580 million people distributed across millions of undeveloped villages. Since the economic reforms initiated in 1978, China has accumulated a vast and diverse array of rural heritages [5]. During this period, China’s urbanization has continuously advanced, marked by a rapid urban-rural transformation. The urbanization rate increased from 17.9% in 1978 to 63.9% in 2020 [6]. Under the development philosophy of “strong cities, weak countryside,” rural settlements generally face challenges such as population outflow, resource scarcity, and economic frailty [7, 8]. Issues such as depopulation and aging within rural residential areas have become prevalent in China, posing challenges to the preservation and development of rural heritage [9].

In 2012, China initiated the “Traditional Villages Survey,” introducing the concept of Traditional Chinese Villages (TCV) to better protect China’s rural heritage. TCV are defined as villages that were established early on, possess a wealth of traditional resources, and have significant historical, cultural, scientific, artistic, social, and economic values worthy of preservation [10]. The launch of the TCV survey sparked extensive research on rural settlements in China, accumulating significant findings in areas such as spatial form [11], landscape imagery [12], value assessment [13], archaeological preservation [14, 15], and the impacts of tourism development [16,17,18]. Additionally, numerous studies have analyzed the spatial distribution patterns of TCV at the regional level, summarizing their characteristics and identifying key areas for heritage conservation [19], leading to contributions to recognizing and safeguarding important rural heritage conservation areas [20].

Numerous studies have begun to focus on how different location conditions influence the distribution patterns of rural heritage, summarizing the underlying determinants. The migration and evolution of rural settlements are influenced by a complex interplay of natural, economic, and social factors [1, 21]. Understanding these intricate coupling relationships remains one of the key challenges in the field of rural settlement research [22]. The exploration of large-scale, self-organizing variable factors in the evolution of rural settlement patterns has become a necessary component in achieving the revitalization of rural China. This encompasses both theoretical inquiries and practical applications aimed at enhancing rural areas’ sustainability and cultural integrity [23, 24].

While the TCV directory has generated significant discourse on rural heritage, research into the macro distribution and influencing factors of TCV remains subject to further scrutiny. The selection mechanism and process for TCV impose limitations that result in blind spots in their spatial distribution, leading to significant local clustering and overall extreme imbalance [25]. Studies on the macro spatial distribution focusing on the TCV can only to a limited extent reveal the correlation between the distribution characteristics of the currently listed TCV and influencing factors such as the current natural environment and social conditions. Some studies have begun to focus on historical rural settlements (HRS) with larger sample sizes as their research subjects. These efforts focus on studying spatial distribution and influencing factors, aiming to accurately reconstruct the real state of rural heritage distribution on a national or regional scale, and to elucidate the relationships with specific external environmental conditions [26,27,28]. The findings from these dataset construction efforts reveal that China still has a vast amount of valuable rural heritage that has not yet been fully explored. However, due to the difficulty in obtaining historical data, acquiring a more comprehensive database of HRS samples has become one of the obstacles hindering current research on the macro distribution of rural heritage.

Furthermore, on the methodological level, the development of geographic information technology and geospatial big data has facilitated the quantitative and multi-scale analysis of distribution patterns and variable factors concerning rural heritage. The integration of various types of geospatial big data provides auxiliary support to field research in the study of rural heritage. Moreover, statistical analysis methods such as regression models [29], Geodetector [30, 31], and panel data models [32] have also been incorporated into rural heritage research. These studies have more broadly identified potential quantitative variables that could impact the distribution of rural settlements, deepening the understanding of geospatial big data in rural heritage research and preliminarily explaining the underlying determinants of rural heritage distribution from a mathematical and logical perspective.

Rural settlements exhibit distinct spatial attributes, with their evolution driven by nonlinear dynamics and spatial non-stationarity. Traditional quantitative analysis methods often overlook these characteristics, which can lead to results that deviate from reality [33]. Consequently, scholars have employed classic Geographically Weighted Regression (GWR) models to study the spatial differentiation of traditional villages across various scales, including nationwide in China [34], the Wuling Mountain area [35], and the urban agglomeration in the middle reaches of the Yangtze River [36]. These studies have confirmed that the spatial distribution of rural heritage is significantly influenced by geographic location, and that the effects of driving factors show noticeable spatial heterogeneity [37]. However, a limitation of GWR is its assumption that all variables operate at the same spatial scale. Factors influencing the spatial distribution of rural settlements may exhibit similarities at certain spatial scales, whereas significant differences may arise beyond those scales [38]. While GWR can account for spatial non-stationarity, it fails to capture the scale differences in the spatial heterogeneity of various influencing factors [39]. The Multi-Scale Geographically Weighted Regression (MGWR) model addresses these shortcomings by not only analyzing spatial heterogeneity effectively but also automatically adjusting the regression bandwidth for each influencing factor. This adjustment allows for a more accurate reflection of the spatial variations of these factors [37], thereby providing more reliable results in the analysis of spatial heterogeneity [40].

In summary, the increasingly refined integration of spatial datasets on rural heritage with spatial statistics facilitates a deeper understanding of rural heritage research and comprehensive conservation planning. In this study, HRS are defined as rural settlements established before the Republic of China era (1912 A.D.) that have been preserved to the present day. TCV fall within the definition of HRS and are therefore treated as a component of HRS. The objective is to develop more objective and applicable methods for extracting HRS data and to broadly examine the spatial heterogeneity of different influencing variables. Using the Taihang Baxing (THBX) area as a case study, we analyze the spatial distribution characteristics of HRS, evaluate the impacts of underlying driving factors through correlation analysis and spatial regression models, and test for spatial heterogeneity.

The main research questions addressed in this study are as follows:

  • How can geographic information about existing HRS be obtained objectively?

  • What are the distribution patterns and underlying determinants of HRS?

  • What is the spatial heterogeneity of the roles played by variables on the distribution patterns of HRS?

By addressing these issues, the study aims to provide key insights into the distribution and determinants of rural heritage. These insights will offer guidance for incorporating historical and cultural heritage into regional development planning. Additionally, the findings will contribute a macro-level theoretical reference for national strategies aimed at rural revitalization, thus supporting efforts to preserve and enhance rural areas in the context of their unique historical and cultural values [3].

Materials and methods

Research framework

The research framework of this study is illustrated in Fig. 1. This study began by collecting and filtering geographic information on HRS, thereby creating a dataset for the THBX. Subsequent steps involved collecting various variables and testing their validity for the analysis, culminating in the selection of the MGWR to analyze spatial heterogeneity. The findings related to the research questions are discussed in Sections "Analysis of HRS spatial distribution patterns", "Applicability analysis of selected variables", and "Analysis of spatial heterogeneity in the effects of variables". Finally, Section "Discussion" serves as a retrospective overview of the study, discussing the insights gained, the limitations encountered, and the prospects for future research.

Fig. 1 Research framework

Study area

The Taihang Mountains, located in northern China, extend from the northeast to the southwest and act as the geographical boundary between the North China Plain and the Loess Plateau. This range also marks the transition zone between the second and third terraces of China’s topography. Characterized by its complex terrain and challenging access, the region features eight valleys formed by river erosion, which serve as crucial roads connecting the east and west sides of the Taihang Mountains. These valleys are historically known as the Taihang Baxing (THBX), where the word ’Xing’ refers to ancient paths. They include, from north to south, the Jundu Xing, Puyin Xing, Feihu Xing, Jing Xing, Fukou Xing, Bai Xing, Taihang Xing, and Zhiguan Xing, as depicted in Fig. 2a. Alongside the Silk Road and the Grand Canal, the Taihang Baxing are an integral part of China’s route-based cultural heritage [41, 42]. Today, THBX no longer solely represents the eight ancient paths; it has come to represent the area where these paths are located, becoming a cultural symbol with regional attributes. The military, cultural, economic, and transport features that have emerged along the THBX have fostered the development of uniquely characteristic settlement clusters in this region.

As is shown in Fig. 2b, the overall distribution of TCV across the country is characterized by a higher concentration in the south than in the north, and more in the east than in the west. The THBX stands out as an important and unique high-density cluster of TCV in the north. This suggests that the settlement characteristics of this region are representative of northern Chinese settlements.

Consequently, as shown in Fig. 2c, the study area encompasses parts of Beijing, Hebei Province, Henan Province, and Shanxi Province, spanning four provinces and municipalities. The boundaries of the specific study area are primarily determined by the administrative boundaries, covering a total of 120 county-level administrative units. Notably, 11 of these counties have been selected for the national list of “Demonstration Counties for Concentrated and Continuous Protection of Traditional Villages,” which ensures the effectiveness of our subsequent HRS information extraction efforts [43,44,45].

Fig. 2 a Profile of THBX (including the position of Eight Ancient Paths). b Provinces involved in THBX. c Distribution of TCV nationwide

Collection of HRS information and geographic location

The initial step of this study involves proposing a method for selecting HRS within the THBX, focusing on compiling an HRS dataset along with corresponding geospatial information. Figure 3 illustrates the workflow for capturing the spatial locations of HRS. The criteria for selecting HRS are defined as rural settlements established prior to the founding of the Republic of China (1912 A.D.) that have persisted to the present day. Research in the field of historical toponymy has indicated that place names serve as linguistic evidence of a region’s historical civilization, providing a link between the present and the past. The inheritance displayed by place names supports archaeological and heritage studies [46]. In China, a Chinese character has a more direct connection to its represented meaning compared to an English word and has maintained a stable correspondence between symbols through historical evolution (from traditional to simplified Chinese). Therefore, by matching recorded HRS names with the names of contemporary rural settlements, it is possible to identify existing HRS [47]. This method leverages the linguistic and historical continuity inherent in place names to bridge historical and contemporary data, enhancing the accuracy and depth of rural heritage studies.

Three types of historical information data are accessible for research: historical documents, historical maps, and modern documents, all formally recorded and categorized by administrative divisions. Martínez de la Fuente has demonstrated that historical documents contain crucial information on HRS and possess significant potential for studies in spatial distribution [28]. After collection and organization, the historical data for counties encompassed by the THBX and their associated historical administrative regions can be categorized into image data and text data. For image data, Optical Character Recognition (OCR) systems analyze the visual patterns of text characters within document images, converting them into machine-readable text. The CnOCR package, a Python-based tool, performs well in recognizing and extracting Chinese textual information and is therefore employed to extract text from image data [48]. To simplify subsequent processing, the extracted traditional Chinese text is converted into simplified Chinese. Following this, the SpaCy package, a tool within the Natural Language Processing (NLP) domain, is utilised to extract place names associated with HRS from the entire text corpus, and these are cataloged by county [49]. The text produced through automated processes undergoes a further stage of manual proofreading to ensure its accuracy and reliability. Combining computational tools with human verification in this way supports the integrity and usability of the data for further analysis.

For the information and geographical coordinates of modern rural settlements, the publicly accessible POIKit tool was utilized to extract Points of Interest (POI) data from Amap, including place names [50, 51]. Subsequently, using the Pandas and Numpy packages in Python [52, 53], the place names from the HRS dataset were matched with those of modern villages to determine the precision of the matches. Since the information for HRS is aggregated by county, the data for modern rural settlements were also filtered by county during the matching process. Following this, the matching results were graded, and entries classified as Grade A or Grade B were saved as valid data. Finally, the HRS dataset was processed using ArcGIS Pro (Version 3.0.1) to determine the spatial locations of HRS, thus enabling subsequent spatial analysis. This approach not only ensures the accuracy of the data integration but also provides a robust foundation for analyzing the geographical distribution and continuity of historical and contemporary rural settlements.
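The county-by-county matching step described above can be sketched as follows. This is a minimal plain-Python illustration rather than the study's Pandas/NumPy workflow, and the grading rule is an assumption: the text does not define Grade A or Grade B, so here "A" is taken to mean an exact name match within the same county and "B" a match after dropping a generic village suffix. The county labels, names, and coordinates are invented for illustration.

```python
# Hypothetical grading rule (NOT from the source): "A" = exact name match inside
# the same county, "B" = match after dropping a generic village suffix.
GENERIC_SUFFIXES = ("村",)  # assumed list; extend as needed


def strip_suffix(name):
    """Remove one trailing generic suffix, if the name has one."""
    for suffix in GENERIC_SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
    return name


def match_hrs(historical, modern):
    """Match historical names to modern POIs, county by county.

    historical: list of (county, name) pairs.
    modern: list of (county, name, lon, lat) tuples.
    Returns one dict per Grade A or Grade B match; unmatched names are dropped.
    """
    exact, loose = {}, {}
    for county, name, lon, lat in modern:
        exact.setdefault((county, name), (lon, lat))
        loose.setdefault((county, strip_suffix(name)), (lon, lat))

    matches = []
    for county, name in historical:
        key_exact = (county, name)
        key_loose = (county, strip_suffix(name))
        if key_exact in exact:
            grade, xy = "A", exact[key_exact]
        elif key_loose in loose:
            grade, xy = "B", loose[key_loose]
        else:
            continue
        matches.append({"county": county, "name": name, "grade": grade,
                        "lon": xy[0], "lat": xy[1]})
    return matches


# Invented toy records
historical = [("County X", "石门村"), ("County X", "王家庄"), ("County Y", "王家庄")]
modern = [("County X", "石门村", 113.50, 36.20),
          ("County X", "王家庄村", 113.60, 36.30)]

results = match_hrs(historical, modern)
```

Restricting the lookup key to the county keeps identical village names in different counties from being matched to each other, which is the reason the text gives for filtering both datasets by county.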

Ultimately, we identified 15,208 existing HRS within the THBX, each with detailed geographical information, including latitude and longitude (the table of counties included in the THBX, with the number of HRS in each, is provided in Appendix 1). The number of HRS identified significantly exceeds the number of TCV listed in the national catalog for the THBX, which totals 778. Subsequently, using ArcGIS Pro, we created a point dataset for HRS and plotted their distribution based on the standard map approved by the Ministry of Natural Resources of China, with the approval number GS(2019)1822, as shown in Fig. 4.

Fig. 3 Technical route for obtaining HRS spatial positioning and the details of the key steps

Fig. 4 Spatial dataset of HRS in THBX (compared with TCV)

Analytical methods

Spatial analysis

Traditionally, methods for describing the spatial distribution of rural settlements employ administrative regions as controlling variables, calculating the density of rural settlements within each administrative area [54]. However, this approach is unsuitable for characterizing the concentration of HRS (CHRS), as HRS emerged during a period when the central authority had limited control and reach. The appearance of HRS often depends on favorable geographic conditions, and their development demonstrates marked adaptability and self-organizing characteristics. Therefore, this study opts for kernel density estimation to represent the CHRS [54]. Moreover, to further determine the spatial distribution patterns of HRS, the Average Nearest Neighbor (ANN) method is utilized to assess whether HRS are clustered, randomly distributed, or dispersed, analyzing from both a global and topological perspective [55].

Quantitative analysis of the underlying determinants of spatial distribution patterns will focus on the mathematical relationships between various variables. This study requires the extraction of results from kernel density analyses conducted at HRS locations to develop variables that describe spatial distribution. Following the extraction, testing for spatial dependency of the data is essential. Spatial autocorrelation analysis aids in understanding the latent structures within spatial datasets, thereby detecting spatial dependencies. The global Moran’s I index assesses the extent to which similar attribute values tend to cluster together or display spatial dispersion, providing a comprehensive measure of spatial autocorrelation across the entire study area. On the other hand, the local Moran’s I employs a more nuanced approach to identify clusters of high or low attribute values within specific areas, revealing spatial hot spots, cold spots, and outliers. Concurrently, Local Indicators of Spatial Association (LISA) maps facilitate the visual interpretation of these characteristics [56].
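To make the global Moran’s I concrete, the following minimal sketch computes the standard statistic, I = (n / S0) · Σ w_ij z_i z_j / Σ z_i², where z are deviations from the mean and S0 is the sum of all weights. The chain-shaped neighbor structure and the toy values are assumptions chosen for illustration; they do not reproduce the spatial weights used in ArcGIS Pro for this study.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a dict {(i, j): w_ij} of spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights.values())  # sum of all spatial weights
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / s0) * cross / sum(d * d for d in dev)


def chain_weights(n):
    """Binary contiguity weights for n sites arranged along a line (toy layout)."""
    w = {}
    for i in range(n - 1):
        w[(i, i + 1)] = 1.0
        w[(i + 1, i)] = 1.0
    return w


w = chain_weights(6)
clustered = morans_i([1, 1, 1, 5, 5, 5], w)    # similar values sit next to each other
alternating = morans_i([1, 5, 1, 5, 1, 5], w)  # every neighbor differs
```

In this toy layout the clustered series gives a positive index (about 0.6) and the alternating series a negative one (about -1), which mirrors the clustering versus dispersion reading described above.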

Statistical analysis

HRS exhibit heterogeneity in their spatial distribution determinants and are influenced by geographic location changes [57]. This study employs correlation and regression analyses to discuss the impact of various variables on the spatial distribution patterns of HRS, while also accounting for changes in spatial locations. The Pearson correlation coefficient quantifies the strength and direction of the linear relationship between two continuous variables, thus it is utilized to interpret the numerical relationships among variables and the spatial patterns of HRS distribution [54]. In Geodetector, the factor detector and interaction detector use spatial statistical methods to test the influence weight of variables on the spatial distribution patterns of HRS and to detect the impact of combinations of two variables on this proportion [58].
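As a brief illustration of what the Pearson coefficient does and does not capture, the sketch below implements the textbook formula on invented toy data. The third example is included because a strong but non-linear relationship can still yield a coefficient near zero, which is one reason low coefficients alone do not rule out an influence.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented toy data, not values from the study
r_negative = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])  # perfectly linear, decreasing
r_positive = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])   # noisy, increasing
r_u_shape = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x^2: dependent but not linear
```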

Regression analysis is better suited for exploring the relationships among multiple variables. The early Ordinary Least Squares (OLS) is a global regression model that assumes constant relationships between variables across the entire study area, which is ineffective at capturing spatial variations [59]. GWR addresses this limitation by allowing relationships to vary locally, but it still uses a single bandwidth to smooth spatial data, which may not fully capture multi-scale processes [60]. Therefore, this study adopts MGWR, allowing the analysis of relationships across multiple spatial scales to more accurately understand the spatial heterogeneity of different variables’ impacts [61]. The software version used for MGWR is 2.2. Additionally, to confirm the applicability of MGWR, the results from OLS and GWR analyses are used as supplementary comparisons.
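The difference between a global and a local regression can be illustrated with a deliberately small sketch. It fits a Gaussian-weighted simple regression around a target location, which is the core idea behind GWR with a single bandwidth; it is not the MGWR 2.2 software used in this study, and it does not implement the per-variable bandwidth search that distinguishes MGWR. The one-dimensional locations and the data, in which the effect of the predictor flips sign halfway across the study area, are invented for illustration.

```python
import math


def local_slope(target, sites, x, y, bandwidth):
    """Slope of a Gaussian-weighted simple regression centred on `target`."""
    w = [math.exp(-0.5 * ((s - target) / bandwidth) ** 2) for s in sites]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


sites = list(range(20))                                   # toy 1-D locations
x = [(i * 7) % 5 for i in sites]                          # toy predictor
y = [xi if s < 10 else -xi for s, xi in zip(sites, x)]    # effect flips sign at s = 10

# A very large bandwidth gives every site the same weight, i.e. an OLS-like global fit.
global_slope = local_slope(0, sites, x, y, bandwidth=1e9)
west_slope = local_slope(2, sites, x, y, bandwidth=2.0)
east_slope = local_slope(17, sites, x, y, bandwidth=2.0)
```

Here the global slope is close to zero because the two opposite local effects cancel, while the local fits recover slopes near +1 and -1. MGWR extends this idea by letting each predictor have its own bandwidth, so that one variable can act almost globally while another varies over short distances.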

Selections of variables and data sources

Variable collection

In the context of correlation and regression analyses, the outcomes of kernel density analysis for HRS are employed as the dependent variable, denoted as the concentration of HRS (CHRS). This study further investigates the independent variables that exert an influence on this measure. The spatial distribution of rural settlements is affected by both physical geography and socio-economic factors [62, 63]. Table 1 displays the results of the collected independent variables, including detailed information.
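The construction of the dependent variable, a kernel density value sampled at each settlement location, can be sketched as below. This is an assumption-laden illustration: it uses a Gaussian kernel and a user-chosen bandwidth, whereas the study computed densities with the Kernel Density tool in ArcGIS Pro, whose kernel and default search radius may differ. Coordinates are assumed to be projected (for example in kilometres), and the points are invented.

```python
import math


def kernel_density_at_points(points, bandwidth):
    """Gaussian kernel density (points per unit area) evaluated at every input point."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    densities = []
    for xi, yi in points:
        total = 0.0
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += math.exp(-0.5 * d2 / bandwidth ** 2)
        densities.append(norm * total)
    return densities


# Invented coordinates in km: a tight cluster plus one isolated settlement
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (50.0, 50.0)]
chrs = kernel_density_at_points(points, bandwidth=5.0)
```

The double loop is quadratic in the number of points, so for a dataset on the order of 15,000 settlements a spatial index or a raster-based tool would be the more practical choice.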

Firstly, terrain has a controlling effect on the selection of rural settlement sites [64]. From a macro perspective, elevation (ELV), slope (SLP), and topographical relief (TR) determine the fundamental conditions for regional agricultural production and transportation [64, 65].

Secondly, agricultural production activities influenced by regional climate and production resources subsequently affect the spatial distribution of rural settlements [66]. Temperature and precipitation are basic indicators of regional climate. Research from Guangjin has shown that annual average temperature (AAT) and total annual precipitation (TP) significantly impact the distribution of rural settlements [67], with heterogeneity across different regions; thus, they are included as independent variables. Relevant data often come from suitable open-access spatial datasets. As for production resources, this study focuses on the influence of water resources and arable land resources [68] on the distribution of rural settlements. Access to water sources is crucial for drinking, irrigation, and other daily uses. Fertile soil enhances agricultural productivity, which is the primary economic activity of rural settlements, ensuring food production and agricultural trade.

Thirdly, the current state of regional development also determines the macro distribution and individual development of rural settlements. Population density (PD), gross domestic product (GDP), and road density (RD) respectively represent the community, economy, and transportation development of a region. Higher population density is usually associated with more developed infrastructure and services. Rural areas with higher GDP usually have better economic opportunities and development potential. As for road density, rural commerce and the flow of people often rely on it [5, 69, 70]. It should be noted that regional development can contribute to the development and retention of rural settlements, which is beneficial for the preservation of HRS. On the other hand, urbanization brought by development can also lead to the disappearance of rural settlements [71]. Furthermore, considering the unique influence of the eight ancient paths in the THBX area, this study also employs the distance to the ancient paths (DAP) to analyze their impact on HRS.

Finally, this study introduced the concentration of large cities (CLC) and the concentration of national cultural heritage units (CCHU) to further explore the potential risks of urbanization and over-development of HRS. Among them, large cities are defined as urban area point data of city-level administrative units. The spatial distribution dataset of national cultural heritage units can be publicly accessed.

Table 1 Description and pre-processing of variables

Data sources

The quantitative data sources are shown in Table 2. Firstly, three composite data sources are introduced. Among them, the ASTER Global Digital Elevation Model is a global digital elevation data product jointly released by the National Aeronautics and Space Administration (NASA) and the Ministry of Economy, Trade, and Industry of Japan (METI), completed based on the observations of NASA’s TERRA satellite. This data source provides quantitative data for ELV, SLP, and TR of THBX. The Climatic Research Unit Time-Series (CRU TS) is one of the most widely used climate datasets, supported by the NERC Centres for Atmospheric Science, UK. CRU TS provides monthly data covering the land surface from 1901 to 2022 at a resolution of \(0.5^{\circ }\) (roughly 50 km at the latitude of the study area). The quantitative data for AAT and TP of THBX in 2022 used in this study are provided by this dataset. OpenStreetMap (OSM) is a free and editable world map containing data such as roads, waterways, and buildings, created and maintained by nearly 5 million registered users and over 1 million map contributors from around the world using free tools and software. The quantification of variables RD and DSR relies on the open data provided by OSM. Moreover, this study also employed various other data sources that each support a single variable. Specific citation information can also be found in Table 2.

Table 2 Data source description

Data pre-processing

Multicollinearity reduces the overall interpretability of a model; therefore, it is necessary to examine multicollinearity among variables before proceeding with further statistical analyses. The Variance Inflation Factor (VIF) provides a quantified metric that indicates how much the variance of a predictor variable’s coefficient estimate is inflated due to its linear relationship with other predictor variables. This metric assists in identifying redundant predictors, thereby enhancing the interpretability and stability of the regression model [79]. A VIF value greater than 10 indicates significant multicollinearity, suggesting that the coefficient estimates may be unreliable. Variables with VIF values between 5 and 10 also warrant caution as they may still pose a mild risk of multicollinearity. Because multicollinearity arises from relationships among several variables, collinear variables rarely appear in isolation. Thus, potential multicollinearity can be mitigated by selectively removing or transforming these variables and retesting.

The VIF diagnostics within this study revealed that the VIF values for SLP and TR were 14.046 and 14.063, respectively, indicating significant multicollinearity. The VIF values for ELV and AAT were also notable at 8.869 and 9.206, respectively. Indeed, the calculation of TR inherently reflects some aspects of the SLP characteristic of HRS, and variations in ELV also influence AAT. Consequently, TR and AAT were removed from the analysis, and VIF diagnostics were performed again. The results indicated that no multicollinearity issues remained among the remaining variables, thereby clearing the path for further statistical analysis.
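The reported VIF values can be read through the standard identity VIF = 1 / (1 - R²), where R² is obtained by regressing one predictor on all the others. The short sketch below inverts that identity for the four values quoted above and applies the thresholds stated in the text; the label "acceptable" for values below 5 is a naming assumption, not a term from the source.

```python
def implied_r2(vif):
    """R^2 of one predictor regressed on the others, recovered from VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / vif


def vif_flag(vif):
    """Thresholds as stated in the text: above 10 is significant, 5 to 10 warrants caution."""
    if vif > 10:
        return "significant"
    if vif >= 5:
        return "caution"
    return "acceptable"  # label assumed for illustration


reported = {"SLP": 14.046, "TR": 14.063, "ELV": 8.869, "AAT": 9.206}
summary = {name: (round(implied_r2(v), 3), vif_flag(v)) for name, v in reported.items()}
```

Read this way, a VIF of about 14 means that roughly 93% of the variance of SLP (or TR) is already reproduced by the remaining predictors, and a VIF of about 9 corresponds to roughly 89%, which is consistent with the decision to drop one variable from each closely related pair.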

Results

Analysis of HRS spatial distribution patterns

Fig. 5 Kernel density analysis results of HRS in THBX

Fig. 6 a Average Nearest Neighbor analysis for HRS in THBX. b Global Moran’s I analysis for HRS in THBX. c LISA clustering results of HRS in THBX based on Local Moran’s I

Using ArcGIS Pro for kernel density estimation, the CHRS in the THBX was mapped as shown in Fig. 5, allowing further exploration of spatial distribution patterns. The characteristics of HRS distribution at the regional level in THBX are as follows:

HRS are more densely distributed in the southern part of the THBX than in the north. Within the southern range of Fukou Xing to Taihang Xing, two prominent high-density areas emerge, with the highest value reaching 0.44 per \(km^{2}\). There is a significant correlation between the spatial position of the Xing (Ancient Paths) and the high-density CHRS. The density of HRS is higher at the two ends of the ancient paths than along their central sections, which may be related to the central zone of the Taihang Mountains, where the terrain is too steep to establish villages or protect them from natural disasters. Further analysis reveals that HRS at the western or northern endpoints of each Xing show a point-like clustering distribution: although high values are present, there is a substantial difference between high and low values. On the other hand, the highly clustered areas of HRS at the eastern or southern endpoints exhibit a distinct striped or areal distribution, with a generally higher number and more uniform spread.

The results of the ANN analysis, as displayed in Fig. 6a, show a nearest neighbor ratio of 0.649771 and a z-score of \(-\)82.626356, indicating that there is less than a 1% likelihood that the observed pattern is the result of random chance and that the distribution is clustered. In THBX, the average observed distance among HRS is 1300.3771 m, compared to an expected average distance of 2001.2842 m. This clustering pattern of HRS suggests the importance and potential value of investigating underlying determinants in subsequent analyses. However, the study of determinants focuses on the relationships between different variables. Therefore, it requires a statistically significant and quantified set of variables (in this study, the kernel density estimates of HRS in THBX are used as the dependent variable, CHRS). This approach allows for a deeper understanding of the factors influencing HRS distribution and contributes to more targeted and effective spatial planning and development strategies within the region.
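The reported ANN figures can be cross-checked with the standard nearest neighbor formulas: the ratio is the observed mean distance divided by the expected mean distance, the expected mean distance under complete spatial randomness is 0.5 / sqrt(n / A), and the standard error is 0.26136 / sqrt(n² / A). The sketch below is a consistency check under two assumptions: that all 15,208 HRS were used as input, and that these standard formulas match the tool's implementation. The study-area size A is not stated in the text, so it is back-calculated from the reported expected distance.

```python
import math

n = 15208                  # HRS count reported in the text (assumed to be the ANN input)
observed_mean = 1300.3771  # metres, reported
expected_mean = 2001.2842  # metres, reported

ratio = observed_mean / expected_mean

# Under complete spatial randomness, expected_mean = 0.5 / sqrt(n / area),
# so the area implied by the reported expected distance is:
area = n * (2.0 * expected_mean) ** 2  # square metres, back-calculated (assumption)

standard_error = 0.26136 / math.sqrt(n ** 2 / area)
z_score = (observed_mean - expected_mean) / standard_error
```

Under these assumptions the ratio comes out at about 0.6498 and the z-score at about -82.63, in line with the values reported in Fig. 6a; a ratio below 1 with a strongly negative z-score is the signature of a clustered pattern.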

In addition to the ANN analysis, spatial autocorrelation analysis was conducted to examine the kernel density values at locations of HRS [80]. From the global spatial analysis, the global Moran’s I for the kernel density attribute of HRS in THBX is recorded at 0.967717 with a z-score of 66.006809, indicating a significant clustering trend in the overall distribution of this attribute value (Fig. 6b). Consequently, the kernel density attribute values of HRS in THBX fulfill the prerequisites for further spatial statistical investigations. The Local Indicators of Spatial Association (LISA) map, generated from the analysis of local spatial autocorrelation results (Fig. 6c), allows for more precise determination of the distribution types of HRS within THBX. Viewed across the entire region, most local Moran’s I calculations for HRS show statistical significance. The predominant distribution patterns for HRS are high/high clustering and low/low clustering. Low/high and high/low clustering patterns are mainly found in transitional areas between the two aforementioned clusters. The findings from the spatial autocorrelation analysis indicate that the kernel density values of HRS exhibit spatial dependency, which validates their use as numerical values for CHRS in subsequent statistical analyses. This demonstrates the significance of considering spatial relationships to better understand and model the distribution patterns of rural settlements effectively.

Applicability analysis of selected variables

Univariate applicability assessment

This study initially explores the suitability of selected variables for explaining the CHRS in the THBX from three perspectives (Fig. 7). Firstly, Pearson correlation analysis is employed to examine the similarity in numerical trends among variables. As depicted in Fig. 7a, under the influence of single factors, the driving effects of different variables vary. Significant correlations are widely present among the variables. Notably, all natural geographic variables show significant correlations with CHRS. However, the correlations of RD, CCHU, and CLC with CHRS are not significant. Across the entire Pearson analysis, the largest and smallest coefficients are found among the independent variables themselves rather than between them and CHRS. The strongest positive correlation with CHRS is that of GDP (0.29) and the strongest negative correlation is that of DAP (\(-\)0.28); all other coefficients have absolute values less than 0.25. Among them, ELV, SLP, DSR, and CLC exhibit negative correlations with CHRS, whereas TP, PCY, GDP, and RD show positive correlations. The correlation coefficients between the selected variables and CHRS in this study are too low to fully explain the relationships therein [81]. This suggests the need for further investigation, potentially incorporating additional variables or exploring more complex statistical models to better capture the nuances of these relationships.

Secondly, the factor detector of Geodetector generates q-values, which indicate the proportion of the sample variance that an independent variable can explain (q × 100%). The q-value can therefore be read as a measure of a variable’s explanatory power or weight. The factor detection results, shown in Fig. 7c, are significant for all variables. The variables with the highest influence weights are DAP (0.34), CCHU (0.32), and CLC (0.28). Notably, the influence of CCHU is markedly stronger than the correlation analysis suggested. However, no single variable explains more than 50% of the CHRS.

This result highlights the complexity of factors influencing the CHRS in the THBX. The significant q-values suggest that while these variables have a measurable impact, the CHRS is likely influenced by a combination of multiple factors. The lack of any single variable explaining a majority of the variance indicates the potential need for a multifaceted approach in further research, integrating various socio-economic and environmental factors to gain a more comprehensive understanding of CHRS.
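The q-statistic discussed above can be sketched in a few lines. This is a minimal illustration of the standard factor detector definition, q = 1 - (sum of within-stratum variance) / (total variance), assuming the explanatory variable has already been discretized into strata; the function names and toy data are hypothetical.

```python
from collections import defaultdict


def population_variance(values):
    """Variance with divisor N (not N - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def factor_detector_q(y, strata):
    """q = 1 - sum_h(N_h * var_h) / (N * var), for y grouped by stratum label."""
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    total = len(y) * population_variance(y)
    within = sum(len(g) * population_variance(g) for g in groups.values())
    return 1.0 - within / total


# Hypothetical toy data: the dependent variable and two candidate stratifications.
density = [1.0, 1.0, 5.0, 5.0]
aligned = ["low", "low", "high", "high"]    # strata separate the values perfectly
unrelated = ["a", "b", "a", "b"]            # strata carry no information

print(factor_detector_q(density, aligned))    # 1.0: full explanatory power
print(factor_detector_q(density, unrelated))  # 0.0: no explanatory power
```

Under this definition, a q-value of 0.34 for DAP means that stratifying by DAP removes 34% of the variance in the dependent variable.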

Thirdly, because the univariate analyses indicate that single variables alone cannot fully explain the spatial distribution patterns of HRS, the interaction detector in Geodetector is used to detect interactions between pairs of factors, revealing synergistic or antagonistic effects that are not apparent when factors are considered individually (Fig. 7b). Every pairwise combination falls into one of two types: bivariate enhancement (35 pairs, 63.64%) or nonlinear enhancement (20 pairs, 36.36%). This suggests that the combined explanatory power of any two variables on CHRS is better than that of either variable alone. Among the combinations exhibiting nonlinear enhancement, RD with DAP has the best explanatory power for the CHRS in THBX, with a q-value of 0.527. Three other combinations, CCHU with GDP (q=0.387), DAP with SLP (q=0.353), and CCHU with SLP (q=0.332), also explain over 30% of the CHRS. Among the combinations exhibiting bivariate enhancement, more than 50% of the CHRS can be explained by CCHU with DAP (q=0.570, the highest value in the interaction detection results) or by CLC with DAP (q=0.539). A further 19 bivariate-enhancement pairs (34.55% of all pairs) explain 30% to 50% of the CHRS. Although some combinations still have weak explanatory power (q<0.3), the interaction detector results indicate that a multivariate integrated analysis can explain the CHRS in THBX more effectively.
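The distinction between bivariate and nonlinear enhancement follows from comparing the joint q-value with the two single-factor q-values. The sketch below encodes the interaction types commonly used with Geodetector; the function name and tolerance are illustrative, and the single-factor q-value for RD is a hypothetical placeholder because it is not reported in this section.

```python
def classify_interaction(q1, q2, q12, tol=1e-9):
    """Classify a Geodetector interaction from single and joint q-values."""
    lo, hi = min(q1, q2), max(q1, q2)
    if q12 > q1 + q2 + tol:
        return "nonlinear enhancement"           # joint q exceeds the sum
    if abs(q12 - (q1 + q2)) <= tol:
        return "independent"                     # joint q equals the sum
    if q12 > hi:
        return "bivariate enhancement"           # above both, below the sum
    if q12 > lo:
        return "univariate nonlinear weakening"  # between the two single q
    return "nonlinear weakening"                 # below both single q


# Values reported in the text: q(DAP)=0.34, q(CCHU)=0.32, q(CLC)=0.28.
print(classify_interaction(0.34, 0.32, 0.570))  # CCHU with DAP
print(classify_interaction(0.34, 0.28, 0.539))  # CLC with DAP

# Hypothetical single-factor q for RD (an assumption, not from the source).
Q_RD_ASSUMED = 0.10
print(classify_interaction(Q_RD_ASSUMED, 0.34, 0.527))  # RD with DAP
```

With the reported values, the first two pairs classify as bivariate enhancement, in line with the text.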

Fig. 7 a Pearson correlation analysis. b Results of interaction detector by Geodetector. c Results of factor detector by Geodetector

Comparative analysis of multivariate regression results

In this section, we conducted spatial heterogeneity analysis using three models (OLS, GWR, and MGWR) and compared their goodness of fit on four indicators: the corrected Akaike information criterion (AICc), the coefficient of determination (\(R^{2}\)), the adjusted coefficient of determination (\(Adj. R^{2}\)), and the residual sum of squares (RSS). To facilitate interpretation and comparison of the local parameter outputs of the models, we standardized the data for each variable [82]. This method also effectively guards against potential multicollinearity risks. The results are shown in Table 3. Lower values of RSS and AICc indicate better model performance, whereas for \(R^{2}\) and \(Adj. R^{2}\) higher values are better. Table 3 also reports the bandwidth (Bw) selected for each independent variable, which indicates the spatial scale over which that variable operates. Among the three multivariate regression models, OLS has the highest AICc value (245933.163) and the lowest \(R^{2}\) (0.238) and \(Adj. R^{2}\) (0.237), with an RSS value (5542.798) slightly lower than that of GWR. This suggests that the OLS model provides the least satisfactory explanation for the CHRS in THBX because the relationship is spatially non-stationary. GWR reduces the AICc value markedly (32248.356), and its \(R^{2}\) (0.342) and \(Adj. R^{2}\) (0.324) exceed those of the OLS model. However, the RSS value of GWR (5616.460) is the highest of the three models, likely because the spatial scales of the variables differ considerably, which indicates the need for multiscale analysis. MGWR performs best, with the lowest AICc value (29734.703) and RSS value (4865.900) and the highest \(R^{2}\) (0.634) and \(Adj. R^{2}\) (0.622). These findings suggest that MGWR outperforms both OLS and GWR in explaining the CHRS in THBX, providing more effective insight into the influencing patterns.
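As a reminder of how three of the four indicators relate to one another, here is a minimal sketch of RSS, \(R^{2}\), and \(Adj. R^{2}\) for a generic regression. The function names and toy data are hypothetical. The final check assumes 11 explanatory variables (the number of variables named in this section) and 15,208 samples; for GWR and MGWR the effective number of parameters would replace k, and that quantity is not reported here.

```python
def rss(y, y_hat):
    """Residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - RSS / TSS."""
    mean_y = sum(y) / len(y)
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - rss(y, y_hat) / tss


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical toy data (not the study data).
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
print(rss(observed, fitted), r_squared(observed, fitted))

# Consistency check on the OLS row, assuming k = 11 and n = 15208.
print(round(adjusted_r_squared(0.238, 15208, 11), 3))
```

Under these assumptions the adjusted value rounds to 0.237, matching the OLS figure quoted above; with so many samples the adjustment is negligible for a global model.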

In terms of bandwidth selection, GWR adopts a single bandwidth (406, 2.67% of the sample), whereas the bandwidths selected by MGWR differ greatly among the independent variables, ranging from 90 (0.59%) to 15207 (99.99%). This wide range indicates that a single bandwidth cannot accurately reveal the scale effects of the various factors: different influencing factors act on the CHRS in THBX over different ranges. The bandwidths of GDP (Bw=90), RD (Bw=202), DAP (Bw=307), ELV (Bw=484), PCY (Bw=853), and DSR (Bw=1243) account for 0.59% to 8.17% of the total sample size, indicating that their effects operate locally, at a micro-scale. The bandwidths of SLP (Bw=4344) and TP (Bw=6129) account for 28.56% to 40.30% of the total sample size, indicating that their effects exhibit limited spatial non-stationarity. Lastly, the bandwidths of CCHU (Bw=13224), PD (Bw=15207), and CLC (Bw=15207) account for 86.95% to 99.99% of the total sample size, indicating that their effects operate at the global scale, with minimal spatial heterogeneity. Additionally, using the same kernel function and bandwidth for all variables may exacerbate collinearity among them, sometimes called concurvity. Permitting the bandwidth to vary lets each variable undergo a different transformation, which helps avoid inducing multicollinearity or concurvity [82]. As a further safeguard, we checked local multicollinearity during model computation using the local condition numbers of GWR and MGWR and encountered no issues. We therefore interpret the MGWR results in the remainder of this section.
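The percentages quoted for each bandwidth are simply the bandwidth divided by the total sample size of 15,208. The sketch below reproduces that arithmetic and attaches a scale label. The cut-offs of 10% and 50% are hypothetical thresholds chosen only to separate the three groups described in the text; they are not taken from the study.

```python
N_SAMPLES = 15208  # total number of HRS reported in the study

# Bandwidths as reported in the text; PD is the variable that Table 4's
# discussion pairs with CLC at 99.99% of the sample.
MGWR_BANDWIDTHS = {
    "GDP": 90, "RD": 202, "DAP": 307, "ELV": 484, "PCY": 853, "DSR": 1243,
    "SLP": 4344, "TP": 6129, "CCHU": 13224, "PD": 15207, "CLC": 15207,
}


def bandwidth_share(bandwidth, n=N_SAMPLES):
    """Bandwidth as a percentage of the total sample size."""
    return 100.0 * bandwidth / n


def scale_label(share, local_max=10.0, regional_max=50.0):
    """Label a bandwidth share; both thresholds are illustrative assumptions."""
    if share < local_max:
        return "local"
    if share < regional_max:
        return "regional"
    return "global"


for name, bw in MGWR_BANDWIDTHS.items():
    share = bandwidth_share(bw)
    print(f"{name:5s} Bw={bw:6d} {share:6.2f}% {scale_label(share)}")
```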

Table 3 Model comparison of fitting results between OLS, GWR and MGWR

Overall interpretation of MGWR results

Table 4 summarizes the local parameter estimates of MGWR, including “Bandwidth (%)”, “Sig. HRS (%)” (the proportion of HRS whose coefficients are significant at P \(\le\) 0.05), “+” (the proportion of Sig. HRS with positive coefficients), and “-” (the proportion of Sig. HRS with negative coefficients). The coefficients of PD and CLC, whose bandwidths account for 99.99% of the total sample size, are insignificant everywhere, which is itself a form of global consistency. The remaining independent variables are significant for over 60% of HRS, which is sufficient for statistical interpretation. SLP, TP, and PCY are significant for all HRS and display globally consistent effects: the coefficients of TP and PCY are all positive, while those of SLP are all negative. CCHU, whose bandwidth is close to the global scale, is significant for only part of the HRS (66.79%), but all of its significant coefficients are negative. The significant coefficients of the remaining factors are mixed in sign, positive for some Sig. HRS and negative for others, yet they still show clear tendencies: for each of these factors, one sign accounts for over 75% of the significant coefficients.
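The three summary columns can be derived directly from the local coefficients and their p-values. The following is a minimal sketch under the definitions given above, with a hypothetical function name and toy inputs; it does not apply any multiple-testing correction, which a full MGWR workflow may add.

```python
def summarize_local_coefficients(coefs, p_values, alpha=0.05):
    """Return (Sig. share %, positive share %, negative share %).

    The positive and negative shares are taken among significant
    coefficients only, mirroring the "+" and "-" columns.
    """
    significant = [c for c, p in zip(coefs, p_values) if p <= alpha]
    sig_share = 100.0 * len(significant) / len(coefs)
    if not significant:
        return sig_share, 0.0, 0.0
    positive = 100.0 * sum(c > 0 for c in significant) / len(significant)
    negative = 100.0 * sum(c < 0 for c in significant) / len(significant)
    return sig_share, positive, negative


# Hypothetical local estimates for four settlements (not the study data).
toy_coefs = [0.20, -0.10, 0.30, 0.05]
toy_p = [0.01, 0.03, 0.20, 0.04]

sig, pos, neg = summarize_local_coefficients(toy_coefs, toy_p)
print(f"Sig. {sig:.2f}%  + {pos:.2f}%  - {neg:.2f}%")
```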

Table 4 Parameter estimation using MGWR

Figure 8 depicts the maximum, minimum, and average values of the local parameters for the significant HRS samples of each variable. For every variable, the minimum and maximum values fluctuate more than the mean. The local parameters of CCHU, SLP, and TP hardly change, and their values are close to 0. The local parameters of ELV, PCY, and DSR vary over a larger range, but their mean values remain around 0. Notably, the local parameters of RD, GDP, and DAP vary the most and include the highest and lowest values overall. The overall interpretation of the MGWR results indicates that the effects of different variables on CHRS are nonlinear, which implies spatial heterogeneity in the distribution of local parameters. Although the effects of the different variables can be characterized preliminarily in this way, the spatial distribution of their local parameters still requires further examination.

Fig. 8 MGWR local parameters of each variable

Analysis of spatial heterogeneity in the effects of variables

In the MGWR output, the local parameter of the intercept represents the influence of geographical location on the dependent variable when the other independent variables are held fixed. To some extent, the MGWR intercept captures factors uniquely determined by geographical location that influence the CHRS in THBX but are not included in the model. Notably, in this study the intercept has the smallest bandwidth (Bw=65, 0.42%), indicating that it operates at a micro-scale. Moreover, it is significant everywhere (Sig. HRS=15208, 100%), highlighting the high sensitivity of THBX’s HRS to changes in location. Figure 9 displays the distribution of the intercept’s local parameters across the study area, together with the distribution of standardized residuals. Following Sullivan, we focus on locations where the standardized residuals exceed three standard deviations, as these locations may contain outliers [83]. The standardized residuals in THBX nowhere exceed 0.5 standard deviations, suggesting that there are no prominent outliers in the MGWR results. Thus, the model offers a good fit to the data, and the explanatory variables employed effectively account for the spatial variation of the dependent variable.
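The three-standard-deviation screening rule can be sketched as follows. This is a simplified illustration that standardizes raw residuals by their own mean and standard deviation; regression software may instead use leverage-adjusted residuals. The function names and toy data are hypothetical.

```python
import statistics


def standardized_residuals(y, y_hat):
    """Residuals rescaled to zero mean and unit (population) standard deviation."""
    residuals = [a - b for a, b in zip(y, y_hat)]
    mean = statistics.fmean(residuals)
    sd = statistics.pstdev(residuals)
    return [(r - mean) / sd for r in residuals]


def flag_outliers(std_residuals, threshold=3.0):
    """Indices whose standardized residual exceeds the threshold in magnitude."""
    return [i for i, z in enumerate(std_residuals) if abs(z) > threshold]


# Hypothetical toy data: 19 well-fitted samples and one badly fitted sample.
toy_observed = [0.0] * 19 + [10.0]
toy_fitted = [0.0] * 20

z_scores = standardized_residuals(toy_observed, toy_fitted)
print(flag_outliers(z_scores))  # only the last sample is flagged
```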

Fig. 9 Spatial distribution of standardized (STD) residuals and local parameters (Beta) of intercept in MGWR results

Figure 10 presents the distribution of significant HRS samples and the spatial patterns of local parameters in the THBX across various explanatory factors. It is evident that the use of variable bandwidth offers a more comprehensive and precise view in spatial data exploration. Additionally, the standardization of data enables more meaningful comparisons of specific values of local parameters across different variable factors, aiding in a more intuitive summarisation of the determinants and magnitudes of these factors.

Figure 10a displays the significant samples and spatial distribution of local parameters for ELV. The local parameters for ELV range over [\(-\)0.231, 0.353], with a predominantly negative effect on the CHRS in THBX and a minor positive influence. Spatially, the positive impacts are found mainly in the northernmost part of THBX, around Jundu Xing and its vicinity, primarily located in Shanxi Province. Certain HRS in the low-elevation areas on the southeastern side of THBX also show a positive effect of ELV. ELV has negative impacts in two strip-shaped regions: the Taihang Xing - Jing Xing region and the Feihu Xing - Puyin Xing region. Notably, the local parameters of the samples at the outer edges of these strips are slightly higher than those in the interior. The areas of positive influence may be associated with the Ming Great Wall. The THBX area is closely linked to the Zhenbao Town defense zone. Many of the fortifications in Zhenbao Town were built along the Taihang Mountains, while the passes were constructed over significant paths, such as Juyong Pass, Zijing Pass, Daoma Pass, Gu Pass, Dongyang Pass, Tianjing Pass, and Hengling Pass [84]. The Military Capital Ridge in THBX falls within the Juyong Pass defense area of the Ming Great Wall, which served as the pass city closest to the capital, Beijing. As a result, a large number of military settlements were established around this area, typically at higher altitudes on sites that are easy to defend but difficult to attack [85].

The spatial pattern of significant samples and local parameters for SLP is shown in Fig. 10b. SLP has a significant negative influence on all HRS in THBX. The local parameter values range over [\(-\)0.151, \(-\)0.048], with less variability than ELV. This indicates that, in mountainous terrain, the clustering of HRS in THBX is inhibited as SLP increases. The southeastern part of THBX experiences the most pronounced negative impact, while the low-impact zone is located in the eastern part of the central region. With a bandwidth of 4344, SLP acts at a scale closer to medium range, which implies weak spatial heterogeneity and a smooth transition of local parameters across the study area. The southeastern part of THBX shows the strongest negative impact because its terrain is more complex and its slope varies more. Conversely, the eastern central region, which is the alluvial fan plain of the Hutuo River and close to Shijiazhuang city in Hebei province, has a relatively gentle slope, so SLP has a lesser impact on CHRS there.

The results in Fig. 10c indicate that TP is also significant for all HRS in the THBX. The local parameters for TP range over [0.025, 0.067], so its impact on the CHRS is consistently positive, but the parameter values are small and vary only slightly. The impact is most pronounced in the southeastern part of THBX and least pronounced on the western side of the Fukou Xing - Bai Xing region. Notably, the latter area also has the highest density of HRS. With a bandwidth of 6129, TP acts at a medium scale, so its spatial heterogeneity is comparatively weak and its local parameters transition smoothly across regions. The southeastern part of THBX, close to Zhengzhou City in Henan province, is influenced by the Yellow River alluvial fan plain [86]. Henan has historically been one of China’s most populous provinces, hence the relatively high number of settlements in the area. Additionally, the higher ELV and complex terrain in the southeast block much of the summer monsoon, resulting in higher TP than in nearby areas, and TP has its greatest impact on CHRS here. Conversely, the area west of Fukou Xing to Bai Xing, which has the densest concentration of settlements in THBX, lies among mountains with lower TP and is therefore the area least affected by TP.

Figure 10d, e display the spatial patterns of significant samples and local parameters for DAP and DSR, elucidating the relationship between the CHRS in THBX and the main linear elements. For DAP, the majority of Sig. HRS are negatively correlated with DAP. The local parameters for DAP range over [\(-\)2.302, 0.840], the widest and most variable range among all independent variables, which underlines its strong impact. Because DAP measures distance, a negative coefficient means that clustering weakens as distance increases; in other words, being far from an ancient path is unfavorable for the CHRS. Interestingly, the CHRS at the intersections of the three ancient paths in the south is positively impacted by DAP. This suggests that overly accessible geographic conditions might pose potential risks to the HRS in this location. The eight ancient paths played a crucial role as transportation routes in ancient times, serving as vital routes for imperial travel during the Qin Dynasty and as main postal roads during the Ming and Qing dynasties. Many of these ancient paths still form part of the routes of modern highways today [84].

For DSR, the local parameters range over [\(-\)0.121, 0.400], with both the magnitude and the degree of change smaller than for DAP. As with DAP, however, the majority of Sig. HRS are negatively correlated with DSR. This suggests that proximity to surface runoff mostly favors the aggregation of HRS. Spatially, the HRS in the Jing Xing - Taihang Xing region are segmented into three clusters with negative local parameters, with lower values internally and relatively higher values at the periphery. However, in the northwestern part of the Jundu Xing - Puyin Xing area, some HRS show a positive correlation with DSR, indicating that surface runoff has hindered their aggregation and development. Moreover, the bandwidths for DAP and DSR are 307 and 1243, respectively, so DAP exhibits more pronounced spatial heterogeneity than DSR.

As shown in Fig. 10f, PCY has a significant positive effect on all HRS in THBX. The local parameter values of PCY range over [0.112, 0.460], indicating a moderate impact intensity. The HRS in the southeast and northwest parts of THBX are most affected by PCY, while the local parameters around Jing Xing are relatively lower. Although the bandwidth of PCY is 853, which falls into the semi-microscopic scale, it still shows a clear global trend in THBX, with higher values in the north and south and lower values in the central part. The southeastern and northwestern areas, which are the most positively impacted, comprise the Yellow River alluvial fan plain and the Datong Basin, respectively [86]. Both areas feature geological conditions conducive to agriculture, leading to a concentration of agriculturally oriented rural settlements.

The results in Fig. 10g display the significant samples and spatial patterns of the local parameters for GDP. The local parameters of GDP vary between \(-\)0.599 and 1.187, showing a large variation range, with mostly positive values. This indicates that GDP mainly has a positive impact on the CHRS in THBX. A minority of HRS negatively impacted by GDP are predominantly located near the eastern endpoint of the Puyin Xing, where GDP growth might deter CHRS. Given a bandwidth of 90, GDP operates on an extremely micro scale, just slightly larger than the intercept. Hence, the high-value regions of GDP’s local parameters in THBX display a scattered distribution, with pronounced variations in influence across different areas. Additionally, influenced by the river systems originating from the Taihang Mountains, numerous alluvial fans have formed along the eastern side of the mountains, bringing fertile soil to the North China Plain. Historically, these lands have been densely populated, forming a north-to-south urban belt along the Taihang Mountain range with cities like Beijing, Baoding, Shijiazhuang, Xingtai, Handan, Anyang, Hebi, and Xinxiang located on these alluvial fans. Consequently, the GDP within the THBX area is generally higher on the eastern side than on the western side.

Fig. 10 The spatial distributions of local coefficients for influencing factors of HRS distribution in THBX

In Fig. 10h, the local parameters for RD range over [\(-\)0.408, 0.841] and are mostly positive, indicating that RD mainly exerts a positive influence on the aggregation of Sig. HRS. The small share of HRS affected negatively by RD is concentrated in the middle of the Feihu Xing - Puyin Xing region and on the west side of the Jing Xing - Fukou Xing region. With a bandwidth of 202, RD is a driving factor at the microscale. Consequently, its influence varies considerably among different Sig. HRS aggregation areas. Lastly, in Fig. 10i, the local parameters for CCHU range over [\(-\)0.0301, \(-\)0.0172], indicating that the CHRS in THBX is suppressed as CCHU increases. Compared to the determinants mentioned earlier, the negative impact of CCHU is extremely weak. Additionally, CCHU is significant only for the HRS in the southern part of THBX, with the most negative impact occurring at Jing Xing - Fukou Xing. It is worth noting that the analysis related to CCHU focuses on the early development determinants of HRS. It serves to elucidate the CHRS rather than to probe future developmental trends.

Discussion

The primary objective of this study is to catalog the HRS that persist today in the Taihang Baxing, establish a GIS-based geospatial dataset for these rural settlements, and analyze their spatial distribution characteristics. Additionally, the study examines the factors influencing the distribution of these HRS and the underlying determinants. The research involved compiling the HRS recorded in the archival materials of the THBX and then comparing them with the names of modern villages. Based on various levels of confidence, a total of 15,208 HRS that survive to the present were identified. The study first employed kernel density analysis to explore the clustering characteristics of these HRS. Subsequently, using the MGWR, it analysed the spatial heterogeneity of the distribution patterns and influencing factors of these historical settlements.

Insights

One of the key focuses of this research is the compilation and analysis of historical documents, and the continuity characteristics of Chinese place names. This led to the development of a novel process for identifying and organizing historical settlements. The research reveals that the CHRS in the THBX generally exhibits a pattern of greater density in the south and sparser distribution in the north, aligning along a north–south axis following the Taihang Mountains. Two primary clustering cores of HRS have formed near the Fukou Xing, Bai Xing, and Taihang Xing, mainly located at the intersections of these three Xings and the eastern end of the Fukou Xing. A secondary core has developed in the central areas of the Feihu Xing and Jundu Xing. Previous studies, such as those by Hui on TCV in the Taihang Mountains, analyzed the distribution characteristics of traditional villages [26]. However, since TCV represent only a subset of HRS, the clustering characteristics identified in those studies show notable differences from the findings of this research. A more comprehensive dataset typically yields clearer and more accurate statistical results regarding settlement distribution. The study continues to explore the underlying driving factors of HRS distribution through deeper analysis using the MGWR.

The analysis using the MGWR indicates that natural factors have consistently been one of the critical influences on the spatial distribution of villages. The ELV demonstrates an alternating pattern of significant and non-significant influences on the spatial CHRS, predominantly exerting a negative impact with clear spatial heterogeneity. The SLP has a universally significant negative impact on the CHRS. Conversely, the TP exhibits a positive influence on the CHRS, although the changes are very subtle. In the initial stages of village formation, natural geographical factors often play a crucial restrictive role in site selection, with environments conducive to survival and labor typically hosting numerous HRS. This finding aligns with the conclusions of Lai, who focused on TCV as his study subjects [37]. Although traditional villages are a subset of HRS, they exhibit similar characteristics and conclusions in some aspects of the research.

The analysis reveals that the DAP has a significant negative correlation with the CHRS, exhibiting the most pronounced effect. The DSR impacts HRS in a manner similar to DAP, but its influence is weaker. These results underscore a clear dependency of HRS clustering on both Xing and water systems, with a stronger reliance on the former. This suggests that transportation conditions play a more critical role in settlement site selection than water systems. Similar findings are echoed in the research by Zhang Xinrui, who also noted the correlation between settlement distribution and proximity to paths and water systems in his study of HRS around the Feihu Xing [27]. Feng Wenlan’s study on the distribution patterns of rural settlements in the upper reaches of the Min River further supports the greater dependency of settlements on transportation networks compared to water systems [87]. This research in the THBX reaffirms these conclusions and further shows that DAP provides stronger explanatory power for village clustering than distance to water systems and also exhibits greater spatial heterogeneity.

The analysis shows that the PCY significantly and positively affects the clustering of HRS on a global scale. GDP also exhibits a positive impact in most regions; however, in high-value areas, its influence is dispersed and occurs at a microscale. RD also impacts at a microscale and displays a dispersed influence globally. The effect of RD on the clustering of HRS varies significantly across different regions. In the rapid process of urbanization, the preservation of HRS depends on the contemporary socio-economic and demographic developments. However, the impact of these factors varies by region, necessitating targeted protection and development planning tailored to specific local conditions.

CCHU has an extremely slight negative impact on HRS clustering, with significant effects observable only in the southern part of the THBX. This contrasts with Hongwei’s findings, where CCHU was positively correlated with the concentration of TCV [25]. The difference might arise because TCV selection is influenced by local cultural preservation units, which are more numerous and place greater emphasis on cultural heritage in certain areas, resulting in more approved traditional villages. However, in a more extensive and comprehensive HRS dataset, the influence of cultural preservation units on village clustering appears to be minimal [25].

The study on the variable factors influencing the CHRS in the THBX reveals the drivers behind the settlement patterns, which reflect the area’s natural environmental conditions and regional socio-economic development levels [30, 87]. In this study, the subject is the HRS within the THBX area that still exist today, with modern external environmental factors as influences. The objective is to analyze the current developmental state to summarize the external environmental characteristics that affect the preservation of rural heritage. The results of this study can provide methodological and theoretical references for the excavation of rural heritage and the evolution of regional rural patterns. Small towns are a significant driving force behind urbanization and rural-focused development in China [88]. Therefore, analyzing the distribution characteristics and variable factors of HRS can provide theoretical guidance for the comprehensive development of China’s contemporary rural settlement system in terms of structure and function.

Limitations and prospects

It must be acknowledged that this study has certain limitations. First, there are issues related to the sources of HRS data. China, with its extensive historical background, has rural settlements that have developed over thousands of years. However, when attempting to organize and excavate historical data, we encounter a lack of uniform standards in the documentation process. For instance, some historical county annals provide detailed records of village names, while others offer only brief mentions. Consequently, this study could not compile an entirely comprehensive dataset of HRS. Secondly, the process of matching HRS with modern place names presents challenges. Despite narrowing the administrative boundaries from provinces to counties, some rural settlements still share names or have undergone name changes. In this study, such rural settlements have been marked with different confidence levels, and only those with a higher degree of match were included in the statistical analysis. Revealing and restoring HRS is a labor-intensive and lengthy task. It is hoped that future research can continuously refine the dataset created in this study to bring the data on HRS closer to their true state. By comparing historical settlements with contemporary villages, or the distribution of villages across different periods, it is possible to summarize the patterns of village formation and development, as well as to better predict the trajectory of rural evolution and development.

In Huang and Oliveira’s research, the correlation between clan culture and the internal spatial form with the development of HRS has been emphasized, though it still remains on an individual scale [89, 90]. Therefore, conducting extensive research and quantification of the spatial form of HRS and traditional culture, and introducing more detailed variables, will help enhance the completeness of the regression model in this study. A more comprehensive regression analysis will support the development of a machine learning-based, multivariate model for predicting the spatial distribution of HRS. This model can be cross-validated with the HRS spatial information acquisition method proposed in this study, advancing the exploration and protection of rural heritage.

Conclusion

As China’s recognition of the value of rural heritage increases, research related to HRS is increasingly encouraged. China, with one of the longest histories of agriculture, provides a favorable environment for the preservation and development of HRS. In this study, we first established an objective methodology for acquiring spatial information on HRS. This involved extracting information about HRS from historical records and matching it with modern rural settlement POI to obtain a spatial distribution dataset for HRS. Subsequently, using this dataset, we analyzed the spatial distribution patterns of HRS in the THBX and employed the MGWR to quantitatively assess the impact of natural geographic and socio-economic factors on the CHRS.

This study has validated the correlation between various factors and the formation of spatial distribution patterns of HRS. Our results from the THBX indicate that the CHRS is the outcome of complex interactions among multiple factors. Additionally, the influence of these factors on the spatial distribution patterns of HRS is non-linear and significantly affected by geographical location changes. The methodology for obtaining HRS data proposed in this study will robustly support heritage exploration efforts in various regions. Furthermore, HRS represent a living heritage, which requires not only preservation but also development. The impact of different factors on the spatial distribution patterns of HRS may indicate potential advantages and disadvantages for their future development. This study also validates the adaptability of the chosen factors in understanding and analyzing HRS.

The findings of this study provide methodological support for the discovery and preservation of rural heritage, as well as insights for optimizing rural spatial patterns and predicting the evolutionary paths of rural areas. Future work should expand the scope of the study and examine a broader range of factors to gain a more comprehensive understanding of the logic behind the preservation of HRS and the CHRS. Further research could focus on cluster analysis of heritage within regions that have already been studied, which would produce more regionally specific and detailed research outcomes. For the THBX, the next steps could involve verifying the accuracy of the existing HRS dataset and conducting further studies on non-significant or unique areas identified in the spatial regression model analysis. This approach will enhance the precision and relevance of the research and contribute to regional planning and heritage conservation efforts.