Introduction

As urban development and modernization accelerate, cultural heritage, as a sensitive component of the urban fabric, is at risk of suffering irreparable losses1. As early as 1931, with the publication of the first global document on heritage conservation, the Athens Charter for the Restoration of Historic Monuments2, the importance of respecting the unique characteristics and external environment of historic buildings in their urban context was emphasized. In the decades following World War II, cultural property was elevated to a new level of global significance3, where heritage and its surrounding environment were considered inseparable4,5. Heritage conservationists and policymakers gradually recognized the importance of developing effective tools and implementing strict protective measures, especially in urban areas where land use conflicts are increasingly intense6,7,8.

As a key measure outlined in the Operational Guidelines for the Implementation of the World Heritage Convention, the buffer zone has received considerable attention in international heritage conservation9. The buffer zone is a crucial tool employed by the World Heritage Committee (WHC) to support the protection of a property’s Outstanding Universal Value (OUV) and attributes, and it has been promoted globally to safeguard various forms of heritage and their inherent values10. In practice, however, traditional methods for delineating buffer zones have revealed several shortcomings. Existing research has explored buffer zones from perspectives such as historical mapping11, cognitive mapping12, and visual integrity13,14, demonstrating the usability of these approaches in specific contexts. Yet current methods lack comprehensive evaluation and merely impose rigid boundaries around architectural heritage, which has two consequences. On the one hand, many low-value areas that have not been adequately assessed are included within these zones, leading to overly stringent protective restrictions that may hinder regional development and waste human and material resources15. On the other hand, arbitrary boundaries drawn without proper justification can create real-world dilemmas characterized by “unclear boundaries and binary opposition,” fundamentally disconnecting heritage from its adjacent environment16,17.

In recent years, scholars have increasingly recognized the importance of value identification and assessment, which has led to value-oriented conservation practices18,19. Guided by international conventions issued by UNESCO and ICOMOS, the value of urban heritage has expanded from a relatively narrow focus on aesthetic value to a system that integrates aesthetic, historic, scientific, and cultural dimensions, with a trend toward even greater diversification10. Quantitative indicators are also increasingly incorporated into evaluation frameworks alongside qualitative methods, enhancing scientific rigor20,21,22. This integrated framework, which combines the diverse values of heritage with various forms of attributes, has been used in some pilot areas as a standard to guide conservation and limit development23,24. However, the selection of evaluation indicators still depends largely on the expertise of the professionals involved, which raises public concerns about the subjectivity of these methods. In addition, the high personnel and time costs of traditional evaluation methods remain a significant constraint.

In an era of the internet and exponential growth in text data, Natural Language Processing (NLP) has been widely applied across many fields owing to its capabilities in extracting themes and quantitatively analyzing textual content25,26,27. In urban planning, for instance, lexical topic models and sentiment analysis can extract key information from planning documents and evaluate the text along multiple dimensions, reducing the burden of manual reading28,29. Demand for NLP in planning and the broader social sciences is high and continues to grow, which also presents significant opportunities for heritage conservation30. Heritage conservation documents typically follow strict writing standards and focus on specific dimensions, so the abundant data they contain can support more insightful interpretations31. For example, texts related to World Heritage nominations and comprehensive urban planning often contain standardized descriptions of heritage values with two main features: first, textual inheritance, whereby new documents build on earlier ones and keep evaluation criteria consistent; second, terminological standardization, whereby terms such as “must” and “should” are used to define content precisely32,33,34. Systematically capturing and describing the content of these texts can therefore improve our understanding of heritage value.

Given the dual pressures of irreversible damage to heritage values in urban areas and the lack of a sound basis for local protection planning, it is necessary to revisit the original intention of heritage protection and to leverage buffer zones to preserve the authenticity and integrity of heritage and its surrounding environment. The primary aim of this study is therefore to use NLP-based methods for heritage value identification and attribute evaluation, and to apply the results to delineating buffer zones that balance heritage conservation with urban development, ultimately supporting the sustainable development goals of heritage cities. Taking Hengdaohezi Town as a case study, we explore three specific questions: 1) What heritage values are highlighted in the protection and practice texts of both the WHC and local authorities? 2) How can NLP technology assist in the evaluation of heritage values? 3) Based on the evaluation results, can we delineate buffer zone boundaries, and if so, how do these boundaries differ from previous protection boundaries? This study focuses on the deconstruction and evaluation of heritage values, aiming to bridge the gap between actions and goals in sustainable heritage management and to provide critical insights for the strategic protection of historical heritage in the future.

Methods

Materials

The dataset consists exclusively of texts related to urban heritage, because conflicts between conservation and development are most prominent in urban settings and the heritage values involved are relatively diverse and representative. The NLP dataset includes 24 international conventions, 624 conservation documents from World Heritage Cities, and 312 conservation plans from China, 960 documents in total. The World Heritage conservation documents consist of 148 nomination files, 275 state of conservation reports, and 201 periodic reporting documents from 303 sites. The Chinese conservation plans cover historic and cultural towns, because the size and value of these town-level administrative areas are comparable to those of historic sites in the WHC context. Each of these conservation texts corresponds to an urban heritage site; their geographical distribution is shown in Supplementary Fig. 1. To prepare the texts for analysis, they underwent optical character recognition, translation, and calibration and were finally converted into plain text. The resulting corpus contains over 70 million characters of unstructured text rich in normative descriptions of heritage values, making it a valuable resource for NLP.

The case study for buffer zone delineation is Hengdaohezi, a historic and cultural town in Northeast China (Fig. 1). The name ‘Hengdaohezi’ means ‘a river flowing horizontally’ in Chinese, referring to the river that winds through the mountains and created a rare mountain plain, which gave rise to the town and shaped its development and history. Hengdaohezi Town emerged with the construction of the Chinese Eastern Railway (CER) by the Russians and passed through a succession of regimes and cultures during wartime. Despite over a century of change, the town still retains the authentic layout and traditional style of a Russian town and preserves over 200 historical buildings of significant conservation value. In 2018, Hengdaohezi Town received the UNESCO Asia-Pacific Cultural Heritage Conservation Award in recognition of its efforts in preserving and passing on its historical and cultural heritage. Given its international influence, unique historical background, diverse cultural intersections, and typical conservation challenges, we selected this town as the focus of our study.

Fig. 1: Overview map of Hengdaohezi Town. The town still retains the layout established in 1903.

The research integrates GIS data on protected buildings, open spaces, landscape elements, and other features, including cultural relics protection units, historical buildings, historical streets, significant nodes, gathering squares, old and famous trees, and ancient bridges. The layout of the town is determined by combining UAV photography, the OpenStreetMap (OSM) platform (https://www.openstreetmap.org/), and on-site survey data. National and provincial/municipal cultural relics protection units are taken from lists published by national and local cultural relics bureaus, and their locations are obtained via the Amap open platform API (https://lbs.amap.com/). Historical streets are manually refined from OSM data to include historically significant streets that might otherwise be overlooked because of their lower road classification. Significant nodes, gathering squares, old and famous trees, and ancient bridges are mapped from local literature and field surveys. All data undergo scraping, filtering, optimization, and projection transformation and are used to build the spatial database for this research in ArcMap 10.8.

The methods follow the logic of value quantification, attribute evaluation, and spatial mapping. Text datasets and NLP techniques were used to identify heritage value themes, select evaluation factors, and determine factor weights; Supplementary Fig. 2 illustrates these three main processes. GIS data provided the attributes and served as the primary means of aggregating heritage value within urban space. The evaluation maps synthesized from these steps provide the data support and spatial basis for delineating buffer zone boundaries.

Identifying the heritage value themes

This study employs a method based on lexical topic models, specifically Latent Dirichlet Allocation (LDA), to cluster and identify heritage value themes35. LDA uncovers latent themes within texts and provides more precise and descriptive labels than classical clustering algorithms such as K-means and DBSCAN, which makes it well suited to the unstructured texts in this study. The steps are as follows. First, the texts are converted into numerical form through tokenization, stop-word removal, stemming, and TF-IDF weighting, making them amenable to machine learning algorithms. Then, LDA is used for topic modeling, representing each document as a mixture of topics, each of which is a distribution over vocabulary words. Next, a clustering algorithm based on N-grams groups texts with similar topic weights. Finally, the clustering results are evaluated and the parameters are tuned as necessary; this process is iterated until satisfactory results are achieved, as presented in Supplementary Fig. 2a.
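The topic-modeling step can be illustrated with a minimal LDA sketch fitted by collapsed Gibbs sampling in plain Python. This is not the authors' implementation: the function names `fit_lda` and `top_words`, the hyperparameters `alpha` and `beta`, the iteration count, and the fixed random seed are illustrative assumptions. The sketch also works on raw token counts, the usual input for LDA, rather than on the TF-IDF weights mentioned above, and it omits the N-gram clustering step.

```python
import random


def fit_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d][k] = p(topic k | doc d) and topic_word[k][w] = p(word w | topic k).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    topics = range(n_topics)
    # Count tables: document-topic, topic-word, and tokens per topic.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * n_vocab for _ in topics]
    nk = [0] * n_topics
    z = []  # current topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Take the token out of the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + n_vocab * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(nkw[k][w] + beta) / (nk[k] + n_vocab * beta) for w in range(n_vocab)]
        for k in topics
    ]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """List the n most probable words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]
```

For example, calling `fit_lda(docs, n_topics=2)` on a handful of tokenized sentences returns per-document topic mixtures whose rows sum to 1, and `top_words` then lists the most probable words of each topic, which is the material a researcher would inspect when labeling candidate value themes.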

Selecting the quantitative factors of value themes

Generally, if two entities in a text are related, they are more likely to appear in the same sentence or paragraph, and this tendency becomes more reliable as the volume of text grows. Therefore, segments containing concepts related to “buffer zone”, “setting”, and “surrounding” are retrieved to select value quantification factors. In NLP, a co-occurrence matrix represents relationships between words that tend to appear together in texts, aiding text analysis and semantic representation36. The process involves three main stages: data preprocessing, counting co-occurrences within a statistical window, and constructing the co-occurrence matrix. First, the raw text is preprocessed into sequences of words. Then, a fixed-size window slides through the text to capture word combinations. The frequencies of these combinations are counted and stored in a two-dimensional matrix in which element (i, j) gives the number of times words i and j co-occur in the text (Supplementary Fig. 2b). The co-occurring words are sorted by frequency, the bottom 20% are removed, and words with ambiguous semantics are excluded. This refined list serves as the factor repository for our research.
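The windowed counting and frequency pruning described above can be sketched in plain Python as follows. The window size, the treatment of pairs as unordered, and pruning by pair frequency are assumptions, since the text does not specify them; the helpers `cooccurrence`, `prune_bottom`, and `to_matrix` are hypothetical names for illustration only, and the manual exclusion of semantically ambiguous words is not modeled.

```python
from collections import Counter


def cooccurrence(tokens, window=5):
    """Count unordered word pairs that fall within the same sliding window.

    Positions p < q are paired when q - p < window, so each pair of positions
    is counted once even though consecutive windows overlap.
    """
    counts = Counter()
    for p, w in enumerate(tokens):
        for q in range(p + 1, min(p + window, len(tokens))):
            v = tokens[q]
            if v != w:  # ignore a word co-occurring with itself
                counts[tuple(sorted((w, v)))] += 1
    return counts


def prune_bottom(counts, fraction=0.2):
    """Rank pairs by frequency and drop the least frequent `fraction` of them."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = len(ranked) - int(len(ranked) * fraction)
    return dict(ranked[:keep])


def to_matrix(counts):
    """Arrange pair counts as a symmetric 2-D matrix indexed by vocabulary."""
    vocab = sorted({w for pair in counts for w in pair})
    idx = {w: i for i, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (a, b), c in counts.items():
        matrix[idx[a]][idx[b]] = matrix[idx[b]][idx[a]] = c
    return vocab, matrix
```

Counting by position pairs rather than by window keeps a pair that stays inside several overlapping windows from being inflated, which matters when frequencies are later ranked and truncated.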

Determining the weight of quantitative factors

The principle of emotional polarity diffusion holds that when a sentence or paragraph contains emotional words, the associated emotion spreads through the text and influences other words or expressions37. This process accumulates and permeates emotions within the text, shaping its overall emotional inclination38. The texts in this study likewise exhibit emotional polarity diffusion, which can be used to assess the extent to which statements affirm or negate the associated factors. Based on the HowNet and SenticNet emotion lexicons, modal verbs expressing varying degrees of emphasis or negation were used to build lexicons of degree adverbs and negation words. HowNet is designed specifically for Chinese-language texts, making it effective for analyzing emotional polarity in a way that is sensitive to the nuances of Chinese expression39,40. SenticNet, in contrast, is suited to concept-level sentiment analysis, allowing a deeper understanding of the emotional weight carried by the heritage-related expressions in our data41,42. To refine sentiment classification further, a weighted sentiment analysis strategy was adopted in which each word in the sentiment lexicon is assigned a weight or score reflecting its emotional tendency. During sentiment analysis, we count the frequency of each emotional word in the text; words with higher frequencies have a greater influence on the overall sentiment, and the combination of frequency and weight determines the overall emotional tilt of the text. In addition, our approach uses Naive Bayes classification to compute sentiment scores, which give a probability distribution reflecting the likelihood that the text belongs to each sentiment category43.
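The Naive Bayes step can be illustrated with a small multinomial classifier with add-one (Laplace) smoothing that returns a posterior probability for each sentiment class. This is a sketch under stated assumptions: the class name `NaiveBayesSentiment`, the toy training sentences in any usage, and the smoothing choice are hypothetical, and the HowNet/SenticNet lexicon weights, degree adverbs, and negation handling are not modeled here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        """docs: list of token lists; labels: one sentiment class per doc."""
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, tokens):
        """Return {class: posterior probability} for one tokenized sentence."""
        n_vocab = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            s = math.log(self.prior[c] / self.n_docs)
            for w in tokens:  # repeated words contribute once per occurrence
                if w in self.vocab:  # words never seen in training are skipped
                    s += math.log((self.word_counts[c][w] + 1) / (self.total[c] + n_vocab))
            log_scores[c] = s
        # Normalize in log space (log-sum-exp) to obtain a probability distribution.
        m = max(log_scores.values())
        z = sum(math.exp(s - m) for s in log_scores.values())
        return {c: math.exp(s - m) / z for c, s in log_scores.items()}
```

The posterior of the predicted class is the kind of quantity that could serve as the probability term in the scoring formula that follows, although the paper does not state exactly how its classifier outputs were mapped onto that term.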

As shown in Supplementary Fig. 2c, the emotional intensity of words, degree adverbs, and negation words is overlaid and, taking semantic connections into account, a basic algorithm is formulated to compute the quantified score of each factor:

$$S_{i}=\frac{1}{n_{i}}\sum_{j=1}^{n_{i}} C_{i}^{j}\cdot \mathrm{neg}_{i}^{j}\cdot P_{i}^{j}$$
(1)

In the equation, \(n_{i}\) is the number of sub-sentences associated with evaluation factor \(i\); \(C_{i}^{j}\) is the confidence indicator derived from the output of the sentiment analysis (\(C_{i}^{j}=1\) if the confidence level is ≥ 95%, otherwise \(C_{i}^{j}=0\))40,44; \(\mathrm{neg}_{i}^{j}\) is the emotional polarity of sub-sentence \(j\) (-1 for negative, 1 for positive); and \(P_{i}^{j}\) is the probability of the corresponding emotional inclination for that sub-sentence.
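Equation (1) can be computed directly once each sub-sentence has a confidence, a polarity, and a probability. In the sketch below, the hypothetical function `factor_score` assumes these arrive as `(confidence, polarity, probability)` tuples, with confidence expressed on a 0 to 1 scale so that the 95% rule becomes `confidence >= 0.95`.

```python
def factor_score(sub_sentences, threshold=0.95):
    """Score S_i of Eq. (1) for one evaluation factor.

    sub_sentences: list of (confidence, polarity, probability) tuples, where
    polarity is +1 or -1 and probability is P_i^j for that polarity.
    C_i^j is 1 when confidence >= threshold and 0 otherwise.
    """
    if not sub_sentences:
        raise ValueError("factor has no sub-sentences")
    total = 0.0
    for confidence, polarity, probability in sub_sentences:
        if polarity not in (-1, 1):
            raise ValueError("polarity must be -1 or +1")
        c = 1 if confidence >= threshold else 0
        total += c * polarity * probability
    # Divide by n_i: low-confidence sub-sentences still count in the denominator.
    return total / len(sub_sentences)
```

For three sub-sentences `(0.97, 1, 0.9)`, `(0.99, -1, 0.8)`, and `(0.90, 1, 0.7)`, the third falls below the confidence threshold and contributes 0, so the score is (0.9 - 0.8 + 0) / 3, roughly 0.033. Because the denominator is the full \(n_i\), a factor discussed mostly in low-confidence sentences is pulled toward 0 rather than judged only on its confident mentions.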

Evaluating attributes and mapping buffer zone

Based on the universal list of attributes provided by international conventions and building on the co-occurrence matrix retrieval results, we established a comprehensive framework for urban heritage attributes30,45. Then, considering the identified value themes and the specific characteristics of the study area, more than 150 properties in 3 categories of attributes were taken into account. To ensure objectivity and reduce the subjective influence of evaluators, particularly the tendency to cluster around mid-level scores, we used a three-point scoring system to assess the value carriers. The scoring system has three levels: 0 indicates that the attribute is almost irrelevant to the value theme, 1 indicates partial relevance, and 2 indicates perfect alignment with the value theme. We also needed to understand the overall performance of each value theme in the case study area. Three groups of experts, each with six members, were selected: heritage conservation experts, local government officials, and residents familiar with local history. Through interviews and focus group meetings, the comprehensive score for each sub-theme was determined based on the value themes selected in section 2.2.2 and the evaluation process outlined in section 2.2.3, on a five-level scale ranging from completely inconsistent to highly consistent. The final results quantify the performance of each attribute in the different categories.
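The text does not spell out how an attribute's three-point scores are combined into the single value that is later mapped. One plausible reading, shown here purely as an assumption, is a weighted sum of the 0/1/2 theme scores. The function `attribute_value` and any example weights are hypothetical; theme codes such as `b3` and `d5` are borrowed from the Results for illustration.

```python
THREE_POINT = {0, 1, 2}  # 0 irrelevant, 1 partial relevance, 2 perfect alignment


def attribute_value(scores, theme_weights):
    """Hypothetical weighted sum of one attribute's three-point theme scores.

    scores: {theme_code: 0 | 1 | 2}; theme_weights: {theme_code: weight}.
    Themes the attribute was not scored on contribute nothing.
    """
    total = 0.0
    for theme, score in scores.items():
        if score not in THREE_POINT:
            raise ValueError(f"score for {theme} must be 0, 1 or 2, got {score}")
        total += theme_weights[theme] * score
    return total
```

Rejecting anything outside {0, 1, 2} keeps the coarse scale honest: the point of the three-point design is to stop evaluators from drifting to intermediate values.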

According to Tobler's First Law of Geography, everything is related to everything else, but near things are more related than distant things. Attributes can therefore be treated as interconnected points in geographical space, with their material and immaterial forms constituting the value connotation, that is, the relationships between points. By assigning the scores of each attribute category to points on a GIS platform and interpolating between them, we can produce a value evaluation map. This integrates value evaluation with spatial demarcation and provides detailed data support for delineating buffer zones. It is also necessary to account for the diminishing marginal effect between conservation and development46: buffer zones must be neither so large that they restrict development nor so small that they risk damage to heritage. Determining value thresholds for buffer zone delineation is therefore essential. To this end, we select the most appropriate classification method for determining breakpoints by comparing existing experience with methods such as natural breaks and K-means clustering. Finally, considering the actual layout and natural setting of the case study, we delineate buffer zones of different levels and refine their boundaries according to conditions on the ground.
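The paper interpolates attribute scores with Kriging (see Results). As a simpler stand-in, the sketch below uses inverse distance weighting (IDW), a deterministic interpolator that weights samples by distance alone and, unlike Kriging, does not fit a variogram to model spatial autocorrelation. The helper names `idw` and `value_grid`, the power parameter of 2, and the assumption of projected planar coordinates are illustrative.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points: list of (px, py, value) in projected (planar) coordinates.
    Returns the sample value exactly when (x, y) coincides with a sample.
    """
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value
        w = d ** -power  # nearer samples get larger weights
        num += w * value
        den += w
    return num / den


def value_grid(points, xs, ys, power=2.0):
    """Evaluate IDW on a regular grid; rows follow ys and columns follow xs."""
    return [[idw(points, x, y, power) for x in xs] for y in ys]
```

Like Kriging, IDW encodes the idea that nearby attributes share value, but every estimate is a weighted average of the samples, so the surface never exceeds the highest or falls below the lowest attribute score.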

Results

Value themes identification

Table 1 presents examples of the preliminary clustering results, where “frequency” is the total number of occurrences within each category and “score” is positively correlated with both frequency and the confidence of the result. Using the LDA and N-gram models, the preliminary clustering achieved an average score of 105.56 and a median of 77.47, indicating that the results are satisfactory and meet the research expectations. However, the preliminary clusters are too mixed, covering value themes, attributes, and some meaningless results, so they cannot be used directly to construct value themes. We therefore manually screened and annotated the clusters, excluding those with low scores (below the first quartile) or ambiguous meanings, which established more detailed correspondences and more accurate classifications. Ultimately, through data cleaning, topic modeling, topic clustering, and manual refinement, combined with a review of studies on related value connotations (Supplementary Table 1), we identified and screened corpus-based heritage value themes. These comprise 17 themes across four aspects, as shown in Table 2.
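The first-quartile screen can be expressed in a few lines with the standard library. This sketch assumes Python's `statistics.quantiles` with its default `exclusive` method; other quartile conventions (for example, those of a spreadsheet or statistics package) can give slightly different cut-offs, and the manual screening for ambiguous meanings is not modeled. The function name `screen_clusters` is hypothetical.

```python
import statistics


def screen_clusters(cluster_scores):
    """Summarize cluster scores and keep those at or above the first quartile.

    cluster_scores: {cluster_label: score}; needs at least two clusters.
    Returns (summary, kept), where summary holds the mean, median, and Q1.
    """
    scores = list(cluster_scores.values())
    q1 = statistics.quantiles(scores, n=4)[0]  # default method="exclusive"
    summary = {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "q1": q1,
    }
    kept = {label: s for label, s in cluster_scores.items() if s >= q1}
    return summary, kept
```

A mean well above the median, as reported here (105.56 versus 77.47), signals a right-skewed score distribution, which is one reason a quantile-based cut-off is more robust than a cut-off tied to the mean.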

Table 1 Examples of the initial clustering results based on LDA
Table 2 The 4 aspects and 17 heritage value themes

Quantitative factors selection

Using Python tools, a co-occurrence matrix was constructed, yielding 16,383 sets of corresponding relationships. According to Zipf’s Law, the frequency of any word in a natural language is inversely proportional to its rank in the frequency distribution table47. Consequently, the reliability of the obtained relationships was validated using the Zipf formula:

$$f=P\left(\gamma +\rho \right)^{-B}\quad {\rm{or}}\quad \log f=\log P-B\log \left(\gamma +\rho \right)$$
(2)

where \(f\) is the frequency of a word, \(\gamma\) is its frequency rank, and \(P\), \(B\), and \(\rho\) are text parameters that together measure the breadth of vocabulary usage. Linear regression was performed on the 16,383 pairs in base-10 logarithmic coordinates. The p value of the fit was below 0.05, indicating that the regression is statistically significant and supporting the reliability of the co-occurrence matrix analysis. Fig. 2a shows the relationship between word frequency and frequency rank for the study’s corpus in logarithmic coordinates. The fitted function is \(\lg f=-1.29\lg \gamma +5.732\).
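The log-log regression can be reproduced with ordinary least squares. This minimal sketch assumes \(\rho=0\), matching the form of the fitted function reported above, and returns the slope and intercept in base-10 logarithms; a negative slope corresponds to the inverse relationship between frequency and rank that Zipf's law predicts, with \(B\) equal to minus the slope. It does not compute the p value, for which a statistics package would normally be used. The function name `fit_zipf` is hypothetical.

```python
import math


def fit_zipf(frequencies, rho=0.0):
    """Least-squares fit of log10 f = log10 P - B * log10(rank + rho).

    frequencies: word (or pair) frequencies in any order; they are ranked
    from most to least frequent, with rank 1 for the top item.
    Returns (slope, intercept); the Zipf exponent is B = -slope.
    """
    f = sorted(frequencies, reverse=True)
    xs = [math.log10(r + rho) for r in range(1, len(f) + 1)]
    ys = [math.log10(v) for v in f]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

As a check, frequencies generated exactly as 1000 / rank give a slope of -1 and an intercept of 3, that is, \(\lg f = 3 - \lg\gamma\).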

Fig. 2: Validation and selection of quantitative factors.

(a) Validation based on the Zipf formula; (b) distribution and frequency of mentions of the indicators.

Experience has shown that mid-frequency words, concentrated in the middle segment of the fitted function, often contain a substantial number of meaningful keywords, which are crucial for understanding the structure and features of the text48. Therefore, subsequent factor selection should focus on this portion. Cross-comparisons were conducted according to predefined selection rules, matching and organizing the factors in the factor repository with relevant data. After multiple rounds of screening and in accordance with the requirements emphasized in the Operational Guidelines, a total of 7 general indicators and 4 special indicators were selected for the 4 aspects and 17 value themes. Supplementary Table 2 provides explanations for each factor, while the dendrogram distribution in Fig. 2b illustrates each factor and its mention frequency.

Factor weight determination

To avoid bias caused by inconsistent sample sizes across categories of natural language data, the computed results are normalized to remove dimensional differences and bring them into the same range. Given the thematic similarity and the relatively small number of value categories in heritage valuation, Grey Relational Analysis (GRA) is chosen for testing and optimization. The normalized data are then processed and ranked to determine the order of importance of the sentiment values under each criterion. Finally, comprehensive weights are calculated from the degrees of correlation (grey relational grades) determined by GRA, so that the interpretation and application of the findings align with practical needs in heritage value assessment and conservation objectives.
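The GRA weighting can be sketched as follows, under assumptions the text leaves open: each criterion (column) is min-max normalized across factors, the reference sequence is the column-wise maximum (an ideal factor), and the distinguishing coefficient is set to 0.5, a conventional default in grey relational analysis (it is unrelated to \(\rho\) in Eq. 2). The grey relational grade of each factor is the mean of its relational coefficients, and grades are normalized to weights summing to 1. The function name and input layout are hypothetical.

```python
def grey_relational_weights(matrix, distinguishing=0.5):
    """Weights from grey relational grades against an ideal reference factor.

    matrix[i][k]: score of factor i under criterion k (rows = factors).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Min-max normalize each criterion (column) to remove scale differences.
    norm = [[0.0] * n_cols for _ in range(n_rows)]
    for k in range(n_cols):
        col = [row[k] for row in matrix]
        lo, hi = min(col), max(col)
        for i in range(n_rows):
            norm[i][k] = 0.0 if hi == lo else (matrix[i][k] - lo) / (hi - lo)
    # Reference (ideal) sequence: the best normalized value of each criterion.
    ref = [max(norm[i][k] for i in range(n_rows)) for k in range(n_cols)]
    delta = [[abs(ref[k] - norm[i][k]) for k in range(n_cols)] for i in range(n_rows)]
    d_min = min(min(r) for r in delta)
    d_max = max(max(r) for r in delta)
    if d_max == 0.0:  # every factor equals the reference: equal weights
        return [1.0 / n_rows] * n_rows
    # Grey relational coefficient per cell, averaged into a grade per factor.
    grades = [
        sum((d_min + distinguishing * d_max) / (d + distinguishing * d_max) for d in r) / n_cols
        for r in delta
    ]
    total = sum(grades)
    return [g / total for g in grades]
```

With rows `[3, 9]`, `[1, 5]`, and `[2, 7]`, the first row coincides with the reference and receives the largest weight (6/11), followed by the third (3/11) and the second (2/11), which also yields the kind of ordering reported in a rank column.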

Table 3 presents the weights of the subdivided factors for both general and special indicators, with the rankings derived from GRA shown in the rank column. Under the general indicators, the weight of integrity is slightly higher than that of authenticity. This may be due to the relatively better practices in existing conservation efforts, which frequently mention and emphasize the importance of properties’ integrity in conservation documents. Additionally, these texts often place greater importance on the continuation of the heritage’s outstanding universal value, thereby also focusing on their uniqueness, functionality, and influence49. In contrast, the factors of richness and typicality have relatively smaller weights because they conceptually overlap with some of the more important and common indicators (e.g., integrity also encompasses aspects of richness). Regarding special indicators, historical witness has the highest weight, while scientific research has the smallest weight, which is consistent with findings from existing academic research50.

Table 3 The weights of the subdivided factors for general and special indicators

Attributes evaluation and buffer zone delineation

A survey was designed to conduct a census of typical attributes in Hengdaohezi Town and evaluate them across the 17 value themes in 4 aspects. We ultimately surveyed over 100 preserved buildings (31 of them in detail), recorded more than 30 open spaces, and documented over 20 landscape elements (Supplementary Notes 1, Fig. 3). For each property, we assessed its alignment with the value themes using the three-point scoring system as the basis for attribute value evaluation. Based on the expert panel meetings and the factor weights, a detailed value evaluation table for Hengdaohezi Town was then developed. Table 4 shows the scores of the different indicators under each theme. Most value themes exhibit authenticity, typicality, uniqueness, and influence but are weaker in integrity, functionality, and richness. Among the value themes, b3, c2, a4, d1, and c1 performed best, while b4 and d5 performed relatively poorly. This is consistent with previous findings: Hengdaohezi Town’s historical built heritage is well preserved, but its surrounding environment has suffered significant damage, and many buildings, such as the church that was once the core of religious culture, have lost their original functions.

Fig. 3: The surveyed attributes and their distribution.

Distinct colors represent different types of attributes, including overall layout, street spaces, railway facilities, landscape nodes, historical buildings, and old and famous trees.

Table 4 Evaluation result of value themes

The NLP-driven heritage value evaluation method assigns each attribute a score for each value theme (Supplementary Notes 2). By coupling these 17 themes, we produced the value evaluation map in Fig. 4a using the Kriging interpolation method. To select the most suitable classification method, we compared existing experience, the natural breaks method, and K-means clustering. The first control group relied on existing experience, taking the attribute evaluation scores of the ancient elm trees and the locomotive depot as thresholds, set at 1.414 and 6.162. The natural breaks classification, obtained in ArcGIS, divided the values into three classes: [0, 2.79), [2.79, 6.16), and ≥ 6.16. K-means clustering, performed in SPSS, produced three cluster centers at 0.60, 2.54, and 5.49, of which the first two were used as cut-off points. Figure 4b shows the classification results of the three methods, which perform differently in various parts of the old town. The experience-based classification was strict, with narrow value ranges, whereas the natural breaks method performed well but produced a larger highest-value area. K-means clustering was comparatively more comprehensive, with a broader middle-value range and a high-value range that included the core attributes and their surroundings. The study therefore uses the K-means clustering results as the basis for defining classification boundaries.
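The K-means step can be approximated by one-dimensional Lloyd iterations on the interpolated values. SPSS's implementation and initialization are not reproduced here; this sketch assumes deterministic initialization at evenly spaced order statistics and, following the text, uses the lower two of the three cluster centers as cut-off points. The function names `kmeans_1d` and `classify` are hypothetical, and the natural breaks (Jenks) method is not shown.

```python
def kmeans_1d(values, k=3, n_iter=100):
    """Lloyd's k-means on 1-D data; returns the cluster centers in ascending order.

    Centers start at evenly spaced order statistics of the sorted data, so the
    result is deterministic (an assumption; SPSS initializes differently).
    """
    data = sorted(values)
    n = len(data)
    if n < k:
        raise ValueError("need at least k values")
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep empty groups' centers.
        new = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def classify(value, cutoffs):
    """Class index 0, 1, ... for `value` given ascending cut-off points."""
    return sum(value >= c for c in cutoffs)
```

With `centers = kmeans_1d(values)`, the call `classify(v, centers[:2])` returns 0, 1, or 2 for a value `v`, corresponding to the low, middle, and high value classes.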

Fig. 4: The value evaluation map based on the Kriging interpolation method.

(a) Value evaluation map using the Kriging interpolation method; (b) detailed comparison of different classification methods.

The buffer zone delineation results are shown in Fig. 5 and Supplementary Fig. 3, divided into the conservation core zone and the development control area. Due to the spatial dispersion of attributes, the buffer zone is distributed across multiple areas in the northern part of Hengdaohezi Town, forming three main zones: the locomotive depot, the churches, and the big white building. These areas contain national cultural relics protection units and are the best-preserved sites for historical patterns and traditional features.

Fig. 5: The buffer zone boundaries of Hengdaohezi Town.

(a) Area surrounding the locomotive depot; (b) area around the big white building; (c) area surrounding the church.

The area around the locomotive depot is one of the earliest planned districts in Hengdaohezi Town (see Fig. 5a) and has the highest concentration of historical preservation buildings. Block A includes two groups of national cultural relics protection units: the locomotive depot in the west and the Russian-style wooden house group in the center, which are important components of the town’s rich historical atmosphere. The conservation core zone of the locomotive depot includes the main building, auxiliary facilities (such as the old transfer room), the distribution plaza, and railway facilities, with boundaries set by surrounding walls and disused tracks. The protection scope of the Russian-style wooden house group includes typical Russian wooden log buildings and their courtyards, as well as nearly 400 meters of street space. Given the high-value content of this block, the development control area extends north to the mountain and south to the railway, defined by national road boundaries and abandoned tracks. Block B mainly includes the old Hengdaohezi Station and its auxiliary facilities, with development control areas defined by railways, protected buildings, and courtyard spaces.

The setting of the railway big white building includes a large residential section, adding cultural and living atmosphere to the town (Fig. 5b). Block C contains the big white building, with boundaries set by courtyard space and retaining walls. Blocks D and F are defined similarly, using preserved buildings and their courtyard spaces to determine conservation core zones, and incorporating lower-level historical buildings into the development control areas, with outer boundaries set by existing artificial and natural features (including roads, courtyards, terrain, and rivers). Block E, although not covering particularly valuable historical buildings, has high internal value due to open spaces and landscape elements, especially historical and cultural values, and is thus designated as a development control area.

Figure 5c shows one of the most diverse blocks in the old town, featuring many brick and wooden buildings. Historical streets such as Old Street and Postal Road contain numerous historical buildings and open spaces, with significant inherent value. The core conservation zone in the western part of Block G includes the terrace where the Russian church is located and nearly 250 meters of Old Street space, with boundaries defined by terrain and building lines. The southern part also has two small core conservation zones covering several important protected buildings. Considering the presence of some open spaces, the development control area extends along Old Street to the northwest end of the street, with other parts defined by roads and building lines.

Discussion

In the context of the imbalance between conservation and development, understanding the value of urban heritage and effectively protecting it has always been a challenging research area. This paper employs NLP methods to extract 17 value themes across 4 value aspects from 960 conservation documents and establishes a heritage value evaluation system with 11 factors. By quantitatively scoring over 180 heritage attributes, our approach offers a more precise identification of the value of historical heritage and its surrounding areas, providing an objective and robust basis for delineating buffer zones.

This study explores methods for determining the boundaries of the conservation core zone and the development control area following the logic of value quantification, attributes evaluation, and spatial mapping. In converting the value evaluation map into tangible protection boundaries, we adhere to the following principles:

(1) Core conservation zone: Boundaries are primarily defined by key preserved buildings, courtyard spaces, and the red lines of historical districts. The value evaluation map shows sharp value discontinuities around higher-level heritage buildings and historical districts, confirming their high heritage value. These attributes reflect the evolution of Hengdaohezi Town from a strategic location on the CER to a residential town and provide valuable evidence of the assimilation of Russian culture into the local civilization. These areas therefore warrant the highest level of attention and the strictest protection. In addition, because of historical gaps in protection planning and oversight, some illegal constructions exist around national cultural relics protection units. Although such structures may fall within the core conservation zone, they do not carry corresponding heritage value, and further Heritage Impact Assessments (HIA) are needed to determine their fate.

(2) Development control area: Boundaries are usually delineated along existing artificial and natural borders (including roads, courtyards, terrain, and rivers). Using these existing borders helps maintain the original appearance and structure of the heritage site, preserves its multiple value themes for visitors and researchers, and keeps the cultural landscape consistent. Because Hengdaohezi Town was initially planned around the railway and its associated buildings, railway lines can serve as artificial boundaries for parts of the development control area, preserving historical consistency. These existing borders are also well known and accepted by local residents, which facilitates the community support and cooperation that are crucial for implementing protection plans. For residents of Hengdaohezi Town, the division between the residential areas in the south of the town and the historical-cultural center in the north, separated by the Hengdao River, is well ingrained; part of the buffer zone is therefore delineated along the river. Finally, using existing borders is often more economical and feasible for heritage protection projects. In the northern part of the town, areas such as the land around the big white building and east of the Russian-style wooden houses have many clear retaining walls that can serve as boundaries for the development control area. This reduces the need to construct new boundaries and supports the sustainable development of the historic and cultural town.

The buffer zones in existing protection plans derive from two primary sources: the Hengdaohezi Town Master Plan (2010-2030) and the Hengdaohezi Historical District Conservation and Tourism Plan. The master plan delineates the core conservation zone with a simple rule based on building height (2-5 times the height of the main building) and the development control area with a similar rule (4-8 times the height of the main building). The conservation plan designates heritage buildings and their auxiliary courtyards as the core conservation zone and defines the development control area by road boundaries. This one-size-fits-all approach has made protection and utilization more difficult and has sharpened the conflict between the town’s conservation and development.
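To make the height-based rule concrete, the following sketch converts a main building's height into the distance ranges quoted from the master plan. It assumes each multiple is applied as a buffer distance measured outward from the building; this interpretation and the function name are illustrative, not taken from the plan's text.

```python
from typing import Dict, Tuple

# Multiples of the main building's height, as quoted from the master plan.
CORE_MULTIPLES: Tuple[float, float] = (2.0, 5.0)
CONTROL_MULTIPLES: Tuple[float, float] = (4.0, 8.0)


def height_rule_bands(height_m: float) -> Dict[str, Tuple[float, float]]:
    """Return (min, max) buffer distances in metres for each zone.

    Assumes each multiple is a distance measured outward from the main
    building (an interpretation adopted here for illustration).
    """
    if height_m <= 0:
        raise ValueError("building height must be positive")
    return {
        "core conservation zone": (CORE_MULTIPLES[0] * height_m,
                                   CORE_MULTIPLES[1] * height_m),
        "development control area": (CONTROL_MULTIPLES[0] * height_m,
                                     CONTROL_MULTIPLES[1] * height_m),
    }


if __name__ == "__main__":
    # A hypothetical 12 m main building
    for zone, (lo, hi) in height_rule_bands(12.0).items():
        print(f"{zone}: {lo:.0f}-{hi:.0f} m")
```

For a 12 m building, the core conservation zone would extend 24-60 m and the development control area 48-96 m. Only height enters the calculation, so a rule of this kind cannot respond to open spaces, landscape elements, or other value themes around the building; note also that the two quoted ranges overlap between 4 and 5 times the height.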

In contrast, this study refines the existing zoning methods, providing a more reliable and detailed division of buffer zones through the technical route of “deconstruction-quantification-mapping” of values. This promotes a positive “conservation-development” spatial evolution within the historic and cultural town and, to a certain extent, avoids the subjectivity inherent in existing conservation planning. The defined core conservation zone covers 22.97 hectares and the development control area covers 34.28 hectares. The main advantages are as follows: 1) Analyzing hundreds of protection practice documents with NLP (Fig. 3) captures a more diverse range of value themes and property categories, yielding more detailed evaluation outcomes. 2) Whereas the original protection boundaries were delineated by simple rules, the proposed zoning model accounts for the continuity of heritage values, the feasibility of community practice, and the sustainability of heritage protection, producing more reliable and detailed boundaries. 3) The original core conservation zone consisted mainly of the outlines of historical buildings or their auxiliary courtyards, whereas the new zoning results include several open spaces and landscape elements with rich accumulated value (e.g., Block E in Fig. 5b and the southeast side of Block G in Fig. 5c).

It should also be noted that this study has certain limitations. First, the value evaluation map used to determine the protection boundaries is based on interpolation, which may introduce uncertainty. When two geographically close attributes receive very different evaluation results, the method struggles to capture value fluctuations within the spatial field accurately, potentially leading to excessive boundary volatility or irrational delineation. Moreover, each attribute's evaluation result is assigned only to the centroid of its spatial location, disregarding the attribute's actual shape and extent. As a result, some areas with significant cultural value may not be fully included within the designated buffer zones16. In addition, although the study emphasizes using existing natural and artificial boundaries to define buffer zones, this approach may rely too heavily on fixed physical boundaries and overlook the dynamic cultural landscape of Hengdaohezi Town and its broader social and cultural values, such as community lifestyles and intangible cultural heritage. It thus risks making heritage protection rigid and limiting its ability to adapt to future changes in the cultural landscape and in community development needs. Furthermore, while the study advocates using existing boundaries to reduce costs, this conclusion is not fully justified, and its economic feasibility is not elaborated. The value assessment of heritage buildings is also inconsistent, particularly where illegal constructions remain within the core conservation zone. This reveals weaknesses in heritage management that could compromise the integrity and effectiveness of the buffer zone.

Lastly, although the study highlights the importance of community support and collaboration, it does not clearly define the community's substantive role in the boundary delineation process. Overlooking the needs and opinions of local residents risks detaching the planning from on-the-ground realities and may exacerbate tensions between heritage preservation and development. Future research should adopt a more holistic approach that accounts for the social context, planning objectives, and active community participation51. Establishing a comprehensive HIA framework that integrates the interests of all stakeholders would help ensure that buffer zone delineation is both scientifically robust and sustainable52.
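The interpolation limitation noted among these limitations can be illustrated with inverse distance weighting (IDW). The section does not name the interpolation method actually used, so IDW serves here only as a common stand-in; the coordinates, values, and power parameter are hypothetical. As the text describes, each attribute contributes a single value at its centroid.

```python
import math
from typing import Iterable, Tuple

# (x, y, value) for each attribute centroid; coordinates in metres (hypothetical)
Sample = Tuple[float, float, float]


def idw(samples: Iterable[Sample], x: float, y: float, power: float = 2.0) -> float:
    """Inverse distance weighted estimate of heritage value at (x, y).

    Each attribute is represented only by its centroid, so its shape and
    extent are ignored.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # query point coincides with a centroid
        w = 1.0 / d ** power
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("at least one sample is required")
    return num / den


if __name__ == "__main__":
    # A high-value attribute at x = 0 m and a low-value one at x = 20 m
    centroids = [(0.0, 0.0, 9.0), (20.0, 0.0, 2.0)]
    # Value profile along the line joining the two centroids
    for x in range(0, 21, 4):
        print(f"x = {x:2d} m -> value {idw(centroids, float(x), 0.0):.2f}")
```

The estimate at the midpoint is about 5.5, and at x = 8 m, which could still lie inside the footprint of a large building centred at the origin, it is about 6.85 rather than 9. A hypothetical core threshold of 7 would therefore cut through that building, which is the centroid effect described above; the smooth decay between the two points also shows why a sharp real-world break, such as a retaining wall, is not reproduced by the interpolated surface.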