Machine-learning-based reconstruction of Ming-dynasty defensive corridors in Yuxian

Zhao, Peiyuan; Xue, Qing; Qin, Han; Zhao, Ming; Yan, Rong; Yuan, Kun

doi:10.1038/s40494-026-02438-2

Article
Open access
Published: 09 April 2026

Peiyuan Zhao¹^na1,
Qing Xue¹^na1,
Han Qin¹,
Ming Zhao¹,
Rong Yan² &
…
Kun Yuan¹

npj Heritage Science volume 14, Article number: 225 (2026)

This article has been updated

Abstract

During the Ming Dynasty, Yuxian occupied a strategic position on the southern edge of the Xuan-Frontier Towns, linking Great Wall outposts with the Shanxi-Hebei hinterland and concentrating passes, forts, and military farms. Understanding its role within Nine Frontier Defense Garrisons clarifies the spatial organization of fortress settlements. This study proposes a methodological framework that integrates the Minimum Cumulative Resistance (MCR) model with machine learning to predict defense corridors. By constructing resistance surfaces based on the nonlinear location preferences of military defense ruins, multi-tiered defense corridors were identified. Results indicate that line-of-sight density and settlement density dominate corridor orientation. The reconstructed system exhibits peripheral corridors along the Great Wall and passes, as well as in-depth corridors toward hinterland fortress settlements, and shows strong coupling with reconstructed post road networks. This study provides quantitative evidence for reconstructing Yuxian’s Ming defense system and offers a transferable model for Great Wall defense analysis.

Introduction

As a strategic pivot along the Xuan-Da Frontier Towns within the Ming Dynasty’s “Nine Frontier Defense Garrisons System”, Yuxian (historically Yuzhou) was not only a critical pass on the outer edge of the Great Wall but also controlled the mountain passes at the junction of Shanxi and Hebei. Its geographical characteristics prompted the Ming court to establish dense defenses here, forging an in-depth hierarchical defense network composed of Yuzhou guard city, numerous fortress settlements, and beacon towers deployed along the Great Wall¹. During the Jiajing Period, in response to the hunger-induced unrest among northern nomadic tribes triggered by the “Little Ice Age”, the Ming court further strengthened this defensive layout. It constructed Great Wall fortifications around Yuzhou, deployed additional military garrisons, and promoted military-civilian farming. These measures aimed to address the dual needs of defending the frontier against enemies and securing local food supplies. Under this layout philosophy, Yuzhou guard city was situated in an open area along the Huliu River, combining the advantages of defense and cultivation and epitomizing the principle of integrated military-civilian planning. Most affiliated fortresses were also deployed along foothill plains and strategic valley passes, leveraging topographical advantages to develop military garrison points. Together they formed the first line of defense guarding the Taihang passes and shielding the inland regions. It can be argued that the natural configuration of Yuxian’s mountains and rivers was closely coupled with the man-made military defense system. Steep mountain ridges served as natural barriers, dangerous passes were fortified with forts and checkpoints, and fertile plains supplied military provisions. This combination of strategic terrain and key controlling thoroughfares enabled Yuzhou to serve for a long time as a pivotal strategic stronghold within the Ming Dynasty frontier defense corridors.

Surrounding this spatial configuration, historical geography and military history research have explored Yuxian’s defense system and the defensive system of the Nine Frontier Garrisons at various scales^2,3. At the micro level, scholars including Shang H. discovered through field surveys that as many as 425 ancient fortresses remain in Yuxian, exhibiting an unusually high density of concentration for northern China’s frontier regions⁴. These fortresses display a settlement pattern of “integration of villages and forts”, which could be converted into fortifications by closing gates during wars. Building on this, Yan R. et al. regarded Yuzhou’s fortresses, together with the Great Wall and beacon towers, as a hierarchically distinct and functionally complementary frontier defense system, and sorted out the history of interactions between Ming-dynasty defense and geographical environment⁵. At the macro level, Zhao X. et al., approaching from the perspective of the “Nine Frontier Defense Garrisons System” and strategy, proposed that the military garrisons of the Ming Dynasty’s northern frontier adopted a flexible adjustment model combining the addition or removal of strongholds with functional transformation during different phases of frontier threats⁶. Overall, existing studies on Yuxian and its affiliated defense system mostly rely on such macro-institutional analysis and case-based historical document verification. Although a few studies have begun to incorporate GIS mapping and spatial analysis methods, the predominant research approach remains qualitative narrative, lacking systematic quantitative verification based on spatial modeling.

Against this backdrop, the advancement of geospatial information technology has offered a new spatial modeling perspective for the reconstruction of historical and cultural corridors^7,8. Among related methods, the Minimum Cumulative Resistance (MCR) model is often employed to derive cultural linear corridors on a comprehensive cost surface, revealing suitable zones between heritage sources and proposing corresponding conservation strategies^9,10. However, when the focus shifts to military sites themselves, quantitative evaluation poses greater complexity. This is because the spatial distribution and location preferences of ancient sites typically stem from the combined effects of physical geographical conditions and institutional decisions. Factors affecting their defensive capacity extend beyond the geographical environment to include political power structures, frontier governance systems, military expenditure provision, and troop deployment. While these elements are often scattered throughout ancient documents, they prove difficult to quantify on a unified scale^11,12. Therefore, considering that the operation of ancient defense systems relied heavily on the efficiency of information transmission and alert response, evaluating the defensive capacity of sites from a visual perspective has emerged as a relatively feasible and reasonable technical approach under multiple constraints^13,14,15. Nevertheless, existing studies have certain limitations in cost factor modeling. The weights of different cost factors mostly rely on empirical settings or linear models. For instance, Principal Component Analysis often leads to the loss of geographical semantics due to information compression, while the Analytic Hierarchy Process is prone to the influence of subjective judgements, making it difficult to characterize the nonlinear structures in the location logic of ancient defensive facilities. Meanwhile, quantitative research on defense corridors remains relatively scarce, with both the identification of spatial continuity and the interpretation of cost structures requiring further exploration.

In cases where cost structures and spatial continuity cannot be fully characterized by linear methods, machine learning offers a new approach to exploring complex nonlinear relationships between research variables. Compared with traditional linear regression or single-index superposition, methods such as ensemble learning can automatically identify key variables and their interactions in a high-dimensional factor space. They characterize threshold effects and combination effects through nonlinear decision tree structures, and spatial cross-validation helps alleviate the overfitting caused by spatial autocorrelation among samples. Meanwhile, interpretability analysis based on SHAP (SHapley Additive exPlanations) can mitigate the interpretation challenges arising from the “black-box effect” of machine learning. In recent years, related methods have been applied in fields such as archeological prediction models and ecological security patterns. On the one hand, models like Random Forest (RF) and Gradient Boosting Decision Trees have been used to evaluate the relative importance of environmental factors for known site samples¹⁶. On the other hand, ensemble learning models such as XGBoost have been combined with the MCR model to derive landscape resistance and identify ecological corridors based on ecological samples and environmental factors, forming a data-driven resistance surface and corridor system¹⁷.

Based on the above analysis, this study takes the military defense system of Yuxian in the Ming Dynasty as the research object and constructs an integrated defense corridor deduction framework combining interpretable machine learning, a cost function, and the MCR model. The framework starts from the nonlinear threshold preferences of military defense site selection, uses XGBoost/RF with spatial block cross-validation and SHAP main effects to construct a data-driven comprehensive defense cost surface, and then identifies a multi-tiered defense corridor network under the MCR model. By conducting spatial comparisons between the predicted corridors, the distribution of military defense sites, and the reconstructed post road network, this study further reveals the coupling relationships between military nodes, topographic patterns, and transportation networks, as well as their significance in defense corridor identification and frontier defense system reconstruction.

On this basis, this study makes three contributions. Methodologically, interpretable machine learning is employed to extract nonlinear threshold preferences from Ming-dynasty military defense site selection and translate them into a continuous defense cost function, replacing traditional empirically or subjectively weighted resistance surfaces for MCR-based corridor deduction. Empirically, using Yuzhou as a representative frontier prefecture in the Ming Dynasty, we reconstruct a multi-tiered defensive corridor system, quantify its coupling with military nodes, terrain patterns, and transportation networks, and fill a gap in spatial reconstruction and quantitative analysis of the local defense system. Application-wise, the framework is formalized into a parameterizable and reusable template that can support frontier-defense heritage identification and spatial planning in other regions.

Methods

Research area

Given that Yuxian served as a key guard city under Xuanfu, which was one of the major strongholds of the Great Wall’s “Nine Frontier Defense Garrisons”, and considering its unique topographic structure and strategic pattern, this study focuses primarily on the natural geographical characteristics of Yuxian and their relationship with the defense system. The “Nine Frontier Defense Garrisons” refers to the frontier military system gradually established during the Hongwu to Jiajing periods of the Ming Dynasty, marked by the implementation of the Zongbing Regional Garrison System¹⁸.

Yuxian in the Ming Dynasty was subordinate to Datong Prefecture (Datong Fu), governing three counties: Guangling, Lingqiu, and Guangchang. It is located in the northwestern part of Hebei Province, at the northwestern foot of the Taihang Mountains, and stands at the southernmost tip of Zhangjiakou. The geographical pattern of Ming-dynasty Yuzhou comprised deep mountains in the south, river valleys in the middle, and hills in the north, with mountainous terrain surrounding it on all sides, naturally forming numerous mountain passes and strategic passages. During the Ming Dynasty, in order to safeguard the capital and central plains, the central government constructed a defense system of inner and outer Great Walls west of Beijing. Yuxian occupied a critical buffer zone between these two walls and fell under the military jurisdiction of Xuanfu together with three neighboring towns, forming a mutually supportive strategic triangle. Key military garrisons such as Yuzhou guard city and Guangchang Defense Qianhu Suo (battalion) were established here, thus constructing the northern and southern defense lines of Yuzhou. Leveraging water systems, specifically the Huliu River, the county implemented an army farm strategy, making it a vital logistical supply base within the Great Wall defense system. This was also one of the reasons why Yuxian became a frequent target of raids by the northern pastoral nomads (Fig. 1).

Data sources

The data involved in this study can mainly be classified into the following categories:

Topographic Data: Topographic data were sourced from the ASTER GDEM v2 digital elevation model (30 m resolution) provided by the Geospatial Data Cloud platform¹⁹. After the DEM was clipped, filled, and smoothed, raster derivatives such as slope, terrain undulation, and Topographic Position Index (TPI) were generated. These metrics characterize natural conditions including mountainous resistance and gully-ridge structures, which constrained the layout of Yuxian’s peripheral zones and potential defensive corridors in the Ming Dynasty.

Historical Map Data: This study employed administrative division data from The Historical Atlas of China (Vol. 7), edited by Tan Qixiang²⁰, and the China Historical GIS (CHGIS) database from Harvard University²¹. The former provides a visual reference for the boundaries and military-administrative facilities of Yuxian around 1581, while the latter supplies the coordinates of administrative seats and their administrative affiliations during the Ming Dynasty.

The dataset of military sites is categorized into two subsets: research anchor data and model sample data. The research anchor data is derived from 65 heritage sites related to Yuxian as recorded in The Historical Atlas of China (THAC). Given the spatial positioning deviations inherent in historical maps, the precise coordinates of these sites were acquired through field surveys and UAV aerial photogrammetry. These coordinates were used for subsequent confidence assessment and study scope rectification, establishing a foundational point set for anchoring the research area. The model sample data was sourced from the China Great Wall Heritage Site²², which includes site locations of the Great Wall ramparts, watchtowers, horse-faces, and beacon towers. This dataset is utilized for subsequent defensive cost modeling and corridor inference.

Ancient transportation data were primarily drawn from this research team’s prior systematic compilation of Yuxian’s historical postal relay and patrol dispatch systems. During the Ming Dynasty, the postal relay system fell under the jurisdiction of the Ministry of Rites and the Office of Communications, undertaking transportation and communication functions such as document delivery and reception of envoys. As non-military defensive nodes, relay stations and reconstructed post road networks were thus regarded as transportation nodes and passage frameworks, serving to construct road accessibility factors. In contrast, the patrol dispatch system operated within the military garrison system and was primarily responsible for sentry duties and border outpost surveillance. Endowed with distinct military attributes for frontier containment and wartime support, patrol posts thus formed a military settlement system alongside guard cities, garrison towns, fortress settlements, and mountain passes. This system was employed to calculate defense core density, visual network indicators, and comprehensive defensive capability.

Historical Documents: Local gazetteers such as the Gazetteers of Yuxian County and the Gazetteers of Guangling County^23,24,25,26, along with their accompanying maps and other historical materials, were employed to verify the names, functions, and approximate locations of historical sites including guard cities, garrison towns, fortress settlements, mountain passes, and patrol posts. These findings were cross-referenced with the positions of surveyed archeological sites and historical base map data.

Research framework and technical route

The technical framework of this study comprises four interrelated stages (Fig. 2). First, preliminary spatial analysis and input construction are conducted. Based on heritage site anchors, the validity of the historical study scope is verified. Elements such as spatial patterns, viewsheds, and the ancient post road network are preprocessed and quantified into indicators to generate basic raster variables. Accordingly, characteristic scales and spatial constraints are determined to provide a basis for subsequent stratified sampling and spatial cross-validation.

Subsequently, defense costs are defined, and machine learning modeling is performed. Based on cost factor screening and collinearity analysis, the defensive suitability variable is explicitly defined. A stratified sampling strategy combined with machine learning algorithms is then employed to train predictive models.

Following this, SHAP interpretability analysis is introduced to quantify the relative contribution of each factor and identify their directionality and non-linear effects on site distribution.

Finally, the construction of MCR defensive corridors is completed. Based on SHAP main effect curves, piecewise functions are constructed to map and normalize cost factor rasters via non-linear transformation, generating a new comprehensive resistance surface. Potential defensive corridors are then extracted based on this surface for further spatial interpretation.

Preliminary spatial analysis and input construction

Rationality verification of military defense settlement sites and historical boundaries

Prior to constructing the defensive cost model, we conducted a systematic confidence assessment of the heritage anchors to validate the rationality of the candidate study area. Based on Ming-dynasty frontier documents and historical maps of Yuxian, we geolocated and matched 65 site points within the unified spatial reference framework of THAC, and constructed an evidence-weighting index from three aspects (positional accuracy, chronological verifiability, and source reliability)²⁷, as follows:

$${\mathrm{Conf}}_{i}=0.6\,{C}_{p,i}+0.3\,{C}_{t,i}+0.1\,{C}_{s,i} \tag{1}$$

In Eq. (1), ${\mathrm{Conf}}_{i}$ denotes the point-wise confidence; ${C}_{p,i}$ represents positional confidence; ${C}_{t,i}$ denotes chronological confidence; and ${C}_{s,i}$ indicates source reliability. Each component is assigned according to a unified scoring sheet using a four-level ordinal scheme, and the weights are determined through expert elicitation.

$${\mathrm{Conf}}_{\mathrm{overall}}=\frac{1}{N}\sum_{i=1}^{N}{\mathrm{Conf}}_{i} \tag{2}$$

In Eq. (2), $N$ is the number of site points included in the assessment (in this study, $N=65$). ${\mathrm{Conf}}_{\mathrm{overall}}$ is the arithmetic mean of the ${\mathrm{Conf}}_{i}$ values, which reflects the overall reliability of the military site dataset within the study area and supports the plausibility of the study-area delineation (Table S1).

The overall confidence of the dataset is approximately 0.68. Based on this, a confidence-weighted site kernel density was generated (Fig. 3a) to explicitly address sample uncertainty arising from incomplete historical records (Table S2).
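The confidence index of Eqs. (1) and (2) can be illustrated with a minimal sketch. The weights 0.6, 0.3, and 0.1 come from Eq. (1); the mapping of the four ordinal levels to numeric scores and the example sites are assumptions made only for illustration, since the actual scoring sheet is given in Table S1.

```python
# Weights from Eq. (1): position, chronology, source reliability.
WEIGHTS = (0.6, 0.3, 0.1)

# ASSUMPTION: numeric scores for the four ordinal levels (hypothetical;
# the scoring sheet actually used is the one in Table S1).
LEVEL_SCORES = {1: 0.25, 2: 0.50, 3: 0.75, 4: 1.00}


def site_confidence(pos_level, time_level, source_level):
    """Eq. (1): weighted sum of the three component scores for one site."""
    scores = (
        LEVEL_SCORES[pos_level],
        LEVEL_SCORES[time_level],
        LEVEL_SCORES[source_level],
    )
    return sum(w * s for w, s in zip(WEIGHTS, scores))


def overall_confidence(sites):
    """Eq. (2): arithmetic mean of the point-wise confidences."""
    values = [site_confidence(*levels) for levels in sites]
    return sum(values) / len(values)


# Hypothetical sites as (position, chronology, source) ordinal levels.
example_sites = [(4, 3, 2), (2, 2, 4), (3, 4, 3)]
conf_overall = overall_confidence(example_sites)
```

Because positional accuracy carries the largest weight, a site with a well-surveyed location but a weaker source still scores higher than the reverse case under this scheme.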

**Fig. 3: Rationality verification of heritage anchors and the study-area mask.**

Because THAC does not provide county-level boundaries for the Ming dynasty, we compared the Datong Prefecture boundaries across the time slices available in THAC. Relative to 1369, the 1371 prefectural boundary shows a pronounced southeastward expansion, and this spatial configuration remains broadly stable through the Wanli period. In light of historical records on Yuzhou’s administrative reassignment to Datong Prefecture in the early Hongwu reign²⁸, this increment corresponds closely to the documented adjustment in Yuzhou’s administrative affiliation. We therefore treat the incremental area of the 1371 prefectural boundary relative to 1369 as a proxy for the early-Ming administrative extent of Yuzhou, rather than as county-boundary ground truth.

We then assessed the plausibility of this proxy extent using three quantitative indicators (Table 1): terrain conformity, edge-settlement distance to the boundary, and boundary coverage. The proxy extent performs consistently well across these indicators. Moreover, when overlaid with the present-day administrative boundaries of Yuxian and its three neighboring counties (Guangling, Lingqiu, and Laiyuan), it shows strong overall agreement in macro-scale form (IoU ≈ 0.92; Fig. 3b). High-overlap segments between the proxy and modern boundaries are concentrated along high-elevation ridgelines and watershed divides (Fig. 3c). In addition, more than 90% of high-confidence defensive settlements fall within the intermontane basins and Great Wall–pass corridors enclosed by the proxy extent, suggesting that the study area is jointly shaped by persistent topographic structure and the defensive system. Accordingly, we adopt the 1371-derived proxy extent as the analytical study area, and within a 1-km buffer of this boundary we screen 278 defensive heritage sites as positive samples for subsequent modeling (Fig. 3d).
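The IoU agreement between the proxy extent and the modern boundary can be sketched on rasterized masks. This is a minimal illustration that assumes both extents have already been rasterized onto the same grid; the small masks below are hypothetical and do not reproduce the reported value.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over Union of two equally shaped binary masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    if union == 0:
        raise ValueError("both masks are empty")
    return intersection / union


# Hypothetical 3 x 4 rasters: 1 = cell inside the extent, 0 = outside.
proxy_extent = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]
modern_extent = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
iou = mask_iou(proxy_extent, modern_extent)  # 6 shared cells / 8 covered cells
```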

Table 1 Verification of rationality for the to-be-verified boundary

Spatial pattern analysis

Compared with the independent dataset of heritage anchors previously used to define the study scope, this study further selected 278 field-surveyed military sites within the study area for spatial pattern analysis (Table 2). First, kernel density estimation was applied to smooth the point sites into a continuous defensive intensity raster, yielding the spatial intensity distribution of military defense facilities in the study area (Fig. 3d). The formula is as follows:

$$\hat{f}(s)=\frac{1}{n{h}^{2}}\sum_{i=1}^{n}K\left(\frac{d(s,{s}_{i})}{h}\right) \tag{3}$$

In Eq. (3), $\hat{f}(s)$ denotes the estimated probability density at location $s$; $n$ is the number of sample points; $h$ is the bandwidth; $K$ is the kernel function; ${s}_{i}$ is the coordinate of the $i$-th sample point; and $d(s,{s}_{i})$ is the distance between $s$ and ${s}_{i}$. Second, the average nearest neighbor distance of the point pattern was calculated under Euclidean distance, using the site elements as input. Based on the Complete Spatial Randomness (CSR) assumption, the theoretical expected distance and the Nearest Neighbor Ratio ($R$) were derived, and the Z-score and p-value were used to test whether the pattern deviates significantly from a random distribution. The formula is as follows:

$$R=\frac{{\bar{d}}_{o}}{{\bar{d}}_{e}},\quad {\bar{d}}_{e}=\frac{1}{2}\sqrt{\frac{A}{n}} \tag{4}$$

In Eq. (4), ${\bar{d}}_{o}$ is the observed mean nearest-neighbor distance, ${\bar{d}}_{e}$ is the expected mean distance under complete spatial randomness (a Poisson process), $A$ is the area of the study region, and $n$ is the number of points. $R>1$ indicates dispersion and $R<1$ indicates clustering.
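Eqs. (3) and (4) can be sketched in a few lines of plain Python. Two choices here are assumptions rather than details given in the text: the kernel $K$ is taken to be the quartic (biweight) kernel, and the Z-score uses the standard error commonly applied to the average nearest neighbor statistic under CSR, $0.26136/\sqrt{n^{2}/A}$. The coordinates are hypothetical.

```python
import math


def quartic_kernel(u):
    """ASSUMED kernel K: 2-D quartic kernel, zero beyond the bandwidth."""
    return (3.0 / math.pi) * (1.0 - u * u) ** 2 if u < 1.0 else 0.0


def kernel_density(s, points, h):
    """Eq. (3): density estimate at location s with bandwidth h."""
    total = sum(
        quartic_kernel(math.hypot(s[0] - x, s[1] - y) / h) for x, y in points
    )
    return total / (len(points) * h * h)


def nearest_neighbor_ratio(points, area):
    """Eq. (4): returns (R, z) for a point pattern in a region of given area."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(
            min(
                math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points)
                if j != i
            )
        )
    d_obs = sum(nearest) / n
    d_exp = 0.5 * math.sqrt(area / n)
    # ASSUMPTION: standard error of the mean nearest-neighbor distance under CSR.
    std_err = 0.26136 / math.sqrt(n * n / area)
    return d_obs / d_exp, (d_obs - d_exp) / std_err


# Hypothetical patterns inside a 20 x 20 region (area = 400).
clustered = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dispersed = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
r_clustered, z_clustered = nearest_neighbor_ratio(clustered, 400.0)
r_dispersed, z_dispersed = nearest_neighbor_ratio(dispersed, 400.0)
```

In this toy case the tightly grouped points give $R<1$ with a negative Z-score, while the evenly spaced points give $R>1$, matching the interpretation stated for Eq. (4).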

Table 2 Results of spatial clustering tests for military settlements

Finally, the Moran’s I index of military settlements was calculated under inverse distance weighting and Euclidean distance adjacency to test their spatial autocorrelation.

The formula is as follows:

$$I=\frac{n}{W}\cdot\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}{w}_{ij}({x}_{i}-\bar{x})({x}_{j}-\bar{x})}{\sum_{i=1}^{n}{({x}_{i}-\bar{x})}^{2}} \tag{5}$$

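In Eq. (5), $n$ denotes the number of military defense settlement sites; ${x}_{i}$ is the kernel density (or settlement density) value associated with site $i$; $\bar{x}$ is the mean of these values; ${w}_{ij}$ is the spatial weight between sites $i$ and $j$; and $W$ is the sum of all ${w}_{ij}$. $I>0$ indicates clustered military settlements, and $I<0$ indicates dispersed settlements.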
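A minimal sketch of Eq. (5) with inverse distance weights follows. It assumes $w_{ij}=1/d_{ij}$ with $w_{ii}=0$ and distinct site coordinates; the exact weighting options used in the study are not specified beyond inverse distance, and the sites and values are hypothetical.

```python
import math


def morans_i(coords, values):
    """Global Moran's I (Eq. 5) with ASSUMED weights w_ij = 1 / d_ij, w_ii = 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weighted_cross = 0.0  # numerator: sum of w_ij * dev_i * dev_j
    weight_sum = 0.0      # W: sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(
                coords[i][0] - coords[j][0], coords[i][1] - coords[j][1]
            )
            w = 1.0 / dist  # requires distinct coordinates
            weight_sum += w
            weighted_cross += w * dev[i] * dev[j]
    variance_term = sum(d * d for d in dev)
    return (n / weight_sum) * (weighted_cross / variance_term)


# Hypothetical sites: two pairs of close neighbors far apart from each other.
sites = [(0.0, 0.0), (1.0, 0.0), (100.0, 0.0), (101.0, 0.0)]
i_similar_neighbors = morans_i(sites, [1.0, 1.0, 5.0, 5.0])
i_unlike_neighbors = morans_i(sites, [1.0, 5.0, 1.0, 5.0])
```

When close neighbors share similar density values the index is positive, and when they differ it is negative, which is the sign convention described for Eq. (5).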

In subsequent model construction, these high-density areas were used as candidate regions for potential defense corridors, serving to construct settlement density costs and assist in determining the stratified sampling scope.

Construction of visual network and extraction of visual distance control factors

The military defense system of the Ming Dynasty exhibited distinct linear visual control characteristics. Particularly along the Great Wall and border defense lines, a beacon tower system and military settlements formed an information transmission network relying on visual distance connectivity, aiming to enable timely defensive responses to early warning messages. To quantitatively evaluate the spatial control capability of Yuxian in the Ming Dynasty, this study constructed a regional viewshed network using cumulative viewshed analysis and linear intervisibility analysis. Observation and target height offsets were set according to the different structural heights of military settlement facilities: a uniform 1-m eye-level height was added as the observation offset for simulating observers, while an 8-m smoke height was added as the target offset for simulating observed objects¹³. For any observed beacon tower site, the offset was the sum of its own facility height and the smoke height (Table 3). Some sites along the Great Wall showed low visibility frequencies in the cumulative viewshed, although they were physically connected by walls and therefore visually connected in practice. Wall connections were therefore introduced as an approximation to help judge the viewshed connectivity of such facilities.
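The role of the observer and target offsets can be illustrated with a simple line-of-sight test along a terrain profile. This is a sketch under stated assumptions: the profile is sampled at equal spacing, earth curvature and atmospheric refraction are ignored, and the elevations and the 5-m tower height are hypothetical (the facility heights actually used are those in Table 3).

```python
EYE_OFFSET_M = 1.0    # observer eye-level offset stated in the text
SMOKE_OFFSET_M = 8.0  # smoke height added to the target, as stated in the text


def is_intervisible(profile, observer_offset=EYE_OFFSET_M,
                    target_offset=SMOKE_OFFSET_M):
    """Return True if no intermediate terrain sample rises above the sight line.

    profile: ground elevations (m), equally spaced from observer to target.
    ASSUMPTION: flat-earth geometry, no curvature or refraction correction.
    """
    if len(profile) < 2:
        raise ValueError("profile needs at least an observer and a target")
    z_observer = profile[0] + observer_offset
    z_target = profile[-1] + target_offset
    last = len(profile) - 1
    for k in range(1, last):
        sight_z = z_observer + (z_target - z_observer) * k / last
        if profile[k] > sight_z:
            return False
    return True


# Hypothetical profile: a 105 m ridge between two sites at 100 m.
ridge_profile = [100.0, 105.0, 100.0]
smoke_only = is_intervisible(ridge_profile)
# Beacon tower target: hypothetical 5 m facility height plus the smoke height.
with_tower = is_intervisible(ridge_profile, target_offset=5.0 + SMOKE_OFFSET_M)
```

In this toy profile the ridge blocks the smoke-only target but not the target raised by a tower, which shows why the facility height is added to the smoke height for beacon tower sites.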

Table 3 Settings of offset values for different types of sites

Complex Network Analysis (CNA) identifies the importance of military settlements in the visual network by indexing the intervisibility relationships between nodes. This study selected indicators including node degree, betweenness centrality, and closeness centrality^29,30,31 (Table 4), to evaluate three key aspects: the settlements’ visual distance connectivity capability within local areas, their significance as relay hubs in the overall network, and their average path length to all other visible nodes, reflecting their control efficiency and information response speed in the global structure. Meanwhile, network density and visual inclusiveness were used to assess the integrity and node coverage rate of the global visual network (Fig. 4). Indicators such as Intvis_den and Betweenness, constructed based on cumulative viewshed and intervisibility network, collectively show (Fig. 4 and Table 5) that the visual network possesses certain connectivity and node heterogeneity. This provides a structural foundation for subsequent extraction of defensive cost factors including visual line density and node centrality (Fig. 5).
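The node-level and global indicators named above can be sketched for a small undirected intervisibility graph. The definitions used here are common textbook ones (unnormalized betweenness via Brandes' algorithm, closeness as reachable nodes divided by summed path lengths) and may differ in normalization from those in Table 4; the four-node graph is hypothetical.

```python
from collections import deque


def degree(adj):
    """Number of directly intervisible neighbors per node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def network_density(adj):
    """Share of possible undirected links that actually exist."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * edges / (n * (n - 1))


def closeness(adj):
    """Reachable nodes divided by the sum of shortest-path lengths to them."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


# Hypothetical chain of intervisible sites: A - B - C - D.
visibility = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
node_betweenness = betweenness(visibility)
node_closeness = closeness(visibility)
```

In the chain, the two interior nodes relay every signal between the ends, so they carry all of the betweenness while the end nodes carry none.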

**Fig. 4: Viewshed network of military defense in Ming-dynasty Yuxian.**

**Fig. 5: Node centrality of military settlements in the visibility network.**

Table 4 Indicators of Complex Network Analysis in the visual network

Table 5 Overall statistics of indicators for the viewshed network (n = 146)

Reconstruction of ancient post road network and construction of transportation cost factors

The transportation corridors of the Ming Dynasty played a crucial role in troop deployment and material transportation within the military defense system, with their passability and accessibility regarded as core constraints for road selection. First, this study identified transportation nodes such as post stations and mountain passes based on records of the postal relay and patrol dispatch systems in ancient documents. Combined with the county territory maps attached to local gazetteers (Fig. 6), the study spatially registered, one by one, each map symbol of the postal relay system, the patrol dispatch system, and the military settlements. Furthermore, according to the node accessibility relationships indicated by the post route lines on the maps, preliminary connection paths were drawn between adjacent transportation nodes, serving as a prior guide for directions and corridors. Finally, drawing on the research methods for post road reconstruction in Datong Prefecture proposed by Yingchun Cao et al.³², a transportation cost surface was constructed with core factors including slope, terrain undulation, and river crossing cost. On this basis, the Least Cost Path (LCP) method was used to connect the nodes and generate the ancient post road network within Yuxian. The reconstructed ancient post road network was used not only to construct the cost factor of distance to the ancient post road, but also to provide a comparative benchmark for judging the coupling relationship between defense corridors and the existing transportation network.
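The Least Cost Path step can be sketched with Dijkstra's algorithm on a raster cost surface. This is an illustration only: it assumes 4-neighbor moves and a step cost equal to the mean of the two cell costs, and the cost grid is hypothetical rather than derived from slope, terrain undulation, and river crossing cost as in the study.

```python
import heapq


def least_cost_path(cost, start, goal):
    """Dijkstra search over a cost raster; returns (path, accumulated cost).

    ASSUMPTIONS: 4-neighbor moves; step cost = mean cost of the two cells.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        acc, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if acc > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_acc = acc + (cost[r][c] + cost[nr][nc]) / 2.0
                if new_acc < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_acc
                    parent[(nr, nc)] = cell
                    heapq.heappush(heap, (new_acc, (nr, nc)))
    if goal not in best:
        raise ValueError("goal is not reachable")
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path, best[goal]


# Hypothetical cost surface: a high-cost ridge (9) separates two nodes.
cost_surface = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
route, route_cost = least_cost_path(cost_surface, (0, 0), (0, 2))
```

The route detours through the low-cost gap in the bottom row instead of crossing the ridge directly, which is the behavior the LCP method relies on when linking post stations through passes.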

**Fig. 6: Reconstruction of ancient post roads in Ming-dynasty Yuxian.**

Construction of the defensive cost model

The defensive cost model constructed in this study is essentially a predictive framework based on environmental features, designed to quantify the locational preferences of Ming Dynasty military facilities using machine learning algorithms. The model designates spatial “defensive suitability” as the target variable. Based on the foundational raster variables obtained from the preceding analysis and multi-source data, a system of three cost factor categories was constructed. Subsequently, a stratified sampling strategy was implemented to generate a negative sample set equal in size to the positive samples. Logistic Regression (LR), Random Forest (RF), and XGBoost models were then employed to predict suitability probabilities. Because the selected ensemble tree models are intrinsically robust to feature scaling, data normalization was not applied globally during the machine learning phase. Instead, standardization was applied exclusively to the input features for the LR model. Global feature normalization was deferred to the subsequent stage of constructing the SHAP-based resistance surface.
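One way to draw a negative sample set equal in size to the positive samples is sketched here. The details are assumptions, not the study's procedure: candidate background cells are excluded within a hypothetical minimum distance of any positive site, and the draw alternates across hypothetical strata so that each stratum is represented.

```python
import math
import random


def sample_negatives(positives, candidates, strata, min_dist, seed=0):
    """Draw as many negatives as positives, alternating across strata.

    ASSUMPTIONS: `strata` holds one label per candidate cell, and cells
    closer than `min_dist` to any positive site are excluded.
    """
    rng = random.Random(seed)
    eligible = {}
    for point, label in zip(candidates, strata):
        far_enough = all(
            math.hypot(point[0] - px, point[1] - py) >= min_dist
            for px, py in positives
        )
        if far_enough:
            eligible.setdefault(label, []).append(point)
    for points in eligible.values():
        rng.shuffle(points)
    labels = sorted(eligible)
    negatives = []
    while len(negatives) < len(positives):
        drawn = False
        for label in labels:
            if eligible[label] and len(negatives) < len(positives):
                negatives.append(eligible[label].pop())
                drawn = True
        if not drawn:
            raise ValueError("not enough eligible background cells")
    return negatives


# Hypothetical 10 x 10 grid of background cells split into two strata.
positives = [(0.0, 0.0), (9.0, 9.0)]
candidates = [(float(x), float(y)) for x in range(10) for y in range(10)]
strata = ["low" if x < 5 else "high" for x, _ in candidates]
negatives = sample_negatives(positives, candidates, strata, min_dist=2.0)
```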

Selection of cost factors

During the Ming Dynasty, the beacon tower system in Yuxian was interconnected relying on terrain, forming a hierarchically distinct early warning network. Meanwhile, the transportation system connected various nodes in series through post roads, constituting the basic traffic framework. Correspondingly, for military key points such as guard cities, garrison towns, and mountain passes that undertook strategic maneuver and defensive deployment, the construction of the defense corridors behind them also followed specific spatial preferences and selection logic.

The quantification of defensive capabilities of military defense sites and landscape nodes helps explain their relative importance¹³, as well as the spatial logic underlying the formation of defense corridors. Existing archeological site prediction models³³ mainly analyze the relationship between geographical factors and the spatial distribution of sites to predict the probability of sites appearing in specific spatial areas. In such models, topographic and environmental characteristics are usually regarded as potential cost factors closely related to defensiveness^34,35. Based on this, settlement density, cumulative viewshed, visual line density, and node centrality were selected as defensive cost factors to reflect the defensive performance of sites in terms of information transmission, viewshed control, and spatial connectivity. Meanwhile, indices such as elevation, slope, TPI, TRI, and surface relief were chosen as geographical cost factors, and distance to post roads, distance to the Great Wall, distance to defensive nodes, and distance to water sources as distance cost factors. Together, these constitute the candidate cost system for the subsequent defensiveness model (Table 6).

Table 6 Preliminary selected cost factors and relevant information

Screening of correlation and multicollinearity among cost factors

Prior to conducting defensive cost modeling based on machine learning, to ensure the consistency of various raster factors in terms of spatial location and resolution, unified spatial preprocessing was performed on all raster data: All data were projected to the WGS_1984_UTM_Zone_50N coordinate system, with the 30 m resolution DEM used as the Snap Raster. Bilinear interpolation was adopted for resampling continuous variables, while the Inverse Distance Weighting method was used to supplement a small number of missing values in node centrality indicators and generate continuous raster surfaces. For distance cost factors, Euclidean distance rasters were used as continuous variable inputs to avoid subjectivity caused by classification thresholds and loss of rank information. The above-mentioned processing provided a consistent-scale raster data foundation for subsequent correlation analysis and cost factor screening.

On this basis, it is necessary to test and control the correlation and multicollinearity among the cost factors. Existing studies have shown that high correlation among environmental factors increases the risk of model overfitting and weakens the physical meaning of variable importance, making the model more likely to fit noise rather than true patterns^36,37. To reduce potential multicollinearity among candidate cost factors, this study first used Spearman’s rank correlation coefficient to analyze the monotonic correlation between variables and flagged highly correlated factors for preliminary elimination. Subsequently, the Variance Inflation Factor (VIF) was calculated to identify sources of redundancy and perform iterative elimination, so that the multicollinearity among the retained variables was controlled within an acceptable range (VIF < 5)³⁸. This ensures the stability and interpretability of the modeling process.
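The two screening statistics can be illustrated with a minimal sketch on synthetic data. It is not the authors' script: the column names, the synthetic values, and the rule of always dropping the factor with the largest VIF are assumptions made for illustration (the study itself chose which redundant factor to drop partly on physical meaning).

```python
import numpy as np

def spearman_matrix(X):
    # Spearman's rho equals the Pearson correlation of column-wise ranks.
    # argsort-of-argsort gives ordinal ranks (ties are not averaged).
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def vif(X):
    # VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j
    # on all remaining columns plus an intercept.
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - np.var(y - A @ coef) / np.var(y)
        out[j] = 1.0 / (1.0 - r2)
    return out

def drop_until_vif_ok(X, names, vif_max=5.0):
    # Assumed rule: repeatedly drop the factor with the largest VIF.
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < vif_max:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names

# Synthetic stand-ins: "TRI" is built to be nearly redundant with "Slope".
rng = np.random.default_rng(42)
slope = rng.normal(size=500)
tri = slope + 0.1 * rng.normal(size=500)
dem = rng.normal(size=500)
X = np.column_stack([dem, slope, tri])

rho = spearman_matrix(X)
kept = drop_until_vif_ok(X, ["DEM", "Slope", "TRI"])
```

In this toy case one of the two redundant slope-type columns is removed and the independent column is retained, which mirrors the logic of the screening step rather than its exact outcome.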

Spearman’s rank correlation analysis (Fig. 7) shows that the absolute values of the correlation coefficients between most cost factors are less than 0.5, indicating a relatively low overall correlation level. An extremely strong positive correlation is observed only between Slope and TRI (ρ = 0.94). In addition, moderate positive correlations exist between the kernel density of military settlements (KD_military) and node betweenness centrality (Betweenness), between KD_military and visual line density (Intvis_den), and between Betweenness and Intvis_den (|ρ| = 0.48–0.62). However, none of these moderate correlations exceeds the empirical threshold of 0.8³⁹. The calculated Variance Inflation Factor (VIF) values are presented in Table 7. The VIF values of DEM, Slope, and TRI are 12.51, 15.96, and 14.68 respectively, all above 10, indicating severe linear overlap among the three factors. The VIF values of all other variables range from 1.72 to 4.84, reflecting a generally low level of multicollinearity. Combined with the correlation coefficient matrix, it can be concluded that the excessively high VIF values are mainly caused by the redundancy of slope-related indicators.

**Fig. 7: Results of cost factor screening.**

Table 7 VIF values of each cost factor before and after TRI elimination

Following the principle adopted in previous studies of eliminating variables with |ρ| > 0.8, and considering the physical meaning and interpretability of the indicators, this study retained Slope (representing the first-order slope gradient) and Rugged (indicating terrain ruggedness), while eliminating TRI, which is highly redundant and most strongly correlated with Slope. A repeated correlation check then showed that the absolute correlation coefficients among the remaining 12 cost factors were all below 0.6. Regarding the VIF results, all variables had VIF values below 5 except for DEM (12.51). Considering that DEM has an independent and critical physical significance in geomorphological processes, and that the main ensemble learning models used subsequently (RF and XGBoost) are insensitive to linear multicollinearity between variables, DEM was retained. On the other hand, since most military defense facilities were built adjacent to the Great Wall, including “distance to the Great Wall” in the model would overstate its explanatory power and obscure the true contributions of other cost factors. Thus, this variable was eliminated. Finally, a total of 11 cost factors were selected for modeling.

Stratified sampling strategy

This study used 278 verified, field-measured site points, screened against previously validated boundaries, as positive samples, and adopted a lightweight stratified sampling strategy constrained by the Great Wall survey lines. First, the Great Wall section from Langya Pass to Wulong Castle, together with the military settlements along it (such as mountain passes and forts), was selected to construct a linear constraint zone of defensive settlements. This zone is depicted as an important defensive belt in the southern part of the county in various ancient maps and documents^40,41,42. Subsequently, Near analysis was used to calculate the shortest distance from each site point to the Great Wall survey lines, and the distribution of sites across distance gradients was tabulated. On this basis, five distance belts (2 km, 5 km, 10 km, 20 km, and 40 km) were set. Within each distance belt, background points were randomly selected as negative samples at a 1:1 ratio relative to the number of positive samples (Table 8). To avoid spatial overlap between the two classes, negative samples were required to lie outside a 300 m circular buffer around every site point. This stratified sampling strategy ensures that positive and negative samples share comparable environmental conditions within each distance belt, thereby improving the model’s ability to distinguish subtle differences in the site selection preferences of historical settlements.
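The belt-wise 1:1 sampling with a 300 m exclusion radius can be sketched as a simple rejection sampler. This is an illustrative reading of the procedure, not the authors' workflow: the rectangular extent, the straight "wall" along y = 0, and the 50 synthetic positives are hypothetical, and the belts are interpreted as the intervals 0–2, 2–5, 5–10, 10–20, and 20–40 km.

```python
import math
import random

BELTS_KM = [2, 5, 10, 20, 40]   # upper bounds of the distance belts
MIN_GAP_M = 300.0               # exclusion radius around known sites

def belt_index(dist_m):
    # Index of the first belt whose upper bound contains the distance.
    for i, upper in enumerate(BELTS_KM):
        if dist_m <= upper * 1000.0:
            return i
    return None  # beyond the outermost belt

def sample_negatives(positives, dist_to_wall, extent, seed=42):
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = extent
    # Quota per belt = number of positive sites in that belt (1:1 ratio).
    quota = [0] * len(BELTS_KM)
    for p in positives:
        b = belt_index(dist_to_wall(p))
        if b is not None:
            quota[b] += 1
    negatives, filled, attempts = [], [0] * len(BELTS_KM), 0
    while filled != quota and attempts < 1_000_000:
        attempts += 1
        c = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        b = belt_index(dist_to_wall(c))
        if b is None or filled[b] >= quota[b]:
            continue
        # Reject candidates closer than 300 m to any known site.
        if any(math.dist(c, p) < MIN_GAP_M for p in positives):
            continue
        negatives.append(c)
        filled[b] += 1
    return negatives, quota

# Hypothetical demo: wall along y = 0, coordinates in metres.
demo_rng = random.Random(0)
extent = (0.0, 0.0, 50_000.0, 40_000.0)
positives = [(demo_rng.uniform(0, 50_000), demo_rng.uniform(0, 40_000))
             for _ in range(50)]
wall_distance = lambda p: abs(p[1])
negatives, quota = sample_negatives(positives, wall_distance, extent)
```

In a real workflow `dist_to_wall` would be replaced by the distance to the surveyed Great Wall polyline, and candidates would be drawn only inside the study-area boundary.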

Table 8 Number of generated samples in each distance belt

Machine learning modeling

The spatial pattern analysis in the previous section (Table 2) indicated that military defense sites exhibit a significantly clustered distribution (Mean Nearest Neighbor Ratio R = 0.42, z = –18.53, p < 0.001), and that the defensive intensity field shows strong positive spatial autocorrelation (Moran’s I = 0.967, z = 15.96, p < 0.001). If traditional random cross-validation were adopted, the training and test sets would be highly interdependent at the 1–3 km scale, so the generalization error would be underestimated. To mitigate the impact of spatial autocorrelation, this study employed spatial block cross-validation: regular grids with a side length of approximately 7 km were generated within the study area, dividing it into 154 spatial blocks. These blocks were further partitioned into 5 folds, with each fold containing a set of spatially relatively continuous grids and approximately 20% of the sample points. During model training, 4 folds were used in turn as the training set and the remaining fold as the test set. This ensures that the typical spatial interval between the test set and the training set is greater than approximately 3 km, which exceeds the main distance scale of the aforementioned spatial autocorrelation effect. To quantify the optimistic bias of the traditional approach, a control group with 5-fold random cross-validation (CV) was also established (Fig. 8). To reduce fluctuations caused by random partitioning and improve the reproducibility of results, a fixed random seed (42) was used when constructing the cross-validation folds and training models that involve random processes (such as RF and XGBoost).
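A minimal sketch of block-based fold assignment follows. The way blocks are grouped into folds here (column-major order, cut by cumulative sample count) is an assumption chosen to keep folds roughly contiguous and balanced; the source does not specify the grouping rule. The 98 km × 77 km extent is hypothetical and was picked only because it yields 14 × 11 = 154 blocks of 7 km.

```python
import random
from collections import Counter

BLOCK_M = 7000.0   # block side length in metres
N_FOLDS = 5

def block_id(x, y):
    # Grid cell (column, row) that contains the point.
    return (int(x // BLOCK_M), int(y // BLOCK_M))

def spatial_folds(points):
    counts = Counter(block_id(x, y) for x, y in points)
    ordered = sorted(counts)          # column-major order keeps neighbours together
    target = len(points) / N_FOLDS    # about 20% of the samples per fold
    fold_of_block, fold, running = {}, 0, 0
    for b in ordered:
        # Move to the next fold once the cumulative quota is reached.
        if running >= target * (fold + 1) and fold < N_FOLDS - 1:
            fold += 1
        fold_of_block[b] = fold
        running += counts[b]
    return [fold_of_block[block_id(x, y)] for x, y in points]

# Hypothetical demo with 556 random sample locations.
rng = random.Random(42)
points = [(rng.uniform(0, 98_000), rng.uniform(0, 77_000)) for _ in range(556)]
folds = spatial_folds(points)
fold_sizes = Counter(folds)
```

Because every point in a block shares the same fold, no block is split between training and test data, which is the property that distinguishes this scheme from random 5-fold partitioning.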

**Fig. 8: Schematic diagram of fold in spatial cross-validation.**

Within the framework of spatial block cross-validation, three models (LR, RF, and XGBoost) were selected to estimate the influence weights of each cost factor. The LR model explains the direction and linear weight of each cost variable and serves as a linear baseline, against which the RF and XGBoost models further reveal the nonlinear influence preferences of each cost variable and their importance ranking. Based on the 556 sample points obtained from the aforementioned stratified sampling strategy, a binary response variable was first constructed. The dependent variable was defined as ‘defensive locational suitability’, with a value of 1 assigned to positive samples and 0 to negative samples, yielding a label variable y taking values in {0,1}.

Furthermore, a grid search was conducted to optimize the hyperparameter combinations for each model. Specifically, for the LR model, input features were pre-standardized so that the solver converges reliably and the regularization penalty acts evenly across features. The model was configured with L2 regularization (penalty = ‘l2’) and the ‘liblinear’ solver. The optimal value of the regularization parameter C was determined via cross-validation within the set {0.01, 0.1, 1, 10} to prevent model overfitting. The LR model formula is as follows:

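$$\operatorname{logit}(p_{i})=\ln \frac{p_{i}}{1-p_{i}}=\beta_{0}+\sum_{j=1}^{p}\beta_{j}x_{ij},\qquad p_{i}=P(y_{i}=1\mid x_{i})$$

(6)

In Eq. (6), $p_{i}$ represents the predicted probability of site/settlement occurrence for sample $i$, given the vector of cost factors $x_{i}$; $\beta_{0}$ is the intercept, $\beta_{j}$ is the coefficient of the $j$-th cost factor, and $x_{ij}$ is the value of that factor for sample $i$. The upper limit $p$ of the sum denotes the number of cost factors and is distinct from the probability $p_{i}$.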
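As a small numerical illustration of Eq. (6), the sketch below standardizes a feature column and maps a linear predictor to a probability. The coefficient values and feature values are hypothetical and are not fitted results from the study.

```python
import math

def standardize(column):
    # z-score standardization, as applied to LR inputs only.
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]

def predict_proba(beta0, beta, x):
    # Inverse of the logit link: p = 1 / (1 + exp(-(beta0 + sum_j beta_j * x_j))).
    z = beta0 + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and (already standardized) factor values.
slope_z = standardize([2.0, 4.0, 6.0])
p = predict_proba(-0.5, [0.8, -0.3], [1.0, 2.0])   # linear predictor = -0.3
```

Applying `logit` to `p` recovers the linear predictor, which is why the sign of each coefficient can be read directly as the direction of a factor's effect on suitability.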

For the RF model, the number of decision trees was set to the range {100, 200, 300} to control the scale of the forest; the maximum depth was set to {5, 7, None}, where “None” indicates no restriction on tree depth, serving to adjust the complexity of individual trees; the minimum number of samples per leaf node was set to {1, 3} to limit the minimum sample size contained in leaf nodes, thereby achieving regularization and smoothing of predictions. A subset of features was randomly selected at each node for splitting to enhance the diversity among individual trees and reduce the risk of overfitting. For the XGBoost model, the learning rate (learning_rate) was set to {0.01, 0.05, 0.1} to control the update step size of each tree on the overall model; the max_depth was set to {3, 5, 7}, with shallow trees helping avoid over-splitting when the sample size is limited, while deep trees are used to characterize complex nonlinear relationships; the number of trees (n_estimators) was set to {100, 200, 300}, corresponding to a medium-scale ensemble of trees; the subsample ratio (subsample) was set to {0.8, 1.0}, which not only ensures sample utilization efficiency but also introduces a certain degree of regularization through random subsampling of samples. Other regularization-related hyperparameters, such as the column sampling ratio (colsample_bytree), the minimum sum of instance weights needed in a child (min_child_weight), and the L2 regularization term coefficient (reg_lambda), adopted the default settings of XGBoost to avoid excessive expansion of the parameter space. Finally, AUC (Area Under the ROC Curve), PR-AUC (Area Under the Precision-Recall Curve), and Brier score were used to comprehensively evaluate model performance. 
AUC measures the overall ability to distinguish between positive and negative samples; PR-AUC focuses more on the recognition performance for the positive class (sites); the Brier score characterizes the consistency between predicted probabilities and actual observations. In general, AUC and PR-AUC values above 0.8, and especially those approaching or exceeding 0.9, are considered to indicate good discrimination performance, while a Brier score well below 0.25 (the value obtained by always predicting a probability of 0.5) indicates accurate probability predictions. The Brier score formula is as follows:

$$\mathrm{BS}=\frac{1}{n}\sum_{i=1}^{n}\left(p_{i}-y_{i}\right)^{2},\quad y_{i}\in \{0,1\}.$$

(7)

In Eq. (7), $p_{i}$ is the predicted probability of site occurrence output by the model, and $y_{i}$ is the observed label (0/1).
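Eq. (7) can be checked with a few lines of code. The sketch below also shows where the 0.25 reference value comes from: a constant prediction of 0.5 gives a squared error of 0.25 for every sample. The input values are illustrative only.

```python
def brier_score(probs, labels):
    # Mean squared difference between predicted probability and 0/1 label.
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal in length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

uninformative = brier_score([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1])   # 0.25
perfect = brier_score([1.0, 0.0], [1, 0])                         # 0.0
confident = brier_score([0.9, 0.2], [1, 0])                       # (0.01 + 0.04) / 2
```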

SHAP interpretability analysis

To further explain the site selection patterns of military defense settlements learned by the machine learning model and quantify the relative contributions of each cost factor to the probability of site occurrence, SHAP was introduced for interpretability analysis based on the selected optimal model⁴³. SHAP is based on the game-theory-derived Shapley value principle⁴⁴, which regards the model output as the result of collaborative contributions from various features. By comparing the predictive changes of the model before and after adding or removing a specific feature, the direction and magnitude of the marginal contribution of that feature to the prediction of individual samples are calculated. For each sample-feature combination, a positive SHAP value indicates that the corresponding cost factor will increase the predicted probability of a defensive settlement appearing at that location, while a negative value indicates an inhibitory effect. The absolute value reflects the intensity of the influence. The formula is as follows:

$${{\rm{\phi }}}_{i}(f,{\bf{x}})={\sum }_{S\subseteq N\setminus \{i\}}\frac{{|S|}!({|N|}-{|S|}-1)!}{{|N|}!}\left[f({{\bf{x}}}_{S\cup \{i\}})-f({{\bf{x}}}_{S})\right]$$

(8)

In Eq. (8), $N$ denotes the entire set of features, $S$ represents a feature subset excluding feature $i$, and $f({\bf{x}}_{S})$ denotes the model output when only the feature subset $S$ is used.
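For a handful of features, Eq. (8) can be evaluated exactly by enumerating all subsets. The sketch below does this for a toy model. One assumption is made explicit: features outside $S$ are replaced by a fixed baseline value, which is a common simplification and not necessarily how the SHAP implementation used in the study treats missing features. The toy function `f` is hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    n = len(x)

    def value(subset):
        # Features in `subset` keep their observed value; the rest take the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|N| - |S| - 1)! / |N|! from Eq. (8).
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                total += w * (value(set(S) | {i}) - value(set(S)))
        phi.append(total)
    return phi

# Toy model: one additive term plus one interaction term.
f = lambda z: 2.0 * z[0] + z[1] * z[2]
phi = shapley_values(f, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

Here the additive feature receives its full effect (2.0), while the interaction term is split equally between the two interacting features (0.5 each); the three values sum to the difference between the prediction and the baseline output. Exact enumeration scales as 2^n, which is why tree-specific algorithms are used in practice.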

On the global scale, averaging the absolute SHAP values of each cost factor over all samples yields a feature importance ranking that does not depend on the internal structure of the model. This ranking is used to measure the relative weights of each cost factor in explaining the spatial distribution of defensive settlements, complementing the linear weights derived from LR coefficients. On the local scale, SHAP scatter plots and dependence plots are employed to examine the nonlinear responses of predicted probabilities to changes in the values of individual factors, thereby identifying key threshold intervals. Examples include “the optimal distance range from the Great Wall for fortress construction” and “the threshold intervals for slope and viewshed conditions”. This method can convert the implicit site selection preferences embedded in black-box models into interpretable spatial rules, providing a basis for subsequent defensive suitability mapping and defense corridor extraction⁴⁵.
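Global SHAP importance is conventionally computed as the mean absolute SHAP value per feature and then expressed as a share of the total. Assuming that convention applies here, a minimal sketch looks as follows; the two feature names are taken from the text, but the SHAP values are made up for illustration.

```python
def global_importance(shap_rows, names):
    # shap_rows: one list of per-feature SHAP values per sample.
    n = len(shap_rows)
    mean_abs = [sum(abs(row[j]) for row in shap_rows) / n
                for j in range(len(names))]
    total = sum(mean_abs)
    shares = {name: m / total for name, m in zip(names, mean_abs)}
    # Sort from most to least important.
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))

# Illustrative values: signed means would cancel to zero for both features.
rows = [[0.3, -0.1], [-0.3, 0.1]]
shares = global_importance(rows, ["Intvis_den", "Slope"])
```

The example shows why absolute values are needed: positive and negative contributions of a strongly influential factor can cancel in a signed average even though the factor matters for every sample.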

Defense corridor modeling based on MCR

To quantify the preference effect of defensive cost factors on potential defense corridors, this study constructed non-linear resistance functions based on SHAP main effect curves. Through continuous reclassification, the suitability contribution of each factor was transformed into resistance values. Subsequently, a comprehensive resistance surface was generated via weighted summation, and the MCR model was employed to identify the defense corridors. First, spline smoothing was applied to SHAP main effect curves and pairwise relationship plots to obtain continuous main effect functions ${g}_{k}({z}_{k})$. For factors showing an overall monotonic trend within their value ranges, the sign and slope of ${g}_{k}$ were directly used as the basis for the direction and magnitude of cost changes with factors. For factors with a unimodal or U-shaped trend, the curves were divided into several locally monotonic intervals at the global extreme values. Within each interval, piecewise linear approximation was performed using endpoint connection lines, thereby obtaining the continuous approximate main effect ${\widetilde{C}}_{k}({z}_{k})$. Subsequently, one-dimensional linear interpolation was conducted on the factor values of each pixel, followed by linear normalization within the range of the 5th to 95th percentiles. The results were then truncated to the [0,1] interval to generate standardized single-factor resistance rasters. These rasters were combined into the comprehensive resistance surface as follows:

$${C}_{\mathrm{total}}(x)=\sum_{k=1}^{K}{w}_{k}{C}_{k}(x)$$

(9)

In Eq. (9), $x$ denotes a raster cell, ${C}_{k}(x)$ represents the normalized single-factor cost raster, ${w}_{k}$ is the corresponding weight, and $K$ is the number of cost factors involved in the superposition. Equal weights are adopted for all cost factors, i.e., ${w}_{k}=1/K$. SHAP values are not directly used as the source of weights.
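The per-factor normalization and the equal-weight sum of Eq. (9) can be sketched with NumPy. Two points are assumptions rather than statements from the source: the mapping from suitability to resistance is taken to be a simple inversion (high suitability contribution gives low resistance), and the piecewise-linear curve `curve_z`/`curve_g` as well as the random raster are hypothetical.

```python
import numpy as np

def factor_resistance(values, curve_z, curve_g):
    # 1-D linear interpolation of the piecewise-linear main effect per pixel.
    effect = np.interp(values.ravel(), curve_z, curve_g).reshape(values.shape)
    # Linear normalization between the 5th and 95th percentiles, clipped to [0, 1].
    lo, hi = np.percentile(effect, [5, 95])
    scaled = np.clip((effect - lo) / (hi - lo), 0.0, 1.0)
    # Assumption: higher suitability contribution means lower resistance.
    return 1.0 - scaled

def total_resistance(layers):
    # Equal weights w_k = 1/K, i.e. the cell-wise mean of the K layers.
    return np.mean(np.stack(layers), axis=0)

# Hypothetical 20 x 20 factor raster and a monotonically increasing main effect.
rng = np.random.default_rng(42)
values = rng.random((20, 20))
r = factor_resistance(values, curve_z=[0.0, 0.5, 1.0], curve_g=[-1.0, 0.0, 2.0])
combined = total_resistance([r, 1.0 - r])
```

Clipping at the 5th and 95th percentiles keeps a few extreme pixels from compressing the rest of the raster into a narrow resistance range.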

To identify potential defense corridors, this study took military settlements and mountain passes with defensive attributes as source points and calculated the MCR surface using cost-distance analysis on the comprehensive defense cost surface. Subsequently, equal-area and natural breaks classifications were applied to the cumulative resistance values, and the several levels with the lowest resistance were extracted as potential corridor zones with the highest defensive suitability. These zones were further simplified into a continuous defense corridor network through skeleton extraction and raster-to-line conversion. On this basis, to test the sensitivity of the predicted defense corridors to cost factor configuration and topographic data resolution, two sets of robustness experiments were designed: (1) with other settings unchanged, three key factors (visual line density, slope, and kernel density of military settlements) were removed one at a time to reconstruct the comprehensive cost surface and corridor network; (2) the modeling process was repeated using a 90 m resolution DEM to obtain resolution-controlled corridors. Each scenario was compared with the 30 m DEM baseline corridor in terms of overlap ratio within a 1 km buffer zone, total length change, line position offset, and shape overlap degree, in order to evaluate the robustness of the model to parameter settings.
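The cumulative resistance surface at the core of the MCR model is a multi-source shortest-path problem on a grid. The following sketch uses Dijkstra's algorithm with 4-neighbour moves, where each move costs the mean resistance of the two cells. This is a simplified stand-in for a GIS cost-distance tool (which typically also allows diagonal moves and scales by cell size); the 3 × 3 cost grid is hypothetical.

```python
import heapq

def cumulative_resistance(cost, sources):
    rows, cols = len(cost), len(cost[0])
    acc = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    # All source cells start with zero accumulated resistance.
    for r, c in sources:
        acc[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > acc[r][c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Moving between cell centres costs the mean of both resistances.
                nd = d + 0.5 * (cost[r][c] + cost[nr][nc])
                if nd < acc[nr][nc]:
                    acc[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return acc

# Hypothetical resistance grid with one high-resistance cell in the centre.
cost = [[1, 1, 1],
        [1, 9, 1],
        [1, 1, 1]]
acc = cumulative_resistance(cost, sources=[(0, 0)])
```

In this toy grid the cheapest route to the opposite corner goes around the high-resistance centre cell, which is the same mechanism that steers predicted corridors around unsuitable terrain.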

Results

Model performance evaluation and primary model selection

It can be seen from the boxplots in Fig. 9a–c that the overall performance fluctuation of each model across the five folds is relatively small, indicating stable results across different training-validation subsets. For a given model, the AUC and PR-AUC of random stratified CV are generally slightly higher than those of the 7 km spatial block CV, while the Brier score is slightly lower (Fig. 9a–c). This reflects that, in the presence of spatial autocorrelation, random partitioning overestimates model performance to a certain extent, and that spatial block CV is closer to real-world application scenarios. Therefore, the subsequent analysis mainly relies on the spatial CV results. When comparing the three models on both AUC and PR-AUC, the boxplots of XGBoost under the two cross-validation methods are generally higher than those of LR and RF, with the smallest inter-fold dispersion (Fig. 9a, b). Meanwhile, XGBoost also achieves the lowest Brier score (Fig. 9c), followed by RF, while LR performs the worst, with a larger variance under spatial CV. This indicates that tree-based models have clear advantages in handling nonlinear feature relationships and spatial heterogeneity, which is corroborated by the mean values of the indicators presented in Table 9.

**Fig. 9: Boxplots for performance comparison of the three models under spatial and random cross-validation.**

Further examination of the ROC curves, PR curves, and calibration curves under the 7 km spatial CV (Fig. 10a–c) verifies the aforementioned conclusions based on their overall shapes. The ROC curves of all three models obviously lie above the random baseline, among which the curves of XGBoost and RF are generally closer to the upper left corner of the graph, corresponding to higher AUC values (Fig. 10a). The PR curves indicate that while maintaining a high recall rate, XGBoost and RF can sustain higher precision (Fig. 10b). The calibration curves show that the predicted probabilities of the three models are generally close to the ideal diagonal line, with the XGBoost curve being the closest to the ideal line, which corresponds to the lowest Brier score (Fig. 10c). Comprehensively considering the boxplots (Fig. 9), ROC/PR/calibration curves (Fig. 10), and the summary of indicators (Table 9), under the data conditions and spatial validation framework of this study, XGBoost performs optimally in terms of accuracy, stability, and probabilistic interpretability. Therefore, it was selected as the primary model for subsequent cost factor contribution analysis and spatial prediction.

**Fig. 10: ROC, PR, and calibration curves of the three models under 7 km spatial block cross-validation.**

Table 9 Optimal parameter combinations and average performance indicators of the three models under spatial and random cross-validation

Contribution analysis of cost factors based on SHAP

After determining the optimal XGBoost model, the SHAP method was further adopted for interpretive analysis of each cost factor, aiming to quantify the contribution of different environmental and defensive cost factors to the predicted probability of “potential defense corridors”. From the perspective of global importance ranking (Fig. 11a), Intvis_den and KD_military have the highest contributions, accounting for approximately 23.5% and 20.8% of the total explanatory power respectively. This indicates that visual dominance and the agglomeration degree of military nodes are the primary factors controlling the route selection of defense corridors. TPI and Slope also show relatively high contributions, accounting for 14.1% and 9.4% of the total explanatory power, which suggests that topographic relief and slope characteristics exert significant restrictive effects on the site selection of defense corridors. Betweenness and Dist_road contribute 9.1% and 8.8% respectively, reflecting the preference of defense corridors for connectivity hubs and traffic accessibility. The contributions of the remaining factors are all below 5%, mainly playing a role in detailed adjustment.

**Fig. 11: SHAP global importance of cost factors and distribution of sample-level effects.**

The SHAP beeswarm plot (Fig. 11a) and single-factor curves (Figs. 12a–f and S2) further illustrate the nonlinear response characteristics and thresholds of key cost factors with respect to the model output. A local analysis of the six cost factors with the highest contributions reveals that the curves of Intvis_den and KD_military show a monotonically increasing trend, shifting from negative to positive at the thresholds (0.17 and 325 respectively) before stabilizing. This means that once the visual conditions and military defense agglomeration degree exceed these critical levels, they push the prediction toward corridor membership; below them, they push the prediction away from it. The TPI curve exhibits an obvious inflection point at the threshold of 6.72. Below this threshold the contribution is mostly negative, and it gradually turns positive above it. This indicates that valley floors and gentle slopes are not conducive to corridor layout, while corridors are more appropriately arranged along mountain ridges and shoulder slopes. The Slope curve reflects a site selection preference for moderate slopes, with a positive contribution within the threshold of 11.34. Beyond this slope gradient, the SHAP values show a decreasing trend, suggesting that excessively steep slopes significantly inhibit corridor route selection. Betweenness and Dist_road characterize the connectivity and accessibility effects respectively: raster cells exhibit a significant positive contribution only when the betweenness centrality is within the range of (less than 0.01 and greater than 0.03); Dist_road corresponds to a slight positive contribution when the value is within (less than 0.01 and greater than 0.14) (i.e., adjacent to roads), and rapidly turns negative as the distance increases. This indicates that defense corridors mainly rely on the existing transportation framework.

**Fig. 12: SHAP-LOWESS main effect curves and threshold characteristics of key cost factors.**

On this basis, the SHAP interaction matrix and interaction curves reveal the synergistic and inhibitory effects between key factors (Figs. 13 and S3). The values of Intvis_den, KD_military, TPI, Slope, Betweenness, and Dist_road on the diagonal of the interaction matrix are far higher than those of other factors, indicating that the model is mainly driven by the main effects of these variables. A small number of off-diagonal elements (such as Intvis_den-KD_military, KD_military-TPI, and Intvis_den-Betweenness) exceed 0.05, suggesting the existence of significant interactions. Furthermore, the analysis of the factor pairs with the highest interaction intensity shows that when both KD_military and TPI take high values, the SHAP value of KD_military is significantly higher than the contribution generated by the same kernel density under valley floor terrain. This indicates that topographic advantages can amplify the positive effect of military defense agglomeration. The interaction between KD_military and Betweenness demonstrates that if a raster cell is simultaneously located at a hub position in the visibility network, the gain brought by high military defense kernel density is more pronounced. The interaction curve between Intvis_den and KD_military presents a typical “visibility × node density” synergy: when high visibility and high kernel density occur simultaneously, the improvement in predicted probability exceeds the simple superposition of their individual main effects. Conversely, the interaction between Intvis_den and Rugged shows that in areas with excessively rugged terrain, even if the visibility conditions are favorable, their positive contributions will be partially offset, reflecting the fundamental constraint of passability on corridor formation. 
Overall, the spatial preferences of potential defense corridors are mainly dominated by SHAP main effects; the interaction effects of a few key factors perform fine-tuning on this, making the model results more consistent with defense logic and actual geographical conditions.

**Fig. 13: SHAP interaction matrix and interaction curves between cost factors.**

Defense corridor reconstruction and spatial patterns

Single-factor cost functions were constructed based on the SHAP main effect curves. During the process of piecewise linear interpolation, the continuous reclassification of original variables was realized, and their hierarchical order was automatically determined by the contribution magnitude of each value interval to the location within the defense corridor. A comprehensive cost surface was generated through weighted summation (Fig. 14). Furthermore, the study area was divided into defensive suitability levels, and the potential defense corridor network was identified.

**Fig. 14: Construction process of the comprehensive cost surface.**

Defensive suitability analysis

Based on the construction results of the comprehensive cost surface, the cost distance tool and natural breaks classification method in ArcGIS Pro were used to divide the study area into five levels: highly suitable, relatively suitable, moderately suitable, lowly suitable, and unsuitable areas (Fig. 15). Among them, the highly suitable areas are mainly distributed in zonal patterns along the dense belts of military settlements and the existing transportation framework, mostly located in valley-shoulder transition zones and ridge front edges. These areas are characterized by moderate topographic relief and favorable visibility conditions. Their spatial locations highly overlap with the high-value zones of kernel density and visibility density identified in the previous analysis, so they can be regarded as the main defense corridors connecting multiple satellite cities, fortresses, and mountain passes. The relatively suitable areas are mostly distributed in zonal or patchy patterns surrounding the main corridors. On one hand, they provide buffer and redundant corridors for the highly suitable areas; on the other hand, they form a secondary connectivity network together with secondary settlements and post road nodes. The moderately suitable areas have the widest distribution in the study area, mainly located between different corridors and in hinterland regions, and primarily serve as potential mobile and supply corridors in the defense system.

**Fig. 15: Generation process of defense corridors.**

The lowly suitable areas mostly correspond to regions with fragmented terrain, steep slopes, or significantly obstructed visibility. The unsuitable areas are concentrated in high-altitude mountainous areas and the outermost topographic barriers. Their comprehensive resistance is much higher than the global average, functioning more as natural barriers rather than priority corridor spaces for defense construction.

Defense corridor prediction and spatial classification

Further, potential defense corridors were generated using cost path analysis based on the cumulative resistance surface (Fig. 15c), serving as the potential structural framework connecting military defense sites. The spatial scale of defense corridors directly affects the actual effectiveness of their defensive functions. Referring to the spatial zoning methods for predicted corridors in existing studies^46,47, nearest neighbor analysis was adopted to determine the spatial scale threshold of major defense corridors. The results show that within the 2-km buffer zone of the corridor centerlines, the defense corridors can cover approximately 72% of the military defense sites (181 sites in total); when the buffer radius is expanded to 5 km, the coverage rate increases to about 83% (232 sites in total). Based on this, according to the spatial relationship between the sites and the corridors, this study divided the corridors into three functional levels using buffer radii of 2 km, 5 km, and 10 km respectively. These levels define the core control zones, radiation coverage zones, and peripheral impact belts of the corridors, thereby more accurately characterizing their spatial organizational logic and the boundaries of defensive capabilities (Fig. 16a). The core control zones (0–2 km) represent the primary defensive level, featuring the highest settlement density and defensive efficiency. The radiation coverage zones (2–5 km) provide redundancy and linkage for the main corridors, forming defensive resilience. The peripheral impact belts (5–10 km) cover scattered individual sites, mostly reflecting the diminishing defensive effect of spatial extrapolation. Finally, the suitable width range for the defense corridors of Yuxian during the Ming Dynasty was determined to be 5–7 km, which covers 83–94% of the heritage sites with a total area of approximately 368.92 square kilometers.
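The buffer-based coverage statistic (the share of sites lying within a given distance of a corridor centerline) can be sketched with plain geometry. The corridor polyline and the four site coordinates below are hypothetical and serve only to show the calculation, not to reproduce the reported percentages.

```python
import math

def point_segment_distance(p, a, b):
    # Shortest distance from point p to the segment a-b (planar coordinates).
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def coverage(sites, polyline, radius_m):
    # Fraction of sites within radius_m of any segment of the centerline.
    inside = 0
    for s in sites:
        d = min(point_segment_distance(s, polyline[i], polyline[i + 1])
                for i in range(len(polyline) - 1))
        if d <= radius_m:
            inside += 1
    return inside / len(sites)

# Hypothetical corridor centerline and sites, coordinates in metres.
corridor = [(0.0, 0.0), (10_000.0, 0.0)]
sites = [(5_000.0, 1_000.0), (5_000.0, 4_000.0),
         (5_000.0, 8_000.0), (12_000.0, 0.0)]
cover_2km = coverage(sites, corridor, 2_000.0)
cover_5km = coverage(sites, corridor, 5_000.0)
cover_10km = coverage(sites, corridor, 10_000.0)
```

Evaluating the same function at several radii produces the kind of coverage-versus-width curve from which a corridor width threshold can be chosen.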

**Fig. 16: Distribution characteristics of topographic factors of predicted defense corridors.**

Spatial coupling between defense corridors, natural terrain, transportation, and the Great Wall system

First, this study examined the relationship between defense corridors and topographic factors such as elevation, slope, and curvature from the perspective of natural terrain, and calculated the length proportion of the predicted defense corridors across five-level terrain intervals. The results show that the defense corridors are generally significantly concentrated in areas of moderate elevation, gentle to moderate slopes, and ridge shoulder zones with moderate topographic relief. In contrast, their length proportion in terrain units such as high-altitude steep slopes and valley floors is relatively low (Fig. 16). This indicates that natural terrain exerts an obvious fundamental restrictive effect on the route selection of defense corridors.

Furthermore, the linear nearest neighbor ratio and the traversal rate through high defensive suitability areas were used to analyze the spatial relationship between defense corridors, transportation corridors, and the Great Wall system. The linear nearest neighbor ratio measures the degree of spatial alignment between two sets of linear features. Buffers of 500 m and 1000 m were established around the transportation corridors, and the proportion of defense corridor length falling within each buffer was calculated. The predicted defense corridors have a total length of approximately 515 km, of which 65.65% (about 337.96 km) lies within 500 m of a transportation corridor. When the buffer is expanded to 1000 m, 79.56% (about 409.57 km) of the defense corridors are spatially coupled with transportation corridors (Fig. 17b). The proportion of post roads traversing high defensive suitability areas reaches 66.78% (Fig. 17c). Taken together, these two indicators show that defense corridors align clearly with the existing transportation framework, while a considerable number of defensive routes still deviate from transportation corridors. This indicates that although both are constrained by geographical factors, they differ in cost composition and functional orientation⁴⁸. Compared with defense corridors, transportation corridors place greater emphasis on low cost and high efficiency in moving materials and troops, and their costs are mainly determined by resistance factors that affect transportation, such as road conditions, slope gradient, and valley passability. In contrast, defensive military facilities (such as beacon towers and mountain passes) focus more on visual control over key passages and border areas, and their layout is governed by line-of-sight accessibility, occupation of commanding heights, and interlocking surveillance among nodes⁴⁹.

**Fig. 17: Spatial coupling relationships between defense corridors, post roads, and the Great Wall System in Yuxian.**

Because the Great Wall was mostly built along ridge lines and most military garrisons and pass nodes were concentrated within 3 km on either side of it⁵⁰, this zone can be regarded as the primary defensive zone along the border walls. A 3-km buffer was therefore established around the Great Wall line to measure the spatial correspondence between the defense corridors and the Great Wall defense line (Fig. 17d). The length of defense corridors within this buffer is 121.89 km, or 23.68% of the total, and 58.37% of the Great Wall line traverses highly suitable defensive zones. The first figure indicates that the defense corridors mostly spread through hinterland mountain valleys and river valley corridors, where they serve in-depth connection and mobile deployment. The second shows that more than half of the Great Wall lies in highly suitable defensive zones, reflecting that the route selection of the Ming-dynasty border walls depended heavily on locations with favorable visibility, dense military settlements, and advantageous terrain. To a certain extent, this result corroborates that the Great Wall and pass system played a skeletal role in the peripheral defense corridors of Yuxian.
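
As a consistency check on the length shares reported in this subsection, dividing each reported length by its percentage should recover the same total corridor length. This is plain arithmetic on the reported values, not an additional result.

```latex
\begin{align*}
L_{\text{total}} &\approx \frac{337.96\ \text{km}}{0.6565} \approx 514.8\ \text{km}
  && \text{(within 500 m of transportation corridors)} \\
L_{\text{total}} &\approx \frac{409.57\ \text{km}}{0.7956} \approx 514.8\ \text{km}
  && \text{(within 1000 m of transportation corridors)} \\
L_{\text{total}} &\approx \frac{121.89\ \text{km}}{0.2368} \approx 514.7\ \text{km}
  && \text{(within 3 km of the Great Wall line)}
\end{align*}
```

All three values agree with the stated total of approximately 515 km.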

Robustness and sensitivity analyses

To test how sensitive the predicted defense corridors are to cost factor selection and topographic data resolution, two sets of robustness experiments were designed around the benchmark model. All scenarios were compared with the baseline using the same set of linear indicators: total corridor length, overlap ratio with the 1-km buffer of the baseline (ovlp_base), symmetric mean offset (mean_off), and approximate Hausdorff distance (Table 10). Both mean_off and the Hausdorff distance were computed from points sampled along the corridors at equal intervals of 250 m.
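
The three distance-based indicators can be sketched as follows. The exact definitions used for Table 10 are not given in this section, so the sketch makes explicit assumptions: ovlp_base is taken as the share of scenario sample points within 1 km of the baseline samples, mean_off as the average of the two directed mean nearest-point distances, and the approximate Hausdorff distance as the larger of the two directed maxima over the 250 m samples. All function names and the toy geometry are hypothetical.

```python
import math

def resample(line, step):
    """Points spaced `step` metres apart along a polyline, plus its final vertex."""
    points = [line[0]]
    carried = 0.0  # distance travelled since the last emitted sample
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        seg = math.hypot(bx - ax, by - ay)
        if seg == 0.0:
            continue
        d = step - carried
        while d <= seg:
            t = d / seg
            points.append((ax + t * (bx - ax), ay + t * (by - ay)))
            d += step
        carried = seg - (d - step)
    if points[-1] != line[-1]:
        points.append(line[-1])
    return points

def directed_distances(src, dst):
    """For each point in src, the distance to its nearest point in dst."""
    return [min(math.hypot(sx - dx, sy - dy) for dx, dy in dst) for sx, sy in src]

def compare_to_baseline(scenario, baseline, step=250.0, buffer_m=1000.0):
    """Overlap ratio, symmetric mean offset and approximate Hausdorff distance."""
    s, b = resample(scenario, step), resample(baseline, step)
    s_to_b = directed_distances(s, b)
    b_to_s = directed_distances(b, s)
    return {
        "ovlp_base": sum(d <= buffer_m for d in s_to_b) / len(s_to_b),
        "mean_off": 0.5 * (sum(s_to_b) / len(s_to_b) + sum(b_to_s) / len(b_to_s)),
        "hausdorff": max(max(s_to_b), max(b_to_s)),
    }

# Hypothetical toy case: a scenario corridor running parallel to the baseline, 500 m away.
baseline = [(0.0, 0.0), (1000.0, 0.0)]
scenario = [(0.0, 500.0), (1000.0, 500.0)]

metrics = compare_to_baseline(scenario, baseline)
print(metrics)  # {'ovlp_base': 1.0, 'mean_off': 500.0, 'hausdorff': 500.0}
```

The mean offset summarizes typical displacement, while the Hausdorff distance is driven by the single worst-matching location, which is why the two can differ by an order of magnitude for the same pair of networks.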

Table 10 Results of ablation and sensitivity experiments

In the cost factor ablation experiments, three key factors were removed one at a time while all other settings were kept unchanged: visibility density (noVIS), military settlement kernel density (noKED), and slope (noSLOPE). The comprehensive cost surface and minimum cost corridor network were rebuilt for each scenario. Compared with the benchmark model, total corridor length varied only slightly across the three ablation scenarios. Approximately 36% of the ablation corridors fell within the 1-km buffer of the baseline corridors, indicating that the spatial trends of some major corridors remained consistent, whereas the mean_off and Hausdorff distance values showed clear offsets in the locations and trends of local corridors. The scenario with visibility density removed showed the largest decrease in overlap with the baseline. This indicates that visibility conditions exerted the strongest control on corridor trends, while the effects of slope and settlement density could be partially compensated by other topographic factors.

To evaluate the impact of topographic data resolution on the identification of defense corridors, a new set of cost surfaces and corridor networks was built from topographic factors derived from the 90 m DEM, with the form of the cost functions and the non-topographic factors kept unchanged (Fig. 18d). Under the 90 m DEM scenario, total corridor length was slightly higher than in the benchmark model. The overlap ratio with the 1-km buffer of the baseline was approximately 32%, and the mean_off and Hausdorff distance were 2.58 km and 22.75 km, respectively. Compared with the cost factor ablation experiments, the offset amplitude increased slightly. This indicates that the coarser topographic data amplified local differences in corridor detail, but the overall corridor framework remained at the same spatial scale as the baseline.

Discussion

Taking the military defense system of Yuxian in the Ming Dynasty as a case study, this research integrated data on military settlements, the Great Wall border walls, ancient transportation, and multi-source topographic data. It combined cost functions extracted by explainable machine learning with the MCR model to construct a data-driven comprehensive defense cost surface and a multi-level defense corridor network. The results show that the defense system of Yuxian formed a corridor pattern composed of three components: peripheral corridors distributed along the Great Wall and passes, in-depth corridors connecting passes, satellite cities, and county seats, and composite corridors highly coupled with ancient post roads. Overall, this structure is highly consistent with the reconstructed ancient post roads. The research effectively reveals the coupling relationships among military nodes, topographic patterns, and transportation networks, providing a transferable methodological template for the identification of defense corridors and the spatial planning of border defense heritage in other border prefectures along the Nine Frontiers of the Ming Dynasty.

First of all, the peripheral corridors roughly parallel to the Great Wall and major passes vividly embody the border defense logic of “taking the Great Wall as the boundary and passes as the gates” under the Ming Dynasty’s Nine Frontiers system. Distributed mainly along river valley passes and low saddles, this type of corridor closely adjoins the Great Wall border walls and frontline passes, controlling potential northward invasion routes. By overlapping with high-density areas of military settlements and highly suitable defensive zones, it intercepts potential threats beyond the county boundary, forming a peripheral defense line with a blocking function. This spatial pattern reflects the dual constraints of topographic constriction zones and the layout of strategic nodes: the Great Wall and passes provide an institutionalized boundary framework, while river valley mouths and saddles offer low-cost and defensible topographic corridors.

Second, the in-depth corridors extending northward and southward from passes and connecting satellite cities with county seats reflect a highly organized vertical linkage mechanism. In this system, passes are responsible for controlling strategic chokepoints, military garrisons undertake military force assembly and emergency response, and county seats assume the functions of financial supply and administrative integration. This enables military forces to achieve rapid reinforcement, rotational defense, and strategic retreat between the border walls and the hinterland. Furthermore, topographic elevation differences and slopes not only affect accessibility but also amplify the strategic value of areas along ridge lines and ridge shoulders. They transform mountain terrain differences into in-depth defensive advantages, thereby constructing vertical defense corridors with redundancy and maneuvering space.

Furthermore, some paths cut diagonally through the piedmont plains, locally form polygonal segments, and overlap closely with ancient post roads. This overlap does not imply complete isomorphism between defense routes and the transportation framework; rather, it reflects a trade-off between defensive demands and accessibility efficiency. In route selection, defense corridors rely more on visibility and commanding heights, emphasizing line-of-sight accessibility, interlocking surveillance among nodes, and terrain overlooking relationships. By contrast, transportation routes attach greater importance to the constraints that slope gradient, topographic relief, and water systems place on transportation costs. At key topographic locations such as mountain passes and river valley corridors, the two are often closely coupled at a large scale, but routes are adjusted locally to meet surveillance and blocking requirements, forming a composite corridor structure “with accessibility as the framework and visibility control as the supplementary enhancement”. This embedding of defensive functions into transportation corridors embodies the compromise and coordination between military defense deployment and transportation organization in Yuxian.

However, this research still has several limitations that need to be further addressed in future studies.

Data and sample limitations: The spatial locations of military defense settlements, fortresses, and passes are primarily derived from the comprehensive registration of historical maps in local gazetteers, the Great Wall heritage database, and UAV aerial survey results. Inevitably, there exist uncertainties in their temporal attributes and coordinate accuracy. To mitigate these issues, this paper employs single-point confidence assessment and confidence-weighted kernel density for sample selection and correction. Additionally, independent terrain and military defense indicators are introduced in candidate boundary verification to minimize the impact of incomplete historical records on the results. Despite these efforts, it remains difficult to completely eliminate the interference of age misplacement or location offset of individual points on cost modeling and corridor identification.

Temporal and institutional limitations: The comprehensive cost surface is constructed from the “average defensive logic” of Yuxian over a relatively long period of the Ming Dynasty, on the default assumption that the military defense layout and frontier governance remained broadly stable. This study does not yet distinguish the phased impacts on defense corridors of shifts in the focus of border defense, the disbandment of military garrisons, and changes in the military and administrative system after the Chenghua to Jiajing reign periods. Case studies of war history and historical records on military force deployment are used only as contextual materials for qualitative verification. The depiction of how the defensive pattern evolved across periods therefore remains at a holistic and static inferential level.

Methodological and path framework limitations: This study employs XGBoost and SHAP main effect curves to construct single-factor cost functions, and forms a comprehensive cost surface through piecewise linear interpolation and equal-weighted superposition. This approach balances interpretability and operability, but interaction effects between variables and difficult-to-quantify social factors such as institutional systems, population, and finance have not yet been incorporated into the defense model. In addition, the MCR model is used to identify defense corridor zones without further extracting a single least-cost path network. This is more consistent with the spatial characteristics of the defense system, such as multi-path redundancy and zonal deployment, but it also means that the study captures the corridor-level suitability pattern rather than accurately reconstructing specific historical military march routes. Future research could apply additional path models, such as least-cost path (LCP) analysis and circuit theory, to the same comprehensive cost surface in order to compare and verify in more detail the relationships among “main corridors–suboptimal path families–actual post roads”. The impact of different model assumptions on the reconstructed defense corridors should also be systematically evaluated.
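
The cost-surface construction summarized above (single-factor cost functions by piecewise linear interpolation, then equal-weighted superposition) can be sketched for a single raster cell. The breakpoints below are hypothetical placeholders for values that would be read off SHAP main effect curves and rescaled to costs. The factor names, the rescaling, and the clamping outside the breakpoint range are assumptions of this sketch rather than details taken from the study.

```python
from bisect import bisect_right

def piecewise_linear(x, xs, ys):
    """Linear interpolation through breakpoints (xs ascending), clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1  # index of the breakpoint at or just below x
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def composite_cost(cell, curves):
    """Equal-weighted mean of the single-factor costs for one raster cell."""
    costs = [piecewise_linear(cell[name], xs, ys) for name, (xs, ys) in curves.items()]
    return sum(costs) / len(costs)

# Hypothetical breakpoints: factor value -> cost in [0, 1].
curves = {
    "slope_deg": ([0.0, 10.0, 30.0], [0.2, 0.4, 1.0]),           # steeper, costlier
    "visibility_density": ([0.0, 5.0, 10.0], [1.0, 0.5, 0.1]),   # more visible, cheaper
}
cell = {"slope_deg": 20.0, "visibility_density": 5.0}

cost = composite_cost(cell, curves)
print(round(cost, 3))  # 0.6
```

Because the superposition is a plain mean, each factor contributes independently of the others, which is the source of the missing interaction effects noted in the limitation above.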

Data availability

Publicly available datasets used in this study are available from the sources cited in the Methods section. The minimal derived dataset necessary to interpret, replicate, and build upon this study (including sample points with extracted predictor values, processed predictor rasters, the derived defense cost surface, and reconstructed corridor outputs) and the analysis scripts are available from the corresponding author upon reasonable request.

Change history

31 March 2026
In the original version of this article, the in-text citations for Figs. 5 and 6 were incorrect. These have now been corrected.

References

Yan, R. & Zhao, M. Analysis on site determination and spatial layout of ancient fortress-villages in Yu County, Hebei Province, China. Int. J. Sustain. Dev. World Ecol. 27, 284–292 (2020).
Yuan, L., Li, Z., Wang, Y., Hao, Z. & Yu, C. Interactions between the Ming Yansui Great Wall heritage and geographical environment via Monte Carlo simulation. npj Herit. Sci. 13, 260, https://doi.org/10.1038/s40494-025-01816-6 (2025).
Cao, Y. & Zhang, Y. The fractal structure of the Ming Great Wall Military Defense System: a revised horizon over the relationship between the Great Wall and the military defense settlements. J. Cult. Herit. 33, 159–169, https://doi.org/10.1016/j.culher.2018.03.010 (2018).
Shang, H., Cheng, C. & Guan, Q. Mingqing yilai Yuxian zhuangbao simiao diaocha yu yanjiu [Survey and research on fortress villages and temples in Yu County since the Ming-Qing period] (Shanghai Guji Press, 2023). [in Chinese]
Yan, R. A study of the spatial forms of traditional fortress settlements in Yu County, Hebei Province. PhD thesis, Beijing Forestry Univ. (2023). [in Chinese]
Zhao, X. Ming changcheng shidai de kaishi: Changcheng shehui shi shiye xia Yulin changcheng xiujian yanjiu [The beginning of the Ming Great Wall era: a study of the construction of the Yulin Great Wall from the perspective of Great Wall social history] (Lanzhou Univ. Press, 2014). [in Chinese]
Huang, Y. et al. Spatial evolution of traditional waterside settlements south of the Yangtze River and the distribution of settlement heritage: evidence from the Nanxi River Basin. npj Herit. Sci. 13, 62 (2025).
Huang, Y. et al. Interpretation of the Jiangnan landscape and countryside (Shan-shui) pattern: evidence from the classification and spatial form of traditional settlements in the Nanxi River Basin. Buildings 15, 413 (2025).
Zhou, J., He, J., Tan, L. & Yan, S. Integrity protection of the Chang Zhen Great Wall heritage corridor based on minimum cumulative resistance. npj Herit. Sci. 13, 479, https://doi.org/10.1038/s40494-025-02044-8 (2025).
Lin, F., Zhang, X., Ma, Z. & Zhang, Y. Spatial structure and corridor construction of intangible cultural heritage: a case study of the Ming Great Wall. Land 11, 1478, https://doi.org/10.3390/land11091478 (2022).
Danese, M., Masini, N., Biscione, M. & Lasaponara, R. Predictive modeling for preventive archaeology: overview and case study. Cent. Eur. J. Geosci. 6, 42–55, https://doi.org/10.2478/s13533-012-0160-5 (2014).
Carrer, F. An ethnoarchaeological inductive model for predicting archaeological site location: a case-study of pastoral settlement patterns in the Val di Fiemme and Val di Sole (Trentino, Italian Alps). J. Anthropol. Archaeol. 32, 54–62, https://doi.org/10.1016/j.jaa.2012.10.001 (2013).
Zhu, J., Nie, Y., Gao, H., Liu, F. & Yu, L. GIS-based visibility network and defensibility model to reconstruct defensive system of the Han Dynasty in Central Xinjiang, China. ISPRS Int J. Geo. Inf. 6, 247, https://doi.org/10.3390/ijgi6080247 (2017).
Luan, S., Li, Y. & Ma, X. Defense layout characteristics of Tang dynasty military sites in Aksu prefecture of China. npj Herit. Sci. 13, 125, https://doi.org/10.1038/s40494-025-01684-0 (2025).
Menze, B. H. & Ur, J. A. Mapping patterns of long-term settlement in Northern Mesopotamia at a large scale. Proc. Natl. Acad. Sci. USA 109, E778–E787, https://doi.org/10.1073/pnas.1115472109 (2012).
Castiello, M. E. & Tonini, M. An explorative application of random forest algorithm for archaeological predictive modeling: a Swiss case study. J. Comput. Appl. Archaeol. 4, 110–125, https://doi.org/10.5334/jcaa.71 (2021).
Sun, D. et al. Ecological security pattern based on XGBoost-MCR model: a case study of the Three Gorges Reservoir Region. J. Clean. Prod. 470, 143252, https://doi.org/10.1016/j.jclepro.2024.143252 (2024).
Zhao, X. H. A study of the military town system of the Nine Frontiers in the Ming dynasty. PhD thesis, Northeast Normal Univ. (2005).
Geospatial Data Cloud [Internet]. Beijing: Chinese Academy of Sciences; [cited 2025 Nov 7]. Available from: http://www.gscloud.cn/.
Tan, Q. Zhongguo lishi dituji, Vol. 7: Yuan, Ming shiqi [Historical atlas of China, Vol. 7: Yuan and Ming periods] (Ditu Chubanshe, 1982). [in Chinese]
CHGIS. China Historical GIS, Version 6. Cambridge: Harvard Yenching Institute; Shanghai: Fudan University Center for Historical Geography. Available online: https://verse.harvard.edu/dataverse/chgis (2016).
China Great Wall Heritage Network [Internet]. Beijing: Chinese Academy of Cultural Heritage; c2016–2025 [cited 2025 Nov 7]. Available from: http://www.greatwallheritage.cn/.
Guo, L. (ed.) Guangling Xianzhi 廣靈縣志 (Gazetteer of Guangling County) 1754, Zhongguo Fangzhi Congshu (Cheng Wen Publishing House, 1968).
Yue, H. (ed.) Lingqiu Xianzhi 靈丘縣志 (Gazetteer of Lingqiu County) 1684, Zhongguo Difangzhi Jicheng (Phoenix Publishing House, 2005).
Wang, Y. (ed.) Yuxian Zhi 蔚縣志 (Gazetteer of Yu County) 1739, Zhongguo Fangzhi Congshu (Cheng Wen Publishing House, 1968).
Zhao, Y. (ed.) Guangchang Xianzhi 廣昌縣志 (Gazetteer of Guangchang County) 1760, Gugong Zhenben Congkan (Hainan Publishing House, 2001).
International Organization for Standardization. ISO 19157:2013 Geographic information — Data quality (ISO, 2013).
Zhang, T. Y. et al. (eds) Ming shi 明史 (History of the Ming dynasty) 1735 (Zhong Hua Book Company, 2005).
Scott, J. Social network analysis: a handbook, 3rd edn. (Sage Publications, 2012).
Wasserman, S. & Faust, K. Social network analysis: methods and applications (Cambridge University Press, 1994).
Earley-Spadoni, T. Landscapes of warfare: intervisibility analysis of Early Iron and Urartian fire beacon stations (Armenia). J. Archaeol. Sci. Rep. 3, 22–30, https://doi.org/10.1016/j.jasrep.2015.05.008 (2015).
Cao, Y., Zhang, Y. & Zhang, H. Road restoration based on GIS for the Ming Dynasty Great Wall military atlases: a case study of Datong Garrison. J. Hebei Agric Univ. 37, 138–144, https://doi.org/10.13320/j.cnki.jauh.2014.0053 (2014).
De Reu, J. et al. Measuring the relative topographic position of archaeological sites in the landscape: a case study on the Bronze Age barrows in northwest Belgium. J. Archaeol. Sci. 38, 3435–3446, https://doi.org/10.1016/j.jas.2011.08.005 (2011).
Siart, C., Eitel, B. & Panagiotopoulos, D. Investigation of past archaeological landscapes using remote sensing and GIS: a multi-method case study from Mount Ida, Crete. J. Archaeol. Sci. 35, 2918–2926, https://doi.org/10.1016/j.jas.2008.06.013 (2008).
Marsh, E. J. & Schreiber, K. Eyes of the empire: a viewshed-based exploration of Wari site-placement decisions in the Sondondo Valley, Peru. J. Archaeol. Sci. Rep. 4, 54–64, https://doi.org/10.1016/j.jasrep.2015.08.014 (2015).
Dormann, C. F. et al. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 36, 27–46, https://doi.org/10.1111/j.1600-0587.2012.07348 (2013).
Zuur, A. F., Ieno, E. N. & Elphick, C. S. A protocol for data exploration to avoid common statistical problems. Methods Ecol. Evol. 1, 3–14, https://doi.org/10.1111/j.2041-210X.2009.00001.x (2010).
O’Brien, R. M. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 41, 673–690, https://doi.org/10.1007/s11135-006-9018-6 (2007).
Hemphill, J. F. Interpreting the magnitudes of correlation coefficients. Am. Psychol. 58, 78–79, https://doi.org/10.1037/0003-066X.58.1.78 (2003).
Gu, Z., Shi, H. (ed.) Du Shi Fang Yu Ji Yao 讀史方輿紀要 (Essentials of Reading History and Geography) (Zhong Hua Book Company, 2005).
Yu, C. (ed.) Ji Fu Tong Zhi 畿輔通志 (Gazetteer of the Metropolitan Region) 1885, Jingjinji Jifu Wenxian Congkan (Beijing Yanshan Publishing House, 2019).
Liu, R. (ed.) Guangchang Xianzhi 廣昌縣志 (Gazetteer of Guangchang County) 1875, Zhongguo Fangzhi Congshu (Cheng Wen Publishing House, 1968).
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 4765–4774 (2017).
Shapley, L. S. A value for n-person games. In Contributions to the Theory of Games (eds Kuhn, H. W. & Tucker, A. W.) (Princeton University Press, 1953).
Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67, https://doi.org/10.1038/s42256-019-0138-9 (2020).
Liu, Y. & Ma, X. Heritage corridor construction in the Sui–Tang Grand Canal’s Henan section based on the minimum cumulative resistance (MCR) model. Land 14, 2128, https://doi.org/10.3390/land14112128 (2025).
Zhang, F. et al. Recreational spatial scope and level of the grand canal cultural belt. Area. Res. Dev 38, 80–84 (2019).
Tan, L. et al. Influence of environmental factors on the site selection and layout of ancient military towns (Zhejiang Region). Sustainability 14, 2572, https://doi.org/10.3390/su14052572 (2022).
Li, X. et al. Research on the construction of intangible cultural heritage corridors in the Yellow River Basin based on geographic information system (GIS) technology and the minimum cumulative resistance (MCR) model. Herit. Sci. 12, 271, https://doi.org/10.1186/s40494-024-01387-y (2024).
Jiang, B. et al. The distribution pattern and spatial morphological characteristics of military settlements along the Ming Great Wall in the Hexi Corridor Region. Buildings 15, 1136, https://doi.org/10.3390/buildings15071136 (2025).

Acknowledgements

Thanks to all team members who participated in field surveys and data collection, especially the local cultural relics departments of Yuxian for their support. This research was funded by the Major Project of the Special Program for Excellent Traditional Chinese Culture of the Ministry of Education (Category A) (Ni Shan World Confucianism Center/China Confucius Foundation Project): “Research on the Protection and Development of Traditional Fortress Settlements in Yuxian from the Perspective of Rural Revitalization” (Grant No. 23JDTCZ002).

Author information

These authors contributed equally: Peiyuan Zhao, Qing Xue.

Authors and Affiliations

College of Landscape Architecture, Beijing Forestry University, Beijing, China
Peiyuan Zhao, Qing Xue, Han Qin, Ming Zhao & Kun Yuan
Museum of Beijing Forestry University, Beijing, China
Rong Yan

Authors

Peiyuan Zhao
Qing Xue
Han Qin
Ming Zhao
Rong Yan
Kun Yuan

Contributions

Methodology, software, formal analysis, investigation, data curation, visualization, writing–original draft, writing–review & editing, P.Z.; Investigation, resources, data curation, writing–original draft (introduction and historical literature), writing–review & editing, Q.X.; Writing–review & editing (translation and language polishing), H.Q.; Conceptualization, methodology, supervision, writing–review & editing, funding acquisition, M.Z.; Writing–review & editing, R.Y.; Writing–review & editing, K.Y. All authors have read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Ming Zhao or Kun Yuan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhao, P., Xue, Q., Qin, H. et al. Machine-learning-based reconstruction of Ming-dynasty defensive corridors in Yuxian. npj Herit. Sci. 14, 225 (2026). https://doi.org/10.1038/s40494-026-02438-2

Received: 20 June 2025
Accepted: 06 March 2026
Published: 09 April 2026
Version of record: 09 April 2026
DOI: https://doi.org/10.1038/s40494-026-02438-2
