Introduction

From a design perspective, landscapes are three-dimensional constructions that evolve over time, involving the articulation of abstract notions into physical structures1. These structures warrant analysis through visual appreciation, particularly examining the inherent attributes and qualities manifest in the composition and configuration of spatial elements, collectively termed “spatial visual characteristics”2,3. These characteristics encompass two fundamental dimensions: first, “what exists” referring to the presence, quantity, distribution, and proportion of spatial elements; and second, “how they are arranged” encompassing the complexity of arrangement, position, orientation, and form4,5.

Viewed through this lens, traditional Chinese gardens, a type of landscape architecture, are not as random and loose as they seem; rather, their spatial visual characteristics are carefully orchestrated to create physical structures with highly complex spatial hierarchies from multiple perspectives6,7,8. Building on these characteristics, prominent landscape architects and scholars such as Peng Yigang, Zhou Weiquan and Pan Guxi have identified recurring patterns of spatial composition and configuration, which they termed “scenic archetypes” in traditional Chinese vocabulary and concepts9,10,11. Scenic archetypes translate specific combinations of spatial visual characteristics into culturally embedded organizational principles that guide the arrangement of space to produce distinctive aesthetic experiences12. The influence of these archetypes extends far beyond regional boundaries. Historical evidence demonstrates that the spatial logic they embody has profoundly shaped garden design across cultural contexts, from East Asia to Europe, with particularly notable impacts in Japan, South Korea, and the United Kingdom13. This transregional significance is further reflected in the international recognition of several representative gardens, including the Humble Administrator’s Garden, the Lingering Garden, and Hangzhou Westlake, all inscribed on the UNESCO World Heritage List in acknowledgment of their enduring design wisdom and innovation13,14,15. In parallel with this international visibility, recent research on landscapes and historic gardens has increasingly emphasized a more precise understanding of spatial organization, particularly as represented by the spatial visual characteristics of scenic archetypes, whose underlying principles have influenced landscape design at broader scales16.

However, previous studies on scenic archetypes remain predominantly grounded in experiential descriptions, which limits their potential for systematic knowledge transfer and methodological development in heritage conservation. Early scholars often characterized these archetypes through poetic metaphors such as “scenes changing as steps move (步移景异)”, “perceiving the vast in the small (小中见大)” and “winding paths leading to secluded spots (曲径通幽)”9,17. While such expressions convey the phenomenological essence, they represent personal interpretation rather than spatial attributes that can be objectively described and consistently verified by different observers. More recent studies have sought to document the spatial visual characteristics of scenic archetypes with greater precision: Wang18 investigated the specific “size” and “scale” of windows and doors in framed scenery; Tong19 utilized binary oppositions such as “sparse-dense” and “tortuous-straight” to analyze the spatial configurations of obstructive and framed scenery at the Lingering Garden’s entrance. Nevertheless, as Liu3 observes, even these refined vocabularies suffer from terminological inconsistencies and a lack of standardization, with identical terms interpreted variably across studies, thus hindering systematic analysis.

To overcome the limitations of purely descriptive approaches, recent studies have introduced quantitative and computational methods, essentially mapping techniques, to measure spatial visual characteristics with greater precision1,3,20. As the fundamental means of capturing, analyzing, and communicating such characteristics, mapping visualizes abstract spatial knowledge and integrates the spatial organization of landscape spaces in both qualitative and quantitative terms21, embodying what Corner22 describes as “ways of seeing” that construct new realities rather than merely recording existing ones. For example, Zhou et al.23 used angle segment and visibility graph analyses to quantify Daming Temple’s mean depth, connectivity and intelligibility. Chen and Yang24 linked narratology with VGA and isovist analysis to calculate integration and connectivity in the Humble Administrator’s Garden. Zhang et al.25 employed DepthmapX for convex, axial and visibility analyses, deriving depth, connectivity, isovist area and integration at West Shu Garden. Chen et al.26 combined space‑syntax indicators (connectivity, step depth, integration) with DBSCAN clustering to extract five indicators: permeability, curvature, visibility, accessibility and differentiation. While these methods offer rigorous tools for measuring individual spatial visual characteristics, they lack the capacity to capture how such characteristics integrate into the coherent spatial gestalts that define scenic archetypes. Addressing this gap calls for approaches capable of linking detailed element-level detection with the holistic perception of design patterns, and one promising direction lies in the use of advanced computer vision mapping. 
Today’s computer vision technologies represent a transformative expansion of these capabilities: semantic segmentation precisely identifies spatial elements across thousands of images27,28, depth estimation reveals three-dimensional relationships from photographs29,30, and cutting-edge methods like scale-invariant segmentation31, super-resolution mapping32, and hyperspectral detection33 push analytical boundaries even further. Yet a critical limitation persists: although these technologies excel at element detection, they cannot recognize how elements combine to form scenic archetypes. A wall, for example, may be identified simply as a wall, without any understanding of its role as the defining “frame” in framed scenery. This disconnect between physical detection and design pattern recognition necessitates systematic methods that encode the compositional and configurational logic of scenic archetypes in traditional Chinese gardens.

Above all, understanding the spatial visual characteristics of scenic archetypes is essential for explaining how culturally specific design principles are materialized in physical space and for supporting their conservation and contemporary application. However, existing studies remain largely descriptive or focus on isolated attributes, lacking a systematic framework that can both depict these characteristics holistically and interpret them in a replicable way. This study addresses this gap through two research objectives. First, it establishes an analytical framework that systematically transforms the spatial visual characteristics of scenic archetypes from cultural concepts into measurable attributes enabling holistic representation and verifiable interpretation. Second, it develops an AI-based multimodal mapping methodology that integrates semantic precision with quantitative rigor, ensuring that experiential understanding is preserved while producing measurable, comparable outputs. Using the Hangzhou Westlake (HWL) as a representative case that embodies the full complexity of scenic archetypes, the study applies the proposed framework to translate tacit design knowledge into explicit, measurable guidelines. The resulting outputs provide a basis for evidence-based conservation strategies and inform contemporary design practices, thereby advancing both the theoretical understanding and the practical implementation of scenic archetype analysis.

Methods

Theoretical foundation and framework overview

To transform scenic archetypes into their constituent spatial visual characteristics, this study first integrates semiotic and phenomenological theories to structure the decomposition process. In semiotics, Peirce’s34 triadic relationship among sign, object, and interpretant provides a conceptual structure for explaining how meaning is generated and interpreted in relation to physical form35. The phenomenological perspective complements this by explaining the cognitive universality of spatial perception. According to Kant, space constitutes “the subjective condition of sensibility” through which “outer intuition is possible for us”36. This conceptualization suggests that spatial perception represents an a priori form of human cognition, implying fundamental universality in how individuals perceive and comprehend spatial configurations. Such universality provides crucial justification: if spatial perception follows common cognitive patterns, these patterns can be systematically identified and interpreted. Merleau-Ponty37 extends this understanding by emphasizing the dialectical relationship between individual exploration and sensorial responses. His work indicates that spatial experience, while transcending pure subjectivity, emerges from observable interactions between humans and their physical environment. Furthermore, several established analytical frameworks have demonstrated the feasibility of translating broad conceptual categories into detailed spatial-visual measures, including Tveit’s38 landscape visual characterization scheme, Bell’s2 framework for aesthetic structure, and Liu’s3 landscape design syntax. These approaches show how abstract design principles can be systematically translated into concrete spatial attributes through multiple interpretive levels, creating a foundation for developing measurable indicators.

Building upon these theoretical and methodological precedents, the proposed framework establishes four hierarchical tiers: the abstract concept level (scenic archetypes) corresponds to the intentional level of meaning; the dimensional level (spatial layers) reflects phenomenological modes of perception; the attribute level (variables) represents concrete spatial manifestations; and the measurable indicator level (metrics) corresponds to identifiable and quantifiable physical attributes (Fig. 1). The first tier establishes scenic archetypes as umbrella concepts that extract recurring design patterns from Jiangnan garden-making techniques, as documented in foundational studies9,17,39,40,41,42. These archetypes represent the highest level of abstraction, encapsulating centuries of accumulated design wisdom. The second tier introduces spatial layers, which deconstruct each archetype into foreground, middle ground, and background components from the observer’s horizontal perspective1,3,43. This tripartite division reveals a critical mechanism: the foreground functions as a mediating element that transforms middle ground and background components from isolated objects into integrated scenic compositions. Significantly, each archetype manifests a distinctive spatial organization pattern that becomes most apparent through foreground characteristics, establishing these as primary indicators for archetype identification. The third tier operationalizes visual characteristics through variables, categorizing perceptual qualities within each spatial layer into four measurable physical attributes. These variables (shape, position, size, and texture) were strategically selected from Bell’s2 comprehensive inventory of eleven variables based on their capacity to capture essential distinguishing characteristics while minimizing subjective observer bias.
Each variable serves a specific analytical function: shape delineates archetypal boundaries and defines spatial enclosure; position reveals hierarchical spatial relationships and compositional strategies; size indicates visual prominence and establishes perceptual hierarchy; and texture differentiates surface treatments and material qualities. The fourth and final tier implements metrics through three complementary quantification methods: distribution trend analysis, absolute value measurement, and relative relationship assessment. These methods collectively transform traditionally experiential interpretations of spatial visual characteristics into verifiable, reproducible data analyses applicable across all spatial layers. This quantitative approach maintains analytical rigor while preserving the phenomenological richness inherent in scenic archetype appreciation.

Fig. 1: Framework of scenic archetype-spatial layer-characteristic-metric.

This diagram illustrates the four-tier hierarchy that transforms abstract scenic archetypes into measurable metrics by decomposing them into spatial layers (foreground, middle ground, and background) and visual variables (shape, size, position, and texture).
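The four tiers can be read as a nested data structure. The following is a minimal sketch of that reading, assuming the four static archetypes analysed later in the study; the class names, the `Measurement` record, and the `measurement_slots` helper are hypothetical illustrations, not part of the published framework, and the study does not state that every combination is actually measured.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Archetype(Enum):
    """Tier 1: scenic archetypes (only the four static ones)."""
    FRAMED = "framed scenery"
    OBSTRUCTIVE = "obstructive scenery"
    POROUS = "porous scenery"
    SANDWICHED = "sandwiched scenery"


class Layer(Enum):
    """Tier 2: spatial layers seen from the observer's eye level."""
    FOREGROUND = "foreground"
    MIDDLE_GROUND = "middle ground"
    BACKGROUND = "background"


class Variable(Enum):
    """Tier 3: the four visual variables selected from Bell's inventory."""
    SHAPE = "shape"
    POSITION = "position"
    SIZE = "size"
    TEXTURE = "texture"


class MetricType(Enum):
    """Tier 4: the three quantification methods."""
    DISTRIBUTION_TREND = "distribution trend"
    ABSOLUTE_VALUE = "absolute value"
    RELATIVE_RELATIONSHIP = "relative relationship"


@dataclass(frozen=True)
class Measurement:
    """One measured value, addressed by its position in the hierarchy."""
    archetype: Archetype
    layer: Layer
    variable: Variable
    metric_type: MetricType
    value: float


def measurement_slots():
    """Enumerate every (archetype, layer, variable, metric type) address."""
    return list(product(Archetype, Layer, Variable, MetricType))
```

Addressing each value by all four tiers keeps a number traceable back to the design concept it is meant to describe, which is the point of the hierarchy.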

Four-tier analytical framework components

The first tier of the proposed framework encompasses scenic archetypes, which represent fundamental spatial organizational principles in traditional Chinese gardens. Since this study aims to quantitatively assess the spatial visual characteristics of scenic archetypes rather than measuring observers’ subjective responses, we focus on archetypes whose inherent spatial properties enable systematic analysis. Based on the nine scenic archetypes identified by Lu and Liu44, scenic archetypes can be fundamentally distinguished by their operational mechanisms: static archetypes that create stable spatial configurations versus dynamic archetypes that unfold through temporal and kinesthetic experiences. This distinction reflects different modes of spatial engagement and perceptual activation. Static archetypes, comprising framed scenery, obstructive scenery, porous scenery, and sandwiched scenery, operate through fixed spatial relationships that maintain consistent visual qualities across viewing positions. These archetypes manifest as stable compositional structures:

  • Framed scenery is composed of four-sided spatial elements, including door frames, window frames, trees, or rocks, which “frame” a specific field of view, producing a “picture frame” effect that highlights the selected scene9,40,41.

  • Obstructive scenery refers to the partial blockage or interruption of lines of sight through specific spatial elements such as buildings, trees, rocks, or walls, creating a visual effect where the field of view is partially hidden, evoking a sense of mystery9,17,41,42.

  • Porous scenery refers to the use of spatial elements such as latticed windows, perforated walls, doorways, railings, or bamboo fences that offer partial glimpses of the field of view through openings, resulting in the visual interplay of concealment and exposure9,17,40.

  • Sandwiched scenery is formed by the placement of spatial elements, such as buildings, trees, rocks, walls, or corridors, on both sides of a field of view. This guides the observer’s sight toward a focal point within the framed scenery, often creating a strong sense of composition and visual direction9,39,40,42.

In contrast, dynamic archetypes, including borrowed scenery, hidden scenery, informed scenery, opposite scenery, and segmented scenery, fundamentally rely on temporal unfolding, bodily movement, or cognitive associations that transcend static spatial configurations9,39,40,44,45. Du and Ji46 illuminate this distinction through their analysis of “farness” experience in Chinese gardens, where perceived depth fluctuates dramatically with movement: spaces appearing shallow from one position reveal unexpected depth from another. This spatial instability characterizes hidden scenery (藏景) and segmented scenery (隔景), which require kinesthetic exploration to fully manifest. Borrowed scenery (借景) exemplifies a different form of dynamism through its dependence on temporal conditions and intentional cognitive processes. As Lu and Liu14 demonstrate, this archetype requires distinguishing deliberate visual connections to distant elements from incidental views. This distinction relies on cultural knowledge and atmospheric variability rather than stable spatial relationships. Similarly, opposite scenery (对景) creates reciprocal viewing relationships requiring physical movement between two points to experience the complete spatial dialogue, while informed scenery (点景) operates through metaphorical associations linking physical forms to literary and philosophical concepts40.

The selection of the four static archetypes for this study emerges from their inherent potential for systematic analysis. Their stable spatial configurations enable the development of reproducible analytical methods, while contemporary computer vision technologies offer unprecedented capabilities to capture and quantify their compositional logic. This technological potential, combined with the archetypes’ fundamental reliance on measurable spatial relationships, creates opportunities to transform traditionally experiential knowledge into explicit analytical frameworks. Despite these advances, a critical gap persists: no systematic framework currently exists to translate the spatial organizational logic of these static archetypes into quantifiable, reproducible analytical standards.

The second tier of our framework comprises spatial layers, which represent the horizontal stratification of scenic archetypes from the observer’s perspective. Each archetype embodies a distinct spatial strategy for directing and modulating visual appreciation through systematic organization of perceptual depth. Spatial layers constitute the visual layout of scenic archetypes as perceived from a horizontal vantage point, specifically from the observer’s eye level during spatial exploration3,9. Within this conceptual framework, the observer’s perception of distance and spatial orientation assumes critical importance and is conventionally divided into three components: foreground, middle ground, and background1,43. Significantly, spatial elements occupying the foreground exert a dominant influence in shaping visual perception and mediating the presentation of middle ground and background elements19,47. Through this mediating function, the foreground facilitates the transformation of spatial elements in subsequent layers from mere physical objects into integral components of coherent scenery, thereby enabling observers to comprehend and appreciate the compositions and configurations of spatial elements through culturally specific modalities19. Consequently, when identifying and differentiating scenic archetypes, their unique spatial visual characteristics derive primarily from the foreground rather than from the middle ground and background. The latter function principally as contextual elements that enrich the overall scenic archetypes. Building upon this understanding, this study proposes semantic mappings for each scenic archetype, based on two groups of spatial layers (the foreground, and the combined middle ground and background) and on the analysis of framed, obstructive, porous, and sandwiched scenery.

  • For framed scenery, the foreground functions as a visual window, directing the observers’ line of sight through defined boundaries (Fig. 2). Beyond this frame, the middle ground forms the primary visual focus through elements such as pavilions or bridges, while the background provides contextual support to complete the scenery.

Fig. 2: Semantic mapping of framed scenery.

The visualization depicts the framing mechanism where the foreground acts as a visual window or boundary, directing the observer's line of sight toward the middle ground focal point.

  • For obstructive scenery, the foreground creates visual barriers that limit direct viewing using elements such as trees or walls (Fig. 3). Both middle ground and background remain concealed behind these obstacles.

Fig. 3: Semantic mapping of obstructive scenery.

The diagram demonstrates the blocking mechanism where foreground elements (such as trees or walls) create visual barriers that partially conceal the middle and background layers to limit direct viewing.

  • For porous scenery, the foreground offers selective views through gaps, creating focused visual corridors (Fig. 4). The middle ground becomes the focal point through these apertures, while the background typically includes natural elements like mountains or the sky.

Fig. 4: Semantic mapping of porous scenery.

The visualization illustrates the leaking mechanism where the foreground offers selective views through apertures or gaps, creating focused visual corridors toward the middle ground.

  • For sandwiched scenery, the foreground uses two major elements to channel views in specific directions (Fig. 5). The middle ground becomes the dominant focus through the visual entrance created by these distinct spatial elements, while the open background extends the perspective.

Fig. 5: Semantic mapping of sandwiched scenery.

The diagram shows the sandwiching mechanism where bilateral foreground elements channel views in specific directions, making the middle ground the dominant focus through a created visual entrance.
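The four semantic mappings above differ mainly in where the foreground sits in the image. As an illustration only, the sketch below encodes that logic as a toy rule set on a binary foreground mask (1 = foreground element, 0 = open view). The function name, the border-strip test, and every threshold are assumptions made for this example; this is not the classification procedure used in the study.

```python
def fraction(rows):
    """Share of cells equal to 1 in a list of rows."""
    cells = [c for row in rows for c in row]
    return sum(cells) / len(cells)


def classify_foreground(mask, strip=0.1, occupied=0.5, min_switches=6, block=0.3):
    """Toy heuristic mapping a foreground mask to an archetype name.

    All parameters are hypothetical: `strip` is the border width as a share
    of the image, `occupied` the coverage needed to call a border occupied,
    `min_switches` the number of solid/open alternations along the middle
    row that signals porosity, `block` the central coverage that signals
    an obstruction.
    """
    h, w = len(mask), len(mask[0])
    sh, sw = max(1, int(h * strip)), max(1, int(w * strip))
    top = fraction(mask[:sh]) >= occupied
    bottom = fraction(mask[-sh:]) >= occupied
    left = fraction([row[:sw] for row in mask]) >= occupied
    right = fraction([row[-sw:] for row in mask]) >= occupied

    mid = mask[h // 2]
    switches = sum(1 for a, b in zip(mid, mid[1:]) if a != b)
    centre = fraction([row[w // 4: 3 * w // 4] for row in mask[h // 4: 3 * h // 4]])

    if switches >= min_switches:
        return "porous"        # many small openings across the view
    if top and bottom and left and right:
        return "framed"        # foreground on all four sides
    if left and right:
        return "sandwiched"    # foreground on both sides only
    if centre >= block:
        return "obstructive"   # foreground blocks the centre of the view
    return None                # ambiguous or transitional view


# A 100 x 100 mask with a 15-pixel solid border on every side.
frame = [[1 if (r < 15 or r >= 85 or c < 15 or c >= 85) else 0
          for c in range(100)] for r in range(100)]
print(classify_foreground(frame))
```

Returning `None` for views that match no rule mirrors the idea that transitional views lack a distinctive spatial configuration.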

The third tier of our framework comprises variables that categorize and measure spatial visual characteristics across the foreground and the combined middle ground and background of scenic archetypes. Following the establishment of spatial layers for each archetype, the selection of appropriate variables becomes essential for systematic measurement. From Bell’s2 comprehensive inventory of eleven visual variables, this study strategically selects four variables: shape, position, size, and texture (Fig. 6). This selection is grounded in three interconnected rationales that ensure both theoretical rigor and methodological feasibility. First, these four variables demonstrate the objectivity and perceptual stability essential for reliable analysis. Visual perception theory establishes shape as the primary characteristic enabling object recognition, with research demonstrating that silhouettes alone suffice for accurate identification48,49. Position functions as the foundation of spatial relationships and represents the most accurately perceived dimension according to Gestalt psychology50. Size and texture correspond respectively to scale perception and surface characteristic recognition, both fundamental to spatial comprehension. In contrast, other variables proposed by Bell2, including color, visual force, and direction, vary excessively with lighting conditions, seasonal changes, and viewing angles in garden contexts, and thus lack the requisite stability for systematic analysis.

Fig. 6: Pattern mapping of four variables.

The figure details the operationalization of shape, size, position, and texture variables, showing how they are extracted from the foreground, middle ground, and background layers for analysis.

Second, these variables align precisely with the spatial organizational principles inherent in traditional Chinese gardens. These gardens achieve specific modes of spatial appreciation through deliberate manipulation of element configuration (shape), dimensional control (size), compositional arrangement (position), and material differentiation (texture)9. This alignment manifests distinctly across archetypes: framed scenery delineates visual fields through shape definition; obstructive scenery modulates sight lines through strategic positioning; porous scenery generates perceptual contrast through textural variation; and sandwiched scenery constructs spatial sequences through size relationships. Such correspondence between analytical variables and design principles ensures that the framework captures authentic spatial logic rather than imposing external categories. Third, computer vision technology has achieved sophisticated capabilities in recognizing and quantifying these specific variables. Semantic segmentation accurately extracts element shapes and boundary conditions27, while depth estimation reliably determines relative spatial positions51. This technological maturity enables systematic data analysis at scales previously unattainable through manual methods. The convergence of theoretical validity and computational feasibility positions these four variables as optimal choices for bridging experiential knowledge and quantitative analysis. Building upon these conceptual foundations, the study operationalizes each variable through specific definitions and measurement protocols.

Shape constitutes the category of spatial visual characteristics generated by element configuration, encompassing the visual appearance of outlines or boundaries that define geometric properties in two-dimensional or three-dimensional space through lines, edges, or surfaces48,49,52. Within scenic archetypes, shape specifically denotes the primary contours and boundaries formed by foreground elements, which assume decisive importance in archetype classification and identification47. The critical nature of foreground shape reflects the fundamental principle that object identity can be conveyed through basic outline alone2.

Size, functioning as a complementary variable to shape, refers to the magnitude, dimension, or scale of spatial elements53. Larger forms generate stronger visual impressions and historically convey power or dominance through physical and psychological presence, while smaller elements create subtler visual impacts, particularly when dispersed2. In scenic archetypes, the foreground employs size variations to enhance stylistic expression and reinforce spatial hierarchies.

Position represents spatial visual characteristics arising from element configuration, specifically denoting coordinates, orientations, and relational arrangements within three-dimensional space54. For scenic archetypes, position describes both the specific locations where elements are composed from particular viewpoints and their relative placements2,26,42,55,56. This variable assumes particular significance in Jiangnan gardens, where spatial penetration and hierarchical variation emerge through careful calibration of element separation and connection.

Texture, serving as a complementary variable to position, encompasses spatial visual characteristics generated by element composition, particularly the effects created by the interplay of obstructed and unobstructed visual elements forming recognizable patterns at finer scales2. This dual functionality proves crucial for archetype differentiation, as texture influences both the quantitative aspects (presence, quantity, distribution, proportion) and qualitative dimensions of spatial complexity.
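To make the four variable definitions more concrete, the sketch below derives one simple proxy per variable from a binary foreground mask (1 = foreground element, 0 = unobstructed view). The proxies are assumptions chosen for illustration: bounding-box aspect and fill for shape, area share for size, normalised centroid for position, and the average number of obstructed/unobstructed alternations per row for texture. They are not the formulations given in Table 1.

```python
def describe_mask(mask):
    """Return illustrative proxies for shape, size, position and texture."""
    h, w = len(mask), len(mask[0])
    cells = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    if not cells:
        return None  # no foreground element in this image

    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    box_h = max(rows) - min(rows) + 1
    box_w = max(cols) - min(cols) + 1
    area = len(cells)

    # Count changes between obstructed and unobstructed cells along each row.
    switches = sum(1 for row in mask for a, b in zip(row, row[1:]) if a != b)

    return {
        "shape_aspect": box_w / box_h,            # bounding-box width / height
        "shape_fill": area / (box_w * box_h),     # how fully the box is filled
        "size_share": area / (h * w),             # foreground share of the image
        "position": ((sum(cols) / area + 0.5) / w,   # centroid x in [0, 1]
                     (sum(rows) / area + 0.5) / h),  # centroid y in [0, 1]
        "texture_switches_per_row": switches / h,
    }


# A 4 x 8 image with a solid 2 x 4 block in the middle.
block = [[1 if (1 <= r <= 2 and 2 <= c <= 5) else 0 for c in range(8)]
         for r in range(4)]
print(describe_mask(block))
```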

The fourth tier operationalizes variables through three complementary metrics that transform experiential interpretations into analyzable data across all spatial layers. Distribution trend metrics employ statistical methods to reveal patterns of spatial variation and compositional dynamics. Absolute value metrics quantify fundamental characteristics including element dimensions, areas, and distances, establishing objective baselines for cross-archetype comparison. Relative relationship metrics utilize ratios and percentages to express proportional analyses, revealing hierarchical relationships and how foreground elements mediate perception of subsequent layers. This tripartite system captures both quantitative measurements and relational logic inherent in scenic archetypes, enabling systematic analysis while preserving the nuanced spatial organizations of traditional garden making. Table 1 presents detailed specifications and mathematical formulations for each metric type.

Table 1 Definition of each metric developed in this study
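The formulations of Table 1 are not reproduced here. As a rough sketch of how the three metric types relate to one another, the example below takes per-image foreground areas in pixels (absolute values), converts them to shares of the image (relative relationships), and summarises those shares across images (distribution trend). The function name, the choice of mean and population standard deviation, and the sample numbers are assumptions for illustration; the 600 × 800 image size follows the standardization described in the image-processing section.

```python
from statistics import mean, pstdev


def metric_summary(foreground_pixels, image_pixels=600 * 800):
    """Illustrate the three metric types on per-image foreground areas."""
    ratios = [fg / image_pixels for fg in foreground_pixels]
    return {
        # Absolute value: the raw areas, in pixels.
        "absolute": list(foreground_pixels),
        # Relative relationship: each area as a share of the whole image.
        "relative": ratios,
        # Distribution trend: how the shares vary across the sample.
        "trend": {
            "mean": mean(ratios),
            "std": pstdev(ratios),
            "min": min(ratios),
            "max": max(ratios),
        },
    }


# Made-up foreground areas for three images, not data from the study.
summary = metric_summary([120_000, 240_000, 360_000])
print(summary["relative"], summary["trend"]["mean"])
```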

Case study site and data collection

With the analytical framework established, we now examine it through empirical application. HWL was selected as the case study for the following reasons: (1) HWL constantly interacts with the contemporary environment, representing the harmonious evolution of human activity and nature15. This dynamic interaction provides evidence and insights into the applicability of scenic archetypes in contemporary landscape spaces. (2) HWL includes several traditional Chinese gardens17, offering a rich dataset of photographic images of scenic archetypes that vary in type, scale, location, and function. (3) As a UNESCO World Heritage Site, information on HWL is publicly available and easily accessible, facilitating data collection and analysis.

Given the absence of Google Street View coverage on HWL pedestrian paths, we collected data through systematic on-site photography using an iPhone 12 Pro, which offers GPS functionality and a 0.5× zoom that matches human visual perception (120° horizontal field of view). We captured images at 20-m intervals perpendicular to the path at a height of 1.7 m; three images were taken per location, each facing a different direction (left, front, and right). All photographs were captured during late September under clear daylight conditions, ensuring consistent image quality and optimal visibility. While seasonal variations might affect vegetation density, our semantic segmentation model (trained on ADE20K with diverse seasonal imagery) maintains robust performance across different vegetation states. More importantly, the spatial visual characteristics of the four scenic archetypes are determined by geometric forms (shape), spatial arrangements (position), and proportional relationships (size), which remain constant across environmental variations. While texture, particularly that of deciduous vegetation, exhibits seasonal variations, these changes occur within predictable ranges that do not alter the archetype’s essential spatial structure. The framework captures the underlying organizational principles rather than ephemeral surface qualities.
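The sampling protocol can be sketched as follows, assuming the path is available as a polyline in local planar coordinates measured in metres. The helper names and the 90° offset for the left and right views are assumptions; the text specifies only the 20-m spacing and the three viewing directions.

```python
import math


def sample_stations(path, spacing=20.0):
    """Return (x, y, heading_deg) every `spacing` metres along a polyline.

    Heading is the direction of travel in degrees, with 0 along +y and
    values increasing clockwise.
    """
    stations = []
    next_at = 0.0    # distance along the path of the next station
    travelled = 0.0  # distance covered by the segments already processed
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue  # skip duplicated vertices
        heading = math.degrees(math.atan2(x1 - x0, y1 - y0)) % 360
        while next_at <= travelled + seg:
            t = (next_at - travelled) / seg
            stations.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0), heading))
            next_at += spacing
        travelled += seg
    return stations


def camera_headings(heading, offset=90.0):
    """Three view directions per station; the 90-degree offset is assumed."""
    return {
        "left": (heading - offset) % 360,
        "front": heading % 360,
        "right": (heading + offset) % 360,
    }


# An L-shaped path: 50 m along +y, then 50 m along +x.
for x, y, heading in sample_stations([(0, 0), (0, 50), (50, 50)]):
    print(round(x, 1), round(y, 1), camera_headings(heading))
```

In practice the GPS positions recorded by the phone would replace the computed coordinates; the sketch only shows how a fixed spacing and three headings translate into a list of capture points.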

From the comprehensive dataset of 1045 photographs, this study selected 168 images representing four scenic archetypes: 54 framed scenery, 20 obstructive scenery, 41 porous scenery, and 53 sandwiched scenery. The remaining photographs were excluded due to their ambiguous classification, as they typically represented transitional views lacking the distinctive spatial configurations that characterize traditional design patterns. The selected images ensure comprehensive coverage of all identifiable scenic archetypes along West Lake paths while demonstrating clear spatial stratification across foreground, middle ground, and background (Fig. 7). The uneven sample distribution reflects the actual prevalence of scenic archetypes at HWL rather than sampling bias. This distribution, particularly the lower frequency of obstructive scenery, reflects both historical design preferences and contemporary landscape modifications. Each scenic archetype was analyzed independently to reveal its specific spatial visual characteristics. This approach not only aligns with our research goal of characterizing distinct design patterns but also mitigates potential biases arising from the uneven sample distribution, as each archetype’s characteristics are identified without reference to the prevalence of other types.
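The sample composition follows directly from the counts reported above. The short calculation below uses only those counts; the variable names are illustrative.

```python
counts = {"framed": 54, "obstructive": 20, "porous": 41, "sandwiched": 53}
total_photos = 1045

selected = sum(counts.values())    # images retained for analysis
excluded = total_photos - selected  # ambiguous or transitional views

# Share of each archetype within the selected images, in percent.
shares = {name: round(100 * n / selected, 1) for name, n in counts.items()}
# Share of the full photo set that was retained, in percent.
retained = round(100 * selected / total_photos, 1)

print(selected, excluded, retained)
print(shares)
```

The selected images are roughly one sixth of the photographs taken, and obstructive scenery accounts for about 12% of them, which is the imbalance the paragraph above addresses.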

Fig. 7: Data collection for four scenic archetypes.

The left panel displays the visiting route and sampling points mapped at Hangzhou Westlake; the right panel provides representative photographic examples for each of the four collected scenic archetypes.

Image processing and mapping

Following data collection, image processing operationalizes the analytical framework through computational measurement. The four-tier framework (Fig. 1) establishes what spatial visual characteristics require measurement, specifically the decomposition of scenic archetypes into foreground-middle-background layers characterized by shape, size, position, and texture variables, while computational tools execute these specifications through systematic mapping protocols. This distinction between characteristic specification (framework-determined) and characteristic extraction (tool-executed) positions the study’s contribution in analytical logic transferable across multiple computational implementations, thereby prioritizing replicability over algorithm-specific reproducibility. This methodological stance addresses the distinction between reproducibility (exact numerical replication with identical tools) and replicability (comparable pattern identification with transferable logic), prioritizing the latter as appropriate for design pattern research. Each image underwent processing according to the framework’s hierarchical structure to generate three types of spatial visual characteristic units (Fig. 8): element mapping, openness mapping, and depth mapping. All images were standardized to 600 × 800 pixels at 72 dpi prior to processing.

Fig. 8: Data pre-processing based on the framework.

This flowchart outlines the image processing pipeline, including the generation of element, openness, and depth mappings, followed by a two-stage correction process to refine foreground and background masks.

Element mapping employed PSPNet with ResNet-101 backbone for initial semantic segmentation, pre-trained on ADE20K57. This architecture was selected for its demonstrated robustness in complex scene understanding, achieving 41.96% mIoU and 80.64% pixel accuracy on ADE20K validation, representing 4.73% absolute mIoU improvement over ResNet-50 baseline27. The pyramid pooling module’s multi-scale context aggregation proves particularly effective for traditional gardens, where elements span multiple scales. However, automated segmentation exhibits systematic limitations: vegetation overlap produces boundary ambiguity, shadow variation reduces contrast discrimination, and irregular organic surfaces challenge recognition algorithms. We therefore implemented interactive boundary refinement using SAM ViT-H variant58, selected for its exceptional zero-shot generalization across 16 of 23 evaluation datasets and superior mask quality (7–9/10 ratings in human assessments)58. This human-in-the-loop mechanism positions artificial intelligence as supervised augmentation: when PSPNet generated indeterminate boundaries, particularly at vegetation-architecture interfaces or shadow-obscured regions, manual point-click intervention through SAM overrode automated outputs. This refinement additionally incorporated six culturally-specific categories absent from ADE20K’s universal taxonomy (herbaceous plants, aquatic plants, lawn, embankment, architectural inscriptions), yielding 34 semantic classes balancing automated efficiency with supervised accuracy.

Openness mapping incorporated cross-modal validation to ensure data quality. Element maps received binary occlusion labels indicating each component’s contribution to visual permeability, directly supporting texture variable calculation. We implemented cross-modal validation that systematically exploits redundancy across independent data sources to detect processing errors. Element maps, openness maps, and depth maps must maintain logical consistency; violations signal errors undetectable through single-modality analysis. For instance, an element labeled ‘tree’ in semantic segmentation yet appearing transparent in openness mapping indicates classification error. Similarly, depth orderings contradicting element positions reveal spatial inconsistencies. When such contradictions emerged, manual inspection and correction were triggered. This validation mechanism transforms multiple mapping outputs from parallel data streams into mutually reinforcing error-detection architecture.
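The consistency rule described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the class names in `OCCLUDING_CLASSES`, the 0/1 openness convention, and the function name `find_inconsistencies` are all assumptions introduced here.

```python
# Hypothetical set of classes expected to block the view (illustration only).
OCCLUDING_CLASSES = {"tree", "wall", "building", "rockery"}


def find_inconsistencies(element_map, openness_map):
    """Flag pixels whose semantic class should occlude the view but whose
    openness label says the view is open.

    element_map  : 2D list of class-name strings
    openness_map : 2D list of 0 (enclosed) or 1 (open); assumed convention
    Returns a list of (row, col, label) tuples for manual inspection.
    """
    if len(element_map) != len(openness_map):
        raise ValueError("maps must have the same number of rows")
    flagged = []
    for r, (elem_row, open_row) in enumerate(zip(element_map, openness_map)):
        if len(elem_row) != len(open_row):
            raise ValueError(f"row {r} differs in length between maps")
        for c, (label, is_open) in enumerate(zip(elem_row, open_row)):
            if label in OCCLUDING_CLASSES and is_open == 1:
                flagged.append((r, c, label))
    return flagged


element_map = [
    ["sky", "tree", "tree"],
    ["water", "wall", "tree"],
]
openness_map = [
    [1, 0, 1],
    [1, 0, 0],
]
flagged = find_inconsistencies(element_map, openness_map)
print(flagged)  # [(0, 2, 'tree')]
```

A comparable check could compare depth orderings against element positions; it is omitted here because the source does not specify how that comparison was made.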

Depth mapping employed ordinal validation to ensure perceptual accuracy. MiDaS v3.1 DPT-Hybrid generated relative depth maps, selected for its robust zero-shot cross-dataset transfer capability (36% relative error reduction versus v3.0 baseline)51. Critically, MiDaS performs relative depth estimation, outputting ordinal depth relationships (0=nearest, 1=farthest) appropriate for phenomenological requirements: foreground-middle-background stratification emerges from perceptual depth experience rather than absolute measurements, as demonstrated in visual landscape analysis43. The model’s demonstrated generalization across diverse scene types, from indoor spaces to outdoor scenarios to general web imagery, ensures reliability for varied spatial compositions in traditional gardens51. We validated MiDaS outputs against visual inspection to ensure generated depth orderings matched phenomenological perception, with implausible stratification at occlusion boundaries receiving manual resolution. These protocols embody a fundamental methodological principle: artificial intelligence augments analytical efficiency while human oversight ensures interpretive reliability, treating computational models as supervised assistants requiring critical evaluation rather than autonomous systems.

Upon completion of initial mapping procedures, these AI-generated mappings underwent division into discrete mapping units to facilitate calculation of spatial visual characteristics across foreground, middle ground, and background layers. This critical segmentation ensures dedicated input data for each metric while preventing overlaps or interactions between different spatial visual characteristics, thereby maximizing measurement precision. The specific procedural steps encompassed converting color depth maps to grayscale using the standard Inferno color map and delineating preliminary foreground, middle ground, and background masks through the natural breaks classification method in ArcGIS, thus generating area mappings. To refine these preliminary results, we implemented a two-stage correction protocol designed to enhance area mapping accuracy. First, we removed openness elements located in the foreground and enclosed elements in the middle and background that properly belonged to the foreground, based on openness mapping data. Second, we adjusted element boundaries between foreground, and middle ground and background according to element mapping information. In the final processing phase, element, openness, and depth mappings were intersected with area mappings to generate definitive mapping units. Through this systematic approach, three mapping units were produced for both the foreground, and the middle ground and background of each image: element mapping, openness mapping, and depth mapping, with each map corresponding to at least one analytical metric.
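To illustrate how a natural breaks classification can split relative depth values into three layers, the sketch below exhaustively searches for the two cut points that minimise total within-class squared deviation, which is the criterion behind Jenks natural breaks. It is not the ArcGIS implementation used in the study; the function names and the sample depth values are assumptions, and the depth convention (0 = nearest, 1 = farthest) follows the text.

```python
def sum_sq_dev(values):
    """Sum of squared deviations from the class mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks_3(values):
    """Return the upper bounds of the first two of three classes.

    Brute-force search over every pair of cut positions; adequate for a
    small sample of depth values, too slow for full-resolution images.
    """
    data = sorted(values)
    n = len(data)
    if n < 3:
        raise ValueError("need at least three values")
    best = None
    for i in range(1, n - 1):        # first class  = data[:i]
        for j in range(i + 1, n):    # second class = data[i:j], third = data[j:]
            cost = (sum_sq_dev(data[:i])
                    + sum_sq_dev(data[i:j])
                    + sum_sq_dev(data[j:]))
            if best is None or cost < best[0]:
                best = (cost, data[i - 1], data[j - 1])
    return best[1], best[2]


def label_layer(depth, breaks):
    """Assign a depth value to a spatial layer using the two break values."""
    near_max, mid_max = breaks
    if depth <= near_max:
        return "foreground"
    if depth <= mid_max:
        return "middle ground"
    return "background"


# Made-up relative depth samples (0 = nearest, 1 = farthest).
depths = [0.05, 0.08, 0.10, 0.45, 0.50, 0.52, 0.90, 0.95]
breaks = natural_breaks_3(depths)
layers = [label_layer(d, breaks) for d in depths]
print(breaks)  # (0.1, 0.52)
```

The resulting masks would still be preliminary; the two-stage correction described above adjusts them using the openness and element mappings.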

Data measurement and statistical analysis

Following image processing and mapping, data analysis proceeded through two distinct phases: measurement and assessment. During the measurement phase, twelve proprietary mapping algorithms developed for this study processed the four variables of shape, size, position, and texture. Notably, only the texture variable applies to all spatial layers (foreground, middle ground, and background), while the remaining variables were developed exclusively for foreground analysis. The operational architecture of these algorithms comprised four integrated components: mapping unit reading, characteristic extraction, spatial calculation, and result output (Fig. 9). First, the mapping unit reading module converts the pixel-based spatial visual characteristic units into arrays, generating the raw data used for subsequent calculations. Next, the characteristic extraction module extracts the specific values required for metric calculations. Then, the spatial calculation module analyzes the distribution, absolute values, and relative relationships of these values to generate results. Finally, these results are converted into two primary outputs in the result output module: (1) precise and unique values, and (2) output mappings that explain the computational process.

Fig. 9: Operational modules of the algorithms used to generate the metrics.

The diagram visualizes the four integrated components of the mapping algorithms: mapping unit reading, characteristic extraction, spatial calculation, and result output.
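The four-module structure can be sketched as a small pipeline. The example below is an assumption-laden illustration: it computes a simple area ratio over a binary mask, which is a hypothetical stand-in and not one of the twelve algorithms developed for the study. All function names are invented for this sketch.

```python
def read_mapping_unit(rows):
    """Module 1: convert a pixel grid (strings of '0'/'1') into an array."""
    return [[int(ch) for ch in row] for row in rows]


def extract_characteristic(array, target=1):
    """Module 2: extract the values the metric needs (target count, total)."""
    total = sum(len(row) for row in array)
    hits = sum(1 for row in array for v in row if v == target)
    return hits, total


def spatial_calculation(hits, total):
    """Module 3: compute a relative value, here a plain area ratio."""
    if total == 0:
        raise ValueError("empty mapping unit")
    return hits / total


def result_output(array, value, target=1):
    """Module 4: return the numeric value and an output mapping that
    shows which pixels were counted."""
    overlay = ["".join("#" if v == target else "." for v in row)
               for row in array]
    return {"value": round(value, 4), "mapping": overlay}


def run_metric(rows):
    """Chain the four modules for one mapping unit."""
    array = read_mapping_unit(rows)
    hits, total = extract_characteristic(array)
    return result_output(array, spatial_calculation(hits, total))


# Made-up 3 x 4 foreground mask (1 = foreground pixel).
foreground_mask = ["1100", "1000", "1001"]
result = run_metric(foreground_mask)
print(result["value"])    # 0.4167
print(result["mapping"])  # ['##..', '#...', '#..#']
```

Keeping the output mapping alongside the value mirrors the text's point that each result should remain traceable to the pixels that produced it.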

To complete our analytical framework, during the assessment phase, various statistical methods were used to conduct an in-depth analysis of the measurement results, including descriptive statistical analysis, importance analysis, correlation analysis, and element contribution analysis. For basic characterization, descriptive statistics were used to summarize the basic trends and patterns in the numerical representations of the four scenic archetypes, focusing on measures of central tendency, dispersion, and distribution. For more advanced analysis, importance analysis was conducted using a random forest algorithm that transformed the identification of the target scenic archetypes into a binary classification problem59,60. This method assessed the importance of each characteristic in improving model performance, identifying the most important characteristics in terms of classifying the different scenic archetypes. Additionally, Spearman’s correlation coefficient, a non-parametric statistical method suitable for analyzing small samples, non-linear relationships, and datasets with outliers, was used to evaluate correlations between composite items in the dataset61,62. To complete our comprehensive analysis, element contribution analysis was conducted by comparing the original measurement results with the results obtained after removing the target element, which quantified the contribution of the target element to a specific composite item.
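As a minimal sketch of the correlation step, Spearman’s coefficient can be computed as Pearson’s correlation applied to average ranks. The implementation below uses only the standard library; the function names and the two sample metric series are made up for illustration and are not study data.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values: monotone but strongly non-linear relationship.
metric_a = [0.12, 0.35, 0.33, 0.80, 0.41]
metric_b = [1.0, 9.0, 4.0, 100.0, 16.0]
rho = spearman(metric_a, metric_b)
print(round(rho, 3))  # 1.0
```

Because only ranks enter the calculation, the outlying value 100.0 has no more influence than any other observation, which is the property that makes the coefficient suitable for small samples with outliers.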

Results

Examining the hierarchical structure of the framework

The hierarchical structure of the proposed framework, from scenic archetypes to metrics, was examined using descriptive statistical analysis and importance analysis via random forest classification. First, importance analysis was used to evaluate the conversion of scenic archetypes into spatial layers; the results showed that the framework exhibited high recognition accuracy for the different scenic archetypes, achieving an overall accuracy of 94.12% and a weighted F1‑score of 0.94. This validates the division of scenic archetypes into the foreground, and the middle ground and background. Specifically, random forest models used in the importance analysis exhibited recognition accuracy of 97.06% for framed scenery, 91.18% for obstructive scenery, 97.06% for porous scenery, and 100% for sandwiched scenery, supporting the framework’s theoretical premise that “the foreground plays a dominant role by shaping the observer’s view and mediating the presentation of middle and background elements.” Second, upon conversion from spatial layers to variables, the random forest models revealed that the most influential spatial visual characteristics were concentrated in the foreground across all four variables: shape (e.g., S_I contributing ≈ 31.5%), size (e.g., S_ADI contributing ≈ 12.1% and S_VFR contributing ≈ 9.9%), position (e.g., P_LV ≈ 24.3% in obstructive scenery), and texture (e.g., T_IVI contributing ≈ 9.3% and T_ISI ≈ 8.7%). These foreground spatial visual characteristics collectively contributed over 70% of total importance, validating the premise that “the foreground facilitates the transformation of spatial elements in the middle ground and background from mere ‘objects’ into integral parts of the ‘scenery.’” Finally, when converting from variables to metrics, descriptive statistical analysis with 95% confidence intervals revealed that 55.36% of metrics displayed a near‑normal distribution (absolute K-value < 10, absolute Sk < 3; see refs. 63,64), while 33.93% exhibited a positive skew, indicating that metrics developed based on distribution trends, absolute values, and relative relationships can reliably capture the spatial visual characteristics of each scenic archetype. Additionally, significant differences in the mean values (with non-overlapping confidence intervals) of identical metrics across scenic archetypes highlighted the ability of metrics to effectively identify and distinguish different scenic archetypes. Together, these findings demonstrate that the framework’s hierarchical structure is capable of converting abstract concepts into specific, quantifiable metrics.
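The near-normal screening rule (absolute kurtosis below 10, absolute skewness below 3) can be sketched as follows. The source does not state which estimators were used, so this example assumes population moments and excess kurtosis; the function names and sample values are illustrative only.

```python
def central_moments(values):
    """Return the 2nd, 3rd and 4th population central moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("moments undefined for constant input")
    return m2, m3, m4


def skewness(values):
    """Population skewness: m3 / m2^(3/2)."""
    m2, m3, _ = central_moments(values)
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2^2 - 3 (assumed definition)."""
    m2, _, m4 = central_moments(values)
    return m4 / m2 ** 2 - 3


def is_near_normal(values, max_abs_kurtosis=10, max_abs_skew=3):
    """Apply the thresholds quoted in the text to one metric's values."""
    return (abs(excess_kurtosis(values)) < max_abs_kurtosis
            and abs(skewness(values)) < max_abs_skew)


symmetric = [1, 2, 3, 4, 5]        # made-up, evenly spread values
heavy_tail = [0] * 99 + [100]      # made-up, one extreme observation

print(is_near_normal(symmetric))   # True
print(is_near_normal(heavy_tail))  # False
```

Under these assumed definitions, a single extreme observation among 100 values is enough to push skewness above the threshold of 3.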

Summarizing and characterizing four scenic archetypes

Descriptive statistical analysis was combined with correlation analysis and element contribution analysis to characterize each scenic archetype based on their position, size, shape, and texture (Figs. 10 and 11).

Fig. 10: Correlation heatmap of metrics of four scenic archetypes.

a Framed scenery, b obstructive scenery, c porous scenery, and d sandwiched scenery.

Fig. 11: Line chart of contribution with weight by category for metrics of four scenic archetypes.

a Framed scenery, b obstructive scenery, c porous scenery, and d sandwiched scenery.

Framed scenery is characterized by three interconnected aspects: foreground framing, focal point formation, and spatial hierarchy enhancement. The results of the correlation analysis are detailed in Fig. 10a, and the results of the descriptive statistical analysis are presented in Table 2. The results show that the foreground effectively frames the view through its carefully orchestrated physical attributes. The analysis shows that the foreground maintains regular outlines (S_ERI) with smooth edge continuity (P_ECI), creating a stable framing structure. This framing structure is strategically positioned, starting near the observer’s viewpoint (P_LN) and extending to sufficient depth (P_LV), with a significant positive correlation between the starting position and depth transition (r = 0.656, p < 0.01). The foreground occupies a substantial portion of the field of view (S_VFR) with an optimal area distribution (S_ADI). This framing structure effectively directs attention to create a clear visual focal point. This is achieved through deliberate textural contrasts between the foreground and the middle ground and background. The foreground exhibits enclosed, simple characteristics (T_ESI, T_ISI, T_ER), while the middle ground and background exhibit greater openness and diversity (T_ESI, T_ISI, T_ER). The effectiveness of focal point creation is further supported by significant correlations between regular edge shapes and positioning (r = 0.331, p < 0.05) as well as between texture intervals and area distribution (r = −0.326, p < 0.05), indicating coordinated visual guidance. Finally, the spatial hierarchy is enhanced through sophisticated layering mechanisms. The analysis reveals consistent texture variations (T_IVI autocorrelation = 0.287, p < 0.05) that strengthen the hierarchical contrast in space (T_ESI and P_LN correlation: r = 0.594, p < 0.01). 
The spatial structure maintains stability despite complex texture variations (T_ISI and T_IVI correlation: r = 0.699, p < 0.01), while area distribution and edge continuity complement each other in reinforcing spatial depth (S_ADI and P_ECI correlation: r = −0.66, p < 0.01). This hierarchical organization is materially supported by specific spatial elements, including trees and walls, which exhibit strong and centralized contributions (mean r > 0.3) across multiple foreground metrics (Fig. 11a).

Table 2 Descriptive statistical analysis of each metric and their interpretations in the context of framed scenery

These measured correlations align with spatial organization principles documented in traditional Chinese garden treatises and recent empirical validations. The positive correlation between foreground shape regularity (low S_ERI) and smooth depth transitions (high P_ECI, r = 0.656, p < 0.01) observed at HWL can be interpreted through the lens of principles articulated in Ji40, though not in these modern terms. Ji Cheng emphasized that architectural openings such as door and window frames should “collect fine views while excluding mundane sights” (佳境宜收,俗尘安到). The conceptual foundation of framed scenery as a deliberate compositional device was further developed by Li41, who described creating “frameless paintings” (无心画) by positioning paper borders around window openings to transform architectural apertures into pictorial frames (Fig. 12a). This historical precedent demonstrates that designers understood framing not merely as structural necessity but as a sophisticated tool for controlling visual perception and spatial depth—principles that manifest in contemporary West Lake gardens through systematic foreground geometry (Fig. 12b).

Fig. 12: Framed scenery: historical precedent and case manifestation.

a Historical illustration from Li41 depicting the conceptual origin of “frameless painting” (无心画), where architectural apertures transform scenery into composed views. b Rectangular stone window frame at HWL demonstrating systematic structuring of depth perception through foreground geometry.

While Ji Cheng did not employ geometric or perceptual psychology terminology, as his philosophy centered on “skillful borrowing and appropriate adaptation” (巧于因借,精在体宜) and organic flexibility rather than geometric regularity, contemporary research has validated that framing elements do systematically structure visual depth perception. Experimental studies demonstrate that frame positioning, rather than frame geometry alone, determines depth perception in framing contexts65. The significant relationship between foreground positioning, denoted as P_LN, and depth extension, denoted as P_LV, reflects what traditional texts described as systematic spatial layering40,41, where designers deliberately positioned framing elements to create hierarchical views. Contemporary computational analyses confirm that buildings function as primary mechanisms for controlling views and creating framed scenery, with Building Visual Index exerting the strongest influence on visual complexity in garden spaces (β = 0.683, p < 0.05)66. Thus, the statistical patterns observed in framed scenery at HWL reflect not incidental geometric arrangements but intentional compositional strategies encoded in traditional design knowledge, though expressed through cultural vocabularies distinct from modern spatial analysis terminology, and validated through both historical documentation and contemporary empirical research.

Obstructive scenery is characterized by three interrelated aspects: visual barrier creation, sight line guidance, and spatial hierarchy enhancement. The results of the correlation analysis are detailed in Fig. 10b, and the results of the descriptive statistical analysis are presented in Table 3. In obstructive scenery, the foreground effectively establishes a visual barrier through precise control of its physical attributes. Analysis reveals that the foreground maintains distinct outlines (S_ECI) with smooth depth transitions (P_ECI) to create a clear obstruction. This obstruction is strategically configured through optimal area distribution (S_ADI) while maintaining moderate visual field occupation (S_VFR). Furthermore, there is a significant negative correlation between area and visual field ratio (r = −0.642, p < 0.01) ensuring effective visual concentration. The obstruction is positioned closer to the observer (P_LN) at a controlled depth (P_LV), demonstrating a strong position–depth correlation (r = 0.912, p < 0.01) for precise obstruction control. This visual barrier systematically guides the observer’s line of sight through deliberate textural manipulation. The foreground exhibits simplified texture characteristics (T_ISI, T_ER, T_ESI); this is in contrast to the richer middle ground and background textures (T_ISI, T_ER, T_ESI), which exhibit distinct variation patterns (T_IVI). This guidance is reinforced through coordinated relationships between textural elements and spacing (r = 0.728, p < 0.01), textural variation and intervals (r = 0.566, p < 0.01), as well as depth transitions and positioning (P_LV and P_LN with P_ECI: r = 0.55 and 0.531 respectively, p < 0.05). 
Finally, spatial hierarchy is enhanced through sophisticated visual and spatial mechanisms: The analysis reveals systematic relationships between visual field proportion and spatial depth (r = 0.778, p < 0.01) and positioning (r = 0.695, p < 0.01), which are indicative of coordinated spatial progression. Texture intervals are found to significantly influence visual guidance (T_ISI and S_ADI correlation: r = −0.655, p < 0.01), while precise depth control ensures a smooth spatial transition. This hierarchical organization is materially supported by specific spatial elements, such as trees, walls, and plants (primarily shrubs and tall herbaceous species), which all have a strong and centralized contribution across multiple foreground metrics (Fig. 11b).

Table 3 Descriptive statistical analysis of each metric and their interpretations in the context of obstructive scenery

These measured spatial patterns, particularly the deliberate textural contrasts between simplified foreground elements and enriched middle-ground/background compositions, correspond to traditional design principles documented in traditional treatises such as Li41, which emphasizes the strategic placement of screening elements such as trees, walls, and rockeries to control visual sequences and create progressive spatial revelation. Historical visual evidence illuminates how these principles operated in practice: Zhang67 systematically depicts rockeries as foreground obstructions, as shown in Fig. 13a, where the monumental Taihu stone formation exemplifies the measured positioning control denoted as P_LN and depth management indicated by P_LV identified in our statistical analysis. The rockery’s textural complexity characterized by high T_ER against simplified surrounding vegetation demonstrates the same foreground-background contrast measured through T_ESI and T_ISI indices that our data reveal as characteristic of obstructive scenery. This compositional strategy persists in contemporary practice: at HWL, Taihu stone rockeries continue to function as carefully positioned visual barriers, as illustrated in Fig. 13b, their placement near the observer with controlled depth extension creating the “conceal-then-reveal” effect termed (先藏后露) that defines obstructive design.

Fig. 13: Obstructive scenery: historical precedent and case manifestation.

a Historical illustration from Zhang67 showing monumental Taihu stone formation as foreground visual barrier in Zhiyuan Garden Album (Ming Dynasty). b Taihu stone rockery at HWL maintaining the traditional role of foreground visual barrier.

The strong correlation between foreground positioning and depth control (P_LN and P_LV, r = 0.912, p < 0.01) observed in our data quantitatively validates what traditional treatises describe qualitatively as “systematic obstruction” (障). HWL examples demonstrate that effective obstructive scenery requires precise calibration: the obstruction must be close enough, reflected in low P_LN values, to command attention yet allow sufficient depth variation through P_LV to guide visual exploration around its edges. The textural hierarchy we measured, showing simplified foreground through T_ISI, T_ER, and T_ESI against enriched backgrounds, aligns with the visual strategy evident in both Zhang67 and HWL, where screening elements heighten anticipation through deliberate contrast. Recent computational analyses confirm that while excessive enclosure reduces dwell duration (β = −0.789, p < 0.001), strategic screening creates offset-alternating synergies that enhance spatial experience67. These convergences between our statistical findings from HWL, Ming Dynasty garden albums, and traditional design treatises demonstrate that the spatial patterns characterizing obstructive scenery reflect centuries-refined compositional strategies rather than incidental arrangements.

Porous scenery is characterized by three interconnected aspects: partial perspective exhibition, balanced spatial enclosure, and spatial hierarchy enhancement. The results of the correlation analysis are detailed in Fig. 10c, and the results of the descriptive statistical analysis are presented in Table 4. In porous scenery, the foreground effectively creates partial perspective openings through sophisticated control of its physical attributes. Analysis reveals that the foreground maintains complex and diverse outlines (S_PSI) with significant depth variations (P_ECI) that help establish a sophisticated porous structure. The porous structure is strategically positioned close to the observer (P_LN) and occupies a substantial portion of the visual field (S_VFR) while maintaining a balanced influence on the middle ground and background (S_ADI). Its considerable depth extension (P_LV) further supports the systematic revelation of external scenery through these openings. This design achieves a delicate balance between openness and enclosure through deliberate textural manipulation. The foreground exhibits uniform textural characteristics (T_ESI) with varied openness patterns (T_IVI) and relative enclosure (T_ISI), which contrasts with the more fragmented (T_ESI), stable (T_IVI), and open (T_ISI) middle ground and background sections. This balance is reinforced through significant correlations between foreground segmentation and texture variation (r = 0.492, p < 0.01), as well as negative relationships between the visual field ratio and both textural similarity (r = −0.598, p < 0.01) and segmentation (r = −0.384, p < 0.05), indicating a systematic regulation of visuals through these perspective openings. Spatial hierarchy in this scenic archetype is enhanced through sophisticated layering mechanisms. 
The analysis reveals continuous textural variations (T_IVI autocorrelation = 0.341, p < 0.05) that are coordinated with depth transitions (T_IVI and P_ECI correlation: r = 0.472, p < 0.01) to ensure smooth spatial progression. The foreground elements remain relatively simple (T_ER) compared to the richer middle ground and background sections (T_ER), creating a clear visual distinction. This hierarchical organization is materially supported by specific landscape elements, such as trees, grass, walls, buildings, and columns, which all have a strong contribution across multiple foreground metrics (Fig. 11c).

Table 4 Descriptive statistical analysis of each metric and their interpretations in the context of porous scenery

The balanced spatial configuration and textural variations characterizing porous scenery correspond to the emptiness-substance (虚实) principle systematically articulated in traditional garden theory. This principle found systematic codification in Ji40, which documented canonical window designs embodying “adjacency to emptiness everywhere, framed views in every direction” (处处邻虚,方方侧景), specifically the strategic placement of porous openings to create “seemingly separated yet not separated” (似隔非隔) spatial relationships (Fig. 14a). For example, the floral lattice window at HWL (Fig. 14b) demonstrates the enduring application of these design principles, where ornamental geometry functions as both aesthetic object and spatial filter to achieve the characteristic “half-transparent” effect. Song et al.68 document how traditional designers employed “substance within emptiness” and “emptiness within substance” to create complementary pairings: architecture with trees and water bodies, rockeries with vegetation and water, creating variations of density and sparseness that highlight layered spatial structures. The significant negative correlations between visual field ratio and both textural similarity (r = −0.598, p < 0.01) and segmentation (r = −0.384, p < 0.05) observed in our data reflect this principle’s practical implementation, whereby designers systematically regulated visual access through openings to balance enclosure and revelation. The complex foreground outlines (high S_PSI) combined with varied depth transitions (high P_ECI) align with the documented design goal of creating “limitless vision, endless recurrence” through strategic arrangement of sightlines and pathways69. Space syntax analyses confirm that traditional designers used doorways and windows to implicitly frame scenic moods, creating clustering centers positioned to maintain attractive scenery while requiring multiple turns for complete spatial perception70. 
The statistical patterns in porous scenery observed at HWL thus validate the traditional design strategy of achieving spatial richness through controlled permeability rather than uniform openness or complete enclosure; this principle has now been empirically confirmed through contemporary computational heritage studies.

Fig. 14: Porous scenery: historical precedent and case manifestation.

a Historical window lattice designs from Ji40 showing Two-Section Style (两截式) and Ice-Crack Pattern (冰裂纹). b Floral lattice window at HWL demonstrating contemporary implementation of the “half-transparent” filtering effect.

Sandwiched scenery is characterized by three interconnected aspects: structural clamping formation, bilateral visual guidance, and spatial hierarchy enhancement. The results of the correlation analysis are detailed in Fig. 10d, and the results of the descriptive statistical analysis are presented in Table 5. In sandwiched scenery, the foreground effectively establishes clamping structures through the precise control of its physical attributes. The analysis reveals that the foreground maintains highly symmetrical outlines (S_API) with dynamic depth transitions (P_ECI) to create a stable sandwiching structure. This sandwiching structure is strategically positioned close to the observer (P_LN) and demonstrates optimal area distribution (S_ADI) while occupying a moderate visual field (S_VFR). The systematic formation of sandwiching structures is further supported by its considerable depth extension (P_LV). Bilateral elements in the scene effectively guide visual attention through deliberate textural manipulation. The foreground exhibits enclosed characteristics (T_ISI) with consistent morphological variations (T_IVI), simple and uniform textures (T_ESI), and fewer elements (T_ER); this is in contrast to the more open (T_ISI), similarly varied (T_IVI), fragmented (T_ESI), and rich (T_ER) middle ground and background regions. This guidance is reinforced through significant correlations between symmetrical forms and depth variation (r = 0.291, p < 0.05), element richness and spatial hierarchy (r = 0.434, p < 0.01), and enclosure and visual concentration (r = 0.49, p < 0.01). Spatial hierarchy is enhanced through sophisticated organizational mechanisms. Specifically, the analysis reveals systematic relationships between visual field proportion and several variables, including spatial depth (r = 0.549, p < 0.05), element richness (r = 0.532, p < 0.01), and area distribution (r = −0.588, p < 0.01), indicating coordinated spatial progression. 
Morphological variations exhibit significant consistency (T_IVI correlation: r = 0.764, p < 0.01), with element richness being positively correlated with morphological variation (r = 0.532, p < 0.01) and negatively correlated with area distribution (r = −0.71, p < 0.01). This hierarchical organization is materially supported by specific spatial elements, including aquatic plants, trees, plants (primarily shrubs and tall herbaceous species), and windowpanes, all of which contribute strongly and centrally across multiple foreground metrics (Fig. 11d).
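The pairwise coefficients reported above can be recomputed from per-view metric values. The following is a minimal sketch, assuming that each metric is computed once per photographed view and that the reported r values are Pearson coefficients (the text here does not name the coefficient). The sample values and the names `pearson_r`, `t_statistic`, `vfr`, and `er` are hypothetical and are not the study's data or code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of metric values."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t statistic is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-view values: visual field ratio and element richness.
vfr = [0.20, 0.35, 0.30, 0.50, 0.45]
er = [3, 5, 4, 8, 6]

r = pearson_r(vfr, er)
print(f"r = {r:.3f}, t = {t_statistic(r, len(vfr)):.3f}")
```

The t statistic would then be compared against a Student's t distribution with n − 2 degrees of freedom to decide which significance threshold (for example, p < 0.05 or p < 0.01) a coefficient meets.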

Table 5 Descriptive statistical analysis of each metric and their interpretations in the context of sandwiched scenery

The bilateral symmetry and channeling characteristics defining sandwiched scenery reflect design principles embedded in traditional Chinese garden-making practice. While Ming-dynasty treatises such as Wen71 emphasized aesthetic principles of spatial rhythm, specifically “positioning with balance between density and sparseness” (位置疏密), and Ji40 systematically articulated spatial sequencing through borrowed scenery techniques, the specific organizational logic of bilateral framing developed through centuries of iterative practice rather than explicit theoretical formulation. The measured high symmetry in foreground outlines (S_API mean = 43.241) and its correlation with depth variation (r = 0.291, p < 0.05) empirically validate this embedded design intention.

Historical precedents confirm this spatial logic as conscious design knowledge. The Humble Administrator’s Garden, created in the 1510s, demonstrates the systematic application of sandwiched scenery (Fig. 15a): curved corridors and elevated walkways lined with bilateral plantings of bamboo and flowering trees create multi-tiered lateral framing that channels sightlines toward the “borrowed” North Temple Pagoda beyond the garden boundary, representing an integration of sandwiched scenery and borrowed scenery documented in Wen’s album71 depicting the garden’s spatial sequences. This Ming-dynasty built example substantiates that bilateral framing was an intentional compositional strategy for creating “two-sided obstruction, visual guidance, and endpoint emphasis.” HWL exhibits these same organizational principles: trees positioned along both shores generate visual corridors across the water surface toward distant pagodas, demonstrating the persistence of this spatial logic in contemporary landscape experience (Fig. 15b). The systematic relationships observed between visual field proportion and multiple variables (depth: r = 0.549, element richness: r = 0.532, area distribution: r = −0.588, p < 0.01) reflect the intentional orchestration of bilateral elements to channel visual attention and structure spatial progression. Recent fractal analyses demonstrate that successful Chinese gardens exhibit scale-dependent complexity ranges that correspond to traditional design goals72; in these studies, designers did not scale proportionally but applied different compositional strategies at different scales, validating that measured patterns reflect systematic design knowledge. 
The convergence of our statistical findings with space syntax validations, historical built evidence, and fractal geometry analyses establishes that sandwiched scenery’s spatial patterns observed at HWL reflect conscious compositional strategies developed through centuries of practice and transmitted through both built examples and theoretical discourse.

Fig. 15: Sandwiched scenery: historical precedent and case manifestation.

a Curved corridors and bilateral plantings in the Humble Administrator’s Garden, Suzhou (Ming dynasty, 1510s). b Trees positioned along both shores at HWL creating visual corridors across the water surface.

In summary, the full implementation of the proposed framework is capable of capturing, presenting, and measuring the spatial visual characteristics of scenic archetypes. The results are consistent with our hypotheses, demonstrating the applicability and effectiveness of the framework in quantifying scenic archetypes.

Discussion

This study advances the analysis of scenic archetypes by addressing persistent limitations in both theory-driven and measurement-driven approaches. Existing research on traditional Chinese gardens has evolved along four loosely connected paradigms: (1) the inheritance of abstract design vocabulary, which vividly describes spatial effects yet lacks systematic procedures for extraction and comparison; (2) landscape representation frameworks, which provide hierarchical categorization but do not capture the multi-layered spatial configurations specific to scenic archetypes; (3) quantitative visual indicators, which measure isolated spatial properties without accounting for the configurational relationships that generate overall spatial effect; and (4) computational vision systems, which deliver high detection accuracy but cannot directly translate element recognition into spatial-visual analysis. These paradigms, while valuable in isolation, share a methodological gap: they are unable to connect high-level theoretical concepts with verifiable spatial metrics in a structured and reproducible way. Our framework directly addresses this gap by structuring scenic archetypes into spatial-visual components and linking them to measurable indicators, enabling cross-paradigm integration and application in both research and heritage practice.

From abstract vocabulary to operational spatial variables

Design traditions in Chinese gardens have long relied on a rich lexicon, such as borrowed scenery, framed scenery, and obstructed scenery, that has transmitted spatial knowledge across generations9,17. These poetic descriptions excel at capturing experiential richness and cultural resonance but suffer from interpretive indeterminacy: practitioners interpret “appropriate density” or “depth with layers” through personal experience, creating inconsistencies that impede systematic application and cross-cultural communication. As Bandarin and van Oers73 note in their analysis of Historic Urban Landscape approaches, and as Smith74 argues regarding intangible heritage, the challenge lies not in preserving terminology but in maintaining operational knowledge. This study addresses the issue by translating abstract vocabulary into operational spatial variables without reducing it to technical jargon. For example, when we demonstrate that framed scenery exhibits specific correlation coefficients (0.549 between foreground and background), we reveal the mathematical relationships underlying poetic experience, not replacing metaphor with measurement but uncovering the quantitative structures that enable qualitative experience. This dual preservation of semantic richness and analytical precision enables what recent heritage management discourse terms “value-based indicators”75 and what contemporary Chinese heritage conservation identifies as urgently needed: transparent communication across different knowledge systems without sacrificing cultural authenticity.

From static quantities to configurational spatial patterns

Building on the preceding discussion, most existing studies, whether employing conventional landscape metrics or recent AI-based analyses of high-quality imagery, have focused on measuring single spatial-visual attributes in isolation. This tendency is reflected in the emphasis on quantitative measures such as the green view index76, sky view factor77, and other visual landscape metrics78. While these approaches have advanced measurement precision, they often conflate quantification with interpretation when applied to heritage contexts. A 65% green view ratio or 0.75 sky view factor quantifies a state, a static condition at a moment in time, but fails to capture the pattern through which such states generate meaning. As Silva79 demonstrates in analyzing Historic Urban Landscapes in the Asia-Pacific, and as Veldpaus and Pereira Roders80 argue in their assessment framework for historic urban landscapes, the essence of designed space emerges from relational configurations rather than aggregate indicators. This study transforms this paradigm by distinguishing between percentage (how much) and pattern (how configured). When identifying framed scenery, we analyze not the quantity of framing elements but the spatial configuration through which framing operates as a phenomenological experience. This distinction proves crucial: two gardens with identical quantitative metrics can produce entirely different spatial experiences because their configurational patterns differ. The shift from state to pattern thus represents more than methodological refinement; it constitutes an epistemological reorientation that aligns measurement with the fundamental nature of spatial experience.
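The distinction between percentage and pattern can be made concrete with a toy example. The sketch below is illustrative only: it uses two hypothetical 4 × 4 binary masks (1 = cell occupied by a vegetation or framing element) and a simple descriptor, `lateral_share`, invented here for demonstration. It is not one of the metrics used in this study.

```python
def coverage(mask):
    """Share of all cells that are occupied: the 'how much' measure."""
    cells = [cell for row in mask for cell in row]
    return sum(cells) / len(cells)


def lateral_share(mask, margin=1):
    """Share of occupied cells lying in the left and right margins:
    a crude 'how configured' measure (hypothetical descriptor)."""
    width = len(mask[0])
    occupied = sum(sum(row) for row in mask)
    if occupied == 0:
        return 0.0
    lateral = sum(
        cell
        for row in mask
        for col, cell in enumerate(row)
        if col < margin or col >= width - margin
    )
    return lateral / occupied


# Elements flank the view on both sides and leave the centre open.
bilateral = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]

# The same number of occupied cells, massed in the centre of the view.
central = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

for name, mask in (("bilateral", bilateral), ("central", central)):
    print(name, coverage(mask), lateral_share(mask))
```

Both masks have the same coverage, so a ratio-type indicator cannot tell them apart, whereas the configurational descriptor separates them completely.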

From universal frameworks to culturally specific spatial logic

Many established landscape representation frameworks, such as Tveit et al.’s38 nine key concepts for visual landscape character, Bell’s2 elements of visual design, and Liu and Nijhuis’s3 spatial-visual vocabulary, provide valuable hierarchical structures that inspired our approach. Yet these frameworks encounter insurmountable limitations when applied to culturally specific landscapes: they presuppose universal aesthetic principles that transcend cultural boundaries, failing to recognize that scenic archetypes embody not merely visual arrangements but culturally constituted ways of perceiving and inhabiting space. The Chinese concept of scenery, as Jin81 elucidates in his exploration of jing (scenery) in traditional Chinese garden texts, fundamentally refuses the subject-object dichotomy inherent in western analytical frameworks: the observer does not view the garden from outside but participates in its continuous unfolding. As Lu and Liu14 demonstrate through spatial-experiential analysis of the Master of Nets Garden, Chinese gardens operate through embodied experience rather than detached observation. Building on this premise, the proposed four-tier hierarchy does not simply append Chinese categories to existing frameworks; it restructures the analytical process to mirror what Sun82 identifies as the unity of knowledge and practice fundamental to Chinese epistemology. Each tier maintains this unity: scenic archetypes preserve experiential wholeness, spatial layers translate experience into perceptual structures, visual variables extract measurable qualities without fragmenting meaning, and metrics quantify relationships while maintaining semantic integrity. This structural alignment with Chinese spatial thinking explains our 94.12% recognition accuracy, which derives not from superior algorithms but from epistemological congruence with the phenomena being analyzed.

From element detection to functional spatial relationships

From a technical perspective, recent computational methods, such as semantic segmentation models and computer vision systems, have achieved remarkable technical precision. YOLOv8’s 93.9% element detection accuracy83 and YOLOv4’s 90.20% damage identification rate84 demonstrate the power of contemporary AI. Yet these tools remain confined to what Liu3 conceptualizes as “the world of data,” unable to reach “the world of concern” where design meaning resides. A semantic segmentation model identifies walls, trees, and rocks with near-perfect accuracy but cannot distinguish whether a wall frames a view, obstructs a sightline, or merely defines a boundary, distinctions that are fundamental to design practice and heritage value. This study transforms computational capability into design understanding by applying scenic archetype logic as an interpretive layer: element detection provides raw data, but our four-tier structure analyzes spatial relationships to determine design function. This represents not post-processing but a fundamental reorientation from asking “what exists?” to asking “what does it mean?”.

Building on these four lines of inquiry, a unifying insight emerges: scenic archetypes resist reductive analysis precisely because they function as holistic design patterns where meaning emerges from relational totality rather than component aggregation. The apparent incompatibility between data thinking (emphasizing decomposition and measurement) and design thinking (prioritizing synthesis and experience) reveals not a methodological problem to solve but an ontological reality to acknowledge. Scenic archetypes encode what Ji40 termed the “living method” of garden creation, principles that exist only through embodied practice, where knowledge is inseparable from action, and patterns dissolve when reduced to isolated elements. Our approach does not bridge this divide through compromise; instead, it recognizes scenic archetypes as translational mechanisms in their own right, capable of converting abstract principles into spatial organizations and, in turn, transforming spatial organizations into visual appreciation.

This recognition has direct implications for heritage conservation, shifting practice from recording only discrete physical changes to monitoring and safeguarding the spatial relationships that sustain scenic integrity. While UNESCO’s Guidance and Toolkit for Impact Assessments calls for evaluating impacts on Outstanding Universal Value85, and threats to the visual integrity of World Heritage properties are well documented35,86, prevailing tools still mainly track building heights, vegetation coverage, and new construction. Recent scholarship underscores the central role of spatial organization in heritage value87,88; this research makes that organization measurable and manageable, operationalizing the long-observed fact that deterioration of spatial relationships often precedes physical degradation89,90. In impact assessments, the method quantifies how proposed changes affect scenic integrity. For example, reducing the correlation between foreground and background in a framed-scenery view from 0.549 to 0.300 indicates not merely visual intrusion but the loss of the archetype itself, even when structures remain. These analyses provide objective, defensible criteria that go beyond subjective judgments and align with UNESCO’s requirements for evidence-based assessment of OUV attributes84. Establishing baseline measurements of scenic archetypes further enables proactive management, including early-warning systems that detect gradual erosion of spatial integrity before irreversible damage occurs91,92. Operationally, the approach integrates scenic archetype preservation into conservation planning alongside traditional physical monitoring, consistent with contemporary calls to protect both tangible and spatial dimensions of heritage value while supplying the quantitative tools needed for implementation.
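A baseline-versus-monitoring comparison of this kind could be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: it applies the standard Fisher z test for two independent correlations (a swapped-in technique), treats the baseline and monitored samples as independent, and uses hypothetical sample sizes and a hypothetical tolerance `max_drop`. Only the two coefficients, 0.549 and 0.300, come from the text.

```python
import math
from statistics import NormalDist


def fisher_z_shift(r_baseline, r_current, n_baseline, n_current):
    """Two-sided Fisher z test for a difference between two independent
    correlation coefficients. Returns (z, p)."""
    if min(n_baseline, n_current) < 4:
        raise ValueError("each sample needs at least 4 observations")
    z_baseline = math.atanh(r_baseline)  # Fisher transformation
    z_current = math.atanh(r_current)
    standard_error = math.sqrt(1 / (n_baseline - 3) + 1 / (n_current - 3))
    z = (z_baseline - z_current) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def archetype_at_risk(r_baseline, r_current, max_drop=0.2):
    """Flag a view when the correlation falls by more than a chosen tolerance.
    The default tolerance is hypothetical, not a value from the study."""
    return (r_baseline - r_current) > max_drop


# Coefficients from the text; the sample sizes (50 views each) are assumed.
z, p = fisher_z_shift(0.549, 0.300, n_baseline=50, n_current=50)
print(f"z = {z:.2f}, p = {p:.3f}, flagged = {archetype_at_risk(0.549, 0.300)}")
```

Combining a fixed tolerance with a significance test separates two questions: whether the drop is large enough to matter for management, and whether it can be distinguished from sampling noise given the number of views surveyed.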

From quantitative precision to integrated heritage assessment

While the preceding discussion establishes the methodological contribution of quantitative spatial analysis to heritage conservation, responsible scholarship requires explicitly acknowledging the epistemological boundaries of such approaches. What dimensions of heritage value lie beyond the reach of quantification, and how should our framework be positioned relative to these irreducible intangible dimensions? Heritage embodies both tangible and intangible dimensions93, where spatial visual characteristics constitute only one component of a broader constellation of values encompassing historical memory, local identity, community narratives, and intangible cultural practices that give designed spaces profound human meaning94,95. As Lian et al.16 demonstrate in their systematic review of historic garden conservation approaches, effective heritage management necessarily integrates multiple analytical frameworks: landscape mapping identifies physical and spatial attributes; landscape planning addresses conservation strategies; landscape design facilitates development and reuse. Critically, their framework positions spatial analysis within a broader landscape context through conceptual “layers” connecting tangible architectonic elements with intangible cultural processes, temporal evolution patterns, and community value systems. This layered approach acknowledges that while spatial visual characteristics can be quantified, the cultural significance they embody requires complementary assessment through ethnographic methods, oral history documentation, and participatory evaluation engaging core communities whose lived experiences constitute irreplaceable dimensions of heritage value96,97.

This recognition of heritage value’s multidimensional nature leads directly to a critical methodological question: how should our quantitative framework be positioned relative to these broader assessment requirements? The present study’s analytical framework should therefore be positioned not as a comprehensive heritage assessment methodology but as a specialized contribution addressing specific evidence gaps in conservation practice. Our quantitative approach excels at particular tasks: providing systematic evidence for heritage inscription processes; detecting gradual spatial changes that might escape qualitative monitoring; enabling comparative analysis across multiple sites; and translating design principles into implementable guidelines for contemporary practice. However, these capabilities complement rather than replace methods capturing values our framework cannot measure. A framed scenery opening, for instance, may be precisely characterized through our metrics, yet these measurements cannot convey the cultural associations viewers bring to the scene: literary references accumulated through centuries of poetic tradition, historical events that transformed physical space into commemorative place, personal recollections that link individual memory to collective heritage, or spiritual significance attributed through religious or philosophical practice98. These intangible dimensions do not supplement spatial analysis; they constitute parallel and equally valid forms of heritage value requiring distinct methodological approaches98,99.

Having established this methodological positioning, we turn to recent theoretical work in heritage scholarship that provides a conceptual foundation for integrating quantitative and qualitative approaches. Mason99 distinguishes heritage-centered values (historical, aesthetic, architectural) from societal values (social, economic, environmental), arguing that while heritage-centered values may be more amenable to expert quantification, societal values require participatory assessment engaging diverse stakeholders. Robson96 demonstrates through case study analysis that different assessment methods (quantitative spatial analysis, qualitative interviews, participatory mapping, photo-elicitation) surface different types of knowledge, with findings sometimes converging but often revealing dissonances requiring negotiation rather than resolution. Waterton and Smith100 caution that privileging expert-defined quantifiable attributes risks marginalizing community-defined values that resist measurement, potentially creating what they term “authorized heritage discourse” that legitimizes certain forms of knowledge while delegitimizing others. These critiques do not invalidate quantitative approaches but situate them within a broader epistemological landscape where multiple ways of knowing heritage coexist, each revealing distinct dimensions of significance101,102.

The preceding analysis of epistemological boundaries establishes how quantitative spatial analysis should function within comprehensive heritage assessment, and it carries a broader implication for heritage science as a discipline: advancing computational analysis capabilities, as this study does through AI-enabled multimodal mapping, does not diminish the importance of qualitative methods but rather increases the imperative for their integration. More powerful quantification tools create a greater risk of privileging measurable attributes simply because they are measurable, potentially marginalizing equally important values that resist quantification. The solution lies not in rejecting computational approaches but in designing assessment frameworks where quantitative precision serves rather than supplants qualitative understanding. Our framework provides a template for this integration: by translating abstract scenic archetypes into measurable spatial variables, we enable systematic comparison and pattern identification across multiple sites and temporal periods; by situating these measurements within a phenomenological understanding of how spatial configurations generate experiential qualities, we maintain the connection between quantitative evidence and qualitative meaning; by acknowledging that spatial visual characteristics constitute only one dimension of heritage value, we position our contribution within broader assessment frameworks requiring multiple methodological approaches.

In conclusion, this study establishes an analytical framework that explores the spatial visual characteristics of scenic archetypes with AI-enabled multimodal mapping methods. By deconstructing abstract spatial concepts into measurable variables such as shape, size, position, and texture across foreground, middle ground, and background, the framework bridges traditional design principles with computational analysis, enabling replicable and interpretable mapping of landscape visual logic. While it effectively captures static visual configurations for defined archetypes, it does not yet address the temporal and kinesthetic dimensions central to sequential landscape perception, or scene types requiring extended sightlines and dynamic compositional shifts. These methodological boundaries point to clear directions for advancement, including integrating eye-tracking and immersive virtual environments for dynamic visual modeling, applying advanced deep-learning architectures for element recognition in visually complex settings, extending metrics to seasonal, diurnal, and weather-induced variations, and validating the framework across diverse garden traditions to assess transferability. Equally important, future research should explore systematic integration of this spatial analysis framework with ethnographic documentation methods, oral history protocols, and participatory assessment techniques, ensuring that measurable spatial characteristics are interpreted within their full cultural context, including the intangible heritage values that our quantitative approach cannot directly capture16,103.

The broader significance of this work extends beyond disciplinary boundaries. As heritage landscapes confront accelerating pressures from urbanization, tourism, and climate change, analytical frameworks that synthesize traditional wisdom with contemporary technology become not merely useful but essential for cultural survival. By demonstrating that quantification, when properly designed, can reveal rather than obscure complexity, this study shows how computational approaches can illuminate patterns and relationships that purely qualitative methods might overlook. By rendering the implicit explicit, the tacit measurable, and the cultural computational, we enable new modalities for preserving, transmitting, and evolving landscape design traditions. Yet this enabling occurs not through replacing humanistic understanding with technical measurement but through creating complementary forms of evidence that together support more robust conservation decisions. The scenic archetypes of Chinese gardens, refined through centuries of iterative practice, encode spatial wisdom directly relevant to contemporary challenges of place-making in an increasingly mediated world. This framework helps ensure that such wisdom is not merely preserved as static heritage but remains actively operational as living knowledge, capable of informing and inspiring future practice while maintaining continuity with its cultural origins. Achieving this aspiration requires recognizing that quantitative spatial analysis and qualitative cultural understanding are not competing paradigms but mutually necessary components of comprehensive heritage stewardship, each revealing dimensions of significance the other cannot access, together enabling conservation approaches that honor both the measurable and the ineffable dimensions of heritage value.