Abstract
Guiding the semantic segmentation of Buddha facial point clouds with canonical proportional rules enables segmentation without manual labels while ensuring semantic consistency and structural continuity. Existing data-driven methods require large annotated datasets, which are often unavailable for Buddhist statue faces, while classical model-driven methods capture geometry but lack semantic consistency with sculptural canons. This study presents a knowledge-guided method that embeds bilateral symmetry, proportional divisions, and canonical anchors into a grid-based region-growing process with adaptive convexity thresholds, followed by morphological refinement to maintain continuity and reduce fragmentation. The method achieves F1-scores above 0.85 and a mean IoU around 0.80 across diverse Buddha statue samples. These results indicate that canonical sculptural principles can be identified through computational analysis.
Introduction
Buddha sculptures, particularly those preserved in museums and grotto temples, represent a central component of China’s cultural heritage, embodying both religious devotion and artistic innovation1,2,3. Sculptural faces in grotto sites such as Dunhuang, Yungang, and Longmen exhibit dynasty-specific facial styles, reflecting regional cultural exchanges and historical stylistic evolution, making precise facial analysis important for heritage studies4. Accurate semantic segmentation of facial components on 3D point clouds is therefore not merely a computational task, but a means to extract geometric features that comply with canonical measurement systems, providing quantitative evidence for analyzing historical proportion standards and structural variations across statues. In this regard, canonical proportional systems—standardized measurement rules preserved in Buddhist sculptural traditions, such as the “measurement canons” for Buddha statues5,6—offer a critical link between algorithmic segmentation and established sculptural rules, enabling both faithful digital representation and meaningful interpretation of facial geometry.
Recent advances in 3D digitization technologies—particularly terrestrial laser scanning and structured-light scanning—have enabled the acquisition of high-resolution point clouds of heritage artifacts7,8,9, preserving detailed facial geometry for analysis and documentation. Heritage point clouds are typically unstructured, irregularly sampled, and lack texture information, which limits the direct applicability of conventional geometric processing pipelines developed for industrial or anatomical datasets. Additionally, sculptural faces often follow canonical proportional systems that are not fully captured by standard descriptors. Prior studies on Buddha imagery have quantified facial geometric features, symmetry, and proportional evolution over historical periods, confirming that canonical measurements provide a reliable basis for semantic interpretation5. Methods inspired by classical face analysis—such as landmark geometry10 and low-dimensional characterization11—highlight the value of geometrically informed representations for guiding semantically meaningful segmentation. For instance, Renoust et al.12 extracted facial contour lines from 68 landmarks and demonstrated that landmark-based ratios, such as eye-height-based proportional systems, can be effectively aligned with sculptural canons and further extended into facial geometric indicators. These characteristics motivate the development of methods that incorporate prior knowledge of canonical proportional rules and geometric rules derived from Buddhist sculptural canons. Such knowledge serves to guide accurate and semantically meaningful segmentation of facial components on 3D point clouds.
Broadly, existing segmentation methods for 3D point clouds can be categorized as model-driven or data-driven. Model-driven approaches, such as region growing, exploit geometric continuity and heuristic constraints, making them suitable for heritage studies due to their interpretability and minimal data requirements. Representative techniques include curvature-based hierarchical segmentation13, RANSAC-augmented region growing14, multi-scale geometric descriptors15, and graph-based connectivity methods16. These methods effectively capture basic geometric patterns17,18, but their lack of explicit integration of canonical proportional rules limits their ability to produce geometrically and semantically consistent segmentation of facial components. In contrast, data-driven methods learn features directly from raw point clouds19,20. Conventional machine learning and, more recently, deep learning have achieved notable progress in industrial and anatomical applications21,22, and some studies have demonstrated their effectiveness in large-scale architectural heritage cases23,24,25. However, annotated datasets for sculptural faces remain scarce, and available repositories primarily cover non-heritage domains. Few-shot learning (FSL) and domain adaptation approaches can partially alleviate reliance on large annotated datasets26,27,28, yet their adaptability across stylistically diverse statues is limited29. These challenges underscore the need for strategies that deliver coherent and reproducible segmentation without reliance on extensive training data.
In recent years (2022–2025), research in cultural heritage point cloud segmentation has focused on two main directions: reducing annotation dependency and integrating domain knowledge, yet neither fully addresses the segmentation of Buddha facial point clouds without annotations, without texture, and under sculptural constraints. In the few-shot or weakly supervised direction, Zhao et al.30 proposed a framework based on teacher-guided consistency and contrastive learning, achieving semantic segmentation of ancient building point clouds with only 0.1% labeled data, representing a landmark in low-annotation heritage scenarios. Tsai et al.31 enhanced few-shot generalization by leveraging geometric and contextual information, supporting both base and novel classes. In the knowledge integration direction, Bassier et al.32 applied early- and late-stage fusion of image and point cloud segmentation to improve semantic understanding in heritage contexts, while Réby et al.33 explored the application of foundation models to heritage segmentation, aiming to reduce data dependency through pretraining.
Despite these advances, current methods still rely on support samples, annotations, multimodal inputs, or large-scale pretraining, and none explicitly incorporate sculptural conventions, limiting their applicability to newly discovered or textureless Buddha facial point clouds. These limitations highlight the need for a segmentation strategy that is fully prior-driven, capable of leveraging canonical sculptural rules and geometric constraints to achieve semantically consistent and reproducible results. To address these challenges, we propose a knowledge-guided region-growing method for the semantic segmentation of Buddha facial point clouds.
The approach encodes canonical proportional rules and sculptural canons—such as bilateral symmetry, axial alignment, and classical facial ratios—into structural constraints that guide the segmentation process, drawing upon canonical measurement principles documented in prior studies5,6. Seed points are initialized at anatomically meaningful positions, and boundaries are expanded through a grid-based algorithm with adaptive convexity thresholds, ensuring that regions evolve consistently with geometric structure and canonical proportions. Morphological refinement further consolidates fragmented edges, producing coherent delineations of facial components suitable for quantitative analysis. By integrating knowledge-guided constraints, the method enforces canonical proportions during segmentation, improving consistency across facial components while maintaining alignment with the original 3D geometry. In practice, geometric constraints (e.g., convexity and local surface continuity) regulate boundary expansion, while knowledge-guided constraints (e.g., bilateral symmetry, axial alignment, and proportional ratios) anchor the segmentation within canonical sculptural conventions.
The paper is organized as follows. The Introduction section presents the cultural context and technical background, motivating the need for prior knowledge-guided segmentation. The Methodology section details the proposed knowledge-guided region-growing approach. The Experiments and Results section evaluates the method on multiple Buddha statue samples and provides a quantitative analysis of segmentation performance. The Discussion section interprets the findings, examining their implications for reproducible analysis and measurement of heritage artifacts. Finally, the Conclusion section summarizes the contributions and outlines potential applications and future directions.
Methods
In this section, we present the proposed knowledge-guided region-growing method for the semantic segmentation of Buddha facial point clouds. The overall workflow is illustrated in Fig. 1, comprising four main stages: point cloud preparation, encoding of sculptural priors, region-growing based facial segmentation, and semantic refinement. The details of each stage are described in the following subsections.
Data preprocessing
Preprocessing constitutes a foundational stage in the proposed segmentation pipeline, establishing the spatial and structural consistency necessary for reliable downstream operations. The initial step involves point-level denoising and outlier removal, which not only cleans spurious measurements but also ensures that local surface geometry is accurately preserved. By mitigating the influence of isolated or noisy points, this step prevents distortion of key facial features and allows subsequent region growing algorithms to operate on geometrically meaningful data. Following denoising, pose normalization is applied to align each facial model to a canonical frontal orientation using symmetry- and axis-based constraints. This alignment not only enforces consistent spatial orientation across samples but also enables uniform application of growth parameters across diverse sculptural styles, enhancing comparability and robustness in feature localization.
Subsequently, the preprocessed 3D surface is orthogonally projected onto a structured 2D grid, generating a height map that encodes elevation information in a compact and analyzable form34,35, as illustrated in Fig. 2. This representation simplifies neighborhood computation, facilitates efficient morphological operations, and preserves essential geometric features critical for accurate delineation of facial regions. Careful grid sizing is employed to balance computational efficiency with retention of fine structural details. Collectively, these preprocessing operations establish a standardized, reproducible, and analytically tractable foundation, ensuring that the segmentation method can robustly handle variability in facial geometry, sculptural completeness, and stylistic diversity, ultimately supporting precise and reliable downstream semantic mapping.
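To make the projection step concrete, the following sketch rasterizes a normalized facial point cloud into a grid height map. The function name, mean-Z aggregation per cell, and the NaN convention for empty cells are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def project_to_height_map(points, grid_size=50):
    """Rasterize a normalized point cloud (N, 3) onto a 2D grid,
    keeping the mean Z of the points falling in each cell.
    Empty cells are set to NaN (sketch; aggregation is illustrative)."""
    xy_min = points[:, :2].min(axis=0)
    xy_max = points[:, :2].max(axis=0)
    # Map X-Y coordinates to integer cell indices in [0, grid_size - 1].
    scale = (grid_size - 1) / np.maximum(xy_max - xy_min, 1e-9)
    idx = ((points[:, :2] - xy_min) * scale).astype(int)

    zsum = np.zeros((grid_size, grid_size))
    count = np.zeros((grid_size, grid_size), dtype=int)
    for (ix, iy), z in zip(idx, points[:, 2]):
        zsum[iy, ix] += z
        count[iy, ix] += 1

    height = np.full((grid_size, grid_size), np.nan)
    mask = count > 0
    height[mask] = zsum[mask] / count[mask]
    return height, idx  # idx preserves the 3D-to-2D correspondence
```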
Prior knowledge formalization
To achieve a structure-controlled segmentation mechanism in 3D scans of Buddhist sculptures, where texture information is typically absent due to the nature of laser or depth acquisition, and where dense manual annotations are prohibitively costly and require domain expertise, this paper proposes a modeling approach based on cultural and anatomical prior knowledge. This method formalizes stylistic norms and facial structural knowledge from Buddhist sculptures into a set of quantifiable geometric parameters, which are used to constrain the region growing and semantic segmentation process, thereby enhancing the robustness and interpretability of segmentation.
Both art-historical scholarship and computational studies provide evidence that Buddhist facial representations follow codified proportional systems and recurring geometric regularities. Traditional iconometric canons, as documented in The Buddhist Canon of Iconometry6, define standardized ratios for facial and bodily features, ensuring symbolic accuracy and visual harmony. Recent analyses of Buddhist facial depictions confirm that bilateral symmetry and proportional arrangements remain statistically consistent across different stylistic renderings5. From a computational perspective, the extraction of symmetry and structural alignments has been extensively studied in geometry processing12, supporting the feasibility of formalizing such regularities as operational constraints. Taken together, these sources indicate that the assumed priors—symmetry, vertical alignment, and proportional rules—are grounded in both sculptural traditions and modern analytical validation, rather than arbitrary assumptions. Table 1 presents a detailed breakdown of the prior categories, their sources, and their roles in the segmentation process, which are derived from both computational studies and traditional art-historical scholarship.
Based on this foundation, three types of prior knowledge are formalized in our method: bilateral symmetry, vertical arrangement structures, and proportional division rules5,6,36. While actual forms may vary due to weathering, abstraction, or regional carving styles, these priors are generalized and broadly applicable, rather than strict representations of any single historical sculpting tradition. To accommodate variations among individual statues, all priors are encoded as flexible spatial parameters with adjustable thresholds, allowing the segmentation algorithm to remain robust while respecting the inherent variability of Buddha facial structures. Building on these assumptions, the following geometric constraints are introduced into the model:
Bilateral symmetry: The symmetry plane π∗ is set according to canonical facial priors and used for pose normalization and feature pairing:

$${\pi }^{* }=\arg \mathop{\min }\limits_{\pi }\sum _{i}\mathop{\min }\limits_{j}{\left\Vert {p}_{j}-{\rm{Reflect}}({p}_{i},\pi )\right\Vert }^{2}$$
where π denotes the symmetry plane, and Reflect(\({p}_{i}\), π) represents the mirrored position of point \({p}_{i}\) across π.
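As an illustration of how this prior can be operationalized, the sketch below reflects a point cloud across a candidate plane and scores the residual by nearest-neighbor distance. The plane parameterization (unit normal n, offset d) and the scoring function are assumptions for exposition, not the paper's exact procedure.

```python
import numpy as np
from scipy.spatial import cKDTree

def reflect(points, n, d):
    """Mirror points across the plane {x : n.x = d} with unit normal n."""
    n = n / np.linalg.norm(n)
    signed = points @ n - d              # signed distance to the plane
    return points - 2.0 * signed[:, None] * n

def symmetry_error(points, n, d):
    """Mean nearest-neighbor distance between the cloud and its mirror
    image; small values indicate good bilateral symmetry about the
    candidate plane (illustrative scoring choice)."""
    mirrored = reflect(points, n, d)
    dists, _ = cKDTree(points).query(mirrored)
    return dists.mean()
```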
Vertical alignment: The central axis Lmid is extracted from the aligned point cloud to guide the vertical structural analysis of facial components. The vertical spacing of a canonical feature pair is constrained as:

$${d}_{i}={\alpha }_{i}\,{H}_{{face}}$$
where \({\alpha }_{i}\) is the canonical scale factor derived from sculptural priors for the given feature pair, and \({H}_{{face}}\) is the total height of the face.
In Buddha facial carvings, different dynasties and regional styles exhibit consistent geometric trends. For example, Tang dynasty faces tend to be fuller and more proportionate, whereas Song dynasty faces are more elongated and delicate. To account for the influence of different styles on region-growing parameters, the proportional parameter \({\alpha }_{i}\) is defined as a parameter that can be adjusted according to the style category:

$${\alpha }_{i}={\alpha }_{0}+{k}_{w}\left({R}_{{wh}}-{R}_{0}\right)+{k}_{v}\left({P}_{v}-{P}_{0}\right)$$
Here, \({\alpha }_{0}\) is the baseline proportion; \({R}_{{wh}}=W/H\) is the overall width-to-height ratio of the face, and \({P}_{v}\) represents the vertical proportions of key facial regions such as the eyes, nose, and eyebrows. The reference values \({R}_{0}\) and \({P}_{0}\) correspond to typical values for the given style, while \({k}_{w}\) and \({k}_{v}\) control the influence of deviations in these geometric metrics on \({\alpha }_{i}\). Through this mechanism, \({\alpha }_{i}\) is continuously adjusted according to the typical geometric features of different styles, providing style-aware control for region growing while preserving standard proportional rules in a parameterized and reproducible manner.
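A minimal sketch of this style-aware adjustment, assuming the linear form reconstructed above; the gain defaults are placeholders to be tuned per style category.

```python
def style_aware_alpha(alpha0, R_wh, P_v, R0, P0, k_w=0.1, k_v=0.1):
    """Style-aware proportion: adjust the baseline alpha0 by deviations
    of the width-to-height ratio R_wh and vertical proportion P_v from
    their style-typical references R0 and P0. Gain defaults are
    illustrative assumptions."""
    return alpha0 + k_w * (R_wh - R0) + k_v * (P_v - P0)
```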
The above geometric priors are uniformly encoded into a structured parameter set:

$$\Theta =\left\{{\pi }^{* },\,{L}_{{\rm{mid}}},\,\{{\alpha }_{i}\},\,{H}_{{zones}}\right\}$$
where \({H}_{{zones}}\) refers to the division of the face into semantic vertical zones (e.g., upper, middle, lower) according to canonical proportional rules.
This parameter set formalizes cultural knowledge into quantifiable spatial conditions, providing the necessary structural guidance for seed point localization, region growing control, and semantic region segmentation.
Segmentation via prior-constrained region growing
To initiate the region-growing process, seed points for each facial component are determined by integrating local geometric features with anatomical and sculptural priors36. The components considered include the nose, eyes, mouth, ears, eyebrows, and chin, whose approximate locations are inferred from measurable surface features on the 3D scans and by referencing canonical facial proportions. This integration enables robust initialization across statues exhibiting stylistic variations or partial degradation.
Local geometric features capture surface characteristics such as peaks, valleys, and relative convexity, while anatomical and sculptural priors encode canonical relationships, including bilateral symmetry, interocular distance, and eye-to-brow proportions. Additionally, canonical proportional rules derived from historical iconometric standards provide guidance on the expected spatial relationships and scale of each facial component. However, while these rules offer a historical and anatomical basis for facial feature placement5,6,35,36, their direct application to Buddha facial point clouds—particularly those derived from laser scanning—often leads to inaccuracies or inconsistencies. These discrepancies arise from challenges such as the conversion between 2D and 3D representations and variations in sculptural styles37,38. To ensure practical operability in point cloud segmentation, these rules are adapted, as summarized in Table 2, to preserve cultural and anatomical plausibility while ensuring robust, semantically meaningful segmentation. Table 2 outlines the corresponding geometric features, anatomical priors, and canonical proportional rules for each facial component, collectively guiding seed point placement in the region-growing process.
Guided by local geometric features derived from surface elevation (Section “Data preprocessing”), a region-growing strategy constrained by relative Z-value variations is applied for semantic segmentation39. Here, the Z-value refers to a convexity or concavity threshold that measures local height differences on the surface. Following the approach in ref. 40, the 3D point cloud is projected onto a unified 2D grid to simplify neighborhood queries and reduce the impact of uneven point density.
The Z-value threshold acts as a geometric constraint that guides the growth process along meaningful surface structures. It allows the algorithm to distinguish protruding and recessed facial features, such as the nose ridge, eye sockets, or lips, while preventing uncontrolled expansion across abrupt depth changes. This constraint improves the stability and reliability of the region-growing segmentation and facilitates accurate identification of key facial features for subsequent analysis.
Let \(G\) denote the set of grid cells, with each cell \(g\in G\) maintaining a corresponding 3D point set \(P\left(g\right).\) Region growing starts from the seed cell \({g}_{0}\), identified as the local elevation maximum within a facial subregion, and proceeds in the 2D grid while preserving accurate mapping to the original geometry.
The 3D-to-2D grid projection preserves the spatial correspondence between grid cells and the original point cloud, as illustrated in Fig. 3a. This mapping ensures that the region-growing process in the 2D grid accurately reflects the underlying 3D facial geometry. However, discretization inevitably introduces jagged boundaries along the edges of features (Fig. 3b), which may affect the precision of semantic segmentation. These artifacts are subsequently mitigated through morphological refinement (Section “Post-processing and refinement”), ensuring smooth and anatomically consistent boundaries.
The core expansion criterion is a Z-gradient constraint: a neighboring cell \({g}^{{\prime} }\) is included only if

$$\left|Z({g}^{{\prime} })-Z(g)\right|\le \tau \left(d\right)$$
where \(d\) is the grid distance from the seed cell \({g}_{0}\), and the dynamic threshold \(\tau \left(d\right)\) is defined in Eq. (6). This dynamic threshold design draws on the idea of adaptively adjusting thresholds based on local geometric features41.

$$\tau \left(d\right)=\lambda \left(1+{\alpha }_{x}\left|\varDelta x\right|+{\alpha }_{y}\left|\varDelta y\right|\right)$$
with \(\varDelta x,\varDelta y\), representing the horizontal and vertical offsets relative to the seed. This mechanism balances tolerant expansion along gentle slopes and suppression of backward growth into concave or flat areas.
Parameters \(\lambda ,{\alpha }_{x},{\alpha }_{y}\), and the maximum growth distance are adjusted according to the geometric characteristics of different facial features. For instance, the nose region emphasizes horizontal constraint, while eyes and lips incorporate adaptive offsets to better fit their shapes.
A breadth-first search is used to explore neighboring cells42. Cells meeting the threshold condition are queued and marked. Expansion stops when no neighbors satisfy the constraint, the Z-value abruptly decreases, or the distance exceeds a preset limit.
The resulting grid region \(R\subset G\) maps to the point set

$$P\left(R\right)=\bigcup _{g\in R}P\left(g\right)$$
which corresponds to the segmented facial component in 3D.
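To make the growth loop concrete, the sketch below implements a breadth-first expansion over the height-map grid under a dynamic threshold of the form reconstructed above. The Chebyshev grid distance, the neighbor-to-current-cell comparison, and all parameter defaults are illustrative assumptions rather than the exact published procedure.

```python
import numpy as np
from collections import deque

def region_grow(height, seed, lam=0.05, ax=0.5, ay=0.5, max_dist=12):
    """Breadth-first region growing on the 2D height map (sketch).
    A neighbor joins the region when its height differs from the
    current cell by at most tau(d); lam, ax, ay and max_dist are
    illustrative values, tuned per facial component in practice."""
    H, W = height.shape
    visited = np.zeros((H, W), dtype=bool)
    visited[seed] = True
    queue, region = deque([seed]), [seed]
    while queue:
        cy, cx = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):            # 8-connected neighborhood
                ny, nx = cy + dy, cx + dx
                if (dy == 0 and dx == 0) or not (0 <= ny < H and 0 <= nx < W):
                    continue
                if visited[ny, nx] or np.isnan(height[ny, nx]):
                    continue
                off_y, off_x = ny - seed[0], nx - seed[1]
                d = max(abs(off_y), abs(off_x))  # Chebyshev grid distance
                if d > max_dist:
                    continue
                tau = lam * (1 + ax * abs(off_x) + ay * abs(off_y))
                if abs(height[ny, nx] - height[cy, cx]) <= tau:
                    visited[ny, nx] = True
                    queue.append((ny, nx))
                    region.append((ny, nx))
    return region  # (row, col) grid cells; maps back to P(g) in 3D
```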
The nasal region growth process is illustrated in Fig. 4. The figure demonstrates how the region grows from a selected seed point based on the 2D grid projection onto the original facial surface. Panel (a) shows the initial segmented nasal region, while (b) depicts the extracted nasal point cloud. Panel (c) provides a schematic illustration of the growth direction, highlighting how the algorithm propagates outward across the facial surface, guided by the pre-defined rules described above. Constrained by the Z-value threshold, the growth follows protruding structures such as the nasal ridge while avoiding recessed neighbors such as the eye sockets, preventing misalignment with other facial structures and uncontrolled expansion into neighboring regions.
Post-processing and refinement
Building upon the seed-based segmentation of primary facial components such as the eyes, nose, and mouth, these localized results are further abstracted into landmark anchors that serve as structural reference points. In this context, anchors are derived from the centroids or boundaries of the segmented features and provide the basis for projecting canonical proportional rules onto the entire face. Unlike purely data-driven clustering or curvature-based segmentation, this approach integrates culturally informed anatomical priors rooted in classical Buddhist sculptural canons.
Specifically, the division process follows traditional experience and classical texts, as described in the Buddhist Canon of Iconometry6, which codifies the relative proportions of the facial features of Buddha statues. The face is subdivided into three equal vertical sections—the forehead (upper division), the nose (middle division), and the chin (lower division)—thus establishing an internal geometric method for semantic organization.
The entire facial region \({\mathscr{F}}\) is composed of three mutually exclusive subregions: forehead (\({\varOmega }_{{\rm{forehead}}}\)), nose (\({\varOmega }_{{\rm{nose}}}\)), and chin (\({\varOmega }_{{\rm{chin}}}\)).
Each subregion is approximately equal in vertical length, following the canonical proportional rules and anatomical priors described in Section “Prior Knowledge Formalization”. Anchored by the centroids of segmented features, the spatial masks for semantic subregions are constructed accordingly: the forehead occupies the upper third of the facial length above the eyes, the cheeks extend laterally between the eyes and mouth, consistent with classical Buddhist sculpture esthetics, and the lower third corresponds to the chin region. This approach ensures that the subdivision is both geometrically principled and culturally informed.
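As a small illustration, the three vertical zones could be materialized as grid masks once the eye and mouth anchor rows are known; the row-threshold construction below is an assumption for exposition, not the paper's exact mask-building routine.

```python
import numpy as np

def vertical_zone_masks(height, eye_row, mouth_row):
    """Split the valid face grid into three vertical zones anchored at
    the segmented eye and mouth rows (anchor rows assumed given):
    forehead above the eyes, middle zone between eyes and mouth,
    chin zone below the mouth."""
    rows = np.arange(height.shape[0])[:, None]   # broadcast over columns
    valid = ~np.isnan(height)                    # cells covered by the face
    forehead = valid & (rows < eye_row)
    middle = valid & (rows >= eye_row) & (rows < mouth_row)
    chin = valid & (rows >= mouth_row)
    return forehead, middle, chin
```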
It should be noted that regions beyond the key semantic components—such as the forehead, cheeks, and peripheral facial contours—are derived deterministically from anthropometric priors and canonical proportional anchors. As these zones are not segmented through data-driven procedures, they are excluded from quantitative evaluation but retained to ensure semantic completeness and structural consistency.
Although the initial region segmentation based on geometric features can achieve satisfactory results in most cases, defects such as isolated small regions, local discontinuities, and incomplete region boundaries may still occur due to surface noise, occlusion, or geometric variations. To address these issues and further improve the completeness and coherence of the segmented regions, we adopt a recently introduced three-step post-processing workflow43:
Feature-Constrained Boundary Erosion: A feature-constrained erosion process is first applied to refine the boundaries of the segmented region. The procedure starts from boundary grids with the highest elevation values and progressively removes peripheral grids whose mean height \({z}_{{neighbor}}\) deviates significantly from the regional reference height \({z}_{{ref}}\), computed as the average Z-value of the entire segmented region. The global Z-range, defined as \({z}_{\max }-{z}_{\min }\), is used to normalize these differences, and a proportion coefficient \(p\) (typically 0.02–0.05, determined empirically from the dataset’s Z-value distribution) sets the allowable deviation. The erosion continues until the elevation difference meets the following criterion:

$$\left|{z}_{{neighbor}}-{z}_{{ref}}\right|\le p\left({z}_{\max }-{z}_{\min }\right)$$
This proportional thresholding ensures that only outliers and spurious protrusions are removed, while preserving the intrinsic geometry of the target feature.
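A minimal sketch of the acceptance test implied by this criterion; the helper name and default value of p are illustrative.

```python
import numpy as np

def erosion_keep(z_neighbor, z_ref, z_values, p=0.03):
    """Feature-constrained erosion test: keep a boundary cell only if
    its neighborhood mean height stays within p times the global
    Z-range of the regional reference height (p in 0.02-0.05)."""
    z_range = float(np.max(z_values) - np.min(z_values))
    return abs(z_neighbor - z_ref) <= p * z_range
```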
Morphological Closing: To restore the continuity of the segmented regions and fill narrow gaps created during erosion, a morphological closing operation is applied. Let \(\varOmega\) denote the binary mask of the segmented region and \(B\) the structuring element, which is typically chosen as a circular or square shape. The size of \(B\) is adaptively determined according to the scale of the target feature—smaller radii (1–2 grid units) are used for fine components such as eyebrows or eyelids, while larger radii (3–5 grid units) are used for broader structures like the chin44. The erosion step produces an intermediate region \(\varOmega {\prime}\) by removing boundary cells that do not fit the desired structure:

$$\varOmega {\prime} =\varOmega \ominus B$$
where \(\ominus\) denotes erosion. Closing then dilates \(\varOmega {\prime}\) with \(B\) and erodes the result with the same \(B\), i.e., \(\varOmega {\prime\prime} =(\varOmega {\prime} \oplus B)\ominus B\), where \(\oplus\) denotes dilation; this reconnects fragmented areas and mitigates boundary irregularities. The process reunites semantically coherent regions into unified shapes while preserving geometric consistency.
The closing operation is illustrated in Fig. 5a, b: red points indicate elements removed or present before closing, while green points represent retained or newly added points.
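For reference, a closing step of this kind can be sketched with scipy.ndimage; the square structuring element and radius defaults follow the ranges quoted above, while the wrapper function itself is illustrative.

```python
import numpy as np
from scipy import ndimage

def close_region(mask, radius=2):
    """Morphological closing of a binary region mask with a square
    structuring element: radius 1-2 for fine features (eyebrows,
    eyelids), 3-5 for broad ones (chin). Closing = dilation followed
    by erosion with the same element."""
    size = 2 * radius + 1
    structure = np.ones((size, size), dtype=bool)
    return ndimage.binary_closing(mask, structure=structure)
```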
Feature-Specific Template Alignment: Certain facial components, such as the eyes and chin, exhibit structurally ambiguous boundaries due to shallow depth transitions, incomplete geometry, or sculptural stylization. To improve segmentation robustness in these regions, we propose a generalizable shape refinement module that adaptively adjusts the grown regions using parametric boundary functions, rather than relying on rigid template matching.
Feature-specific template alignment (e.g., Fig. 5b for the eye region) is applied to facial components that may have incomplete or ambiguous boundaries. This process adaptively expands and adjusts initially segmented points along the component’s structural axes, filling gaps and extending partial boundaries. While illustrated here for the eyes, the method generalizes to other features, producing more coherent and semantically meaningful segmented regions across the face.
This method operates by fitting boundary-expansion profiles along the structural axis of the component (e.g., vertical axis for eye-like shapes), guided by cultural priors. The expansion width at each row or column is defined as:

$$w\left(t\right)={w}_{{\rm{base}}}+\left({w}_{\max }-{w}_{{\rm{base}}}\right)\left(1-d{\left(t\right)}^{2}\right)$$
where \(t\) is the coordinate along the component’s principal axis (e.g., row index), \(d\left(t\right)=\frac{\left|t-{t}_{c}\right|}{D}\) is the normalized distance to the center, and \(D\) defines the maximum half-span. Parameters \({w}_{{\rm{base}}}\) and \({w}_{\max }\) control the curvature profile. In practice, they are determined from canonical facial proportions (e.g., eye half-width relative to facial width) or estimated from the median half-span of the initially grown region, ensuring both cultural plausibility and geometric stability.
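A sketch of the expansion-width profile under the quadratic form reconstructed above; clipping d(t) to [0, 1] is an added safeguard not stated in the text.

```python
import numpy as np

def expansion_width(t, t_c, D, w_base, w_max):
    """Boundary-expansion half-width along the component's principal
    axis: widest at the center t_c, tapering to w_base where
    |t - t_c| = D (quadratic profile assumed)."""
    d = np.clip(np.abs(t - t_c) / D, 0.0, 1.0)
    return w_base + (w_max - w_base) * (1.0 - d ** 2)
```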
Results
Experimental setup
To ensure consistent and meaningful evaluation, all point clouds underwent a standardized preprocessing pipeline, starting with the extraction of facial regions from complete statue scans, followed by normalization and projection onto a 2D grid. The facial point clouds, manually extracted to cover the entire facial area, including the ears, were normalized with the geometric centroid of the facial region as the origin and scaled to a unified dimension such that the maximum diameter of the facial bounding circle equals 1, eliminating size discrepancies between samples. Projection was performed onto the X-Y plane, with Z-values retained as height features for each grid cell.
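The normalization described here can be sketched as follows; approximating the bounding-circle diameter by twice the largest centroid distance in the X-Y plane is an assumption of this sketch.

```python
import numpy as np

def normalize_face(points):
    """Center the facial cloud at its centroid and scale it so the
    maximum diameter of the X-Y bounding circle equals 1 (the circle
    is approximated by the largest centroid distance)."""
    centered = points - points.mean(axis=0)
    radius = np.linalg.norm(centered[:, :2], axis=1).max()
    return centered / (2.0 * radius)
```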
Different grid resolutions were empirically tested to balance structural fidelity and segmentation stability. Finer resolutions (e.g., 60 × 60 or 70 × 70) preserved more sampling detail but caused fragmentation in regions with complex Z-value variations, such as the lips, leading to unstable region growth. Conversely, coarser grids below 50 × 50 failed to capture sufficient local geometry, resulting in the loss of critical facial features. A 50 × 50 resolution was therefore adopted as a balanced choice, ensuring both adequate local detail and stable connectivity across all samples. At this resolution, each grid cell intersecting the facial surface contains 20–50 valid points, supporting the construction of an 8-connected manifold for region-growing segmentation. Anchors for region growth, typically the nasal ridge combined with either an eye or mouth region, were chosen for their convexity, centrality, and structural stability, providing reliable axial and transverse references.
For evaluation, the dataset comprised nine synthetic facial point clouds (Samples 1–9) and six real-scanned facial point clouds (Samples 10–15). The real samples were included to broaden stylistic, material, and preservation diversity: Samples 10–11 (Fig. 6a, b) are from the Dazu Rock Carvings, Samples 12–13 (Fig. 6c, d) from the Hanging Temple (Xuankongsi), and Samples 14–15 (Fig. 6e, f) from the No.18 niche of the Yungang Grottoes. Together, these 15 samples cover a wide spectrum of Chinese sculptural traditions, including cliff-face carvings, temple statues, and sandstone grotto figures, as well as variations in surface preservation and point cloud density. Figure 6 illustrates the six real heritage samples. These scans vary in carving style, geometric morphology, surface integrity, and point cloud density, reflecting differences in scanning conditions and statue characteristics. All scans were manually cleaned and cropped to isolate the facial regions relevant to this study. Despite the variations in sampling density, the scans retain sufficient geometric detail for reliable region-growing segmentation. Combining these real-world scans with synthetic models allows evaluation under both controlled and practical conditions, ensuring robustness across variations in point density, preservation state, and sculptural style.
Segmentation results on diverse samples
We evaluate segmentation accuracy using mean Intersection-over-Union (mIoU) and the point-wise F1 score, which capture the overall semantic consistency and feature-localization precision of the predicted regions. To assess boundary fidelity, we further adopt the Normalized One-Sided Hausdorff Distance (NOHD)46, enabling scale-independent comparison between the predicted contours and the ground-truth annotations. The evaluation focuses on six key facial components—eyes, eyebrows, ears, mouth, nose, and chin—which correspond to the primary sculptural landmarks targeted by our segmentation framework, while broader regions such as the forehead or cheeks are excluded because they function mainly as contextual structures rather than explicit segmentation objectives.
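For concreteness, point-wise IoU and F1 over boolean label masks can be computed as below; this is the standard formulation of these metrics, not code from the paper.

```python
import numpy as np

def iou_f1(pred, gt):
    """Point-wise IoU and F1 between predicted and ground-truth
    boolean masks over the same point set."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union else 0.0
    precision = inter / pred.sum() if pred.sum() else 0.0
    recall = inter / gt.sum() if gt.sum() else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return iou, f1
```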
The ground truth (GT) for the segmented regions was manually annotated by experienced researchers with expertise in Buddhist sculpture. All nine synthetic models and the six real-scanned models were annotated, with each region labeled independently by two annotators. Any disagreements were resolved through consensus discussion. This procedure provides a high-confidence reference standard while acknowledging the inherent subjectivity in interpreting sculptural features.
The quantitative results of our segmentation method across multiple representative samples are summarized in Table 3. As shown, the proposed method demonstrates stable and reliable performance on all six semantic facial regions, with F1-scores consistently exceeding 0.80 for the majority of samples. In particular, the ears, lips, and nose regions exhibit high precision and recall, indicating the effectiveness of the combined region-growing and morphological post-processing strategies in handling regions with clear geometric boundaries.
As shown in Fig. 7, the overall segmentation of the facial point clouds demonstrates that the proposed method maintains stable performance across varied sample geometries and poses. This diversity in the evaluation set—covering differences in morphology, point density, resolution, and surface degradation—is beneficial for assessing generalization. Despite these variations, the algorithm exhibits consistently low variance in segmentation accuracy, indicating that it does not rely on any single geometric condition. By integrating geometric continuity with canonical proportional constraints, the method adapts effectively to heterogeneous input characteristics, producing comparable outcomes across both synthetic and real-scanned data.
To evaluate the effectiveness of the proposed prior knowledge-guided segmentation, it was compared against a representative geometry-based clustering method. Qualitative results are shown in Fig. 8.
Conventional geometry-based clustering produced fragmented or inconsistent regions, especially in areas with low curvature or partially eroded features. The proposed method generated more continuous and visually coherent segmentation across all evaluated facial regions, showing fewer boundary breaks and missing areas. These observations provide a clear visual indication of the improvement in segmentation continuity achieved by the knowledge-guided method.
Recent works on 3D point cloud segmentation in cultural heritage have primarily focused on architectural structures, such as walls, façades, and columns, using methods ranging from supervoxel-based edge detection to deep learning frameworks. A few studies address heritage statues or artifacts, but they are limited to classification, reconstruction, or denoising tasks and do not provide fine-grained semantic labels for facial parts. As no public datasets or pre-trained models exist for sculpture face segmentation, conventional FSL or domain-adaptive methods cannot be directly applied. Therefore, to provide a quantitative reference, we evaluated a small-sample PointNet++ model on our dataset. While it performed poorly due to insufficient data and domain shift, this baseline demonstrates the difficulty of adapting existing FSL pipelines to sculpture facial segmentation and highlights the suitability of the proposed prior knowledge-guided method for this specialized task.
As shown in Table 4, while high-resolution methods and transfer learning perform well on their respective datasets, they are either unsuitable or unavailable for sculpture face segmentation. The PointNet++ few-shot baseline illustrates the challenge of applying existing FSL approaches to this task, achieving very low mIoU. In contrast, the proposed prior knowledge-guided method achieves high segmentation accuracy with minimal data, reflecting the advantages of integrating structural priors when dealing with the complex geometric patterns characteristic of sculptural facial models.
Parameter sensitivity analysis
To evaluate the stability of the proposed method under different parameter settings, we performed a parameter sensitivity analysis on three key facial regions: the nose, eyes, and lips. The analysis combines quantitative accuracy measurements with visual comparisons of segmentation masks under different parameter settings.
For the nose region, Fig. 9c shows a clear decline in IoU as the thresholds become more relaxed. Strict settings keep the region growing tightly aligned with the Z-gradient boundary, while looser thresholds tolerate broader depth transitions and allow gradual leakage into neighboring areas. This monotonic degradation indicates that the nose is moderately sensitive to parameter changes, and balanced thresholds best preserve both completeness and boundary precision. For the eyes and lips, a different evaluation strategy was applied: instead of scanning the full parameter range, we compared the optimal configuration with one smaller and one larger setting to assess local stability. As shown in Fig. 9b, the eye region remains largely stable—the IoU distribution stays compact even under biased parameter choices, reflecting the strong geometric constraints imposed by the concave eye socket. The lips, by contrast, show markedly higher variability. Figure 9a illustrates that relaxed thresholds cause clear overgrowth, while strict thresholds overly suppress the expansion and truncate the curved lip contour. This bidirectional sensitivity is also evident in the wide spread of IoU values across samples. These observations confirm that the lip region is inherently more sensitive to threshold variations, whereas the method remains consistent when parameters stay within a reasonable operating range.
Fig. 9: a Lips region segmentation (varying parameters); b Eye region segmentation (varying parameters); c Nasal region segmentation (varying parameters); d Parameter sensitivity plot (IoU, mean ± standard deviation); e IoU trend plot (parameter vs. sample, mean ± std); f IoU heatmap (parameter vs. sample).
These numerical observations are consistent with the qualitative comparisons in Fig. 9d–f. Relaxed parameters generally lead to boundary dilation, whereas strict thresholds enforce tighter adherence to the depth-guided structure at the cost of slight incompleteness. As a complement, Table 5 summarizes the representative IoU values under different parameter intervals, illustrating the overall sensitivity patterns without altering the conclusions drawn from the visual and statistical analyses. Overall, the method remains stable within the normal operating range, supported by the combined influence of depth-gradient cues and local connectivity. This robustness stems from the integration of depth-gradient cues and local connectivity, which effectively constrain region growth under moderate parameter perturbations.
Ablation study
An ablation study was conducted to assess the contribution of the key post-processing modules—morphological erosion, morphological closing, and shape-adjustment refinement—to the segmentation performance. Each module was sequentially disabled, and the resulting impact was quantitatively evaluated across representative samples, as summarized in Table 6, with visual illustrations shown in Fig. 10.
When all three modules were active, the segmentation achieved the highest overall accuracy, F1 scores, and mIoU across all facial regions, indicating the effectiveness of the complete post-processing pipeline. Disabling the shape-adjustment refinement while keeping erosion and closing active resulted in a notable drop in Eyes and Chin F1 scores, as shown in Table 6 (Sample 1: Eyes F1 0.6226 vs. 0.8684; Chin F1 0.8549 vs. 0.8842), demonstrating that this module plays a critical role in maintaining structural consistency, particularly in regions where symmetry and proportionality are important. When both closing and template refinement were turned off (erosion only), internal gaps and fragmented boundaries persisted, causing further reduction in F1 scores for eyes and chin, as well as decreased overall accuracy and mIoU. Finally, with all three post-processing modules disabled, the segmentation performance dropped significantly, with substantial fragmentation and misalignment across all tested regions, reflecting the necessity of each module in achieving robust and coherent results.
These observations confirm that morphological erosion regulates boundary growth and prevents local over-expansion, morphological closing reconnects fragmented regions and fills small gaps to preserve completeness, and shape-adjustment refinement enforces anatomical plausibility based on symmetry and proportional priors, particularly benefiting structurally sensitive regions such as the eyes and chin. Together, these modules synergistically enhance segmentation completeness, boundary regularity, and structural fidelity under challenging geometric conditions (Fig. 10; Table 7).
To clarify the individual contribution of each prior knowledge component to semantic consistency, we analyze how each prior maps onto a processing module and how performance changes when that module is disabled. Bilateral symmetry, implemented via the erosion module, optimizes the semantic alignment of symmetric organs (e.g., eyes, nasal alae) by eroding asymmetric regions; when the erosion module is disabled, the F1-scores of symmetric organs decrease by 10–15% (e.g., the nose F1-score of Sample 1 drops from 0.8991 to 0.7715, a 14.2% decrease), directly affecting facial semantic balance. The implicit constraint of vertical axial alignment is embodied in the closing module, which reinforces vertical structures and provides a stable reference for vertically distributed organs such as the nose and chin; when the closing module is disabled, the F1-scores of these organs decrease by 25–45% (e.g., the chin F1-score of Sample 3 drops from 0.8242 to 0.5233, a 36.5% decrease), with obvious vertical offsets of organs observed in the segmentation results. Proportional partitioning rules, translated into concrete semantic boundaries via the template module, serve as the core constraint for defining organ scales; when the template module is disabled, the F1-scores of multiple organs decrease by roughly 30% (e.g., the eye F1-score of Sample 1 drops from 0.8684 to 0.6226, a 28.3% decrease) and the global mIoU decreases significantly. The template module also conditions the erosion module: only within the symmetric range defined by the proportional rules can erosion accurately remove asymmetric parts without eliminating valid semantic regions. In summary, bilateral symmetry is a key supplement for optimizing semantic balance, vertical axial alignment is a prerequisite for a plausible semantic structure, and proportional division rules are the core constraints defining semantic boundaries; together, they ensure the semantic consistency of the segmentation results.
Boundary accuracy evaluation
To quantitatively evaluate the boundary alignment of predicted facial regions, we employ the Normalized One-Sided Hausdorff Distance (NOHD)46. By normalizing the one-sided Hausdorff distance with respect to the bounding box scale of the point cloud, NOHD removes absolute size dependencies and allows for consistent comparison across different samples and facial features.
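A sketch of NOHD under one plausible normalization (the ground-truth bounding-box diagonal); the paper normalizes by bounding-box scale, and the exact normalizer is an assumption here.

```python
import numpy as np
from scipy.spatial import cKDTree

def nohd(pred_pts, gt_pts):
    """Normalized one-sided Hausdorff distance: largest distance from
    a predicted boundary point to the ground-truth set, divided by the
    ground-truth bounding-box diagonal (normalizer assumed)."""
    d, _ = cKDTree(gt_pts).query(pred_pts)
    diag = np.linalg.norm(gt_pts.max(axis=0) - gt_pts.min(axis=0))
    return float(d.max() / diag)
```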
As shown in Table 8, our method consistently reduces NOHD across all key facial regions, indicating improved alignment between predicted and ground-truth boundaries. Notably, the eye regions exhibit the most substantial improvement, particularly where initial discrepancies were large, reflecting the method’s ability to correct local deviations. The nose region, while initially closer to ground truth, also shows a consistent reduction in NOHD, demonstrating stable performance even in areas with lower starting error. Overall, these results highlight that the method effectively preserves geometric consistency across diverse facial features and styles, ensuring precise spatial alignment of critical facial landmarks and providing a reliable foundation for subsequent segmentation or geometric analysis.
Discussion
The segmentation of Buddhist facial point clouds presented in this study demonstrates that domain-specific structural priors can be effectively embedded into a point cloud processing workflow to achieve consistent and reproducible partitioning without annotated data. The key contribution lies not in the segmentation per se, but in formalizing sculptural canons as operational anchors that guide local growth while maintaining alignment with the 3D geometry. Combined with a grid-based projection that regularizes irregular sampling, the use of bilateral symmetry, proportional divisions, and predefined anchor zones enables stable reconstruction of facial subregions, even under variable point densities.
Analysis of the results highlights the advantages of canonical guidance in a measurement context. Unlike data-driven networks that rely on dense training examples, the proposed method operates independently of large annotated datasets by leveraging embedded priors derived from canonical structures. Traditional geometric segmentation methods—such as curvature-based partitioning, clustering, or convexity-driven region growing—capture local surface continuity but often fail to align boundaries with the intended structural partitions. Canonical anchors, including nasal and facial axes, provide reproducible reference points that systematically constrain region expansion according to geometric ratios, ensuring robust segmentation even when parameter thresholds are moderately relaxed. Variations in facial geometry across different Buddha statues—including stylization, feature prominence, and local incompleteness—affect segmentation outcomes. Fixed anchors or strict proportional templates can result in suboptimal delineation in certain areas, particularly around the eyes and mouth. These observations indicate that structural constraints must be flexibly applied in accordance with actual 3D geometry to maintain reproducible and semantically meaningful segmentation across samples.
Prior knowledge is operationalized as constraints for each sample, with anchors such as symmetry axes and facial reference lines guiding region growth and morphological refinement. Classical two-dimensional facial proportions are transformed into computationally implementable priors, which can be adjusted to accommodate stylistic deviations while preserving measurable alignment. This transformation process, along with the original two-dimensional facial proportion standard, is visually illustrated in Fig. 11. Once priors are established, parameters controlling region growth and morphological operations are applied in the context of these anchors. Parameter sensitivity is meaningful only relative to the defined priors, and adjustments are performed systematically to ensure structural continuity and measurement consistency across diverse facial profiles.
The use of a 2D grid projection in the segmentation method serves as a practical means to structure the unstructured point cloud data, enabling controlled region-growing operations while preserving the spatial relationships necessary for canonical prior guidance. By mapping the point cloud onto a regular grid, the method can systematically apply proportional rules and anchor constraints across the surface, facilitating consistent delineation even in areas with uneven point density. This discretization simplifies neighborhood computations, stabilizes the expansion process, and provides a clear visualization for both debugging and result inspection (see Fig. 12 for an example of eye and eyebrow segmentation, illustrating boundary errors caused by improper seed placement and staircase-like artifacts introduced by the grid).
The proposed multi-stage region-growing framework for facial point cloud segmentation performs reliably on most intact samples but faces challenges on severely weathered or structurally incomplete facial data. Heritage facial point clouds may exhibit missing earlobes, damaged nasal bridges, or gaps at eye corners caused by weathering or structural loss, which disrupt local geometric continuity and reduce the reliability of gradient-based discrimination, relative positional priors, and morphological constraints.
Figure 13 illustrates the effects of distinct facial defects on region segmentation, with two subfigures presenting contrasting scenarios. Subfigure (a) depicts the impact of nasal bridge damage on region segmentation: as facial feature region growth relies on local Z-value gradients and spatial connectivity, damage to the nasal bridge introduces gradient discontinuities—these can either cause premature termination of nasal region expansion or lead to its erroneous extension into the cheek area, thereby disrupting the subsequent localization of peri-nasal features such as the eyes and mouth. In contrast, Subfigure (b) shows a naturally eroded pit located in a non-facial-feature area: while this pit impairs local facial structural integrity, it does not interfere with the region segmentation outcome, as its position lies outside the key facial feature zones that drive the segmentation process.
Comparison with traditional segmentation methods highlights the advantages of the proposed approach. When applied to the same samples, curvature-based segmentation often led to over-segmented facial regions with noisy boundaries that poorly matched sculptural semantics, while geometric clustering or convexity-based methods became unstable under irregular point densities. In contrast, our prior-guided method produced semantically coherent partitions, accurately preserving facial subregions such as the eyes, nose, and mouth. Both quantitative metrics (e.g., F1 scores, mIoU; see Table 3) and qualitative observations (Figs. 7–10) confirm that embedding canonical priors improves boundary regularity, structural consistency, and interpretability, supporting downstream tasks such as restoration analysis and art-historical study.
Future improvements could focus on adaptive parameter tuning, automatic anchor detection, and extension to other types of heritage point clouds, enabling broader applicability in measurement and digital preservation of cultural artifacts. These developments would further enhance automation, reproducibility, and the integration of prior knowledge into large-scale 3D heritage analysis.
Data availability
The dataset used in this study is publicly registered on Zenodo (DOI: 10.5281/zenodo.17355934). Access to the files is restricted and requires permission from the corresponding author.
Code availability
The code used in this study is publicly registered on Zenodo (DOI: 10.5281/zenodo.17355934). Access to the files is restricted and requires permission from the corresponding author.
References
Forrest, C. International Law and the Protection of Cultural Heritage (Routledge, 2012).
Waterton, E. & Smith, L. Heritage protection for the 21st century. Cult. Trends 17, 197–203 (2008).
Kakiuchi, E. Cultural heritage protection system in Japan: current issues and prospects for the future. Gdańskie Studia Azji Wschodniej 10, 7–27 (2016).
Fu, J. et al. Tracing the historical development and spatial distribution of Buddhist temples in Xiamen, China. npj Herit. Sci. 13, 397 (2025).
Yang, Y. & Fan, F. Ancient thangka Buddha face recognition based on the Dlib machine learning library and comparison with secular aesthetics. Herit. Sci. 11, 137 (2023).
Jing, Z. L., Gömpojab, Cai, J. & Henss, M. The Buddhist Canon of Iconometry (Fabri Verlag, 2000).
Apollonio, F. I., Fantini, F., Garagnani, S. & Gaiani, M. A photogrammetry-based workflow for the accurate 3D construction and visualization of museums assets. Remote Sens. 13, 486 (2021).
Rossi, M. & Bournas, D. Structural health monitoring and management of cultural heritage structures: a state-of-the-art review. Appl. Sci. 13, 6450 (2023).
Tapete, D. et al. Integrating radar and laser-based remote sensing techniques for monitoring structural deformation of archaeological monuments. J. Archaeol. Sci. 40, 176–189 (2013).
Sirovich, L. & Kirby, M. Low-dimensional procedure for the characterization of human faces. J. Opt. Soc. Am. A 4, 519–524 (1987).
Belhumeur, P. N., Hespanha, J. P. & Kriegman, D. J. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19, 711–720 (1997).
Renoust, B., Oliveira Franca, M., Chan, J., Garcia, N., Le, V. & Uesaka, A. Historical and modern features for Buddha statue classification. In Proc. 27th ACM International Conference on Multimedia (MM’19), 23–30 (ACM, New York, 2019).
Pérez-Sinticala, C. et al. Evaluation of primitive extraction methods from point clouds of cultural heritage buildings. In Structural Analysis of Historical Constructions, RILEM Bookseries vol. 18, 2332–2341 (Springer, 2019).
Tarsha-Kurdi, F., Landes, T. & Grussenmeyer, P. Hough-transform and extended RANSAC algorithms for automatic detection of 3D building roof planes from lidar data. In ISPRS Workshop Laser Scanning & SilviLaser 2007 (407–412) (Espoo, 2007).
Mirande, K. et al. A graph-based approach for simultaneous semantic and instance segmentation of plant 3D point clouds. Front. Plant Sci. 13, 1012669 (2022).
Bi, S. et al. Optical classification of inland waters based on an improved fuzzy c-means method. Opt. Express 27, 34838–34856 (2019).
Xiao, J. et al. Tiny object detection with context enhancement and feature purification. Expert Syst. Appl. 211, 118665 (2023).
Sammartano, G., Avena, M., Fillia, E. & Spanò, A. Integrated HBIM-GIS models for multi-scale seismic vulnerability assessment of historical buildings. Remote Sens. 15, 833 (2023).
Liu, P., Wang, L., Ranjan, R., He, G. & Zhao, L. A survey on active deep learning: from model driven to data driven. ACM Comput. Surv. 54, 1–34 (2022).
Yang, S., Hou, M. & Li, S. Three-dimensional point cloud semantic segmentation for cultural heritage: a comprehensive review. Remote Sens 15, 548 (2023).
Moyano, J., León, J., Nieto-Julián, J. E. & Bruno, S. Semantic interpretation of architectural and archaeological geometries: point cloud segmentation for HBIM parameterisation. Autom. Constr. 130, 103856 (2021).
Paiva, P. V. V., Cogima, C. K., Dezen-Kempter, E. & Carvalho, M. A. G. Historical building point cloud segmentation combining hierarchical watershed transform and curvature analysis. Pattern Recognit. Lett. 135, 114–121 (2020).
Matrone, F. et al. Comparing machine and deep learning methods for large 3D heritage semantic segmentation. ISPRS Int. J. Geo-Inf. 9, 535 (2020).
Kulkarni, U., Meena, S. M., Gurlahosur, S. V. & Mudengudi, U. Classification of cultural heritage sites using transfer learning. In IEEE International Conference on Multimedia Big Data (BigMM) (391–397) (IEEE, 2019).
Dong, Q., Wei, T., Zhang, Q., Jia, X. & Pan, B. The texture of Chinese garden rockery stones: based on 3D point cloud and 3D printing technology. npj Herit. Sci. 13, 47 (2025).
Croce, V., Caroti, G., De Luca, L., Piemonte, A. & Véron, P. Semantic annotations on heritage models: 2D/3D approaches and future research challenges. Remote Sens. Spat. Inf. Sci. XLIII-B2-2020, 829–836 (2020).
Croce, V. et al. Semi-automatic classification of digital heritage on the Aïoli open source 2D/3D annotation platform via machine learning and deep learning. J. Cult. Herit. 62, 187–197 (2023).
Matrone, F. & Martini, M. Transfer learning and performance enhancement techniques for deep semantic segmentation of built heritage point clouds. Virtual Archaeol. Rev. 12, 73 (2021).
Aliferis, C. & Simon, G. Overfitting, underfitting and general model overconfidence and under-performance pitfalls and best practices in machine learning and AI. In Artificial Intelligence and Machine Learning in Healthcare and Medical Sciences: Best Practices and Pitfalls 477–524 (Springer, 2024).
Zhao, J. et al. Semantic segmentation of point clouds of ancient buildings based on weak supervision. Herit. Sci. 12, 232 (2024).
Tsai, C. J., Chen, H. T. & Liu, T. L. Pseudo-embedding for Generalized Few-Shot 3D Segmentation. Computer Vision—ECCV 2024. Lecture Notes in Computer Science (eds Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T. & Varol, G.), vol. 15094 (383–400) (Springer, 2025).
Bassier, M., Mazzacca, G., Battisti, R., Malek, S. & Remondino, F. Combining image and point cloud segmentation to improve heritage understanding. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-2/W4-2024, 49–56 (2024).
Réby, K., Guilhelm, A. & De Luca, L. Semantic segmentation using foundation models for cultural heritage: an experimental study on Notre-Dame de Paris. In Proc. IEEE/CVF International Conference on Computer Vision (1689–1697) (IEEE, 2023).
Asthana, A., Marks, T. K., Jones, M. J., Tieu, K. H. & Rohith, M. Fully automatic pose-invariant face recognition via 3D pose normalization. In IEEE International Conference on Computer Vision (937–944) (IEEE, 2011).
Furferi, R. et al. From 2D to 2.5D i.e. from painting to tactile model. Graph. Models 76, 706–723 (2014).
Wei, W. et al. Assessing facial symmetry and attractiveness using augmented reality. Pattern Anal. Appl. 25, 635–651 (2022).
Liu, S., Wei, C., Li, M., Cui, X. & Li, J. Adaptive superpixel segmentation and pigment identification of colored relics based on visible spectral images. Herit. Sci. 12, 1 (2024).
Shi, J., Samal, A. & Marx, D. How effective are landmarks and their geometry for face recognition? Comput. Vis. Image Underst. 102, 117–133 (2006).
Gautam, S. & Jha, K. A new region-growing-based fast hybrid segmentation technique for 3D point clouds. Sādhanā 50, 136 (2025).
Sa, J. et al. Depth grid-based local description for 3D point clouds. SIViP 18, 4085–4102 (2024).
Liu, D., Li, D., Wang, M. & Wang, Z. 3D change detection using adaptive thresholds based on local point cloud density. ISPRS Int. J. Geo-Inf. 10, 127 (2021).
Silvela, J. & Portillo, J. Breadth-first search and its application to image processing problems. IEEE Trans. Image Process. 10, 1194–1199 (2001).
Said, K. A. M. & Jambek, A. B. Analysis of image processing using morphological erosion and dilation. J. Phys. Conf. Ser. 2071, 012033 (2021).
Gonzalez, R. C. & Woods, R. E. Opening and closing in Digital Image Processing (4th edn) 644–647 (Pearson, 2017).
Sun, W. et al. 3D face parsing via surface parameterization and 2D semantic segmentation network. Preprint at https://doi.org/10.48550/arXiv.2206.09221 (2022).
Zhang, D. et al. An efficient approach to directly compute the exact Hausdorff distance for 3D point sets. Integr. Comput. Aided Eng. 24, 261–277 (2017).
Acknowledgements
We acknowledge grants from the National Natural Science Foundation of China (No. 42171444).
Author information
Contributions
S.W. developed the point cloud processing code, performed the data analysis, produced the figures, and interpreted the experimental results. M.H. conceived the study, provided the overall research idea, and served as the corresponding author. S.Y. contributed to manuscript revision and correspondence. H.L. participated in data preprocessing and organization. S.L. provided valuable suggestions for experimental design and manuscript improvement. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wei, S., Hou, M., Yang, S. et al. Semantic segmentation of Buddha facial point clouds through knowledge-guided region growing. npj Herit. Sci. 14, 109 (2026). https://doi.org/10.1038/s40494-026-02377-y