Introduction

Intangible cultural heritage (ICH) serves as a spiritual vessel of human civilization, embodying rich cultural information resources and symbolic meaning. Its sustainable preservation is crucial not only for perpetuating historical continuity and maintaining cultural diversity but also for safeguarding the foundation of collective human memory. However, globalization and digitalization—while creating new opportunities—have also imposed severe challenges on craft-based ICH, exemplified by the traditional production skills of Xuan Paper (XPS), which rely heavily on tacit knowledge. These challenges include fragmented resource storage, cross-modal terminological ambiguity, decentralized knowledge, and difficulties in user knowledge acquisition, all of which hinder the effective transmission and revitalization of ICH. Within this context, the central goal of digital-heritage preservation is to construct an interpretable and transferable systematic-knowledge framework that enables the transition of ICH resources from “static storage” to “active reuse”.

Knowledge graphs (KGs), owing to their powerful semantic-relational capabilities, provide a promising pathway for integrating multi-source, heterogeneous ICH data. Existing KG research in the cultural-heritage domain primarily follows two paradigms: the expert-driven approach and the data-driven approach1. The expert-driven approach involves constructing domain ontologies based on standardized vocabularies like CIDOC CRM, emphasizing the accuracy of knowledge and the rigor of cultural logic2,3,4,5. In contrast, the data-driven approach leverages technologies such as natural language processing (NLP), large language models (LLM), multimodal integration, and machine learning to achieve automated knowledge extraction6,7. Although hybrid paradigms combining both are emerging8,9, most current studies focus primarily on the construction and retrieval of cultural-heritage knowledge itself—addressing “how to build and integrate knowledge”—while failing to adequately consider the fundamental differences in cognitive abilities and information needs among diverse user groups10, that is, “how to present knowledge to users.” This structure-over-application limitation leaves most ICH-KG systems as static retrieval interfaces, hindering efficient knowledge dissemination and utilization and ultimately constituting a “cognitive adaptation dilemma” in ICH digital practice. Consequently, overcoming the static-knowledge-base paradigm and developing a knowledge-service system that dynamically responds to users’ cognitive differences has become a pressing challenge. To empirically evaluate the effectiveness of new methods, this study uses two baseline systems for comparative experiments: a purely symbolic, expert-driven method (CIDOC-KG) and a purely neural, data-driven method (BERT-KG).

Cognitive load theory (CLT) provides a classical framework for understanding and optimizing the allocation of cognitive resources during human learning and information interaction, offering an effective means of addressing the aforementioned cognitive-adaptation dilemma in KGs11. This theory posits that human working-memory capacity is limited and that learning effectiveness largely depends on managing intrinsic cognitive load (inherent task complexity), extraneous cognitive load (information-presentation format), and germane cognitive load (schema construction and automation)12. Among these, the extraneous cognitive load, determined by information presentation and interaction design, represents the most actionable variable for intervention in design. Although CLT has been widely validated in the visualization13 and explainability design14 for general information systems, its systematic application to the construction of cultural-heritage knowledge graphs—especially within craft-based ICH contexts—remains limited. Therefore, this study introduces a User Demand-Information Density (UD-ID) mapping matrix to facilitate the transformation of cultural-heritage KGs from static knowledge repositories to user-adaptive systems.

In recent years, the emergence of neuro-symbolic AI (NS-AI) has provided new methodological tools for simultaneously addressing knowledge integration and cognitive adaptation15. This approach synergistically combines the logical-reasoning capabilities of symbolic systems with the perceptual-learning strengths of neural networks. Its symbolic components (e.g., ontologies, rule bases) are responsible for the structured representation of knowledge, relational reasoning, and cognitive constraints, whereas its neural components (e.g., deep learning models) are responsible for extracting relevant features and identifying patterns from multimodal data. These components interact through a semantic-mapping module to jointly deliver interpretable, inferable, and user-adaptive knowledge services. This hybrid approach can provide KGs with more powerful and flexible reasoning capabilities, making it particularly suitable for tackling the complex semantic gaps16,17 and cognitive adaptation needs18,19,20 prevalent in the ICH domain, and demonstrating potential to surpass purely neural or purely symbolic baselines in both interpretability and coverage. Despite this significant inherent potential, however, the application of NS-AI in the cultural-heritage field, especially in the preservation of craft-based ICH, remains in its early exploratory stages. Existing research predominantly focuses on validating the technology itself and has not yet established a framework that deeply integrates the computational power of NS-AI with systematic user-cognitive models (e.g., CLT) to simultaneously address the dual challenges of knowledge structuring and cognitive adaptation. This methodological gap is the core area that this study intends to fill.

Xuan Paper, revered as the first of China's "Four Treasures of the Study" and an outstanding representative of traditional papermaking, has served since the Tang Dynasty as the core medium for literati and artists to create exquisite calligraphy and paintings; its reputation for "paper with a life of a thousand years, ink with the charm of ten thousand changes" has contributed to artistic masterpieces such as "Along the River During the Qingming Festival" and "Five Oxen". The traditional production skills of Xuan Paper (XPS) are not only an outstanding representative of Chinese handmade papermaking techniques but were also inscribed on the UNESCO Representative List of the Intangible Cultural Heritage of Humanity in 2009, the only handmade papermaking craft so inscribed. The craft involves 108 intricate and sophisticated steps. As illustrated in Fig. 1, these include five core processes whose transmission relies heavily on tacit knowledge passed down through oral instruction and hands-on demonstration between master and apprentice. For instance, paper forming (Lao Zhi) requires precise coordination of timing and force among artisans, while paper drying (Shai Zhi) demands the craftsmen's keen perception of baking temperatures. Over millennia of transmission, Xuan Paper has also fostered rich folk traditions, such as the worship of Cai Lun, traditionally credited as the inventor of papermaking, and rituals such as drinking "spring wine" at the start of the work and "closing wine" at its end. These elements of tacit knowledge collectively constitute a complex ICH system characterized by a high degree of knowledge fragmentation and terminological ambiguity. Moreover, current academic research on Xuan Paper predominantly focuses on its material and physical properties21,22,23, whereas studies exploring the construction of its knowledge system from a digital knowledge engineering perspective and its integration with user cognitive needs are relatively scarce. This research gap also makes XPS an ideal case study for empirically validating the user-demand-driven approach proposed in this study.

Fig. 1: Tools and processes for the core steps in Xuan Paper production.

This schematic outlines the five key stages of the traditional craft, from raw material preparation to the finished product, and highlights the core tools utilized in each step.

To systematically address the aforementioned cognitive-adaptation dilemma of KGs, the integration challenges of NS-AI from a user perspective, and the shortcomings in digital research on the Xuan Paper domain, this study takes XPS as its research object and proposes a user-demand-oriented neuro-symbolic cognitive enhancement framework for ICH knowledge graphs (NSCEF-ICHKG). Based on this framework, we construct a knowledge graph of traditional production skills of Xuan Paper (XPKG), aiming to achieve structured integration of multi-source heterogeneous knowledge, resolve domain-specific terminological ambiguity, and achieve user cognitive adaptation (Fig. 2). The main contributions of this paper are as follows:

Fig. 2: NSCEF-ICHKG architecture and research framework.

The framework integrates four cohesive layers—the Cognitive-Driven Layer, the Neuro-Symbolic Knowledge Production Layer, the Knowledge Hierarchical Storage Layer, and the Experimental Validation Layer—to address the challenges of knowledge fragmentation and cognitive adaptation imbalance in the digital preservation of ICH.

(i) Theoretical level: We propose the neuro-symbolic ontology for traditional production skills of Xuan Paper (NS-XuPOnto), the first ontology model in the ICH field that integrates neuro-symbolic mechanisms with user-cognitive adaptation capabilities. Based on CLT and AHP, we establish a user demand-information density (UD-ID) mapping matrix to achieve dynamic equilibrium between knowledge supply and cognitive demand, thereby filling the theoretical gap in cognitive-adaptation mechanisms for ICH digital preservation.

(ii) Methodological level: We propose the NSCEF-ICHKG framework and construct an XPKG instance that achieves fine-grained semantic mapping of multimodal data such as oral, textual, and video sources while ensuring semantic consistency. Furthermore, its performance is demonstrated to be superior to traditional baselines such as CIDOC-KG and BERT-KG.

(iii) Practical level: We constructed the first comprehensive multimodal dataset covering the entire lifecycle of Chinese Xuan Paper (XPDataset), comprising 3000 keyframe images, ~3.5 h of process and interview video, and 10 h of artisan oral records, thereby providing benchmark data support for the living transmission and research of XPS.

This study opens new pathways for the safeguarding of Xuan Paper culture through the three-way collaboration of NS-XuPOnto, XPKG, and the multimodal XPDataset, holding significant theoretical and practical value. The structure of this paper is as follows: Section "Methods" details the construction of the NSCEF-ICHKG framework and the NS-XuPOnto ontology model. Section "Results" presents the construction of XPKG using XPS as an example and compares model performance and cognitive load through two experiments. Section "Discussion" summarizes the entire work and outlines future research directions.

Methods

Research framework. To systematically address the dual challenges of knowledge fragmentation and user-cognitive adaptation imbalance in the digital preservation of XPS, we integrate neuro-symbolic AI (NS-AI), knowledge graphs (KGs), and cognitive load theory (CLT) to propose the neuro-symbolic cognitive enhancement framework for ICH knowledge graphs (NSCEF-ICHKG). This framework combines the logical rigor of symbolic systems with the perceptual learning capabilities of neural systems, effectively compensating for the limited knowledge coverage of purely symbolic methods and the semantic inaccuracy of purely neural methods. It thus provides an ideal architecture for constructing a knowledge graph that incorporates both the rigid constraints of cultural logic and user-cognitive adaptation16,17.

As illustrated in Fig. 2, the framework contains three core modules plus an experimental validation layer. The first module, the cognitive-driven layer, serves as the decision-making core of the framework and aims to establish a user-demand-driven cognitive grading mechanism. We deeply integrate CLT as a guiding principle into the construction process of NS-AI and KGs, and combine it with the analytic hierarchy process (AHP) to quantify demand weights for different user groups. On this basis, we establish the UD-ID mapping matrix to mitigate the conflict between technical logic and cognitive needs. The second module, the neuro-symbolic knowledge production layer, is the core component for knowledge structuring and processing. We construct NS-XuPOnto using the Protégé software: in the symbolic layer, we extend the CIDOC CRM core classes and embed four types of cognitively sensitive SWRL rules to ensure the interpretability of process knowledge; in the neural layer, we utilize LLM technology for cross-modal contrastive learning and semantic alignment, bridging the semantic gap between unstructured information and the structured ontology and thereby resolving the ambiguity of craft terminology. The third module, the knowledge hierarchical storage layer, instantiates the outputs of the previous two layers and achieves the final user adaptation: based on the strategy defined by the UD-ID matrix, we design and implement a dynamic hierarchical storage strategy based on property labels in the Neo4j graph database, generating adaptively visualized XPKG views. Finally, the experimental validation layer implements a dual-evaluation mechanism through an entity-retrieval performance comparison experiment (Experiment 1) and a cognitive-adaptation experiment based on the NASA-TLX scale (Experiment 2), to empirically verify the performance of XPKG and the effectiveness of the NSCEF-ICHKG framework.

User-oriented mapping matrix. In order to quantify user demands and establish a cognitive stratification mechanism, this study employs the analytic hierarchy process (AHP). Proposed by the operations researcher Saaty in 1980, AHP is a quantitative, multi-criteria decision-making tool whose core involves constructing a hierarchical structure and judgment matrices to transform subjective experience into objective weights24,25. In the field of cultural-heritage digitalization, AHP can effectively quantify users' cognitive preferences for knowledge-service functions, overcoming the data-processing bottleneck of traditional questionnaires that require large sample sizes and thereby achieving a scientific allocation of demand weights. The affinity diagram (KJ method) is a commonly used tool in systems engineering that decomposes problems into distinct factors and then classifies, compares, screens, and integrates them to derive solutions26. The user demand analysis for XPKG involves a multi-level analysis of the system itself and its interaction with users. We recruited 30 users (10 experts, 20 novice users) and classified and integrated the XPS demands using the KJ method, establishing a hierarchical structure model (Fig. 3). The goal layer represents the overall objective of this hierarchical model, defined as knowledge graph service satisfaction. The criterion-layer indicators are B1 Data Analysis Capability, B2 Information Visualization, B3 Interaction Experience, and B4 System Performance. Summarizing the demands of the criterion layer yields 16 indicators (C1–C16) at the solution layer.

Fig. 3: AHP-based hierarchical structure model of XPKG.

The model structures user requirements into a three-level hierarchy to quantify cognitive preferences for knowledge services. It comprises the goal layer (knowledge graph service satisfaction), four criterion-layer indicators (e.g., data-analysis capability, information visualization), and 16 solution-layer indicators.

Subsequently, we randomly invited 10 ICH experts and 15 novice users to perform pairwise comparisons of the sorted demands using the Saaty 1–9 scale method (Table 1)27, constructing a positive reciprocal judgment matrix \(B={({b}_{{ij}})}_{n\times n}\) (Eq. (1)):

$$\begin{array}{l}{\bf{B}}=\left(\begin{array}{cccc}{b}_{11} & {b}_{12} & \cdots & {b}_{1n}\\ {b}_{21} & {b}_{22} & \cdots & {b}_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ {b}_{n1} & {b}_{n2} & \cdots & {b}_{nn}\end{array}\right)\end{array}$$
(1)
Table 1 Saaty 1–9 scale

The judgment matrix B is normalized by columns (Eq. (2)) to obtain \(C={({c}_{{ij}})}_{n\times n}\).

$${c}_{{ij}}=\frac{{b}_{{ij}}}{\displaystyle \mathop{\sum }\limits_{i=1}^{n}{b}_{{ij}}}(i,j=1,2,\cdots ,n)$$
(2)

The column-normalized matrix C is summed row-wise (Eq. (3)) to obtain the vector \(\bar{w}={({\bar{\omega }}_{1},{\bar{\omega }}_{2},\ldots ,{\bar{\omega }}_{n})}^{T}\), which is then normalized (Eq. (4)) to obtain the weight \({\omega }_{i}\) of each indicator \((i=1,2,\ldots ,n)\).

$${\bar{\omega }}_{i}=\mathop{\sum }\limits_{j=1}^{n}{c}_{{ij}}\,(i=1,2,\cdots ,n)$$
(3)
$${\omega }_{i}=\frac{{\bar{\omega }}_{i}}{\displaystyle \mathop{\sum }\limits_{j=1}^{n}{\bar{\omega }}_{j}}\,(i=1,2,\cdots ,n)$$
(4)

The maximum eigenvalue \({\lambda }_{\max }\) of the judgment matrix is calculated (Eq. (5)) to check the matrix consistency.

$${\lambda }_{\max }=\frac{1}{n}\displaystyle \mathop{\sum }\limits_{i=1}^{n}\frac{{({\boldsymbol{B}}{\boldsymbol{\omega }})}_{i}}{{\omega }_{i}}$$
(5)

Here, \({({\boldsymbol{B}}{\boldsymbol{\omega }})}_{i}\) is the ith component of the vector \({\boldsymbol{B}}{\boldsymbol{\omega }}\). Consistency is verified using the consistency ratio \({\boldsymbol{CR}}=\frac{{\boldsymbol{CI}}}{{\boldsymbol{RI}}}\) (where the consistency index \({\boldsymbol{CI}}=\frac{{\lambda }_{\max }-n}{n-1}\)). If \({\boldsymbol{CR}} < 0.1\), the judgment matrix is considered consistent and the weights are valid; otherwise, the matrix must be adjusted. The random index (RI) values are shown in Table 2.

Table 2 Judgment matrix RI values

Through the above process, the relative weights of the first- and second-level evaluation indicators of XPKG were calculated, and the consistency ratios \({\boldsymbol{CR}}\) for all judgment matrices were <0.1, indicating that the matrices passed the consistency check and that the weight results are valid. Based on the global weight formula \({\omega }_{{\rm{Global}}}={\omega }_{{\rm{Criterion}}}^{(i)}\times {\omega }_{{\rm{Solution}}}^{(j)}\), the global weight rankings of C1–C16 for expert and novice users were calculated (Table 3). The top-ranked items were selected to plot the global high-weight sequence diagram shown in Fig. 4.
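To make the weight-derivation pipeline in Eqs. (1)–(5) concrete, the following minimal Python sketch (assuming NumPy, the standard Saaty random-index values, and a small hypothetical 4 × 4 judgment matrix rather than the matrices actually collected in this study) computes the normalized weights, the maximum eigenvalue, and the consistency ratio:

import numpy as np

# Standard Saaty random index (RI) values for matrix orders 1-9
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(B):
    """Derive priority weights and the consistency ratio from a judgment matrix B."""
    n = B.shape[0]
    C = B / B.sum(axis=0)              # Eq. (2): column normalization
    w_bar = C.sum(axis=1)              # Eq. (3): row sums of the normalized matrix
    w = w_bar / w_bar.sum()            # Eq. (4): normalized weight vector
    lam_max = np.mean((B @ w) / w)     # Eq. (5): maximum-eigenvalue estimate
    CI = (lam_max - n) / (n - 1)       # consistency index
    CR = CI / RI[n] if RI[n] > 0 else 0.0
    return w, lam_max, CR

# Hypothetical criterion-layer judgment matrix (B1-B4), not the study's actual data
B = np.array([[1.0, 3.0, 2.0, 5.0],
              [1/3, 1.0, 1/2, 2.0],
              [1/2, 2.0, 1.0, 3.0],
              [1/5, 1/2, 1/3, 1.0]])
w, lam_max, CR = ahp_weights(B)
print(w.round(3), round(lam_max, 3), round(CR, 3))   # CR < 0.1 means the matrix is acceptable
# Global weights (cf. Table 3) would then follow as w_criterion[i] * w_solution[j].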

Fig. 4: Global weighting of user demands.

a Expert group and b Novice group.

Table 3 Global weighting ranking of user demands for Expert and Novice groups (partial)

To translate the AHP-quantified user demand weights into executable knowledge-presentation strategies, this study adopts cognitive load theory (CLT) as an a priori design guidance framework to construct the user demand-information density (UD-ID) mapping matrix. This matrix first categorizes the high-weight demands based on CLT, clarifying the type of cognitive load each aims to optimize. It then specifies corresponding information-density strategies based on the load type, which are mapped to NS-XuPOnto. This "demand-theory-strategy-ontology" derivation process ensures both theoretical rigor and practical feasibility, providing a design blueprint and executable rule set for the cognitive-adaptation function of the KGs, while also supplying the basis for attributing effects in Experiment 2.

Following this pathway, we screened the top eight highest-weighted demands for the expert group (C16, C5, C15, C7, C9, C10, C3, C11) and the novice group (C14, C9, C5, C10, C16, C13, C6, C3), respectively. According to CLT, we categorized these high-weight demands into intrinsic, extraneous, and germane cognitive loads. For instance, expert group C15 relates to deep information processing (germane cognitive load), while C16 relates to efficiency (extraneous cognitive load); novice group C14 and C9 relate to reducing interaction complexity and information overload (extraneous cognitive load). Simultaneously, considering the predefined design objectives of the current knowledge graph modeling, we performed a preliminary correlational mapping of these high-weight demands, their corresponding information density strategies, and ontology concepts, constructing the UD-ID mapping matrix (Tables 4 and 5).

Table 4 AHP-based UD-ID mapping matrix for expert groups
Table 5 AHP-based UD-ID mapping matrix for novice groups

Symbolic Layer Construction. In order to transform the AHP-established UD-ID matrix into a computable knowledge structure and realize the key function of user cognitive adaptation in the symbolic layer of the NSCEF-ICHKG framework, this study constructs the NS-XuPOnto ontology model. This ontology aims to achieve the rigid constraints of cultural logic and dynamic response to high-weighted user demands by extending the standard CIDOC CRM vocabulary and embedding semantic rules and properties oriented towards cognitive stratification. The core classes for the ontology were constructed as follows. The Conceptual Reference Model (CRM), developed by the International Committee for Documentation (CIDOC), serves as an ontology model for the cultural heritage domain. It provides a structured series of entities and properties applicable to cultural heritage, offering a universal framework for integrating multi-source, heterogeneous cultural heritage resources. However, the generalized core concept definitions provided by the CIDOC CRM core classes are insufficient for fully capturing the fine-grained process parameters and artisans’ tacit knowledge inherent in the traditional production skills of Xuan Paper (XPS). Therefore, to achieve standardized expression of XPS knowledge and enforce cultural-logical constraints, this study reuses the core classes of the general-purpose CIDOC CRM standard in the ICH domain and extends them with XPS-specific classes and properties (Table 6) based on the high-weight demands identified in the UD-ID matrix. This design yields the NS-XuPOnto ontology, ensuring that the resulting knowledge model satisfies user-oriented demands while maintaining completeness and semantic consistency across the domain.

Table 6 NS-XuPOnto core classes definitions and demand mapping

In this ontology, entities are represented as classes. Defining classes and their hierarchical relationships serves to categorize and organize the concepts within the ontology, with their structure and property design directly serving the UD-ID strategy outlined in Tables 4 and 5. The core classes of the NS-XuPOnto ontology span seven dimensions, all inheriting from CIDOC CRM and extended specifically for the XPS domain. The Processes class covers the sequence of Xuan Paper production steps and quantifies key process parameters, serving expert demands C15, C16, and novice demands C14, C16. The Product class represents tangible end products (e.g., raw Xuan, processed Xuan), acting as the physical carrier linking specific processes to cultural meaning. The Tools class denotes instruments used in specific procedures, enriching both semantic detail and contextual granularity. The Actor class encompasses individuals, groups, and relevant ICH safeguarding institutions, directly responding to user needs for agent-related information (C9, C10). The Place class represents geographical origins and environmental context, supporting enhanced spatial–temporal narrative needs (C5). The Time class indicates the temporal scope of activities or events, enabling sequence recording and efficiency tracking (C8, C16). The Cultural Symbols class represents concrete objects, related customs, and rituals that carry cultural meaning, thereby enhancing the semantic depth of cultural transmission and meeting novices’ cultural cognition (C14, C3). Fig. 5 provides an overview of the top-level class hierarchy of the NS-XuPOnto ontology.

Fig. 5: Overview of the top-level class hierarchy of the NS-XuPOnto ontology.

This screenshot from the Protégé ontology editor presents the foundational taxonomic structure of the NS-XuPOnto core classes, which establishes the primary categorization scheme for the Xuan papermaking domain.
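As a minimal illustration of how such CIDOC CRM extensions can be expressed outside Protégé, the Python sketch below uses rdflib to declare a few NS-XuPOnto classes as subclasses of CRM classes; the xup namespace URI and the specific CRM superclasses are illustrative assumptions rather than the exact mappings used in the ontology:

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import OWL, RDF, RDFS

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
XUP = Namespace("http://example.org/ns-xuponto#")   # hypothetical namespace

g = Graph()
g.bind("crm", CRM)
g.bind("xup", XUP)

# Illustrative subclass axioms: NS-XuPOnto core classes extending CIDOC CRM
axioms = [
    (XUP.Processes, CRM.E7_Activity),           # production steps as activities
    (XUP.Product, CRM["E22_Man-Made_Object"]),  # tangible end products
    (XUP.Actor, CRM.E39_Actor),                 # artisans, groups, institutions
    (XUP.Place, CRM.E53_Place),                 # geographical origin and context
    (XUP.Time, CRM["E52_Time-Span"]),           # temporal scope of activities
]
for sub, sup in axioms:
    g.add((sub, RDF.type, OWL.Class))
    g.add((sub, RDFS.subClassOf, sup))
    g.add((sub, RDFS.label, Literal(str(sub).split("#")[-1])))

print(g.serialize(format="turtle"))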

After defining the class hierarchy, both object properties and data properties were further specified to describe relationships among entities and their quantifiable characteristics. Object properties, also known as relational properties, specify relationships between classes and semantically connect otherwise independent classes or instances; their domain and range are both classes. The properties of the ontology constructed in this study are partly reused from the CIDOC CRM ontology model (e.g., cidoc:P7_took_place_at) and partly newly introduced vocabulary (e.g., xup:ProcessParameter). Table 7 summarizes the definitions of the core object properties and data properties, while Fig. 6 illustrates the top-level class hierarchy and relation schema of the NS-XuPOnto ontology.

Fig. 6: Top-level class hierarchy and relation schema of the NS-XuPOnto ontology.

This visualization, generated within Protégé, illustrates the relationships between classes through object and data properties, forming the semantic network that underpins the XPKG knowledge graph.

Table 7 Definitions of Core NS-XuPOnto object properties and data properties

We further encoded SWRL rules to ensure the logical consistency inherent in craft-based ICH production techniques, which characteristically involve multiple processes with defined temporal and causal structures. Their essence lies in a series of activities that alter the state of input entities, a process characterized by strict temporality and causality. Taking XPS as an example, the craft follows five core processes: Raw Material Preparation → Pulping → Paper Forming (Lao Zhi) → Paper Drying (Shai Zhi) → Paper Cutting, resulting in the physical Xuan Paper product and strictly adhering to an "Input → Action → Output" state-transition logic. Following the core RDF triple logic, the behavioral logic of a production step can be represented as Process(n) → Input → Action(1) → Output → Process(n + 1).

To implement this logic, we constructed a UD-ID → ontology extension → rule-based reasoning pathway, implemented using the Semantic Web Rule Language (SWRL) formal reasoning framework. This framework not only encodes cultural logic but also enables automated verification of logical consistency through standard reasoners (e.g., HermiT), ensuring the rigor and reliability of the knowledge service. First, we established the formal foundation and verification mechanism for the rules. We defined a uniform "Antecedent → Consequent" logical paradigm for SWRL rules: the antecedent defines the sufficient conditions triggering the rule, and the consequent specifies the logical assertion or operation produced upon rule execution. All rules underwent offline consistency checking using the HermiT reasoner within the Protégé environment to ensure no logical conflicts with the ontology axioms. For example, by defining an inconsistency class xup:temporalViolation, the system can automatically identify and flag instances that violate process temporal sequences during inference, thereby technically enforcing the rigidity of cultural logic. Second, we built an executable mapping path from UD-ID to symbolic operations. The cognitive strategies defined in the UD-ID matrix are translated into SWRL-based operations within the symbolic layer via explicit mapping logic. The core mechanism operates as follows: when a user instance ?u initiates a query ?q, the system determines the corresponding information-density strategy from the UD-ID matrix based on the user's group (xup:userGroup) and query content. This selection automatically triggers a predefined set of SWRL rules, which implement the chosen strategy. These rules are then executed by the OWL reasoner (e.g., HermiT) after loading the user context and ontology, thereby ensuring that user demands seamlessly drive the reasoning behavior of the symbolic system.

Meanwhile, we defined four types of SWRL rules in the symbolic layer (Table 8), constructing the core bridge between demand definition and symbolic function transformation. Process temporal-constraint rules enforce the sequential order of processes (e.g., "Paper Forming" must precede "Paper Drying") to ensure the irreversibility of steps, triggering warnings upon violation and thereby safeguarding the rigidity of cultural logic. High-weight demand response rules directly address high-weight demand items in the UD-ID matrix; for example, to respond to C16 (cognitive matching), the system triggers a global view for experts or a summarized view for novices, depending on user type. Information-density adjustment rules dynamically compress the granularity of knowledge based on user group and preset thresholds, effectively preventing cognitive overload. Cognitive-level annotation rules embed user-level markers (e.g., xup:audience_level = "Expert") into ontology properties, enabling pre-filtering and automatic adaptation of knowledge for different user groups.

Table 8 Formalized SWRL rules and UD-ID
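Table 8 gives the formal SWRL encodings. As an informal illustration of what the process temporal-constraint rule enforces, the Python sketch below (an analogue of the rule, not the SWRL itself) checks asserted step orderings against the canonical five-step sequence and flags contradictions in the spirit of xup:temporalViolation:

# Canonical order of the five core XPS processes (cf. Fig. 1)
CANONICAL_ORDER = ["RawMaterialPreparation", "Pulping", "PaperForming",
                   "PaperDrying", "PaperCutting"]
RANK = {step: i for i, step in enumerate(CANONICAL_ORDER)}

def temporal_violations(asserted_precedence):
    """Return (before, after) assertions that contradict the canonical process order."""
    violations = []
    for before, after in asserted_precedence:
        # A step may only precede a step that comes later in the canonical sequence.
        if RANK.get(before, -1) >= RANK.get(after, len(CANONICAL_ORDER)):
            violations.append((before, after))   # analogue of xup:temporalViolation
    return violations

# "Paper Drying" asserted before "Paper Forming" is flagged; the second assertion is valid.
print(temporal_violations([("PaperDrying", "PaperForming"), ("Pulping", "PaperForming")]))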

Neural layer construction. The neural layer achieves the structured mapping of unstructured data (videos, oral recordings) to the NS-XuPOnto ontology through three stages: multi-modal knowledge extraction, entity alignment, and knowledge storage. This allows knowledge organization to align with users' differentiated needs and to achieve cognitively hierarchical adaptation; the technical pathway is illustrated in Fig. 7. We employ large language model (LLM) technology28 to accomplish feature extraction and semantic alignment of both explicit and tacit multi-modal knowledge, ultimately importing the standardized results into Neo4j for cognitively stratified storage.

Fig. 7: Neural layer construction technology path.

The workflow illustrates the process of transforming unstructured multi-modal data into structured knowledge within the NS-XuPOnto ontology. It encompasses three core stages: multi-modal knowledge extraction using LLMs, entity alignment to resolve semantic heterogeneity, and stratified storage in Neo4j to enable cognitive adaptation.

Multimodal semantic extraction serves as the foundation for constructing the knowledge graph. Its core task is to transform the tacit knowledge (artisans' oral experiences) and explicit knowledge (craft-process video actions) of ICH living transmission into structured semantic representations in the form of RDF triples29. Because most craft-based ICH resources exist as video data and oral texts, differentiated extraction strategies are adopted.

For the video category, a video is an unstructured time series formed by sequential two-dimensional image frames, and keyframes capture the essential informational content within a video. By extracting keyframes, we can reduce the indexing volume of video data and avoid processing large amounts of redundant data, thereby improving the efficiency of semantic annotation. Consequently, for video data, we utilize the image encoder of the multimodal contrastive learning model CLIP-ViT30, combined with a frame-difference algorithm (Eq. (6)), to automatically locate action transition points and extract visual semantic vectors (Vv) of keyframes, dynamically adjusting the threshold τ to adapt to cognitive layering; the working path is shown in Fig. 8.

$$\Delta S=\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{\mathbb{I}}\left({\left\Vert \mathrm{CLIP}({f}_{i+1})-\mathrm{CLIP}({f}_{i})\right\Vert }_{2} > \tau \right)$$
(6)
Fig. 8: Flowchart for extracting ICH video keyframes using the frame difference algorithm.

This pipeline utilizes a Frame Difference Algorithm to locate action transition points, employs the CLIP-ViT image encoder to obtain visual semantic vectors (Vv) of keyframes, and dynamically adjusts the threshold (τ) to adapt to user cognitive stratification.

Here, ΔS represents the degree of semantic difference between consecutive frames; n is the sliding window size (default n = 5); fi denotes the ith video frame in the sequence (i is the frame index); \({\mathbb{I}}(\cdot )\) is the indicator function, which takes the value 1 when the condition is true and 0 otherwise; \({\Vert \cdot \Vert }_{2}\) denotes the L2 norm (Euclidean distance) of the vectors; CLIP(·) represents the feature encoder of the CLIP-ViT model; and τ is the user-hierarchical threshold (τ = 0.7 for the novice group, τ = 0.3 for the expert group). The selection of τ is based on Pre-Experiment 1, in which 15 experts and 15 novices reviewed 100 video-clip frames (from 30 min of craft footage) under five threshold settings (τ ∈ {0.1, 0.3, 0.5, 0.7, 0.9}) using the NASA–TLX scale to assess cognitive load. Analysis showed that τ = 0.3 maintained >90% accuracy for experts while reducing effort, and τ = 0.7 improved novices' accuracy by ≈25% by filtering redundant frames. Therefore, τ = 0.7/0.3 was selected for user stratification.

Because ICH production crafts involve multiple complex procedures, pre-trained large models alone cannot fully perform fine-grained extraction. When the CLIP model's confidence score Cclip < 0.8, a manual correction mechanism is triggered, in which three domain experts annotate the keyframes in Protégé. Annotations include action labels, parameter associations, and quality control, with the adaptation relationship between keyframe semantic density and user stratification shown in Table 9. Ultimately, keyframe selection must satisfy the condition in Eq. (7). Here, \(\Delta {F}_{t}\) represents the keyframe selection condition (the frame is selected as a keyframe if the condition is true); \({\Vert {F}_{t}-{F}_{t-1}\Vert }_{2}\) represents the visual difference between the frame at time t and the previous frame at time t−1; Ft and Ft−1 represent the video frame data at time points t and t−1, respectively.

$$\Delta {F}_{t}={||{F}_{t}-{F}_{t-1}||}_{2} > \tau$$
(7)
Table 9 Relationship between user segmentation and key frame semantic density
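A minimal sketch of this keyframe-detection step is given below, assuming the Hugging Face transformers implementation of CLIP-ViT (the openai/clip-vit-base-patch32 checkpoint is an illustrative choice) and a list of pre-decoded PIL frames; the exact checkpoint, preprocessing, and threshold calibration in our pipeline may differ:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch32"   # assumed checkpoint
model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)

@torch.no_grad()
def encode_frames(frames):
    """Return L2-normalized CLIP image embeddings for a list of PIL frames."""
    inputs = processor(images=frames, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def keyframe_indices(frames, tau):
    """Select frames whose embedding distance to the previous frame exceeds tau (Eq. (7))."""
    v = encode_frames(frames)
    dists = (v[1:] - v[:-1]).norm(dim=-1)    # ||CLIP(f_t) - CLIP(f_{t-1})||_2
    return [i + 1 for i, d in enumerate(dists) if d.item() > tau]

# tau = 0.3 yields denser keyframes (expert view); tau = 0.7 yields sparser ones (novice view)
frames = [Image.open(f"frame_{i:04d}.jpg") for i in range(100)]   # hypothetical frame files
print(keyframe_indices(frames, tau=0.7))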

For the oral-text data: craft-based ICH production involves rich empirical knowledge that is conveyed orally and is often difficult to quantify. To structure this information, the workflow in Fig. 9 is adopted. First, speech transcription and text cleaning. Oral recordings are transcribed with ASR technology and then cleaned using predefined rule templates to remove fillers (e.g., "uh", "ah") and standardize terminology (e.g., unifying "捞纸" as "Paper Forming"). Concurrently, we established three protective rules to prevent over-deletion of text content: (1) retain sentences containing craft nouns (e.g., paper curtain, angle); (2) retain sentences containing action verbs (e.g., lift, scoop, dry); (3) retain core cultural terms (e.g., Cai Lun Worship). For example, "Uh, when scooping the paper, ah, the curtain needs to be kept at a 45° angle" becomes "When scooping the paper, the curtain needs to be kept at a 45° angle".
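These cleaning and protection rules can be prototyped with simple pattern matching; in the Python sketch below, the filler list, keyword lists, and terminology map are illustrative stand-ins for the project's actual rule templates:

import re

FILLERS = ["呃", "啊", "嗯", "那个", "uh", "ah"]               # illustrative filler list
TERM_MAP = {"捞纸": "Paper Forming", "晒纸": "Paper Drying"}    # terminology normalization
PROTECTED = ["纸帘", "角度", "捞", "抄", "晒", "蔡伦",            # craft nouns, action verbs,
             "paper curtain", "angle", "scoop", "Cai Lun"]     # and core cultural terms

def keep(sentence):
    """Protective rules: retain sentences containing any protected keyword."""
    return any(k in sentence for k in PROTECTED)

def clean(sentence):
    for filler in FILLERS:                      # rule 1: strip fillers
        sentence = sentence.replace(filler, "")
    for src, std in TERM_MAP.items():           # rule 2: standardize terminology
        sentence = sentence.replace(src, std)
    sentence = re.sub(r"[，,]{2,}", "，", sentence)   # collapse commas left by removed fillers
    return sentence.strip("，, ")

raw = "呃，捞纸的时候，啊，纸帘要保持45度角"
if keep(raw):
    print(clean(raw))   # -> "Paper Forming的时候，纸帘要保持45度角"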

Fig. 9: Flowchart for processing oral-text data using BERT.

The workflow begins with automatic speech recognition (ASR) and rule-based text cleaning, followed by parallel knowledge extraction and semantic optimization for subsequent ontology linking and entity alignment.

Second, semantic encoding and vectorization. Cleaned text is input into the BERT model31 to extract token- and sentence-level embeddings. Subsequently, knowledge extraction and semantic optimization proceed in parallel. (1) Knowledge extraction and RDF triple generation: a named-entity-recognition (NER) model with the BIO tagging scheme (entity types: process step, tool, material, parameter), trained with a cross-entropy loss, extracts entities from the text; a relation-extraction (RE) model identifies relations (e.g., usesTool, hasSpecificParameter), outputting candidate RDF triples (head entity—relation—tail entity). (2) Semantic optimization and vector representation: sentence embeddings are whitened with BERT-Whitening32 to enhance isotropy and remove redundancy, yielding high-quality sentence vectors Vt. Finally, the candidate RDF triples feed into ontology linking, while Vt supports entity alignment.
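The sentence-vector branch of this pipeline can be sketched as follows, assuming the Hugging Face transformers BERT API (bert-base-chinese as an illustrative checkpoint for the Chinese oral corpus) and the covariance-based whitening transform of BERT-Whitening; the NER and RE models are omitted here:

import numpy as np
import torch
from transformers import BertModel, BertTokenizer

MODEL_ID = "bert-base-chinese"   # assumed checkpoint
tokenizer = BertTokenizer.from_pretrained(MODEL_ID)
bert = BertModel.from_pretrained(MODEL_ID).eval()

@torch.no_grad()
def sentence_embeddings(sentences):
    """Mean-pooled BERT embeddings, one row per sentence."""
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = bert(**enc).last_hidden_state                # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)             # ignore padding tokens
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

def whitening(vectors, eps=1e-8):
    """BERT-Whitening: center the vectors, then decorrelate via SVD of their covariance."""
    mu = vectors.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(np.cov((vectors - mu).T))
    W = u @ np.diag(1.0 / np.sqrt(s + eps))
    return (vectors - mu) @ W                              # isotropic sentence vectors Vt

sentences = ["捞纸时纸帘需保持45度角", "晒纸需要把控焙面温度"]
Vt = whitening(sentence_embeddings(sentences))
print(Vt.shape)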

Entity alignment is critical for resolving cross-modal knowledge heterogeneity, ensuring that semantically diverse expressions of the same craft entity—such as a video frame labeled "Paper Forming Action," an artisan's oral description "the third step of forming," and the ontological class xup:LaoZhi_Process—are unified under a consistent identifier. The alignment proceeds in four steps. The first step unifies heterogeneous entity features. Given the multi-representational nature of ICH knowledge, we adopt a hierarchical feature-extraction strategy to handle the semantic heterogeneity among video, oral text, and the domain ontology. As shown in Fig. 10, for video keyframes, the visual semantic vectors (Vv) are extracted directly from the image encoder of the CLIP-ViT model; for example, the keyframe of the paper-forming process (frame_1832): Vv = [0.23, -0.76, …, 0.58]. For oral text, the sentence vectors (Vt) generated by the BERT model are used; for instance, the oral segment "insert the paper curtain at a 45° angle during forming": Vt = [-0.34, 0.81, …, 0.02]. For the domain ontology, the text encoder of the CLIP-ViT model encodes the class labels (e.g., rdfs:label of xup:LaoZhi_Process) to generate standard vector representations (Vont). Simultaneously, compliance with the process constraints defined in NS-XuPOnto is calculated, outputting a rule-confidence scalar R (value range [0,1]).

Fig. 10: Schematic diagram of the CLIP-ViT model dual encoders.

This schematic illustrates the procedure for harmonizing the features from video, oral text, and the domain ontology.

Based on the above features, we use Eq. (8) to calculate the comprehensive confidence score \({{\boldsymbol{C}}}_{{\boldsymbol{fusion}}}\) for multi-modal entity alignment.

$${C}_{\mathrm{fusion}}=\alpha \cdot \text{sim}({{\boldsymbol{V}}}_{{\bf{v}}},{{\boldsymbol{V}}}_{{\bf{ont}}})+\beta \cdot \text{sim}({{\boldsymbol{V}}}_{{\bf{t}}},{{\boldsymbol{V}}}_{{\bf{ont}}})+\gamma {\cdot }{\bf{R}}$$
(8)

Here, α, β, γ represent the weight coefficients for the video, text, and rule modalities, respectively (default values 0.4, 0.4, 0.2). Vv is the visual semantic vector of the video keyframe, Vt is the textual semantic vector of the oral text, Vont is the standard vector of the ontology concept, sim(·) is the cosine similarity function (range [0,1]), and R is the rule confidence scalar. The weights are dynamically adjusted, with the expert group γ = 0.4 to strengthen rule constraints, and the novice group γ = 0.1 to reduce complexity.
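Equation (8) reduces to a weighted sum of two cosine similarities and the rule score. The short Python sketch below uses toy vectors in place of the real CLIP/BERT embeddings and clips cosine similarity to [0, 1], matching the stated range of sim(·):

import numpy as np

def sim(a, b):
    """Cosine similarity clipped to [0, 1]."""
    return float(np.clip(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)), 0.0, 1.0))

def fusion_confidence(v_video, v_text, v_ont, rule_score, alpha=0.4, beta=0.4, gamma=0.2):
    """Eq. (8): C_fusion = alpha*sim(Vv, Vont) + beta*sim(Vt, Vont) + gamma*R."""
    return alpha * sim(v_video, v_ont) + beta * sim(v_text, v_ont) + gamma * rule_score

# Toy 4-dimensional vectors (real embeddings are CLIP/BERT sized)
v_video = np.array([0.23, -0.76, 0.11, 0.58])
v_text = np.array([-0.34, 0.81, 0.40, 0.02])
v_ont = np.array([0.10, -0.50, 0.30, 0.60])

# gamma = 0.4 strengthens the rule term for experts; gamma = 0.1 relaxes it for novices.
# (How alpha and beta are rebalanced in those cases is not specified here.)
print(round(fusion_confidence(v_video, v_text, v_ont, rule_score=0.9, gamma=0.4), 3))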

The second step focuses on multi-level similarity calculation and filtering. Different regions may use different terms for the same process; for example, "抄造" refers to the paper-forming process in Jing County dialect, while it means "晒纸" (paper drying) in Shexian dialect. We therefore constructed a cognitively sensitive three-level filtering mechanism that filters at the name, feature, and rule levels (Table 10) to address the terminological polysemy specific to the ICH domain. The feature-level similarity thresholds (expert > 0.85, novice > 0.65), weight coefficients (expert: 0.3/0.5/0.2; novice: 0.2/0.6/0.2), and the preset confidence threshold (0.75) were all calibrated through Pre-experiment 2, a joint optimization experiment conducted on a validation set containing 200 high-quality manual annotations. The chosen parameter combination represents the optimal balance between ensuring high precision and maximizing recall.

Table 10 A three-layer filtering mechanism for multi-level similarity calculation

A representative example (ID: EA_1087) of aligning the "Paper Forming" process illustrates the mechanism: at the name layer, the similarity between the candidate entity name "抄造" and the ontology name xup:LaoZhi_Process is 0.92. At the feature layer, the multimodal cosine similarity between the video action and the oral text is 0.89, meeting the expert-group requirement. At the rule layer, the candidate entity's attribute "bamboo curtain inserted at 45°" is verified to comply with the ontology-defined xup:angleRequirement rule. According to \(C=0.3{S}_{{\rm{name}}}+0.5{C}_{{\rm{fusion}}}+0.2{S}_{{\rm{rule}}}\), the automatic decision confidence C = 0.82 exceeds the preset threshold of 0.75, triggering automatic alignment and binding to xup:LaoZhi_Process. Here, Sname and Srule represent the scores calculated at the name and rule levels, respectively, and \({C}_{{\rm{fusion}}}\) serves as the feature-level score; the weight coefficients and preset threshold are those calibrated in Pre-experiment 2.

In the third step, targeting alignment decision and conflict resolution, we established a three-tier decision mechanism based on the automatic alignment confidence score (C). Specifically, when C ≥ 0.75, the system automatically binds a global ID; when 0.70 ≤ C < 0.75, a human–AI collaborative decision-making mechanism is triggered; and when C < 0.70, the entity is recorded as unaligned. For clarity, we define the collaboration threshold as θlow = 0.70 and the automatic binding threshold as θhigh = 0.75. When θlow ≤ C < θhigh, the human–AI collaborative decision-making mechanism is activated. First, the Claude 3 LLM is invoked using a standardized prompt template (see Supplementary Information) to perform cultural tracing on terminological conflicts, generating a term-tracing report. This prompt casts the LLM in the role of a Xuan Paper ICH domain expert, constrains it to perform structured reasoning strictly according to a defined analytical framework (craft process, tool usage, action description), and requires it to output a binary "Yes/No" conclusion to ensure the repeatability of the analysis process and the comparability of results. Second, a review panel composed of three domain experts, unaware of the LLM's output, reviews and adjudicates the LLM's tracing report based on predefined adjudication criteria. These criteria include three equally weighted items: historical literature support, regional consensus degree, and craft-feature conformity. To assess the reliability of the inter-expert adjudication, we calculated Fleiss' kappa for the expert panel (κ = 0.78), indicating high inter-expert agreement. For cases lacking consensus, agreement is reached through panel discussion. The adjudicated results update NS-XuPOnto via owl:sameAs (equivalence) or skos:closeMatch (partial similarity) relationships, annotated with xup:sourceExpert and xup:decisionConfidence for transparency and traceability. Entities with C < θlow are considered unaligned cases and are archived for incremental updates and re-evaluation in subsequent ontology versions.
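The thresholds and weighted score described above can be summarized in a short sketch; the 0.3/0.5/0.2 weights follow the expert-group configuration, the function names are illustrative, and the rule score in the worked call is an assumed value chosen only to show the arithmetic:

THETA_LOW, THETA_HIGH = 0.70, 0.75      # collaboration and automatic-binding thresholds

def alignment_confidence(s_name, c_fusion, s_rule, weights=(0.3, 0.5, 0.2)):
    """C = w1*S_name + w2*C_fusion + w3*S_rule (expert-group weights from Pre-experiment 2)."""
    return weights[0] * s_name + weights[1] * c_fusion + weights[2] * s_rule

def alignment_decision(c):
    """Three-tier decision: auto-bind, human-AI collaborative review, or unaligned."""
    if c >= THETA_HIGH:
        return "auto_bind_global_id"
    if c >= THETA_LOW:
        return "human_ai_collaborative_review"   # LLM tracing report + expert panel
    return "unaligned_archive"                   # re-evaluated in later ontology versions

# Example EA_1087: S_name = 0.92, C_fusion = 0.89; an assumed rule score of 0.5
# reproduces C ≈ 0.82, which exceeds 0.75 and triggers automatic binding.
c = alignment_confidence(0.92, 0.89, 0.5)
print(round(c, 2), alignment_decision(c))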

The fourth and final step performs global ID binding and knowledge-granularity adaptation. Based on the final alignment decision, a unified global identifier (Global ID) is generated for cross-media entities, and semantic equivalence between video frames, oral texts, and ontology concepts is established using OWL to solve the semantic-heterogeneity problem. Within the Neo4j graph database, we implement a property-label-based cognitive-filtering strategy that realizes the UD-ID matrix. (1) Inject cognitive labels: we add custom property labels (e.g., audience_level: ["expert", "novice"] or audience_level: ["novice"]) to nodes and relationships, marking which user group(s) the entity content should be presented to and achieving logical knowledge stratification. (2) Generate dynamic views: we use the WHERE clause in Cypher queries to filter and present KG subgraphs adapted to different users' cognitive levels (e.g., WHERE "expert" IN n.audience_level). (3) Bind knowledge granularity: based on the cognitive labels, different granularities of properties are bound to the same entity. For example, for the "Paper Forming" node, the expert view additionally associates fine-grained properties such as xup:angleRequirement and xup:timeWindow, whereas the novice view associates cultural-experience properties such as xup:hasCulturalSymbol and xup:hasARExperience. Examples are shown in Table 11.

Table 11 Example of global ID binding and knowledge granularity hierarchy

After converting the various ICH data and knowledge into Linked-Data form, they must be stored and published, as this directly affects data sharing and reuse. Among existing graph databases, Neo4j is an open-source, robust, scalable, high-performance graph database; its high performance, lightweight nature, strong versatility, and scalability have made it widely used for constructing domain KGs. On this basis, to achieve the user cognitive-stratification adaptation defined in the UD-ID matrix, we employ Neo4j to implement a logically layered storage strategy based on property labels. The core of this strategy is to generate views dynamically by attaching cognitive-level identifiers (audience_level) to entities, avoiding the data redundancy and maintenance complexity associated with physically isolated storage schemes.

First, cognitive grading based on property marking. We import all RDF data into a unified Neo4j graph database. Each node includes an audience_level property ("expert" or "novice"), assigned during the preprocessing phase based on the outputs of the UD-ID matrix and SWRL rules. For example, a Paper Forming Angle parameter node serving C16 is labeled ["expert"], whereas a Cai Lun Worship cultural node shared by all users is labeled ["expert", "novice"]. This method achieves logical layering through property labels, requiring no node duplication or physical isolation and thus avoiding data redundancy. Second, a hybrid reasoning architecture ensuring cultural logic. Since Neo4j does not natively support SWRL rules, this study adopts a hybrid architecture that separates the reasoning layer from the storage layer. Before data import, all SWRL-based cultural-logic constraints and cognitive annotations are computed and verified at the RDF level by a dedicated reasoner (e.g., Apache Jena), and Neo4j stores only the final, deterministically reasoned results. This architecture ensures logical rigor while leveraging Neo4j's query-performance advantages. Furthermore, for rule conflicts or violations discovered during reasoning, the system marks the xup:ruleViolation property and triggers a manual review process. Finally, query-layer interface reservation. The storage design above provides a solid foundation for the query layer to offer differentiated knowledge services to different user groups.
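As an illustration of this property-label strategy, the sketch below uses the official neo4j Python driver (connection details and node labels are hypothetical) to write nodes with an audience_level list property and to retrieve a group-specific view; the production system issues equivalent Cypher through its predefined templates:

from neo4j import GraphDatabase

URI, AUTH = "bolt://localhost:7687", ("neo4j", "password")   # hypothetical connection

CREATE_NODE = """
MERGE (p:Process {name: $name})
SET p.audience_level = $levels
"""

GROUP_VIEW = """
MATCH (p:Process)
WHERE $group IN p.audience_level
RETURN p.name AS process
"""

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    with driver.session() as session:
        # Cognitive grading: a fine-grained parameter node visible to experts only,
        # and a shared cultural node visible to both groups.
        session.run(CREATE_NODE, name="Paper Forming Angle", levels=["expert"])
        session.run(CREATE_NODE, name="Cai Lun Worship", levels=["expert", "novice"])
        # Dynamic view generation: only nodes labeled for the requested group are returned.
        novice_view = [r["process"] for r in session.run(GROUP_VIEW, group="novice")]
        print(novice_view)   # expected: ['Cai Lun Worship']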

Neural-Symbolic Collaborative Knowledge Service Generation. Based on the NS-XuPOnto ontology and the UD-ID matrix, this subsection implements a knowledge service generation system. The core workflow comprises view generation and interface adaptation. For view generation, we implemented cognitively hierarchical views, which are key to interacting with users of different cognitive backgrounds and serve as a core mechanism for bridging the cognitive gap in ICH knowledge transmission. We implement cognitively-graded queries through predefined Cypher query templates, enabling the XPKG system to automatically generate differentiated knowledge views based on user identity (expert/novice). This resolves the challenge of balancing procedural accuracy and cultural accessibility in ICH preservation. The core logic involves constructing a query skeleton within the Symbol Layer based on the process logic defined in the NS-XuPOnto ontology, while the Neural Layer appends a WHERE clause to the Cypher query to filter nodes by their audience_level property. This enables content projection tailored to demand, ensuring the presented information density strictly aligns with the user’s cognitive load threshold.

The following template constructs a comprehensive process chain and technical parameter network for the expert group, serving demands C16 and C15:

// EXPERT VIEW: Retrieve technical parameters

MATCH (n:Process)-[r:hasParameter]->(p:Parameter)

WHERE 'expert' IN n.audience_level

RETURN n.name AS process, r.type AS parameterType, p.value AS parameterValue

ORDER BY p.criticality DESC

LIMIT 10;

The core query template for the novice group provides an aggregated view focused on cultural symbols and core narratives, serving demands C5 and C6:

// NOVICE VIEW: Retrieve cultural symbols

MATCH (n:Process)-[r:hasCulturalSymbol]->(c:Symbol)

WHERE 'novice' IN n.audience_level OR 'all' IN n.audience_level

RETURN n.name AS process, c.name AS symbol, c.description AS culturalStory

ORDER BY c.culturalSignificance DESC

LIMIT 5;

To transform the aforementioned knowledge subgraphs into an intuitive user experience, we developed an adaptive interface engine using the Flourish visualization platform. Following the design principles of the UD-ID matrix (Tables 4 and 5), this engine achieves cognitive adaptation along three dimensions: physical interaction, content presentation, and load control, as shown in Fig. 11.

Fig. 11: Cognitive adaptation workflow: from user identity to personalized knowledge presentation.

This diagram illustrates the end-to-end operation of the system, which integrates Front-end Control Layer, Back-end Service Layer, and Cognitive-load Control Layer to dynamically generate a personalized knowledge experience for distinct user groups.

First, at the user interaction layer (front-end control layer), a dynamic identity-switching mechanism is established: an explicit "Expert/Novice" toggle button is implemented in the front-end interface. The user's selection triggers calls to backend services, binding to predefined Cypher templates to initiate subgraph queries against the Neo4j database. Second, at the content presentation layer (back-end service layer), a multimodal hotspot interaction-mapping mechanism is implemented. The backend service processes the subgraph returned by Neo4j and drives Flourish to generate differentiated visualizations based on structural characteristics (e.g., parameter relationships, cultural-symbol aggregations). For experts, a parameter-driven interface is presented with historical data feedback to support process-oriented research; for novices, a story-driven interface is provided with multimodal interactions to facilitate cultural dissemination and immersive experiences.

Finally, at the cognitive-load control layer, we implement dynamic complexity management as a heuristic safeguard derived from Pre-Experiment 1, which revealed a positive correlation between subgraph size and NASA–TLX scores. We cap query results (expert ≤ 10, novice ≤ 5) using Cypher’s LIMIT clause, while ORDER BY prioritizes key content. To enhance personalization, an Information-Density Adjustment slider allows users to fine-tune displayed content, integrating system-level heuristics with user-led control for flexible cognitive management.

Data sources and empirical objects. To systematically validate the effectiveness of the proposed NSCEF-ICHKG framework in ICH knowledge organization and cognitive adaptation, we constructed the multi-modal ICH dataset for the traditional production skills of Xuan Paper (XPDataset) and built the XPKG knowledge graph upon it. The dataset encompasses both raw multi-modal resources and a manually annotated named-entity recognition sub-dataset (XP-NER) sampled from its textual corpus. All data are centered on Xuan papermaking, ensuring domain focus and depth.

Textual data for the XPDataset were primarily collected from authoritative platforms such as the China Intangible Cultural Heritage website (http://www.ihchina.cn/project.html), using Python crawlers to acquire relevant articles and images. Both video and oral data were obtained from field surveys conducted by the research team in Jing County, Anhui Province, and from documentary recordings of the production process. The temporal range of the collected data spans from January 2015 to June 2024. The collected resources were processed using differentiated strategies. For video data, the frame-difference algorithm was used to automatically locate keyframes at action transition points, and the CLIP-ViT model was used for frame semantic encoding, thereby automatically associating keyframe semantic density with user cognitive adaptation. Fig. 12 shows the frame-difference distribution for a video segment of the "Paper Forming" process in the Xuan Paper production workflow, and Fig. 13 displays 24 representative keyframes extracted from this 9000-frame video. Due to the presence of transitions, shifts, and other image artifacts, a manual correction mechanism was triggered when Cclip < 0.8, involving expert review and manual annotation. The final keyframe-extraction results for selected videos are shown in Table 12. For oral data, ASR technology was used for transcription into text, followed by the BERT model and the BERT-Whitening technique to output structured, graded knowledge representations. Through the above work, we acquired 3000 keyframe images, ~3.5 h of process and interview videos, 10 h of artisan oral records, and 300 texts, completing the digital resource collection for XPS and its related customs and forming the XPDataset for the Xuan Paper domain. Its data composition is shown in Table 13; the collection covers four dimensions (spatio-temporal context, artisans, processes, and media), ensuring representativeness and traceability.

Fig. 12: Video “Xuan Paper process of paper forming”.

a Inter-frame difference distribution. b Smoothed inter-frame difference distribution.

Fig. 13: Extraction of key frames from the video of the “paper forming” process.

This figure displays a set of 24 representative keyframes sampled from the video.

Table 12 Keyframe extraction results of video resources related to “XPS” (part)
Table 13 XPDataset multi-modal dataset statistics

Furthermore, for the entity-recognition performance evaluation in Experiment 1, we randomly sampled 1350 sentence segments from the XPDataset text corpus (300 texts and 10 h of transcribed oral recordings, totaling approximately 5200 sentences) to construct the XP-NER dataset (Table 14). This dataset was manually annotated by three trained experts according to the entity types defined by the NS-XuPOnto ontology, using the Begin–Inside–Outside (BIO) tagging scheme. The annotation schema includes four types: Process Steps, Making Tools, Material Formulas, and Quality Indicators. Detailed annotation guidelines were established to handle ambiguous cases. We calculated Fleiss' kappa (κ = 0.81) to evaluate inter-annotator agreement, indicating high consistency. The dataset was split into training (945 instances), validation (270 instances), and test (135 instances) sets in a 7:2:1 ratio. All data-collection procedures in this study were approved by the Institutional Review Board; all participants signed informed consent forms, and the data were used solely for this research and academic purposes.
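For reference, the BIO scheme marks the beginning (B-), inside (I-), and outside (O) of entity spans. The toy Python snippet below shows how a sentence might be labeled against the XP-NER entity types; the tokenization and labels are illustrative and are not drawn from the actual annotations:

# Illustrative BIO-tagged sentence for the XP-NER schema
tokens = ["Paper", "Forming", "uses", "a", "bamboo", "curtain", "to", "scoop", "the", "pulp"]
labels = ["B-ProcessStep", "I-ProcessStep", "O", "O",
          "B-MakingTool", "I-MakingTool", "O", "O", "O", "B-MaterialFormula"]

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")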

Table 14 XP-NERDataset

We take XPS as the empirical case, add instances to the NS-XuPOnto ontology model, and construct the Xuan Paper Knowledge Graph (XPKG). In the symbolic layer, the NS-XuPOnto ontology was built by extending CIDOC CRM, fully encoding the five core process chains of Xuan Paper production ("Raw Material Processing → Pulping → Paper Forming → Paper Drying → Paper Cutting"). This involved extending 13 core domain classes and 13 ICH-specific properties (7 object properties, 6 data properties). Furthermore, cultural logics such as temporal constraints and regional characteristics were translated into SWRL rules embedded in the symbolic layer. In the neural layer, a differentiated strategy was employed to uniformly map video data, oral texts, and ontological symbols to standardized process nodes after entity alignment. Instances were then created and their corresponding properties added in the Protégé software, transforming abstract symbols into an interactive digital representation of cultural heritage. Finally, the ontological structure data were converted into RDF format and stored with cognitive stratification in the Neo4j graph database, resulting in the XPKG (Fig. 14). Fig. 15 shows the XPKG visualization interface, and Table 15 provides the core statistical indicators for XPKG, which are standard metrics for assessing knowledge-graph completeness and complexity.

Fig. 14: Visualization of the XPKG knowledge graph in Neo4j.

a Overview of entities and their relationships. b Node-attribute view of a specific entity.

Fig. 15: User flow and representative interfaces of the XPKG visualization system.

a Initial landing page. b Group selection interface. c Novice representative view (story-driven visualization). d Expert representative view (parameter-driven visualization).

Table 15 XPKG knowledge graph statistical indicators

Results

To systematically validate the effectiveness of the proposed NSCEF-ICHKG framework, we designed and conducted two sets of experiments: Experiment 1 evaluated the entity-retrieval performance of the knowledge graph at the entity level, while Experiment 2 assessed the system’s cognitive-adaptation effects at the user level. All experiments were conducted using the XPDataset, the XP-NER dataset, and XPKG introduced above, ensuring consistency in the experimental subjects and data sources.

Experiment 1: Knowledge graph entity-retrieval performance comparison: To verify the effectiveness of the XPKG, built upon the NSCEF-ICHKG framework, in its core knowledge-service function, we conducted a comparative experiment on entity-retrieval performance. This experiment aimed to evaluate the performance of KG systems constructed using different methods on the entity retrieval and linking task, assessing each system’s accuracy and completeness in responding to queries and returning relevant entities. To ensure fairness and reproducibility of the comparison, all compared systems were built from the same source data (XPDataset), and the 135-sentence test split of the previously constructed XP-NER dataset, which contains 1350 manually annotated sentences covering four entity types (Process Steps, Making Tools, Material Formulas, and Quality Indicators), was used as the gold standard.

The experimental group (XPKG) was built using the NSCEF-ICHKG framework, in which the neural layer extracts entities from the source data and the symbolic layer provides semantic constraints and normalization guidance via the NS-XuPOnto ontology. Control Group 1 (CIDOC-KG) was a baseline KG built using a purely symbolic-driven approach, employing only the standard CIDOC CRM ontology for entity extraction and linking, without the semantic extraction and ontology-adaptation mechanisms of the neural layer. Control Group 2 (BERT-KG) was a baseline KG built using a purely neural-driven approach, utilizing only an off-the-shelf, non-fine-tuned BERT model for open entity extraction and employing the same cosine-similarity algorithm and threshold as the experimental group for entity linking, without the ontological constraints and logical reasoning of the symbolic layer. All systems used an identical hardware configuration (MacBook Pro, Intel Core i7, macOS Sequoia 15.4.1), software environment (Neo4j Cypher), and query-processing interface to control for system-performance and query-process variables, ensuring procedural consistency. We employed three standard metrics, Precision (P), Recall (R), and F1 score, to quantitatively evaluate the entity-retrieval performance of the KGs.
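The shared entity-linking step can be sketched as follows: a mention is mean-pooled through BERT and linked to the ontology label with the highest cosine similarity, subject to a threshold. The bert-base-uncased checkpoint, the pooling strategy, the 0.85 threshold, and the example labels are illustrative assumptions; the compared systems used the single threshold fixed in the experimental setup.

```python
# Sketch: cosine-similarity entity linking between a mention and ontology labels.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text):
    """Mean-pooled BERT embedding for a short phrase."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state       # (1, seq_len, 768)
    return hidden.mean(dim=1)                               # (1, 768)

def link_entity(mention, ontology_labels, threshold=0.85):
    """Link a mention to the closest ontology label; return None if below the threshold."""
    m = embed(mention)
    sims = {label: torch.nn.functional.cosine_similarity(m, embed(label)).item()
            for label in ontology_labels}
    best = max(sims, key=sims.get)
    return (best if sims[best] >= threshold else None), sims[best]

labels = ["Paper Forming", "Paper Drying", "Pulping"]        # example ontology node labels
print(link_entity("scooping the pulp with a bamboo screen", labels))
```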

Precision (P) refers to the proportion of correct results among all results returned by the system (Eq. (9)), where TP represents True Positives and FP represents False Positives:

$${\rm{Precision}}=\frac{{TP}}{{TP}+{FP}}$$
(9)

Recall (R) refers to the proportion of all correct results that were successfully returned (Eq. (10)), where FN represents False Negatives:

$${\rm{Recall}}=\frac{{TP}}{{TP}+{FN}}$$
(10)

The F1 score is the harmonic mean of Precision and Recall (Eq. (11)) and is used for a comprehensive evaluation of system performance.

$${F}_{1}=2\times \frac{P\times R}{P+R}$$
(11)

In addition, paired-sample t-tests (α = 0.05) were used to evaluate the statistical significance of the differences between XPKG and the two baseline systems across all metrics.
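A compact sketch of how Eqs. (9)–(11) and the paired-sample t-test can be computed per test sentence is given below; the per-sentence counts are placeholders, not experimental data.

```python
# Sketch: per-sentence precision/recall/F1 (Eqs. 9-11) and a paired-sample t-test.
from scipy.stats import ttest_rel

def prf1(tp, fp, fn):
    """Precision, recall, and F1 from per-sentence counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# (tp, fp, fn) per test sentence for two systems -- placeholder values.
system_a = [(3, 0, 1), (2, 1, 0), (4, 1, 1)]
system_b = [(2, 1, 2), (1, 2, 1), (3, 2, 2)]

f1_a = [prf1(*counts)[2] for counts in system_a]
f1_b = [prf1(*counts)[2] for counts in system_b]
t_stat, p_value = ttest_rel(f1_a, f1_b)        # paired-sample t-test, alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```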

After inputting the test-set samples into the three systems, we obtained the performance comparison shown in Fig. 16. The experiments demonstrate that the constructed XPKG outperformed the two baseline systems in Precision (84.2%), Recall (85.1%), and F1-score (84.6%). Specifically, XPKG’s Precision was comparable to that of CIDOC-KG (83.3%), while its Recall was significantly higher than that of CIDOC-KG (52.4%), indicating that the proposed NSCEF-ICHKG framework greatly improves knowledge-coverage breadth while maintaining high accuracy. Compared to BERT-KG, XPKG achieved superior results on all metrics, demonstrating the symbolic layer’s role in enhancing neural-output accuracy through ontological constraints. These results confirm the robust performance and validity of the NSCEF-ICHKG framework in building high-quality, semantically consistent knowledge graphs.

Fig. 16: Comparison of entity retrieval performance across knowledge graph systems.

This bar chart compares the performance of three knowledge graph systems—XPKG, CIDOC-KG, and BERT-KG—on the entity retrieval task, evaluated using Precision (P), Recall (R), and F1-score. Error bars represent the mean ± 1 SD (N = 135). Paired-sample t-tests indicate that the differences in all key performance metrics between groups are statistically significant (p < 0.01).

Experiment 2: Cognitive adaptation effect evaluation: To evaluate the NSCEF-ICHKG framework’s effectiveness at the user-cognitive level, we conducted a within-subjects controlled experiment comparing task performance and subjective cognitive load when users (experts/novices) used the XPKG system versus a traditional system (CIDOC-KG) to complete identical cognitive tasks.

We recruited 30 participants. The expert group (n = 15) consisted of professionals with over five years of experience in Xuan Paper research or preservation and prior experience with KGs. The novice group (n = 15) consisted of ordinary users without relevant domain knowledge or prior KG exposure. Each participant used both the XPKG system and the CIDOC-KG system under identical interface and query conditions, ensuring that any variations in performance and load stemmed solely from the cognitive-adaptation capabilities of the underlying knowledge graphs. The experimental tasks were divided into three cognitive levels: perceptual, comprehension, and decision-making. Each level contained two parallel questions, balanced for difficulty and semantic complexity based on Pre-experiment 1, to ensure task equivalence. For example, the perceptual level involved identifying basic process elements (e.g., “Identify the tool used for ‘Paper Forming’”); the comprehension level involved understanding process relationships (e.g., “Explain the ‘Paper Drying’ step”); and the decision-making level involved making simple judgments (e.g., “Which type of Xuan paper should be prepared for the Cai Lun worship ritual?”).

Task accuracy and single-task completion time were recorded to measure task efficiency, while cognitive load was assessed using the NASA-TLX multidimensional scale33. Before testing, all participants received 15 min of training on operating both systems, along with two practice tasks, to ensure they could use the systems proficiently. In the formal session, participants independently completed 6 tasks (3 types × 2 questions). To control for order and fatigue effects, the sequence of systems and tasks was fully randomized for each participant, and task accuracy and completion time were recorded during task execution. After completing the tasks for one system, participants immediately filled out the NASA-TLX scale, rating six dimensions: Mental Demand (MD), Temporal Demand (TD), Effort (EF), Frustration (FR), Physical Demand (PD), and Performance (PE). Subsequently, we followed the standard procedure to assign weights through pairwise comparison of the dimensions and calculated the weighted total score using Eq. (12) (a lower score indicates lower cognitive load), where Ratingi (0–100) is the dimension score and Weighti (0–5) is the dimension weight. Paired-sample t-tests (α = 0.05) were then used to statistically analyze task accuracy, completion time, and the NASA-TLX weighted scores, producing the results shown in Fig. 17.

$${\rm{TLX}}_{{\rm{weighted}}}=\frac{{\sum }_{i=1}^{6}{{\rm{Rating}}}_{i}\times {{\rm{Weight}}}_{i}}{15}$$
(12)
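A small sketch of Eq. (12) is given below; the ratings and pairwise-comparison weights are placeholder values, not participant data.

```python
# Sketch: weighted NASA-TLX total score per Eq. (12).
DIMENSIONS = ["MD", "TD", "EF", "FR", "PD", "PE"]

def tlx_weighted(ratings, weights):
    """Weighted NASA-TLX score; lower scores indicate lower cognitive load."""
    assert set(ratings) == set(weights) == set(DIMENSIONS)
    assert sum(weights.values()) == 15          # weights come from 15 pairwise comparisons
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / 15

ratings = {"MD": 40, "TD": 35, "EF": 45, "FR": 20, "PD": 10, "PE": 30}   # placeholder (0-100)
weights = {"MD": 5, "TD": 3, "EF": 4, "FR": 1, "PD": 0, "PE": 2}          # placeholder (0-5)
print(round(tlx_weighted(ratings, weights), 1))   # 37.7
```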
Fig. 17: Comparison of task performance and cognitive load data between XPKG and traditional CIDOC-KG systems.

a Task accuracy rates for both groups. b NASA-TLX weighted total scores for both groups (lower scores indicate lighter cognitive load). Results indicate that both expert and novice users achieved higher accuracy and lower perceived cognitive load with XPKG, confirming its effectiveness in enhancing task performance and reducing cognitive burden.

As shown in Fig. 17, compared to the traditional CIDOC-KG system, both the expert and novice groups achieved higher task accuracy when using XPKG (Experts: 91.5% vs. 76.3%; Novices: 86.7% vs. 68.2%), demonstrating that the XPKG system effectively lowers the barrier to accessing intangible-cultural-heritage knowledge. In terms of cognitive load, XPKG’s weighted NASA-TLX total scores were lower than those of the traditional system (Experts: 41.1 vs. 53.9; Novices: 36.3 vs. 66.8), indicating reduced perceived workload. Notably, the novice group exhibited a substantially greater reduction in perceived cognitive load than the expert group. Paired t-test results confirmed that all differences were statistically significant (p < 0.001). These findings demonstrate that XPKG, built upon the NSCEF-ICHKG framework, effectively enhances task performance while reducing cognitive load, particularly benefiting non-expert users, thereby validating its cognitive-layering and adaptation mechanism.

Discussion

The core challenge in digital heritage preservation lies in overcoming the dual bottlenecks of knowledge fragmentation and user-cognitive imbalance, thereby facilitating the transition of ICH knowledge from static storage to active understanding. This study takes XPS as an empirical object and integrates knowledge graphs, Neuro-Symbolic AI, and Cognitive Load Theory to propose the user-demand-driven NSCEF-ICHKG. Experimental results demonstrate that the XPKG constructed on this framework not only outperformed traditional baselines in objective metrics such as entity-retrieval precision (84.2%) and recall (85.1%) (p < 0.01) but also, through the UD-ID matrix, achieved graded knowledge presentation that effectively reduced user cognitive load, especially for novice users. This confirms the framework’s efficacy in enhancing knowledge accessibility and comprehension efficiency.

Unlike the purely symbolic CIDOC-KG baseline, which lacks adaptive mechanisms, our framework leverages the neural components (CLIP-ViT, BERT) to extract tacit knowledge from unstructured data (videos, oral records), thereby overcoming the recall limitations of purely symbolic methods. Conversely, compared with the purely neural BERT-KG baseline, the symbolic components (NS-XuPOnto, SWRL rules) maintain semantic precision and cultural-logical consistency, mitigating the “black-box” uncertainty typical of neural models. Crucially, this study extends beyond a mere technical combination of KG with NS-AI by constructing a computable UD-ID mapping matrix based on CLT, quantitatively embedding cognitive differences throughout the KG-construction pipeline. This addresses the common “structure-over-application” limitation of conventional KGs and fosters a dynamic balance between knowledge supply and user demand, offering a viable solution to the cognitive-adaptation dilemma in ICH digitalization.

Although the NSCEF-ICHKG framework has made progress in the in-depth study of XPS, there is still room for improvement. First, the breadth and representativeness of user-demand modeling could be enhanced. The current UD-ID matrix was derived from 30 participants in a specific region; subsequent research will include a broader demographic and cultural spectrum (geographical areas, cultural backgrounds, and age groups) to strengthen the model’s robustness and generalizability. Second, the interactivity and immersiveness of knowledge presentation warrant improvement. At present, static query templates govern stratified views; future work will explore dynamic narrative visualization tools—for example, timeline-based storytelling—to enhance user engagement. Third, the real-time adaptability of the cognitive-adaptation mechanism needs strengthening. Presently, the adaptation mechanism relies on predefined weights and rules; upcoming research will explore integrating lightweight physiological sensing technologies (e.g., EEG, eye-tracking) to achieve real-time cognitive feedback and regulation.

In summary, the NSCEF-ICHKG framework proposed in this study confirms the research value of user-cognitive adaptation in cultural-heritage digitization and offers a transferable, human-centered approach for craft-based ICH systems worldwide. Moreover, it contributes to a paradigm shift in ICH preservation from focusing solely on knowledge structure to emphasizing human understanding, thus providing both theoretical and methodological foundations for the sustainable digital transmission of global cultural heritage.