Introduction

With the evolution of AI-generated content (AIGC) technologies, digital cultural content production has gradually shifted from manual curation to algorithmic dominance, ushering in a new stage of cultural heritage dissemination marked by personalization and interactivity1. Against this backdrop, AIGC-driven cultural platforms are transitioning from “static display” to “narrative reconstruction” and “situational engagement”2, thereby enhancing both user immersion and the breadth of dissemination. However, whether online cultural experiences can be effectively translated into offline cultural participation remains uncertain. Existing research has revealed similar problems of “experience discontinuity.” For instance, Huang et al.3 found that although users frequently responded to event invitations on Facebook, their actual attendance rates remained low, reflecting a gap between online responses and offline behaviors. Mihelj et al.4 reported that digital cultural participation often serves as a supplement for existing offline participants, failing to attract new audiences. Sabatini and Sarracino5 further demonstrated that while online interactions can facilitate communication, they may also weaken social trust, thereby constraining the translation of online behaviors into offline social relationships. Drawing on these findings, it can be inferred that in the context of AIGC, the risk of weak online-to-offline transformation persists and may even be amplified by algorithmic generation mechanisms. Specifically, cultural content generated on AIGC platforms tends to be algorithm-centered, often lacking coherent narratives and cultural context, which limits its ability to foster deeper cultural identification6. The platforms’ interaction modes are frequently confined to sensory stimulation, with insufficient emotional guidance and contextual resonance. Moreover, users’ doubts about the authenticity and authority of AI-generated content7 may further weaken the motivation needed to convert immersive online experiences into offline practices. This suggests that AIGC platforms still face critical theoretical and mechanistic gaps in constructing cultural meaning and stimulating users’ behavioral motivations.

Existing studies on cultural communication and technology acceptance primarily rely on classical frameworks such as the Technology Acceptance Model (TAM), the Theory of Reasoned Action (TRA), and the Theory of Planned Behavior (TPB), emphasizing linear relationships among attitudes, subjective norms, and behavioral intentions8,9,10. These models effectively explain behaviors in scenarios with clear functionality and stable structures, but their explanatory power diminishes in the dynamic, multi-sensory cultural contexts of AIGC-driven heritage platforms11. In such contexts, user behaviors are profoundly influenced by irrational factors, such as emotional fluctuations, cognitive conflicts, and perceived risks1,12. This is particularly evident when users are drawn to creative and immersive platform experiences yet are confronted by concerns regarding content authenticity, privacy, security, and cultural ethics. In such cases, their behavioral decision-making processes often exhibit nonlinear, asymmetric, and psychologically complex characteristics13,14. Consequently, traditional theoretical frameworks fall short of fully capturing the nuanced dynamics underpinning user behavior within these evolving digital-cultural environments.

This study introduces the Net Valence Model (NVM) as an analytical framework to overcome theoretical limitations inherent in traditional behavioral models. Originating from the valence framework theory in economics and psychology, the NVM suggests that individuals undertake a cognitive evaluation balancing perceived benefits (PB) against perceived risks (PR) before making behavioral decisions15,16. The resulting net value (Net Value = PB−PR) serves as the fundamental criterion guiding behavioral intentions. In contrast to traditional linear models that emphasize rational decision-making, NVM prioritizes understanding the authentic motivations derived from users’ value conflicts, emotional responses, and psychological trade-offs in contexts involving emerging technologies17. Specifically, the immersive interaction scenarios and algorithm-generated content provided by AIGC-driven cultural heritage platforms correspond to users’ positive and negative perceptions, making NVM particularly suitable for elucidating the underlying behavioral mechanisms within this context.
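As a minimal illustration of the net value calculation described above, the following Python sketch averages a respondent's benefit and risk items and takes their difference. The 1 to 7 Likert scale, the use of mean item scores, and the function names are assumptions made for illustration; the text itself only specifies Net Value = PB−PR.

```python
from statistics import mean


def construct_score(item_responses):
    """Average a respondent's Likert item responses into one construct score."""
    if not item_responses:
        raise ValueError("at least one item response is required")
    return mean(item_responses)


def net_value(pb_items, pr_items):
    """Net Value = PB - PR, using mean item scores (an assumed aggregation)."""
    return construct_score(pb_items) - construct_score(pr_items)


# Hypothetical respondent, for illustration only (assumed 1-7 scale).
respondent_pb = [6, 5, 7]  # perceived-benefit items
respondent_pr = [3, 4, 2]  # perceived-risk items
print(net_value(respondent_pb, respondent_pr))
```

A positive result indicates that perceived benefits outweigh perceived risks for that respondent, and a negative result indicates the reverse.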

To operationalize and empirically validate the theoretical framework, this study selects the “Cloud Tour of Dunhuang” platform, collaboratively developed by the Dunhuang Academy and Tencent, as the empirical case. Centered on the Mogao Grottoes’ mural resources, this platform integrates AI-generated content (AIGC), virtual character guidance, semantic recognition, and multimodal narratives into a comprehensive mobile-based cultural experience system. It transcends traditional exhibition formats, advances technological boundaries, and fosters innovative pathways for users’ cultural cognition and emotional engagement. Consequently, the technological characteristics and content structures of this platform provide an ideal empirical context for defining the research variables and establishing the model pathways of this study.

Based on the aforementioned theoretical foundations and practical context, this study proposes a research model centered on users’ value perceptions. The model systematically explores how creative design, content generation, and narrative structures on AIGC-based cultural heritage platforms enhance perceived benefits, while simultaneously analyzing how privacy concerns, ethical risks, and emotional burdens collectively shape perceived risks, ultimately influencing users’ intentions for offline cultural engagement. The study employs an integrated approach combining structural equation modeling (SEM) and artificial neural networks (ANN), leveraging the causal inference capability of SEM and the nonlinear pattern recognition capability of ANN to enhance explanatory power and predictive accuracy.

The main contributions of this study are threefold:

(1) Theoretical Contribution: This study introduces the Net Valence Model (NVM) in AIGC-driven cultural heritage communication. It provides a systematic explanation of users’ psychological trade-offs and behavioral decisions under the influence of perceived benefits and risks. It expands the cross-domain applicability of NVM and addresses the limitations of traditional rational behavior models in complex cognitive environments.

(2) Methodological Innovation: This study integrates SEM and ANN to develop a hybrid modeling approach incorporating net value quantification and sensitivity analysis. This enables nonlinear identification and weighting of perceptual variables, thereby enhancing the modeling accuracy and predictive power in cultural behavior research.

(3) Practical Implications: Using the “Cloud Tour of Dunhuang” platform as a case study, this research empirically identifies how different AIGC formats and generation mechanisms shape users’ experiential pathways. It highlights the pivotal role of narrative design in driving offline cultural participation, offering actionable insights for platform optimization and cultural communication strategies.
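
The sensitivity analysis mentioned under the methodological contribution can be sketched as follows. One common convention in SEM-ANN studies is to report each predictor's normalized importance, that is, its relative importance divided by the largest relative importance and expressed as a percentage. The text does not state which procedure was used, so this convention, the function name, and the input values below are all assumptions for illustration and are not results from this study.

```python
def normalized_importance(relative_importance):
    """Scale each predictor's importance by the largest value, as a percentage."""
    top = max(relative_importance.values())
    if top <= 0:
        raise ValueError("importance values must include a positive number")
    return {name: 100.0 * value / top
            for name, value in relative_importance.items()}


# Hypothetical relative-importance values (placeholders, not study data).
example = {"predictor_a": 0.5, "predictor_b": 0.25}
for name, pct in normalized_importance(example).items():
    print(f"{name}: {pct:.1f}%")
```

Under this convention the strongest predictor is always reported as 100%, and the remaining predictors are read relative to it.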

Methods

Theoretical foundation and research framework

This study systematically elucidates the underlying mechanisms driving user experiences toward offline cultural participation behaviors within AIGC-enabled platforms. It integrates complementary perspectives in literature with theoretical construction. On the one hand, the research focuses on the application pathways and contextual characteristics of AIGC in the digital cultural heritage domain. It identifies core challenges and emerging trends related to digital preservation, content presentation, and ethical governance. On the other hand, the Net Valence Model (NVM) serves as the theoretical foundation, explaining the psychological trade-offs users experience between perceived benefits and perceived risks during technology adoption, which subsequently influences their behavioral judgments and participation intentions. The synthesis of these two perspectives provides a robust theoretical basis for developing a user-behavior model centered on benefit–risk perception mechanisms.

Application pathways and challenges of AI-generated technologies in digital cultural heritage

At the international level, AI-generated technologies have already given rise to diverse applications in the field of cultural heritage. For example, in the digital restoration of Dwelling in the Fuchun Mountains, research teams employed Generative Adversarial Networks (GANs) and Diffusion Models to complete missing sections and generate stylistically consistent continuations. By learning Huang Gongwang’s brushstroke characteristics and landscape painting style, the model can generate transitional landscape structures, thereby achieving a “digital recomposition” of the fragmented scroll and opening a new pathway for the digital regeneration of ancient paintings (see Fig. 1). Meanwhile, The Digital Palace Museum: Duobao Pavilion platform integrates more than 600 artifacts into a mobile mini-program. Users can browse decorative motifs and generate personalized “cultural footprint maps” and exclusive posters through AI algorithms, which demonstrates the platform’s capacity for interactive storytelling and personalized content generation (see Fig. 2). At the National Museum of Korea, the exhibition Animals in Old Paintings Come Alive allows visitors to select animals from traditional paintings via mobile devices and project them onto large screens. Powered by AI generative models, the system produces dynamic figures via touch-based interaction, creating an immersive “human–AI co-creation” experience that transcends the limitations of traditional static displays (see Fig. 3).

Fig. 1. Digital restoration platform of Dwelling in the Fuchun Mountains.

Fig. 2. Digital Palace Museum: Duobao Pavilion experience platform.

Fig. 3. National Museum of Korea: “Animals in Old Paintings Come Alive” digital exhibition project.

These cases illustrate that AI-generated technologies have expanded from digital restoration to personalized interaction and immersive presentation. They have opened new avenues for preserving and disseminating cultural heritage, while providing an important contextual reference for the subsequent analysis of the Cloud Tour of Dunhuang platform in this study.

Against this backdrop, scholars have conceptualized such systems as digital cultural heritage platforms—comprehensive systems that integrate content generation, resource display, and user interaction, typically in the form of web-based platforms and mobile applications18. With the advancements in AIGC technologies, an increasing number of platforms have incorporated image-generation and semantic-interaction modules to enable digital preservation, broaden dissemination, and enhance public engagement19. Their core functions can be broadly summarized into three categories: (1) Content Generator: leveraging large language models, image synthesis networks, and semantic reconstruction techniques to produce cultural texts, images, and situational content; (2) Narrative Constructor: enhancing the intelligibility and immersiveness of cultural heritage through multimodal explanations, virtual tours, and scene-based storytelling; and (3) Virtual Guide or AI Successor: employing natural language generation and multi-turn dialog systems to facilitate personalized exploration and context co-creation. These functions span the entire process of digital preservation, presentation, and interaction, constituting the fundamental mechanisms of AIGC cultural platforms.

Firstly, in the domain of digital preservation, AIGC technologies utilize generative models and linguistic restoration techniques to document and semantically enrich traditional stories, architectural ornamentation, artifact imagery, and craft processes20. This process fills existing gaps in historical documentation and enhances readability and intergenerational transmission by utilizing contemporary language and multimodal representations21. However, current practices often exhibit fragmented characteristics, predominantly focusing on the generation of isolated cultural elements without systematically reconstructing the overarching cultural context22. As a result, such generated content frequently struggles to integrate into established cultural narratives, thus limiting its efficacy in sustaining cultural value and ensuring long-term preservation.

In terms of content presentation, AIGC integrates natural language and image generation technologies to provide platforms with personalized interpretive texts, immersive audiovisual content, and interactive 3D environments23. Machidon et al.24 demonstrated that users engaging in virtual interactions with AI characters can access more contextually rich cultural experiences, thereby deepening their understanding of cultural backgrounds and enhancing emotional resonance.

However, despite the growing diversity of presentation formats, existing research remains primarily focused on visual outputs and technical implementation, with limited attention to the underlying cultural narrative structures and user psychological mechanisms. This technological bias undermines the explanatory power and contextual expressiveness of AIGC in cultural communication, thereby constraining its potential to foster a holistic understanding of users and to stimulate effective offline cultural participation.

In terms of user interaction, AIGC technologies, through natural language generation (NLG), multi-turn dialog systems, and virtual agents25,26, have introduced a novel “dialog–response–guidance” mechanism to cultural heritage platforms, enabling users to engage more actively in exploring cultural content21. However, two key limitations persist in current systems. First, the underlying interaction logic is still largely rule- and template-based, lacking deep semantic and emotional understanding, which often results in repetitive and mechanical content generation7. Second, cultural adaptability remains limited, as AI systems struggle to tailor their outputs dynamically to users’ cultural backgrounds, linguistic preferences, or contextual nuances27. These challenges diminish the potential of AIGC to contribute meaningfully to cultural education and foster psychological resonance with heritage content.

Overall, AIGC demonstrates multifaceted advantages within digital cultural heritage platforms. Regarding digital preservation, it facilitates the semantic restoration and content expansion of cultural elements; in content presentation, it enhances narrative expression and contextual reconstruction; and in user interaction, it establishes mechanisms of identity resonance through semantic generation and dynamic guidance.

In contrast, traditional platforms often adopt “digital archive” or “virtual exhibition” models that are expert-driven and content-predefined, with linear dissemination pathways that position users as passive recipients, thereby limiting opportunities for deep engagement and personalized meaning-making28. AIGC, by comparison, can generate personalized content based on users’ interests and behavioral trajectories, while enabling dynamic co-construction of cultural meaning through interactive and context-sensitive adaptations. This user-centric adaptability enables cultural expressions to vary by individual and by need, highlighting the structural advantages and irreplaceable role of AIGC in reshaping the pathways of cultural communication29.

Despite its growing promise, the development of AIGC also faces profound challenges. On the one hand, to enable immersive experiences and personalized recommendations, platforms must rely heavily on users’ behavioral and physiological data to train models. However, the lack of transparency in data collection mechanisms raises serious concerns about privacy breaches, algorithmic manipulation, and emotional interference, ultimately undermining user trust30,31. On the other hand, AIGC is constrained by the limitations of training corpora and model architectures, often resulting in homogenized cultural expression, symbolic formalism, and semantic superficiality, all of which compromise the authenticity and diversity of cultural representation32. Therefore, future optimization of AIGC platforms should place greater emphasis on enhancing users’ cultural understanding, trust, and willingness to co-create. It is essential to improve content quality while safeguarding user privacy, reinforcing cultural identity, and promoting active participation.

Applicability and extensibility of the Net Valence Model (NVM) in user perception research

The theoretical foundation of the Net Valence Model (NVM) can be traced back to the concept of “valence” proposed by Lewin et al.15, which posits that individual behavior results from a dynamic trade-off between positive and negative motivational forces. Building on this idea, Fishbein16 developed the “expectancy–valence model,” arguing that behavioral intention arises from individuals’ evaluations of the anticipated outcomes—both positive and negative—weighted by their subjective importance. The core proposition suggests that when perceived benefits outweigh the costs or risks associated with an action, individuals are more likely to exhibit a willingness to engage in that behavior16.

The Net Valence Model (NVM) suggests that individual behavior emerges from a comprehensive judgment formed through the dynamic interplay between positive drivers (e.g., anticipated benefits) and negative inhibitors (e.g., potential risks). Users typically seek a psychological equilibrium between maximizing gains and minimizing risks, thereby developing a clear behavioral tendency33. NVM demonstrates strong theoretical flexibility and cross-context applicability, particularly in technology environments characterized by high uncertainty and cognitive complexity34,35.

In AIGC-enabled digital cultural heritage platforms, users may simultaneously encounter positive experiences—such as cultural re-cognition, virtual storytelling, and immersive guided tours—and potential risks, including privacy breaches, cultural misinterpretation, ambiguous content authenticity, and algorithmic manipulation. These environments are often characterized by information opacity and the inherent incomprehensibility of generative mechanisms36, as well as the sensitivity and ethical controversy surrounding cultural expression37. As a result, users face highly dynamic and cognitively ambiguous decision-making contexts in which behavioral responses tend to be nonlinear, emotionally driven, and sometimes deviate from rational expectations.

Traditional rational behavior theories, such as the Theory of Reasoned Action (TRA) and the Theory of Planned Behavior (TPB), are built on the assumption that individuals are rational decision-makers. They suggest that behavioral intentions are linear and predictable, primarily driven by cognitive factors such as attitudes, subjective norms, and perceived behavioral control9,10,38. These models are effective in structured environments with minimal external interference, but they struggle to explain user behavior and psychological responses in high-uncertainty contexts where perceived benefits and risks are deeply intertwined.

In contrast, the Net Valence Model (NVM) highlights the psychological mechanism by which users dynamically weigh perceived benefits against potential risks. Its core pathway, “Perceived Benefits → Perceived Risks → Behavioral Intention,” is built upon the Technology Acceptance Model’s (TAM) emphasis on perception-based variables and offers considerable theoretical extensibility. NVM can integrate a wide range of influencing factors, including emotional responses, value conflicts, and social norms. It serves as a robust dual-path framework for explaining users’ psychological trade-offs and behavioral decision-making processes.

In subsequent research, Li et al.35 structurally extended the Net Valence Model (NVM) and applied it to the social media context to examine users’ psychological trade-offs in seeking and sharing health information. Their findings validated the applicability of NVM in scenarios characterized by perceived risks and benefits. NVM has been increasingly applied across a range of emerging technology domains, including autonomous driving, online healthcare, and social networking platforms39,40,41. It demonstrates notable theoretical advantages in capturing the psychological conflicts, risk evaluations, and nonlinear decision-making pathways underlying user behavior.

Perceived benefits, the positive value that users attribute to a technology, product, or service, are one of the core driving forces behind behavioral decision-making. Prior studies have shown that such benefit perceptions are often closely linked to the system’s functionality, creativity, and interactivity42. Schreier et al.43 found that creative design and content innovation significantly enhance users’ acceptance and willingness to engage with a product. Building on this, Yang et al.44 further argued that when confronted with culturally rich content with high cognitive demands and strong emotional expectations, users tend to rely more heavily on creative visual expressions and interactive narrative mechanisms to achieve contextual understanding and esthetic engagement. As a result, users become sensitive to a platform’s creative and interactive features, which in turn amplify the role of perceived benefits in motivating usage and participation.

Correspondingly, perceived risk focuses on users’ systematic evaluations of the potential negative impacts associated with a given technology. In their technology risk model, Wang et al.45 identified privacy concerns, system reliability, and psychological discomfort as key sources of perceived risk in digital applications. Eslami et al.46 further revealed that opacity in data processing and algorithmic bias significantly undermine users’ trust in digital systems. In the context of AIGC, Tsvetkova et al.47 emphasized that algorithmic misinterpretation of semantics and the misuse of cultural symbols during content generation can trigger user concerns about authenticity and cultural appropriateness, thereby dampening their willingness to engage in offline cultural activities. The lack of cultural sensitivity has thus emerged as a critical barrier to user behavioral transformation.

Moreover, a significant trade-off exists between perceived benefits and perceived risks. Empirical research by Martins et al.48 suggests that users are more likely to accept or adopt a technology when its positive value substantially outweighs its potential risks. Conversely, when risk perception becomes dominant, users often exhibit avoidance or resistance behaviors34. Therefore, a key challenge for AIGC platform development and cultural communication design lies in enhancing perceived benefits through creative design, narrative strategies, and emotional engagement, while simultaneously mitigating user concerns related to privacy, authenticity, and cognitive overload.

Building on this research logic, this study developed an extended influence mechanism model under the framework of the Net Valence Model (NVM), tailored to the contextual characteristics of digital cultural heritage dissemination. The model incorporates both perceived benefits and perceived risks, encompassing four benefit-related dimensions—creative design, creative content, narrative design, and entertainment experience—and three risk-related dimensions—privacy concerns, ethical considerations, and negative psychological responses. Based on this structure, corresponding hypotheses are proposed to systematically examine how users’ perceived trade-offs on AIGC-enabled cultural heritage platforms influence their willingness to engage in offline cultural experiences.

The relationship between creative design/content and perceived benefits

In the context of AIGC, creative design typically refers to innovative forms of expression enabled by artificial intelligence technologies, encompassing virtual scene construction, dynamic interactive presentation, and immersive storytelling49. Im et al.50 argue that highly novel creative designs can significantly enhance users’ perceived value of a product or service, particularly in scenarios of high technological complexity, where such designs are more likely to capture user attention and stimulate exploratory interest. Moon and Han51 further emphasize that creative design is not limited to technical breakthroughs; it also involves personalizing and diversifying the user experience, thereby strengthening immersion and interactive engagement.

As a subjective evaluation criterion for the positive value provided by a technological system, perceived benefits typically encompass multiple dimensions, such as functionality, emotional value, and social value. Noble and Kumar52 found that creative design can significantly enhance users’ overall perception of functional value and emotional benefits, thereby improving their overall satisfaction. Similarly, Huang et al.53 noted that creative visual content and narrative approaches strengthen users’ perceived benefits and foster their acceptance of digital cultural heritage platforms and their inclination toward cultural participation. Accordingly, this study proposes the following hypotheses:

H1: Creative design significantly positively affects perceived benefits.

H2: Creative content significantly positively affects perceived benefits.

The relationship between entertainment experience and perceived benefits

Entertainment experience generally refers to the pleasure and satisfaction users derive during interactions with a system or its content, and it represents an important psychological factor influencing their subjective evaluations and overall user experience54. Prior studies have demonstrated that in contexts such as cultural communication, tourism, and education, entertainment elements not only enhance user engagement and emotional experience but also strengthen the perceived value of service systems55,56. The entertainment experience enhances users’ sense of immersion and cognitive engagement through dynamic interaction, contextualized storytelling, and visual design, thereby improving their perceptions of content quality as well as their overall evaluation of platform value. Tan and Chou57 further emphasized that entertaining interfaces and interaction mechanisms help mitigate users’ perceived technological complexity, ultimately improving both their usage experience and perceived benefits. Accordingly, this study proposes the following hypothesis:

H3: Entertainment experience significantly positively affects perceived benefits.

The relationship between narrative design and perceived benefits

In AIGC-generated digital cultural heritage content, narrative is regarded as a design approach that integrates cultural resources with contemporary contexts in a dynamic and layered manner, aiming to enhance users’ emotional resonance and understanding58. Compared to traditional modes of information presentation, AIGC-driven narratives not only generate visually structured content but also reconstruct the historical backgrounds, traditional customs, and cultural stories associated with heritage, thereby enriching the cultural connotations and expressive intensity of the content59. This narrative-oriented content generation approach enhances the educational value and emotional warmth of cultural information, significantly improving users’ perceived benefits. Gu et al.60 noted that narrative-based interactive design effectively reduces users’ cognitive load and improves their satisfaction and experience of the technology. Accordingly, this study proposes the following hypothesis:

H4: Narrative design significantly positively affects users’ perceived benefits.

The relationship between privacy concerns and perceived risks

Privacy and security concerns are key challenges affecting users’ acceptance of AIGC-driven digital heritage content. Studies have shown that users often adopt a cautious and even resistant attitude toward technologies involving the collection and use of personal data61. AIGC platforms typically process large volumes of personal information, including camera surveillance, geolocation, biometric, and behavioral data, for content recommendation and targeted marketing purposes62. These mechanisms can enhance the personalization of cultural experiences. However, the opacity of data handling practices and the potential for information leakage can trigger users’ privacy-related anxiety, with potentially severe consequences such as identity theft63. The absence of robust privacy protection mechanisms undermines users’ trust in the platform and significantly increases their perceived risk toward AIGC cultural content, thereby reducing their willingness to engage in cultural participation. Based on this, the following hypothesis is proposed:

H5: Privacy and security concerns significantly positively affect users’ perceived risks.

The relationship between ethical concerns and perceived risks

AIGC-driven digital cultural heritage content offers users personalized and immersive cultural experiences. However, its technical design and content generation processes still involve multiple ethical risks, particularly regarding the authenticity, fairness, and semantic appropriateness of cultural expression64. In selecting and reproducing cultural symbols, AIGC often overlooks the values and cultural backgrounds of specific groups, resulting in partial, stereotypical, or even distorted representations65. Moreover, content recommendation mechanisms are typically optimized based on mainstream user preferences, thereby marginalizing the needs and niche cultures of underrepresented communities, fostering a perceived sense of cultural exclusion.

Algorithmic personalization based on user preferences enhances the relevance of content. However, it can intensify “information blind spots,” making it difficult for users to access diverse cultural materials and potentially directing them toward homogenized experiential paths66. This erosion of cultural choice and information transparency can undermine user trust and significantly increase the perceived risk associated with the platform. Based on this, the following hypothesis is proposed:

H6: Ethical concerns significantly positively affect perceived risks.

The relationship between negative psychological responses and perceived risks

AIGC-driven digital heritage content offers users a more immersive and expressive cultural experience. However, its generative mechanisms may trigger potential negative psychological responses that influence users’ perceived risk. In reconstructing virtual cultural scenes, AIGC systems often idealize representations of historical events, ritual practices, or cultural spaces, resulting in visually refined content that may deviate from real-world contexts67. This “perceptual distortion” can create a sense of detachment and raise doubts about authenticity, especially among users who seek genuine cultural understanding. For such users, the imbalance between expectation and representation may lead to disappointment, confusion, or even feelings of alienation68.

Moreover, AIGC cultural content often emphasizes the production of visual or informational elements while neglecting the social context and emotional interactions between the user and the heritage itself. This may lead to cognitive disengagement and emotional desensitization69, weakening users’ holistic understanding of cultural heritage and diminishing their emotional connection to it. Such psychological dissonance and cultural detachment can intensify user skepticism regarding the platform’s authenticity and reliability, constituting a major source of perceived risk. Based on this, the following hypothesis is proposed:

H7: Negative psychological responses significantly positively affect perceived risks.

The relationship between perceived benefits/risks and users’ AIGC experience behaviors

Perceived benefits refer to the positive value users gain from engaging with AIGC. They are manifested in the convenience of information access, enhanced cultural identity, and immersive entertainment experiences. Katifori et al.70 noted that AIGC facilitates users’ intuitive understanding of the cultural context behind heritage through dynamic narratives and creative visual representations. It increases satisfaction with the platform and strengthens users’ recognition of cultural value, ultimately enhancing their willingness to engage in online experiences.

In contrast, perceived risk reflects users’ concerns about the potential negative consequences of AI-generated content, primarily including issues such as privacy leakage, distortion of cultural expression, and algorithmic bias68. Users may experience cognitive dissonance and emotional disengagement when content becomes overly formulaic or detached from authentic cultural contexts, which in turn undermines trust and reduces their behavioral intention71. From a cognitive trade-off perspective, the relative strength of perceived benefits and perceived risks jointly determines whether users are willing to accept the cultural content and extended experiences offered by AIGC platforms.

Users’ actual experiences on AIGC platforms not only influence their perceptions and satisfaction with digital cultural content but may also extend through multiple psychological mechanisms to translate into offline cultural participation. First, immersive narratives and cultural interaction experiences can stimulate users’ positive emotions and learning interests, which in turn translate into motivational drivers for offline cultural behaviors72,73. Second, the knowledge gains and cultural value recognition derived from online experiences often influence users’ real-world behavioral intentions through motivational extension mechanisms, manifesting as a tendency toward real-world cultural exploration driven by virtual experiences74. At the same time, prior studies have pointed out that although virtual cultural experiences can effectively stimulate interest, their limitations in authenticity and completeness frequently trigger users’ compensatory psychological needs, prompting them to seek more authentic offline cultural experiences75. Based on this reasoning, the following hypotheses are proposed:

H8: Perceived benefits significantly positively affect users’ willingness to engage with AIGC platforms.

H9: Perceived risks significantly negatively affect users’ willingness to engage with AIGC platforms.

H10: Perceived benefits have a significant positive effect on users’ offline cultural participation intentions.

H11: Perceived risks have a significant negative effect on users’ offline cultural participation intentions.

H12: Users’ actual experiences on AIGC platforms have a significant positive effect on their intentions for offline cultural participation.

This study constructs a user experience behavior model for AIGC-enabled cultural heritage platforms by incorporating seven core variables (creative design, creative content, entertainment experience, narrative design, privacy concerns, ethical concerns, and negative psychological responses) and introducing perceived benefits and perceived risks as antecedent factors. The model systematically investigates the transformation mechanism from online experience to offline participation (see Fig. 4).

Fig. 4: Impact model of AIGC-driven digital cultural heritage platforms on offline participation intentions.

Case Study: The “Cloud Tour of Dunhuang” mobile platform of the Dunhuang Museum

This study adopts the Cloud Tour of Dunhuang digital cultural platform, jointly developed by the Dunhuang Academy and Tencent, as the empirical case, with its mobile application serving as the subject for user surveys and behavioral pathway analysis (see Figs. 5–13). Drawing on the mural resources of the Mogao Caves, a UNESCO World Heritage site, the platform establishes a digital communication system that integrates mobile interaction, virtual roaming, and AI-generated technologies.

Fig. 5: a–e “Explore Dunhuang – Mirror-Seeking Module”.

a Virtual cave browsing interface where users can click on mural nodes to view textual guides and audio commentaries. b Concept-mapping module that visualizes the symbolic meanings of murals. c Narrative video module that presents dynamic storytelling of mural scenes. d Interactive Q&A module that provides real-time feedback and engagement. e Collection and sharing interface that guides users through a “discovery–reflection–sharing” experiential pathway.

Fig. 6: Comparative view of the “Three-Story Building” and Caves 16 and 17 (Library Cave) at the Mogao Caves in Dunhuang: the left shows the original site, while the right presents the digital restoration results.

Fig. 7: Digital restoration of murals, sculptures, and artifact details in the Digital Library Cave.

Fig. 8: AI-based restoration of the original stacked state of more than 60,000 scrolls in the Digital Library Cave.

Fig. 9: AI-rendered virtual lighting effects in the Digital Library Cave (1).

Fig. 10: AI-rendered virtual lighting effects in the Digital Library Cave (2).

Fig. 11: AI-generated facial animation effects of virtual characters.

Fig. 12: AI-generated restoration of Sanjie Monastery in the virtual scene.

Fig. 13: AI-driven retrieval and interactive presentation of ancient books.

In terms of operation, users can freely navigate the cave spaces via a virtual map. By clicking on specific mural nodes, they can access textual guides, image restorations, and audio commentaries. They may further enter dynamic narrative videos or interactive Q&A modules corresponding to the selected murals (Fig. 5a–d). In addition, the platform incorporates “digital cave tasks” and a content-sharing recommendation mechanism, guiding users along a “discovery–reflection–sharing” pathway that progressively deepens their cultural experience (Fig. 5e).

In the Digital Library Cave module, the platform achieves millimeter-level 1:1 precision replication, faithfully reconstructing the murals, sculptures, and artifact details of the Mogao Caves’ “Three-Story Building” and Cave 17. From color and material to weathering traces, the reproduction closely mirrors the real site, delivering an ultra-realistic museum experience. This was made possible through more than 30,000 multi-angle captured images and an ultra-detailed 3D model comprising over 900 million polygons, further enhanced by AI-based super-resolution and material recognition algorithms to improve clarity and texture representation (Fig. 6).

It is worth emphasizing that the platform’s content structure comprises two categories of elements: manually preset components and AI-generated components. The manually pre-set elements mainly include the identity settings of NPCs (e.g., historical prototypes such as Master Hongbian or Taoist Wang), the overarching narrative framework (e.g., the historical timeline from the late Tang to the Northern Song to the late Qing), cave numbering, and the basic spatial topology. These elements are predefined by Dunhuang studies experts and the development team to ensure the accuracy of historical narratives and the reliability of cultural connotations.

The AI-generated elements encompass mural restoration, spatial reconstruction, lighting rendering, NPC dynamic performance, and classical-text Q&A. Their implementation relies on generative AI techniques (e.g., GANs, diffusion models, speech synthesis, and retrieval-augmented generation) and produces dynamic, differentiated results that are distinguishable from manually preset content.

Regarding mural restoration, the platform applies AI-based completion and generation technologies. Based on the annotations and scholarly validation provided by Dunhuang experts, GAN models employ adversarial training between generators and discriminators to learn mural brushstrokes, pigment textures, and local structural logics, thereby inferring possible textures and details in missing areas. Diffusion models, by contrast, adopt an iterative noise-adding and denoising inversion mechanism to generate smooth and continuous transitions of color and brushwork, making them particularly suitable for high-fidelity restoration of large-scale damaged regions (Fig. 7). By integrating expert knowledge with AI-based content generation models, the platform achieves authentic restoration of heritage remains and ensures consistency of user perception and an enhanced sense of immersion.

In the overall spatial reconstruction stage, the platform integrates physics-constrained generative models with AI-based content completion techniques to simulate the stacked scene of more than 60,000 scrolls stored in the cave a century ago. The former employs finite element mechanical simulation and fiber fracture modeling to generate natural scroll forms and wear features; the latter, in the absence of complete image records, uses AI generative models to automatically produce local decorative and textural details of the scrolls, thereby ensuring both physical plausibility in geometric form and visual completeness in surface texture. At the same time, the system incorporates a stacking logic inference mechanism, which applies constraint-based 3D arrangement algorithms to deduce the spatial relationships among scrolls (e.g., stacking order, contact and occlusion, center-of-gravity shifts). This ensures that the restoration results are visually realistic and are consistent with the physical plausibility of the historical on-site stacking state (see Fig. 8). As a result, during immersive browsing, users can perceive an authentic approximation of the “original stacking” from a century ago and experience a digitally reconstructed environment imbued with greater narrative tension and historical depth.

In the virtual scene rendering stage, the platform integrates physically based rendering (PBR), global dynamic illumination, and deep learning–driven lighting generation models. PBR rendering employs physics-level modeling of material reflectivity, roughness, and normal maps, enabling the lime plaster base of murals, pigment layers, and sculpture surfaces to exhibit realistic refraction and diffuse reflection effects under virtual light sources. Global illumination further simulates the reflection and refraction paths among multiple light sources, ensuring unified and natural light–shadow relationships throughout the cave space. Building on this, the platform incorporates generative lighting modeling, which takes users’ real-time perspectives as input and dynamically produces realistically distributed rays and shadows using deep learning models. An adaptive compensation mechanism is then applied to optimize brightness gradients in dark corners and corridor areas. This fusion of physics-based modeling and AI-driven lighting generation not only overcomes the limitations of on-site visits characterized by “localized lighting and restricted visibility” but also significantly enhances the fidelity and continuity of lighting details in high-dynamic-range (HDR) environments. It provides users with a clearer, more complete, and immersive virtual exhibition experience (see Figs. 9, 10).

In terms of character and narrative design, users can assume the role of a “Guardian of the Digital Library Cave.” They can select one of six virtual characters and “travel” through different historical periods, including the late Tang, Northern Song, and late Qing dynasties. During this process, users interact with NPCs (e.g., Master Hongbian, Monk Daozhen, and Taoist Wang) who connect the historical narratives of cave excavation, artifact dispersal, and cultural rediscovery. By leveraging AI-based facial animation and speech generation models, the platform enables NPCs’ expressions, voices, and narrative dialogs to appear more natural and fluid, thereby creating a dynamic and humanized cultural experience within the pre-set historical framework (see Fig. 11). Compared with traditional static exhibitions, this AI-generated narrative pathway and character performance significantly enhance users’ sense of immersion and presence.

For heritage sites lacking image records (such as Sanjie Monastery), the platform first employs AI-based semantic modeling to transform historical features annotated by Dunhuang studies experts—such as the proportional layout of pagodas in Five Dynasties–Song temples, the arrangement of monks’ quarters, and the structure of stables—into computable parameter constraints. Next, style transfer and material generation algorithms are applied to unify styles and enhance realism in details such as brick-and-stone textures, wooden structures, and painted eaves, ensuring that the overall appearance aligns with historical authenticity. Building on this, the platform further leverages AI generative models to automatically infer and reconstruct missing architectural structures under semantic constraints and expert validation, thereby producing a complete three-dimensional spatial scene. Through this “AI semantic modeling → generative completion → style optimization” technical pipeline, users in virtual roaming can directly “see” the reconstructed entirety of Sanjie Monastery based on scholarly inference and experience a stronger sense of historical atmosphere and immersion conveyed through AI-generated architectural details (see Fig. 12).

In the ancient book exhibition module, the platform incorporates a large-model–driven Retrieval-Augmented Generation (RAG) system. Users can directly pose questions such as “What is the Diamond Sutra about?” or “When was the Library Cave discovered?” The system performs keyword retrieval and semantic matching within its knowledge base, after which the large model generates concise summaries or multilingual responses, along with background information and key ideas. This transforms ancient texts from being merely “visible” into becoming “comprehensible and interactive” knowledge experiences (see Fig. 13).
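The retrieve-then-generate flow described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two-sentence `KNOWLEDGE_BASE`, the keyword-overlap scoring in `retrieve`, and the `stub_generator` that stands in for the large model are hypothetical and do not reflect the platform's actual implementation, which the text says also uses semantic matching.

```python
import re
from typing import Callable, List, Set

# Hypothetical two-passage knowledge base, used only for illustration.
KNOWLEDGE_BASE: List[str] = [
    "The Diamond Sutra is a Buddhist scripture found among the Library Cave manuscripts.",
    "The Library Cave (Cave 17) at the Mogao Caves was discovered in 1900.",
]


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: List[str], top_k: int = 1) -> List[str]:
    """Rank passages by keyword overlap with the question (no semantic matching)."""
    q_tokens = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q_tokens & tokenize(p)), reverse=True)
    return ranked[:top_k]


def stub_generator(prompt: str) -> str:
    """Stand-in for a large language model: it simply echoes the retrieved context."""
    return prompt.split("Context: ", 1)[1].split("\n", 1)[0]


def answer(question: str, generate: Callable[[str], str] = stub_generator) -> str:
    """Retrieve the best-matching passage, build a prompt, and call the generator."""
    context = retrieve(question, KNOWLEDGE_BASE)[0]
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer concisely."
    return generate(prompt)


print(answer("When was the Library Cave discovered?"))
```

In a real system the `generate` argument would be replaced by a call to a language model, and the overlap score by a combination of keyword and embedding-based retrieval.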

In addition, the platform is designed with multi-layered interaction and dissemination mechanisms. First, through modules such as “character tasks,” mural tours, and knowledge Q&A, it enables pathway-based and contextualized immersive interaction. Second, it supports features such as likes, favorites, and multimodal navigation, which enhance the personalized processing of cultural information. Third, it integrates social sharing functions, transforming individual experiences into collective dissemination and thereby reinforcing external resonance and diffusion effects of cultural identity.

Currently, common AI-generated platforms can be roughly categorized into two types. The first are image recognition–driven platforms, typically exemplified by certain museum apps, where users can scan artifacts or exhibits to trigger 3D reconstructions or text-and-image explanations76. These platforms excel at enhancing visual expression and providing rapid feedback, but their generative logic typically relies on preset scripts or template invocation. As a result, they lack dynamic responses to users’ semantic inputs and struggle to construct coherent narratives or deep cultural understanding. The second are text-generation–driven platforms, which rely on AI text generation and semantic Q&A as their core functions. Users can obtain explanations or knowledge responses by posing natural language questions77. While suitable for information-retrieval scenarios, this mechanism generally offers limited interaction formats, and the generated content tends to be disconnected from users’ behavioral pathways, making it difficult to evoke contextual engagement or cultural identity.

Compared to these two types of platforms, Cloud Tour of Dunhuang demonstrates an integrated advantage of narrative-fusion and game-based mechanisms. In terms of cultural narrative depth, structural consistency of generative logic, and integrity of immersive pathways, it shows higher system integration capacity while highlighting distinct gamified features. By fusing multimodal content (images, texts, sound effects, and interactive commands), the platform creates an RPG-like immersive experience in which users gain not only knowledge but also situational enjoyment throughout the process of “exploration–learning–creation.” Through the organic integration of narrative logic, gamified mechanisms, and AI content generation technologies, Cloud Tour of Dunhuang constructs a layered and immersive cultural experience environment. This differentiated interaction mechanism provides a solid practical foundation and theoretical support for this study. It also enhances the observability and verifiability of the transformation from virtual experience to real-world participation.

Questionnaire design

To ensure that the measurement indicators are closely aligned with the research context, this study designed the questionnaire around the Cloud Tour of Dunhuang platform. Based on the AIGC-generated content and user interaction modes embedded in its core functional modules, we developed measurement indicators covering dimensions such as creative design, narrative design, ethical concerns, and platform experience. The platform presents diverse forms of AIGC-generated content, including image restoration, textual narration, voice-guided tours, and interactive tasks. Its generative mechanisms can be broadly categorized into three types: fully AI-automated generation, human–AI collaborative generation, and platform algorithm-assisted generation (see Table 1).

Table 1 Constructs and Measurement Items

The survey employed in this study consists of 25 items. To enhance content validity, each latent variable was measured using multiple indicators. All measurement items were adapted from well-established scales in existing literature and translated into Chinese using the back-translation method (see Table 1). Creative design and creative content were measured using four items adapted from scales developed by Bloch and Zhou et al.78,79. Entertainment experience was measured using two items from Holbrook and Hirschman80, and narrative design was measured using two items adapted from Escalas81. Privacy concerns were measured using two items from Malhotra et al.82, while ethical concerns were based on two items from Hunt and Vitell83. Negative psychological responses were measured using two items from Watson et al.84. Perceived benefits and perceived risks were assessed using six items adapted from Li et al.35. Online platform experience was measured using two items from Witmer and Singer85, and offline cultural participation intention was measured using three items adapted from Ajzen10.

The use of multi-item measures was intended to overcome the limitations of single-item indicators and to capture the core constructs of each latent variable. All items were rated on a five-point Likert scale (1 = strongly disagree, 5 = strongly agree).

This study struck a balance between theoretical rigor and model compatibility in the design of measurement items. Although several latent variables were measured using only two items, prior research has shown that dual-item measures can yield stable and valid constructs when the theoretical dimensions are clearly defined and the measurement objectives are specific. Drolet and Morrison86 argue that increasing the number of items provides limited incremental information and may induce “mechanical responding” and higher inter-item error correlations, ultimately compromising data quality. Similarly, Hayduk87 emphasizes that when constructs are well-defined, one or two high-quality indicators are sufficient for latent variable modeling, whereas excessive redundancy can lead to model instability and reduced explanatory power.

Based on these theoretical insights, dual-item designs were adopted for certain variables for two main reasons. First, the underlying dimensions are conceptually focused, and the measurement objectives are unambiguous, allowing two indicators to capture the core construct. Second, limiting the number of items helps reduce the cognitive burden on respondents, thereby improving data quality and enhancing the stability of path estimation within the model.

Questionnaire distribution and data collection

This study adopted a combination of purposive sampling and snowball sampling to test the research hypotheses. The questionnaires were distributed through both online and offline channels. To ensure that the survey content closely reflected actual usage scenarios, all participants were guided by the research team to engage in a 10–15-minute experience with the “Cloud Tour of Dunhuang” platform prior to completing the questionnaire. This pre-survey interaction was designed to establish a basic understanding of the platform’s operations and interactive features.

Given the study’s focus on AIGC-driven digital cultural heritage experiences, the sample was inevitably concentrated within specific interest groups and social networks, raising the potential risk of structural bias. To mitigate self-selection bias, the study employed a diversified seed-user recruitment strategy, including university students, cultural heritage enthusiasts, and users from diverse professional backgrounds. This research also used a social media forwarding mechanism to expand outreach and enhance sample diversity across age groups, occupational categories, and geographic regions.

A total of 1073 questionnaires were collected. After excluding invalid responses (those completed in under 90 seconds, showing highly repetitive answer patterns, or containing logical inconsistencies), 986 valid responses were retained, meeting the basic sample size requirements for the hybrid Structural Equation Modeling–Artificial Neural Network (SEM–ANN) analysis (see Table 2).
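The exclusion rules can be made concrete with a short sketch. Only the 90-second threshold comes from the text; the `Response` structure, the `MAX_IDENTICAL_SHARE` cutoff used to flag "highly repetitive" answers, and the toy sample are assumptions for illustration. The logical-consistency check is omitted because the text does not specify it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Response:
    seconds: float      # completion time in seconds
    answers: List[int]  # Likert answers on a 1-5 scale (assumed non-empty)


MIN_SECONDS = 90           # threshold stated in the text
MAX_IDENTICAL_SHARE = 0.9  # hypothetical cutoff for "highly repetitive" patterns


def is_valid(r: Response) -> bool:
    """Return False for responses that are too fast or nearly straight-lined."""
    if r.seconds < MIN_SECONDS:
        return False
    most_common = max(r.answers.count(v) for v in set(r.answers))
    if most_common / len(r.answers) > MAX_IDENTICAL_SHARE:
        return False
    return True


# Toy sample (not study data): one normal, one too fast, one straight-lined.
sample = [
    Response(300, [4, 5, 3, 4, 2, 4, 5, 3, 4, 4]),
    Response(45, [4, 5, 3, 4, 2, 4, 5, 3, 4, 4]),
    Response(200, [3] * 10),
]
valid = [r for r in sample if is_valid(r)]
print(len(valid))
```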

Table 2 Descriptive statistics of participant demographics

According to statistical results from SPSS 27.0, respondents aged 18–32 accounted for 61.3%, constituting the primary demographic group. Additionally, 85.5% of participants held an associate degree or higher. Although the sample is somewhat concentrated in terms of age and education level, this reflects the digital literacy and cultural engagement capacity of the core user base targeted by AIGC heritage platforms. Therefore, this structural composition is both appropriate and purposeful for this study.

Reliability and validity assessment

Based on 986 valid survey responses, this study conducted reliability and validity tests for 11 latent variables and their 25 measurement items using SPSS 27.0. The results showed that all variables had Cronbach’s α coefficients exceeding 0.70, indicating good internal consistency of the scales (see Table 3). Confirmatory Factor Analysis (CFA) further demonstrated that the Average Variance Extracted (AVE) for each latent construct was above 0.50 and the Composite Reliability (CR) exceeded 0.70, meeting the convergent validity criteria proposed by Fornell and Larcker88. These results confirm that the measurement model possesses satisfactory convergent validity.
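For readers who want to reproduce these statistics outside SPSS, the standard formulas for Cronbach’s α, AVE, and CR can be sketched as follows. This is a minimal illustration, not the study's procedure: the function names are ours, the loadings `[0.8, 0.7]` are hypothetical, and AVE and CR are computed from standardized loadings under the usual assumption that each error variance equals one minus the squared loading.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_variances / total_variance))


def average_variance_extracted(loadings) -> float:
    """AVE: mean of the squared standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))


def composite_reliability(loadings) -> float:
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    lam = np.asarray(loadings, dtype=float)
    numerator = lam.sum() ** 2
    return float(numerator / (numerator + np.sum(1 - lam ** 2)))


# Hypothetical standardized loadings for a two-item construct.
example_loadings = [0.8, 0.7]
print(average_variance_extracted(example_loadings))
print(composite_reliability(example_loadings))
```

With these hypothetical loadings the construct would clear both the 0.50 AVE and 0.70 CR thresholds cited above.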

Table 3 Relationships between observed variables and latent constructs

In addition, Pearson correlation analysis (see Table 4) revealed that the relationships among the variables were all statistically significant and in the expected directions. Specifically, creative design (CD), creative content (CC), recreation experience (RE), and narrative design (ND) were significantly and positively correlated with perceived benefits (PB); similarly, privacy concerns (PI), ethical concerns (EQ), and negative psychological responses (NE) were positively correlated with perceived risks (PR). These findings provide preliminary empirical support for the subsequent structural path modeling.

Table 4 Pearson correlation analysis

Model fit and hypothesis verification

Based on 986 valid survey responses, a structural equation model (SEM) was constructed and tested using AMOS 27.0. The primary model fit indices indicate a good fit (see Table 5). Among the absolute fit indices, CMIN/df = 2.45, RMSEA = 0.067, GFI = 0.915, and AGFI = 0.903. For incremental fit, NFI, CFI, and IFI all exceed the 0.90 threshold, indicating a clear improvement over the baseline model. The parsimony-adjusted index PGFI is 0.512, which falls within an acceptable range.
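As a quick sanity check, the reported indices can be compared against commonly cited rule-of-thumb cutoffs. The values in `reported` are taken from the paragraph above; the bounds in `cutoffs` are conventional thresholds that we assume here, and they may differ from the criteria listed in Table 5.

```python
# Fit indices as reported in the text.
reported = {
    "CMIN/df": 2.45,
    "RMSEA": 0.067,
    "GFI": 0.915,
    "AGFI": 0.903,
    "PGFI": 0.512,
}

# Conventional rule-of-thumb cutoffs (assumed, not taken from the study).
cutoffs = {
    "CMIN/df": ("<", 3.0),
    "RMSEA": ("<", 0.08),
    "GFI": (">", 0.90),
    "AGFI": (">", 0.90),
    "PGFI": (">", 0.50),
}


def meets(value: float, rule: tuple) -> bool:
    """Check a value against an (operator, bound) rule."""
    op, bound = rule
    return value < bound if op == "<" else value > bound


results = {name: meets(reported[name], cutoffs[name]) for name in reported}
for name, ok in results.items():
    print(f"{name}: {reported[name]} -> {'pass' if ok else 'fail'}")
```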

Table 5 Model Fit Indices of the structural equation model

Path analysis results (see Table 6) show that creative design (CD), creative content (CC), recreation experience (RE), and narrative design (ND) all exert significant positive effects on perceived benefits (PB). Specifically, CD, CC, and RE are significant at the 0.1% level (P < 0.001), while ND is significant at the 1% level (P = 0.003). Regarding perceived risks (PR), ethical concerns (EQ), negative psychological responses (NE), and privacy concerns (PI) all have significant positive effects. EQ and NE are significant at the 1% level, while PI is significant at the 5% level (P = 0.034).

Table 6 Model estimation results and hypothesis testing

Further analysis reveals that perceived benefits (PB) significantly and positively influence both AIGC platform experience (AE) and offline cultural engagement intention (OE) (P < 0.001), whereas perceived risks (PR) exhibit significant negative effects on both AE and OE. Notably, the path coefficient from PR to OE is only marginally significant at the 5% level (P = 0.048).

The results validate the structural model and suggest that AIGC-enabled cultural heritage platforms enhance users’ perceived benefits through creative design and cultural expression. This, in turn, positively influences both online experiences and offline engagement behaviors. At the same time, users’ concerns regarding privacy breaches, cultural distortion, and emotional detachment contribute to perceived risk, which negatively affects behavioral intention through the risk pathway. The full structural model is illustrated in Fig. 14.

Fig. 14: Results of structural model testing.

SEM–ANN-based analysis of how AIGC-driven digital cultural heritage platforms influence users’ offline experience intentions

To further enhance the model’s predictive power and explanatory precision, this study incorporates an Artificial Neural Network (ANN) approach alongside the Structural Equation Modeling (SEM) analysis, resulting in a hybrid SEM–ANN model. This integrated framework explores the mechanisms through which AIGC-driven digital cultural heritage platforms influence users’ intentions for offline cultural engagement. Drawing on the methodological framework proposed by Liébana89, four ANN sub-models (Models A–D) were constructed based on the significant SEM path results and the principles of ANN modeling to enhance the accuracy of fitting multi-path relationships and capture non-linear patterns more effectively (see Fig. 15).

Fig. 15: A–D ANN models for predicting perceived benefits and perceived risks.

Model A: CD, CC, ND, and RE are used to predict PB; Model B: PI, EQ, and NE influence PR; Model C: PB and PR affect AE; and Model D: PB, PR, and AE jointly determine OE. All models employ the Sigmoid activation function, with line thickness indicating the strength of connection weights.

Model A: The input layer includes creative design (CD), creative content (CC), narrative design (ND), and recreation experience (RE), with perceived benefits (PB) as the output layer. This model estimates the relative importance of positive content dimensions in predicting PB.

Model B: The input layer includes privacy concerns (PI), ethical concerns (EQ), and negative psychological responses (NE), with perceived risks (PR) as the output layer. This model measures the impact strength of various negative cognitive factors on PR.

Model C: The input layer consists of perceived benefits (PB) and perceived risks (PR), with AIGC platform experience (AE) as the output layer. This model examines the driving factors of cognitive evaluations of the online experience.

Model D: The input layer comprises perceived benefits (PB), perceived risks (PR), and online experience (AE), with offline cultural engagement intention (OE) as the output layer. This model tests the predictive strength of the complete behavioral transformation path.

Subsequently, the artificial neural network (ANN) models were constructed, as defined in Eq. (1).

$$\hat{y}=f({W}^{(2)}\cdot \sigma ({W}^{(1)}\cdot {\bf{x}}+{{\bf{b}}}^{(1)})+{b}^{(2)})$$
(1)

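where \({\bf{x}}\in {{\mathbb{R}}}^{n}\) represents the input vector, \({W}^{(1)}\) and \({W}^{(2)}\) are the weight matrices, \({{\bf{b}}}^{(1)}\) and \({b}^{(2)}\) are the bias terms, \(\sigma (\cdot )\) denotes the hidden-layer activation function, and \(f(\cdot )\) denotes the output-layer activation function. In this study, both the hidden layer and the output layer adopt the Sigmoid activation function, so that \(f=\sigma\), defined as (Eq. 2).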

$$\sigma (z)=\frac{1}{1+{e}^{-z}}$$
(2)

All input and output variables were normalized using the min-max normalization method to improve the model’s training performance (Eq. 3).

$$x{\prime} =\frac{x-\,\min (x)}{\max (x)-\,\min (x)}\in [0,1]$$
(3)
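Equations (1)–(3) can be sketched in a few lines of NumPy. The sketch is illustrative only: the hidden-layer size, the random weights, and the toy input matrix are assumptions, and it shows a single forward pass rather than the training procedure used in the study. The `min_max` helper also assumes that no input column is constant, since a constant column would make the denominator of Eq. (3) zero.

```python
import numpy as np


def sigmoid(z):
    """Eq. (2): logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))


def min_max(x):
    """Eq. (3): column-wise min-max scaling to [0, 1] (columns must not be constant)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))


def forward(x, W1, b1, W2, b2):
    """Eq. (1) with f = sigma: one hidden layer and a sigmoid output."""
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)


rng = np.random.default_rng(0)
n_inputs, n_hidden = 4, 3  # four inputs as in Model A; hidden size is hypothetical
W1 = rng.normal(size=(n_hidden, n_inputs))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(1, n_hidden))
b2 = np.zeros(1)

# Toy Likert-style inputs (not study data), one row per respondent.
raw = np.array([[1, 2, 3, 4], [5, 4, 3, 2], [3, 5, 1, 4]], dtype=float)
scaled = min_max(raw)
y_hat = forward(scaled[0], W1, b1, W2, b2)
print(y_hat)
```

Because the output layer is also a sigmoid, predictions lie strictly between 0 and 1, which is why the target variable is min-max normalized as well.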

This study employed a 10-fold cross-validation strategy to mitigate the risk of overfitting, where 90% of the data were used for training and the remaining 10% for testing in each iteration. The model’s predictive performance was evaluated using the Root Mean Square Error (RMSE) (Eq. 4).

$${\rm{RMSE}}=\sqrt{\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{({\hat{y}}_{i}-{y}_{i})}^{2}}$$
(4)
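The evaluation protocol around Eq. (4) can be illustrated as follows. The sketch shows only how a 10-fold split and the RMSE might be computed; the helper names `rmse` and `k_fold_indices`, the shuffling seed, and the use of `numpy.array_split` are assumptions, since the text does not state how the folds were drawn.

```python
import numpy as np


def rmse(y_pred, y_true) -> float:
    """Eq. (4): root mean square error between predictions and targets."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k roughly equal, shuffled folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


# 986 valid responses, as reported in the text.
splits = list(k_fold_indices(986, k=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

In each iteration a model would be fitted on `train_idx`, scored with `rmse` on both subsets, and the ten training and test scores averaged.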

As shown in Table 7, the Root Mean Square Error (RMSE) values averaged over ten training iterations were 0.2187, 0.2524, 0.2295, and 0.1853 for Models A, B, C, and D, respectively, on the training sets. The corresponding average RMSE values on the test sets were 0.2280, 0.2570, 0.2175, and 0.1724. The low RMSE values indicate good predictive accuracy, and the relatively small standard deviations across iterations indicate stable convergence and good generalization capability.

Table 7 Root Mean Square Error (RMSE) of artificial neural network models

To further evaluate the predictive performance of the integrated SEM–ANN model, a sensitivity analysis was conducted for each of the four ANN models. Based on the Normalized Importance metrics reported in Table 8, the following conclusions can be drawn:

Table 8 Sensitivity analysis of artificial neural network models

In Model A, creative design (CD) contributed most significantly to predicting perceived benefits (PB), with a normalized importance of 100%, underscoring its dominant role in enhancing users’ positive perceptions.

In Model B, ethical concerns (EQ) emerged as the key predictor of perceived risk (PR), with a significantly higher weight than other risk-related variables.

In Models C and D, perceived benefits (PB) and AIGC platform experience (AE) were the most influential predictors for their respective dependent variables, indicating that users’ subjective evaluations of content value and immersive experience play a pivotal role in the behavioral transformation mechanism.
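Normalized importance is conventionally obtained by dividing each predictor's raw importance by the largest importance in the model and expressing the result as a percentage, so the strongest predictor always scores 100%. A minimal sketch, using generic predictor names and hypothetical raw values rather than the figures in Table 8:

```python
from typing import Dict


def normalized_importance(importance: Dict[str, float]) -> Dict[str, float]:
    """Scale raw importances so that the largest one equals 100%."""
    top = max(importance.values())
    return {name: round(100 * value / top, 2) for name, value in importance.items()}


# Hypothetical raw importances for three generic predictors (not study data).
raw_importance = {"x1": 0.5, "x2": 0.25, "x3": 0.125}
scaled_importance = normalized_importance(raw_importance)
print(scaled_importance)
```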

Moreover, in the ANN analysis of Model A, narrative design (ND) exhibited a normalized importance of 64.12%, substantially higher than that of recreation experience (RE) at 28.56%. This ranking contrasts with the results of the Structural Equation Model (SEM), which showed a stronger linear path effect of recreation experience on perceived benefits. The ANN model thus points to the dominant psychological role of narrative content in shaping user perceptions.

This finding highlights that narrative design on AIGC platforms functions as an information delivery medium and a critical mechanism for value construction and emotional engagement. In comparison, while recreational experiences may elicit immediate enjoyment and motivate short-term participation, their influence within the deeper psychological chain of “meaning-making – emotional resonance – value evaluation” is relatively limited. Narrative elements, on the other hand, more effectively trigger advanced cognitive processing and emotional involvement, significantly enhancing users’ overall perception of benefits.

From a psychological mechanism perspective, the strengths of narrative design can be understood through three key pathways: (1) Enhancing immersion and cultural identification through role-based engagement and cultural contextualization; (2) Reducing information complexity via causal structures that improve cognitive manageability; (3) Evoking nostalgia, reverence, and a sense of cultural continuity through contextualized representations of heritage.

These mechanisms are jointly activated and weighted within the ANN’s feature interaction modeling process, amplifying their importance in predicting user behavior.

Therefore, this study argues that although recreation experience appears more prominent in traditional linear models due to its significant path coefficients, narrative design plays a more central role in psychological guidance and perception shaping within the context of cultural value transmission and deep user experience construction. The ANN findings complement the explanatory gaps of SEM and empirically support the user psychology hypothesis that “narrative outweighs entertainment,” offering strong theoretical and practical insights.

Results

Comparative analysis of model explanatory power and predictive performance

This study employs an integrated SEM–ANN modeling approach, combining Structural Equation Modeling (SEM) with Artificial Neural Networks (ANN) to systematically investigate the psychological mechanisms that influence users’ offline engagement intentions on AIGC-enabled digital cultural heritage platforms. The analytical results reveal that although the two models exhibit high consistency in overall path structure, they differ significantly in terms of variable importance ranking, effect identification, and explanatory capacity.

From the perspective of path significance, the SEM model is effective in identifying causal relationships between variables and assessing their statistical significance. This makes it particularly suitable for hypothesis testing and direct path analysis. However, SEM relies fundamentally on linear assumptions and is therefore limited in its ability to capture non-linear interactions and complex weight structures, which may result in underestimation or misrepresentation of certain psychological effects.

In contrast, the ANN model demonstrates superior capabilities in learning non-linear patterns and recognizing variable interactions. Without requiring a predefined path structure, ANN employs a data-driven approach to uncover multi-level relationships among latent constructs. In the sensitivity analysis of this study, narrative design (ND) had a normalized importance of 64.12% in predicting perceived benefits (PB), substantially higher than that of recreation experience (RE) at 28.56%. This suggests that narrative content with contextual guidance and cultural depth may hold greater psychological dominance in users’ actual perceptions, a factor that traditional linear models may underestimate.

Additionally, creative design (CD) emerged as the most influential predictor of perceived benefits in both modeling approaches, reinforcing its pivotal role in attracting visual attention and stimulating cultural interest. This alignment across models further validates the critical function of CD in driving user engagement.

Effects of NVM variables on users’ offline experience intentions on AIGC-driven digital cultural heritage platforms

Based on the dual SEM–ANN modeling approach, this study systematically examined the influence mechanisms of perceived benefits and perceived risks on users’ intentions to engage in offline cultural experiences. The results indicate that perceived benefits serve as the core driver of behavioral intention. Among the benefit-related dimensions, creative design (CD), creative content (CC), and narrative design (ND) all have a significant positive effect on users’ willingness to participate. Further analysis, in conjunction with the classifications in Table 1 on “AIGC Formats” and “AIGC Generation Mechanisms,” reveals that these three content types differ notably in both media form and generation approach, and their influence pathways exhibit distinct patterns accordingly.

Specifically, creative design (CD) is primarily presented in the form of image-based content and is generated through fully automated AI processes. It delivers high-fidelity mural reconstructions, decorative motifs, and digital cultural scenes, thereby enhancing users’ visual stimulation and esthetic appreciation, significantly contributing to the sensory-driven dimension of perceived benefits.

Creative content (CC) typically involves image-plus-text combinations, produced through human–AI collaborative generation. This format generates personalized interpretive texts and illustrative visuals, forming a dual-channel information structure that strengthens users’ comprehension of cultural context and cognitive value.

Narrative design (ND) is predominantly embodied in the form of interactive storytelling. Through co-generated narrative videos, character-based tasks, and choice-based dialogs, it constructs dynamic plotlines that increase users’ role immersion and emotional resonance. This process deeply activates cultural identity and behavioral intention through contextual participation.

These findings highlight how differentiated AIGC formats and generation mechanisms shape user perceptions along distinct psychological dimensions, underscoring the need for content-type-sensitive strategies in the design of digital heritage platforms.

From the perspective of content generation mechanisms, the three benefit-related variables correspond to two dominant pathways: fully automated AI generation and human–AI collaborative generation, reflecting differentiated levels of perceived value. Visual image-based content is primarily algorithm-driven, emphasizing real-time presentation and standardized user experience; in contrast, textual and narrative content relies on human–AI collaboration, placing greater emphasis on contextual construction and personalized engagement.

Both the structural model and ANN analysis reveal that interactive narrative content, with its higher interactivity and immersion, makes a stronger contribution to perceived benefits than more superficial mechanisms such as recreation experience (RE). This finding highlights the differential psychological effects of various AIGC forms and engagement types. It also supports the conclusions of Cai and Huang53,59, who emphasized the dual driving forces of visual impact and cultural narrative, and aligns with the work of Jung et al.90, which demonstrated that immersive mechanisms significantly enhance users’ behavioral migration tendencies.

Therefore, a triadic mechanism—anchored in AIGC type, mediated by generation mechanism, and reflected in psychological response—emerges as a key pathway through which AIGC-enabled cultural platforms effectively promote users’ willingness to participate in offline cultural experiences.

Moreover, perceived risk exerts a significant inhibitory effect on users’ willingness to participate in offline cultural experiences, with a particularly strong influence observed in the dimensions of privacy concerns (PI) and ethical concerns (EQ).

The privacy concerns (PI) variable primarily relates to the platform’s collection and use of user behavior data—such as browsing history, clicks, and role selections—during interactions. Although such data are processed under the premise of improving content recommendations and personalizing experiences, they effectively constitute an “algorithmic privacy black box” in which users have limited visibility into how their data are generated, interpreted, or applied. This reflects the algorithm-assisted nature of AIGC engagement, amplifying user anxiety over data transparency and control.

Ethical concerns (EQ) center on the cultural bias and unequal representation embedded in AIGC and recommended content. These concerns often manifest as user skepticism regarding the authenticity of platform-generated texts, particularly interpretive narratives, such as those provided by tour guides. Perceptions of content distortion increase users’ ethical concerns and, consequently, their perceived risk. These findings support Dourish and Anderson’s63 assertions regarding AI-related privacy anxieties and align with the conclusions of Beerends et al.66, who highlighted the cognitive transformation process by which platform bias triggers users’ risk perceptions.

Although the negative psychological responses (NE) variable accounted for a relatively smaller path coefficient in the overall model, its impact tends to be amplified among psychologically sensitive user groups. This dimension reflects users’ negative emotional reactions when interacting with AIGC, particularly in cases where they experience a lack of cultural affinity or emotional resonance. Specific issues include emotionally flat or impersonal narrative scripts, generic system feedback, and repetitive interaction responses (e.g., uniform dialog prompts, lack of personalized feedback in chat windows). These standardized content delivery mechanisms reduce users’ sense of immersion and cultural connection, ultimately acting as a significant psychological barrier to deeper engagement69.

The results further reveal that online experience with AIGC-enabled cultural heritage platforms (AE) plays a significant mediating role between perceived benefits and offline cultural participation intention. By leveraging human–AI collaborative generation to produce personalized interpretive and interactive content, the platform enhances contextual relevance and semantic alignment. Through virtual exploration, users gradually develop a deeper cultural understanding and emotional resonance, ultimately leading to a transformation in their intention to engage.

This mechanism supports theoretical perspectives on how immersive systems influence users’ cognition and behavioral transition pathways91,92, underscoring the AIGC platform’s role in shaping digital cultural experiences. Rather than merely serving as interfaces for delivering cultural information or technological content, such platforms act as critical psychological bridges between users’ cognitive processing and behavioral decision-making.

Despite the diversity of interactive content offered by AIGC platforms, much of it is still generated based on predefined rules or semantic models, often lacking emotional warmth and a humanistic tone. Even when human–AI collaborative mechanisms are introduced, they are typically limited to logical feedback, falling short of providing deep emotional guidance. On the “Cloud Tour of Dunhuang” platform, for instance, users frequently encounter templated or semantically hollow responses during role-based tasks or cultural Q&A interactions, prompting a sense of “mechanical generation” that leads to emotional detachment and psychological fatigue.

This issue is particularly salient among users with high emotional sensitivity or strong cultural identity needs. It can result in perceived cultural distance and desensitized expression, ultimately undermining trust and diminishing willingness to participate. These findings support Pavlidis et al.71, who argue that technological distrust suppresses behavioral pathways, highlighting the importance of emotional coherence and cultural resonance in AIGC-driven cultural communication. Addressing these dimensions is critical for enhancing users’ psychological receptiveness and motivating behavioral transformation.

While the findings generally align with the theoretical expectations of the Net Valence Model (NVM), the strength of influence across different paths varies notably. Creative design and narrative design exert a more substantial effect on perceived benefits than recreation experience. This indicates that users are more responsive to content with strong visual impact or rich cultural storytelling, rather than content that offers only brief sensory entertainment. This preference is reflected in users’ heightened sensitivity to visual and narrative formats, as well as their inclination toward human–AI collaborative outputs that offer greater contextual depth and expressive richness, as opposed to formulaic or templated content generated solely by automated AI processes.

In summary, this study emphasizes the value of AIGC technology in the digital dissemination of cultural heritage and highlights the need to strike a balance between technological innovation and cultural authenticity and preservation. Achieving such a balance is essential to fulfilling the dual objectives of technological empowerment and sustainable cultural transmission.

Perceptual trade-offs, pathway responses, and behavioral drivers: a threefold empirical validation based on net value

To further highlight the theoretical value of the Net Valence Model (NVM) in explaining user behavioral mechanisms, this section introduces the variable Net Value based on empirical data. It examines its role in the trade-off between perceived benefits and perceived risks. The Net Value is calculated as follows (Eq. 5):

$$\text{Net Value}_{\mathrm{group}} = PB_{\mathrm{total}} - PR_{\mathrm{total}} \tag{5}$$

In this formulation, \(PB_{\mathrm{total}}\) represents the overall average score of the user group on the perceived benefits dimensions (namely, creative design, creative content, and narrative design), while \(PR_{\mathrm{total}}\) represents the overall average score on the perceived risk dimensions, including privacy concerns, ethical concerns, and negative psychological responses. Given the difference in the number of items across the benefit and risk dimensions, this study calculates Net Value as the difference between the average scores of the two dimensions. This approach ensures that the disparity in item counts does not distort the composite score, thereby maintaining equal weighting across psychological dimensions and enhancing the reliability and interpretability of the analysis.

The Net Value index effectively reflects users’ overall evaluative stance toward the platform’s content. A Net Value > 0 indicates that users tend to accept the cultural content provided by the AIGC platform, whereas a Net Value < 0 suggests a general tendency to avoid or reject it. The corresponding average scores for each dimension are presented in Table 9.
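The averaging logic behind Eq. 5 can be sketched as follows. This is a minimal illustration that assumes five-point Likert responses; the item counts and scores are hypothetical and are not taken from the study's dataset. It shows why differencing means, rather than sums, keeps unequal item counts from distorting the index.

```python
from statistics import mean


def net_value(benefit_scores, risk_scores):
    """Eq. 5: mean perceived-benefit score minus mean perceived-risk score."""
    return mean(benefit_scores) - mean(risk_scores)


def stance(value):
    """Interpret the sign of Net Value as described in the text."""
    if value > 0:
        return "tends to accept"
    if value < 0:
        return "tends to avoid or reject"
    # The text does not define the zero case; "neutral" is an assumption.
    return "neutral"


# Hypothetical item-level group means: six benefit items, four risk items.
benefit_items = [4, 3, 4, 3, 4, 3]  # mean 3.5
risk_items = [3, 3, 4, 2]           # mean 3.0

nv = net_value(benefit_items, risk_items)
sum_gap = sum(benefit_items) - sum(risk_items)

print(f"Net Value (difference of means): {nv:+.2f} -> {stance(nv)}")
print(f"Difference of raw sums: {sum_gap:+d} (inflated by the extra benefit items)")
```

In this toy case the difference of means is modest, whereas the difference of raw sums is much larger simply because the benefit scale has more items.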

Table 9 Average perception analysis across dimensions among user groups

First, from the perspective of users’ subjective perceptions, the overall average score for perceived benefits (PB) was 3.25, slightly higher than that of perceived risks (PR) at 3.12, resulting in a Net Value (PB−PR) of +0.13. Among the benefit dimensions, creative design (CD) (3.46) and recreation experience (RE) (3.30) received relatively high scores, indicating that users generally express strong recognition and satisfaction with the platform’s performance in presenting cultural content and providing an interactive experience.

At the same time, within the risk dimensions, ethical concerns (EQ) (3.14) and negative psychological responses (NE) (3.16) also received non-negligible scores. These results suggest that users still maintain a certain level of psychological defensiveness regarding potential issues such as privacy breaches, cultural distortion, and emotional detachment associated with AIGC-driven platforms.

The findings indicate that users tend to evaluate AIGC platforms with a slight bias toward perceived benefits over perceived risks—a subjective pattern that closely aligns with the structural path analysis results (see Table 6). Specifically, perceived benefits (PB) exert a significant positive influence on AIGC platform experience (AE) (β = 0.327, p < 0.001), as well as on offline cultural engagement intention (OE) (β = 0.210, p < 0.001). These results suggest that cultural value, creative design, and narrative content enhance users’ immersive experiences and effectively stimulate their motivation for behavioral transformation, thereby confirming the cultural guidance potential of AIGC platforms within the “virtual–physical integration” pathway.

In contrast, while perceived risks (PR) exhibit a significant negative effect on AIGC platform experience (AE) (β = −0.109, p < 0.001), their influence on offline cultural engagement intention (OE) is only marginally significant (β = −0.053, p = 0.048). This suggests that although users are aware of potential platform-related risks, such concerns do not dominate or strongly inhibit their behavioral intentions. Combined with the PR dimension scores in Table 9, which fall within the moderately high range (approximately 3.1–3.2), it can be inferred that users maintain a cautious yet non-dominant stance toward perceived risks. Ultimately, their behavioral judgments appear to be more strongly driven by positive value assessments.

This trend is further confirmed by the ANN sensitivity analysis (see Table 8). In Model D, the normalized importance of perceived benefits (PB) reached 92.76%, substantially higher than that of perceived risks (PR) at 45.92%. This suggests that even within a non-linear framework, perceived benefits remain the primary driver of user behavioral intention, while the influence of perceived risks is comparatively secondary.

Taken together, across user self-reports, structural path modeling, and neural network weight rankings, the findings consistently demonstrate a clear pattern in the variable structure. Perceived risk primarily triggers users’ cognitive vigilance, whereas perceived benefit exerts a stronger emotional appeal and behavioral impact. Although users remain sensitive to risks related to privacy, ethics, and emotional fatigue, their ultimate behavioral choices are shaped by the perceived value of the platform’s positive content.
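The three strands of evidence can be lined up with simple arithmetic on the values reported above. The sketch below only restates those reported figures; the ratios are rough descriptive comparisons introduced here for illustration, not statistical tests performed in the study.

```python
# Values as reported in the text (Table 9, Table 6, and Table 8, Model D).
pb_mean, pr_mean = 3.25, 3.12
beta_pb_oe, beta_pr_oe = 0.210, -0.053
importance_pb, importance_pr = 92.76, 45.92

net_value = round(pb_mean - pr_mean, 2)
path_ratio = round(abs(beta_pb_oe) / abs(beta_pr_oe), 2)
importance_ratio = round(importance_pb / importance_pr, 2)

print(f"Net Value (PB - PR): {net_value:+.2f}")
print(f"|beta| ratio, PB->OE vs PR->OE: {path_ratio}")
print(f"ANN importance ratio, PB vs PR: {importance_ratio}")

# Check whether all three indicators favor benefits over risks.
benefits_dominate = net_value > 0 and path_ratio > 1 and importance_ratio > 1
print("Benefits dominate on all three indicators:", benefits_dominate)
```

The magnitudes differ across the three indicators, but the direction is the same in each, which is the sense in which the evidence converges.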

These results validate the theoretical expectation within the Net Valence Model (NVM)—that users weigh both positive and negative factors, ultimately favoring benefits33—and offer practical implications for optimizing AIGC-based cultural platforms. Prioritizing content strategies that emphasize creativity, cultural resonance, and personalized narrative engagement may significantly enhance users’ behavioral conversion and deepen cultural participation.

Qualitative insights as a supplement and validation of the SEM–ANN Model

To further validate the structural soundness of the SEM–ANN model and gain deeper insight into users’ behavioral drivers and psychological mechanisms, this study conducted semi-structured in-depth interviews with 12 users (see Table 10 for the interview guide). The interviews focused on four key dimensions: content perception, risk evaluation, emotional feedback, and behavioral transformation. Each session lasted approximately 25 to 30 minutes. With participants’ consent, the interviews were recorded and transcribed, resulting in approximately 28,000 words of textual data.

Table 10 Interview outline and corresponding model variables

The interview data were analyzed using an inductive thematic approach. Two researchers independently conducted open coding using NVivo 12.0, followed by semantic clustering and collaborative discussion to identify 14 core themes. By the tenth interview, thematic saturation had been largely achieved, and no new concepts emerged in the remaining interviews through the twelfth, meeting the standard for theoretical saturation93. These themes validate the variable pathways identified in the structural model and uncover important psychological mechanisms and external influencing factors that were not captured by the quantitative framework (see Table 11 for details).

Table 11 Cross-validation summary of SEM–ANN results and qualitative interview findings

First, along the perceived benefits pathway, users consistently reported positive experiences such as “visual appeal” and “narrative resonance”. As one participant noted, “What moved me most were the storytelling elements—it did not feel like I was just looking at images, but rather listening to someone narrate history.” This type of feedback supports the positive influence of creative design (CD) and narrative design (ND) on perceived benefits (PB).

Further analysis revealed that users expressed emotional responses to narrative content more frequently than static evaluations of visual elements, underscoring the dominant role of narrative design in eliciting cultural resonance and enhancing immersion. This finding aligns with the ANN model’s emphasis on the ND → PB pathway and highlights narrative design as a critical psychological foundation for constructing cultural identity.

Second, within the perceived risk dimension, users primarily expressed concerns related to content authenticity and data privacy. As one participant remarked, “Can AI make things up too much?”, while another noted, “I do not feel comfortable with facial scanning.” These responses affirm the positive path relationships from privacy concerns (PI) and ethical concerns (EQ) to perceived risk (PR).

At a deeper level, such perceptions of risk stem from a general mistrust of technology and a psychological resistance to distorted cultural representation and the manipulation of authenticity by algorithms. The concept of “cultural authenticity sensitivity” embedded within the EQ dimension reflects this cognitive mechanism, highlighting users’ heightened awareness of—and resistance to—perceived ethical violations in AI-generated cultural content.

Within the behavioral transformation pathway, interviews revealed an emotional migration process from online experience to offline participation. Many users noted that the platform’s cultural value and emotional resonance inspired a desire for real-world exploration. As one participant shared: “After a few experiences, I started to feel it was worth seeing in person.” Such feedback supports the path mechanisms whereby perceived benefits (PB) and AIGC platform experience (AE) positively influence offline engagement intention (OE). It further validates the mediating role of the AE → OE pathway.

At the same time, some users expressed skepticism or disengagement due to perceived information opacity or overly commercialized messaging, exemplified by remarks like: “You do not even know if this info is real,” and “They say it is a cultural experience, but it feels more like a marketing pitch.” These responses indicate that heightened risk perception erodes online trust and suppresses offline interaction intention, forming a negative psychological pathway from PR to OE that aligns with the structural model’s PR → OE relationship.

Moreover, the qualitative findings revealed important external constraints that were not captured in the existing model, particularly the issue of behavioral accessibility. Some users expressed clear recognition of the platform’s cultural value but reported low willingness to participate due to real-world barriers such as “long distance” or “inconvenient transportation.” This suggests that cognitive perceptions, motivational intentions, and resource-based limitations jointly influence user behavior.

This insight provides a foundation for incorporating additional variables such as “participation accessibility” and “resource-related drivers” into future research. Such extensions would help evolve the current cognition–emotion framework into a cognition–resource dual-pathway structure, thereby enhancing the model’s external validity and predictive power.

In summary, the qualitative analysis validates the primary pathways in the structural model and identifies a range of unmodeled psychological and contextual factors. These findings expand the conceptual scope and structural boundaries of key theoretical variables, offering deeper situational insight and explanatory power for understanding user behavior on AIGC-enabled cultural heritage platforms.

Discussion

This study adopts an integrated perspective of “perceived benefits–perceived risks–behavioral intention” to systematically examine the formation mechanism of users’ offline cultural engagement intentions within AIGC-driven digital cultural heritage platforms. In doing so, it extends theoretical inquiry at the intersection of user adoption behavior, technology acceptance, and cultural experience.

At the theoretical level, this study is the first to incorporate the Net Valence Model (NVM) into the research on digital cultural heritage dissemination. It addresses the theoretical limitations of traditional rational behavior models—such as the Theory of Reasoned Action (TRA) and the Theory of Planned Behavior (TPB)—in capturing nonlinear cognitive processes and emotion–value conflict mechanisms. Unlike conventional approaches that emphasize rational judgment and functional evaluation, NVM focuses on the mechanisms of psychological conflict and value trade-off, enabling a more nuanced understanding of users’ cognitive struggles, trust fluctuations, and emotional tensions when engaging with AIGC. This makes NVM particularly well-suited to the current landscape in which cultural expression is increasingly shaped by algorithmic generation and digital mediation.

This study integrates Structural Equation Modeling (SEM) with Artificial Neural Network (ANN) to fully leverage the complementary strengths of the Net Valence Model (NVM) in both explanatory and predictive dimensions. On the one hand, SEM provides robust statistical support for testing hypothesized causal paths; on the other hand, ANN effectively uncovers asymmetric effects and complex interactions among variables, thereby overcoming the limitations of traditional linear models. This hybrid modeling strategy offers a more flexible and precise analytical framework for advancing research in cultural technology acceptance.

More importantly, NVM’s theoretical adaptability and scalability endow it with strong cross-context transfer potential. As cultural technology applications rapidly expand—ranging from AI-guided tours and virtual exhibitions to cultural recommendation algorithms and digital art creation platforms—users are increasingly confronted with complex psychological trade-offs. These include authenticity anxiety, algorithmic skepticism, emotional entanglement, and ethical resistance. In such contexts, NVM’s dual-structure of perceived benefits versus perceived risks offers a more realistic representation of the dynamic psychological balancing processes users undergo when engaging with cultural-technology products.

Compared to traditional technology adoption theories that emphasize rational decision-making, NVM is better equipped to explain emotionally driven behaviors arising from users’ concerns over authenticity, ethics, and algorithmic control. As such, it exhibits strong contextual adaptability and cross-disciplinary applicability. These properties make it a powerful tool for future research in digitally mediated cultural experiences.

At the practical level, this study proposes a set of culturally sensitive and operational recommendations for the design, operation, and governance of AIGC-powered digital cultural heritage platforms. It aims to promote the effective integration and sustainable dissemination of AIGC technologies with local cultural heritage.

First, establish a content planning mechanism centered on local cultural contexts and the characteristics of cultural heritage, in order to enhance the cultural expressiveness of AIGC-generated content. Currently, many AIGC cultural heritage platforms suffer from issues such as fragmented content and superficial expression. The content generation mechanisms often detach from specific cultural contexts, leading to the simplification of traditional culture into superficial symbols that lack the necessary historical depth and value-bearing capacity. Therefore, it is recommended to establish “local cultural background + heritage expression characteristics” as the fundamental logic of content design, and to construct a three-stage narrative framework of “historical origins—core values—contemporary expression,” guiding AIGC content to reflect the temporal layers and spiritual lineage of cultural development.

Platforms can enhance the semantic structure and cultural alignment of generated content by embedding semantic tagging systems, introducing expert collaboration mechanisms, and setting up contextual prompts. Especially for intangible heritage projects with distinct regional characteristics, such as Miao silver ornaments, Chaoshan wood carvings, and Jingdezhen ceramics, content generation strategies should be differentiated according to their cultural regionality and visual-semantic differences in order to avoid cultural misinterpretation and flattened expression caused by “template-based and homogenized reconstruction”.

Second, enhance cultural explanation mechanisms within human–AI interaction to deepen historical understanding and cultural resonance. In response to user feedback that they “can see but not fully understand,” it is recommended to embed multimodal interpretive layers in AIGC or AR heritage scenes. This includes features such as historical annotations, overlaid ancient maps, semantic tagging, reconstruction animations, and multilingual voiceovers. These components together form a “cultural comprehension layer”, a knowledge mediation structure that enables users to gradually develop a holistic understanding of heritage entities, historical trajectories, and cultural meanings through immersive spatial interaction and exploration. This, in turn, fosters cultural identification and emotional engagement.

Third, implement a risk-aware guidance mechanism for AI-generated content to reduce user resistance. To address concerns related to authenticity and algorithmic control, interventions should be considered in three key areas: (1) Transparency – clearly label all AI-generated content with its generation method and source materials (e.g., original texts or images) to enhance user awareness and trust. (2) Privacy Protection – adopt localized image processing, data minimization strategies, and user-consent protocols to mitigate risks of data misuse. (3) Expert Review – involve heritage scholars and domain experts in content review processes. Ensure that outputs related to historical information, architectural styles, or artifact backgrounds are culturally accurate and contextually appropriate, thereby preventing misrepresentation and overgeneralization.
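As one way to make the transparency recommendation concrete, the sketch below models a disclosure label that records the generation method, the source materials, and the expert-review status of a generated item. The class, field names, and sample values are hypothetical and introduced only for illustration; the two generation methods mirror the pathways distinguished in this study.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class GenerationMethod(Enum):
    """The two generation pathways distinguished in this study."""
    FULLY_AUTOMATED = "fully automated AI generation"
    HUMAN_AI_COLLABORATIVE = "human-AI collaborative generation"


@dataclass(frozen=True)
class ContentLabel:
    """Hypothetical disclosure record attached to one generated item."""
    content_id: str
    method: GenerationMethod
    source_materials: Tuple[str, ...]
    expert_reviewed: bool = False

    def disclosure(self) -> str:
        """Build the user-facing transparency notice."""
        sources = "; ".join(self.source_materials) or "not specified"
        review = "completed" if self.expert_reviewed else "pending"
        return (
            f"AI-generated content ({self.method.value}). "
            f"Source materials: {sources}. Expert review: {review}."
        )


# Illustrative record; the identifier and source descriptions are made up.
label = ContentLabel(
    content_id="example-item-001",
    method=GenerationMethod.HUMAN_AI_COLLABORATIVE,
    source_materials=("original mural photograph", "archival caption text"),
)
print(label.disclosure())
```

Keeping the label immutable (`frozen=True`) is a design choice in this sketch so that a disclosure cannot be silently altered after an item has been published.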

Fourth, promote the institutional development of a collaborative governance mechanism between local culture and platform technology. Local governments should be encouraged to support the construction of cultural heritage knowledge graphs and should integrate them with AIGC platforms to enhance semantic precision. Simultaneously, establish expert review and content audit mechanisms, and incorporate digital dissemination outcomes into evaluation systems for heritage transmission. This would foster coordinated governance among platforms, institutions, and public policies, laying the foundation for a sustainable digital cultural ecosystem.

Finally, while this study offers theoretical innovation and practical insight, it also presents several limitations:

First, there are limitations in the cultural sample. The analysis centers on platforms such as “Cloud Tour of Dunhuang”, which largely represent mainstream, nationally endorsed heritage initiatives. Future research could extend the model to grassroots or localized intangible cultural heritage (ICH) projects to test the framework’s adaptability across diverse cultural contexts.

Second, the variable design primarily emphasizes cognitive dimensions, lacking a systematic integration of more complex factors such as emotional motivations, social influence, and platform-specific mechanisms. Future studies may consider incorporating constructs such as social capital, emotional attachment, or community belonging to broaden the theoretical scope and enhance the model’s explanatory power.