Introduction

The rapid development of digital technology is profoundly changing traditional models of cultural heritage protection and inheritance. Generative artificial intelligence provides unprecedented technical means for the digital reconstruction and innovative inheritance of endangered cultural heritage1,2. As an important component of Chinese civilization, Jingchu culture carries rich historical memory and cultural meaning through its abundant folk patterns3. However, in the process of modernization these cultural symbols face a dual dilemma: disrupted inheritance and a lack of innovation. Traditional protection models typically rely on static collection and display, which struggle to stimulate cultural identity and participation among the public, especially the younger generation. Such a one-way protection model can no longer meet the requirements of cultural communication in the digital age4,5.

In recent years, deep generative models such as Stable Diffusion have shown remarkable potential in artistic creation6,7. These models can learn complex visual patterns and generate new images in specific styles, opening new paths for the digital protection and innovative development of cultural heritage. At the same time, the participatory culture and co-creation concepts of the Web 2.0 era are reshaping cultural production and consumption8. Users are no longer passive recipients but active creators and disseminators of culture, and this shift opens new possibilities for the living inheritance of cultural heritage9,10. However, how to combine advanced AI technology with community participation mechanisms to build a digital platform that both protects cultural authenticity and stimulates innovation remains a core, unresolved issue in cultural heritage digitalization.

To address these challenges, this study proposes three core research questions (RQs) that guide the investigation: RQ1 examines how to design a culturally-aware AI generation architecture that enhances the cultural authenticity of traditional pattern generation while maintaining competitive technical performance; RQ2 investigates what user participation mechanisms can effectively promote digital co-creation of cultural heritage and sustain long-term user engagement; RQ3 explores how community participatory platforms influence cultural diversity metrics and users’ cultural cognitive development over time. These research questions are interconnected, with RQ1 focusing on the technical foundation, RQ2 addressing the social mechanism design, and RQ3 evaluating the cultural impact of the integrated system.

This study explores a new model of cultural heritage protection by constructing a community participatory Jingchu folk pattern generation platform based on Stable Diffusion, integrating technological empowerment with cultural inheritance. The research makes three distinct contributions with different levels of empirical validation. First, at the technical level, we propose the Cultural-Aware Stable Diffusion (CA-SD) architecture, which introduces cultural-aware attention mechanisms and multi-scale feature fusion modules to improve the model’s ability to understand and generate cultural features. This contribution is empirically validated through comparative experiments against multiple baseline models and comprehensive ablation studies. Second, at the mechanism level, we construct a multi-level user participation framework based on CPDC (Cultural Participatory Design Cycle) theory, designing a creative evaluation system that integrates collective intelligence and expert review. This theoretical contribution is validated through a six-month longitudinal empirical study with 486 participants, using statistical analyses including structural equation modeling and regression analysis. Third, we propose a blockchain-based intellectual property protection framework that uses improved Shapley value algorithms for fair rights distribution. This component is a conceptual contribution with preliminary prototype validation (N = 86); comprehensive empirical validation in real commercial environments remains future work.
The study pursues these aims by systematically collecting and annotating Jingchu folk patterns to build a specialized training dataset, customizing the Stable Diffusion model to meet the specific needs of cultural heritage, designing multi-level user participation mechanisms to stimulate community creativity, and establishing an evaluation system that quantitatively analyzes the impact of the co-creation process on cultural diversity.

The relationship between research questions and contributions is illustrated in Fig. 1, which presents a conceptual framework mapping each RQ to its corresponding methodological approach, key contribution, and validation method. RQ1 is addressed through the CA-SD architecture development and validated via technical benchmarking and ablation studies. RQ2 is addressed through the CPDC framework construction and validated via longitudinal behavioral analysis and statistical modeling. RQ3 is addressed through cultural impact assessment and validated via diversity index tracking and pre-post cognitive testing.

Fig. 1

Research framework mapping research questions to contributions and validation methods.

The theoretical significance of this research lies in extending participatory design theory to the field of digital cultural heritage and providing a new analytical framework for understanding technology-mediated cultural co-creation. Its practical significance lies in offering replicable technical solutions and operational models for the digital protection of traditional culture, particularly for balancing tensions between cultural authenticity and innovation, professionalism and popularity, and globalization and locality. While the proposed CA-SD architecture and CPDC theoretical framework are methodologically transferable to other cultural contexts, the specific empirical findings reported in this study, including user behavior patterns, diversity evolution curves, and retention rates, are derived from the Jingchu cultural context and should not be generalized to other regional cultures without further validation. Beyond direct application to the protection and inheritance of Jingchu culture, the results can serve as a reference for the digital protection of other regional cultures and intangible cultural heritage.

Literature review and theoretical framework

Cultural heritage protection and inheritance in the digital age

The digital protection of cultural heritage has evolved from simple digital archiving to intelligent reconstruction, reflecting the bidirectional interaction between technological development and cultural needs. Early digitalization efforts mainly focused on converting material cultural heritage into digital formats for storage and display. With the maturity of technologies such as 3D scanning and virtual reality, digital protection has gradually expanded from two-dimensional planes to three-dimensional space and from static display to dynamic interaction. Chunlan et al.’s11 decade-long review of digital transformation in Chinese museums shows that digital technology has not only changed how cultural heritage is preserved but, more importantly, reshaped how the public interacts with it, transforming cultural experience from one-way reception into two-way dialogue.

The emergence of metaverse technology has brought major changes to cultural heritage protection. By constructing immersive virtual environments, it allows audiences to experience historical scenes and cultural activities beyond the limits of time and space. Buragohain et al.12 pointed out that metaverse applications in the cultural heritage field face multiple challenges, such as technical standardization, content authenticity, and user experience design, but also provide unprecedented opportunities for the living inheritance of cultural heritage. Virtual reconstruction technology enables lost or damaged cultural heritage to be restored digitally. Emerging techniques such as Neural Radiance Fields can reconstruct high-quality three-dimensional models from limited image data, providing more efficient and accurate means for the digital protection of cultural heritage.

China has formed a unique policy system and practice model in cultural heritage digital protection. Vărzaru et al.’s13 policy analysis research reveals the institutional guarantee mechanism for the digital transformation of Chinese museums, emphasizing a collaborative development model of government leadership, social participation, and technological innovation. Zhou et al.’s14 survey of China’s intangible cultural heritage digital protection shows that although significant progress has been made, many challenges remain in standardization construction, resource integration, and sustainable development. These studies provide important references for understanding the Chinese path of cultural heritage digital protection and also lay the foundation for the policy recommendations section of this research.

Application of AI generation technology in the cultural and creative field

Generative AI is driving a profound transformation in the cultural and creative field; image generation based on diffusion models in particular has shown remarkable creative potential. As an open-source text-to-image model, Stable Diffusion has been widely applied in the cultural heritage field because of its strong generation capabilities and flexible customizability. The automatic generation framework for architectural heritage facades developed by Kuang et al.15 demonstrates the value of Stable Diffusion for preserving historical architectural features. Through fine-tuning, the model learns to generate facade designs that conform to the architectural styles of specific historical periods, providing technical support for cultural protection in urban renewal.

AI generation of cultural heritage is not only a technical issue; it also involves complex questions of cultural authenticity, artistic value, and ethics. Zhou et al.16 proposed the concept of “innovation in inheritance” in their research on kite design innovation. Using fine-tuning and LoRA (Low-Rank Adaptation), they enabled AI models to generate innovative designs while maintaining traditional kite cultural characteristics, offering new ideas for the revitalization of intangible cultural heritage. Yang et al.’s17 research on the digital restoration of Yangshao painted pottery further demonstrates the potential of Stable Diffusion in cultural relic restoration: a specially trained model can complete fragmentary painted pottery patterns, providing a powerful tool for archaeological research and cultural relic protection.

The emergence of Neural Radiance Fields (NeRF) has opened new possibilities for three-dimensional reconstruction of cultural heritage. Croce et al.’s18 comparative study shows that NeRF has clear advantages over traditional photogrammetry in handling complex lighting conditions and incomplete datasets. This technology can reconstruct high-quality three-dimensional models from sparse image inputs, making it particularly suitable for the digital protection of cultural relics and historical buildings. Jaramillo and Sipiran19 applied diffusion networks to three-dimensional reconstruction of cultural heritage, demonstrating the capabilities of AI technology in handling complex geometric structures and texture details. These studies provide the technical foundation for building more intelligent and efficient cultural heritage digitalization systems.

Participatory design theory and digital co-creation models

The development of participatory design theory has undergone a paradigm shift from industrial design to digital design. The co-creation theoretical framework proposed by Sanders and Stappers20 has become foundational work in this field. They define co-creation as “any form of collective creative behavior,” emphasizing the role transformation of users from passive recipients to active creators. This theoretical framework has been further developed and refined in the digital age. Sanders and Stappers21 systematically elaborated on generative research methods in their book “Convivial Toolbox,” providing specific tools and techniques for user participation in the design front-end.

The rise of digital platforms has provided new implementation scenarios and challenges for participatory design. Carroll and Beck’s22 co-design research on community water quality data platforms demonstrates how to achieve effective collaboration among multiple stakeholders in digital environments. The research found that digital platforms not only lower participation barriers but also enhance participants’ sense of engagement and achievement through data visualization and real-time feedback mechanisms. Botero et al.’s23 research further explores the translation issues of participatory design in different fields, emphasizing that method selection and implementation strategies need to be adjusted according to specific contexts, which provides important insights for the platform design of this research.

The COVID-19 pandemic accelerated the migration of participatory design to online platforms, with distributed collaboration becoming the new normal. Dahl et al.’s24 research published at the CHI conference focuses on facilitation skills in digital environments, pointing out that online participatory design requires new tool support and interaction strategies. The generative co-design framework proposed by Bird et al.25 provides a systematic method for end-to-end user participation in the medical innovation field. This framework emphasizes the importance of iterative design, continuous feedback, and value co-creation, and these principles are equally applicable to the field of cultural heritage digital protection.

Quantitative assessment framework for cultural IP diversity

The measurement and evaluation of cultural diversity has always been a core issue in cultural research. With the advent of the digital age, traditional qualitative analysis methods can no longer meet the needs of large-scale cultural data analysis. Gök et al.26 proposed a comprehensive measurement framework when studying cultural and ethnic diversity in the innovation field. This framework combines ecological diversity indices and sociological cultural dimension theories, providing scientific methods for quantitatively assessing the diversity of cultural content. This interdisciplinary research method provides a theoretical basis for the assessment of cultural IP diversity in this research.

The rise of digital humanities provides new methodological tools for cultural diversity research. Mahony27 explores the importance of cultural diversity in digital humanities research, emphasizing the unique value of computational methods in revealing cultural patterns and trends. Through techniques such as text mining, image analysis, and network analysis, researchers can extract meaningful patterns and insights from massive cultural data. The cultural sustainability assessment framework developed by Zhao et al.28 incorporates the cultural impact of environmental facilities, providing a multidimensional evaluation system. This integrative thinking is of great significance for understanding the impact of digital platforms on cultural ecology.

The value assessment of cultural IP needs to consider multiple dimensions such as its uniqueness, dissemination power, and innovation potential. Su et al.29, starting from the perspective of intangible cultural heritage inheritors, developed a specialized scale to measure the perceived value of cultural heritage. The research found that inheritors’ understanding of cultural value includes not only historical and artistic dimensions but also modern factors such as social identity, economic potential, and innovation space. This diversified concept of value provides important reference for the evaluation of user-generated content in this research, reminding us not to neglect the intrinsic value and social significance of culture while pursuing technological innovation.

Based on the comprehensive review of existing literature, we identify several research gaps that motivate the current study. First, while AI generation technologies have been applied to cultural heritage, existing approaches lack domain-specific architectural adaptations that explicitly encode cultural knowledge and constraints. Second, participatory design frameworks have not been systematically integrated with AI generation systems in the cultural heritage context, leaving the mechanisms of technology-mediated cultural co-creation underexplored. Third, quantitative assessment methods for cultural diversity in AI-generated content remain underdeveloped, particularly regarding the dynamic evolution of diversity over extended periods of user engagement. The present study addresses these gaps by proposing an integrated framework that combines culturally-aware AI architecture with participatory design mechanisms, validated through longitudinal empirical investigation. Table 1 summarizes the positioning of this research relative to existing studies across key dimensions.

Table 1 Research positioning relative to existing literature.

Design of Jingchu folk pattern generation model

Dataset construction and cultural feature extraction

The Jingchu folk pattern dataset comprises 8,500 high-resolution images from Hubei Provincial Museum and Jingzhou Museum collections, supplemented by 1,200 contemporary interpretations from folk artists. This diverse collection encompasses Chu lacquerware patterns, bronze decorative motifs, and traditional embroidery designs, ensuring comprehensive representation of regional cultural heritage. The original images were captured at 2048 × 2048 pixel resolution using professional museum digitization equipment, then preprocessed and downsampled to 512 × 512 pixels for model training. The dataset was partitioned into training set (80%, 7,760 images), validation set (10%, 970 images), and test set (10%, 970 images) using stratified sampling to ensure balanced representation of all pattern categories across splits.
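The 80/10/10 stratified partition can be illustrated with a short, dependency-free sketch. It splits each pattern category separately so every split keeps roughly the same category proportions. The category names and per-category counts in the usage line are hypothetical (the text does not report them); they are chosen only so that the totals match the reported 9,700 images and 7,760/970/970 split.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/val/test, category by category,
    so that each category keeps roughly the same share in every split."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)                       # random order within a category
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]          # remainder goes to the test split
    return train, val, test


# Hypothetical category sizes summing to 9,700 images.
labels = ["lacquerware"] * 4850 + ["bronze"] * 2910 + ["embroidery"] * 1940
train_idx, val_idx, test_idx = stratified_split(labels)
```

Because rounding happens per category, totals can differ from an exact 80/10/10 by a few images when category sizes are not multiples of ten.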

Data preprocessing procedures included several standardization steps to ensure training quality and consistency. Color normalization was applied using histogram equalization to account for variations in lighting conditions during museum photography. Contrast enhancement using adaptive histogram equalization (CLAHE) improved the visibility of fine pattern details. Data augmentation techniques included random horizontal flipping (probability 0.5), random rotation (± 15 degrees), random cropping (scale 0.8-1.0), and color jittering (brightness ± 0.1, contrast ± 0.1, saturation ± 0.1). These augmentation strategies were designed to preserve cultural authenticity while increasing training sample diversity.
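As a minimal sketch of how the stated augmentation ranges could be sampled per training image, the code below draws one parameter set at a time. Two conventions are assumptions, since the text does not define them: the crop "scale" is read as a fraction of image area, and color jitter is read as a multiplicative factor in [0.9, 1.1]. The class and function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class AugParams:
    hflip: bool
    rotation_deg: float
    crop_scale: float
    brightness: float
    contrast: float
    saturation: float


def sample_aug_params(rng: random.Random) -> AugParams:
    """Draw one set of augmentation parameters using the ranges in the text."""
    return AugParams(
        hflip=rng.random() < 0.5,                 # horizontal flip with p = 0.5
        rotation_deg=rng.uniform(-15.0, 15.0),    # rotation in [-15, 15] degrees
        crop_scale=rng.uniform(0.8, 1.0),         # assumed: crop area fraction
        brightness=1.0 + rng.uniform(-0.1, 0.1),  # assumed: multiplicative factors
        contrast=1.0 + rng.uniform(-0.1, 0.1),
        saturation=1.0 + rng.uniform(-0.1, 0.1),
    )


rng = random.Random(42)
samples = [sample_aug_params(rng) for _ in range(2000)]
```

In a PyTorch pipeline, these ranges correspond closely to torchvision's RandomHorizontalFlip(p=0.5), RandomRotation(15), RandomResizedCrop(512, scale=(0.8, 1.0)), and ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1). The text does not state whether those exact classes were used.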

The multi-level semantic annotation framework (Fig. 2) captures both structural and cultural dimensions. Spatial combination patterns (Fig. 2a) identify symmetry axes, repetitive boundaries, and hierarchical nesting, the fundamental rules governing Jingchu composition. Cultural semantic density mapping (Fig. 2b) visualizes information concentration through heat maps, highlighting regions of cultural significance. Style feature flow (Fig. 2c) reveals dynamic aesthetic qualities via vector fields, capturing the sense of movement characteristic of Jingchu designs. Annotation followed a multi-annotator protocol to ensure reliability. Three expert annotators from folklore studies, art history, and cultural heritage conservation independently labeled each image; when their annotations conflicted, a fourth senior expert resolved the disagreement through majority voting. Inter-annotator agreement, assessed with Cohen’s Kappa, reached κ = 0.82 (95% CI: 0.79–0.85), indicating strong agreement. For complex semantic categories, Fleiss’ Kappa reached 0.78, showing consistent labeling across annotators. Through consistency testing, expert annotators and crowdsourced contributors jointly generated 237 cultural tags and 1,856 visual descriptors.
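To make the reported agreement statistic concrete, the sketch below computes Fleiss' Kappa, which generalizes chance-corrected agreement to any fixed number of raters per item, such as the three annotators here. Cohen's Kappa is defined for two raters, so with three annotators it is usually averaged over rater pairs; the text does not say how that figure was aggregated. The input layout (one list of labels per image) is an assumption.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for N items, each labeled by the same number n of raters.

    ratings[i] holds the n category labels assigned to item i.
    """
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    N = len(ratings)
    categories = sorted({c for r in ratings for c in r})
    counts = [Counter(r) for r in ratings]  # n_ij: raters placing item i in category j
    # Observed agreement per item (P_i), averaged over items.
    p_bar = sum((sum(v * v for v in c.values()) - n) / (n * (n - 1)) for c in counts) / N
    # Expected chance agreement from the marginal category proportions.
    p_e = sum((sum(c[cat] for c in counts) / (N * n)) ** 2 for cat in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)
```

With perfect agreement over two balanced categories the function returns 1.0. With the split votes [["a", "a", "b"], ["a", "b", "b"]] it returns -1/3, that is, agreement below chance.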

Fig. 2

Multi-level semantic annotation framework for Jingchu folk patterns.

Feature extraction employs hybrid deep learning and traditional processing methods. The repetitive pattern recognition process (Fig. 3) demonstrates multi-scale analysis capabilities. Spatial autocorrelation (Fig. 3a) identifies periodic structures through 3D surface peaks corresponding to repetition intervals. Fourier transform analysis (Fig. 3b) reveals frequency components essential for detecting pattern variations. Multi-scale morphological decomposition (Fig. 3c) separates coarse, medium, and fine structural components, mirroring traditional layered construction practices. Hierarchical structure analysis (Fig. 3d) constructs tree representations where node sizes reflect element complexity and connections indicate containment relationships.
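The spatial autocorrelation step (Fig. 3a) can be illustrated with the Wiener-Khinchin relation: the autocorrelation is the inverse FFT of the power spectrum, which also ties it to the Fourier analysis of Fig. 3b. The minimal sketch below assumes a grayscale pattern stored as a 2-D NumPy array with exact horizontal periodicity. It recovers the repetition interval as the smallest lag at which the autocorrelation reaches its peak. The function names and tolerance are hypothetical, and real museum photographs would need windowing and noise-robust peak detection.

```python
import numpy as np


def autocorrelation_2d(img: np.ndarray) -> np.ndarray:
    """Circular autocorrelation via the Wiener-Khinchin theorem, normalized
    so that the zero-lag value is 1."""
    x = img.astype(float) - img.mean()
    spectrum = np.fft.fft2(x)
    ac = np.fft.ifft2(spectrum * np.conj(spectrum)).real
    return ac / ac[0, 0]


def horizontal_period(img: np.ndarray, tol: float = 1e-6) -> int:
    """Smallest horizontal lag whose autocorrelation reaches the peak value."""
    profile = autocorrelation_2d(img)[0, :]        # zero vertical shift
    half = profile[1 : img.shape[1] // 2 + 1]      # skip lag 0 and mirrored lags
    candidates = np.flatnonzero(half >= half.max() - tol)
    return int(candidates[0]) + 1                  # index 0 corresponds to lag 1


rng = np.random.default_rng(0)
img = np.tile(rng.random((64, 8)), (1, 8))         # a motif repeating every 8 pixels
```

Taking the smallest qualifying lag matters because exact repeats also peak at integer multiples of the period (16, 24, 32 here).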

Fig. 3

Repetitive pattern recognition process.

The extraction pipeline produces 128-dimensional feature vectors encoding visual characteristics and cultural semantics. Expert review confirmed alignment with traditional classification systems, with particularly strong performance in capturing cloud patterns and geometric designs. These culturally grounded features provide an essential foundation for training generation models that respect authentic design principles while enabling creative exploration. Table 2 presents the detailed dataset specifications and preprocessing parameters for reproducibility.

Table 2 Dataset specifications and preprocessing parameters.

Model architecture improvements based on Stable Diffusion

The original architecture of the Stable Diffusion model has certain limitations when processing culture-specific pattern generation, mainly manifested in insufficient understanding of cultural constraints and inadequate detail fidelity. To address these issues, this research proposes the Cultural-Aware Stable Diffusion (CA-SD) architecture, which introduces cultural-aware attention mechanisms and multi-scale feature fusion modules based on the original U-Net structure. The CA-SD architecture builds upon Stable Diffusion v1.5 as the base model, with architectural modifications specifically designed to encode cultural knowledge into the generation process.

Figure 4 shows the core innovations of the architecture improvements through comparison. Figure 4a depicts the standard U-Net structure of the original Stable Diffusion, where encoder and decoder are connected through skip connections. Figure 4b highlights the insertion positions of the newly added cultural-aware attention modules in the network, distributed across feature layers of different resolutions. Specifically, cultural-aware attention modules are inserted at the 2nd, 3rd, and 4th layers of both the encoder and decoder paths, operating at resolutions of 64 × 64, 32 × 32, and 16 × 16 respectively. This multi-resolution placement ensures that cultural features are considered at both fine-grained detail and high-level semantic levels. Figure 4c shows in detail the internal structure of a single cultural-aware attention module, including the generation process of query, key, and value, and the injection method of cultural features. Figure 4d shows how this mechanism concentrates attention on culture-related pattern regions through visualization of attention weight matrices.

Fig. 4

Model architecture improvements.

The mathematical expression of the cultural-aware attention mechanism is:

$${\text{Attention}}(Q,K,V)={\text{softmax}}\left( {\frac{{Q{K^T}}}{{\sqrt {{d_k}} }}+\lambda C} \right)V$$
(1)

Where Q, K, V represent query, key, and value matrices respectively, \({d_k}\) is the dimension of key vectors, C is the cultural feature matrix, and λ is a hyperparameter controlling the influence strength of cultural features. The cultural feature matrix C is derived from the 128-dimensional cultural feature vectors extracted during dataset construction, projected into the attention space through a learned linear transformation. The hyperparameter λ was determined through grid search on the validation set, with optimal value λ = 0.3 balancing cultural authenticity and generation diversity.
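For Eq. (1) to be well defined, the cultural term λC must have the same shape as QKᵀ (queries × keys). The NumPy sketch below illustrates that additive-bias form. How C is built from the 128-dimensional cultural vector is only described as a learned linear projection, so the helper cultural_bias_from_vector is a hypothetical construction: it projects the vector into key space with an assumed matrix W_proj and scores each key against it, giving one bias per key column. Weights here are random stand-ins, not trained parameters.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cultural_attention(Q, K, V, C, lam=0.3):
    """Scaled dot-product attention with an additive cultural bias (Eq. 1).

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v), C: (n_q, n_k).
    """
    d_k = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d_k) + lam * C
    weights = softmax(logits, axis=-1)
    return weights @ V, weights


def cultural_bias_from_vector(f_cult, K, W_proj, n_q):
    """Hypothetical: project a 128-d cultural vector into key space with
    W_proj (d_k, 128) and use its similarity to each key as the bias."""
    c_key = W_proj @ f_cult                 # (d_k,)
    scores = K @ c_key                      # (n_k,) one score per key
    return np.tile(scores, (n_q, 1))        # same bias row for every query


rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 16)), rng.normal(size=(6, 16)), rng.normal(size=(6, 8))
```

Setting λ = 0 recovers standard attention, and a large positive entry in a column of C shifts attention weight toward that key, which is the behavior Fig. 4d visualizes.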

The training loss function of the model adds a cultural consistency loss term on top of the standard diffusion loss:

$${L_{total}}={L_{diffusion}}+\alpha {L_{cultural}}+\beta {L_{style}}$$
(2)

Where \({L_{diffusion}}\) is the standard denoising diffusion loss, \({L_{cultural}}\) measures the consistency between generated patterns and cultural features, \({L_{style}}\) ensures style coherence, and α and β are weight coefficients balancing different loss terms. The cultural consistency loss \({L_{cultural}}\) is computed as the cosine distance between the cultural feature vector of the generated image and the average cultural feature vector of the corresponding category in the training set. The style loss \({L_{style}}\) employs Gram matrix matching at multiple VGG-19 layers (conv1_1, conv2_1, conv3_1, conv4_1) to ensure stylistic coherence. Through extensive hyperparameter tuning on the validation set, optimal values were determined as α = 0.2 and β = 0.15.
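Eq. (2) can be sketched numerically as follows, using the stated weights α = 0.2 and β = 0.15. This is an illustration under several assumptions: the VGG-19 feature maps are replaced by arbitrary arrays, each layer's Gram matrix is normalized by its spatial size, layers are weighted equally, and the diffusion loss is passed in as a precomputed scalar. A real training loop would express the same terms in a differentiable framework rather than NumPy.

```python
import numpy as np


def gram(feat: np.ndarray) -> np.ndarray:
    """Gram matrix of a (channels, H, W) feature map, normalized by H*W."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (h * w)


def style_loss(gen_feats, ref_feats) -> float:
    """Sum over layers of the mean squared Gram-matrix difference."""
    return float(sum(np.mean((gram(g) - gram(r)) ** 2) for g, r in zip(gen_feats, ref_feats)))


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def total_loss(l_diffusion, f_gen, f_centroid, gen_feats, ref_feats, alpha=0.2, beta=0.15):
    """L_total = L_diffusion + alpha * L_cultural + beta * L_style (Eq. 2)."""
    l_cultural = cosine_distance(f_gen, f_centroid)  # distance to category centroid
    l_style = style_loss(gen_feats, ref_feats)
    return l_diffusion + alpha * l_cultural + beta * l_style


rng = np.random.default_rng(1)
feats = [rng.normal(size=(8, 5, 5)), rng.normal(size=(16, 3, 3))]  # stand-in layers
f = rng.normal(size=128)
```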

Table 3 presents the complete training configuration and hyperparameter settings essential for reproducibility. All experiments were conducted on a high-performance computing cluster with consistent hardware configuration.

Table 3 Training configuration and hyperparameter settings.

Cultural constraints and generation control mechanisms

The design of cultural constraints needs to balance the relationship between innovation and cultural authenticity. This research proposes a hierarchical constraint system including both hard constraints and soft constraints. Hard constraints mainly involve cultural taboos and compositional rules that must be followed, implemented by setting inviolable boundary conditions during the generation process. Soft constraints allow a certain degree of variation and innovation, guiding the generation process through probability distributions.
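The two-tier logic can be illustrated at the level of whole candidate patterns. The sketch below is simplified: the text applies hard constraints as boundary conditions inside the generation process, whereas here they act as a post-hoc filter. Hard rules are predicates that reject a candidate outright. Soft constraints contribute a log-probability score, so less typical element choices are penalized but not forbidden. The pattern representation, rule set, and motif names ("phoenix", "dragon", "taboo_motif") are all hypothetical.

```python
import math


def constrained_score(pattern, hard_rules, soft_probs, eps=1e-6):
    """Return None if any hard rule fails; otherwise the sum of the
    log-probabilities of the pattern's elements under the soft constraints."""
    for rule in hard_rules:
        if not rule(pattern):
            return None  # hard constraint violated: candidate is discarded
    return sum(math.log(max(soft_probs.get(e, 0.0), eps)) for e in pattern["elements"])


def rank_candidates(candidates, hard_rules, soft_probs):
    """Keep candidates that pass every hard rule, best soft score first."""
    scored = [(constrained_score(p, hard_rules, soft_probs), p) for p in candidates]
    valid = [x for x in scored if x[0] is not None]
    return [p for _, p in sorted(valid, key=lambda x: x[0], reverse=True)]


hard = [lambda p: p["symmetric"], lambda p: "taboo_motif" not in p["elements"]]
soft = {"cloud": 0.6, "phoenix": 0.3, "dragon": 0.1}
a = {"symmetric": True, "elements": ["cloud", "phoenix"]}
b = {"symmetric": True, "elements": ["cloud", "dragon"]}
c = {"symmetric": False, "elements": ["cloud"]}
d = {"symmetric": True, "elements": ["taboo_motif"]}
```

Summing log-probabilities treats elements as independent and favors patterns with fewer elements. A production scorer would likely normalize for element count.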

Central to the constraint system is the precise definition and measurement of cultural distance, which quantifies the deviation between generated patterns and traditional references. We define cultural distance as the cosine distance between the generated pattern and traditional reference patterns in the cultural feature space:

$${D_{cultural}}=1 - \cos ({f_{gen}},{f_{ref}})=1 - \frac{{{f_{gen}} \cdot {f_{ref}}}}{{|{f_{gen}}||{f_{ref}}|}}$$
(3)

where \({f_{gen}}\) and \({f_{ref}}\) are 128-dimensional cultural feature vectors of the generated pattern and reference pattern respectively. These vectors are extracted using a pre-trained VGG-19 network followed by a cultural adaptation layer—a two-layer MLP with 512 hidden units trained to project visual features into the cultural semantic space. The reference feature \({f_{ref}}\) is computed as the centroid of all training samples within the corresponding pattern category.
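The pieces described above (adaptation MLP, category centroid, Eq. 3) fit together as shown in the minimal NumPy sketch below. Three details are assumptions: the VGG-19 features are taken as 512-dimensional inputs, the hidden layer uses ReLU, and the weights are random stand-ins rather than trained parameters. The text specifies only the 512 hidden units and the 128-dimensional output.

```python
import numpy as np


def adaptation_mlp(v, W1, b1, W2, b2):
    """Two-layer MLP mapping a visual feature to a 128-d cultural feature."""
    h = np.maximum(0.0, W1 @ v + b1)   # 512 hidden units (ReLU assumed)
    return W2 @ h + b2                 # 128-d output


def category_centroid(features: np.ndarray) -> np.ndarray:
    """f_ref: mean of the (n, 128) cultural features of one category."""
    return features.mean(axis=0)


def cultural_distance(f_gen: np.ndarray, f_ref: np.ndarray) -> float:
    """Eq. (3): one minus the cosine similarity."""
    return 1.0 - float(f_gen @ f_ref / (np.linalg.norm(f_gen) * np.linalg.norm(f_ref)))


rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(512, 512)) * 0.05, np.zeros(512)
W2, b2 = rng.normal(size=(128, 512)) * 0.05, np.zeros(128)
vgg_feats = rng.normal(size=(10, 512))          # stand-in for VGG-19 outputs
cult_feats = np.stack([adaptation_mlp(v, W1, b1, W2, b2) for v in vgg_feats])
f_ref = category_centroid(cult_feats)
```

Cosine distance ranges from 0 (same direction) to 2 (opposite direction). It is insensitive to feature magnitude, so only the direction of the cultural feature vector matters.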

The cultural distance thresholds were calibrated empirically through an expert validation study. We generated 200 pattern samples with controlled cultural distances ranging from 0.1 to 0.9 and conducted a blind evaluation with 15 folklore specialists. Based on the acceptance-rate curve shown in Fig. 5, we established the following interpretation framework: \({D_{cultural}}<0.3\) indicates high traditionality (traditional replica type), with 92% expert acceptance; \(0.3 \leqslant {D_{cultural}}<0.6\) represents moderate innovation (optimal acceptance zone), with 78.3% acceptance; and \({D_{cultural}} \geqslant 0.6\) signifies radical innovation (potential cultural deviation), with only 34% acceptance. Within the moderate band, the sub-range 0.4–0.6 was identified as the optimal innovation interval, where works retain recognizable cultural characteristics while demonstrating creative expression.
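The interpretation framework reduces to a simple banding rule. In the sketch below, the treatment of the boundaries follows the inequalities in the text. Treating the upper end of the 0.4–0.6 optimal interval as exclusive is an assumption, made so that the interval stays inside the moderate-innovation band (which ends before 0.6). The function names are hypothetical.

```python
def interpret_cultural_distance(d: float) -> str:
    """Map a cultural distance D (Eq. 3) to the expert-calibrated band."""
    if not 0.0 <= d <= 2.0:
        raise ValueError("cosine distance lies in [0, 2]")
    if d < 0.3:
        return "traditional replica"     # ~92% expert acceptance
    if d < 0.6:
        return "moderate innovation"     # ~78.3% expert acceptance
    return "radical innovation"          # ~34% expert acceptance


def in_optimal_interval(d: float) -> bool:
    """Optimal innovation interval; upper bound exclusive (assumption)."""
    return 0.4 <= d < 0.6
```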

Fig. 5

Cultural distance calibration and expert acceptance rate curve.

Figure 6 shows the implementation mechanism of constraints. Figure 6a presents the judgment logic of hard constraints in the form of a decision tree, with each node representing a cultural rule checkpoint. Figure 6b uses probability distribution graphs to show how soft constraints work, with curves of different colors representing the occurrence probabilities of different cultural elements. The soft constraint probability distributions were learned from the training data, where the probability \(P({e_i}|c)\) of element \({e_i}\) appearing given cultural context c is estimated using kernel density estimation on co-occurrence statistics. Figure 6c shows how constraints affect the diffusion process through generation trajectory graphs, with red trajectories representing unconstrained generation and blue trajectories representing constrained generation. Figure 6d shows comparison samples of patterns generated under different constraint strengths, with constraint strength gradually increasing from left to right.
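The text says P(e_i|c) is obtained by kernel density estimation on co-occurrence statistics but does not give the estimator. One plausible reading is a kernel-weighted (Nadaraya-Watson) occurrence rate, equivalent to a ratio of two Gaussian KDEs, and the sketch below follows that assumption. Each training pattern contributes a continuous context vector and a 0/1 indicator for each element, and patterns whose contexts lie near c receive more weight. The context representation and bandwidth are hypothetical. Because several elements can co-occur, the returned probabilities need not sum to one.

```python
import numpy as np


def element_occurrence_prob(c, contexts, presence, bandwidth=0.5):
    """Kernel-smoothed estimate of P(e_i | c) for every element i.

    contexts: (n, d) context vectors of the training patterns
    presence: (n, m) 0/1 matrix, presence[j, i] = 1 if element i occurs in pattern j
    """
    sq_dist = np.sum((contexts - c) ** 2, axis=1)
    # Gaussian kernel weights; shifting by the minimum keeps at least one
    # weight equal to 1 (no 0/0), and the shift cancels in the ratio below.
    w = np.exp(-0.5 * (sq_dist - sq_dist.min()) / bandwidth ** 2)
    return (w @ presence) / w.sum()


contexts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
presence = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # element 0 near origin, 1 far away
```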

Fig. 6

Cultural constraints implementation.

The generation control mechanism adopts a conditional guidance method, guiding the generation direction by injecting control signals at each step of the diffusion process. The calculation formula for control signals is:

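$${\tilde \varepsilon _\theta }({x_t},t,c)={\varepsilon _\theta }({x_t},t) - s\sqrt {1 - {{\bar \alpha }_t}} \cdot {\nabla _{{x_t}}}\log p(c|{x_t})$$
(4)

where \({\varepsilon _\theta }\) is the noise prediction network, \({\tilde \varepsilon _\theta }\) is the guided noise prediction, \({x_t}\) is the noisy image at time step t, c is the conditional information, s is the guidance strength, and \({\bar \alpha _t}\) is the cumulative noise-schedule coefficient at step t. The gradient term enters with a negative sign because the noise prediction is proportional to the negative score of \({x_t}\); subtracting the scaled classifier gradient therefore steers samples toward regions where \(p(c|{x_t})\) is high.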
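The effect of the guidance term can be checked on a toy problem. The sketch follows the common ε-parameterization of classifier guidance, in which the classifier gradient is subtracted from the noise prediction after scaling by √(1 − ᾱ_t). It uses a hypothetical linear classifier, log p(c|x) = w·x + const, whose gradient is simply w. The check is that guidance moves the implied clean-image estimate toward the class direction.

```python
import numpy as np


def guided_epsilon(eps, grad_log_p, s, alpha_bar_t):
    """Classifier-guided noise prediction (epsilon parameterization)."""
    return eps - s * np.sqrt(1.0 - alpha_bar_t) * grad_log_p


def predict_x0(x_t, eps, alpha_bar_t):
    """Clean-image estimate implied by a noise prediction at step t."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)


rng = np.random.default_rng(3)
x_t, eps, w = rng.normal(size=16), rng.normal(size=16), rng.normal(size=16)
alpha_bar = 0.5
x0_plain = predict_x0(x_t, eps, alpha_bar)
x0_guided = predict_x0(x_t, guided_epsilon(eps, w, s=2.0, alpha_bar_t=alpha_bar), alpha_bar)
```

Here the guided estimate differs from the unguided one by s(1 − ᾱ_t)/√ᾱ_t · w, so its projection onto w always increases for s > 0. Off-the-shelf Stable Diffusion pipelines usually apply classifier-free guidance instead, which needs no separate classifier.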

Fine-grained control is achieved by introducing a local editing mechanism, which allows users to make precise adjustments to specific regions of generated patterns. Figure 7 details the workflow of local editing. Figure 7a shows the process of users selecting editing regions through the interactive interface. Figure 7b displays the feature extraction and encoding process of the editing region. Figure 7c depicts the fusion strategy of local features and global features, using gradient masks to ensure natural transitions at editing boundaries. Figure 7d shows the effect comparison after multiple iterative edits, demonstrating the flexibility and precision of this mechanism.
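The "gradient mask" fusion of Fig. 7c can be approximated by feathering a binary edit mask and alpha-blending the edited and original images. In the sketch below, a separable box blur produces the soft mask. This is an assumed stand-in, since the text does not specify how the gradient mask is built, and the radius parameter is hypothetical. Pixels deep inside the selected region take the edited values, pixels far outside keep the original, and a linear ramp joins the two.

```python
import numpy as np


def feathered_mask(mask: np.ndarray, radius: int = 3) -> np.ndarray:
    """Soften a binary edit mask with a separable box blur of width 2*radius+1."""
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    m = mask.astype(float)
    m = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, m)  # rows
    m = np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, m)  # cols
    return np.clip(m, 0.0, 1.0)


def blend(global_img: np.ndarray, local_img: np.ndarray, mask: np.ndarray, radius: int = 3):
    """Fuse a locally edited image into the global one with a soft mask."""
    m = feathered_mask(mask, radius)
    return m * local_img + (1.0 - m) * global_img


g, l = np.zeros((32, 32)), np.ones((32, 32))
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True
out = blend(g, l, mask)
```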

Fig. 7

Local editing mechanism.

Model training strategy and optimization methods

Model training adopts a progressive strategy, gradually transitioning from low resolution to high resolution. This method not only improves training efficiency but also helps the model learn the hierarchical structure of patterns. The training process is divided into three stages: the basic feature learning stage is conducted at 64 × 64 resolution, mainly learning the basic forms of cultural elements; the structure refinement stage is conducted at 256 × 256 resolution, focusing on optimizing the combinatorial relationships between elements; the detail enhancement stage is conducted at 512 × 512 resolution, focusing on the generation of textures and decorative details. Figure 8 shows the changes in key indicators during the training process through multiple subgraphs. Figure 8a shows the loss function curves at different stages, where changes in feature learning patterns can be observed during stage transitions. Figure 8b displays the trend of generation quality evaluation metrics FID and LPIPS with training epochs. Figure 8c shows the clustering of cultural features in latent space through t-SNE visualization, with different colors representing different cultural categories. Figure 8d uses a heatmap to show the model’s generation capability scores for various cultural elements at different training stages.
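The three-stage progressive schedule can be written as a small configuration plus a lookup that tells the training loop which resolution to use at a given epoch. The per-stage epoch counts below are hypothetical placeholders, since they are not given in this section. Only the stage order and resolutions come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    resolution: int
    epochs: int  # hypothetical: per-stage epoch counts are not reported here


STAGES = (
    Stage("basic feature learning", 64, 20),
    Stage("structure refinement", 256, 30),
    Stage("detail enhancement", 512, 50),
)


def stage_for_epoch(epoch: int, stages=STAGES) -> Stage:
    """Return the stage active at a 0-based global epoch index."""
    start = 0
    for stage in stages:
        if epoch < start + stage.epochs:
            return stage
        start += stage.epochs
    return stages[-1]  # keep the final resolution after the schedule ends
```

Stage transitions in this lookup are the points where Fig. 8a would show changes in the loss curves.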

Fig. 8

Training process and optimization.

To isolate and validate the contribution of each proposed component, we conducted ablation studies in which the modules of the full CA-SD architecture were added to the baseline one at a time. Table 4 presents the results, showing the incremental contribution of each component to overall performance. The baseline Stable Diffusion model achieves an FID of 42.3 and a cultural authenticity score of 0.72. Adding the cultural-aware attention mechanism alone reduces FID to 35.6 (a 15.8% reduction) and raises cultural authenticity to 0.81, the largest single-component contribution. The multi-scale feature fusion module further improves performance to an FID of 31.2 and a cultural authenticity of 0.85. Finally, incorporating the cultural consistency loss yields the full model’s FID of 28.7 and cultural authenticity of 0.89. These results show that every proposed component contributes meaningfully to the final performance, with the cultural-aware attention mechanism providing the largest improvement in both technical quality and cultural fidelity.
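The relative reductions in this ablation follow directly from the reported FID values. The check below uses only numbers stated in the paragraph; the helper name `relative_reduction` is ours.

```python
# FID values reported in the ablation study (lower is better).
ablation_fid = [
    ("Baseline Stable Diffusion", 42.3),
    ("+ cultural-aware attention", 35.6),
    ("+ multi-scale feature fusion", 31.2),
    ("+ cultural consistency loss (full CA-SD)", 28.7),
]


def relative_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


# Each step is measured against the configuration immediately before it.
steps = [
    (name, round(relative_reduction(prev, cur), 1))
    for (_, prev), (name, cur) in zip(ablation_fid, ablation_fid[1:])
]
overall = round(relative_reduction(ablation_fid[0][1], ablation_fid[-1][1]), 1)
```

The step-wise reductions are 15.8%, 12.4%, and 8.0%, and the overall reduction is 32.2%. Because each step is relative to a smaller starting FID, the step percentages do not sum to the overall figure.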

Table 4 Ablation study results.

Table 5 presents additional ablation results examining the effect of different hyperparameter configurations, demonstrating the robustness of our chosen settings and the sensitivity of performance to key parameters.

Table 5 Hyperparameter sensitivity analysis.

Data augmentation strategies are designed specifically for the characteristics of cultural patterns and comprise culture-rule-based augmentation and style-transfer-based augmentation. Culture-rule-based augmentation expands the dataset by applying culturally valid transformations to original patterns, such as symmetric copying and module recombination. Style-transfer-based augmentation uses pre-trained style transfer networks to render the same cultural theme in the different forms it takes on different carriers. The augmentation process can be written as:

$$\tilde {x}={T_{cultural}}(x,\theta )+\epsilon$$
(5)

Where \({T_{cultural}}\) represents culture-preserving transformation, θ is the transformation parameter, and ε is the added small perturbation.
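Equation (5) can be sketched for the two culture-rule-based operations named above. This is a minimal illustration, assuming grayscale patterns stored as 2-D arrays with intensities in [0, 1]; interpreting "symmetric copying" as mirroring the left half onto the right half and "module recombination" as swapping two tiles of a regular grid are our assumptions, as are the parameter keys in `theta`.

```python
import numpy as np


def symmetric_copy(x: np.ndarray) -> np.ndarray:
    """Mirror the left half of a pattern onto its right half (bilateral symmetry)."""
    out = x.copy()
    half = x.shape[1] // 2
    out[:, x.shape[1] - half:] = x[:, :half][:, ::-1]
    return out


def swap_modules(x: np.ndarray, grid: int, a: tuple, b: tuple) -> np.ndarray:
    """Recombine modules by swapping two tiles of a grid x grid partition."""
    th, tw = x.shape[0] // grid, x.shape[1] // grid

    def tile(rc):
        r, c = rc
        return slice(r * th, (r + 1) * th), slice(c * tw, (c + 1) * tw)

    out = x.copy()
    out[tile(a)], out[tile(b)] = x[tile(b)], x[tile(a)]
    return out


def augment(x: np.ndarray, theta: dict, rng: np.random.Generator) -> np.ndarray:
    """x_tilde = T_cultural(x, theta) + eps, clipped to the [0, 1] intensity range."""
    y = symmetric_copy(x) if theta.get("mirror", False) else x
    if "swap" in theta:
        y = swap_modules(y, theta["grid"], *theta["swap"])
    eps = rng.normal(0.0, theta.get("sigma", 0.01), size=y.shape)  # small perturbation
    return np.clip(y + eps, 0.0, 1.0)


# Usage with hypothetical parameters.
pattern = np.random.default_rng(1).random((64, 64))
theta = {"mirror": True, "grid": 2, "swap": ((0, 0), (1, 1)), "sigma": 0.01}
x_tilde = augment(pattern, theta, np.random.default_rng(0))
```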

The optimizer is AdamW with a cosine annealing learning-rate schedule. The learning rate at epoch t is:

$${\eta _t}={\eta _{min}}+\frac{1}{2}({\eta _{max}} - {\eta _{min}})(1+\cos (\frac{{t\pi }}{T}))$$
(6)

Where \({\eta _{max}}\) and \({\eta _{min}}\) are the maximum and minimum learning rates respectively, t is the current epoch, and T is the total number of epochs.
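Equation (6) translates directly into a function of the epoch index. The bounds below (1e-4 and 1e-6 over 100 epochs) are hypothetical, since the text does not report the values used. In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR` (with `T_max` and `eta_min`) implements the same closed form and can wrap a `torch.optim.AdamW` optimizer.

```python
import math


def cosine_annealing_lr(t: int, total: int, eta_max: float, eta_min: float) -> float:
    """Learning rate at epoch t under Eq. (6)."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t / total))


# Hypothetical bounds: eta_max = 1e-4, eta_min = 1e-6 over T = 100 epochs.
schedule = [cosine_annealing_lr(t, 100, 1e-4, 1e-6) for t in range(101)]
```

The rate starts at eta_max, passes the midpoint (eta_max + eta_min) / 2 at t = T/2, and reaches eta_min at t = T, decaying slowly at both ends and fastest in the middle.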

Model evaluation adopts a multidimensional evaluation system including both technical indicators and cultural indicators. Technical indicators include generation quality (FID score), diversity (LPIPS distance), and generation efficiency (inference time). Cultural indicators are obtained through expert scoring and user surveys, including dimensions such as cultural authenticity, innovation, and aesthetic value.
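FID, the first technical indicator, compares Gaussian fits to Inception-v3 features of real and generated images: FID = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The sketch below assumes the feature means and covariances have already been computed (feature extraction is omitted). It evaluates the trace of the matrix square root through the symmetric form Tr((Σ_r^{1/2} Σ_g Σ_r^{1/2})^{1/2}), an equivalent alternative to calling `scipy.linalg.sqrtm` on the non-symmetric product.

```python
import numpy as np


def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """FID between Gaussians N(mu1, sigma1) and N(mu2, sigma2) fitted to image features."""
    s1_half = _sqrtm_psd(sigma1)
    inner = s1_half @ sigma2 @ s1_half
    cross = _sqrtm_psd((inner + inner.T) / 2.0)  # symmetrize against round-off
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross))
```

For two 2-D Gaussians with means (0, 0) and (3, 0) and covariances diag(1, 1) and diag(4, 1), the distance is 9 + 2 + 5 − 2 × 3 = 10.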

Construction of user co-creation mechanism model

Theoretical model of participatory design

The user co-creation mechanism builds upon Sanders and Stappers’ participatory design theory, integrating social cognitive theory and innovation diffusion theory to construct the CPDC (Cultural Participatory Design Cycle) model. This framework conceptualizes user participation as a dynamic system encompassing four stages: perception, understanding, creation, and sharing, with each stage involving interactions between users, cultural content, technology platforms, and community members.

Figure 9 presents the CPDC architecture through three concentric levels. The central core represents participation stage flow, while surrounding circles denote individual (cognitive processes, motivations), community (collaboration, knowledge dissemination), and cultural (values, symbols) interaction levels. Spiraling connections indicate progressive relationships, forming an adaptive, self-organizing system.

Fig. 9

CPDC model architecture.

Creative motivation emerges from intrinsic and extrinsic factors. Intrinsic motivation encompasses cultural identity through emotional attachment and value identification, alongside self-realization needs including ability display and creative expression. Extrinsic motivation includes social recognition via peer appreciation and expert certification, plus economic rewards from commercial opportunities. Cultural inheritors prioritize identity preservation, designers seek creative expression, while ordinary users value social connection and recognition.

Four key moderators influence participation behavior: cultural cognitive level affects work authenticity; technical ability reduces participation barriers; time investment reflects commitment depth; and social network density influences knowledge acquisition. Users with deeper cultural knowledge demonstrate stronger intrinsic motivation and produce more culturally authentic outputs.

Social learning operates through dual channels. Observational learning occurs when users analyze exemplary works to expand creative repertoires. Interactive learning emerges through comments, collaborations, and mentorship providing direct feedback. Knowledge dissemination follows small-world network principles—maintaining local connections within interest groups while enabling rapid global spread of innovations. Opinion leaders and cultural experts serve as bridge nodes, connecting communities and facilitating creative cross-pollination.

To empirically validate the CPDC framework and provide stronger evidence for the causal mechanisms it proposes, we used structural equation modeling (SEM) to analyze the hypothesized relationships among key constructs. The SEM analysis was conducted in Mplus 8.3 with maximum likelihood estimation on data collected from 486 participants over the six-month experimental period. Model fit indices indicated acceptable fit to the observed data: CFI = 0.94, TLI = 0.92, RMSEA = 0.058 (90% CI: 0.048–0.068), SRMR = 0.045, and χ²/df = 2.34. These values meet conventional thresholds for acceptable model fit (CFI > 0.90, TLI > 0.90, RMSEA < 0.08, SRMR < 0.08).

Figure 10 presents the validated CPDC path model with standardized path coefficients and significance levels. The analysis revealed that all hypothesized paths were statistically significant at p < 0.001. Cultural cognitive level demonstrated the strongest direct effect on participation depth (β = 0.42, SE = 0.05, t = 8.40, p < 0.001), supporting the theoretical proposition that deeper cultural understanding facilitates more meaningful engagement. Technology acceptance showed a substantial positive effect on participation depth (β = 0.38, SE = 0.04, t = 9.50, p < 0.001), confirming the importance of reducing technical barriers for user engagement. Social connection strength also significantly predicted participation depth (β = 0.31, SE = 0.04, t = 7.75, p < 0.001), validating the role of community networks in sustaining user involvement.

Fig. 10

Validated CPDC path model with standardized coefficients.

Downstream effects of participation depth were equally robust. Participation depth strongly predicted creation quality (β = 0.56, SE = 0.04, t = 14.00, p < 0.001) and cultural identity enhancement (β = 0.48, SE = 0.05, t = 9.60, p < 0.001). Additionally, indirect effects were examined using bootstrap confidence intervals (5,000 samples). The indirect effect of cultural cognitive level on creation quality through participation depth was significant (indirect β = 0.24, 95% CI: 0.18–0.30), as were the indirect effects of technology acceptance (indirect β = 0.21, 95% CI: 0.16–0.27) and social connection strength (indirect β = 0.17, 95% CI: 0.12–0.23). Table 6 summarizes the complete path analysis results with effect sizes and confidence intervals.
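As a consistency check, the reported t values and indirect effects can be reproduced from the path coefficients. Here t is the ratio β/SE, and in a single-mediator model the indirect effect is the product of the predictor-to-mediator and mediator-to-outcome coefficients. The sketch uses only the published point estimates; the bootstrap confidence intervals require the raw data and are not reproduced.

```python
# Standardized path coefficients and standard errors as reported for the CPDC model.
A_PATHS = {  # predictor -> participation depth: (beta, SE)
    "cultural cognitive level": (0.42, 0.05),
    "technology acceptance": (0.38, 0.04),
    "social connection strength": (0.31, 0.04),
}
B_PATH = (0.56, 0.04)  # participation depth -> creation quality: (beta, SE)


def path_summary(a_paths, b_path):
    """Return {predictor: (t ratio of the a-path, indirect effect a * b)}."""
    b_beta = b_path[0]
    return {
        name: (round(beta / se, 2), round(beta * b_beta, 2))
        for name, (beta, se) in a_paths.items()
    }


summary = path_summary(A_PATHS, B_PATH)
# {'cultural cognitive level': (8.4, 0.24), 'technology acceptance': (9.5, 0.21),
#  'social connection strength': (7.75, 0.17)}
```

The products 0.42 × 0.56 ≈ 0.24, 0.38 × 0.56 ≈ 0.21, and 0.31 × 0.56 ≈ 0.17 match the indirect effects reported above.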

Table 6 CPDC model path analysis results.

To further strengthen causal inference beyond the SEM analysis, we conducted additional validation analyses. First, a longitudinal panel analysis with fixed effects was performed using monthly measurements across the six-month period, controlling for time-invariant individual characteristics. The fixed effects model confirmed that within-person changes in participation depth significantly predicted within-person changes in creation quality (β = 0.35, p < 0.001), ruling out confounding by stable individual differences. Second, we employed propensity score matching (PSM) to compare high-engagement (weekly active time > 10 h) and low-engagement (weekly active time < 3 h) groups, matching on baseline cultural knowledge, age, education, and professional background. After matching, the two groups showed no significant differences on baseline characteristics (all p > 0.10), and the effect of engagement level on creation quality remained significant with a large effect size (Cohen’s d = 0.72, 95% CI: 0.58–0.86).
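The logic of the fixed-effects (within) estimator can be illustrated on simulated data. Subtracting each participant's own mean over the monthly waves removes every time-invariant characteristic, so a stable trait that drives both variables no longer biases the slope. All data below are synthetic. The true within-person effect of 0.35 is planted only to mirror the reported coefficient, and the function name is ours.

```python
import numpy as np


def within_estimator(y: np.ndarray, x: np.ndarray) -> float:
    """Fixed-effects slope for a balanced panel shaped (individuals, periods)."""
    # Demeaning each person's series removes all time-invariant individual effects.
    x_w = x - x.mean(axis=1, keepdims=True)
    y_w = y - y.mean(axis=1, keepdims=True)
    return float((x_w * y_w).sum() / (x_w ** 2).sum())


# Synthetic panel: 486 people x 6 monthly waves; a stable trait drives both variables.
rng = np.random.default_rng(0)
trait = rng.normal(size=(486, 1))
depth = trait + rng.normal(size=(486, 6))
quality = 0.35 * depth + 2.0 * trait + rng.normal(scale=0.5, size=(486, 6))

pooled_slope = float(np.polyfit(depth.ravel(), quality.ravel(), 1)[0])  # confounded
fe_slope = within_estimator(quality, depth)                             # close to 0.35
```

In this simulation the pooled regression overstates the effect (around 1.35) because the trait is omitted, while the within estimator recovers the planted value.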

This framework reveals that successful platforms must balance accessibility versus quality, tradition versus innovation, and individual expression versus collective identity. The CPDC model provides analytical tools for understanding these dynamics and designing interventions supporting sustainable cultural ecosystem development.

Multi-level user participation framework

Participation level division and role definition

The multi-level user participation framework divides participants into four levels: cultural guardians, creative leaders, active contributors, and ordinary participants, each with different functions and responsibilities. The hierarchy is not fixed; it is a dynamic classification based on users’ behavioral characteristics, and users can move between levels through sustained participation and skill development.

Fig. 11

User capability radar and transformation paths.

Figure 11 shows the capability characteristic distribution and transformation paths of four types of users through a combination of radar charts and flow paths. The six dimensions of the radar chart are cultural knowledge, creative skills, social activity, innovation ability, influence, and persistence. Polygons of different colors represent the capability distribution characteristics of different user types. The arrows and percentage values in the figure show typical transformation paths and expected conversion rates between adjacent levels. For example, the conversion rate from ordinary participants to active contributors is designed to be 15–20%. These values are determined based on behavioral science theory and empirical data from similar platforms.

The dynamic evaluation model of user levels is based on multidimensional indicators:

$${L_i}={w_1} \cdot {C_i}+{w_2} \cdot {Q_i}+{w_3} \cdot {A_i}+{w_4} \cdot {I_i}$$
(7)

Where \({L_i}\) is the level score of user i, \({C_i}\) is the creation quantity, \({Q_i}\) is the work quality score, \({A_i}\) is the activity level, \({I_i}\) is the influence index, and \({w_1}\) to \({w_4}\) are the corresponding weights.
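Because creation quantity, quality score, activity, and influence are measured on different scales, Eq. (7) is easiest to apply after normalizing each indicator. The sketch below assumes min-max normalization to [0, 1], weights that sum to 1, and cut-offs that map the score to the four levels. The weights, cut-offs, and function names are all hypothetical; the text does not report them.

```python
def min_max(values):
    """Rescale raw indicator values to [0, 1] across users."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def level_score(c, q, a, i, weights=(0.2, 0.4, 0.2, 0.2)):
    """Eq. (7): L_i = w1*C_i + w2*Q_i + w3*A_i + w4*I_i on normalized indicators."""
    w1, w2, w3, w4 = weights
    return w1 * c + w2 * q + w3 * a + w4 * i


# Hypothetical cut-offs, from the highest level to the lowest.
LEVELS = [(0.8, "cultural guardian"), (0.6, "creative leader"),
          (0.35, "active contributor"), (0.0, "ordinary participant")]


def assign_level(score):
    return next(name for cutoff, name in LEVELS if score >= cutoff)
```

Since the score is recomputed as behavior changes, a user crossing a cut-off moves up or down a level, which matches the dynamic classification described above.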

Participation path design and advancement mechanism

The design of user participation paths follows progressive principles, starting from simple browsing and collecting, gradually transitioning to commenting, adaptation, independent creation, and collaborative creation. This design is based on scaffolding theory, providing appropriate support and guidance to help users gradually build capabilities and confidence.

Table 7 details the stage design and support mechanisms of participation paths. Each stage sets clear behavioral goals, required capabilities, platform support, and advancement conditions, forming a complete user growth system. The novice guidance stage focuses on cultivating users’ familiarity with the platform and basic aesthetic abilities. The primary participation stage begins to build users’ expression abilities and social connections. The intermediate creation stage strengthens creative skills and cultural understanding. The advanced collaboration stage cultivates leadership and innovation abilities. The expert leadership stage ultimately achieves the goals of knowledge inheritance and community building.

Table 7 User participation path stage design.

Collaboration mode and team creation mechanism

Figure 12 shows the structural characteristics of the collaboration network using a two-layer network diagram. The upper layer shows the macro team organization network, with nodes representing different creative teams and line thickness indicating collaboration frequency between teams. The lower layer shows the micro individual collaboration network, displaying role division and interaction patterns among team members. The two network layers are connected through vertical lines, clearly presenting the collaborative hierarchical relationship from individuals to teams. Network analysis shows that the ideal team size is 3–5 people, which ensures creative diversity while avoiding excessive coordination costs.

Fig. 12

Two-layer collaboration network.

Creative contribution evaluation model

Multidimensional evaluation indicator system

The evaluation of creative contributions needs to balance multiple dimensions including cultural value, innovation degree, technical quality, and community impact. This research constructs a comprehensive evaluation system containing four first-level indicators and sixteen second-level indicators. The design of the evaluation system follows the basic principles of Analytic Hierarchy Process (AHP), determining the relative importance of each indicator through expert consultation and the Delphi method.
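In AHP, indicator weights are usually derived from a pairwise comparison matrix as its normalized principal eigenvector, and the consistency ratio CR = CI/RI (with CI = (λ_max − n)/(n − 1) and Saaty's random index RI) is checked against the conventional 0.10 threshold. The matrix below is hypothetical: it is constructed to be perfectly consistent with the first-level weights reported for Fig. 13, whereas real expert judgments on the 1 to 9 scale are rarely perfectly consistent.

```python
import numpy as np

# Saaty's random consistency index for matrix orders 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise: np.ndarray) -> tuple[np.ndarray, float]:
    """Principal-eigenvector weights and consistency ratio of an AHP comparison matrix."""
    n = pairwise.shape[0]
    vals, vecs = np.linalg.eig(pairwise)
    k = int(np.argmax(vals.real))
    w = np.abs(vecs[:, k].real)
    w = w / w.sum()
    ci = (vals[k].real - n) / (n - 1)
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return w, float(cr)


# Hypothetical consistent matrix: entry (i, j) = w_i / w_j for the weights of
# cultural value, innovation degree, technical quality, and community impact.
target = np.array([0.30, 0.25, 0.25, 0.20])
pairwise = target[:, None] / target[None, :]
weights, cr = ahp_weights(pairwise)
```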

Figure 13 shows the hierarchical structure and weight configuration of the evaluation indicator system using a tree diagram combined with weight-distribution pie charts. The tree diagram shows the decomposition from the overall evaluation into first-level and then second-level indicators, with each node labeled with its indicator name and local weight. The four pie charts on the right show the weight distribution of the second-level indicators under each first-level indicator, reflecting evaluation priorities. The cultural value dimension (weight 30%) examines the accuracy of cultural element usage and how effectively cultural connotations are communicated. The innovation degree dimension (weight 25%) evaluates innovation achieved while maintaining cultural characteristics. The technical quality dimension (weight 25%) focuses on visual effect and technical completeness. The community impact dimension (weight 20%) reflects the dissemination and inspirational effect of works.

Fig. 13

Evaluation indicator system.

To ensure transparent and reproducible evaluation, we developed a detailed scoring rubric for each dimension with explicit anchoring criteria. Table 8 presents the cultural authenticity evaluation framework, which serves as the primary assessment instrument for expert reviewers. Each sub-dimension is scored on a 1–10 scale with clearly defined anchoring points to minimize subjective interpretation variance.

Table 8 Cultural authenticity evaluation rubric.

The calculation of comprehensive scores uses the weighted average method:

$${S_{total}}=\sum\limits_{{i=1}}^{4} {{w_i}} \left( {\sum\limits_{{j=1}}^{4} {{w_{ij}}} \cdot {s_{ij}}} \right)$$
(8)

Where \({w_i}\) is the first-level indicator weight, \({w_{ij}}\) is the second-level indicator weight, and \({s_{ij}}\) is the specific score.
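Equation (8) is a nested weighted sum. In the sketch below, the first-level weights are those given for Fig. 13. The equal second-level weights are a placeholder assumption, since the sixteen second-level weights are shown only in the figure.

```python
def comprehensive_score(first_level_weights, second_level_weights, scores):
    """Eq. (8): S_total = sum_i w_i * sum_j w_ij * s_ij."""
    return sum(
        w_i * sum(w_ij * s_ij for w_ij, s_ij in zip(w_row, s_row))
        for w_i, w_row, s_row in zip(first_level_weights, second_level_weights, scores)
    )


# Cultural value, innovation degree, technical quality, community impact.
W1 = [0.30, 0.25, 0.25, 0.20]
W2 = [[0.25] * 4 for _ in range(4)]  # hypothetical equal second-level weights
```

When the weights at each level sum to 1, a work scoring 8 on every second-level indicator receives a total of 8, so S_total stays on the same 1 to 10 scale as the rubric.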

Combination mechanism of collective intelligence and expert review

The evaluation mechanism adopts a combination of collective intelligence and expert review, achieving effective balance between the two through algorithm optimization. Group evaluation collects scores and comments from ordinary users through crowdsourcing, using anomaly detection algorithms to filter malicious evaluations. Expert review is conducted by a review panel composed of cultural experts and design experts, focusing on cultural accuracy and artistic value.

Table 9 shows the weight allocation strategy of the hybrid evaluation mechanism. Weights are adjusted dynamically according to work type and evaluation stage. Traditional replica works rely more on expert evaluation to ensure cultural accuracy, while innovative fusion works give higher weight to group evaluation to reflect market acceptance. In the initial evaluation stage, the expert weight is higher; as group evaluation data accumulate and their credibility increases, the weight of group evaluation gradually rises. This dynamic adjustment preserves the professionalism of evaluation while fully leveraging the advantages of collective intelligence.

Table 9 Weight allocation strategy for group evaluation and expert review.

To address concerns regarding evaluation reliability and expert scoring consistency, we implemented comprehensive quality assurance procedures for the expert evaluation process. The expert panel comprised 15 specialists selected from three complementary domains: 5 folklore scholars with expertise in Jingchu cultural traditions, 5 professional designers with experience in traditional pattern applications, and 5 cultural heritage conservation specialists from regional museums. All experts possessed a minimum of 10 years of professional experience in their respective fields and demonstrated familiarity with Jingchu folk art traditions.

Prior to the formal evaluation, all experts participated in a four-hour calibration training session. The training included detailed review of the scoring rubric, discussion of boundary cases, and practice scoring of 20 standardized samples with known quality levels. Experts discussed their scores collectively to establish shared understanding of evaluation criteria. Following calibration, a pilot evaluation was conducted on an additional 30 samples, and inter-rater reliability was assessed. Any expert whose scores deviated more than 1.5 standard deviations from the group mean on more than 20% of samples received additional individual calibration.
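The recalibration rule can be written as a short check over the pilot score matrix. This sketch assumes that "group mean" and "standard deviation" are computed per sample across experts; the text does not specify this, and the function name is hypothetical.

```python
import numpy as np


def needs_recalibration(scores: np.ndarray, z_limit: float = 1.5,
                        share_limit: float = 0.20) -> np.ndarray:
    """Flag experts deviating by more than z_limit SDs on more than share_limit of samples.

    scores: array of shape (experts, samples) on the 1-10 scale.
    """
    mean = scores.mean(axis=0)
    sd = scores.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, np.inf)  # samples with full agreement never count as deviations
    deviates = np.abs(scores - mean) / sd > z_limit
    return deviates.mean(axis=1) > share_limit
```

For example, if one of 15 experts scores 3 points above the others on 10 of the 30 pilot samples, that expert deviates on 33% of samples and is flagged, while the others are not.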

Table 10 presents the inter-rater reliability metrics for the expert evaluation panel. The Intraclass Correlation Coefficient (ICC) using a two-way random effects model for absolute agreement was 0.87 (95% CI: 0.82–0.91), indicating excellent reliability. Kendall’s coefficient of concordance (W) was 0.81, and Fleiss’ Kappa for multi-rater agreement was 0.78, both indicating substantial agreement according to conventional benchmarks. These reliability metrics were computed across the full set of 500 randomly selected works evaluated during the empirical study.
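The two-way random-effects, absolute-agreement ICC can be computed from the mean squares of a complete works × raters matrix (McGraw and Wong's ICC(A,1)): ICC = (MS_R − MS_E) / (MS_R + (k − 1)MS_E + (k/n)(MS_C − MS_E)). The text does not state whether the reported 0.87 is the single-rater or the average-rater form, so this sketch shows the single-rater version and assumes no missing scores.

```python
import numpy as np


def icc_a1(x: np.ndarray) -> float:
    """ICC(A,1): two-way random effects, absolute agreement, single rater.

    x: array of shape (n_works, k_raters) with no missing scores.
    """
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between works
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))
```

Because this is an absolute-agreement form, a rater who ranks works identically but scores systematically higher still lowers the ICC. A consistency-form ICC would ignore that offset.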

Table 10 Expert evaluation inter-rater reliability metrics.

Bias control procedures were implemented throughout the evaluation process. All evaluations were conducted using a double-blind protocol where experts were unaware of creator identity, user level, or creation method (AI-assisted vs. manual). Works were presented in randomized order unique to each expert to prevent order effects. Each work was independently evaluated by a minimum of three experts, and the final score was computed as the median of all expert scores to reduce the influence of outlier judgments. For works where expert scores differed by more than 3 points (on the 10-point scale), a fourth senior expert conducted an additional review, and discrepancies were resolved through structured discussion.
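The aggregation rule can be sketched as follows. We read "differed by more than 3 points" as the range between the highest and lowest expert scores exceeding 3; that reading and the function name are assumptions.

```python
from statistics import median


def aggregate_expert_scores(scores: list[float], gap: float = 3.0) -> tuple[float, bool]:
    """Median of independent expert scores (10-point scale) plus a senior-review flag.

    The flag is raised when the highest and lowest scores differ by more than `gap`.
    """
    if len(scores) < 3:
        raise ValueError("each work needs at least three expert scores")
    return median(scores), max(scores) - min(scores) > gap
```

For instance, scores of 4, 8, and 8 give a median of 8 but trigger review by a fourth senior expert, whereas 7, 8, and 9 do not.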

The evaluation credibility of each evaluator combines historical accuracy, professional level, and consistency with the group:

$${C_{eval}}=\alpha \cdot AC{C_{hist}}+\beta \cdot EX{P_{level}}+\gamma \cdot CO{N_{group}}$$
(9)

Where \({C_{eval}}\) is the evaluation credibility, \(AC{C_{hist}}\) is the historical accuracy rate, \(EX{P_{level}}\) is the professional level, \(CO{N_{group}}\) is the consistency with the group, and α, β, and γ are the weighting coefficients of the three terms.

Dynamic feedback and continuous improvement mechanism

Evaluation results are promptly delivered to creators through dynamic feedback mechanisms, helping them understand the strengths and weaknesses of their works. The presentation of feedback information combines visualization and text analysis, providing not only quantitative scores but also qualitative feedback through word clouds, sentiment analysis, and other methods. The continuous improvement mechanism encourages creators to iteratively optimize their works based on feedback. The platform records the improvement effects of each iteration, forming a creative growth trajectory.

Intellectual property protection and incentive mechanism design

The blockchain-based copyright system employs consortium blockchain architecture with distributed nodes maintained by cultural institutions and platform operators. It should be noted that this intellectual property protection framework is presented primarily as a conceptual contribution. While the technical architecture has been fully designed and a functional prototype has been developed, comprehensive empirical validation in real commercial environments was beyond the scope of the current six-month experimental period. The following description presents both the theoretical design and preliminary validation results from limited-scale testing.

Digital fingerprint generation combines perceptual hashing with deep learning features to ensure uniqueness and collision resistance:

$$FP=H(PHash(I)||DeepFeatures(I)||metadata)$$
(10)

Where PHash(I) extracts perceptual hash features invariant to minor modifications, DeepFeatures(I) captures high-level semantic patterns through pre-trained CNNs, and metadata includes timestamp, creator ID, and cultural tags. The concatenated features undergo SHA-256 hashing to produce immutable 256-bit fingerprints. Smart contracts automatically execute copyright operations including transfers, licensing, and revenue distribution based on predefined rules encoded in Solidity.
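Equation (10) can be sketched with NumPy and Python's `hashlib`. The pHash step follows the common recipe: take the 2-D DCT of a 32 × 32 grayscale image and compare the 8 × 8 low-frequency block with its median. Resizing is assumed to happen upstream. The deep-feature vector is a placeholder for a CNN embedding, the metadata keys are hypothetical, and length-prefixing the parts (so the concatenation cannot be parsed ambiguously) is our choice.

```python
import hashlib
import json

import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def phash_bits(gray32: np.ndarray, hash_size: int = 8) -> bytes:
    """pHash-style bits of a 32 x 32 grayscale array (resizing assumed done upstream)."""
    c = dct_matrix(gray32.shape[0])
    coeffs = c @ gray32 @ c.T                # 2-D DCT-II
    low = coeffs[:hash_size, :hash_size]     # keep the low-frequency block
    bits = (low > np.median(low)).flatten()
    return np.packbits(bits).tobytes()       # 64 bits -> 8 bytes


def fingerprint(gray32: np.ndarray, deep_features: np.ndarray, metadata: dict) -> str:
    """Eq. (10): SHA-256 over PHash(I) || DeepFeatures(I) || metadata."""
    parts = [
        phash_bits(gray32),
        np.asarray(deep_features, dtype=np.float32).tobytes(),
        json.dumps(metadata, sort_keys=True).encode("utf-8"),
    ]
    # Length-prefix each part so the concatenation is unambiguous.
    payload = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hashlib.sha256(payload).hexdigest()
```

Because SHA-256 is applied over all three components, the final 256-bit fingerprint changes whenever any input bit changes. Tolerance to minor modifications therefore has to come from comparing the pHash bits themselves (for example by Hamming distance) before the cryptographic hash is taken.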

Rights distribution adopts an improved Shapley value algorithm accounting for creative contributions:

$${\phi _i}=w_{i}^{{original}} \cdot {R_{base}}+\sum\limits_{{S \subseteq N \setminus \{ i\} }} {\frac{{|S|!(|N| - |S| - 1)!}}{{|N|!}}} [v(S \cup \{ i\} ) - v(S)]$$
(11)

Where \({\phi _i}\) is participant i’s revenue share, N is the set of participants, \({R_{base}}\) is the base revenue pool, \(w_{i}^{{original}}\) is the weight of i’s original contribution, and v(S) is the value of coalition S. The Shapley term gives each participant its marginal contribution averaged over all orders in which the coalition can form, so the mechanism rewards both original creation and collaborative enhancement.
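For small teams, such as the 3 to 5 person teams identified as ideal above, Eq. (11) can be evaluated exactly by enumerating coalitions. The value function below is a hypothetical example: three contributors with individual values plus a synergy bonus when the inheritor and the designer collaborate. The base pool, the weights, and the function names are likewise illustrative.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, Sequence


def shapley_shares(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float],
                   original_weights: dict,
                   base_pool: float) -> dict:
    """Eq. (11): base share w_i * R_base plus the exact Shapley value of player i."""
    n = len(players)
    shares = {}
    for p in players:
        others = [q for q in players if q != p]
        phi = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi += weight * (value(s | {p}) - value(s))
        shares[p] = original_weights.get(p, 0.0) * base_pool + phi
    return shares


def demo_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical coalition value: individual values plus a collaboration bonus."""
    solo = {"inheritor": 30.0, "designer": 20.0, "contributor": 10.0}
    bonus = 20.0 if {"inheritor", "designer"} <= coalition else 0.0
    return sum(solo[p] for p in coalition) + bonus


shares = shapley_shares(["inheritor", "designer", "contributor"], demo_value, {}, 0.0)
```

With no base pool, the inheritor and designer each receive their solo value plus half of the 20-point bonus (40 and 30), and the contributor receives 10. The shares sum to v(N) = 80, which is the efficiency property of the Shapley value. Enumeration grows as 2^|N|, so exact computation suits small teams; larger groups usually need sampling approximations.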

The incentive effect prediction model considers user-incentive matching:

$${E_{incentive}}=\sum\limits_{{k=1}}^{K} \mathrm{match}({U_i},{I_k}) \cdot \mathrm{value}({I_k})$$
(12)

Where \({U_i}\) represents user characteristic vectors (cultural knowledge, technical skill, social influence), \({I_k}\) denotes incentive types (points, badges, commercial opportunities), match() calculates compatibility scores through cosine similarity, and value() estimates perceived incentive worth.
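Cosine similarity requires the user vector and the incentive descriptions to live in the same space. The sketch below therefore assumes that each incentive type is described by a profile over the same three user dimensions (cultural knowledge, technical skill, social influence). The profiles and perceived values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expected_incentive(user, incentives):
    """Eq. (12): sum over incentive types of match(U_i, I_k) * value(I_k)."""
    return sum(cosine(user, profile) * value for profile, value in incentives.values())


# Hypothetical incentive profiles and perceived values.
INCENTIVES = {
    "points": ((0.2, 0.2, 0.6), 1.0),
    "badges": ((0.6, 0.2, 0.2), 1.5),
    "commercial opportunities": ((0.2, 0.7, 0.1), 3.0),
}
```

A culturally knowledgeable but technically modest user matches badges more closely than commercial opportunities, so this incentive type contributes more to that user's expected incentive effect.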

During the experimental period, we conducted preliminary validation of the blockchain copyright system through a small-scale pilot study involving 86 participants who volunteered for the prototype testing phase. The pilot ran for eight weeks concurrent with the main experiment, allowing us to assess basic system functionality and user acceptance. Table 11 summarizes the preliminary validation results. Copyright registration completion rate reached 92.3%, with users successfully registering 847 original works on the blockchain. The average time from submission to confirmed registration was 2.3 min, meeting the design target of under 5 min. User satisfaction with system transparency was measured at 4.2 out of 5 through post-pilot surveys. However, several limitations were identified: 12.7% of users reported difficulty understanding the Shapley value distribution mechanism, and the system has not yet processed any actual commercial transactions, leaving the revenue distribution functionality untested in real economic conditions.

Table 11 Blockchain copyright system preliminary validation results (N = 86).

The pilot study also revealed important user feedback regarding the incentive mechanism. Participants expressed strong support for transparent attribution (mean agreement = 4.4/5) and the concept of fair contribution-based distribution (mean agreement = 4.3/5). However, concerns were raised about the complexity of the Shapley value calculation (23.4% of participants requested simpler explanations) and uncertainty about long-term economic sustainability of the platform (18.6% expressed concerns). These findings inform future development priorities.

Given these preliminary results, we explicitly acknowledge that the blockchain-based copyright and incentive mechanism represents a conceptual contribution with initial prototype validation rather than a fully empirically validated system. Comprehensive validation including real commercial transactions, long-term economic sustainability analysis, cross-platform interoperability testing, and large-scale user adoption studies constitute priority directions for future research. The theoretical framework and technical architecture presented here provide a foundation for such future empirical investigation.

Empirical research and model validation

Experimental design and data collection

The empirical part of this research adopts a six-month longitudinal design, conducted from March to August 2024 in Wuhan and surrounding areas, with a total of 486 participants recruited for platform testing and data collection. Participants were recruited with representativeness and diversity in mind: 68 intangible cultural heritage inheritors, 124 design professionals and students, 178 cultural enthusiasts, and 116 ordinary citizens, aged 18 to 72, so that perspectives on cultural inheritance from different generations are represented. This study was approved by the Institutional Review Board of Hubei Institute of Fine Arts. All procedures were conducted in accordance with the ethical principles of the World Medical Association Declaration of Helsinki for research involving human participants. Written informed consent for publication was obtained from all participants.

Table 12 shows the demographic characteristics and cultural background of participants. The gender ratio is approximately balanced (48.6% male, 51.4% female), which helps limit the influence of gender bias on the results. Education level follows an approximately bell-shaped distribution peaking at the bachelor’s level: bachelor’s degree holders account for the largest share (42.8%), followed by master’s degree and above (28.4%) and associate degree (20.1%), while high school and below accounts for only 8.7%. This educational structure may reflect the cultural literacy that digital platforms demand of their users, and it suggests that support for participants with less formal education should be strengthened in future work. In terms of cultural background, 56.2% of participants have some understanding of Jingchu culture, and 23.5% have relevant creative experience. This variation in knowledge provides a good basis for studying the participation patterns of users at different cognitive levels.

Table 12 Demographic characteristics distribution of experimental participants.

The experimental process was conducted in three stages, with each stage setting clear research objectives and data collection focuses. Figure 14 shows the experimental progress and data accumulation through a combination of timeline and data volume changes. The Gantt chart in the upper half clearly marks the time distribution and main tasks of the three stages. The initial stage (March-April) focuses on platform deployment, user recruitment, and basic training. The development stage (May-June) mainly conducts creative practice and community building. The mature stage (July-August) focuses on deep creation and effect evaluation. The area chart in the lower half shows the cumulative growth curves of different types of data. Behavioral log data shows nearly linear growth, ultimately reaching 3.2 million entries. User-generated content shows rapid growth during the development stage, then stabilizes. Social interaction data shows accelerated growth, reflecting the formation of community network effects.

Fig. 14

Experimental progress and data accumulation.

Data collection adopts a multi-source data fusion strategy to ensure comprehensive and in-depth coverage. Behavioral data recorded automatically by the platform cover every user operation, including login time, browsing path, dwell time, click behavior, creation process, saved versions, and other fine-grained information. For user-generated content, the platform collected a cumulative total of 12,847 Jingchu folk pattern works: 3,562 original works (27.7%), 5,831 adapted works (45.4%), and 3,454 collaborative works (26.9%). This distribution of work types is consistent with a growth path from imitative learning through independent creation to collaborative innovation.

Multidimensional evaluation of generation quality

The quality evaluation of generated patterns adopts a comprehensive evaluation system combining technical indicators with cultural indicators, comprehensively measuring model performance through the complementarity of quantitative analysis and qualitative evaluation. To ensure fair and meaningful comparison across different generation approaches, we implemented rigorous baseline control procedures and expanded the comparison to include additional relevant models.

Table 13 details the baseline model configurations and fairness control measures implemented in our comparative evaluation. The original Stable Diffusion model was fine-tuned on the identical Jingchu pattern dataset using the same training-validation-test split as CA-SD, with matched training epochs (100) and comparable computational resources. StyleGAN2 was trained from scratch on the same dataset following the original paper’s recommended configuration, as this architecture requires end-to-end training rather than fine-tuning. For DALL-E 2, direct fine-tuning was not possible due to the closed-source nature of the model; therefore, we conducted zero-shot generation using the official API with carefully designed prompts. To provide more relevant baselines within the diffusion model family, we additionally implemented SD + LoRA (Low-Rank Adaptation) and SD + ControlNet configurations, both fine-tuned on our dataset using established protocols.

Table 13 Baseline model configurations and fairness control measures.

Prompt engineering was strictly controlled across all diffusion-based models to ensure fair comparison. All models used identical prompt templates: “[pattern_type] pattern in traditional Jingchu folk style, featuring [element], [color_scheme], high quality, detailed, traditional Chinese art”. Negative prompts were also standardized: “blurry, low quality, modern style, western elements, distorted, deformed, ugly, duplicate”. The CFG (Classifier-Free Guidance) scale was set to 7.5 for all applicable models. For each pattern category, we generated 100 samples per model using identical random seeds to enable paired comparison.
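The CFG scale controls how far each denoising step is pushed from the unconditional prediction toward the prompt-conditioned one: ε̂ = ε_uncond + s(ε_cond − ε_uncond), with s = 7.5 here. When a negative prompt is supplied, pipelines such as Hugging Face diffusers' `StableDiffusionPipeline` (arguments `guidance_scale`, `negative_prompt`, `generator`) use the negative-prompt prediction in place of ε_uncond. The sketch shows only the combination step and the prompt template. The slot values and the seed list are hypothetical.

```python
import numpy as np


def classifier_free_guidance(eps_uncond: np.ndarray, eps_cond: np.ndarray,
                             scale: float = 7.5) -> np.ndarray:
    """Combine unconditional (or negative-prompt) and conditional noise predictions."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


# Prompt template and negative prompt from the text, with {} slots instead of [].
TEMPLATE = ("{pattern_type} pattern in traditional Jingchu folk style, featuring {element}, "
            "{color_scheme}, high quality, detailed, traditional Chinese art")
NEGATIVE = ("blurry, low quality, modern style, western elements, distorted, deformed, "
            "ugly, duplicate")
SEEDS = list(range(100))  # the same seeds for every model give paired samples

prompt = TEMPLATE.format(pattern_type="cloud", element="phoenix motifs",
                         color_scheme="red and black")  # hypothetical slot values
```

Fixing the seed list across models means that sample k from each model starts from the same initial latent noise, which is what makes the per-sample paired comparisons meaningful.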

Figure 15 compares six models across eight evaluation dimensions. The radar chart shows the advantages of the CA-SD model proposed in this research (red solid line) relative to the five baseline models. CA-SD scores 0.89 on cultural authenticity, a 23.6% improvement over the original SD model (0.72), mainly owing to the cultural-aware attention mechanism. Notably, SD + LoRA reaches 0.82 cultural authenticity, showing that domain adaptation improves performance, while the full CA-SD architecture provides additional gains through the specialized cultural attention mechanism. SD + ControlNet reaches 0.79, suggesting that structural control alone is insufficient for capturing cultural semantics. In detail fidelity, CA-SD scores 0.86, second only to StyleGAN2 (0.88), which specifically optimizes detail generation, but CA-SD has clear advantages in generation speed. Style consistency is another strength of CA-SD: its score of 0.91 indicates that the model stably generates patterns that conform to Jingchu cultural characteristics. Although DALL-E 2 achieves the highest innovation score (0.85), its cultural authenticity is relatively low (0.68). This trade-off reveals the limitations of general-purpose generation models in culture-specific applications.

Fig. 15

Comparative analysis of six models across eight evaluation dimensions.

Table 14 presents the complete quantitative comparison results with statistical significance testing. For each metric, we report mean values with standard deviations across all test samples, and conduct pairwise t-tests between CA-SD and each baseline model. All improvements of CA-SD over baseline models on cultural authenticity and style consistency are statistically significant (p < 0.001 after Bonferroni correction for multiple comparisons). The FID improvement over Original SD (28.7 vs. 42.3) represents a 32.2% reduction, while the improvement over SD + LoRA (28.7 vs. 31.5) represents a more modest but still significant 8.9% reduction, demonstrating the incremental value of our cultural-aware architecture beyond standard adaptation techniques.
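The percentage changes quoted in this section follow the usual relative-change formula. Bonferroni correction multiplies each p-value by the number of comparisons in the family. The minimal sketch below reproduces the reported figures. The helper names are hypothetical, and the family size used for the correction (for example, five baselines per metric) is an assumption, since the text does not state it.

```python
def relative_change(baseline: float, new: float) -> float:
    """Percent change of `new` relative to `baseline` (negative means a reduction)."""
    return (new - baseline) / baseline * 100.0


def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment: multiply each p-value by the family size m, cap at 1."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]


print(round(relative_change(0.72, 0.89), 1))  # cultural authenticity vs. Original SD -> 23.6
print(round(relative_change(42.3, 28.7), 1))  # FID vs. Original SD -> -32.2
print(round(relative_change(31.5, 28.7), 1))  # FID vs. SD + LoRA -> -8.9
```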

Table 14 Complete quantitative comparison results with statistical significance.

It is important to acknowledge limitations in our baseline comparisons. The DALL-E 2 comparison is inherently constrained by the closed-source nature of the model, preventing fine-tuning on our domain-specific dataset. Therefore, the DALL-E 2 results primarily demonstrate the importance of domain adaptation rather than providing a fair architectural comparison. Future work should include comparisons with other open-source large-scale models such as SDXL and Stable Diffusion 3 when cultural fine-tuning becomes feasible.

Table 15 details the generation quality indicators for different pattern types. Lower FID (Fréchet Inception Distance) scores indicate better generation quality. Cloud pattern generation has the most stable quality, with a mean FID of only 23.4 and a standard deviation of 3.2. This is possibly because cloud patterns are highly regular and the model learns curved features well. In contrast, composite patterns are considerably harder to generate, with a mean FID of 38.7 and a standard deviation of 8.6. This reflects the challenges the model faces when handling multi-element combinations and spatial relationships. Expert scores are strongly negatively correlated with FID scores (r = -0.76), but the two are not fully consistent. For fusion innovation patterns in particular, the FID score is relatively high (35.2), yet the expert score reaches 8.1. This indicates that technical indicators and cultural value assessments diverge and should be considered together.
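For reference, FID is conventionally computed by fitting Gaussians to Inception-v3 features of real and generated images and measuring the Fréchet distance between the two fits. The text does not state the feature extractor or the reference image set, so applying the standard definition below to this study is an assumption about the exact setup:

$$\mathrm{FID}=\lVert \mu_r-\mu_g\rVert_2^2+\mathrm{Tr}\left(\Sigma_r+\Sigma_g-2\left(\Sigma_r\Sigma_g\right)^{1/2}\right)$$

Here \(\mu_r, \Sigma_r\) are the mean and covariance of features from real pattern images, and \(\mu_g, \Sigma_g\) are those of generated images. Because FID estimates depend on sample size and on the reference set, per-category values such as 23.4 and 38.7 are most comparable when each category is evaluated with the same number of samples against a reference set of similar size.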

Table 15 Generation quality evaluation results for different pattern types.

For the cultural authenticity evaluation, a panel of 15 folklore experts conducted blind reviews of 500 randomly selected generated works. Figure 16 shows the distribution of expert scores across the evaluation dimensions as box plots. Cultural element accuracy received the highest and most stable scores, with a median of 8.8 and an interquartile range of only 0.7. This indicates that the model has learned the basic element characteristics of Jingchu culture. Scores for compositional logic rationality were more dispersed, with a median of 7.5 and several outliers (lowest score 4.2). These low-scoring samples mainly combined elements from different historical periods or usage contexts inappropriately. Overall style coordination and innovative performance show a complementary relationship: traditional-style works score higher on coordination (median 8.2), while fusion innovation works score higher on innovation (median 8.0). Cultural connotation communication, the most subjective dimension, has the most dispersed score distribution but a good overall level (median 7.6). This indicates that the generated patterns not only approximate tradition in form but also carry some cultural depth in meaning.

Fig. 16

Box plot of expert score distribution.

User satisfaction surveys collected 2,847 valid responses through an online questionnaire system, covering 58.6% of active users. Satisfaction differed clearly among user groups. Professional designers have strict aesthetic requirements (average score 3.8, below the overall average of 4.2), but they rated the platform’s innovation highly (4.6). Cultural inheritors cared most about cultural expression, and their score on this dimension (4.5) was significantly higher than that of other groups, supporting the model’s success in learning cultural features. Ordinary users valued ease of use and entertainment more and were more tolerant of imperfect generation results, with overall satisfaction reaching 4.3.

User participation behavior analysis

The analysis of user participation behavior reveals the complex patterns and evolution rules of user interaction in digital cultural heritage platforms. Through in-depth mining of behavioral data during the six-month experimental period, the research identified four typical participation patterns: exploratory, creative, social, and hybrid.

Figure 17 uses a Sankey diagram to show how user behavior patterns evolved. In the initial stage, 67% of users (325 people) exhibited exploratory behavior, mainly passive activities such as browsing and collecting. As users became familiar with the platform and their skills improved, behavior patterns diverged markedly. Of the exploratory users, 42% (137 people) shifted to the creative type and began attempting independent creation, and 28% (91 people) shifted to the social type, actively commenting and collaborating. The remaining 30% stayed exploratory or churned. By the later experimental period, the proportion of hybrid users had risen significantly to 31%. These users combined creation and social interaction and became the core of the community ecosystem. The evolution of behavior patterns followed clear stages, with most users needing 2–3 weeks of exploration before settling on a participation direction.

Fig. 17

Sankey diagram of user behavior pattern evolution.

Time distribution analysis of creative behavior revealed interesting regularities. Table 16 details the characteristics of creative activities at different time periods, showing obvious time preferences for creative activities. The evening period (19:00–22:00) is the creative peak, contributing 43.2% of work output, and the quality score of works during this period (average 7.8) is 12.3% higher than daytime creation. This phenomenon may be related to creators being more likely to be inspired in a relaxed state, and also reflects that most users treat cultural creation as a hobby. Weekend creation volume is 1.8 times that of weekdays, but interestingly, works created on weekdays score higher in innovation (8.2 vs. 7.6), possibly because time pressure stimulates creative efficiency and innovative thinking. Although late-night periods (22:00–02:00) have less creation volume, they produce some of the most experimental works. While these works may deviate from tradition, they provide valuable exploration for cultural innovation.

Table 16 Analysis of user creative behavior characteristics at different time periods.

Analysis of social interaction behavior revealed network effects of knowledge dissemination and cultural learning. Figure 18 uses a force-directed network diagram to show the interaction relationship structure in the community. Node size represents users’ influence index, line thickness represents interaction frequency, and color depth reflects interaction quality (based on content cultural relevance scores). Network analysis identified 23 core nodes (opinion leaders). Although these users account for only 4.7% of the total, their published content received 38.6% of interactions. The network shows obvious small-world characteristics, with an average path length of only 3.2 and a clustering coefficient of 0.42, indicating that information can spread quickly while maintaining local tight connections. Particularly noteworthy is that cultural inheritor nodes (red in the figure), although few in number, often occupy bridging positions in the network, connecting different user groups and playing key roles in cultural knowledge dissemination.
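The two small-world statistics quoted here can be computed with their standard definitions. The sketch below assumes an undirected, unweighted interaction graph stored as an adjacency dictionary. The figure's graph also carries edge weights and quality scores, which these two statistics ignore. The results mirror what NetworkX's `average_clustering` and `average_shortest_path_length` compute, except that path lengths are averaged over reachable pairs, so a disconnected graph does not raise an error. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs, via BFS."""
    dist_sum, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return dist_sum / pairs


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```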

Fig. 18

Social interaction network diagram.

User retention and activity analysis shows the platform’s stickiness characteristics. Figure 19 shows multidimensional analysis of user retention rate and activity through a composite chart. The retention rate curve in the upper part shows a 30-day retention rate of 42.7% and a 90-day retention rate of 28.3%, higher than the average level of general creative communities (35% and 20% respectively). More importantly, the retention curve tends to flatten after day 60, forming a stable core user group. The stacked area chart in the lower part shows the active time allocation of users with different participation depths. Deep participation users (weekly active time > 10 h) spend 41% of their time on creation, 38% on browsing and learning, and 21% on social interaction. This relatively balanced time allocation reflects good integration of platform functions. Moderate participation users are more inclined to browse (52%) and socialize (28%), while light users mainly browse (76%).
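Retention definitions vary between platforms, and the text does not spell out the one used here. The minimal sketch below rests on one common assumption: a user counts as retained at day N if they are active on or after day N following signup. Users whose day N falls after the end of observation are excluded from the denominator. The function and its arguments are hypothetical.

```python
from datetime import date, timedelta


def n_day_retention(signup, activity, n, observation_end):
    """Share of eligible users active on or after day `n` after signup.

    signup:          dict user -> signup date
    activity:        dict user -> iterable of active dates
    observation_end: last date covered by the activity log
    """
    eligible = retained = 0
    for user, start in signup.items():
        cutoff = start + timedelta(days=n)
        if cutoff > observation_end:
            continue  # not observed long enough to judge retention
        eligible += 1
        if any(d >= cutoff for d in activity.get(user, ())):
            retained += 1
    return retained / eligible if eligible else 0.0
```

With `n=30` and `n=90` this yields the two points on a retention curve like the one in the upper panel of Fig. 19.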

Fig. 19

User retention rate and activity analysis.

To move beyond descriptive statistics and establish causal relationships between participation mechanisms and user outcomes, we conducted comprehensive statistical analyses including analysis of variance, multiple regression, longitudinal panel analysis, and propensity score matching. These analyses address the concern that observed correlations between participation depth and outcomes might be driven by confounding factors or self-selection effects.

First, we performed one-way analysis of variance (ANOVA) to test whether users at different participation depth levels showed significantly different 90-day retention rates. Participants were categorized into three groups: light users (weekly active time < 3 h, N = 178), moderate users (3–10 h, N = 203), and deep users (> 10 h, N = 105). The ANOVA revealed significant differences among groups, F(2, 483) = 45.67, p < 0.001, η² = 0.16. Post-hoc Tukey HSD tests showed that deep users had significantly higher 90-day retention (67.8%) than moderate users (38.2%, p < 0.001, d = 0.89) and light users (18.5%, p < 0.001, d = 1.42). Moderate users also had significantly higher retention than light users (p < 0.001, d = 0.68). The large effect size (η² = 0.16) indicates that participation depth explains a substantial portion of variance in retention outcomes.
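The reported effect size can be checked from the F statistic alone. For a one-way ANOVA, η² = SS_between / (SS_between + SS_within) = F·df_between / (F·df_between + df_within). The sketch below computes the ANOVA from raw group values and also recovers η² from the reported F(2, 483) = 45.67. The helper names are hypothetical, and the Tukey HSD post-hoc step is not shown.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within, eta_squared) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, ss_between / (ss_between + ss_within)


def eta_squared_from_f(f_stat, df_between, df_within):
    """Recover eta squared from a reported F and its degrees of freedom."""
    return f_stat * df_between / (f_stat * df_between + df_within)


print(round(eta_squared_from_f(45.67, 2, 483), 2))  # -> 0.16
```

The degrees of freedom (2, 483) are consistent with three groups totaling 178 + 203 + 105 = 486 users.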

Table 17 presents the results of hierarchical multiple regression analysis predicting 90-day retention rate. In Model 1, we entered demographic control variables (age, gender, education level, professional background). In Model 2, we added the three key predictors from the CPDC framework: participation depth, cultural cognitive level, and social connection strength. Model 1 explained only 8.2% of variance (R² = 0.082), while Model 2 explained 38.4% (R² = 0.384), representing a significant improvement (ΔR² = 0.302, F change = 78.45, p < 0.001). Participation depth emerged as the strongest predictor (β = 0.42, p < 0.001), followed by cultural cognitive level (β = 0.23, p < 0.001) and social connection strength (β = 0.18, p < 0.001). Among control variables, age showed a small negative effect (β = -0.08, p = 0.008), while education level had a modest positive effect (β = 0.06, p = 0.046).
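The significance of the R² increment is assessed with the standard F-change test: F = (ΔR²/m) / ((1 − R²_full)/(N − k − 1)), where m is the number of predictors added and k is the number of predictors in the full model. The sketch below assumes N = 486 and k = 7, with the four controls entered as single predictors plus the three CPDC predictors. Under that assumption it gives F ≈ 78.1, close to the reported 78.45. The small gap is plausibly due to rounded R² values or dummy coding of categorical controls, which the text does not specify.

```python
def f_change(r2_reduced, r2_full, n_added, n_obs, k_full):
    """F statistic for the R-squared increment between nested OLS models."""
    numerator = (r2_full - r2_reduced) / n_added
    denominator = (1.0 - r2_full) / (n_obs - k_full - 1)
    return numerator / denominator


# Assumed: N = 486 and seven predictors in Model 2, three of them added in step 2.
print(round(f_change(0.082, 0.384, n_added=3, n_obs=486, k_full=7), 1))  # -> 78.1
```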

Table 17 Hierarchical multiple regression analysis predicting 90-day retention rate.

To strengthen causal inference, we conducted longitudinal panel analysis using fixed effects models that control for time-invariant individual characteristics. Monthly measurements of participation depth, creation quality, and cultural identity were collected for all 486 participants across the six-month period, yielding 2,916 person-month observations. The fixed effects model specification was:

$$Y_{it}=\alpha_i+\beta_1\,\mathrm{ParticipationDepth}_{it}+\beta_2\,\mathrm{Time}_t+\epsilon_{it}\quad$$
(13)

where \(Y_{it}\) is the outcome for individual i at time t, \(\alpha_i\) captures all time-invariant individual characteristics, and \(\mathrm{Time}_t\) controls for temporal trends. Table 18 presents the fixed effects panel regression results. After controlling for individual fixed effects, within-person changes in participation depth significantly predicted within-person changes in creation quality (β = 0.35, SE = 0.04, t = 8.75, p < 0.001). This result rules out the possibility that the cross-sectional correlation between participation and quality is driven entirely by stable individual differences (e.g., innate talent or pre-existing cultural knowledge). The Hausman test favored the fixed effects specification over random effects (χ² = 34.56, p < 0.001).
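Eq. (13) is typically estimated with the within transformation. Demeaning every variable by individual removes \(\alpha_i\), and OLS on the demeaned data then gives \(\beta_1\) and \(\beta_2\). The minimal pure-Python sketch below assumes a linear time trend as written in Eq. (13) and panel rows of the form (id, t, x, y). It returns point estimates only, with no standard errors or Hausman test. The function name is hypothetical.

```python
from collections import defaultdict


def within_estimator(rows):
    """Fixed-effects (within) OLS for y_it = a_i + b1*x_it + b2*t + e_it.

    rows: iterable of (individual_id, t, x, y). Returns (b1, b2).
    """
    groups = defaultdict(list)
    for i, t, x, y in rows:
        groups[i].append((t, x, y))
    # Demean every variable within individual; this sweeps out a_i.
    demeaned = []
    for obs in groups.values():
        k = len(obs)
        t_bar = sum(o[0] for o in obs) / k
        x_bar = sum(o[1] for o in obs) / k
        y_bar = sum(o[2] for o in obs) / k
        demeaned += [(x - x_bar, t - t_bar, y - y_bar) for t, x, y in obs]
    # Solve the 2x2 normal equations for (b1, b2) with Cramer's rule.
    sxx = sum(x * x for x, _, _ in demeaned)
    stt = sum(t * t for _, t, _ in demeaned)
    sxt = sum(x * t for x, t, _ in demeaned)
    sxy = sum(x * y for x, _, y in demeaned)
    sty = sum(t * y for _, t, y in demeaned)
    det = sxx * stt - sxt * sxt
    b1 = (sxy * stt - sxt * sty) / det
    b2 = (sxx * sty - sxt * sxy) / det
    return b1, b2
```

Because each individual's own mean is subtracted, any stable trait (talent, prior knowledge) contributes nothing to the slope estimates. This is the property the paragraph above relies on.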

Table 18 Fixed effects panel regression results (N = 486 individuals, 2916 observations).

Finally, to address potential self-selection bias—the possibility that users who choose to engage deeply differ systematically from those who do not—we employed propensity score matching (PSM). We estimated propensity scores for high engagement (weekly active time > 10 h) based on baseline characteristics measured during the first week of participation: initial cultural knowledge score, age, education level, professional background, prior digital platform experience, and initial technology comfort level. Using nearest-neighbor matching with caliper of 0.1 standard deviations, we matched 98 high-engagement users with 98 low-engagement users (< 3 h weekly). Table 19 shows that after matching, the two groups were well-balanced on all baseline covariates (all standardized mean differences < 0.10, all p > 0.10), indicating successful matching.
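The minimal sketch below shows greedy 1:1 nearest-neighbor matching without replacement inside a caliper, plus the standardized mean difference used in balance checks. It assumes propensity scores have already been estimated, for example by logistic regression on the baseline covariates listed above. It interprets the 0.1 SD caliper as 0.1 times the standard deviation of the pooled raw scores. The text states neither whether the raw score or its logit was used nor the matching order, so both choices here are assumptions, as are the function names.

```python
from statistics import mean, stdev


def caliper_match(treated, control, caliper_sd=0.1):
    """Greedy 1:1 nearest-neighbour matching on propensity scores, no replacement.

    treated, control: dict unit_id -> propensity score.
    The caliper is caliper_sd times the SD of the pooled scores.
    """
    caliper = caliper_sd * stdev(list(treated.values()) + list(control.values()))
    available = dict(control)
    pairs = []
    # Assumed order: match treated units with the highest scores first.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
    return pairs


def standardized_mean_difference(a, b):
    """SMD = (mean_a - mean_b) / sqrt((var_a + var_b) / 2)."""
    pooled_sd = ((stdev(a) ** 2 + stdev(b) ** 2) / 2) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd
```

Treated units with no control inside the caliper are dropped. This explains why a matched sample (here 98 pairs) can be smaller than the original high-engagement group.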

Table 19 Propensity score matching balance check.

On the matched sample, high-engagement users showed significantly higher 90-day retention (68.4% vs. 21.4%, χ² = 43.67, p < 0.001) and creation quality (7.82 vs. 6.45, t = 6.89, p < 0.001). The effect size for creation quality (Cohen’s d = 0.72, 95% CI: 0.58–0.86) is a medium-to-large effect by conventional benchmarks (d = 0.5 medium, d = 0.8 large). These matched-sample results strengthen the causal interpretation that participation depth directly influences outcomes, rather than merely correlating with them through confounding individual characteristics.
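The reported χ² can be reproduced from the matched sample. Assuming 98 users per group, the retention percentages correspond to 67 of 98 high-engagement users and 21 of 98 low-engagement users retained. These counts are reconstructed from the percentages, not reported directly. Pearson's χ² without continuity correction then gives about 43.64, consistent with the reported 43.67 once rounding of the percentages is taken into account. The helper name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: high vs. low engagement; columns: retained vs. not retained at 90 days.
# Counts reconstructed from 68.4% and 21.4% of 98 users per group.
print(round(chi_square_2x2(67, 31, 21, 77), 2))  # -> 43.64
```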

Participation depth shows strong positive correlation with retention rate (r = 0.82), with deep participation users having a 90-day retention rate as high as 67.8%, emphasizing the importance of cultivating deep user participation. The convergent evidence from ANOVA, regression, panel analysis, and PSM collectively supports the causal claim that participatory mechanisms effectively promote user engagement and creative outcomes, providing robust empirical validation for the CPDC theoretical framework.

Quantitative analysis of cultural diversity impact

The quantitative evaluation of cultural diversity adopts a method combining improved Shannon-Wiener index with cultural distance measurement, assessing the cultural impact of co-creation mechanisms by analyzing diversity changes in platform-generated content.

To address concerns about ambiguous metric definitions, we provide a rigorous mathematical formulation of the cultural diversity index and its component measures. The cultural diversity index employed in this study is an improved Shannon-Wiener index adapted for cultural content analysis:

$$H^{\prime}=\left(-\sum_{i=1}^{n} p_i \ln p_i\right)\times\left(1+D_C\right)\quad$$
(14)

where \({p_i}\) represents the proportion of the i-th cultural element category in the analyzed sample, n is the total number of cultural element categories (n = 12 in our classification scheme), and \({D_C}\) is the element combination diversity coefficient. The combination diversity coefficient is calculated as:

$${D_C}=\sqrt {\frac{{{C_{actual}}}}{{{C_{max}}}}} \quad$$
(15)

where \({C_{actual}}\) is the number of unique element combinations observed in the sample, and \({C_{max}}\) is the theoretical maximum number of combinations given the element categories present. This formulation extends the classic Shannon-Wiener index by incorporating not only element frequency distribution but also the richness of combinatorial creativity.

Table 20 details the cultural element classification scheme and the specific calculation procedure. We categorized all cultural elements into 12 primary classes based on traditional Jingchu pattern taxonomy established by folklore experts. For each monthly sample, we counted the frequency of each element category and computed proportions. The actual combination count was determined by identifying unique pairs or triplets of elements co-occurring within single works. The theoretical maximum was calculated based on the number of active categories in that month.
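The calculation procedure can be sketched as follows. The text does not fix three details, which are assumptions here: each work is represented as the set of element categories it contains; category proportions are computed from these per-work occurrences; and \(C_{max}\) counts all pairs and triplets of the k categories active in the sample, C(k,2) + C(k,3). The function name is hypothetical.

```python
import math
from itertools import combinations


def improved_shannon(works):
    """Return (H', Shannon term, D_C) following Eqs. (14)-(15).

    works: list of sets; each set holds the element categories used in one work.
    """
    counts = {}
    for elements in works:
        for category in elements:
            counts[category] = counts.get(category, 0) + 1
    total = sum(counts.values())
    shannon = -sum((c / total) * math.log(c / total) for c in counts.values())

    # Unique pairs and triplets of categories co-occurring within single works.
    observed = set()
    for elements in works:
        for size in (2, 3):
            observed.update(combinations(sorted(elements), size))

    k = len(counts)  # categories active in this sample
    c_max = math.comb(k, 2) + math.comb(k, 3)  # assumed definition of C_max
    d_c = math.sqrt(len(observed) / c_max) if c_max else 0.0
    return shannon * (1.0 + d_c), shannon, d_c
```

When no work combines elements, \(D_C = 0\) and H' reduces to the classic Shannon-Wiener index. When every possible pair and triplet occurs, the Shannon term is doubled.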

Table 20 Cultural element classification and diversity index calculation procedure.

The diversity index interpretation framework was established through calibration with reference datasets. We computed H’ values for three reference conditions: (1) a uniform distribution where all 12 categories have equal frequency (H’_max = 4.17), (2) a highly concentrated distribution where one category dominates 90% of content (H’_min = 0.78), and (3) the empirical distribution in traditional museum collections (H’_reference = 2.85). Based on this calibration, we established the following interpretation thresholds: H’ < 2.0 indicates low diversity (cultural expression dominated by few elements); 2.0 ≤ H’ < 3.0 indicates moderate diversity (healthy but limited variation); 3.0 ≤ H’ < 4.0 indicates high diversity (rich cultural expression); H’ ≥ 4.0 indicates very high diversity (approaching maximum possible variation). The threshold of 3.0 as “high diversity” was validated by expert consensus that samples in this range demonstrated satisfactory representation of Jingchu cultural breadth.
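The interpretation bands can be encoded directly. The Shannon term of a uniform distribution over 12 categories is ln 12 ≈ 2.48, so under Eq. (14) the reported H'_max = 4.17 implies a combination coefficient \(D_C\) of roughly 0.68. And because \(D_C\) cannot exceed 1, Eq. (14) caps H' at 2 ln 12 ≈ 4.97 for 12 categories. The sketch below reproduces that arithmetic. The function names are hypothetical.

```python
import math

# Upper bounds of the interpretation bands stated in the text.
BANDS = [(2.0, "low"), (3.0, "moderate"), (4.0, "high")]


def diversity_band(h_prime: float) -> str:
    """Map an H' value to the low / moderate / high / very high bands."""
    for upper, label in BANDS:
        if h_prime < upper:
            return label
    return "very high"


def implied_combination_coefficient(h_prime: float, shannon_term: float) -> float:
    """Solve Eq. (14) for D_C given H' and the plain Shannon term."""
    return h_prime / shannon_term - 1.0


uniform_shannon = math.log(12)  # Shannon term for 12 equally frequent categories
print(round(uniform_shannon, 2))                                          # -> 2.48
print(round(implied_combination_coefficient(4.17, uniform_shannon), 2))  # -> 0.68
```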

Table 21 presents the monthly evolution of diversity index components throughout the experimental period, demonstrating the robustness and interpretability of our measurement approach.

Table 21 Monthly evolution of cultural diversity index components.

Figure 20 shows the dynamic changes of the cultural diversity index during the experimental period and decomposition of influencing factors. The main curve shows that the diversity index continuously rises from an initial 2.31, reaches an inflection point in the third month (H’=3.24), then growth slows, finally stabilizing around 3.67. This S-shaped growth curve conforms to expectations of innovation diffusion theory: in the initial period, users mainly imitate traditional patterns, with slow diversity growth; in the middle period, as user skills improve and innovation consciousness awakens, diversity increases rapidly; in the later period, due to constraints of cultural norms and convergence of community aesthetics, diversity growth tends to saturate. The contribution decomposition chart below shows that new element introduction contributes 42% of diversity growth, element recombination contributes 35%, and style innovation contributes 23%. This decomposition reveals the main paths of cultural innovation: users first try to introduce new visual elements, then explore combination methods of different elements, and finally innovate overall style.

Fig. 20

Dynamic changes in cultural diversity index.

To validate the robustness of our diversity measurements, we conducted several sensitivity analyses. First, we varied the number of element categories from 8 to 16 and found that while absolute H’ values changed, the temporal pattern and relative rankings remained consistent (Spearman ρ > 0.95 across all configurations). Second, we computed diversity indices using alternative formulations including Simpson’s index and Berger-Parker index, finding strong convergent validity (correlations with H’ > 0.88). Third, we bootstrapped 95% confidence intervals (1,000 resamples) for each monthly H’ estimate, confirming that the increasing trend was statistically robust (non-overlapping CIs between months 1–2 and months 5–6).
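The bootstrap step and the alternative indices can be sketched as below. The text does not say which Simpson or Berger-Parker variant was used. This sketch uses the Gini-Simpson form 1 − Σp_i² and the inverse Berger-Parker index 1/max p_i. Both increase with diversity, so their correlations with H' would be positive. The percentile interval, the seed handling, and resampling element occurrences with the plain Shannon term (rather than the full Eq. (14)) are simplifying assumptions, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def proportions(labels):
    n = len(labels)
    return [c / n for c in Counter(labels).values()]


def shannon(labels):
    """Plain Shannon term -sum p ln p (without the (1 + D_C) factor)."""
    return -sum(p * math.log(p) for p in proportions(labels))


def gini_simpson(labels):
    """1 - sum p^2: probability that two random draws fall in different categories."""
    return 1.0 - sum(p * p for p in proportions(labels))


def inverse_berger_parker(labels):
    """1 / max p: reciprocal of the share of the most common category."""
    return 1.0 / max(proportions(labels))


def bootstrap_ci(labels, stat=shannon, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample with replacement, take empirical quantiles."""
    rng = random.Random(seed)
    estimates = sorted(stat(rng.choices(labels, k=len(labels)))
                       for _ in range(n_resamples))
    tail = (1.0 - level) / 2.0
    lo = estimates[round(tail * (n_resamples - 1))]
    hi = estimates[round((1.0 - tail) * (n_resamples - 1))]
    return lo, hi
```

Comparing the intervals of two months then amounts to checking whether the `(lo, hi)` ranges overlap.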

Analysis of cultural element usage frequency reveals users’ preferences among cultural symbols and their innovation tendencies. Table 22 summarizes the usage of the top 20 cultural elements and their trends. Traditional core elements maintain stable, high usage rates. Cloud patterns top the list at 18.7%, down only 1.2 percentage points over six months, reflecting their status as a representative element of Jingchu culture. Notably, some previously marginal elements were revived on the platform. Chu script usage rose from 0.8% to 3.4% (a 325% increase), and bronze vessel patterns rose from 1.2% to 4.1% (a 242% increase). This “long tail effect” not only enriches the diversity of cultural expression but also offers new approaches to protecting endangered cultural elements. The moderate introduction of modern design elements (from 0 to 2.8%) indicates that the platform leaves room for innovation while maintaining cultural authenticity.

Table 22 Analysis of cultural element usage frequency changes (TOP20).

Innovation pattern analysis identified four main cultural innovation paths. Figure 21 presents an innovation path diagram showing the characteristics of the different innovation modes and the transitions between them. Element recombination is the most common mode (38.2%): users create novelty by placing traditional elements in new spatial arrangements. Style fusion (26.7%) produces distinctive effects by mixing artistic styles from different periods or regions. Modern translation (20.4%) reinterprets traditional patterns in modern design language. Concept extension (14.7%) is the most challenging mode, achieving innovation by extracting cultural concepts and giving them new visual forms. Arrow thickness in the path diagram represents the frequency of transitions between modes. Most users start with element recombination and gradually attempt more complex modes. Community acceptance rates differ significantly across modes, from 78.3% for element recombination to only 45.2% for concept extension. This reflects the balance between conservatism and radicalism in cultural innovation.

The evaluation of cultural dissemination effects is conducted by tracking changes in users’ cultural cognition and dissemination behavior. Pre- and post-experiment cultural knowledge tests show that participating users’ Jingchu cultural knowledge scores improved by an average of 34.7%, with implicit knowledge gained through platform learning accounting for a large proportion. The comparative radar chart in Fig. 21 clearly shows the multidimensional improvement in users’ cultural cognition: cultural historical knowledge improved from 5.8 to 7.2 (24.1% improvement); pattern recognition ability from 6.2 to 8.1 (30.6% improvement); creative skills from 4.5 to 7.3 (62.2% improvement); aesthetic judgment from 5.9 to 7.6 (28.8% improvement); cultural confidence from 6.1 to 8.4 (37.7% improvement). The significant improvement in creative skills indicates that practical participation is an effective way of cultural learning, while the notable enhancement of cultural confidence shows that the platform not only disseminated knowledge but more importantly changed users’ attitudes toward traditional culture.

Fig. 21

Innovation paths and cultural cognition analysis.

Social media dissemination analysis found that 27.3% of users actively shared their own creations or others’ works on platforms like WeChat and Weibo, cumulatively generating over 150,000 secondary disseminations. The young group (18–25 years old) has the highest sharing rate (41.2%) and tends to engage in secondary creation and topic discussion on social media. This spontaneous dissemination behavior greatly expands the influence range of Jingchu culture. Analysis of disseminated content shows that works gaining high dissemination volume often have three characteristics: strong visual impact, rich cultural storytelling, and ease of understanding and imitation. These findings provide important insights for future cultural dissemination strategies.

Conclusion

This research addressed three interrelated research questions concerning the integration of AI generation technology with participatory design for cultural heritage preservation. RQ1 examined how to design a culturally-aware AI generation architecture that enhances cultural authenticity while maintaining technical quality. RQ2 investigated what user participation mechanisms effectively promote digital co-creation and sustained engagement. RQ3 explored how community participatory platforms influence cultural diversity and users’ cultural cognitive development. Through a six-month longitudinal empirical study with 486 participants, this research provides systematic answers to these questions with differentiated levels of empirical validation.

Regarding RQ1, the proposed CA-SD architecture achieved substantial performance improvements through its cultural-aware attention mechanism and progressive training strategy. The CA-SD model attained a cultural authenticity score of 0.89 while maintaining competitive generation quality (average FID 28.7), with particularly strong results in cloud pattern generation (FID 23.4 ± 3.2) that surpassed baseline models including Original SD, SD + LoRA, and SD + ControlNet. Comprehensive ablation studies isolated the contribution of each architectural component. The cultural-aware attention mechanism contributed the largest improvement (FID reduction of 15.8%), the multi-scale feature fusion module provided additional gains, and the cultural consistency loss further enhanced cultural fidelity. These technical contributions are fully empirically validated through rigorous comparative experiments with statistical significance testing.

Regarding RQ2, the CPDC (Cultural Participatory Design Cycle) theoretical framework identified cultural cognitive level (β = 0.42), technology acceptance (β = 0.38), and social connection strength (β = 0.31) as critical determinants of user engagement depth. Structural equation modeling confirmed good model fit (CFI = 0.94, TLI = 0.92, RMSEA = 0.058). Longitudinal panel analysis with fixed effects demonstrated that within-person changes in participation depth significantly predicted creation quality improvements (β = 0.35, p < 0.001). Collaborative creators demonstrated 67.8% 90-day retention rates compared to 31.2% for individual creators. Propensity score matching confirmed a medium-to-large effect on creation quality (Cohen’s d = 0.72) after controlling for self-selection bias. These mechanism-related findings are empirically validated through comprehensive statistical analyses including ANOVA, hierarchical regression, and causal inference techniques.

Regarding RQ3, the platform successfully generated 12,847 high-quality Jingchu folk pattern works, with the cultural diversity index (improved Shannon-Wiener index) increasing from 2.31 to 3.67 over the experimental period—a 58.9% improvement representing transition from moderate to high diversity according to our calibrated interpretation framework. The diversity growth followed an S-shaped curve consistent with innovation diffusion theory, with new element introduction contributing 42%, element recombination 35%, and style innovation 23% to overall growth. Users’ cultural knowledge scores improved by 34.7% through platform engagement, and 27.3% of participants actively disseminated their creations through social media channels, generating over 150,000 secondary propagations. These cultural impact findings are validated through pre-post cognitive testing, longitudinal diversity tracking, and behavioral log analysis.

The blockchain-based copyright system utilizing improved Shapley value algorithms for fair rights distribution represents a conceptual contribution with preliminary validation rather than a fully empirically validated component. Prototype testing with 86 participants demonstrated technical feasibility (92.3% registration completion rate, average confirmation time 2.3 min) and acceptable user satisfaction (4.2/5 for transparency). However, comprehensive validation including real commercial transactions, long-term economic sustainability, and large-scale adoption remains as priority future work.

This research makes several contributions to the fields of digital cultural heritage and AI-enabled creativity. First, it establishes a technical paradigm for culturally-aware generative AI that explicitly encodes domain knowledge into model architecture, demonstrating that general-purpose generation models can be systematically adapted for cultural heritage applications through specialized attention mechanisms and loss functions. Second, it provides a validated theoretical framework (CPDC) for understanding technology-mediated cultural co-creation, offering analytical tools for designing participatory systems that balance accessibility with quality, tradition with innovation, and individual expression with collective identity. Third, it demonstrates methodological approaches for quantitatively assessing cultural diversity in AI-generated content, including rigorous metric definition, calibration procedures, and sensitivity analysis protocols.

Several limitations of this research should be acknowledged, and these constrain the scope of claims that can be made based on the empirical findings. First, participants were geographically concentrated in Wuhan and surrounding areas of Hubei Province, and results may be influenced by local cultural context, familiarity with Jingchu traditions, and regional digital literacy levels. The specific empirical findings—including user behavior patterns, diversity evolution curves, retention rates, and cultural knowledge improvement metrics—should not be directly generalized to other regional cultures without replication studies. Second, the elderly inheritor population (age 56+) represented only 4.9% of participants, and this group reported higher technical barriers to platform usage. The observed participation patterns may therefore underrepresent the perspectives and creative contributions of senior cultural practitioners who possess the deepest traditional knowledge. Third, the six-month experimental period, while substantial for an empirical study of this scope, may be insufficient to observe long-term cultural ecosystem dynamics, including potential plateau effects in diversity growth, evolution of community governance structures, and sustainability of user engagement beyond initial novelty effects. Fourth, the blockchain copyright system was validated only at prototype scale, and its effectiveness in real commercial environments with economic stakes remains untested.

It is essential to distinguish between the transferability of the proposed frameworks and the generalizability of specific empirical results. The CA-SD technical architecture—including the cultural-aware attention mechanism, multi-scale feature fusion, and cultural consistency loss—represents a methodological approach that can be adapted to other cultural domains by replacing the cultural feature extraction pipeline and retraining on domain-specific datasets. The CPDC theoretical framework provides generalizable constructs (cultural cognition, technology acceptance, social connection) and analytical approaches that should apply across participatory cultural heritage contexts. The diversity assessment methodology, including the improved Shannon-Wiener index formulation and calibration procedures, offers transferable tools for evaluating cultural content in other domains. However, the specific parameter values, threshold settings, behavioral patterns, and outcome metrics reported in this study are derived from the Jingchu cultural context and the particular characteristics of our participant sample. Future research should conduct replication studies in different cultural contexts—such as other Chinese regional cultures (e.g., Cantonese, Hui, Tibetan), East Asian artistic traditions (e.g., Japanese ukiyo-e, Korean minhwa), or entirely different cultural heritage domains (e.g., African textile patterns, Celtic knotwork, Islamic geometric art)—to establish the boundary conditions and necessary adaptations for cross-cultural application.

Future research directions emerge from both the limitations identified and the opportunities revealed by this study. At the technical level, extending CA-SD to multimodal generation encompassing music and dance traditions would enable more comprehensive digital cultural ecosystems, and investigating cross-cultural transfer learning could reduce the data requirements for adapting the system to new cultural domains. At the mechanism level, comparative studies across different cultural contexts would test the generalizability of CPDC constructs and identify culture-specific moderators of participation behavior. At the application level, investigating business models that balance commercial viability with public cultural welfare would address sustainability concerns, and developing accessibility features for elderly and low-digital-literacy populations would broaden participation. The blockchain copyright system requires comprehensive validation through pilot commercial deployments, user studies of economic incentive effects, and analysis of intellectual property disputes in collaborative creation contexts. Finally, decade-scale longitudinal studies would reveal the long-term impact of AI-mediated participatory platforms on cultural evolution, addressing fundamental questions about how digital tools shape the trajectory of living cultural traditions.