Abstract
Traditional kite creation often relies on the hand-painting of experienced artisans, which limits the revitalization and innovation of this intangible cultural heritage. This study proposes using an AI-based diffusion model to learn kite design and generate new kite patterns, thereby promoting the revitalization and innovation of kite-making craftsmanship. Specifically, to address the lack of training data, this study collected ancient kite drawings and physical kites to create a Traditional Kite Style Patterns Dataset. The study then introduces a novel loss function that incorporates auspicious themes in style and motif composition, and fine-tunes the diffusion model using the newly created dataset. The trained model can produce batches of kite designs based on input text descriptions, incorporating specified auspicious themes, style patterns, and varied motif compositions, all of which are easily modifiable. Experiments demonstrate that the proposed AI-generated kite design can replace traditional hand-painted creation, highlighting a new application of AI technology in kite making. The method can also be applied to other areas of cultural heritage preservation, offering a new technical pathway for the revitalization and innovation of intangible cultural heritage, and it opens new directions for future research on the integration of AI and cultural heritage.
Introduction
Background and motivation
Kite-making craftsmanship is one of China’s precious intangible cultural heritages, with a history of over 2,000 years1. This craftsmanship embodies traditional Chinese ideology, mythological beliefs, and cultural customs, making it an essential part of Chinese culture2. The process of kite-making craftsmanship involves four steps: crafting the bamboo frame, affixing the paper, hand-painted creation, and stringing and flying the kite, as illustrated in Fig. 1. Among these steps, hand-painted creation is particularly significant3. Experienced artisans draw inspiration from historical images that depict mythological beliefs, creating varied styles and motifs that vividly represent auspicious themes from these myths2,3, as detailed in Fig. 2. Through hand-painted creation, artisans produce kites with a variety of styles and motifs. Flying kites to pray is a key cultural custom in traditional Chinese festivals4. During these festivals, people participate in kite flying to express various auspicious themes, as shown in Fig. 3. Consequently, kites have become an important medium for transmitting mythological beliefs and traditional cultural customs across generations2,4. However, kite-making craftsmanship is facing challenges of discontinued inheritance and a lack of innovation5,6,7, putting this valuable intangible cultural heritage at risk of being lost.
Two main factors contribute to the risk of the kite-making craftsmanship being lost: first, the significant difficulty of kite creation, and second, the cumbersome nature of hand-painting. Regarding the difficulty of kite creation, young artisans typically require almost a year to master kite-making craftsmanship. Figure 4 showcases varied styles and motifs, which add to the difficulty of kite-making1,3. Consequently, many young artisans perceive kite-making as highly challenging, thereby increasing the risk of skill loss5. Additionally, the cumbersome nature of the hand-painting method is evident in the traditional manual drawing. Most kite artisans, prioritizing livelihood, are hesitant to invest time and effort in innovating styles and motifs through laborious hand-painting. Instead, they opt to replicate existing styles and motifs to boost kite production6,7. However, this practice results in stagnation of skill innovation. Thus, addressing the core issues behind discontinued inheritance and stalled innovation, i.e., the difficulty of kite creation and laborious hand-painting, is crucial to mitigating the risk of kite-making craftsmanship being lost7.
The scene of flying kites to pray for blessings is a tradition deeply rooted in Chinese culture. Qing Dynasty paintings depict people flying kites adorned with various pattern designs, each expressing different auspicious themes derived from mythological beliefs. This cultural custom of kite flying to offer prayers holds significant cultural value in China.
The diversity of motif designs in creative content. In traditional hand-painted creation, achieving diverse motif designs in kite-making typically requires the accumulated expertise of kite artisans over time. It is through adhering to the traditional method of hand-painting that the varied styles and motifs in kite-making craftsmanship can be realized.
Problem statement and objectives
To promote the revitalization and innovation of kite-making craftsmanship as an intangible cultural heritage, new methods for kite design and creation are necessary6,7,8,9.
In recent years, diffusion models based on artificial intelligence (AI) have undergone rapid development10,11,12 and have become mainstream generative models13,14,15 due to their exceptional capabilities for image generation13,16,17. These models learn from a vast amount of paired data between text descriptions and images18,19,20, enabling them to generate a wide range of high-quality and diverse images from input text descriptions10,11,17. Using diffusion models facilitates the creation of relevant images and simplifies creation methods21, allowing creators to efficiently produce image designs. Currently, conventional diffusion models are trained on large datasets sourced from the internet22. These datasets lack specialized knowledge in specific technical fields and the corresponding annotations. As a result, diffusion models fail to acquire the relevant expertise and are unable to produce specialized works for particular fields23. Taking kite creation within intangible cultural heritage as an example: this field, as part of cultural heritage, carries unique artistic styles and profound cultural meanings. However, these cultural connotations are rarely available online, and discussions of such details are generally confined to professionals within specific fields. Therefore, models trained solely on internet data struggle to generate kites with specific cultural meanings or styles. This issue becomes especially evident when specific styles and motifs are required, leading to poor generation results and significantly impacting the continuation of cultural connotations in kite creation1,24. From the perspective of preserving cultural heritage, if AI cannot be accurately applied to revitalize and innovate kite creation, it will hinder the transmission and development of this heritage in the digital age, thus limiting the revival of ancient skills.
Therefore, it is crucial to fine-tune diffusion models to generate new kite designs, thereby better preserving and enhancing the diversity of styles and motifs25. Innovative kite designs are crucial for revitalizing and innovating kite-making craftsmanship as an intangible cultural heritage26,27. Furthermore, in earlier research, Ho et al. provided a rigorous mathematical derivation of diffusion models28. Liu et al. utilized categories as loss functions and calculated the loss based on the alignment between text and image within the CLIP model, thus enabling the generation of images by category using diffusion models29. Rombach et al. proposed training diffusion models in a low-dimensional latent space, significantly improving computational efficiency and image quality30. These studies provide the theoretical and practical foundation for our research. This study therefore proposes a method to fine-tune a diffusion model, with the aim of promoting the revitalization and innovation of kite-making craftsmanship. The fine-tuned kite diffusion model can batch-generate kite design images with specified auspicious themes and varied styles and motifs, replacing traditional hand-painted creation and simplifying the creation process for artisans. Moreover, it effectively promotes the revitalization and innovation of kite-making craftsmanship1,7,25.
Methodology overview
This study proposes a five-step method to address challenges in the creation aspect of kite-making craftsmanship. The first step involves the collection and processing of kite images. The research team spent eight months collecting kite images and performing electronic preprocessing. The second step is the creation of the kite dataset. The team organized 1,200 kite images into 80 categories. They manually annotated the professional information of the image content. This resulted in the creation of a dataset called the “Traditional Kite Style Patterns Dataset” (TKSPD-80). This dataset provides the data foundation for fine-tuning the diffusion model. The third step is designing a new loss function and using TKSPD-80 to fine-tune the model. By fine-tuning the conventional diffusion model, the researchers effectively trained a diffusion model named “Chinese Kite Diffusion Model of Intangible Cultural Heritage” (i.e., CKDM-ICH), providing a powerful tool for generating kite designs. The fourth step is the application of the model. Artisans can input text descriptions of kite designs into the CKDM-ICH model, which generates kite designs and allows for modifications based on text descriptions. This AI-generated kite design method replaces the traditional hand-painted creation used by artisans. The final step involves artisans selecting different styles and motifs from the batch-generated kite designs and choosing those that match the shape of the kite paper. This research method provides effective technical support for kite creation. It meets the need for innovative and varied styles and motifs. Additionally, it opens new possibilities for the revitalization and innovation of kite-making craftsmanship as an intangible cultural heritage. The framework of this study is illustrated in Fig. 5.
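The 80 categories of TKSPD-80 arise combinatorially from the dataset's 10 kite styles and 8 auspicious themes. A minimal sketch of this organization, where the style and theme names are illustrative placeholders (only the counts come from the paper):

```python
from itertools import product

# Hypothetical label lists -- the real TKSPD-80 labels were defined by the
# annotating experts; only the counts (10 styles x 8 themes) are from the text.
styles = [f"style_{i}" for i in range(1, 11)]   # 10 kite styles
themes = [f"theme_{j}" for j in range(1, 9)]    # 8 auspicious themes

# Every (style, theme) pair forms one dataset category.
categories = [f"{s}+{t}" for s, t in product(styles, themes)]
print(len(categories))  # 80 categories, as in TKSPD-80
```

Each of the 1,200 images is assigned to exactly one such pair, so every category holds on average 15 training images.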
Figure 6 compares mainstream diffusion models with our proposed CKDM-ICH model for generative design. Current mainstream models are unable to generate kite designs that meet the requirements of traditional kite-making craftsmanship. DALL.E331 (far left) can generate an illustration featuring a bat motif but cannot create kite designs with specified styles. Midjourney32 (second from left) produces an image with a generally symmetrical style, but it does not align with the specified auspicious theme. Stable Diffusion30 (third from left) generates images with errors that are unsuitable for kite design. These generated images fail to address the challenges in kite creation and traditional hand-painting methods. In contrast, the CKDM-ICH model proposed in this study (far right) effectively resolves these issues.
The difference between mainstream diffusion models and the CKDM-ICH model in generating kites. (Text input: “Plump swallow style and longevity theme kite. Plump swallow style, with no restrictions on the style of wings and tail. The longevity theme, consisting of bat and peach motifs. White background”).
The CKDM-ICH model proposed in this study enables generative design. It can batch-generate kite design proposals. These proposals feature specified auspicious themes and diverse styles and motifs. Trained on a dataset comprising patterns of 10 different kite styles and 8 different auspicious themes, CKDM-ICH can produce 80 different categories of kite designs through various combinations of these variables. This method ensures the generation of kite designs with specified auspicious themes and diverse styles and motifs. Figure 7 showcases a selection of kite styles generated by the CKDM-ICH model, encompassing 25 kite images derived from 5 styles and 5 auspicious themes.
Main contribution
The primary contributions of this study are outlined as follows:
1. Proposing a method for revitalizing and innovating intangible cultural heritage.
2. Fine-tuning the diffusion model to generate kite designs with specified auspicious themes and patterns.
3. Introducing a generative design approach to replace traditional hand-painted creation.
4. Establishing a Traditional Kite Style Patterns Dataset containing auspicious themes.
5. Demonstrating the advantage of this study through comparison with other mainstream diffusion models.
Related work
Challenges in revitalization and innovation of intangible cultural heritage
Kite-making craftsmanship, as a form of intangible cultural heritage5, involves four steps: crafting the bamboo frame, affixing the paper, hand-painted creation, and stringing and flying. Hand-painted creation is a crucial step in kite-making craftsmanship3. Specifically, hand-painted creation refers to the traditional method of manual drawing using a Chinese brush, while creation involves the innovative design of diverse styles and motifs. Traditional hand-painted creation in kite making requires artisans to accumulate creative content, patiently adhere to traditional hand-painting techniques, and possess innovative design capabilities. Creative content refers to the styles and motifs of kites used to express auspicious themes. Hand-painting involves manual drawing, which is used in the workflow of sketching, line drawing, color rendering, and modifying color drawings of kites. Innovative design capabilities involve diversified, innovative designs for traditional kite styles and motif composition4. For example, in hand-painted creation with prosperity themes, artisans need to accumulate traditional motif composition of peonies, butterflies, and cats, as well as traditional plump swallow styles suitable for this theme8. Artisans must also use hand-painting methods. This process is painstaking and helps diversify the composition of the three motifs. It is applied across the various styles of the plump swallow. Therefore, traditional hand-painted creation is one of the most challenging aspects of kite-making craftsmanship, and ensuring the diversity of kite patterns is crucial for the protection and inheritance of intangible cultural heritage5,6,7.
However, among the challenges faced in the inheritance of these skills, the root cause of the difficulty in hand-painted kite creation lies in artisans' lack of creative content and the laborious nature of hand-painting1,3. In traditional kite creation, artisans typically rely on creative content accumulated over the years, a resource that is often lacking among younger artisans. Consequently, they perceive kite creation as exceptionally challenging. Furthermore, in traditional hand-painted creation, the entire process relies on manual drawing. It usually takes an older artisan 1-2 days to hand-paint a single style and motif, and if modifications are necessary, they must repeat the laborious process for another 1-2 days. Artisans commonly find traditional hand-painting excessively intricate, considering it both time-consuming and physically demanding.
The critical challenge in kite-making lies in whether artisans possess a sustained, high-level capacity for innovative design7,8. Older artisans, with their adept, innovative prowess, can swiftly produce kites adorned with sought-after styles and motifs, thereby securing higher personal earnings. However, younger artisans often struggle to attain such proficiency in innovative design in the short term, making it arduous for them to earn significant income, thus leading to a decline in their pursuit of the craft. Furthermore, artisans must continually innovate their designs. This ensures the diversity of styles and motifs. In doing so, they help revitalize and innovate kite-making craftsmanship as an intangible cultural heritage. Nevertheless, with the passing of older artisans and the scarcity of younger ones, this scenario has resulted in a lack of innovative design in kite-making craftsmanship, causing stagnation in this intangible cultural heritage.
Due to the challenges associated with hand-painted creation and the insufficiency of innovative design, present kite-making craftsmanship faces obstacles in both inheritance and innovation. We aspire to develop novel technological methods7,8,21,24 to assist artisans in comprehending creative content more easily, streamlining the creative process, and augmenting their sustained capacity for innovative design. This is imperative for the revitalization and innovation of the intangible cultural heritage of kite-making craftsmanship.
Diffusion model fine-tuning
Diffusion models for image generation have recently gained widespread attention11,12,33,34. Researchers have consistently improved the performance of diffusion models. As a result, they have become the new mainstream in generative models. These models also demonstrate exceptional capabilities for image generation15,34,35,36. The diffusion module gradually transforms the original image into a noisy image, while the denoising module restores the image to its original state37,38. This process learns from a large dataset, enabling the denoising module to effectively remove noise and generate high-quality images29. Diffusion models can incorporate the guidance of text23,39 during the image generation to produce images with specific features40, making them easy and efficient to operate.
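The forward ("diffusion") step described here can be sketched as progressively mixing a clean image with Gaussian noise; the denoising network is then trained to undo this. A toy NumPy illustration, where the linear noise schedule is an illustrative assumption rather than any specific model's schedule:

```python
import numpy as np

rng = np.random.default_rng(42)
image = rng.uniform(0.0, 1.0, size=(8, 8))   # toy 8x8 "image"

def add_noise(x0, t, T=1000):
    """Forward diffusion: mix the clean image x0 with Gaussian noise.
    alpha_t shrinks toward 0 as t approaches T, so the final step is
    nearly pure noise (illustrative linear schedule)."""
    alpha_t = 1.0 - t / T
    sigma_t = np.sqrt(1.0 - alpha_t**2)
    eps = rng.standard_normal(x0.shape)
    return alpha_t * x0 + sigma_t * eps

x_early = add_noise(image, t=10)    # still close to the original image
x_late = add_noise(image, t=990)    # almost indistinguishable from noise
```

The denoising module learns the reverse mapping, optionally conditioned on a text embedding, which is what allows text-guided generation.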
However, conventional diffusion models face challenges10, particularly due to the lack of corresponding image-text paired data for training in certain specific domains23,39, resulting in poor performance of diffusion models. Therefore, improving diffusion models to enhance their performance and applicability is a necessary solution10,13,16,18.
There are primarily two methods for improving diffusion models. The first common approach is to retrain the entire diffusion model, which requires a large amount of training data and computational resources30,32,41, but this method may make it difficult to obtain sufficient training data in specific professional domains. Compared to retraining the entire model, the second method is fine-tuning the diffusion model to enhance its performance in specific domains, which is more flexible and efficient9. Obtaining a large amount of training data for kite creation may be difficult. Therefore, fine-tuning the diffusion model is a more feasible choice for improving its performance.
The four common methods for fine-tuning models are as follows. The first method is textual inversion36,41,42, which embeds new knowledge into the original model by learning appropriate embedding vectors for new training data without changing the model weights. This method trains quickly, but because it merely adds new embedding vectors, the generated results are often mediocre. The second method is hypernetworks43, which adds additional networks to the intermediate layers of the original diffusion model to influence the results of image generation. The third method, LoRA (Low-Rank Adaptation)44, alters the image generation effect by applying low-rank weight updates to the diffusion model's cross-attention layers. The fourth method is Dreambooth9, which adjusts the weights of all neural network layers, assigns specific descriptors to training images, and prevents overfitting by introducing a new loss function45,46. This method requires only a small number of images and corresponding text descriptions for training, thereby improving the quality of image generation in specific domains9,47. Because Dreambooth fine-tunes the entire model, it often yields the best generation results.
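The low-rank idea behind LoRA can be sketched in a few lines: the frozen weight matrix W is left untouched, and only a small additive update BA of rank r is trained. The layer width and rank below are illustrative assumptions, not the dimensions of any particular diffusion model:

```python
import numpy as np

d, r = 768, 4                      # hypothetical layer width and LoRA rank
W = np.random.randn(d, d)          # frozen pre-trained weight (not updated)
A = np.random.randn(r, d) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # trainable factor, zero-initialized
alpha = 8.0                        # LoRA scaling hyperparameter

def lora_forward(x):
    # Frozen path plus scaled low-rank update:
    # equivalent to x @ (W + (alpha / r) * B @ A).T
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = np.random.randn(2, d)
full_params = W.size           # 589,824 weights if the layer were fully fine-tuned
lora_params = A.size + B.size  # 6,144 trainable weights, about 1% of the layer
```

Because B starts at zero, the adapted layer initially reproduces the frozen model exactly, and training only gradually shifts its behavior toward the new domain.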
This study uses fine-tuning techniques to improve diffusion models for kite creation. The goal is to generate kite designs that promote the revitalization and innovation of kite-making, an intangible cultural heritage.
Datasets
Open datasets play a critical role in the field of AI10,23,48, providing essential support for its rapid development. Take, for instance, the Open Images Dataset49, boasting millions of user-uploaded images meticulously annotated with labels, covering a broad range of subjects and objects. It serves as a rich training resource for researchers in computer vision and machine learning. Another example is the Visual Genome50, which integrates images with textual descriptions, offering comprehensive information for tasks related to visual comprehension and image generation. The Conceptual Captions Dataset51 helps machines understand image content. It provides concise textual descriptions. This dataset is a valuable resource for research in image understanding and natural language processing. Leveraging these datasets drives progress in AI, laying the groundwork with datasets for various tasks.
Presently, there is a lack of traditional kite datasets annotated with auspicious themes and patterns, thus impeding the fine-tuning of diffusion models. This absence of specialized datasets impedes diffusion models in generating images from text10,23, rendering conventional diffusion models incapable of producing kite designs embodying specified auspicious themes and a diverse range of patterns. Hence, it becomes imperative to develop and compile a new dataset of traditional styles and motifs annotated with auspicious themes.
Material and methods
In text-to-image generation, diffusion models have made significant strides in recent years. DALL.E331, Midjourney32, Stable Diffusion30, and Dreambooth9 have emerged as prominent models in this domain. They have showcased exceptional performance across different application scenarios. However, there is still room for improvement in the performance of diffusion models, especially in generating kite designs with specified auspicious themes and diverse styles and motifs. This is particularly crucial for the safeguarding, inheritance, revitalization, and innovation of intangible cultural heritage1,7,25.
The generative kite design proposed in this study comprises five stages. The first stage involves collecting and processing images, which lays the groundwork for creating the dataset. In the second stage, a dataset is established to address any gaps in the existing dataset. The third stage involves designing a new loss function to learn the creative content of kites, encompassing various auspicious themes and their corresponding styles and motifs. The diffusion model is fine-tuned using the dataset and the new loss function to obtain a model capable of generating kite designs. In the fourth stage, the model is employed for the generation or modification of designs by inputting text descriptions, thus replacing the traditional hand-painted creation of artisans. Lastly, the fifth stage involves selecting different styles and motifs for kite designs. The flowchart of the design process using the CKDM-ICH method is shown in Fig. 8.
Further clarification of the new method proposed in this study is provided below. The first stage involves collecting and processing images of traditional kites. First, our research team spent over eight months collecting more than 2,000 kite images from various sources. These included kite images recorded in the Chinese kite classic Nan Yao Bei Yuan Kao Gong Zhi, kites from the Hebei China Grand Canal Intangible Cultural Heritage Exhibition Hall and the Shandong World Kite Museum of Weifang, as well as kite works collected by intangible cultural heritage inheritors in regions such as Beijing, Tianjin, and Hebei. Various collection methods were employed to ensure the dataset's diversity and comprehensiveness. The researchers then filtered the images based on quality, removing more than 800 low-quality images that were blurry, color-distorted, or too small. Finally, Adobe Photoshop was used to correct the images, isolate the kite images, modify the backgrounds to white, and resize the images to a uniform 512 \(\times\) 512 pixels. This resulted in the digital preprocessing of 1,200 high-quality traditional kite images. This stage laid the groundwork for creating the dataset.
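The proportional-resize step described above can be expressed as pure geometry: scale the longer side to 512 pixels, then pad the remainder with white to obtain a square canvas. The following is a simplified stand-in for the manual Photoshop workflow, which also involved color correction and background isolation:

```python
def resize_and_pad(width, height, target=512):
    """Return (new_w, new_h, pad_w, pad_h) for a proportional resize onto a
    target x target white canvas, with the longer side scaled to `target`."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    return new_w, new_h, target - new_w, target - new_h

# A 1024 x 768 photograph of a kite becomes 512 x 384, then receives
# 128 white pixels of vertical padding to reach 512 x 512.
print(resize_and_pad(1024, 768))  # (512, 384, 0, 128)
```

A uniform square resolution matters here because the diffusion model is trained and sampled at a fixed 512 \(\times\) 512 size.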
The second stage involves creating the dataset. First, the research team invited five university professors from the field of cultural heritage studies in Beijing, Tianjin, and Hebei Province, along with five artisans with over ten years of experience in the craft, to contribute their expertise. With the knowledge of these professors and artisans, the team categorized the 1,200 kite images into 80 distinct categories, which included 10 styles and 8 auspicious themes. Each image was assigned to one style type and one theme, thus resulting in a total of 80 categories. Subsequently, the professors and artisans assisted the researchers in manually annotating the dataset, providing professional annotations for the auspicious themes and style patterns in each kite image. This process led to the creation of a dataset called the “Traditional Kite Style Patterns Dataset” (TKSPD-80), which was developed to address the lack of specialized training data in diffusion models.
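One way the per-image annotations could be structured is as a record pairing each image with its style, theme, and motif labels, from which a training caption is assembled. The field names and file format here are hypothetical, since the paper does not specify the annotation schema; the example values follow the prompt style shown later in the paper:

```python
# Hypothetical TKSPD-80 annotation record; the actual schema used by the
# annotating professors and artisans is not given in the paper.
record = {
    "image": "kite_0001.png",
    "style": "plump swallow",
    "theme": "longevity",
    "motifs": ["bat", "peach"],
}

# A text caption of the kind a text-to-image model trains on can be
# assembled from the structured fields:
caption = (f"{record['style'].capitalize()} style and {record['theme']} theme kite. "
           f"The {record['theme']} theme, consisting of "
           f"{' and '.join(record['motifs'])} motifs. White background")
print(caption)
```

Keeping the annotation structured rather than free-form makes the 80 category labels (style \(\times\) theme) machine-checkable during dataset construction.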
The third stage encompasses the construction of the loss function and the fine-tuning of the diffusion model.
In the construction of the loss function, this study introduces an innovative composite loss function based on the traditional loss function (Eq. 1). This function adds auspicious themes and style patterns as extra loss terms to form the composite loss function (Eq. 2). With the inclusion of these elements, the diffusion model can better learn the auspicious themes and style patterns during training, thus reducing the loss. Through the training to minimize this loss, the model is able to generate kite designs with specified auspicious themes and style patterns.
The basic diffusion model loss is presented by Eq. 1:

\[ L=\frac{1}{N}\sum_{i=1}^{N} w_t\left\| \widehat{Z_\theta }(\alpha _tZ_i+\sigma _t\epsilon ,h_i)-Z_i\right\| _2^2 \tag{1} \]
In Eq. 1, L represents the average loss, and the objective of model training is to decrease this value. A lower loss indicates higher quality in image generation. \(\widehat{Z_\theta }\) is an iteratively updated diffusion model that continuously receives noisy image vectors \(\alpha _tZ+\sigma _t\epsilon\) and text vectors h, generating predicted images. These predicted images are compared with the traditional kite images Z in terms of content, and the content discrepancy between them is treated as the loss to be optimized. Square loss is employed to measure this discrepancy. \(w_t\) is a weight parameter used to regulate the contribution of the loss at different time steps t. The losses of all N images are accumulated and divided by the total number of images to obtain the average loss per image. During training, the diffusion model adjusts its parameters to diminish the content discrepancy between the generated image and the traditional kite image, ultimately minimizing L.
The composite loss function proposed in this study is presented in Eq. 2:

\[ L=\frac{1}{N}\sum_{i=1}^{N}\left[ w_t\left\| \widehat{Z_\theta }(\alpha _tZ_i+\sigma _t\epsilon ,h_i)-Z_i\right\| _2^2 + \lambda w_{t^{\prime }}\left\| \widehat{Z_\theta }(\alpha _{t^{\prime }}Z_{pr}+\sigma _{t^{\prime }}\epsilon ^{\prime },h_{pr})-Z_{pr}\right\| _2^2 \right] \tag{2} \]
The improved loss function (Eq. 2) addresses the limitations of conventional diffusion models in generating kite design proposals with specified auspicious themes and diverse styles and motifs. Eq. 2 combines auspicious themes, styles, motif composition, and prior knowledge as part of the loss function based on Eq. 1. Eq. 2 consists of two main components. The first component measures the content discrepancy between images generated by the trained model and the traditional kite images. \(\widehat{Z_\theta }\) represents the new diffusion model, which incorporates losses related to auspicious themes, styles, and motif composition. The images generated by this model differ from the traditional kite images Z in the training set; this difference constitutes the loss of the first component. The second component is the prior-knowledge loss, which compares the images generated by the new diffusion model (i.e., \(\widehat{Z_\theta }(\alpha _{t^{\prime }}Z_{pr}+\sigma _{t^{\prime }}\epsilon ^{\prime },h_{pr})\)) with those generated by the pre-trained diffusion model (i.e., \(Z_{pr}\)). A smaller content difference between these two indicates better retention of the general foundational knowledge of the original model by the newly trained model. \(\lambda w_{t^{\prime }}\) is an automatically learned weight used to balance these two parts of the loss function for better generation results. The combination of the two components helps the new diffusion model retain the knowledge of the pre-trained model while learning creative content related to auspicious themes, styles, and motif composition. Therefore, the fine-tuned diffusion model can generate kite design proposals with specified auspicious themes and diverse styles and motifs. The algorithm structure is illustrated in Fig. 9.
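The structure of Eq. 2, a reconstruction loss on kite images plus a \(\lambda\)-weighted prior-preservation loss, can be sketched numerically. NumPy arrays stand in for the image tensors, and the weights \(w_t\), \(w_{t^{\prime }}\), and \(\lambda\) are illustrative values rather than the trained ones:

```python
import numpy as np

def composite_loss(pred, target, pred_pr, target_pr, w_t=1.0, w_tp=1.0, lam=1.0):
    """Sketch of Eq. 2: a squared-error reconstruction term on traditional
    kite images plus a prior-preservation term that keeps the fine-tuned
    model close to the pre-trained model's own outputs."""
    recon = w_t * np.mean((pred - target) ** 2)               # first component
    prior = lam * w_tp * np.mean((pred_pr - target_pr) ** 2)  # second component
    return recon + prior

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 64))      # traditional kite image batch (toy)
z_pr = rng.standard_normal((4, 64))   # pre-trained model's prior samples (toy)

# Offset both predictions by 0.1, so each term contributes 0.01:
loss = composite_loss(z + 0.1, z, z_pr + 0.1, z_pr)
print(round(float(loss), 4))  # 0.02
```

Dropping the second term (lam=0) recovers the plain fine-tuning loss of Eq. 1, which is exactly the setting where catastrophic forgetting of the pre-trained model's general knowledge becomes a risk.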
During the process of fine-tuning the diffusion model, the trained model is capable of batch-generating kite design proposals, each featuring specified auspicious themes and diverse styles and motifs. This trained model is referred to as the “Chinese Kite Diffusion Model of Intangible Cultural Heritage” (i.e., CKDM-ICH). The CKDM-ICH model has learned the creative content of various auspicious themes, along with their corresponding style variations and diverse motif composition found in traditional kite images. Consequently, this model ensures the production of kite designs that embody specified auspicious themes and exhibit a variety of styles and motifs, thus satisfying the demand for innovative design within kite-making craftsmanship. Not only does the CKDM-ICH model address the fundamental issues of artisans lacking creative content and the arduous nature of traditional methods, but it also tackles the critical challenge of enabling artisans to handle high-level and sustained innovation ability.
The fourth stage involves using the fine-tuned diffusion model (i.e., CKDM-ICH) to generate kite designs. Intangible cultural heritage inheritors or designers (collectively referred to as artisans) can use the CKDM-ICH model on a computer to create and modify kite designs. In the phase of design generation, artisans can input a textual description of the desired kite design, which typically includes auspicious themes, styles, and other relevant details. Based on this input, CKDM-ICH will generate a batch of diverse designs. In the phase of design modification, designers can simply update the textual description to modify or regenerate the kite design. This generative method for creating kite designs can largely replace the artisans’ traditional hand drawing and enable them to produce various kite design proposals quickly.
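The generate-then-modify workflow above amounts to editing the text prompt and re-sampling. A sketch of prompt assembly, mirroring the prompt structure shown in the caption of Fig. 6; the actual sampling call depends on the deployment, so a Stable-Diffusion-style pipeline is only assumed in a comment:

```python
def kite_prompt(style, theme, motifs):
    # Mirrors the text-input structure used in the paper's Fig. 6 example.
    return (f"{style} style and {theme} theme kite. "
            f"The {theme} theme, consisting of {' and '.join(motifs)} motifs. "
            f"White background")

p1 = kite_prompt("Plump swallow", "longevity", ["bat", "peach"])
# Modification: change only the theme and motif words, then regenerate.
p2 = kite_prompt("Plump swallow", "prosperity", ["peony", "butterfly", "cat"])

# With a diffusers-style pipeline (hypothetical deployment; needs GPU + weights):
# image = pipe(p1, height=512, width=512).images[0]
print(p2)
```

Because the design lives entirely in the text description, a revision cycle costs one edited string and one re-sample, rather than a repeated 1-2 day hand-painting pass.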
The fifth stage is the process of selecting kite design proposals. In this step, artisans must choose from the batch of generated kite designs that meet the requirements for shape, style, and creativity. They can then transfer the selected designs onto paper and proceed to create new kites.
The new method proposed in this study replaces traditional hand-painted creation with AI-generated design. Visual comparisons from Fig. 10 reveal that the workflow of the traditional method is more cumbersome compared to ours. In the traditional approach, kite artisans rely solely on the manual process of sketching, drawing lines, and coloring, typically taking 1-2 days to complete the creation of a style and motif. In contrast, our new method can generate a kite design image with a resolution of 512 \(\times\) 512 in approximately 9.6 seconds on a computer with 24 GB of GPU memory, resulting in approximately 6 kite designs generated per minute. During the phase of design modification, while the traditional method requires repeating the entire hand-painting process, the new method only requires modifying the text description to regenerate the kite design.
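The throughput claim follows directly from the per-image generation time reported above:

```python
seconds_per_design = 9.6                      # reported time per 512x512 image
designs_per_minute = 60 / seconds_per_design
print(round(designs_per_minute, 2))           # 6.25, i.e. "approximately 6"

# An 8-hour working day at this rate, versus roughly one hand-painted
# design per 1-2 days in the traditional workflow:
designs_per_day = 8 * 60 * designs_per_minute
print(round(designs_per_day))                 # 3000 generated proposals per day
```

Even allowing for curation time, the gap of three to four orders of magnitude is what makes batch generation followed by artisan selection (the fifth stage) practical.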
Therefore, this new method offers significant advantages in accumulating creative content for kite making, optimizing creation methods, and enhancing artisans' capabilities for innovative design. By batch-generating kite design proposals with specified auspicious themes and diverse styles and motifs, this method meets the creative needs for revitalizing and innovating kite-making craftsmanship, aligning with the goals of cultural heritage preservation, inheritance, revitalization, and innovation.
Experiment and results
Implementation details
The diffusion model was trained on a computer running Windows 10, equipped with 64 GB of RAM and an NVIDIA RTX 4090 graphics card with 24 GB of memory. Training used PyTorch, with 100 training iterations per image. For image preprocessing, input images were resized proportionally so that the longer side was at most 512 pixels, and horizontal flipping was used for data augmentation. The model was trained with a learning rate of 0.00001 and a batch size of 1, with computation accelerated using xFormers and FP16 precision. The complete fine-tuning process for the diffusion model took 3.5 hours.
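For clarity, the stated preprocessing and training settings can be summarized as a small configuration sketch. This is an illustrative paraphrase of the reported setup, not the authors' actual training script; `target_size` only computes the proportional resize to a 512-pixel longer side:

```python
# Hyperparameters as reported for the fine-tuning run (illustrative summary).
TRAIN_CONFIG = {
    "learning_rate": 1e-5,
    "batch_size": 1,
    "iterations_per_image": 100,
    "precision": "fp16",          # with xFormers acceleration
    "augmentation": "horizontal_flip",
    "max_side": 512,              # longer side resized to at most 512 px
}

def target_size(width: int, height: int, max_side: int = 512) -> tuple[int, int]:
    """Proportional resize so the longer side is at most `max_side` pixels.

    Images already within the limit are left unchanged.
    """
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale)
```

For example, a 1024 \(\times\) 768 scan would be resized to 512 \(\times\) 384 before training, while a 400 \(\times\) 300 image would pass through unchanged.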
TKSPD-80 dataset
This study focuses on using text-based methods to generate kite design proposals in batches. These designs feature specific auspicious themes and a variety of patterns for practical use. Because conventional large datasets contain no specialized data on auspicious themes and corresponding patterns in kite design, this study developed a dataset known as the "Traditional Kite Style Patterns Dataset" (i.e., TKSPD-80). Dataset collection involved four approaches: (1) digitally redrawing kite designs from ancient texts, enlarging the small originals, and scanning them into high-resolution digital images; (2) photographing traditional kite exhibits at the Chinese Intangible Cultural Heritage Center to create high-definition digital images; (3) gathering kite artworks from the personal collections of five kite artisans and photographing them professionally to produce high-quality digital images; and (4) inviting professional graphic designers to assess image quality, eliminating over 800 low-quality images, and standardizing the remainder with a consistent white background and uniform dimensions, yielding 1200 high-quality traditional kite images after electronic preprocessing. These varied methods ensured diversity and comprehensiveness in dataset construction. Dataset annotation involved four kite artisans and ten professional graphic designers. With their assistance, researchers organized the 1200 images into 80 categories and manually annotated them accordingly, recording professional information about the auspicious themes, styles, and motif compositions depicted in each image. Under the guidance of experts and with the support of the design team, the research team ensured the objectivity and professionalism of the annotation.
Through the collection and annotation efforts, the TKSPD-80 dataset was created.
With input from kite-crafting artisans and graphic designers, this study categorized kite styles into ten classes: "plump swallow style," "small swallow style," "baby swallow style," "slender swallow style," "lean swallow style," "double swallow style," "animal style," "insect style," "human style," and "geometric style"1,2,3. Each style and design was annotated separately in the image content. Additionally, the auspicious themes of kites were categorized into eight classes: "happiness theme," "celebration theme," "longevity theme," "fortune theme," "prosperity theme," "health theme," "academic theme," and "evil dispelling theme"4,6. Annotations were provided for the various motif compositions associated with each auspicious theme. Table 1 illustrates the distribution of images annotated under different category labels in this dataset.
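Since 10 styles \(\times\) 8 themes equals the 80 categories of TKSPD-80, the label taxonomy can be represented as the cross product of the two class lists. Note that this pairing interpretation is our reading of the category structure, and the exact label strings below are paraphrased for the sketch:

```python
# The ten style classes and eight auspicious-theme classes of TKSPD-80.
STYLES = ["plump swallow", "small swallow", "baby swallow",
          "slender swallow", "lean swallow", "double swallow",
          "animal", "insect", "human", "geometric"]
THEMES = ["happiness", "celebration", "longevity", "fortune",
          "prosperity", "health", "academic", "evil dispelling"]

# Each category label pairs one style with one theme (assumed structure).
CATEGORIES = [f"{style} style, {theme} theme"
              for style in STYLES for theme in THEMES]
```

Organizing the labels this way makes the annotation task concrete: every one of the 1200 images is assigned to one of the 80 style-theme categories before motif details are recorded.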
From Table 1, researchers can observe that in the TKSPD-80 dataset, when categorized by style, the "insect style" has the most images (46 pictures), while the "double swallow style" has the fewest (12 pictures). Notably, the six "swallow" styles (plump swallow, small swallow, baby swallow, slender swallow, lean swallow, and double swallow) collectively comprise 50% of the total number of images (152 images); these "swallow" styles are crucial creative components in kite making. When sorted by auspicious theme, the "longevity theme" has the most images (236 pictures), whereas the "evil dispelling theme" has the fewest (32 pictures), with the "longevity theme" alone accounting for 26% of all auspicious-theme images. This demonstrates the artisans' enthusiasm for creating styles and motifs on the theme of longevity, which aligns with the Chinese people's profound desire for longevity; these patterns resonate with the enduring pursuit of longevity in traditional Chinese ideology1,2. The TKSPD-80 dataset comprises a total of 1200 images. Figure 11 showcases a selection of samples from TKSPD-80.
Evaluation metrics
The evaluation of the generative design for kites includes both subjective and objective aspects. Traditional objective evaluation typically uses computer technology to assess image clarity and compositional coherence52,53. However, this study focuses on evaluating the consistency and rationality of the image content with respect to auspicious themes and patterns, as well as their revitalization and innovation, which requires subjective assessment by experts. Because traditional automated image evaluations are not effective for assessing the content of these designs, this study does not use conventional objective evaluation methods; evaluating the content of generative designs therefore presents a significant challenge.
To address this challenge, we invited six experienced kite artisans and six graphic designers to collaborate with the research team. Together, they developed six evaluation criteria to assess the consistency and reasonableness of the generated images: "style," "auspicious theme," "motif composition," "element layout," "color matching," and "design details." Among these, "style" and "auspicious theme" were identified as key criteria. In professional evaluations, "style" requires that the image content fall within the reasonable scope of traditional kite styles, verifying the basic usability of the generated designs, while "auspicious theme" requires evaluators to confirm that the image content belongs to the stated auspicious-theme category. The remaining criteria are defined as follows: "motif composition" assesses whether the image content matches the pattern described in the text; "element layout" evaluates the symmetry, hierarchy, and size of the elements in the image; "color matching" determines whether the colors used and their combinations are reasonable; and "design details" is a comprehensive check that the image content contains no errors. The purpose of these six criteria is to verify the applicability of the CKDM-ICH model, with evaluation work including both visual and quantitative assessments.
Meanwhile, the research team also devised three evaluation criteria targeting the vitality and innovativeness of the generated design images: "inheritance," "novelty," and "attractiveness." In the professional assessment, "inheritance" asks evaluators to judge whether the image content effectively represents auspicious themes and related styles and motifs inspired by mythological beliefs. "Novelty" requires evaluators to scrutinize whether the image content introduces fresh styles and motif compositions. "Attractiveness" requires evaluators to judge whether the image content can captivate the viewer's interest. These three criteria serve to validate the efficacy of the CKDM-ICH model, with evaluation focusing on vitality and innovation.
The nine evaluation criteria provide a comprehensive assessment of the quality of the generated design images. They highlight the practical value of the CKDM-ICH model in terms of applicability and effectiveness. This framework allows the research team to evaluate and refine kite design proposals more accurately, ensuring the designs meet the expected visual impact.
Visual evaluation
This study compared the visual effects of design images generated by several mainstream diffusion models with those of the CKDM-ICH model proposed in this research. The selected mainstream models include DALL.E3, Midjourney, and Stable Diffusion, which have broad and active user bases. Two versions of our model were trained: one used the Contrastive Language-Image Pretraining method (i.e., CLIP)54 for automatic annotation of image content, referred to as the Testing Model; the other, CKDM-ICH, is the proposed method, which relies on manual annotation by experts and designers. Using these five models, we generated 25 kite designs for visual comparison. The generated image content is illustrated in Fig. 12. By comparing the visual effects of these images, researchers could assess the visual differences among the five models in generating kite designs. Such comparisons aid researchers in understanding the applicability of diffusion models in generating kite designs and identifying potential areas for improvement.
Through the analysis of Fig. 12, the researchers summarized the key findings regarding the generative capabilities of the five models. First, DALL.E3 and Midjourney, both commercial text-to-image models, excelled in element layout, color coordination, and design details, producing images that are visually stunning and intricate. However, these models have certain limitations: neither supports fine-tuning, meaning they cannot undergo additional training for the specific style categories or auspicious themes relevant to this study. Consequently, DALL.E3 and Midjourney are unable to generate kites with specified styles and themes. Next, the basic Stable Diffusion model generated images containing notable errors in style and auspicious themes; in the details, there were severe issues with asymmetry in the patterns and disorganization in the element layout, preventing the model from meeting the expectations for specified content generation and hindering its applicability. Subsequently, this study fine-tuned two diffusion models: the Testing Model and CKDM-ICH. The Testing Model employed the CLIP method for automated labeling of images in the dataset. However, the experiments revealed that automated labeling produced many suboptimal label terms and unsatisfactory outcomes: the generated images exhibited random deformations in the specified styles and performed poorly on the intended auspicious themes, falling short of expectations. Therefore, this study ultimately introduced the CKDM-ICH method, which uses manual labeling by experts to significantly enhance the training outcomes. In comparison, the CKDM-ICH method outperformed the other models across all specific evaluation metrics.
Table 2 further outlines the advantages and disadvantages of all the methods compared. The table reveals that the method proposed in this study outperforms all other tested methods. DALL.E3, Midjourney, Stable Diffusion, and Testing Model are found to be unsuitable for generative kite design.
Quantitative evaluation
In this study, DALL.E3, Midjourney, Stable Diffusion, the Testing Model, and the proposed CKDM-ICH were employed to generate kite designs in batches. Each model generated 100 images spanning the 80 categories, which combine 10 styles and 8 auspicious themes in motif composition; together, the five models produced a total of 500 images. To assess the consistency and rationality of these images concerning auspicious themes and corresponding patterns, we enlisted 8 kite artisans as evaluators. The evaluation criteria encompassed "styles," "auspicious themes," "motif composition," "element layout," "color matching," and "design details." Scoring used a comparative method55: studies indicate that comparing pairs of images yields greater accuracy than assessing individual images, ensuring the evaluation's objectivity and fairness. In this process, two images are randomly chosen from the generated design images and evaluated by the 8 kite artisans. Details of the subjective-evaluation questionnaire are provided in Fig. 13. Initially, each criterion of every model begins with a score of 500 points. Artisans then compare the two design images across the 6 evaluation criteria: if an artisan deems one image superior in a given criterion, the model that produced the superior image gains one point in that criterion, while the other model loses one point. This scoring mechanism adopts a simplified ELO model56, dynamically adjusting each model's per-criterion scores based on the pairwise outcomes, and reflects the artisans' evaluation of the generated designs. The ELO mechanism is a commonly used rating algorithm that effectively portrays the relative performance of different models in the evaluation.
As the evaluation progresses, it swiftly adjusts each model’s scores, enhancing the accuracy and reliability of the assessment results56. Through this scoring process, the research team obtained quantitative scores for each model across various evaluation criteria.
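The pairwise scoring mechanism described above (each criterion of each model starting at 500 points, adjusted by ±1 per comparison) can be sketched as follows. This illustrates the fixed-increment variant the study describes, not a full Elo update with expected-score weighting:

```python
CRITERIA = ["styles", "auspicious themes", "motif composition",
            "element layout", "color matching", "design details"]
MODELS = ["DALL.E3", "Midjourney", "Stable Diffusion",
          "Testing Model", "CKDM-ICH"]

# Every criterion of every model starts at 500 points.
scores = {m: {c: 500 for c in CRITERIA} for m in MODELS}

def record_comparison(scores, winner, loser, criterion):
    """One artisan judgment on one criterion: the model behind the
    superior image gains a point and the other model loses one."""
    scores[winner][criterion] += 1
    scores[loser][criterion] -= 1

# Example round: an artisan prefers a CKDM-ICH image over a
# Stable Diffusion image on the "styles" criterion.
record_comparison(scores, "CKDM-ICH", "Stable Diffusion", "styles")
```

Repeating this over many random pairs accumulates the per-criterion totals reported in Fig. 14; a model's final score thus reflects how often its images won head-to-head comparisons on that criterion.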
The final scores of the different diffusion models are depicted in Fig. 14. It is evident from the figure that there are significant differences among the five models in the quantitative evaluation of generative design. CKDM-ICH outperforms DALL.E3, Midjourney, Stable Diffusion, and the Testing Model across all evaluation metrics. Compared to the second-ranked Midjourney model, CKDM-ICH demonstrates significant advantages in the key metrics of "styles" and "auspicious themes," leading by 411 and 408 points, respectively. Moreover, the CKDM-ICH model achieves high scores of 879, 792, 843, and 839 in "motif composition," "element layout," "color matching," and "design details," respectively, proving its capability to generate kite designs with both reasonable auspicious themes and applicable patterns.
This study finds CKDM-ICH suitable for generative kite design. In the key metrics of "styles" and "auspicious themes," CKDM-ICH scores 983 and 895 points, respectively; the Testing Model scores 496 and 417 points lower in these two metrics. Meanwhile, the other three models, DALL.E3, Midjourney, and Stable Diffusion, are deemed unsuitable for generative design, scoring only 258, 572, and 200 points, respectively, in the key metric of "styles."
Evaluation of revitalization and innovation
For the evaluation of revitalization and innovation, this study utilized CKDM-ICH to generate kite designs in batches featuring the same auspicious theme across five style categories (baby swallow, slender swallow, animal, insect, and geometric), resulting in 25 design images. Simultaneously, researchers selected 25 photos of traditional kite objects matching the same five style categories and auspicious themes. Eight professional kite artisans were once again invited to assess the revitalization and innovation of these image contents, using the metrics of "inheritance," "novelty," and "attractiveness." The images are depicted in Fig. 15. By comparing the visual effect of the design images with those of the photos, researchers could effectively measure the efficacy of CKDM-ICH.
Once again, the comparative method was used for scoring55, employing the same subjective-evaluation questionnaire depicted in Fig. 13 and the ELO model56 as the scoring mechanism. The scores of the generated design images and those of real kites are presented in Table 3. Table 3 clearly shows that the CKDM-ICH-generated design images excel in the "novelty" and "attractiveness" metrics, surpassing the scores of the actual kite images by 427 and 457 points, respectively. There is no significant difference between the design images and the images of actual kites in the "inheritance" metric, with the former scoring only 38 points lower. This demonstrates that CKDM-ICH possesses the potential for the revitalization and innovation of kite-making craftsmanship.
Details of the generative design of kites
Figure 16 presents the kite design of the prosperity theme in the plump swallow style generated by CKDM-ICH as trained by our research team. From the images, it is evident that this design embodies the auspicious theme specified in the text, with the style and motif composition meeting the innovative demands of kite crafting. This underscores the capability of our trained CKDM-ICH model to generate kite design proposals featuring specified auspicious themes and diverse styles and motifs.
Detailed display of the plump swallow style with prosperity theme generated by CKDM-ICH. (Text input: “Plump swallow style with prosperity theme. Plump style, with irregular wings and tail. The prosperity theme comprises motifs of Chinese peony flowers, cats, and butterflies. The head features decorative motifs, the abdomen is dark, and there are some Chinese auspicious motifs at the waist. The image has a white background”).
Researchers carefully analyzed the generated design. They found that the prosperity theme in the plump swallow style closely resembled traditional paper kites and showed no noticeable flaws in style. The plump swallow style effectively conveys the theme of prosperity. The generated image content incorporates motif compositions aligned with the auspicious theme. For instance, the kite's head features decorative patterns reminiscent of Chinese peony leaves, while the expansive wings showcase compositions based primarily on Chinese peony flowers, complemented by some cat and butterfly motifs. The continuous motif resembling the Chinese fret pattern and the small peony-flower motifs on the waist, along with the distinct cat motifs on the wings and tail, further highlight the CKDM-ICH model's exceptional ability in detailed design. The model adeptly manages element layout and color matching, effectively presenting the interplay of design details. However, there is still room for improvement in certain aspects; for example, more precise control over the specific size of traditional Chinese peony-flower motifs in the element layout would be beneficial.
Although some details leave room for improvement, the designs generated by the CKDM-ICH diffusion model are highly practical and effective. Consequently, the model improves efficiency in creating and selecting design proposals for kite artisans. This outcome provides a potent tool and method for kite creation, with the potential to drive innovation and progress.
Discussion
This study offers comprehensive qualitative and quantitative evidence to assess the applicability and effectiveness of the proposed method. Researchers conducted intuitive comparisons to evaluate the fine-tuned diffusion model’s performance. The results highlighted its ability to generate high-quality kite designs with specified auspicious themes and diverse styles. It outperformed other mainstream models in this regard. Quantitatively, researchers evaluated multiple key metrics, confirming the superiority of this method over others. For instance, in assessing model applicability across six evaluation criteria, the CKDM-ICH model, particularly in “styles” and “auspicious themes,” demonstrated significant advantages, scoring 411 and 408 points higher, respectively, than the second-ranked Midjourney. Moreover, it achieved high scores of 879, 792, 843, and 839 in “motif composition,” “element layout,” “color matching,” and “design details,” thus affirming its ability to generate kite designs with specified auspicious themes and applicable patterns. Regarding model effectiveness, it scored high in “inheritance,” “novelty,” and “attractiveness,” obtaining scores of 481, 783, and 697, respectively, further substantiating its capability to effectively revitalize and innovate kite-making craftsmanship.
This study introduces CKDM-ICH, replacing traditional methods of kite creation with a simpler design approach. This method avoids laborious manual workflows such as hand-drawn sketches, line drawings, and color rendering. CKDM-ICH proposed in this study demonstrates a clear advantage in efficiency for generating and modifying kite designs. In contrast, creating a design image using traditional methods typically takes 1–2 days, whereas the CKDM-ICH can generate approximately 6 design images per minute, indicating a significant improvement in efficiency. Additionally, kite artisans can use CKDM-ICH to batch-generate kite design proposals with specified auspicious themes and diverse styles and motifs. Through this new method, artisans can easily accumulate kite design content and expedite the kite-making process. In summary, the method proposed in this study represents an innovative approach to kite design.
Since the images generated in this study involve cultural connotations, it is recommended that experts in the field carry out the data annotation. Furthermore, after the annotation is completed, experts should verify the accuracy of the annotations. Finally, once the model training is finished, experts can test whether the model-generated images align with the expected results. This approach helps prevent the misappropriation of traditional customs due to annotation errors.
This study also encounters several limitations, reflected in evaluation metrics, interface development, intellectual property, control over specific design features, and training constraints. First, assessing the quality of generative design content presents challenges, as establishing comprehensive quantitative evaluation metrics requires consideration from various angles; although the research team has devised some quantitative evaluation metrics, the dimensions of evaluation must be developed further to achieve a more quantified assessment of subjective perceptions. Second, the current user interface of CKDM-ICH is program-specific; kite artisans' operational habits should be considered in further refining and optimizing the user interface. Third, concerning intellectual property, since the dataset collection and organization were conducted independently by the team, this study does not address this issue; nonetheless, other researchers employing this method may encounter intellectual property concerns. Fourth, when generating images from text descriptions using conventional diffusion models, controlling the position and size of elements in the image is difficult, resulting in suboptimal control over specific design features. Future work could explore the integration of a control network: studies have shown that outlining regions with a brush to set spatial constraints can effectively control the size and position of patterns within the outlined area, offering a viable approach for controlling specific design features. Fifth, the model trained in this study is subject to certain constraints. The dataset used for training was sourced from kite image materials collected and organized by the researchers, and it covers 80 categories (10 styles and 8 themes).
This means that, for optimal results, users should use only the prompts defined in this study to generate kites with the specified combinations. For themes not included in these 80 categories, the model's generalization ability is limited, and the generated results may be of lower quality. Furthermore, as a text-to-image generation model, it creates visual content based on text descriptions, but precise control over the position and size of the generated objects remains challenging. Finally, the model was trained at a resolution of 512 \(\times\) 512 pixels, so generating images at other resolutions may degrade output quality. Despite these limitations, CKDM-ICH still holds significant potential for application and demonstrates excellent performance in innovative design with diverse styles and motifs.
Conclusion
This study employs AI and diffusion models to learn the creative content of kite making, aiming to invigorate and innovate this craftsmanship. To reduce reliance on hand-drawn sketches in traditional kite making, we built the Traditional Kite Style Patterns Dataset from ancient kite images and physical kites, successfully addressing the challenge of limited training data. On this foundation, the study introduces a novel loss function that considers auspicious themes, styles, and motif composition for fine-tuning diffusion models. After training, the model can generate diverse kite design proposals from input text descriptions, producing designs with specified auspicious themes, styles, and motifs that are easily modifiable. Experimental results indicate that the proposed generative design method can supplant artisans' traditional hand-painted creation, optimizing traditional creation methods.
This method of innovative design leverages AI to introduce new applications in kite creation, thus achieving remarkable results in both innovation and revitalization. In terms of innovation, it quantitatively analyzes various style elements and generates diverse combinations. Based on specific algorithms, this method enables the new kite designs to merge traditional folk and modern artistic styles, combining rustic charm, auspicious symbolism, and contemporary minimalist fashion. In terms of revitalization, the novel and diverse kite designs play a crucial role in enhancing public attention to and appreciation of traditional culture. Moreover, this method can reduce costs and time required, improve efficiency, lower learning barriers, and facilitate the inheritance of endangered kite-making craftsmanship. It also offers broad promotional potential and holds significant value for interdisciplinary research.
Future research could explore the following directions. First, the dataset of kite knowledge could be enriched with multimodal data. While preserving the format of existing data, it would be beneficial to incorporate audio explanations: professional and engaging audio can offer detailed insights into kite designs, flying techniques, and cultural significance, enabling AI to grasp multidimensional knowledge. The integration of multimodal data would yield a more comprehensive dataset and provide robust data support for kite design. Second, the potential for application in generative kite design is vast: the core technology could be extended to the creation of other cultural heritage images, such as props for Chinese shadow puppetry or woodblock New Year prints. By developing specialized datasets, the key technologies of this method can be flexibly adapted to bring new life to the image creation of intangible cultural heritage. Third, efforts could focus on refining evaluation methods for image content, utilizing deep learning and computer vision to construct more precise evaluation models that enable comprehensive and objective assessment of image quality. Lastly, optimizing automated design by incorporating intelligent algorithms and tools could enhance design efficiency and content quality, accelerating the revitalization and innovation of kite-making craftsmanship. These avenues would deepen understanding of generative design, drive interdisciplinary research and practice, and offer additional possibilities for the revitalization and innovation of intangible cultural heritage.
Availability of data and materials
The data and datasets that support the findings of this study are available from the corresponding author upon reasonable request.
References
Dao-xi, G. An analysis of the folk culture born by the kite. J. Weifang Univ. 11(2), 127–129 (2011).
Chen, Q. The formation, development and derivation of bird beliefs in china. J. East China Normal Univ. 5, 19–27121 (2003).
Gai, X. & Xie, J. Analysis on Tianjin flavor in adornment of kite Wei. In Soft Computing in Information Communication Technology. Advances in Intelligent and Soft Computing (ed. Luo, J.) 161. https://doi.org/10.1007/978-3-642-29452-5_68 (2012).
An, L. The revival of traditional crafts from the perspective of historical functionalism-the case of Weifang kites. Ethnic Art 4, 62–68 (2018).
Huang, Y., Zhao, X., Li, J., Yin, F. & Wang, L. Research on the influencing factors of kite culture inheritance based on an adversarial interpretive structure modeling method. IEEE Access 9, 42140–42150. https://doi.org/10.1109/ACCESS.2021.3065711 (2021).
An, L. & Zhang, Y. Discourse and institution: The protection and inheritance of Weifang kites. Ethnic Arts 2, 89–95. https://doi.org/10.16564/j.cnki.1003-2568.2020.02.009 (2020).
Alivizatou-Barakou, M. et al. Intangible cultural heritage and new technologies: Challenges and opportunities for cultural preservation and development. Mixed Reality and Gamification for Cultural Heritage 129–158. https://doi.org/10.1007/978-3-319-49607-8_5 (2017).
Liang, J. & Sharul Azim, S. The application of participatory design in Weifang Kites, China. J. Educ. Educ. Res. 8(1), 38–41 (2024).
Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M. & Aberman, K. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. https://arxiv.org/abs/2208.12242 (2023).
Zhang, C., Zhang, C., Zhang, M. & Kweon, I. S. Text-to-image Diffusion Models in Generative AI: A Survey. https://arxiv.org/abs/2303.07909 (2023).
Chen, J., Shao, Z., Zheng, X., Zhang, K. & Yin, J. Integrating aesthetics and efficiency: Ai-driven diffusion models for visually pleasing interior design generation. Sci. Rep. 14(1), 3496. https://doi.org/10.1038/s41598-024-53318-3 (2024).
Chen, J. et al. Creative interior design matching the indoor structure generated through diffusion model with an improved control network. Front. Architect. Res. https://doi.org/10.1016/j.foar.2024.08.003 (2024).
Xu, X., Wang, Z., Zhang, E., Wang, K. & Humphrey, S. Versatile diffusion: Text, images and variations all in one diffusion model. In Proceedings of the IEEE/CVF International Conference on Computer Vision 7754–7765. https://doi.org/10.48550/arXiv.2211.08332 (2023).
Chen, J. et al. Using artificial intelligence to generate master-quality architectural designs from text descriptions. Buildings 13(9), 2285. https://doi.org/10.3390/buildings13092285 (2023).
Shao, Z. et al. A new approach to interior design: Generating creative interior design videos of various design styles from indoor texture-free 3d models. Buildings 14(6), 1528. https://doi.org/10.3390/buildings14061528 (2024).
Yang, L. et al. Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv. 56(4), 1–39. https://doi.org/10.48550/arXiv.2209.00796 (2023).
Chen, J., Shao, Z. & Hu, B. Generating interior design from text: A new diffusion model-based method for efficient creative design. Buildings 13(7), 1861. https://doi.org/10.3390/buildings13071861 (2023).
Saharia, C. et al. Photorealistic text-to-image diffusion models with deep language understanding. Adv. Neural Inf. Process. Syst. 35, 36479–36494. https://doi.org/10.48550/arXiv.2205.11487 (2022).
Kim, G., Kang, D. U., Seo, H., Kim, H. & Chun, S. Y. Detailed Human-Centric Text Description-Driven Large Scene Synthesis. https://arxiv.org/abs/2311.18654 (2023).
Liu, R., Pang, W., Chen, J., Balakrishnan, V. A. & Chin, H. L. The application of scaffolding instruction and AI-driven diffusion models in children’s aesthetic education: A case study on teaching traditional Chinese painting of the twenty-four solar terms in Chinese culture. Educ. Inf. Technol. https://doi.org/10.1007/s10639-024-13135-7 (2024).
Huang, L. & Zheng, P. Human-computer collaborative visual design creation assisted by artificial intelligence. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 22(9), 1–21. https://doi.org/10.1145/35547353 (2023).
Vodrahalli, K., Li, K. & Malik, J. Are All Training Examples Created Equal? An Empirical Study. https://arxiv.org/abs/1811.12569 (2018).
Grammalidis, N. et al. The i-treasures intangible cultural heritage dataset. In Proceedings of the 3rd International Symposium on Movement and Computing 1–8. https://doi.org/10.1145/2948910.2948944 (2016).
Hou, Y., Kenderdine, S., Picca, D., Egloff, M. & Adamou, A. Digitizing intangible cultural heritage embodied: State of the art. J. Comput. Cult. Herit. 15(3), 837. https://doi.org/10.1145/3494837 (2022).
Ruggles, D. F. & Silverman, H. From tangible to intangible heritage. In Intangible Heritage Embodied 1–14 (Springer, 2009). https://doi.org/10.1007/978-1-4419-0072-2_1
Wenji, Z., Cui, R. & Li, N. The innovative practice of artificial intelligence in the inheritance of Chinese Xiangjin Art. Sci. Programm. https://doi.org/10.1155/2022/6557374 (2022).
Zhang, Y. & He, D. Inheritance and innovation of traditional handicraft skills based on artificial intelligence. Trans. Comput. Sci. Intell. Syst. Res. 2, 163–169. https://doi.org/10.62051/rc69jc38 (2023).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Liu, X. et al. More Control for Free! Image Synthesis with Semantic Diffusion Guidance. https://arxiv.org/abs/2112.05744 (2022).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10674–10685. https://doi.org/10.1109/CVPR52688.2022.01042 (2022).
OpenAI. DALL·E 3. Version 4.0. https://openai.com/dall-e-3 (2023).
Borji, A. Generated Faces in the Wild: Quantitative Comparison of Stable Diffusion, Midjourney and DALL-E 2. https://arxiv.org/abs/2210.00586 (2023).
Le, T. V. et al. Anti-DreamBooth: Protecting users from personalized text-to-image synthesis. https://arxiv.org/abs/2303.15433 (2023).
Song, J., Meng, C. & Ermon, S. Denoising Diffusion Implicit Models. https://arxiv.org/abs/2010.02502 (2022).
Jolicoeur-Martineau, A., Piché-Taillefer, R., Combes, R. T. & Mitliagkas, I. Adversarial score matching and improved sampling for image generation. https://arxiv.org/abs/2009.05475 (2020).
Dhariwal, P. & Nichol, A. Diffusion Models Beat GANs on Image Synthesis. https://arxiv.org/abs/2105.05233 (2021).
Nichol, A. & Dhariwal, P. Improved Denoising Diffusion Probabilistic Models. https://arxiv.org/abs/2102.09672 (2021).
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning 2256–2265 (PMLR, 2015). https://doi.org/10.48550/arXiv.1503.03585.
Alhazmi, K., Alsumari, W., Seppo, I., Podkuiko, L. & Simon, M. Effects of annotation quality on model performance. 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC) 063–067. https://doi.org/10.1109/ICAIIC51459.2021.9415271 (2021).
Gonzalez, S. & Miikkulainen, R. Improved Training Speed, Accuracy, and Data Utilization Through Loss Function Optimization. https://arxiv.org/abs/1905.11528 (2020).
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical Text-Conditional Image Generation with CLIP Latents. https://arxiv.org/abs/2204.06125 (2022).
Gal, R. et al. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. https://arxiv.org/abs/2208.01618 (2022).
von Oswald, J., Henning, C., Grewe, B. F. & Sacramento, J. Continual learning with hypernetworks. https://arxiv.org/abs/1906.00695 (2022).
Hu, E. J. et al. LoRA: Low-Rank Adaptation of Large Language Models. https://arxiv.org/abs/2106.09685 (2021).
Lee, J., Cho, K. & Kiela, D. Countering Language Drift via Visual Grounding. https://arxiv.org/abs/1909.04499 (2019).
Lu, Y., Singhal, S., Strub, F., Pietquin, O. & Courville, A. Countering Language Drift with Seeded Iterated Learning. https://arxiv.org/abs/2003.12694 (2020).
Saqlain, A. S., Fang, F., Ahmad, T., Wang, L. & Abidin, Z.-U. Evolution and effectiveness of loss functions in generative adversarial networks. China Commun. 18(10), 45–76. https://doi.org/10.23919/JCC.2021.10.004 (2021).
Chen, J., Shao, Z., Cen, C. & Li, J. Hynet: A novel hybrid deep learning approach for efficient interior design texture retrieval. Multimed. Tools Appl. https://doi.org/10.1007/s11042-023-16579-0 (2023).
Kuznetsova, A. et al. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. Int. J. Comput. Vis. 128(7), 1956–1981. https://doi.org/10.1007/s11263-020-01316-z (2020).
Krishna, R. et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. https://arxiv.org/abs/1602.07332 (2016).
Sharma, P., Ding, N., Goodman, S. & Soricut, R. Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) vol. 1, 2556–2565 (Association for Computational Linguistics, 2018). https://doi.org/10.18653/v1/P18-1238.
Zhou, W., Wang, Z. & Chen, Z. Image Super-Resolution Quality Assessment: Structural Fidelity Versus Statistical Naturalness. https://arxiv.org/abs/2105.07139 (2021).
Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612. https://doi.org/10.1109/TIP.2003.819861 (2004).
OpenAI. CLIP. Version 4.0. https://openai.com/index/clip/ (2021).
Mantiuk, R. K., Tomaszewska, A. & Mantiuk, R. Comparison of four subjective methods for image quality assessment. Comput. Graph. Forum 31, 2478–2491. https://doi.org/10.1111/j.1467-8659.2012.03188.x (2012).
Pinho Zanco, D. G., Szczecinski, L., Kuhn, E. V. & Seara, R. Stochastic analysis of the Elo rating algorithm in round-robin tournaments. Digit. Signal Process. 145, 104313. https://doi.org/10.1016/j.dsp.2023.104313 (2024).
Author information
Contributions
Y.Z. designed the study, undertook experimental work and wrote the draft manuscript. Y.L. worked on designing the study, data collection and drafting the manuscript. Y.S. worked in collecting and verifying the datasets, and creating all the kite design images. J.C. worked on designing the study. All authors reviewed the manuscript.
Ethics declarations
Conflicts of interest
The authors declare no conflicts of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhou, Y., Liu, Y., Shao, Y. et al. Fine-tuning diffusion model to generate new kite designs for the revitalization and innovation of intangible cultural heritage. Sci Rep 15, 7519 (2025). https://doi.org/10.1038/s41598-025-92225-z